Laxmi Parida | Researchain

Publication

Featured research published by Laxmi Parida.

Journal in Computer Virology | 2005

Malware Phylogeny Generation using Permutations of Code

Md. Enamul Karim; Andrew Walenstein; Arun Lakhotia; Laxmi Parida

Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not keep things in the same order. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens versus unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens and assist in reconciling malware naming inconsistencies.

Abstract (translated from Czech): Malicious programs, such as viruses and worms (malware), are rarely written hastily and from nothing. They are usually the result of their evolutionary relationships. Discovering these relationships and constructing an accurate phylogeny is expected to help in analyzing new malware and in establishing a principled naming scheme. Matching permutations of code within malware may offer advantages for phylogeny generation, because the evolutionary steps implemented by malware authors may not preserve sequence in shared code. We describe a family of phylogeny generators that perform clustering using PQ-tree-based extracted features. An experiment was performed in which the tree output by these generators was evaluated against phylogenies generated using weighted n-grams. The results show the advantages of the permutation-based approach in malware phylogeny generation.

Abstract (translated from French): Malicious codes, such as viruses and worms, are rarely written from scratch; consequently, evolutionary relationships exist among them. Establishing these relationships and constructing an accurate phylogeny offers the prospect of a better ability to analyze new malicious code and of a de facto method for naming it. Matching permutations of code against parts of malicious code is likely to be very useful in establishing a phylogeny, since the evolutionary steps carried out by malware authors generally do not preserve the order of the instructions in the shared code. We describe a family of phylogeny generators that perform clustering using features extracted from PQ trees. An experiment was carried out in which the tree produced by these generators was evaluated by comparing it both with the reference classifications used by scanning antivirus software and with phylogenies produced using weighted n-grams. The results demonstrate the value of the permutation-based approach in generating malware phylogenies.

Abstract (translated from Finnish): Malicious programs, such as computer viruses and worms, are rarely written from scratch. As a result, evolution-like similarity can be found among them. Finding these similarities and building an accurate evolution-based model can ease the analysis of new malicious programs and support naming conventions. Searching for permutations in code may offer advantages in building an evolutionary model, because the evolutionary steps taken by malware authors do not necessarily preserve sequence in program code. We describe a set of evolutionary model generators that perform clustering using PQ-tree-based features. We also conducted an experiment in which the tree output was compared with a reference set produced by an antivirus program and with evolutionary models built using weighted n-grams. The results suggest that a permutation-based approach can be used successfully to build evolutionary models.

Abstract (translated from German): Malicious programs, such as viruses and worms, are only very rarely written completely anew; as a result, dependencies can be found between different malicious codes. For classifying and scientifically analyzing new malicious codes, it can be very helpful to set out their dependencies on existing malicious codes and thereby construct a family tree. Among other things, the article covers modern approaches to family-tree generation using selected Win32 viruses.

Abstract (translated from Italian): Malicious programs, such as viruses and worms, are rarely written from scratch; this means that there are evolutionary relationships among them. Discovering these relationships and constructing an accurate phylogeny can help both in analyzing new programs of this kind and in establishing a nomenclature with a solid basis. Searching for permutations of code across programs can give an advantage in generating phylogenies, since the evolutionary steps implemented by the authors may not have preserved the sequential order of the original code. In this article we describe a family of phylogeny generators that perform clustering using features based on PQ trees. In an experiment, the output tree of the generators is compared with a reference classification obtained from an antivirus program and with phylogenies generated using weighted n-grams. The results indicate the benefits of the permutation-based approach in generating malware phylogenies.
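The contrast between n-grams and n-perms described in the abstract can be illustrated with a minimal Python sketch. It assumes that an n-perm is an n-gram whose internal order is ignored and that cosine similarity is the vector similarity measure; the opcode sequences and the function names (`ngrams`, `nperms`, `cosine`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count every contiguous window of n tokens, keeping token order."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def nperms(tokens, n):
    """Count every contiguous window of n tokens, ignoring order inside the window.

    Assumption: sorting the window is used here as a canonical form for
    "the same tokens in any order".
    """
    return Counter(tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical opcode sequences: the variant swaps its first two instructions.
original = ["push", "mov", "add", "call", "ret"]
permuted = ["mov", "push", "add", "call", "ret"]

gram_score = cosine(ngrams(original, 2), ngrams(permuted, 2))
perm_score = cosine(nperms(original, 2), nperms(permuted, 2))
print(gram_score, perm_score)
```

On this toy input the swap breaks two of the four bigrams but only one of the four 2-perms, so the n-perm score stays higher for the permuted variant, which is the kind of separation the abstract reports.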

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1998

Junctions: detection, classification, and reconstruction

Laxmi Parida; Davi Geiger; Robert A. Hummel

Junctions are important features for image analysis and form a critical aspect of image understanding tasks such as object recognition. We present a unified approach to detecting, classifying, and reconstructing junctions in images. Our main contribution is a model of the junction that is complex enough to handle all these issues and yet simple enough to admit an effective dynamic programming solution. We use a template deformation framework along with a gradient criterion to detect radial partitions of the template. We use the minimum description length principle to obtain the optimal number of partitions that best describes the junction. The Kona detector presented by Parida et al. (1997) is an implementation of this model. We demonstrate the stability and robustness of the detector by analyzing its behavior in the presence of noise, using synthetic/controlled apparatus. We also present a qualitative study of its behavior on real images.

Genome Biology | 2013

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

Juan Carlos Motamayor; Keithanne Mockaitis; Jeremy Schmutz; Niina Haiminen; Donald Livingstone; Omar E. Cornejo; Seth D. Findley; Ping Zheng; Filippo Utro; Stefan Royaert; Christopher A. Saski; Jerry Jenkins; Ram Podicheti; Meixia Zhao; Brian E. Scheffler; Joseph C Stack; Frank Alex Feltus; Guiliana Mustiga; Freddy Amores; Wilbert Phillips; Jean Philippe Marelli; Gregory D. May; Howard Shapiro; Jianxin Ma; Carlos Bustamante; Raymond J. Schnell; Dorrie Main; Don Gilbert; Laxmi Parida; David N. Kuhn

Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders.

Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 of Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation.

Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
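The contig and scaffold N50 figures quoted above are summary statistics of sequence lengths. As a minimal sketch, assuming the common definition (the largest length L such that sequences of length at least L together cover at least half of the total assembled length), N50 can be computed as follows; the function name `n50` and the sample lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Walk the lengths from longest to shortest and stop at the first one
    where the running total reaches at least half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one length")


# Hypothetical lengths (in kbp), not taken from the Matina 1-6 assembly.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
print(example_n50)
```

Because the statistic depends only on the sorted lengths, a few very long scaffolds can raise the N50 sharply, which is why the abstract reports contig and scaffold N50 separately.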

Journal of Computational Biology | 2004

Incremental Paradigms of Motif Discovery

Alberto Apostolico; Laxmi Parida

We examine the problem of extracting maximal irredundant motifs from a string. A combinatorial argument poses a linear bound on the total number of such motifs, thereby opening the way to the quest for the fastest and most efficient methods of extraction. The basic paradigm explored here is that of iterated updates of the set of irredundant motifs in a string under consecutive unit symbol extensions of the string itself. This approach exposes novel characterizations for the base set of motifs in a string, hinged on notions of partial order. Such properties support the design of ad hoc data structures and constructs, and lead to the development of an O(n^3) time incremental discovery algorithm.

Proteins | 1999

Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins.

Isidore Rigoutsos; Aris Floratos; Christos A. Ouzounis; Yuan Gao; Laxmi Parida

Using TEIRESIAS, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of all sequence patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for function or family equivalence memberships. Thus, seqlets can either define conserved family signatures or cut across molecular families and previously undetected sequence signals deriving from functional convergence. Moreover, we show that seqlets also can capture structurally conserved motifs. The availability of a dictionary of seqlets that has been derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and sensitive engines for homology searches, evolutionary studies, and protein structure prediction. Proteins 1999;37:264–277. ©1999 Wiley‐Liss, Inc.

Journal of Virology | 2003

In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome

Isidore Rigoutsos; Jiri Novotny; Tien Huynh; Stephen T. Chin-Bow; Laxmi Parida; Daniel E. Platt; David Coleman; Thomas Shenk

More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

Computer-aided Design | 1993

Constraint-satisfying planar development of complex surfaces

Laxmi Parida; Sudhir P. Mudur

Composite laminates are made of multiple layers of fibrous material, each layer being formed by laying tapes of various widths on subregions of the planar development of the surface. The tapes are laid only along a certain direction, thus requiring the cuts and overlaps of the planar development to lie along that direction. The paper presents algorithms that have been implemented to obtain planar developments (within acceptable tolerances) of complex surfaces with cuts and overlaps only in specified orientations. The algorithm is based on the novel approach of first obtaining an approximate planar development, and then reorienting the cracks and overlaps in the plane of development to satisfy the orientation constraint.

Journal of Computational Biology | 2005

Gene Proximity Analysis across Whole Genomes via PQ Trees

Gad M. Landau; Laxmi Parida; Oren Weimann

Permutations on strings representing gene clusters on genomes have been studied by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004), and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representing and detecting gene clusters in multiple genomes using PQ trees (Booth and Lueker, 1976). The tree describes the inner structure of the clusters and the relations between them succinctly, aids in filtering meaningful from apparently meaningless clusters, and gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal π pattern (Eres et al., 2003) and that each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; with the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters; and when comparing eight different chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases.

Proceedings of the National Academy of Sciences of the United States of America | 2012

Y-chromosome analysis reveals genetic divergence and new founding native lineages in Athapaskan- and Eskimoan-speaking populations

Matthew C. Dulik; Amanda C. Owings; Jill B. Gaieski; Miguel Vilar; Alestine Andre; Crystal Lennie; Mary Adele Mackenzie; Ingrid Kritsch; Sharon Snowshoe; Ruth Wright; James F. Martin; Nancy Gibson; Thomas D. Andrews; Theodore G. Schurr; Syama Adhikarla; Christina J. Adler; Elena Balanovska; Oleg Balanovsky; Jaume Bertranpetit; Andrew C. Clarke; David Comas; Alan Cooper; Clio Der Sarkissian; ArunKumar GaneshPrasad; Wolfgang Haak; Marc Haber; Angela Hobbs; Asif Javed; Li Jin; Matthew E. Kaplan

For decades, the peopling of the Americas has been explored through the analysis of uniparentally inherited genetic systems in Native American populations and the comparison of these genetic data with current linguistic groupings. In northern North America, two language families predominate: Eskimo-Aleut and Na-Dene. Although the genetic evidence from nuclear and mtDNA loci suggests that speakers of these language families share a distinct biological origin, this model has not been examined using data from paternally inherited Y chromosomes. To test this hypothesis and elucidate the migration histories of Eskimoan- and Athapaskan-speaking populations, we analyzed Y-chromosomal data from Inuvialuit, Gwich’in, and Tłįchǫ populations living in the Northwest Territories of Canada. Over 100 biallelic markers and 19 chromosome short tandem repeats (STRs) were genotyped to produce a high-resolution dataset of Y chromosomes from these groups. Among these markers is an SNP discovered in the Inuvialuit that differentiates them from other Aboriginal and Native American populations. The data suggest that Canadian Eskimoan- and Athapaskan-speaking populations are genetically distinct from one another and that the formation of these groups was the result of two population expansions that occurred after the initial movement of people into the Americas. In addition, the population history of Athapaskan speakers is complex, with the Tłįchǫ being distinct from other Athapaskan groups. The high-resolution biallelic data also make clear that Y-chromosomal diversity among the first Native Americans was greater than previously recognized.

Journal of Computational Biology | 2004

Permutation Pattern Discovery in Biosequences

Revital Eres; Gad M. Landau; Laxmi Parida

Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In the paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the π-pattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information. We demonstrate the automatic pattern discovery tool on motifs on E. coli protein sequences.
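The idea of a cluster whose members appear together but in varying order can be sketched with a naive Python example. It is a brute-force illustration under stated assumptions, not the algorithm from the paper: it fixes one window size, treats a pattern as the multiset of symbols in a contiguous window, and reports patterns that occur in at least `quorum` sequences. It does not compute maximal patterns, and the name `permutation_patterns`, its parameters, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def permutation_patterns(sequences, size, quorum):
    """Find symbol sets that occur contiguously, in any order, in >= quorum sequences.

    Each window of `size` symbols is reduced to a sorted tuple so that
    windows containing the same symbols in different orders share one key.
    """
    occurrences = defaultdict(set)
    for index, sequence in enumerate(sequences):
        for start in range(len(sequence) - size + 1):
            key = tuple(sorted(sequence[start:start + size]))
            occurrences[key].add(index)
    return {
        key: sorted(indices)
        for key, indices in occurrences.items()
        if len(indices) >= quorum
    }


# Hypothetical gene orders: the genes a, b, c are adjacent in all three
# sequences, but in a different order each time.
genomes = [list("abcde"), list("xcabz"), list("bacq")]
clusters = permutation_patterns(genomes, size=3, quorum=3)
print(clusters)
```

Enumerating every window size in this way would report many nested patterns, which is the redundancy that the maximal-pattern notation in the abstract is meant to remove.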
