Andrew Walenstein

University of Louisiana at Lafayette

Publications

Featured research published by Andrew Walenstein.

Journal in Computer Virology | 2005

Malware Phylogeny Generation using Permutations of Code

Md. Enamul Karim; Andrew Walenstein; Arun Lakhotia; Laxmi Parida

Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not keep things in the same order. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens versus unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens, and assist in reconciling malware naming inconsistencies.

Abstract (translated from Czech): Malicious programs, such as viruses and worms (malware), are rarely written from scratch. They are usually the result of evolutionary relationships. Discovering these relationships and constructing an accurate phylogeny is expected to help in analyzing new malware and in establishing a principled naming scheme. Matching permutations of code within malware may offer advantages for phylogeny generation, because the evolutionary steps implemented by malware authors may not preserve sequence in shared code. We describe a family of phylogeny generators that perform clustering using features extracted from PQ trees. An experiment was performed in which the tree output of these generators was evaluated against phylogenies generated using weighted n-grams. The results show the advantages of the permutation-based approach in malware phylogeny generation.

Abstract (translated from French): Malicious code, such as viruses and worms, is rarely written from scratch; consequently, evolutionary relationships exist among these different codes. Establishing these relationships and constructing an accurate phylogeny offers the prospect of a better capacity for analyzing new malicious code and of a de facto method for naming it. Matching permutations of code against parts of malicious code is likely to be very useful in establishing a phylogeny, insofar as the evolutionary steps carried out by malware authors generally do not preserve the order of the instructions present in the shared code. We describe here a family of phylogeny generators that perform clustering using features extracted from PQ trees. An experiment was performed in which the tree produced by these generators is evaluated both by comparing it with the reference classifications used by scanning antivirus products and by comparing it with phylogenies produced using weighted n-grams. The results demonstrate the value of the permutation-based approach in malware phylogeny generation.

Abstract (translated from Finnish): Malicious programs, such as computer viruses and worms, are rarely written from scratch. As a result, evolution-like similarity can be found among them. Finding these similarities and building an accurate evolution-based model can ease the analysis of new malicious programs and support naming conventions. Searching for permutations in code may offer advantages for building an evolution model, because the evolutionary steps taken by malware authors do not necessarily preserve sequence in the program code. We describe a set of evolution-model generators that perform clustering using PQ-tree-based features. We also performed an experiment in which the tree output was compared with a reference set produced by an antivirus program and with evolution models built using weighted n-grams. The results suggest that the permutation-based approach can be used successfully to construct evolution models.

Abstract (translated from German): Malicious programs, such as viruses and worms, are only in the rarest cases written completely anew; as a result, dependencies can be found between different malicious codes. For the classification and scientific treatment of new malicious codes, it can be very helpful to set out their dependencies on existing malicious codes and thus to construct a family tree. Among other things, the article discusses modern approaches to family-tree generation using selected Win32 viruses.

Abstract (translated from Italian): Malicious programs, such as viruses and worms, are rarely written from scratch; this means that evolutionary relationships exist among them. Discovering these relationships and constructing an accurate phylogeny can help both in analyzing new programs of this kind and in establishing a nomenclature with a solid basis. Searching for code permutations across programs can give an advantage for phylogeny generation, since the evolutionary steps implemented by the authors may not have preserved the sequentiality of the original code. In this article we describe a family of phylogeny generators that perform clustering using features based on PQ trees. In an experiment, the output tree of the generators is compared with a reference classification obtained from an antivirus program and with phylogenies generated using weighted n-grams. The results indicate the benefits of the permutation-based approach in malware phylogeny generation.
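
The contrast between n-grams and n-perms in the English abstract above can be illustrated with a minimal sketch. It assumes that an n-perm is an n-gram whose internal order is ignored (approximated here by sorting each window) and that similarity is the cosine of raw count vectors. The function names and opcode sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def ngrams(ops, n):
    """Count ordered windows of n consecutive tokens."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


def nperms(ops, n):
    """Count windows of n consecutive tokens, ignoring order inside each window.

    Assumption: sorting the window stands in for "any permutation of the
    same n tokens counts as the same feature".
    """
    return Counter(tuple(sorted(ops[i:i + n])) for i in range(len(ops) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical opcode sequences: the second swaps two adjacent pairs of the first.
original = ["push", "mov", "add", "call", "xor", "ret"]
permuted = ["mov", "push", "add", "xor", "call", "ret"]

gram_score = cosine(ngrams(original, 3), ngrams(permuted, 3))  # 0.0
perm_score = cosine(nperms(original, 3), nperms(permuted, 3))  # 0.75
```

In this toy case the local reordering destroys every shared 3-gram, while three of the four 3-perms still match, which is the kind of separation the abstract reports for permuted variants.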

Working Conference on Reverse Engineering | 2000

Reverse engineering tools as media for imperfect knowledge

Jens H. Jahnke; Andrew Walenstein

Reverse engineering is an imperfect process driven by imperfect knowledge. Most current reverse engineering tools do not adequately consider these inherent characteristics. They focus on representing precise, complete and consistent knowledge and work towards enforcing predefined structures on the processes. According to our experience, this design paradigm seriously limits human-centred reverse engineering tools. An altogether different approach is to directly support the statement and subsequent resolution of imperfections. Doing so requires that imperfect knowledge be represented and imperfect procedures accommodated. We argue that effective tools need to act as a manipulable medium for imperfect knowledge and, based on our experiences with a prototype, elaborate requirements for such tools.

Workshop on Program Comprehension | 2003

Towards a clone detection benchmark suite and results archive

Arun Lakhotia; Junwei Li; Andrew Walenstein; Yun Yang

Source code clones are copies or near-copies of other portions of code, often created by copying and pasting portions of source code. This working session is concerned with building a communal research infrastructure for clone detection. The intention of this working session is to try to build a consensus on how to continue to build a benchmark suite and results archive for clone- and source comparison-related research and development. The working session is structured to foster discussion and debates over what should be collected in the archive, and how best to do it.

Journal in Computer Virology | 2009

Evaluation of malware phylogeny modelling systems using automated variant generation

Matthew Hayes; Andrew Walenstein; Arun Lakhotia

A malware phylogeny model is an estimation of the derivation relationships between a set of malware samples. Systems that construct phylogeny models are expected to be useful for malware analysts. While several such systems have been proposed, little is known about the consistency of their results on different data sets or about their generalizability across different types of malware evolution. This paper explores these issues using two artificial malware history generators: systems that simulate malware evolution according to different evolution models. A quantitative study was conducted using two phylogeny model construction systems and multiple samples of artificial evolution. High variability was found in the quality of their results on different data sets, and the systems were shown to be sensitive to the characteristics of evolution in the data sets. The results call into question the adequacy of evaluations typical in the field, raise pragmatic concerns about tool choice for malware analysts, and underscore the important role that model-based simulation is expected to play in evaluating and selecting suitable malware phylogeny construction systems.
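
To make the idea of an artificial malware history generator concrete, here is a toy sketch. The abstract does not specify the evolution models used, so the mutation rule (swapping two adjacent instructions), the uniform choice of parent, and all names below are assumptions for illustration only.

```python
import random


def generate_history(seed_program, num_variants, rng):
    """Simulate a derivation history from one seed program.

    Returns (samples, parents), where parents[i] is the index of the sample
    that sample i was derived from (None for the seed). The parents list is
    the known "true" derivation tree.
    """
    samples = [list(seed_program)]
    parents = [None]
    for _ in range(num_variants):
        parent = rng.randrange(len(samples))   # hypothetical: pick any earlier sample
        child = list(samples[parent])
        i = rng.randrange(len(child) - 1)
        child[i], child[i + 1] = child[i + 1], child[i]  # hypothetical mutation: adjacent swap
        samples.append(child)
        parents.append(parent)
    return samples, parents


rng = random.Random(0)  # fixed seed so the simulated history is reproducible
samples, parents = generate_history(["push", "mov", "add", "call", "ret"], 6, rng)
```

Because the generator records each sample's parent, the output of a phylogeny construction system on `samples` could be scored against `parents`, which is the kind of ground truth that is rarely available for real malware.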

Workshop on Program Comprehension | 2003

Observing and measuring cognitive support: steps toward systematic tool evaluation and engineering

Andrew Walenstein

A key desideratum for many software comprehension tools is to reduce the mental burdens of software engineers. That is, the tools should support cognition. This key benefit is difficult to directly observe and measure, so evaluating such tools has been problematic. This paper describes an investigation into the application of distributed cognition theories to analyzing and observing cognitive support. Theories of cognitive support are used to generate an analysis of potential cognitive benefits provided by the compilation-error tracking facilities of a commercial software development environment. This analysis is used to generate a scheme for coding user observations such that activity related to cognitive support can be tracked. Experiences in applying the technique to data from a field study are reported. The study also serves to provide a glimpse into the ways that programmers and tools cooperate. Implications are drawn for future practices of tool evaluation and engineering.

International Conference on Malicious and Unwanted Software | 2010

Header information in malware families and impact on automated classifiers

Andrew Walenstein; Daniel J. Hefner; Jeffery Wichers

Journal in Computer Virology | 2008

Constructing malware normalizers using term rewriting

Andrew Walenstein; Rachit Mathur; Mohamed R. Chouchane; Arun Lakhotia

International Conference on Malicious and Unwanted Software | 2012

A transformation-based model of malware derivation

Andrew Walenstein; Arun Lakhotia

International Conference on Malicious and Unwanted Software | 2008

Using Markov chains to filter machine-morphed variants of malicious programs

Mohamed R. Chouchane; Andrew Walenstein; Arun Lakhotia

Journal of Computer Virology and Hacking Techniques | 2013

VILO: a rapid learning nearest-neighbor classifier for malware triage

Arun Lakhotia; Andrew Walenstein; Craig Miles; Anshuman Singh

VILO is a lazy learner system designed for malware classification and triage. It implements a nearest neighbor (NN) algorithm with similarities computed over Term Frequency × Inverse Document Frequency (TFIDF) weighted opcode mnemonic permutation features (N-perms). Being an NN-classifier, VILO makes minimal structural assumptions about class boundaries, and thus is well suited for the constantly changing malware population. This paper presents an extensive study of the application of VILO in malware analysis. Our experiments demonstrate that (a) VILO is a rapid learner of malware families, i.e., VILO's learning curve stabilizes at high accuracies quickly (training on fewer than 20 variants per family is sufficient); (b) similarity scores derived from TFIDF weighted features should primarily be treated as ordinal measurements; and (c) VILO with N-perm feature vectors outperforms traditional N-gram feature vectors when used to classify real-world malware into their respective families.
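
The combination described in the VILO abstract, a nearest-neighbor search over TFIDF-weighted N-perm vectors, can be sketched roughly as follows. This is not VILO's implementation: the smoothed IDF formula, the use of cosine similarity, the window size of 2, and the tiny labeled opcode sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log, sqrt


def nperms(ops, n=2):
    """Count order-insensitive windows of n consecutive opcode mnemonics."""
    return Counter(tuple(sorted(ops[i:i + n])) for i in range(len(ops) - n + 1))


def tfidf(counts, doc_freq, num_docs):
    """Weight raw counts by a smoothed inverse document frequency (assumed variant)."""
    return {f: c * (log((1 + num_docs) / (1 + doc_freq.get(f, 0))) + 1.0)
            for f, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse weighted vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(samples):
    """Lazy learning: store one weighted vector per labeled sample."""
    counted = [(label, nperms(ops)) for label, ops in samples]
    doc_freq = Counter(f for _, counts in counted for f in counts)
    num_docs = len(counted)
    vectors = [(label, tfidf(counts, doc_freq, num_docs)) for label, counts in counted]
    return vectors, doc_freq, num_docs


def classify(model, ops):
    """Return the label of the single most similar stored sample (1-NN)."""
    vectors, doc_freq, num_docs = model
    query = tfidf(nperms(ops), doc_freq, num_docs)
    return max(vectors, key=lambda item: cosine(query, item[1]))[0]


# Hypothetical training data: two made-up families of opcode sequences.
model = train([
    ("family_a", ["push", "mov", "call", "pop", "ret"]),
    ("family_a", ["mov", "push", "call", "pop", "ret"]),
    ("family_b", ["xor", "xor", "jmp", "cmp", "jne", "jmp"]),
    ("family_b", ["xor", "cmp", "jne", "jmp", "xor", "jmp"]),
])

label_a = classify(model, ["push", "mov", "call", "ret", "pop"])  # "family_a"
label_b = classify(model, ["jmp", "xor", "cmp", "jne"])           # "family_b"
```

The classifier uses only the ranking of similarity scores, never their magnitudes, which fits the abstract's point that such scores are best treated as ordinal measurements.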
