Anil Francis Thomas
Microsoft
Publication
Featured research published by Anil Francis Thomas.
International Conference on Acoustics, Speech, and Signal Processing | 2015
Razvan Pascanu; Jack W. Stokes; Hermineh Sanossian; Mady Marinescu; Anil Francis Thomas
Attackers often create systems that automatically rewrite and reorder their malware to avoid detection. Typical machine learning approaches, which learn a classifier based on a handcrafted feature vector, are not sufficiently robust to such reorderings. We propose a different approach, which, similar to natural language modeling, learns the language of malware spoken through the executed instructions and extracts robust, time domain features. Echo state networks (ESNs) and recurrent neural networks (RNNs) are used for the projection stage that extracts the features. These models are trained in an unsupervised fashion. A standard classifier uses these features to detect malicious files. We explore a few variants of ESNs and RNNs for the projection stage, including Max-Pooling and Half-Frame models which we propose. The best performing hybrid model uses an ESN for the recurrent model, Max-Pooling for non-linear sampling, and logistic regression for the final classification. Compared to the standard trigram of events model, it improves the true positive rate by 98.3% at a false positive rate of 0.1%.
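The projection stage described in this abstract can be illustrated with a minimal sketch: an echo state network with a fixed random reservoir reads a sequence of event IDs, and the hidden states are max-pooled over time into one feature vector. Everything concrete here is an assumption for illustration (the function names, the uniform weight range, the tanh nonlinearity, the one-hot input encoding); the abstract does not give the paper's actual configuration, and the final logistic regression stage is omitted.

```python
import math
import random


def make_reservoir(n_events, n_hidden, seed=0, scale=0.1):
    """Build fixed random input and recurrent weights (hypothetical setup)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_events)]
            for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
             for _ in range(n_hidden)]
    return w_in, w_rec


def esn_max_pool_features(events, w_in, w_rec):
    """Run the reservoir over a sequence of event IDs and max-pool the states.

    Each event is an integer index, treated as a one-hot input, so the
    input contribution to hidden unit i is simply w_in[i][event].
    """
    if not events:
        raise ValueError("events must be a non-empty sequence")
    n_hidden = len(w_in)
    state = [0.0] * n_hidden
    pooled = [float("-inf")] * n_hidden
    for event in events:
        new_state = []
        for i in range(n_hidden):
            total = w_in[i][event]
            total += sum(w_rec[i][j] * state[j] for j in range(n_hidden))
            new_state.append(math.tanh(total))
        state = new_state
        # Max-pooling over time: keep the largest activation seen per unit.
        pooled = [max(p, s) for p, s in zip(pooled, state)]
    return pooled


w_in, w_rec = make_reservoir(n_events=5, n_hidden=8)
features = esn_max_pool_features([0, 3, 1, 4], w_in, w_rec)
```

The resulting fixed-length vector is what a standard classifier would consume, regardless of how long the original event sequence was.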
International Conference on Detection of Intrusions and Malware and Vulnerability Assessment | 2012
Nikos Karampatziakis; Jack W. Stokes; Anil Francis Thomas; Mady Marinescu
Typical malware classification methods analyze unknown files in isolation. However, this ignores valuable relationships between malware files, such as containment in a zip archive, dropping, or downloading. We present a new malware classification system based on a graph induced by file relationships, and, as a proof of concept, analyze containment relationships, for which we have much available data. However, our methodology is general, relying only on an initial estimate for some of the files in our data and on propagating information along the edges of the graph. It can thus be applied to other types of file relationships. We show that since malicious files are often included in multiple malware containers, the system's detection accuracy can be significantly improved, particularly at low false positive rates, which are the main operating points for automated malware classifiers. For example, at a false positive rate of 0.2%, the false negative rate decreases from 42.1% to 15.2%. Finally, the new system is highly scalable; our basic implementation can learn good classifiers from a large, bipartite graph including over 719 thousand containers and 3.4 million files in a total of 16 minutes.
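The idea of propagating initial estimates along container-file edges can be sketched as a simple averaging scheme on a bipartite graph. This is a generic propagation rule chosen for illustration, not the algorithm from the paper: the function name, the averaging update, the default score of 0.5, and the choice to hold files with an initial estimate fixed are all assumptions.

```python
def propagate_scores(containers, priors, rounds=10, default=0.5):
    """Spread maliciousness scores over a bipartite container-file graph.

    containers: dict mapping container name -> list of contained file names.
    priors: dict mapping file name -> initial score in [0, 1] (1 = malicious).
    Files with a prior stay fixed; other files take the mean of their
    containers' scores, and containers take the mean of their files' scores.
    """
    files = {f for members in containers.values() for f in members}
    parents = {f: [] for f in files}
    for name, members in containers.items():
        for f in members:
            parents[f].append(name)

    file_score = {f: priors.get(f, default) for f in files}
    container_score = {}
    for _ in range(rounds):
        for name, members in containers.items():
            if members:  # skip empty containers
                container_score[name] = (
                    sum(file_score[f] for f in members) / len(members)
                )
        for f in files:
            if f in priors:
                continue  # keep the initial estimate unchanged
            file_score[f] = (
                sum(container_score[c] for c in parents[f]) / len(parents[f])
            )
    return file_score, container_score


# Hypothetical example: two archives, each with one labeled and one unknown file.
containers = {
    "a.zip": ["mal.exe", "unknown1.exe"],
    "b.zip": ["clean.exe", "unknown2.exe"],
}
priors = {"mal.exe": 1.0, "clean.exe": 0.0}
file_score, container_score = propagate_scores(containers, priors)
```

In this toy graph the unknown file that shares an archive with known malware drifts toward a high score, while the one packaged with a clean file drifts toward a low score.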
European Symposium on Research in Computer Security | 2012
Jack W. Stokes; John Platt; Helen J. Wang; Joe Faulhaber; Jonathan M. Keller; Mady Marinescu; Anil Francis Thomas; Marius Gheorghe Gheorghescu
Industry reports and blogs have estimated the amount of malware based on known malicious files. This paper extends this analysis to the amount of unknown malware. The study is based on 26.7 million files referenced in telemetry reports from 50 million computers running commercial anti-malware (AM) products. To estimate the undetected malware, a classifier predicts the underlying nature of unknown files recorded in the telemetry reports. The telemetry classifier predicts that 69.6% (4.27 million) of the unknown files are malicious. Assuming the unknown files predicted to be malicious by the classifier are malware, the telemetry classifier also allows us to estimate the efficacy of the AM system indicating that signatures detected 82.8% (20.6 million) of the malicious files. We have validated our system by conducting a longitudinal study to measure the false positive and false negative rates over a period of thirteen months.
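The efficacy figure in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are made up for illustration, and the size of the unknown-file set is inferred from the quoted percentage rather than stated in the abstract.

```python
# Figures quoted in the abstract, in millions of files.
detected_by_signatures = 20.6
unknown_predicted_malicious = 4.27
unknown_malicious_fraction = 0.696  # 69.6% of unknown files

# Assuming every unknown file predicted malicious really is malware,
# the total malicious population is the sum of both groups.
total_malicious = detected_by_signatures + unknown_predicted_malicious
signature_detection_rate = detected_by_signatures / total_malicious

# Implied number of unknown files (an inference, not a quoted figure).
implied_unknown_files = unknown_predicted_malicious / unknown_malicious_fraction

print(f"signature detection rate: {signature_detection_rate:.1%}")
print(f"implied unknown files: {implied_unknown_files:.2f} million")
```

The detection rate comes out at about 82.8%, matching the abstract, and the implied unknown set is roughly 6.1 million files.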
Military Communications Conference | 2017
Shayok Chakraborty; Jack W. Stokes; Lin Xiao; Dengyong Zhou; Mady Marinescu; Anil Francis Thomas
Despite widespread use of commercial anti-virus products, the number of malicious files detected on home and corporate computers continues to increase at a significant rate. Recently, anti-virus companies have started investing in machine learning solutions to augment signatures manually designed by analysts. A malicious file's determination is often represented as a hierarchical structure consisting of a type (e.g. Worm, Backdoor), a platform (e.g. Win32, Win64), a family (e.g. Rbot, Rugrat) and a family variant (e.g. A, B). While there has been substantial research in automated malware classification, the aforementioned hierarchical structure, which can provide additional information to the classification models, has been ignored. In this paper, we propose employing hierarchical learning algorithms for automated classification of malicious files and study their performance. To the best of our knowledge, this is the first research effort that incorporates the hierarchical structure of the malware label in automated classification and, more generally, in the security domain. It is important to note that our method does not require any additional effort by analysts because they typically assign these hierarchical labels today. Our empirical results on a real-world, industrial-scale malware dataset of 3.6 million files demonstrate that incorporating the label hierarchy achieves a significant reduction of 33.1% in the binary error rate compared to a non-hierarchical classifier, which is traditionally used in such problems.
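The four-level label hierarchy named in this abstract (type, platform, family, variant) can be represented as a small data structure. The sketch below is an illustration only: the `Type:Platform/Family.Variant` string layout is an assumed serialization that the abstract does not specify, and `parse_label`, `hierarchy_path`, and `shared_depth` are hypothetical helpers, not part of the paper's method.

```python
from typing import NamedTuple


class MalwareLabel(NamedTuple):
    type: str
    platform: str
    family: str
    variant: str


def parse_label(text):
    """Split an assumed 'Type:Platform/Family.Variant' string into its levels."""
    type_, rest = text.split(":", 1)
    platform, rest = rest.split("/", 1)
    family, variant = rest.split(".", 1)
    return MalwareLabel(type_, platform, family, variant)


def hierarchy_path(label):
    """Return the label's path from the root: one tuple prefix per level."""
    return [tuple(label[:depth]) for depth in range(1, len(label) + 1)]


def shared_depth(a, b):
    """Count how many leading hierarchy levels two labels have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


rbot_a = parse_label("Worm:Win32/Rbot.A")
rbot_b = parse_label("Worm:Win32/Rbot.B")
rugrat = parse_label("Backdoor:Win64/Rugrat.A")
```

A flat classifier would treat `rbot_a` and `rbot_b` as unrelated classes, whereas a hierarchical one can exploit the fact that they share three of four levels.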
Archive | 2008
Anil Francis Thomas; George Cristian Chicioreanu; Adrian M. Marinescu
Archive | 2003
Brijesh S. Krishnaswami; Anil Francis Thomas; Avronil Bhattacharjee; Gregory Irving Thiel; John Charles Delo; Kanwaljit S. Marok; Santanu Chakraboty; Justin Yoo Kwak
Archive | 2005
Anil Francis Thomas; Michael Kramer; Scott A. Field
Archive | 2006
Anil Francis Thomas; Michael Kramer; Mihai Costea; Efim Hudis; Pradeep Bahl; Rajesh K. Dadhia; Yigal Edery
Archive | 2004
Mihai Costea; David A. Goebel; Adrian M. Marinescu; Anil Francis Thomas
Archive | 2005
Mihai Costea; Adrian M. Marinescu; Anil Francis Thomas; Gheorghe Marius Gheorghescu; Kyle A. Larsen; Vadim N. Bluvstein