Randy Smith
University of Wisconsin-Madison
Publications
Featured research published by Randy Smith.
Data and Knowledge Engineering | 1999
David W. Embley; Douglas M. Campbell; Y. S. Jiang; Stephen W. Liddle; Deryle Lonsdale; Yiu-Kai Ng; Randy Smith
Electronically available data on the Web is exploding at an ever-increasing pace. Much of this data is unstructured, which makes searching hard and traditional database querying impossible. Many Web documents, however, contain an abundance of recognizable constants that together describe the essence of a document's content. For these kinds of data-rich, multiple-record documents (e.g., advertisements, movie reviews, weather reports, travel information, sports summaries, financial statements, obituaries, and many others) we can apply a conceptual-modeling approach to extract and structure data automatically. The approach is based on an ontology – a conceptual model instance – that describes the data of interest, including relationships, lexical appearance, and context keywords. By parsing the ontology, we can automatically produce a database scheme and recognizers for constants and keywords, and then invoke routines to recognize and extract data from unstructured documents and structure it according to the generated database scheme. Experiments show that it is possible to achieve good recall and precision ratios for documents that are rich in recognizable constants and narrow in ontological breadth. Our approach is less labor-intensive than other approaches that manually or semiautomatically generate wrappers, and it is generally insensitive to changes in Web-page format.
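As a rough illustration of the pipeline this abstract describes, the following Python sketch pairs each concept in a toy "ontology" with a lexical pattern and context keywords, then keeps a recognized constant only when a keyword appears nearby. The ontology format, field names, and patterns are invented for illustration; the paper's ontologies are full conceptual-model instances.

```python
import re

# Hypothetical miniature "ontology": each concept pairs a lexical pattern
# (how its constants appear in text) with context keywords that confirm a
# match. This format is invented for illustration.
ONTOLOGY = {
    "age":   {"pattern": r"\b\d{1,3}\s*(?:years? old|yrs?)\b",
              "keywords": ["age", "old"]},
    "date":  {"pattern": r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.? \d{1,2}, \d{4}\b",
              "keywords": ["born", "died"]},
    "phone": {"pattern": r"\b\d{3}-\d{4}\b",
              "keywords": ["call", "phone"]},
}

def extract(document: str) -> dict:
    """Apply each generated recognizer and collect constants into
    attribute/value pairs of one record."""
    record = {}
    for concept, spec in ONTOLOGY.items():
        for match in re.finditer(spec["pattern"], document, re.IGNORECASE):
            # Keep the constant only if a context keyword appears nearby,
            # mimicking the keyword heuristic the abstract describes.
            window = document[max(0, match.start() - 40):match.end() + 40].lower()
            if any(k in window for k in spec["keywords"]):
                record.setdefault(concept, []).append(match.group(0))
    return record

# Example: an obituary-like snippet yields age, date, and phone constants.
print(extract("Our beloved aunt died Mar. 4, 1998, at 82 years old. Call 555-1234."))
```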
ACM Special Interest Group on Data Communication | 2008
Randy Smith; Cristian Estan; Somesh Jha; Shijin Kong
Deep packet inspection is playing an increasingly important role in the design of novel network services. Regular expressions are the language of choice for writing signatures, but standard DFA or NFA representations are unsuitable for high-speed environments, requiring too much memory, too much time, or too much per-flow state. DFAs are fast and can be readily combined, but doing so often leads to state-space explosion. NFAs, while small, require large per-flow state and are slow. We propose a solution that simultaneously addresses all these problems. We start with a first-principles characterization of state-space explosion and give conditions that eliminate it when satisfied. We show how auxiliary variables can be used to transform automata so that they satisfy these conditions, which we codify in a formal model that augments DFAs with auxiliary variables and simple instructions for manipulating them. Building on this model, we present techniques, inspired by principles used in compiler optimization, that systematically reduce runtime and per-flow state. In our experiments, signature sets from Snort and Cisco Systems achieve state-space reductions of over four orders of magnitude, per-flow state reductions of up to a factor of six, and runtimes that approach those of DFAs.
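The explosion the abstract characterizes, and the auxiliary-variable remedy, can be seen in miniature. The sketch below is illustrative rather than the paper's formal model: combining n signatures of the form ".*x.*y" into one DFA forces on the order of 2^n product states, while n scratch bits updated by simple instructions keep the matcher small.

```python
# Illustrative sketch, not the paper's formal model. Each signature
# ".*x.*y" must remember "seen x yet?"; a combined DFA tracks that flag
# for every signature at once, so n signatures force ~2**n product states.
SIGS = [("a", "b"), ("c", "d"), ("e", "f")]

def product_dfa_states(n: int) -> int:
    return 2 ** n  # one DFA state per subset of "seen first symbol" flags

# The auxiliary-variable transformation keeps one small automaton plus
# n explicit bits of scratch memory, updated by simple instructions.
def match_with_bits(data: str, sigs):
    seen = [False] * len(sigs)      # auxiliary variables: one bit each
    matched = set()
    for ch in data:
        for i, (first, second) in enumerate(sigs):
            if ch == first:
                seen[i] = True      # instruction: set bit i
            elif ch == second and seen[i]:
                matched.add(i)      # signature i matched
    return matched

print(product_dfa_states(len(SIGS)))     # 8 flag-combination states
print(match_with_bits("zazcbzd", SIGS))  # {0, 1}: saw a..b and c..d
```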
Conference on Information and Knowledge Management | 1998
David W. Embley; Douglas M. Campbell; Randy Smith; Stephen W. Liddle
We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontology, we formulate rules to extract constants and context keywords from unstructured documents. For each unstructured document of interest, we extract its constants and keywords and apply a recognizer to organize extracted constants as attribute values of tuples in a generated database schema. To make our approach general, we fix all the processes and change only the ontological description for a different application domain. In experiments we conducted on two different types of unstructured documents taken from the Web, our approach attained recall ratios in the 80% and 90% range and precision ratios near 98%.
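A minimal sketch of the structuring step, assuming a hand-written schema and recognizers (in the paper both are generated by parsing the ontology): recognized constants become attribute values of tuples, one tuple per record.

```python
import re

# Hypothetical schema and recognizers for one domain (car ads). The field
# names and patterns are invented; the paper derives them automatically.
SCHEMA = ("year", "make", "price")
RECOGNIZERS = {
    "year":  re.compile(r"\b(19|20)\d{2}\b"),
    "make":  re.compile(r"\b(Honda|Toyota|Ford)\b"),
    "price": re.compile(r"\$\d[\d,]*"),
}

def to_tuples(document: str):
    """Treat each line as one record and organize recognized constants
    as attribute values of a tuple under the schema."""
    tuples = []
    for line in document.splitlines():
        values = tuple(
            (m.group(0) if (m := RECOGNIZERS[f].search(line)) else None)
            for f in SCHEMA
        )
        if any(values):
            tuples.append(values)
    return tuples

ads = "1996 Honda Accord, runs great, $3,500\n2001 Ford Focus $4,200"
print(to_tuples(ads))
# [('1996', 'Honda', '$3,500'), ('2001', 'Ford', '$4,200')]
```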
IEEE Symposium on Security and Privacy | 2008
Randy Smith; Cristian Estan; Somesh Jha
Automata-based representations and related algorithms have been applied to address several problems in information security, and often the automata had to be augmented with additional information. For example, extended finite-state automata (EFSA) augment finite-state automata (FSA) with variables to track dependencies between arguments of system calls. In this paper, we introduce extended finite automata (XFAs), which augment FSAs with finite scratch memory and instructions to manipulate this memory. Our primary motivation for introducing XFAs is signature matching in Network Intrusion Detection Systems (NIDS). Representing NIDS signatures as deterministic finite-state automata (DFAs) results in very fast signature matching, but for several classes of signatures DFAs can blow up in space. Using nondeterministic finite-state automata (NFAs) to represent NIDS signatures results in a succinct representation but at the expense of higher time complexity for signature matching. In other words, DFAs are time-efficient but space-inefficient, and NFAs are space-efficient but time-inefficient. In our experiments we have noticed that for a large class of NIDS signatures XFAs have time complexity similar to DFAs and space complexity similar to NFAs. For our test set, XFAs use 10 times less memory than a DFA-based solution, yet achieve 20 times higher matching speeds.
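As a hedged illustration of the XFA idea (the pattern, threshold, and instruction placement below are assumptions, not the paper's construction), one scratch counter can replace the hundreds of DFA states that a "long header line" signature such as .*\n[^\n]{200} would otherwise require:

```python
# A minimal XFA-flavored sketch: a tiny automaton whose transitions carry
# simple instructions over scratch memory. One counter stands in for the
# ~200 DFA states the length condition would otherwise need.
LIMIT = 200  # hypothetical length threshold from the signature

def xfa_match(payload: bytes) -> bool:
    counting = False   # scratch bit: inside a header line of interest?
    count = 0          # scratch counter: header length so far
    for byte in payload:
        if byte == ord("\n"):
            counting, count = True, 0      # instruction: reset on newline
        elif counting:
            count += 1                     # instruction: increment counter
            if count >= LIMIT:
                return True                # accept: overlong header line
    return False

print(xfa_match(b"GET / HTTP/1.1\n" + b"A" * 300))        # True
print(xfa_match(b"GET / HTTP/1.1\nHost: example.com\n"))  # False
```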
International Symposium on Performance Analysis of Systems and Software | 2009
Randy Smith; Neelam Goyal; Justin Ormont; Karthikeyan Sankaralingam; Cristian Estan
Modern network devices employ deep packet inspection to enable sophisticated services such as intrusion detection, traffic shaping, and load balancing. At the heart of such services is a signature matching engine that must match packet payloads to multiple signatures at line rates. However, the recent transition to complex regular-expression-based signatures coupled with ever-increasing network speeds has rapidly increased the performance requirements of signature matching. Solutions to meet these requirements range from hardware-centric ASIC/FPGA implementations to software implementations using high-performance microprocessors. In this paper, we propose a programmable signature matching system prototyped on an Nvidia G80 GPU. We first present a detailed architectural and microarchitectural analysis, showing that signature matching is well suited for SIMD processing because of regular control flow and parallelism available at the packet level. Next, we examine two approaches for matching signatures: standard deterministic finite automata (DFAs) and extended finite automata (XFAs), which use far less memory than DFAs but require specialized auxiliary memory and small amounts of computation in most states. We implement a fully functional prototype on the SIMD-based G80 GPU. This system outperforms a Pentium 4 by up to 9X and a Niagara-based 32-threaded system by up to 2.3X, and shows that GPUs are a promising candidate for signature matching.
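The packet-level parallelism argument can be sketched in a few lines. The following uses NumPy gathers to stand in for SIMD lanes and is an illustration of the idea, not the paper's G80 implementation: every lane walks one packet through the same DFA transition table in lockstep, so control flow is uniform across lanes and only the table lookups differ.

```python
import numpy as np

# Toy DFA over bytes that accepts inputs containing "ab":
# state 0 = start, 1 = seen 'a', 2 = accept (sticky).
table = np.zeros((3, 256), dtype=np.int32)
table[0, ord("a")] = 1
table[1, ord("a")] = 1
table[1, ord("b")] = 2
table[2, :] = 2

def match_packets(packets):
    n = len(packets)
    maxlen = max(len(p) for p in packets)
    # Pad packets into a rectangular byte matrix: one row per SIMD lane.
    buf = np.zeros((n, maxlen), dtype=np.uint8)
    for i, p in enumerate(packets):
        buf[i, :len(p)] = np.frombuffer(p, dtype=np.uint8)
    states = np.zeros(n, dtype=np.int32)
    for j in range(maxlen):                  # lockstep: one byte per lane
        states = table[states, buf[:, j]]    # gathered transition lookup
    return states == 2

print(match_packets([b"xxabyy", b"aaaa", b"ba ab"]))  # [ True False  True]
```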
International Conference on Conceptual Modeling | 1998
David W. Embley; Douglas M. Campbell; Y. S. Jiang; Stephen W. Liddle; Yiu-Kai Ng; Dallan Quass; Randy Smith
Electronically available data on the Web is exploding at an ever-increasing pace. Much of this data is unstructured, which makes searching hard and traditional database querying impossible. Many Web documents, however, contain an abundance of recognizable constants that together describe the essence of a document’s content. For these kinds of data-rich documents (e.g., advertisements, movie reviews, weather reports, travel information, sports summaries, financial statements, obituaries, and many others) we can apply a conceptual-modeling approach to extract and structure data. The approach is based on an ontology – a conceptual model instance – that describes the data of interest, including relationships, lexical appearance, and context keywords. By parsing the ontology, we can automatically produce a database scheme and recognizers for constants and keywords, and then invoke routines to recognize and extract data from unstructured documents and structure it according to the generated database scheme. Experiments show that it is possible to achieve good recall and precision ratios for documents that are rich in recognizable constants and narrow in ontological breadth.
Annual Computer Security Applications Conference | 2006
Randy Smith; Cristian Estan; Somesh Jha
Network Intrusion Detection Systems (NIDS) have become crucial to securing modern networks. To be effective, a NIDS must be able to counter evasion attempts and operate at or near wire-speed. Failure to do so allows malicious packets to slip through a NIDS undetected. In this paper, we explore NIDS evasion through algorithmic complexity attacks. We present a highly effective attack against the Snort NIDS, and we provide a practical algorithmic solution that successfully thwarts the attack. This attack exploits the behavior of rule matching, yielding inspection times that are up to 1.5 million times slower than those of benign packets. Our analysis shows that this attack is applicable to many rules in Snort's ruleset, rendering vulnerable the thousands of networks protected by it. Our countermeasure confines the inspection time to within one order of magnitude of that of benign packets. Experimental results using a live system show that an attacker needs only 4.0 kbps of bandwidth to perpetually disable an unmodified NIDS, whereas all intrusions are detected when our countermeasure is used.
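The attack/defense pattern can be sketched abstractly (Snort's rule matcher and the paper's actual countermeasure differ in detail; the rule and payload below are invented). A rule whose contents must appear in sequence invites backtracking over every combination of occurrences, and memoizing on (rule position, payload offset) caps the work:

```python
from functools import lru_cache

RULE = [b"ab"] * 4 + [b"zz"]   # hypothetical rule; "zz" never occurs
PAYLOAD = b"ab" * 25           # crafted payload: many overlapping "ab"s
calls = 0

def search(i: int, pos: int) -> bool:
    global calls
    calls += 1
    if i == len(RULE):
        return True
    start = pos
    # Try every occurrence of the current content string in order.
    while (hit := PAYLOAD.find(RULE[i], start)) != -1:
        if match(i + 1, hit + len(RULE[i])):
            return True
        start = hit + 1        # backtrack: try the next occurrence
    return False

match = search                 # naive: every recursive call does real work
search(0, 0)
print("naive evaluations:", calls)       # tens of thousands: combinatorial

calls = 0
match = lru_cache(maxsize=None)(search)  # countermeasure: memoize (i, pos)
match(0, 0)
print("memoized evaluations:", calls)    # ~130: bounded by |RULE|*|PAYLOAD|
```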
International Workshop on Security | 2008
Shijin Kong; Randy Smith; Cristian Estan
Signature matching is a performance-critical operation in intrusion prevention systems. Modern systems express signatures as regular expressions and use Deterministic Finite Automata (DFAs) to efficiently match them against the input. In principle, DFAs can be combined so that all signatures can be examined in a single pass over the input. In practice, however, combining DFAs corresponding to intrusion prevention signatures results in memory requirements that far exceed feasible sizes. We observe that for such signatures, distinct input symbols often have identical behavior in the DFA. In these cases, an Alphabet Compression Table (ACT) can be used to map such groups of symbols to a single symbol to reduce the memory requirements. In this paper, we explore the use of multiple alphabet compression tables as a lightweight method for reducing the memory requirements of DFAs. We evaluate this method on signature sets used in Cisco IPS and Snort. Compared to uncompressed DFAs, multiple ACTs achieve memory savings between a factor of 4 and a factor of 70 at the cost of an increase in run time that is typically between 35% and 85%. Compared to another recent compression technique, D2FAs, ACTs are between 2 and 3.5 times faster in software, and in some cases use less than one tenth of the memory used by D2FAs. Overall, for all signature sets and compression methods evaluated, multiple ACTs offer the best memory versus run-time trade-offs.
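A single-table version of the idea is easy to sketch (the paper uses multiple ACTs over groups of DFAs; the toy automaton below is invented): symbols whose transition-table columns are identical share one equivalence class, so the table stores one column per class rather than one per byte value.

```python
import numpy as np

# Toy DFA accepting inputs containing "ab" (3 states x 256 symbols).
full = np.zeros((3, 256), dtype=np.int32)
full[0, ord("a")] = 1
full[1, ord("a")] = 1
full[1, ord("b")] = 2
full[2, :] = 2

# Build the ACT: assign one class id per distinct transition column.
classes = {}
act = np.zeros(256, dtype=np.int32)
for sym in range(256):
    col = full[:, sym].tobytes()
    act[sym] = classes.setdefault(col, len(classes))
compressed = np.zeros((3, len(classes)), dtype=np.int32)
for sym in range(256):
    compressed[:, act[sym]] = full[:, sym]

print(f"columns: 256 -> {len(classes)}")  # 3 classes: 'a', 'b', all others
state = 0
for byte in b"xxaby":
    state = compressed[state, act[byte]]  # one extra indirection per byte
print("accepted:", state == 2)            # True
```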
Recent Advances in Intrusion Detection | 2009
Daniel Luchaup; Randy Smith; Cristian Estan; Somesh Jha
Intrusion prevention systems determine whether incoming traffic matches a database of signatures, where each signature in the database represents an attack or a vulnerability. IPSs need to keep up with ever-increasing line speeds, which leads to the use of custom hardware. A major bottleneck that IPSs face is that they scan incoming packets one byte at a time, which limits their throughput and latency. In this paper, we present a method for scanning multiple bytes in parallel using speculation. We break the packet into several chunks, opportunistically scan them in parallel, and if the speculation is wrong, correct it later. We present algorithms that apply speculation in single-threaded software running on commodity processors as well as algorithms for parallel hardware. Experimental results show that speculation leads to improvements in latency and throughput in both cases.
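A minimal software sketch of the speculation idea, assuming a toy DFA and a default guess state (both invented for illustration): the second half of the input is scanned speculatively while the first half is scanned normally, and validation stops as soon as the true scan converges with the recorded speculative states.

```python
# Toy DFA accepting inputs containing "ab"; state 2 is sticky.
def step(state, ch):
    if state == 2:
        return 2                             # accepting state is sticky
    if ch == "a":
        return 1
    return 2 if (ch == "b" and state == 1) else 0

def scan(state, text, history=None):
    for ch in text:
        state = step(state, ch)
        if history is not None:
            history.append(state)            # record speculative states
    return state

def speculative_match(text, guess=0):
    mid = len(text) // 2
    first, second = text[:mid], text[mid:]
    # "Parallel" phase (sequential here for clarity): scan both halves,
    # the second starting from the guessed state.
    history = []
    spec_final = scan(guess, second, history)
    boundary = scan(0, first)
    if boundary == guess:
        return spec_final == 2               # speculation trivially correct
    # Validation: rescan the second half from the true boundary state,
    # stopping early once it converges with the speculative history.
    state = boundary
    for i, ch in enumerate(second):
        state = step(state, ch)
        if state == history[i]:              # converged: rest is identical
            return spec_final == 2
    return state == 2

print(speculative_match("xxxaxb ab xxxx"))   # True
print(speculative_match("xxxa" + "bxxx"))    # True: match spans the split
```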
IEEE Advances in Digital Libraries | 2000
Douglas M. Campbell; Wendy R. Chen; Randy Smith
Partial or total duplication of document content is common in large digital libraries. We present a copy detection system to automate the detection of duplication in digital documents. The system we present is sentence-based and makes three contributions: it proposes an intuitive definition of similarity between documents; it produces the distribution of overlap that exists between overlapping documents; and it is resistant to inaccuracy due to large variations in document size. We report the results of several experiments that illustrate the behavior and functionality of the system.
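A minimal sketch of a sentence-based detector in this spirit (the similarity definition below is a simplification, not the paper's): each document is reduced to a set of normalized-sentence fingerprints, and normalizing the overlap by the smaller document keeps the measure stable under large size differences.

```python
import hashlib
import re

def fingerprints(text: str) -> set:
    """Split on sentence boundaries and hash each whitespace-normalized,
    lowercased sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return {
        hashlib.sha1(" ".join(s.lower().split()).encode()).hexdigest()
        for s in sentences if s
    }

def similarity(a: str, b: str) -> float:
    fa, fb = fingerprints(a), fingerprints(b)
    # Normalizing by the smaller document keeps the measure meaningful
    # when sizes differ widely: total containment scores 1.0.
    return len(fa & fb) / min(len(fa), len(fb))

doc1 = "The quick fox jumps. It rests at noon. The end came fast."
doc2 = "It rests at noon. The end came fast."
print(similarity(doc1, doc2))   # 1.0: doc2 is wholly contained in doc1
```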