Publication


Featured research published by Shai Fine.


Machine Learning | 1998

The Hierarchical Hidden Markov Model: Analysis and Applications

Shai Fine; Yoram Singer; Naftali Tishby

We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text. In these models different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.
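The extended Baum-Welch procedure mentioned above builds on the standard forward recursion for a flat HMM. As a hedged illustration, here is that flat-model building block (the parameters below are made up, and this is the ordinary HMM, not the hierarchical generalization):

```python
# Forward recursion for a flat HMM: the quantity alpha[j] after t steps is
# P(obs[0..t], state_t = j). Summing at the end gives P(obs).

def forward(obs, init, trans, emit):
    """Return P(obs) under an HMM with initial probabilities init[i],
    transition matrix trans[i][j], and emission matrix emit[i][o]."""
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)

# Two hidden states, two observation symbols (0 and 1); numbers illustrative.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 1, 0], init, trans, emit))   # P(observing 0, 1, 0)
```

The HHMM replaces each state with a sub-model that emits whole subsequences, and the paper's contribution is extending exactly this kind of recursion to that nested structure.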


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval

Elad Yom-Tov; Shai Fine; David Carmel; Adam Darlow

In this article we present novel learning methods for estimating the quality of results returned by a search engine in response to a query. Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality estimation for several applications, among them improvement of retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval. Experiments on TREC data demonstrate the robustness and the effectiveness of our learning algorithms.
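The agreement signal at the heart of this estimator can be sketched in a few lines (hypothetical document IDs; the paper feeds such agreement features into a learned model rather than using the raw overlap directly):

```python
# Proxy for query difficulty: how much the full query's top-k results
# agree with the top-k results of its sub-queries. Low agreement suggests
# the query is hard (or that no relevant content exists).

def overlap_score(full_top, subquery_tops, k=10):
    """Mean fraction of each sub-query's top-k that also appears in the
    full query's top-k."""
    full = set(full_top[:k])
    scores = [len(full & set(sub[:k])) / k for sub in subquery_tops]
    return sum(scores) / len(scores)

full_top = ["d1", "d2", "d3", "d4"]
sub_tops = [["d1", "d2", "d9", "d8"],   # sub-query 1 agrees on 2 of 4
            ["d7", "d6", "d5", "d4"]]   # sub-query 2 agrees on 1 of 4
print(overlap_score(full_top, sub_tops, k=4))   # (2/4 + 1/4) / 2 = 0.375
```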


Design Automation Conference | 2003

Coverage directed test generation for functional verification using Bayesian networks

Shai Fine; Avi Ziv

Functional verification is widely acknowledged as the bottleneck in the hardware design cycle. This paper addresses one of the main challenges of simulation based verification (or dynamic verification), by providing a new approach for coverage directed test generation (CDG). This approach is based on Bayesian networks and computer learning techniques. It provides an efficient way for closing a feedback loop from the coverage domain back to a generator that produces new stimuli to the tested design. In this paper, we show how to apply Bayesian networks to the CDG problem. Applying Bayesian networks to the CDG framework has been tested in several experiments, exhibiting encouraging results and indicating that the suggested approach can be used to achieve CDG goals.


International Conference on Acoustics, Speech, and Signal Processing | 2001

A hybrid GMM/SVM approach to speaker identification

Shai Fine; Jiri Navratil; Ramesh A. Gopinath

This paper proposes a classification scheme that incorporates statistical models and support vector machines. A hybrid system which appropriately combines the advantages of both the generative and discriminative model paradigms is described and experimentally evaluated on a text-independent speaker recognition task in matched and mismatched training and test conditions. Our results show that the combination is beneficial in terms of performance and practical in terms of computation. We report relative improvements of up to 25% reduction in identification error rate compared to the baseline statistical model.


Design Automation Conference | 2004

Probabilistic regression suites for functional verification

Shai Fine; Shmuel Ur; Avi Ziv

Random test generators are often used to create regression suites on the fly. Regression suites are commonly generated by choosing several specifications and generating a number of tests from each one, without reasoning about which specification should be used and how many tests should be generated from each specification. This paper describes a technique for building high quality random regression suites. The proposed technique uses information about the probability of each test specification covering each coverage task. This probability is used, in turn, to determine which test specifications should be included in the regression suite and how many tests should be generated from each specification. Experimental results show that this practical technique can be used to improve the quality, and reduce the cost, of regression suites. Moreover, it enables better informed decisions regarding the size and distribution of the regression suites, and the risk involved.
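The underlying allocation problem can be illustrated with a small greedy sketch, assuming made-up coverage probabilities (the paper's technique solves the real allocation problem; this only shows the objective it optimizes):

```python
# p[s][t] = probability that a single test generated from specification s
# covers coverage task t; counts[s] = number of tests generated from s.
# Expected coverage = sum over tasks of P(task hit by at least one test).

def expected_coverage(p, counts):
    total = 0.0
    for t in range(len(p[0])):
        miss = 1.0
        for s, n in enumerate(counts):
            miss *= (1.0 - p[s][t]) ** n
        total += 1.0 - miss
    return total

def greedy_suite(p, budget):
    """Add one test at a time from whichever spec helps expected coverage most."""
    counts = [0] * len(p)
    for _ in range(budget):
        best = max(range(len(p)), key=lambda s: expected_coverage(
            p, [c + (i == s) for i, c in enumerate(counts)]))
        counts[best] += 1
    return counts

p = [[0.9, 0.0, 0.1],    # spec 0 mostly hits task 0
     [0.1, 0.8, 0.1]]    # spec 1 mostly hits task 1
print(greedy_suite(p, budget=2))   # one test from each spec: [1, 1]
```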


Theoretical Computer Science | 2002

Query by committee, linear separation and random walks

Shai Fine; Ran Gilad-Bachrach; Eli Shamir

A long-standing goal in the realm of Machine Learning is to minimize sample complexity, i.e. to reduce as much as possible the number of examples used in the course of learning. The Active Learning paradigm is one such method aimed at achieving this goal by transforming the learner from a passive participant in the information gathering process to an active one. Roughly speaking, the learner tries to minimize the number of labeled instances used in the course of learning, relying also on unlabeled instances in order to acquire the needed information whenever possible. The reasoning comes from many real-life problems where the teacher's activity is an expensive resource (e.g. text categorization, part-of-speech tagging). Query By Committee (QBC) (Seung et al., Query by committee, Proceedings of the Fifth Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA, 1992, pp. 287-294) is an Active Learning algorithm acting in the Bayesian model of concept learning (Haussler et al., Mach. Learning 14 (1994) 83), i.e. it assumes that the concept to be learned is chosen according to some fixed and known distribution. Trying to apply the QBC algorithm to learning the class of linear separators, one faces the problem of implementing the mechanism of sampling hypotheses (the Gibbs oracle). The major problem is computational complexity, since the straightforward Monte Carlo method takes exponential time. In this paper we address the problems involved in the implementation of such a mechanism. We show how to convert them to questions about sampling from convex bodies or approximating the volume of such bodies. Similar problems have recently been solved in the field of computational geometry based on random walks. These techniques enable us to devise efficient implementations of the QBC algorithm.
We also give a few improvements and corrections to the QBC algorithm, the most important of which is dropping the Bayes assumption when the concept class possesses a sort of symmetry property (which holds for linear separators). We draw attention to a useful geometric lemma which bounds the maximal radius of a ball contained in a convex body. Finally, this paper exhibits a connection between random walks and certain Machine Learning notions such as ε-nets and support vector machines.
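The committee mechanism itself is easy to caricature in one dimension. Below is a hedged toy sketch: 1-D threshold classifiers stand in for linear separators, and uniform sampling from the interval of consistent thresholds stands in for the Gibbs oracle; none of this reflects the paper's actual random-walk machinery:

```python
import random

# Toy QBC for 1-D threshold classifiers: h labels x as 1 iff x >= h.
# Hypotheses consistent with the labels seen so far form an interval
# [lo, hi]; we sample a two-member committee from it and query the label
# of an instance only when the committee disagrees on it.

def qbc_learn(instances, true_threshold, rng):
    lo, hi = 0.0, 1.0
    queries = 0
    for x in instances:
        h1, h2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if (x >= h1) != (x >= h2):      # disagreement -> ask for the label
            queries += 1
            if x >= true_threshold:     # label 1: consistent h satisfy h <= x
                hi = min(hi, x)
            else:                       # label 0: consistent h satisfy h > x
                lo = max(lo, x)
    return (lo + hi) / 2, queries

rng = random.Random(0)
instances = [rng.random() for _ in range(200)]
estimate, queries = qbc_learn(instances, 0.37, rng)
print(estimate, queries)   # queries is typically far below 200
```

The point of the filter is that labels are requested only where the version space is still uncertain, which is where the sample-complexity savings come from.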


High-Level Design Validation and Test | 2005

Harnessing machine learning to improve the success rate of stimuli generation

Shai Fine; Ari Freund; Itai Jaeger; Yishay Mansour; Yehuda Naveh; Avi Ziv

The initial state of a design under verification has a major impact on the ability of stimuli generators to successfully generate the requested stimuli. For complexity reasons, most stimuli generators use sequential solutions without planning ahead. Therefore, in many cases, they fail to produce consistent stimuli due to an inadequate selection of the initial state. We propose a new method, based on machine learning techniques, to improve generation success by learning the relationship between the initial state vector and generation success. We applied the proposed method in two different settings, with the objective of improving generation success and coverage in processor and system level generation. In both settings, the proposed method significantly reduced generation failures and enabled faster coverage.
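The learning step can be caricatured with a tiny sketch: a perceptron over synthetic initial-state bits, used to screen candidate initial states before generation. The data, model, and success rule here are all made up for illustration and are not the paper's:

```python
import random

def train_success_model(runs):
    """runs: list of (state_bits, succeeded). Returns (weights, bias) of a
    simple perceptron predicting whether generation will succeed."""
    w = [0.0] * len(runs[0][0])
    b = 0.0
    for _ in range(30):                       # a few perceptron epochs
        for bits, ok in runs:
            pred = sum(wi * xi for wi, xi in zip(w, bits)) + b > 0
            if pred != ok:                    # mistake -> perceptron update
                delta = 1.0 if ok else -1.0
                w = [wi + delta * xi for wi, xi in zip(w, bits)]
                b += delta
    return w, b

def likely_to_succeed(bits, model):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, bits)) + b > 0

# Synthetic history: pretend generation succeeds only when bit 0 is set.
rng = random.Random(1)
runs = [(bits, bits[0] == 1)
        for bits in ([rng.randint(0, 1) for _ in range(4)] for _ in range(40))]
model = train_success_model(runs)
print(likely_to_succeed([1, 0, 0, 0], model),   # screen two candidates
      likely_to_succeed([0, 1, 1, 1], model))
```

The design point is that the model is only a cheap pre-filter: rejected initial states cost one prediction instead of a failed generation run.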


EURASIP Journal on Advances in Signal Processing | 2003

Discriminative feature selection via multiclass variable memory Markov model

Noam Slonim; Gill Bejerano; Shai Fine; Naftali Tishby

We propose a novel feature selection method based on a variable memory Markov (VMM) model. The VMM was originally proposed as a generative model trying to preserve the original source statistics from training data. We extend this technique to simultaneously handle several sources, and further apply a new criterion to prune nondiscriminative features out of the model. This results in a multiclass discriminative VMM (DVMM), which is highly efficient, scaling linearly with data size. Moreover, we suggest a natural scheme to sort the remaining features based on their discriminative power with respect to the sources at hand. We demonstrate the utility of our method for text and protein classification tasks.
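The pruning criterion can be illustrated with a deliberately simplified sketch: score each context by how differently the two classes continue it, and keep only high-scoring contexts. Here a context is a single preceding symbol and the score is an L1 distance, both stand-ins for the paper's variable-length suffixes and actual criterion:

```python
from collections import Counter

def next_symbol_dist(texts, context):
    """Empirical distribution of the symbol following `context`."""
    counts = Counter()
    for t in texts:
        for i in range(len(t) - 1):
            if t[i] == context:
                counts[t[i + 1]] += 1
    total = sum(counts.values()) or 1
    return {s: c / total for s, c in counts.items()}

def discriminative_score(class_a, class_b, context):
    """L1 distance between the classes' next-symbol distributions;
    near-zero means the context carries no class information."""
    pa = next_symbol_dist(class_a, context)
    pb = next_symbol_dist(class_b, context)
    return sum(abs(pa.get(s, 0.0) - pb.get(s, 0.0)) for s in set(pa) | set(pb))

class_a = ["ababab", "abab"]   # in class A, 'a' is always followed by 'b'
class_b = ["acacac", "acac"]   # in class B, 'a' is always followed by 'c'
print(discriminative_score(class_a, class_b, "a"))   # maximally discriminative
```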


Symposium on Theoretical Aspects of Computer Science | 2006

Combining multiple heuristics

Tzur Sayag; Shai Fine; Yishay Mansour

In this work we introduce and study the question of combining multiple heuristics. Given a problem instance, each of the multiple heuristics is capable of computing the correct solution, but has a different cost. In our models the user executes multiple heuristics until one of them terminates with a solution. Given a set of problem instances, we show how to efficiently compute an optimal fixed schedule for a constant number of heuristics, and show that in general, the problem is computationally hard even to approximate (to within a constant factor). We also discuss a probabilistic configuration, in which the problem instances are drawn from some unknown fixed distribution, and show how to compute a near optimal schedule for this setup.
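The fixed-schedule model can be made concrete with a brute-force toy (made-up running times and a tiny candidate set; the paper computes optimal schedules far more efficiently than enumeration):

```python
from itertools import product

# A schedule is a sequence of (heuristic, time-slice) pairs, run in order.
# An instance is solved once some heuristic has accumulated at least its
# (unknown in advance) running time on that instance.

def schedule_cost(schedule, costs):
    """Total time spent over all instances until each one is solved."""
    total = 0
    for inst in costs:                  # inst[h] = running time of h on it
        spent = 0
        given = [0] * len(inst)
        solved = False
        for h, t in schedule:
            need = inst[h] - given[h]
            if need <= t:               # heuristic h finishes in this slice
                spent += need
                solved = True
                break
            given[h] += t
            spent += t
        if not solved:
            return float("inf")         # schedule never solves this instance
        total += spent
    return total

costs = [(1, 10), (10, 1), (3, 3)]      # (time of h0, time of h1) per instance
slices = [(h, t) for h in (0, 1) for t in (1, 3, 10)]
best = min(product(slices, repeat=3), key=lambda s: schedule_cost(s, costs))
print(best, schedule_cost(best, costs))
```

In this toy instance the optimum interleaves the two heuristics (total cost 7) rather than committing to either one, which is exactly the effect that makes schedule selection non-trivial.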


International Conference on Acoustics, Speech, and Signal Processing | 2002

Digit recognition in noisy environments via a sequential GMM/SVM system

Shai Fine; George Saon; Ramesh A. Gopinath

This paper exploits the fact that when GMM and SVM classifiers with roughly the same level of performance exhibit uncorrelated errors they can be combined to produce a better classifier. The gain accrues from combining the descriptive strength of GMM models with the discriminative power of SVM classifiers. This idea, first exploited in the context of speaker recognition [1, 2], is applied to speech recognition - specifically to a digit recognition task in a noisy environment - with significant gains in performance.

Collaboration


Dive into Shai Fine's collaboration.

Top Co-Authors

Naftali Tishby

Hebrew University of Jerusalem

Eli Shamir

Hebrew University of Jerusalem

Ran Gilad-Bachrach

Hebrew University of Jerusalem
