Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Joel Ratsaby is active.

Publication


Featured research published by Joel Ratsaby.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1998

Incremental learning with sample queries

Joel Ratsaby

The classical theory of pattern recognition assumes labeled examples appear according to unknown underlying class conditional probability distributions where the pattern classes are picked randomly in a passive manner according to their a priori probabilities. This paper presents experimental results for an incremental nearest-neighbor learning algorithm which actively selects samples from different pattern classes according to a querying rule as opposed to the a priori probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule.
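As a rough illustration of the contrast drawn above between passive sampling and query-based sampling, the sketch below grows a labeled set for a 1-NN classifier by repeatedly querying the candidate point with the smallest nearest-neighbor margin. The margin-based querying rule, the oracle interface, and all function names are assumptions for illustration only; the paper's actual querying rule is not specified in this abstract.

```python
# Illustrative sketch only: an incremental 1-NN learner that actively queries
# labels for the least-certain candidate points. The smallest-margin query rule
# below is an assumption, not necessarily the querying rule used in the paper.
import numpy as np

def nn_margin(x, X, y):
    """Gap between the nearest labeled point of the closest and second-closest classes."""
    d = np.linalg.norm(X - x, axis=1)
    best = {}
    for dist, label in zip(d, y):
        best[label] = min(best.get(label, np.inf), dist)
    dists = sorted(best.values())
    return dists[1] - dists[0] if len(dists) > 1 else np.inf

def active_1nn(candidates, oracle, n_queries, seed=0):
    """Incrementally grow a labeled set by querying the most ambiguous candidate."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), size=2, replace=False))   # seed points
    X = candidates[idx]
    y = np.array([oracle(candidates[i]) for i in idx])
    pool = [i for i in range(len(candidates)) if i not in idx]
    for _ in range(n_queries):
        margins = [nn_margin(candidates[i], X, y) for i in pool]
        pick = pool.pop(int(np.argmin(margins)))   # query the smallest-margin point
        X = np.vstack([X, candidates[pick]])
        y = np.append(y, oracle(candidates[pick]))
    return X, y   # labeled set used by the final 1-NN classifier
```

A passive learner would instead draw the next labeled example at random according to the class priors, which is the baseline the abstract compares against.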


Discrete Applied Mathematics | 1998

The degree of approximation of sets in Euclidean space using sets with bounded Vapnik-Chervonenkis dimension

Vitaly Maiorov; Joel Ratsaby

The degree of approximation of infinite-dimensional function classes using finite n-dimensional manifolds has been the subject of a classical field of study in the area of mathematical approximation theory. In Ratsaby and Maiorov (1997), a new quantity ρ_n(F, L_q), which measures the degree of approximation of a function class F by the best manifold H_n of pseudo-dimension less than or equal to n in the L_q-metric, was introduced. For sets F ⊂ R^m it is defined as ρ_n(F, l_q^m) = inf_{H_n} dist(F, H_n), where dist(F, H_n) = sup_{x ∈ F} inf_{y ∈ H_n} ‖x − y‖_{l_q^m} and the infimum is taken over all sets H_n ⊂ R^m of VC-dimension less than or equal to n. In this paper we compute ρ_n(F, l_q^m) for F being the unit ball B_p^m = {x ∈ R^m : ‖x‖_{l_p^m} ≤ 1} for any 1 ≤ p, q ≤ ∞, and for F being any subset of the boolean m-cube of size larger than 2^{mγ}, for any γ > 1/2.
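For readability, the quantities named in the abstract above can be typeset directly from those definitions; this is only a restatement of the abstract's own formulas, not additional results from the paper.

```latex
% Restatement of the approximation quantities defined in the abstract above.
% H_n ranges over subsets of R^m with VC-dimension at most n.
\[
  \rho_n(F, l_q^m) = \inf_{H_n} \operatorname{dist}(F, H_n),
  \qquad
  \operatorname{dist}(F, H_n) = \sup_{x \in F}\, \inf_{y \in H_n} \| x - y \|_{l_q^m},
\]
\[
  B_p^m = \{\, x \in \mathbb{R}^m : \| x \|_{l_p^m} \le 1 \,\}.
\]
```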


Journal of Complexity | 1997

On the Value of Partial Information for Learning from Examples

Joel Ratsaby; Vitaly Maiorov

The PAC model of learning and its extension to real-valued function classes provides a well-accepted theoretical framework for representing the problem of learning a target function g(x) using a random sample {(x_i, g(x_i))}_{i=1}^m. Based on the uniform strong law of large numbers, the PAC model establishes the sample complexity, i.e., the sample size m which is sufficient for accurately estimating the target function to within high confidence. Often, in addition to a random sample, some form of prior knowledge is available about the target. It is intuitive that increasing the amount of information should have the same effect on the error as increasing the sample size. But quantitatively, how does the rate of error with respect to increasing information compare to the rate of error with increasing sample size? To answer this we consider a new approach based on a combination of the information-based complexity of Traub et al. and Vapnik-Chervonenkis (VC) theory. In contrast to VC theory, where function classes of finite pseudo-dimension are used only for statistical estimation, we let such classes play a dual role of functional estimation as well as approximation. This is captured in a newly introduced quantity, ρ_d(F), which represents a nonlinear width of a function class F. We then extend the notion of the nth minimal radius of information and define a quantity I_{n,d}(F) which measures the minimal approximation error of the worst-case target g ∈ F by the family of function classes having pseudo-dimension d, given partial information on g consisting of values taken by n linear operators. The error rates are calculated, which leads to a quantitative notion of the value of partial information for the paradigm of learning from examples.


Entropy | 2008

An Algorithmic Complexity Interpretation of Lin's Third Law of Information Theory

Joel Ratsaby

Instead of static entropy, we assert that the Kolmogorov complexity of a static structure such as a solid is the proper measure of disorder (or chaoticity). A static structure in a surrounding perfectly random universe acts as an interfering entity which introduces local disruption in randomness. This is modeled by a selection rule R which selects a subsequence of the random input sequence that hits the structure. Through the inequality that relates the stochasticity and chaoticity of random binary sequences, we maintain that Lin's notion of stability corresponds to the stability of the frequency of 1s in the selected subsequence. This explains why more complex static structures are less stable. Lin's third law is represented as the inevitable change that static structures undergo towards conforming to the universe's perfect randomness.
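The selection-rule mechanism described above can be illustrated with a small numerical toy: treat the structure as a fixed set of positions that the rule R selects from a random binary sequence, and check how far the frequency of 1s in the selected subsequence is from 1/2. This is only a toy sketch of the idea, not the paper's formal construction; representing the structure as a set of positions is an assumption made here for illustration.

```python
# Toy illustration of the selection-rule idea: a "structure" is modeled as a
# fixed set of positions, the rule R selects the bits of the random sequence
# that hit those positions, and we examine how far the frequency of 1s in the
# selected subsequence drifts from 1/2. Not the paper's formal construction.
import random

def selected_frequency(sequence, structure_positions):
    """Frequency of 1s in the subsequence selected by the structure."""
    hits = [sequence[i] for i in structure_positions if i < len(sequence)]
    return sum(hits) / len(hits) if hits else 0.0

random.seed(0)
n = 100_000
seq = [random.randint(0, 1) for _ in range(n)]       # perfectly random universe
structure = set(random.sample(range(n), 500))         # positions hit by the structure
print(abs(selected_frequency(seq, structure) - 0.5))  # deviation from 1/2
```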


conference on learning theory | 1996

Towards robust model selection using estimation and approximation error bounds

Joel Ratsaby; Ron Meir; Vitaly Maiorov

One of the main problems in machine learning and statistical inference is selecting an appropriate model by which a set of data can be explained. In the absence of any structured prior information as to the data generating mechanism, one is often forced to consider a range of models, attempting to select the model which best explains the data, based on some quality criterion. While there have been many proposals for various criteria for model selection, most of these approaches suffer from some form of bias built into the construction of the criterion. Moreover, many of the standard methods are only guaranteed to work well asymptotically, leaving their behavior in the face of a finite amount of data completely unknown. In this paper we extend previous work [17] and introduce a novel model selection criterion, based on combining two recent chains of thought. In particular we make use of the powerful framework of uniform convergence of empirical processes pioneered by Vapnik and Chervonenkis [23], combined with recent results concerning the approximation ability of non-linear manifolds of functions, focusing in particular on feedforward neural networks. The main contributions of this work are twofold: (i) Conceptual: elucidating a coherent and robust framework for model selection; (ii) Technical: the main contribution here is a lower bound on the approximation error (Theorem 10), which holds in a well specified sense for most functions of interest. As far as we are aware, this result is new in the field of function approximation.
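To make the shape of such a criterion concrete, the sketch below scores each candidate model class by its empirical error plus a VC-style estimation-error penalty and picks the minimizer. The particular penalty formula is a generic structural-risk-style assumption used only for illustration; it is not the criterion or the bounds derived in the paper.

```python
# Generic structural-risk-style model selection, shown only to illustrate a
# criterion that trades off empirical fit against a complexity penalty. The
# penalty form is an illustrative assumption, not the paper's derived bound.
import math

def penalized_score(empirical_error, complexity, n_samples, confidence=0.05):
    """Empirical error plus a VC-style estimation-error penalty."""
    penalty = math.sqrt((complexity * math.log(2 * n_samples / complexity)
                         + math.log(4 / confidence)) / n_samples)
    return empirical_error + penalty

def select_model(candidates, n_samples):
    """candidates: list of (empirical_error, complexity) pairs, one per model class."""
    scores = [penalized_score(err, c, n_samples) for err, c in candidates]
    return min(range(len(candidates)), key=scores.__getitem__)

# Example: three nested model classes of increasing complexity.
print(select_model([(0.30, 5), (0.12, 50), (0.10, 500)], n_samples=2000))
```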


ieee convention of electrical and electronics engineers in israel | 2012

Universal distance measure for images

Uzi Chester; Joel Ratsaby

We introduce an algorithm for measuring the distance between two images based on computing the complexity of two strings of characters that encode the images. Given a pair of images, our algorithm transforms each one into a text-based string of characters. For each string, it computes the LZ-complexity and then uses the string-distance measure of [1] to obtain a distance value between the images. The main advantages of our algorithm are that it is universal, that is, it neither needs nor assumes any spatial or spectral information about the images; it can measure the distance between two images of different sizes; it works for black-and-white, grayscale, and color images; and it can be implemented efficiently on an embedded computer system. We present successful experimental results on clustering images of different sizes into categories based on their similarities as measured by our algorithm.
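A minimal software sketch of the pipeline just described is given below: quantize an image into a string over a small alphabet, count phrases in a simple LZ-style parsing as a complexity estimate, and combine the complexities of the two strings and their concatenation into a normalized distance. The quantization scheme and the normalized-compression-style combination are stand-in assumptions; the paper uses its own image-to-string encoding and the specific string distance of its reference [1].

```python
# Hedged sketch: encode each image as a string over a small alphabet, measure an
# LZ-style phrase-count complexity, and combine the complexities into a
# normalized distance. The quantization and the NCD-style formula below are
# illustrative assumptions, not the exact method of the paper.
import numpy as np

def image_to_string(img, levels=16):
    """Quantize a 2-D grayscale array to `levels` symbols and flatten row by row."""
    q = np.floor(img.astype(float) / 256.0 * levels).clip(0, levels - 1).astype(int)
    alphabet = "abcdefghijklmnop"[:levels]
    return "".join(alphabet[v] for v in q.ravel())

def lz_complexity(s):
    """Count phrases in a simple left-to-right LZ-style parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i]:
            length += 1
        count += 1
        i += length
    return count

def image_distance(img_a, img_b):
    """Normalized compression-style distance between two images of any sizes."""
    a, b = image_to_string(img_a), image_to_string(img_b)
    ca, cb, cab = lz_complexity(a), lz_complexity(b), lz_complexity(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)
```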


convention of electrical and electronics engineers in israel | 2010

An FPGA-based pattern classifier using data compression

Joel Ratsaby; Denis Zavielov

We implement a text-classification engine on a single FPGA chip running on a 50 MHz clock. It is based on arithmetic-coding data compression. The text classifier is based on the non-parametric nearest-neighbor algorithm. It computes a compression-based distance between two text files. We have devised a parallel hardware architecture for the computation of the tag interval that encodes the data sequence in arithmetic coding. This architecture achieves a large speedup factor. Even with a relatively slow 50 MHz clock, the hardware solution performs 26 times faster than a software-based implementation of this classifier in C++ on a Pentium® D CPU running on a 3 GHz clock. There are many applications where such a hardware-based classifier is an advantage, not only because of its high speed of execution but because it can be embedded as a single chip into small special-purpose systems with limited computational resources: for instance, on a communication board (passively monitoring network traffic and classifying anomalous patterns), on a CCTV camera (classifying abnormal behavior for homeland security), on a satellite doing real-time classification of high-resolution images, or on a small-scale weapon that requires real-time target classification. Since we use a universal distance computed by data compression, once a corpus of labeled texts is uploaded onto the chip there is no need for any feature extraction or machine learning.
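For readers who want to experiment with the underlying classification idea in software, the sketch below uses an off-the-shelf compressor (zlib) as a stand-in for the arithmetic coder and a standard compression-based distance inside a nearest-neighbor rule. Both substitutions, and the toy corpus, are assumptions for illustration; they do not reproduce the FPGA architecture described above.

```python
# Software sketch of the compression-based nearest-neighbor idea. zlib stands in
# for the arithmetic coder of the hardware design, and the distance formula is a
# standard compression-based distance; both are illustrative assumptions.
import zlib

def c(data: bytes) -> int:
    """Compressed length, used as a stand-in complexity estimate."""
    return len(zlib.compress(data, level=9))

def compression_distance(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: bytes, corpus: list[tuple[bytes, str]]) -> str:
    """Nearest-neighbor label of `text` against a labeled corpus of texts."""
    return min(corpus, key=lambda item: compression_distance(text, item[0]))[1]

# Example usage with a tiny labeled corpus.
corpus = [(b"the market rallied on strong earnings", "finance"),
          (b"the striker scored twice in the final", "sports")]
print(classify(b"shares climbed after the earnings report", corpus))
```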


european conference on computational learning theory | 1997

Generalization of the PAC-Model for Learning with Partial Information

Joel Ratsaby; Vitaly Maiorov

The PAC model of learning and its extension to real-valued function classes provides a well-accepted theoretical framework for representing the problem of machine learning using randomly drawn examples. Quite often in practice some form of a priori partial information about the target is available in addition to randomly drawn examples. In this paper we extend the PAC model to a scenario of learning with partial information in addition to randomly drawn examples. According to this model, partial information effectively reduces the complexity of the hypothesis class used to learn the target, thereby reducing the sample complexity of the learning problem. This leads to a clear quantitative tradeoff between the amount of partial information and the sample complexity of the problem. The underlying framework is based on a combination of information-based complexity theory (cf. Traub et al. [18]) and Vapnik-Chervonenkis theory. A new quantity I_{n,d}(F), which plays an important role in determining the worth of partial information, is introduced. It measures the minimal approximation error of a target in a class F by the family of all function classes of pseudo-dimension d under a given partial information which consists of any n measurements which may be expressed as linear operators. As an application, we consider the problem of learning a Sobolev target class. The tradeoff between the amount of partial information and the sample complexity is calculated, and by obtaining fairly tight upper and lower bounds on I_{n,d} we identify an almost optimal way of providing partial information.


Discrete Applied Mathematics | 2012

Analysis of a multi-category classifier

Martin Anthony; Joel Ratsaby

The use of boxes for pattern classification has been widespread and is a fairly natural way in which to partition data into different classes or categories. In this paper we consider multi-category classifiers which are based on unions of boxes. The classification method studied may be described as follows: find boxes such that all points in the region enclosed by each box are assumed to belong to the same category, and then classify remaining points by considering their distances to these boxes, assigning to a point the category of the nearest box. This extends the simple method of classifying by unions of boxes by incorporating a natural way (based on proximity) of classifying points outside the boxes. We analyze the generalization accuracy of such classifiers and obtain generalization error bounds that depend on a measure of how definitive the classification of the training points is.
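The nearest-box rule described above is simple enough to state directly in code: a point inside a box is at distance zero from it and takes that box's category, and a point outside every box takes the category of the closest box. The sketch below assumes the boxes are already given, since the abstract analyzes the classifier rather than a particular box-finding procedure.

```python
# Illustrative sketch of the nearest-box classification rule: points inside a
# box take that box's category; points outside every box take the category of
# the closest box. The boxes are assumed to be supplied by some earlier step.
import numpy as np

def distance_to_box(point, low, high):
    """Euclidean distance from a point to the axis-aligned box [low, high]."""
    clipped = np.clip(point, low, high)      # nearest point of the box
    return float(np.linalg.norm(point - clipped))

def classify(point, boxes):
    """boxes: list of (low, high, category) with low, high as numpy arrays."""
    best_category, best_dist = None, np.inf
    for low, high, category in boxes:
        d = distance_to_box(np.asarray(point, float), low, high)
        if d < best_dist:                    # d == 0 means the point is inside
            best_category, best_dist = category, d
    return best_category

boxes = [(np.array([0.0, 0.0]), np.array([1.0, 1.0]), "A"),
         (np.array([2.0, 2.0]), np.array([3.0, 3.0]), "B")]
print(classify([1.4, 1.2], boxes))   # closer to box A -> "A"
```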


similarity search and applications | 2013

Machine Learning for Image Classification and Clustering Using a Universal Distance Measure

Uzi Chester; Joel Ratsaby

We present a new method for image feature extraction which is based on representing an image by a finite-dimensional vector of distances that measure how different the image is from a set of image prototypes. We use the recently introduced Universal Image Distance (UID) [1] to compare the similarity between an image and a prototype image. The advantage in using the UID is that no domain knowledge or image analysis is needed. Each image is represented by a finite-dimensional feature vector whose components are the UID values between the image and a finite set of image prototypes from each of the feature categories. The method is automatic: once the user selects the prototype images, the feature vectors are calculated without the need for any image analysis. The prototype images can be of different sizes; in particular, their sizes may differ from that of the image being represented. Based on a collection of such feature vectors, any supervised or unsupervised learning algorithm can be used to train and produce an image classifier or an image cluster analysis. In this paper we present the image feature-extraction method and use it in several supervised and unsupervised learning experiments on satellite image data. The feature-extraction method is scalable and easily implementable on multi-core computing resources.
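A short sketch of the prototype-distance representation described above: each image is mapped to the vector of its distances to a fixed list of prototype images, and the resulting matrix can be handed to any standard classifier or clustering algorithm. The distance function is passed in as a parameter here (for example, a compression-based image distance standing in for the UID of [1]); the function and parameter names are illustrative assumptions.

```python
# Sketch of prototype-distance feature extraction: one row of distances to the
# prototypes per image, ready for any downstream learning algorithm. The
# `distance` argument is any image-distance function, e.g. a compression-based
# distance standing in for the UID of the paper's reference [1].
import numpy as np

def feature_vector(img, prototypes, distance):
    """Represent an image by its distances (under `distance`) to prototype images."""
    return np.array([distance(img, p) for p in prototypes])

def build_feature_matrix(images, prototypes, distance):
    """Stack one prototype-distance vector per image for classification or clustering."""
    return np.vstack([feature_vector(img, prototypes, distance) for img in images])
```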

Collaboration


Dive into Joel Ratsaby's collaborations.

Top Co-Authors


Martin Anthony

London School of Economics and Political Science


Vitaly Maiorov

Technion – Israel Institute of Technology


Ron Meir

Technion – Israel Institute of Technology


Bernard Ycart

Centre national de la recherche scientifique
