Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Carlos Domingo is active.

Publication


Featured research published by Carlos Domingo.


Data Mining and Knowledge Discovery | 2002

Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms

Carlos Domingo; Ricard Gavaldà; Osamu Watanabe

Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that can exploit large amounts of data. One possible approach for dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as several researchers have argued, random sampling is difficult to use because of the difficulty of determining an appropriate sample size. In this paper, we take a sequential sampling approach to this difficulty and propose an adaptive sampling method that solves a general problem covering many actual problems arising in applications of discovery science. An algorithm following this method obtains examples sequentially, in an on-line fashion, and determines from the examples obtained so far whether it has already seen a large enough number of them. Thus, the sample size is not fixed a priori; instead, it adapts to the situation. Thanks to this adaptiveness, if we are not in a worst-case situation, as fortunately happens in many practical applications, we can solve the problem with far fewer examples than the worst case requires. We prove the correctness of our method and estimate its efficiency theoretically. To illustrate its usefulness, we consider one concrete task requiring sampling, provide an algorithm based on our method, and demonstrate its efficiency experimentally.
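The sequential, on-line flavor of such methods can be illustrated with a minimal sketch (an illustrative stand-in, not the paper's actual algorithm): estimate the mean of a 0/1 stream, stopping as soon as a Hoeffding-style bound certifies the desired accuracy. The function name and the union-bound schedule are assumptions of this sketch.

```python
import math
import random

def adaptive_mean_estimate(draw, eps=0.05, delta=0.05, max_n=10**6):
    """Sample sequentially; stop once a Hoeffding-style confidence
    interval of half-width <= eps holds with probability >= 1 - delta.
    The per-round confidence delta/(n*(n+1)) telescopes to delta over
    all rounds, so the guarantee survives the adaptive stopping rule."""
    total, n = 0.0, 0
    while n < max_n:
        total += draw()
        n += 1
        half_width = math.sqrt(math.log(2.0 * n * (n + 1) / delta) / (2.0 * n))
        if half_width <= eps:
            break
    return total / n, n

random.seed(0)
est, n_used = adaptive_mean_estimate(lambda: 1 if random.random() < 0.7 else 0)
```

On an easy instance the rule stops far below `max_n`; that adaptiveness, rather than a worst-case fixed sample size, is the point the abstract makes.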


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2000

Scaling Up a Boosting-Based Learner via Adaptive Sampling

Carlos Domingo; Osamu Watanabe

In this paper we present an experimental evaluation of a boosting-based learning system and show that it can be run efficiently over a large dataset. The system uses decision stumps as its base learner: single-attribute decision trees with only two terminal nodes. To select the best decision stump at each iteration we use an adaptive sampling method. As the boosting algorithm, we use a modification of AdaBoost suited to combination with a base learner that does not use the whole dataset. We provide experimental evidence that our method is as accurate as the equivalent algorithm that uses all of the data, but much faster.
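A minimal AdaBoost-with-decision-stumps loop in this spirit looks like the sketch below; it runs over the full dataset, without the paper's adaptive-sampling speedup, and all names are illustrative.

```python
import math

def best_stump(X, y, w):
    """Pick the single-attribute threshold test minimizing weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost_stumps(X, y, rounds=10):
    """AdaBoost with decision stumps over labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Reweight: misclassified points gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * (sign if xi[j] >= thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    s = sum(a * (sg if x[j] >= thr else -sg) for a, j, thr, sg in ensemble)
    return 1 if s >= 0 else -1

X = [[1, 5], [2, 4], [3, 1], [6, 2], [7, 8], [8, 3]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost_stumps(X, y, rounds=5)
```

The paper's contribution replaces the exhaustive pass inside `best_stump` with an adaptive sample, which this sketch does not attempt.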


Conference on Computational Complexity | 1999

Non-automatizability of bounded-depth Frege proofs

Maria Luisa Bonet; Carlos Domingo; Ricard Gavaldà; Alexis Maciel; Toniann Pitassi

In this paper, we show how to extend the argument due to Bonet, Pitassi and Raz to show that bounded-depth Frege proofs do not have feasible interpolation, assuming that factoring Blum integers or computing the Diffie–Hellman function is sufficiently hard. It follows as a corollary that bounded-depth Frege is not automatizable; in other words, there is no deterministic polynomial-time algorithm that will output a short proof if one exists. A notable feature of our argument is its simplicity.


Discovery Science | 1998

Practical Algorithms for On-line Sampling

Carlos Domingo; Ricard Gavaldà; Osamu Watanabe

One of the core applications of machine learning to knowledge discovery is building a hypothesis (such as a decision tree or neural network) from a given amount of data, so that we can later use it to predict new instances of the data. In this paper, we focus on the particular situation where the hypothesis we want to use for prediction is a very simple one, so the hypothesis class is of feasible size. We study the problem of determining which of the hypotheses in the class is almost the best one. We present two on-line sampling algorithms for selecting a hypothesis, give theoretical bounds on the number of examples needed, and analyze them experimentally. We compare them with the simple batch sampling approach commonly used and show that in most situations our algorithms use a much smaller number of examples.
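A minimal successive-elimination sketch of on-line hypothesis selection (the function name, elimination schedule, and toy stream are assumptions of this sketch, not the paper's algorithms):

```python
import math
import random

def select_hypothesis(hypotheses, examples, delta=0.05, batch=100):
    """Stream labelled examples, keep a running accuracy per hypothesis,
    and periodically drop any hypothesis whose Hoeffding upper confidence
    bound falls below the current leader's lower confidence bound."""
    k = len(hypotheses)
    alive = list(range(k))
    correct = [0] * k
    n = 0
    for x, y in examples:
        n += 1
        for i in alive:
            correct[i] += int(hypotheses[i](x) == y)
        if n % batch == 0 and len(alive) > 1:
            # Confidence radius with a union bound over hypotheses and rounds.
            r = math.sqrt(math.log(2.0 * k * n / delta) / (2.0 * n))
            best = max(correct[i] / n for i in alive)
            alive = [i for i in alive if correct[i] / n + 2 * r >= best]
    return max(alive, key=lambda i: correct[i])

# Toy stream: the true label is 1 iff x >= 0.5; candidate thresholds compete.
random.seed(1)
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
hyps = [lambda x, t=t: int(x >= t) for t in thresholds]
stream = [(x, int(x >= 0.5)) for x in (random.random() for _ in range(3000))]
winner = select_hypothesis(hyps, stream)
```

Unlike batch sampling with a fixed worst-case sample size, clearly bad hypotheses are discarded early, so most examples are spent separating the close contenders.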


International Conference on Algorithms and Complexity | 1997

Polynomial Time Algorithms for Some Self-Duality Problems

Carlos Domingo

Consider the problem of deciding whether a Boolean formula f is self-dual, i.e. whether f is logically equivalent to its dual formula fd, defined by fd(x) = ¬f(¬x). This is a well-studied problem in several areas, such as the theory of coteries, database theory, hypergraph theory and computational learning theory. In this paper we exhibit polynomial-time algorithms for testing self-duality for several natural classes of formulas for which the problem was not known to be solvable. Some of the results are obtained by means of a new characterization of self-dual formulas in terms of their Fourier spectra.
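The definition gives an immediate brute-force check, exponential in the number of variables and thus only a baseline for small formulas; the point of the paper is polynomial-time tests for special classes. A minimal sketch:

```python
from itertools import product

def is_self_dual(f, n):
    """f is self-dual iff f(x) = NOT f(NOT x) for every x in {0,1}^n.
    Brute force over all 2^n assignments; feasible only for small n."""
    return all(bool(f(x)) != bool(f(tuple(1 - b for b in x)))
               for x in product((0, 1), repeat=n))

maj3 = lambda x: int(x[0] + x[1] + x[2] >= 2)   # majority: the classic self-dual function
and3 = lambda x: int(x[0] and x[1] and x[2])    # AND is not self-dual
```

For example, `is_self_dual(maj3, 3)` holds because complementing all inputs of a three-variable majority always flips its output, while `and3` fails already at x = (1, 0, 0).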


Algorithmic Learning Theory | 1999

Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness to the E3 Algorithm

Carlos Domingo

Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. One of the key contributions of the algorithm is its explicit treatment of the exploration–exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing the exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method more suitable for the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, because of the adaptiveness of the sampling method we use, we discuss how our algorithm might perform better in practice than the previous one.


Randomization and Approximation Techniques in Computer Science | 1998

A Role of Constraint in Self-Organization

Carlos Domingo; Osamu Watanabe; Tadashi Yamazaki

In this paper we study a neural network model of self-organization. This model uses a variation of a Hebb rule for updating its synaptic weights and converges with certainty to an equilibrium state. The key to this convergence is the update rule, which constrains the total synaptic weight; we investigate the role of this constraint and show that it is what makes the model stable. To analyze this setting, we propose a simple probabilistic game that abstracts the neural network and the self-organization process. We then investigate the characteristics of this game, namely the probability that the game becomes stable and the number of steps it takes.
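A toy version of a constrained Hebbian update (the input rates, learning rate, and linear unit are assumptions of this sketch, not the paper's model): correlation-driven increments, with the weights rescaled after every step so their total stays fixed.

```python
import random

def constrained_hebb(n_inputs=5, steps=2000, eta=0.05, seed=1):
    """Hebbian increments on a linear unit, with the weights rescaled
    to total 1 after each step. The normalization is what keeps the
    dynamics bounded; without it the weights would grow without limit."""
    rng = random.Random(seed)
    w = [1.0 / n_inputs] * n_inputs
    # Input 0 fires most often, so it should end up with the most weight.
    rates = [0.9] + [0.2] * (n_inputs - 1)
    for _ in range(steps):
        x = [1 if rng.random() < r else 0 for r in rates]
        y = sum(wi * xi for wi, xi in zip(w, x))          # linear output unit
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]   # Hebb increment
        s = sum(w)
        w = [wi / s for wi in w]                          # total-weight constraint
    return w

w = constrained_hebb()
```

After training, the total weight is still exactly 1 and the most active input holds the largest share, a small-scale illustration of the stabilizing role the abstract attributes to the constraint.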


Computing and Combinatorics Conference | 1996

Exact Learning of Subclasses of CDNF Formulas with Membership Queries

Carlos Domingo

We consider the exact learnability of subclasses of Boolean formulas from membership queries alone. We show how to combine known learning algorithms that use membership and equivalence queries to obtain new learning results with membership queries only. In particular, we show the exact learnability of read-k monotone formulas, Sat-k O(log n)-CDNF, and O(√(log n))-size CDNF from membership queries only.


Algorithmic Learning Theory | 1995

The Complexity of Learning Minor Closed Graph Classes

Carlos Domingo; John Shawe-Taylor

The paper considers the problem of learning classes of graphs closed under taking minors. It is shown that any such class can be properly learned in polynomial time using membership and equivalence queries. The representation of the class is in terms of a set of minimal excluded minors (obstruction set). Moreover, a negative result for learning such classes using only equivalence queries is also provided, after introducing a notion of reducibility among query learning problems.


Computing and Combinatorics Conference | 1997

Corrigendum: Exact learning of subclasses of CDNF formulas with membership queries

Carlos Domingo

Introduction. The reader should refer to the original paper for the problem statement and definitions. Section 3 of that paper shows the exact learnability from membership queries of the class of read-k monotone DNF formulas, in time and number of queries polynomial in the DNF and CNF sizes. The result is proved by showing that we can decide in polynomial time whether a read-k monotone DNF formula f and a monotone CNF g are logically equivalent (Lemma 5 in the paper). The proof was based on the notion of positive sensitive assignments and on some properties stated in Propositions 3 and 4. Although Propositions 3 and 4 are correct, it is not true, as claimed in Lemma 5, that we can find a positive sensitive assignment that witnesses the non-equivalence of f and g, since in general there can be an exponential number of them. Again, we refer the reader to the original paper for details. In the following section we give a different proof of Lemma 5 of the original paper. Thus the main result of Section 3 of the original paper, namely that we can exactly learn read-k monotone DNF with membership queries in time and number of queries polynomial in the DNF and CNF size of the target (Theorem 6 in the original paper), is still correct. Recently, Mishra and Pitt [MP97] proved the following related result: given a monotone read-k DNF, one can obtain its equivalent monotone CNF formula in output-polynomial time. They also proved that any read-k monotone DNF formula f is exactly learnable with membership queries using time and number of queries polynomial in the DNF and CNF size of f. (Actually, they give the dual results for monotone CNF formulas.) Their result is slightly weaker than ours in the following sense. It is known [BI95] that an algorithm for deciding the equivalence between a monotone read-k DNF and …

(Footnote: This research was done while the author was at the Department of Computer Science, Tokyo Institute of Technology, Japan.)

Collaboration


Dive into Carlos Domingo's collaborations.

Top Co-Authors

Osamu Watanabe, Tokyo Institute of Technology
Ricard Gavaldà, Polytechnic University of Catalonia
Tadashi Yamazaki, Tokyo Institute of Technology
Maria Luisa Bonet, Polytechnic University of Catalonia
Víctor Lavín, Polytechnic University of Catalonia