Jose Picado
Oregon State University
Publications
Featured research published by Jose Picado.
international conference on management of data | 2017
Jose Picado; Arash Termehchy; Alan Fern; Parisa Ataei
Learning novel relations from relational databases is an important problem with many applications. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. Nevertheless, the same database may be represented under different schemas for various reasons, such as data quality, efficiency and usability. The output of current relational learning algorithms tends to vary quite substantially over the choice of schema. This variation complicates their off-the-shelf application. We introduce and formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de)composition schema transformations. We show that current algorithms are not schema independent. We propose Castor, a relational learning algorithm that achieves schema independence by leveraging data dependencies.
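To make the (de)composition transformations mentioned in this abstract concrete, here is a minimal sketch in Python. The relation, attribute names, and toy tuples are hypothetical and not taken from the paper. The sketch only shows that a relation can be split on a key and recomposed by a join without losing information, so both schemas represent the same data.

```python
# Hypothetical composed schema: student(id, name, year), where id is a key.
composed = {("s1", "Ann", 2), ("s2", "Bob", 3)}


def decompose(rows):
    """Split student(id, name, year) into student_name(id, name) and student_year(id, year)."""
    names = {(sid, name) for sid, name, _ in rows}
    years = {(sid, year) for sid, _, year in rows}
    return names, years


def compose(names, years):
    """Natural join of the two decomposed relations on id."""
    return {
        (sid, name, year)
        for sid, name in names
        for sid2, year in years
        if sid == sid2
    }


names, years = decompose(composed)
roundtrip = compose(names, years)
```

Because `id` determines `name` and `year` in this toy data, the join returns exactly the original tuples. A schema-independent learner would be expected to produce equivalent definitions over either representation.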
Knowledge and Information Systems | 2017
Sriraam Natarajan; Vishal Bangera; Tushar Khot; Jose Picado; Anurag Wazalwar; Vítor Santos Costa; David C. Page; Michael D. Caldwell
Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society. A diverse set of techniques from epidemiology, statistics, and computer science are being proposed and studied for ADE discovery from observational health data (e.g., EHR and claims data), social network data (e.g., Google and Twitter posts), and other information sources. Methodologies are needed for evaluating, quantitatively measuring and comparing the ability of these various approaches to accurately discover ADEs. This work is motivated by the observation that text sources such as the Medline/Medinfo library provide a wealth of information on human health. Unfortunately, ADEs often result from unexpected interactions, and the connection between conditions and drugs is not explicit in these sources. Thus, in this work, we address the question of whether we can quantitatively estimate relationships between drugs and conditions from the medical literature. This paper proposes and studies a state-of-the-art NLP-based extraction of ADEs from text.
inductive logic programming | 2014
Sriraam Natarajan; Jose Picado; Tushar Khot; Kristian Kersting; Christopher Ré; Jude W. Shavlik
One of the challenges in information extraction is the requirement of human-annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., looking into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on hand-coded background knowledge that explicitly looks for patterns in text. For example, they assume all sentences containing Person X and Person Y are positive examples of the relation married(X, Y). In this work, we take a different approach: we infer weakly supervised examples for relations from models learned by using knowledge outside the natural language task. We argue that this method creates more robust examples that are particularly useful when learning the entire information-extraction model (the structure and parameters). We demonstrate on three domains that this form of weak supervision yields superior results when learning structure compared to using distant supervision labels or a smaller set of gold-standard labels.
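The distant supervision heuristic that this abstract contrasts itself with can be sketched as follows. This is an illustrative Python sketch, not the paper's method: the function name `distant_labels`, the toy knowledge base, and the sentences are all assumptions made for the example.

```python
def distant_labels(sentences, kb_pairs):
    """Label (sentence, x, y) as a positive example whenever both names co-occur."""
    labeled = []
    for text in sentences:
        for x, y in kb_pairs:
            if x in text and y in text:
                labeled.append((text, x, y))
    return labeled


# Hypothetical knowledge-base facts for the relation married(X, Y).
kb_married = [("Ann Lee", "Bob Ray")]

sentences = [
    "Ann Lee married Bob Ray in 2010.",
    "Ann Lee and Bob Ray founded a company.",
    "Ann Lee gave a talk.",
]

examples = distant_labels(sentences, kb_married)
```

The second sentence is labeled positive even though it says nothing about marriage, which illustrates the kind of noise that co-occurrence patterns can introduce.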
very large data bases | 2018
Jose Picado; Arash Termehchy; Sudhanshu Pathak
Given a relational database and training examples for a target relation, relational learning algorithms learn a Datalog program that defines the target relation in terms of the existing relations in the database. We demonstrate CastorX, a relational learning system that performs relational learning over heterogeneous databases. The user specifies matching attributes between (heterogeneous) databases through matching dependencies. Because the content in these attributes may not match exactly, CastorX uses similarity operators to find matching values in these attributes. As the learning process may become expensive, CastorX implements sampling techniques that allow it to learn efficiently and output accurate definitions. PVLDB Reference Format: Jose Picado, Arash Termehchy, and Sudhanshu Pathak. Learning Efficiently Over Heterogeneous Databases. PVLDB, 11(12): 2066-2069, 2018. DOI: https://doi.org/10.14778/3229863.3236261
international conference on management of data | 2018
Jose Picado; Willis Lang; Edward C. Thayer
Public cloud database providers observe all sorts of different usage patterns and behaviors while operating their services. Service providers such as Microsoft try to understand and characterize these behaviors in order to improve the quality of their service, provide new features for customers, and/or increase the efficiency of the operations. While there are many types of patterns of behavior that are of interest to providers, such as query types, workload intensity, and temporal activity, in this paper, we focus on the lowest level of behavior: how long do public cloud databases survive before being dropped? Given the large and diverse relational database population that Azure SQL DB has, we present a large-scale survivability study of our service and identify some factors that can demonstrably help predict the lifespan of cloud databases. The results of this study are being used to influence how Azure SQL DB operates in order to increase efficiency as well as improve customer experience.
international conference on management of data | 2018
Jose Picado; Arash Termehchy; Sudhanshu Pathak
Given a relational database and training examples for a target relation, relational learning algorithms learn a definition for the target relation in terms of the existing relations in the database. We propose a relational learning system called CastorX, which learns efficiently across multiple heterogeneous databases. The user specifies connections and relationships between different databases using a set of declarative constraints called matching dependencies (MDs). Each MD connects tuples across multiple databases that are related and can meaningfully join, even though the values of their join attributes may not be equal due to the different representations of these values in different databases. CastorX leverages these constraints during learning to find the information relevant to the training data and target definition across multiple databases. Since each tuple in a database may be connected to many tuples in other databases according to an MD, the learning process can become very slow. Hence, CastorX uses sampling techniques to learn efficiently and output accurate definitions.
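The idea of joining tuples on similar rather than equal attribute values, with sampling to bound the number of matches, can be sketched as below. This is a minimal Python sketch under stated assumptions and not CastorX's implementation: the token-level Jaccard similarity, the 0.6 threshold, the uniform random sampling, the function `md_join`, and the toy movie data are all hypothetical choices made for illustration.

```python
import random
import re


def tokens(text):
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def md_join(left, right, threshold=0.6, max_matches=2, seed=0):
    """Join (id, title) rows with (title, score) rows whose titles are similar.

    At most max_matches partners are kept per left row (uniform sample),
    an assumed stand-in for the sampling the abstract mentions.
    """
    rng = random.Random(seed)
    joined = []
    for left_id, left_title in left:
        matches = [row for row in right if jaccard(left_title, row[0]) >= threshold]
        if len(matches) > max_matches:
            matches = rng.sample(matches, max_matches)
        joined.extend((left_id, title, score) for title, score in matches)
    return joined


movies = [(1, "The Matrix (1999)"), (2, "The Godfather")]
reviews = [("Matrix, The", 9), ("Godfather, The", 10), ("The Matrix Reloaded", 6)]
pairs = md_join(movies, reviews)
```

In this toy data the two databases write the same titles differently, so an equality join would return nothing, while the similarity join pairs each movie with its review.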
Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning | 2017
Jose Picado; Arash Termehchy; Alan Fern; Sudhanshu Pathak
Relational databases are valuable resources for learning novel and interesting relations and concepts. Relational learning algorithms learn the definition of new relations in terms of the existing relations in the database. In order to constrain the search through the large space of candidate definitions, users must specify a language bias. Unfortunately, specifying the language bias is done via trial and error and is guided by the expert's intuitions. Hence, it normally takes a great deal of time and effort to effectively use these algorithms. We report our ongoing work on building AutoMode, a system that leverages information in the schema and content of the database to automatically induce the language bias used by popular relational learning algorithms.
very large data bases | 2016
Jose Picado; Parisa Ataei; Arash Termehchy; Alan Fern
Learning novel relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms leverage the properties of the database schema to find the definition of the target relation in terms of the existing relations in the database. However, the same data set may be represented under different schemas for various reasons, such as efficiency and data quality. Unfortunately, the output of current relational learning algorithms tends to vary quite substantially over the choice of schema, which complicates their off-the-shelf application. We demonstrate Castor, a relational learning system that efficiently learns the same definitions over common schema variations. The results of Castor are more accurate than those of well-known learning systems over large data.
international conference on data mining | 2016
Yodsawalai Chodpathumwan; Jose Picado; Arash Termehchy; Alan Fern; Yizhou Sun
Database analytics algorithms leverage quantifiable structural properties of the data to predict interesting concepts and relationships. The same information, however, can be represented using many different structures, and the structural properties observed over particular representations do not necessarily hold for alternative structures. Because these algorithms tend to be highly effective over some choices of structure, such as that of the databases used to validate them, but not so effective with others, database analytics has largely remained the province of experts who can find the desired forms for these algorithms. We argue that in order to make database analytics usable, we should use or develop algorithms that are effective over a wide range of choices of structural organizations. We introduce the notion of representation independence and empirically analyze the amount of representation independence of some popular database analytics algorithms. Our results indicate that most algorithms are not generally representation independent, and we identify some characteristics of heuristics that are more representation independent.
Proceedings of Workshop on GRAph Data management Experiences and Systems | 2014
Yodsawalai Chodpathumwan; Arash Termehchy; Yizhou Sun; Amirhossein Aleyasin; Jose Picado
Finding similar entities over data graphs is an important problem with many applications. Current similarity search algorithms use intuitively appealing heuristics that leverage the link information in the data graph to quantify the degree of similarity between its entities. In this paper, using examples from real-world data sets, we show that people represent the same information using data graphs with different shapes. We argue that in order for a similarity search algorithm to be usable and effective, it should be representation independent: it should return essentially the same answers for a query over different graphs that represent the same information. We formalize this property and show that the outcome of current similarity search algorithms depends highly on data representation. Hence, they may be effective on some datasets and ineffective over others. We also perform an empirical study and analyze the sensitivity of current methods to changes in data representation. Our results indicate that the output of these algorithms is highly affected by changes in data representation.
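A small sketch can illustrate how a link-based similarity heuristic may give different answers over two graphs that carry the same information. The common-neighbors score is used here as an assumed example of such a heuristic; the abstract does not name the algorithms studied, and the two toy graphs and node names are hypothetical.

```python
def build(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def common_neighbors(graph, a, b):
    """Number of nodes adjacent to both a and b."""
    return len(graph[a] & graph[b])


# Representation 1: actors link directly to the film they appear in.
direct = build([("actor1", "film"), ("actor2", "film")])

# Representation 2: the same facts, but each actor links to a character node,
# and the character node links to the film.
via_character = build([
    ("actor1", "char1"), ("char1", "film"),
    ("actor2", "char2"), ("char2", "film"),
])

score_direct = common_neighbors(direct, "actor1", "actor2")
score_via = common_neighbors(via_character, "actor1", "actor2")
```

Under the first representation the two actors share the film as a neighbor, while under the second they share none, so the same query yields different similarity scores.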