Antonio Badia | Researchain


Publication

Featured research published by Antonio Badia.

International Conference on Management of Data | 2004

Entity-Relationship modeling revisited

Antonio Badia

In this position paper, we argue that modern applications require databases to capture and enforce more domain semantics than traditional applications. We also argue that the best way to incorporate additional semantics into database systems is by capturing the added information in conceptual models and then using it for database design. In this light, we revisit Entity-Relationship models and investigate ways in which such models could be extended to play a role in the process. Inspired by a paper by Rafael Camps Pare ([2]), we suggest avenues of research on the issue.

Web Information Systems Engineering | 2002

Conceptual modeling for semistructured data

Antonio Badia

We review the more widely used models in Conceptual Modeling for Information Systems (Entity-Relationship and UML), and argue that they do not effectively support the modeling of semistructured data. As a consequence, structured and semistructured data cannot be treated in an integrated, holistic way during requirements specification. We propose a set of minimal extensions to E-R models which allow us to capture information in an XML DTD, and provide a simple algorithm to go from the extended E-R model to a relational design, to an XML DTD or to a hybrid model with both structured and semistructured parts. This last choice is novel in the literature and opens the door for the possibility of better designed and integrated information systems.

Journal of the Association for Information Science and Technology | 2014

Data, information, knowledge: An information science analysis

Antonio Badia

I analyze the text of an article that appeared in this journal in 2007 that published the results of a questionnaire in which a number of experts were asked to define the concepts of data, information, and knowledge. I apply standard information retrieval techniques to build a list of the most frequent terms in each set of definitions. I then apply information extraction techniques to analyze how the top terms are used in the definitions. As a result, I draw data‐driven conclusions about the aggregate opinion of the experts. I contrast this with the original analysis of the data to provide readers with an alternative viewpoint on what the data tell us.
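The first step described above, building a list of the most frequent terms in a set of definitions, can be sketched roughly as follows. This is a minimal illustration and not the procedure used in the article: the stop-word list, the tokenizer, the function name `top_terms`, and the sample definitions are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "in", "is", "of", "or", "that", "the", "to"}


def top_terms(definitions, k=3):
    """Return the k most frequent non-stop-word terms across the definitions."""
    counts = Counter()
    for text in definitions:
        # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(k)


# Made-up definitions, for illustration only (not taken from the questionnaire).
sample = [
    "Data are symbols that represent facts.",
    "Data is a set of recorded facts or symbols.",
    "Data are raw facts.",
]

print(top_terms(sample, k=2))
```

Running the same function separately over the definitions of data, information, and knowledge would give one ranked term list per concept, which is the kind of input the later extraction step would work from.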

ACM Transactions on Database Systems | 2007

SQL query optimization through nested relational algebra

Bin Cao; Antonio Badia

Most research work on optimization of nested queries focuses on aggregate subqueries. In this article, we show that existing approaches are not adequate for nonaggregate subqueries, especially for those having multiple subqueries and certain comparison operators. We then propose a new efficient approach, the nested relational approach, based on the nested relational algebra. The nested relational approach treats all subqueries in a uniform manner, being able to deal with nested queries of any type and any level. We report on experimental work that confirms that existing approaches have difficulties dealing with nonaggregate subqueries, and that the nested relational approach offers better performance. We also discuss algebraic optimization rules for further optimizing the nested relational approach and the issue of integrating it into relational database systems.

International World Wide Web Conferences | 2006

Focused crawling: experiences in a real world project

Antonio Badia; Tulay Muezzinoglu; Olfa Nasraoui

Focused crawling is the act of examining a collection of hyperlinked documents (i.e. the Web) to find those that are about a certain topic ([2, 1]). In contrast, general (unrestricted) crawling examines the whole collection, gathering some information (keywords) about each document. Here we report on our experiences building a focused crawler as part of a larger project. The National Surface Treatment Center (NSTCenter) is an organization run for the U.S. Navy by Innovative Productivity Inc., a non-profit company that provides innovative technology-enhanced services and solutions for National Defense, business, and work force customers. The NSTCenter web site was created with the goal of becoming a premier forum for Navy officers, independent consultants, researchers and companies offering products and/or services involved in the process of servicing Navy ships. In order to help generate content, we developed a focused web crawler that searched the web for information relevant to the NSTCenter. The team has developed a system that achieves significant precision; real recall is extremely difficult to establish in an open, dynamic environment like the Web, although some limited testing suggests that recall is also quite positive (see later). Focused crawling has attracted considerable attention recently ([1, 4, 5, 2, 3, 6]). Most methods use primarily link structure to identify pages about a topic, or combine several measures from text analysis and link analysis to better characterize the page. [5] proposes a method similar to the one used here, in that a knowledge structure (an ontology) is used to identify relevant pages. We use a thesaurus, which does not have as much information but is much easier to build and maintain. Focused crawling on the real web can be extremely difficult for several reasons. First, the concept of topic is not formal or formalized (and perhaps not formalizable). As a consequence, the relationship of being about a topic, already difficult to determine, is even more difficult. Second, on a networked collection, the network itself is used to determine page aboutness, on the assumption that a page content can

International Conference on Management of Data | 2005

A nested relational approach to processing SQL subqueries

Bin Cao; Antonio Badia

One of the most powerful features of SQL is the use of nested queries. Most research work on the optimization of nested queries focuses on aggregate subqueries. However, the solutions proposed for non-aggregate subqueries are still limited, especially for queries having multiple subqueries and null values. In this paper, we show that existing approaches to queries containing non-aggregate subqueries proposed in the literature (including rewrites) are not adequate. We then propose a new efficient approach, the nested relational approach, based on the nested relational algebra. Our approach directly unnests non-aggregate subqueries using hash joins, and treats all subqueries in a uniform manner, being able to deal with nested queries of any type and any level. We report on experimental work that confirms that existing approaches have difficulties dealing with non-aggregate subqueries, and that our approach offers better performance. We also discuss some possibilities for algebraic optimization and the issue of integrating our approach in a relational database system.
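To make the idea of unnesting a non-aggregate subquery through hashing more concrete, here is a small sketch for a `> ALL` subquery. It is not the authors' algorithm: it only illustrates the general pattern of nesting the inner relation into a hash table keyed on the correlation attribute and then evaluating the linking predicate against each nested set. The table contents, the attribute names, and both function names are hypothetical, and null handling, which the abstract singles out as a difficulty, is deliberately left out.

```python
from collections import defaultdict


def nest(inner_rows, key, value):
    """Build a hash table mapping each key to the list of its inner values (the nested set)."""
    groups = defaultdict(list)
    for row in inner_rows:
        groups[row[key]].append(row[value])
    return groups


def select_greater_than_all(outer_rows, inner_rows, outer_key, inner_key, outer_attr, inner_attr):
    """Keep outer rows whose outer_attr exceeds every correlated inner value.

    Outer rows with no matching inner rows are kept, since > ALL over an empty set is true.
    """
    groups = nest(inner_rows, inner_key, inner_attr)
    return [
        row
        for row in outer_rows
        if all(row[outer_attr] > v for v in groups.get(row[outer_key], []))
    ]


# Hypothetical data standing in for:
#   SELECT * FROM emp e
#   WHERE e.salary > ALL (SELECT l.amount FROM limits l WHERE l.dept = e.dept)
emp = [
    {"name": "ann", "dept": 1, "salary": 50},
    {"name": "bob", "dept": 1, "salary": 20},
    {"name": "cy", "dept": 2, "salary": 10},
]
limits = [
    {"dept": 1, "amount": 30},
    {"dept": 1, "amount": 40},
]

result = select_greater_than_all(emp, limits, "dept", "dept", "salary", "amount")
print([row["name"] for row in result])
```

The inner relation is scanned once to build the hash table, so each outer row is checked against its nested set without re-running the subquery.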

Journal of Applied Logic | 2007

Question answering and database querying: Bridging the gap with generalized quantification

Antonio Badia

Even though Question Answering and Database Querying have very different goals and frameworks, collaboration between the two fields could be mutually beneficial. However, the different assumptions in each field make such collaboration difficult. In this paper, we introduce a query language with generalized quantifiers (QLGQ) and show how it could be used to help bridge the gap between the two fields.

Proceedings of the 3rd International Workshop on Link Discovery | 2005

Graph building as a mining activity: finding links in the small

Antonio Badia; Mehmed Kantardzic

Many analyses of data proceed by building a graph out of the data set and then using social network theory and similar tools on the result. However, there is no theory concerning the construction of the graph itself, even though this is a very important process. In this paper, we attempt to provide a framework in which the graph building process is formalized and studied. We show the parameters (choices) involved in constructing a graph from raw data, and propose some new ways to combine and analyze the data. We also argue the importance of this approach in several domain applications, including criminal/terrorist investigations.
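As a rough illustration of how many choices hide inside "building a graph from raw data", the sketch below makes three of them explicit: what counts as a node (here, each entity named in a record), what counts as an edge (co-occurrence in the same record), and how strong a link must be to be kept (`min_weight`). These particular choices, the function name `build_graph`, and the sample records are assumptions for the example, not the framework proposed in the paper.

```python
from collections import Counter
from itertools import combinations


def build_graph(records, min_weight=1):
    """Build an undirected co-occurrence graph from raw records.

    Nodes are entities; an edge links two entities that appear together
    in at least min_weight records. Returns {(a, b): weight} with a < b.
    """
    weights = Counter()
    for entities in records:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_weight}


# Made-up records, each listing the entities mentioned together.
records = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["carol", "dave"],
]

print(build_graph(records))
print(build_graph(records, min_weight=2))
```

Raising `min_weight` from 1 to 2 removes three of the four edges, which shows how strongly a single construction parameter can shape whatever network analysis follows.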

Knowledge and Information Systems | 2009

Exploiting maximal redundancy to optimize SQL queries

Bin Cao; Antonio Badia

Detecting and dealing with redundancy is a ubiquitous problem in query optimization, which manifests itself in many areas of research such as materialized views, multi-query optimization, and query-containment algorithms. In this paper, we focus on the issue of intra-query redundancy, that is, redundancy present within a query. We present a method to detect the maximal redundancy present between a main (outer) query block and a subquery block. We then use the method for query optimization, introducing query plans and a new operator that take full advantage of the redundancy discovered. Our approach can deal with redundancy in a wider spectrum of queries than existing techniques. We show experimental evidence that our approach works under certain conditions, and compares favorably to existing optimization techniques when applicable.

International Conference on Data Engineering | 2012

Opaque Attribute Alignment

Jennifer Sleeman; Rafael Alonso; Hua Li; Art Pope; Antonio Badia

Ontology alignment describes a process of mapping ontological concepts, classes and attributes between different ontologies, providing a way to achieve interoperability. While there has been considerable research in this area, most approaches that rely upon the alignment of attributes use label-based string comparisons of property names. The ability to process opaque or non-interpreted attribute names is a necessary component of attribute alignment. We describe a new attribute alignment approach to support ontology alignment that uses density estimation as a means for determining alignment among objects. Using the combination of similarity hashing, Kernel Density Estimation (KDE) and cross entropy, we are able to show promising F-Measure scores using the standard Ontology Alignment Evaluation Initiative (OAEI) 2011 benchmark.
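The abstract does not say how the three techniques are combined, so the following is only a guess at the flavor of the density-based part: estimate a density for the values of each attribute with Gaussian KDE, then compare attributes by cross entropy, ignoring their names entirely. Similarity hashing is not shown. The bandwidth, the integration grid, the sample values, and the function names are all assumptions chosen for the example.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of samples with Gaussian kernels."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def cross_entropy(p, q, grid):
    """Approximate H(p, q) = -integral of p(x) log q(x) dx by a Riemann sum over an evenly spaced grid."""
    step = grid[1] - grid[0]
    eps = 1e-12  # avoids log(0) where q is numerically zero
    return -sum(p(x) * math.log(q(x) + eps) for x in grid) * step


# Made-up numeric values for one attribute in ontology 1 and two
# candidate attributes in ontology 2 (attribute names are never consulted).
source_values = [1.0, 1.2, 0.9, 1.1]
candidate_b = [1.05, 0.95, 1.15, 1.0]
candidate_c = [5.0, 5.2, 4.9, 5.1]

grid = [-2 + 0.05 * i for i in range(201)]  # covers [-2, 8]
p = gaussian_kde(source_values, bandwidth=0.3)
score_b = cross_entropy(p, gaussian_kde(candidate_b, bandwidth=0.3), grid)
score_c = cross_entropy(p, gaussian_kde(candidate_c, bandwidth=0.3), grid)

# Lower cross entropy means the candidate's value distribution is closer to the source's.
best = "candidate_b" if score_b < score_c else "candidate_c"
print(best)
```

In this toy setup the candidate whose values cluster around the same range as the source attribute gets the lower score, which is how opaque attributes could be matched on value distributions alone.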
