
Publication


Featured research published by Snehal Thakkar.


IEEE Intelligent Systems | 2004

Retrieving and semantically integrating heterogeneous data from the Web

Martin Michalowski; José Luis Ambite; Snehal Thakkar; Rattapoom Tuchinda; Craig A. Knoblock

Building Finder uses semantic Web technologies to integrate different data types from various online data sources. The application's use of RDF and the RDF data query language makes it usable by computer agents as well as human users. An agent would send a query, expressed in terms of its preferred ontology (schema), to a system that would then find and integrate the relevant data from multiple sources and return it using the agent's ontology. We discuss retrieving and semantically integrating heterogeneous data from the Web.
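
A minimal sketch of the kind of agent query this enables, assuming rdflib and a hypothetical buildings ontology; the paper itself used RDQL rather than SPARQL, and the triples below are invented for illustration:

```python
# Sketch of querying integrated RDF data with rdflib; the ontology,
# namespace, and facts are hypothetical, not the paper's actual data.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/buildings#")

g = Graph()
# In Building Finder the triples would come from wrapped web sources;
# here two hand-made facts are added so the query runs standalone.
g.add((EX.b1, EX.name, Literal("Aero Theatre")))
g.add((EX.b1, EX.street, Literal("Montana Ave")))

# An agent phrases its request against its preferred ontology (EX here);
# the mediator answers it from whatever sources it has integrated.
results = g.query(
    """
    SELECT ?name ?street WHERE {
        ?b <http://example.org/buildings#name> ?name .
        ?b <http://example.org/buildings#street> ?street .
    }
    """
)
for name, street in results:
    print(name, street)
```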


Very Large Data Bases | 2005

Composing, optimizing, and executing plans for bioinformatics web services

Snehal Thakkar; José Luis Ambite; Craig A. Knoblock

The emergence of a large number of bioinformatics datasets on the Internet has resulted in the need for flexible and efficient approaches to integrate information from multiple bioinformatics data sources and services. In this paper, we present our approach to automatically generate composition plans for web services, optimize the composition plans, and execute these plans efficiently. While data integration techniques have been applied to the bioinformatics domain, the focus has been on answering specific user queries. In contrast, we focus on automatically generating parameterized integration plans that can be hosted as web services that respond to a range of inputs. In addition, we present two novel techniques that improve the execution time of the generated plans by reducing the number of requests to the existing data sources and by executing the generated plan more efficiently. The first optimization technique, called tuple-level filtering, analyzes the source/service descriptions in order to automatically insert filtering conditions in the composition plans that result in fewer requests to the component web services. To ensure that the filtering conditions can be evaluated, this technique may include sensing operations in the integration plan. The savings due to filtering significantly exceed the cost of the sensing operations. The second optimization technique consists in mapping the integration plans into programs that can be executed by a dataflow-style, streaming execution engine. We use real-world bioinformatics web services to show experimentally that (1) our automatic composition techniques can efficiently generate parameterized plans that integrate data from large numbers of existing services and (2) our optimization techniques can significantly reduce the response time of the generated integration plans.
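
A toy sketch of the tuple-level filtering idea; the service stubs (fetch_gene_records, sense_organism, fetch_protein_details) are hypothetical placeholders, not the bioinformatics services used in the paper:

```python
# Sketch of tuple-level filtering in a service-composition plan:
# a cheap filter, fed by a sensing call, drops tuples before the
# expensive fan-out to the downstream service.

def fetch_gene_records(keyword):
    # Pretend upstream service: returns candidate gene tuples.
    return [
        {"gene": "BRCA1", "organism_id": 9606},
        {"gene": "brca1", "organism_id": 10090},
        {"gene": "TP53", "organism_id": 9606},
    ]

def sense_organism(organism_id):
    # "Sensing" operation: a cheap lookup that makes the filter
    # condition evaluable (here, whether the organism is human).
    return organism_id == 9606

def fetch_protein_details(gene):
    # Pretend expensive downstream service call.
    return {"gene": gene, "protein": f"{gene}-protein"}

def composed_plan(keyword):
    tuples = fetch_gene_records(keyword)
    # Tuple-level filter: fewer requests reach the downstream service.
    filtered = [t for t in tuples if sense_organism(t["organism_id"])]
    return [fetch_protein_details(t["gene"]) for t in filtered]

print(composed_plan("breast cancer"))
```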


International Journal of Geographical Information Science | 2004

Exploiting online sources to accurately geocode addresses

Rahul Bakshi; Craig A. Knoblock; Snehal Thakkar

Many Geographic Information System (GIS) applications require the conversion of an address to geographic coordinates. This process is called geocoding. The traditional geocoding method uses a street vector data source, such as TIGER/Line, to obtain the address range and coordinates of the street segment on which the given address is located. Next, an approximation technique is used to estimate the location of the given address using the address range of the selected street segment. However, this provides inaccurate results, since the approximation assumes that properties exist at all possible addresses and that all properties are of equal size. To address the inaccuracy of the traditional geocoding approach, we propose two new methods for geocoding using additional online data sources. The first method, the uniform-lot-size method, uses the number of addresses/lots present on the street segment to approximate the location of an address. The second method, the actual-lot-size method, takes into consideration the lot sizes on the street segment as well as the orientation of the lots. Moreover, we describe an implementation of these methods using an information mediator to obtain the actual number of lots and the sizes of the lots on the streets from various property tax web sites. We geocoded an area covering 13 blocks (267 addresses) using all three methods. Our evaluation shows that the traditional method results in an average error of 36.85 meters, while the uniform-lot-size and actual-lot-size methods result in average errors of 7.87 meters and 1.63 meters, respectively.
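
A simplified sketch contrasting traditional address-range interpolation with the uniform-lot-size method; the street segment, address range, and lot list below are invented, and the real methods also handle street offsets and lot orientation:

```python
# Sketch of two geocoding interpolation strategies on one street segment.

def traditional_geocode(house_no, range_lo, range_hi, seg_start, seg_end):
    # Assume addresses are spread evenly over the whole address range.
    frac = (house_no - range_lo) / (range_hi - range_lo)
    x = seg_start[0] + frac * (seg_end[0] - seg_start[0])
    y = seg_start[1] + frac * (seg_end[1] - seg_start[1])
    return x, y

def uniform_lot_size_geocode(house_no, existing_addresses, seg_start, seg_end):
    # Use the actual list of lots on the segment (e.g. scraped from a
    # property-tax site), assume equal-sized lots, and place the address
    # at the centre of its lot.
    lots = sorted(existing_addresses)
    idx = lots.index(house_no)
    frac = (idx + 0.5) / len(lots)
    x = seg_start[0] + frac * (seg_end[0] - seg_start[0])
    y = seg_start[1] + frac * (seg_end[1] - seg_start[1])
    return x, y

seg_start, seg_end = (0.0, 0.0), (100.0, 0.0)   # street segment, in metres
print(traditional_geocode(412, 400, 498, seg_start, seg_end))
print(uniform_lot_size_geocode(412, [402, 406, 412, 420, 430], seg_start, seg_end))
```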


Symposium on Large Spatial Databases | 2003

Automatically Annotating and Integrating Spatial Datasets

Ching-Chien Chen; Snehal Thakkar; Craig A. Knoblock; Cyrus Shahabi

The recent growth of geo-spatial information on the web has made it possible to easily access a wide variety of spatial data. By integrating these spatial datasets, one can support a rich set of queries that could not have been answered given any of these sets in isolation. However, accurately integrating geo-spatial data from different data sources is a challenging task, because spatial data obtained from various data sources may have different projections, different accuracy levels and different formats (e.g. raster or vector format). In this paper, we describe an information integration approach, which utilizes various geo-spatial and textual data available on the Internet to automatically annotate and conflate satellite imagery with vector datasets. We describe two techniques to automatically generate control point pairs from the satellite imagery and vector data to perform the conflation. The first technique generates the control point pairs by integrating information from different online sources. The second technique exploits the information from the vector data to perform localized image-processing on the satellite imagery. Using these techniques, we can automatically integrate vector data with satellite imagery or align multiple satellite images of the same area. Our conflation techniques can automatically identify the roads in satellite imagery with an average error of 8.61 meters, compared to the original error of 26.19 meters, for the city of El Segundo, and 7.48 meters, compared to 15.27 meters, for the Adams Morgan neighborhood of Washington, DC.
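
A rough sketch of how control-point pairs can drive the alignment step, assuming numpy and a plain least-squares affine fit; the paper's conflation uses more sophisticated, localized transformations, and the control points below are invented:

```python
# Sketch: once control-point pairs are found, align the vector data to
# the imagery with a least-squares affine transform.
import numpy as np

# (x, y) in vector-data coordinates and the matching (u, v) in the imagery.
vector_pts = np.array([[10.0, 5.0], [40.0, 8.0], [25.0, 30.0], [60.0, 45.0]])
image_pts = np.array([[12.5, 4.0], [42.0, 7.5], [27.0, 29.0], [62.5, 44.5]])

# Solve [x, y, 1] @ A = [u, v] for the 3x2 affine matrix A (least squares).
ones = np.ones((len(vector_pts), 1))
design = np.hstack([vector_pts, ones])                   # shape (n, 3)
A, *_ = np.linalg.lstsq(design, image_pts, rcond=None)   # shape (3, 2)

def conflate(point):
    # Map a vector-data coordinate into imagery coordinates.
    x, y = point
    return np.array([x, y, 1.0]) @ A

print(conflate((25.0, 30.0)))   # roughly matches the corresponding image point
```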


AI Magazine | 2005

Automatically utilizing secondary sources to align information across sources

Martin Michalowski; Snehal Thakkar; Craig A. Knoblock

XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.
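
A minimal sketch of the secondary-source idea; the sources, fields, and phone directory below are invented, and a real system would use a statistical record-linkage model rather than exact string equality:

```python
# Sketch of using a secondary source to make two records linkable.

source_a = [{"name": "Joe's Pizza", "phone": "310-555-0101"}]          # has phone, no address
source_b = [{"restaurant": "Joes Pizza", "address": "123 Main St"}]    # has address, no phone

# Hypothetical secondary source: a phone-book style site mapping
# phone numbers to street addresses.
phone_directory = {"310-555-0101": "123 Main St"}

def link(a_records, b_records):
    matches = []
    for a in a_records:
        # Enrich the record from source A via the secondary source,
        # so both records now share a comparable attribute.
        address = phone_directory.get(a["phone"])
        for b in b_records:
            if address and address == b["address"]:
                matches.append((a["name"], b["restaurant"]))
    return matches

print(link(source_a, source_b))
```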


Advances in Geographic Information Systems | 2007

Quality-driven geospatial data integration

Snehal Thakkar; Craig A. Knoblock; José Luis Ambite

Accurate and efficient integration of geospatial data is an important problem with applications in areas such as emergency response and urban planning. Key challenges in supporting large-scale geospatial data integration are automatically computing the quality of the data provided by a large number of geospatial sources and dynamically providing high-quality answers to user queries based on quality criteria supplied by the user. We describe a framework called the Quality-driven Geospatial Mediator (QGM) that supports efficient and accurate integration of geospatial data from a large number of sources. The key contributions of our framework are: (1) the ability to automatically estimate the quality of data provided by a source by using information from another source of known quality, (2) a representation of the quality of data provided by the sources in a declarative data integration framework, and (3) a query answering technique that exploits the quality information to provide high-quality geospatial data in response to user queries. Our experimental evaluation using over 1200 real-world sources shows that QGM can accurately estimate the quality of geospatial sources. Moreover, QGM provides better quality data in response to user queries than traditional data integration systems, and does so with lower response time.
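
A toy sketch of the two core ideas, quality estimation against a reference source of known quality and quality-driven source selection; the sources, coordinates, and threshold are invented, and the real QGM uses a declarative, much richer quality model:

```python
# Sketch: estimate each source's positional accuracy against a reference
# source, then answer queries from the best source that meets the user's
# quality criterion.
import math

reference = {"city_hall": (0.0, 0.0)}                      # known high quality
sources = {
    "source_a": {"city_hall": (3.0, 4.0)},                 # ~5 m off
    "source_b": {"city_hall": (0.5, 0.5)},                 # ~0.7 m off
}

def estimate_accuracy(source):
    # Mean distance between the source's features and the reference's.
    errors = [
        math.dist(coord, reference[feature])
        for feature, coord in source.items()
        if feature in reference
    ]
    return sum(errors) / len(errors)

def answer_query(feature, max_error_m):
    # Use the most accurate source that satisfies the quality criterion.
    ranked = sorted(sources.items(), key=lambda kv: estimate_accuracy(kv[1]))
    for name, data in ranked:
        if estimate_accuracy(data) <= max_error_m and feature in data:
            return name, data[feature]
    return None

print(answer_query("city_hall", max_error_m=2.0))
```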


International Conference on Multimedia and Expo | 2006

Geodec: Enabling Geospatial Decision Making

Cyrus Shahabi; Yao-Yi Chiang; Kelvin Chung; Kai-Chen Huang; Jeff Khoshgozaran-Haghighi; Craig A. Knoblock; Sung Chun Lee; Ulrich Neumann; Ram Nevatia; Arjun Rihan; Snehal Thakkar; Suya You

The rapid increase in the availability of geospatial data has motivated the effort to seamlessly integrate this information into an information-rich and realistic 3D environment. However, heterogeneous data sources with varying degrees of consistency and accuracy pose a challenge to such efforts. We describe the geospatial decision making (GeoDec) system, which accurately integrates satellite imagery, three-dimensional models, textures and video streams, road data, maps, point data and temporal data. The system also includes a glove-based user interface.


Advances in Geographic Information Systems | 2001

Efficiently querying moving objects with pre-defined paths in a distributed environment

Cyrus Shahabi; Mohammad R. Kolahdouzan; Snehal Thakkar; José Luis Ambite; Craig A. Knoblock

Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to efficiently perform queries over the integration of these data sources. Examples of such data sources are networks of railroad paths and schedules for trains running between cities connected through these networks. A typical query on such data sources is to find all trains that pass through a given point on the network within a given time interval. We show that traditional filter+semi-join plans do not result in efficient query response times on distributed spatio-temporal sources. Hence, we propose a novel spatio-temporal filter, called the deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve selectivity. We also report on our experiments comparing the performance of the alternative query plans and conclude that the plan with the spatio-temporal filter is the most viable and best-performing plan.
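
A toy sketch of a deviation-style filter applied before the remote semi-join; the train routes, schedules, and deviation bound below are invented, and the real filter exploits the path geometry and schedules more carefully:

```python
# Sketch: before shipping tuples to a remote source for a semi-join, drop
# trains whose path never comes near the query point or whose schedule
# misses the query time window.
import math

trains = [
    {"id": "T1", "path": [(0, 0), (10, 0), (20, 0)], "depart": 8.0, "arrive": 10.0},
    {"id": "T2", "path": [(0, 50), (20, 50)],        "depart": 9.0, "arrive": 11.0},
]

def min_distance_to_path(point, path):
    # Coarse spatial deviation: distance to the nearest path vertex.
    return min(math.dist(point, p) for p in path)

def deviation_filter(query_point, t_start, t_end, max_dev):
    return [
        t for t in trains
        if min_distance_to_path(query_point, t["path"]) <= max_dev
        and t["depart"] <= t_end and t["arrive"] >= t_start   # time intervals overlap
    ]

# Trains passing near (12, 1) between 08:30 and 09:30, within 5 units.
print(deviation_filter((12, 1), 8.5, 9.5, max_dev=5.0))
```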


Archive | 2002

Dynamically Composing Web Services from On-line Sources

Snehal Thakkar; Craig A. Knoblock; José Luis Ambite; Cyrus Shahabi


Archive | 2003

A View Integration Approach to Dynamic Composition of Web Services

Snehal Thakkar; Craig A. Knoblock; José Luis Ambite

Collaboration


Dive into Snehal Thakkar's collaborations.

Top Co-Authors

Craig A. Knoblock
University of Southern California

Cyrus Shahabi
University of Southern California

José Luis Ambite
University of Southern California

Ching-Chien Chen
University of Southern California

Martin Michalowski
University of Southern California

Rattapoom Tuchinda
Information Sciences Institute

Ewa Deelman
University of Southern California

Mohammad R. Kolahdouzan
University of Southern California

Yao-Yi Chiang
University of Southern California

Yolanda Gil
University of Southern California