Warren Shen

University of Wisconsin-Madison

Publication

Featured research published by Warren Shen.

International Conference on Data Engineering | 2008

Matching Schemas in Online Communities: A Web 2.0 Approach

Robert Emmett McCann; Warren Shen; AnHai Doan

When integrating data from multiple sources, a key task that online communities often face is to match the schemas of the data sources. Today, such matching often incurs a huge workload that overwhelms the relatively small set of volunteer integrators. In such cases, community members may not even volunteer to be integrators, due to the high workload, and consequently no integration systems can be built. To address this problem, we propose to enlist the multitude of users in the community to help match the schemas, in a Web 2.0 fashion. We discuss the challenges of this approach and provide initial solutions. Finally, we describe an extensive set of experiments on both real-world and synthetic data that demonstrate the utility of the approach.

International Conference on Management of Data | 2009

Information Extraction Challenges in Managing Unstructured Data

AnHai Doan; Jeffrey F. Naughton; Raghu Ramakrishnan; Akanksha Baid; Xiaoyong Chai; Fei Chen; Ting Chen; Eric Chu; Pedro DeRose; Byron J. Gao; Chaitanya Gokhale; Jiansheng Huang; Warren Shen; Ba-Quy Vuong

Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community.

International Conference on Data Engineering | 2005

Integrating Data from Disparate Sources: A Mass Collaboration Approach

Robert McCann; Alexander Kramnik; Warren Shen; Vanitha Varadarajan; Olu Sobulo; AnHai Doan

The rapid growth of distributed data at enterprises and on the WWW has fueled significant interest in building data integration systems. Such a system provides users with a uniform query interface (called a mediated schema) to a multitude of data sources, thus freeing them from manually querying each individual source. In the MOBS (Mass Collaboration to Build Systems) project at the University of Illinois, we develop solutions that learn from the multitude of users in the integration environment to improve the accuracy of integration tools. The improved accuracy can in turn significantly reduce the workload of the system builder. In developing MOBS, we address the following key challenges: (i) obtaining user participation, (ii) learning from user participation, and (iii) combining user answers.

Very Large Data Bases | 2007