Lide Wu | Researchain

Publication

Featured research published by Lide Wu.

empirical methods in natural language processing | 2009

Phrase Dependency Parsing for Opinion Mining

Yuanbin Wu; Qi Zhang; Xuanjing Huang; Lide Wu

In this paper, we present a novel approach for mining opinions from product reviews, which recasts the opinion mining task as identifying product features, expressions of opinions, and the relations between them. Observing that many product features are phrases, we introduce the concept of phrase dependency parsing, which extends traditional dependency parsing to the phrase level. This concept is then used to extract relations between product features and expressions of opinions. Experimental evaluations show that the mining task benefits from phrase dependency parsing.

empirical methods in natural language processing | 2003

A fast algorithm for feature selection in conditional maximum entropy modeling

Yaqian Zhou; Lide Wu; Fuliang Weng; Hauke Schmidt

This paper describes a fast algorithm that selects features for conditional maximum entropy modeling. Berger et al. (1996) present an incremental feature selection (IFS) algorithm that computes the approximate gains for all candidate features at each selection stage, which is very time-consuming for problems with large feature spaces. The new algorithm instead computes the approximate gains only for the top-ranked features, based on the models obtained from previous stages. Experiments on WSJ data in the Penn Treebank show that the new algorithm greatly speeds up the feature selection process while maintaining the same quality of selected features. A variant of the new algorithm with look-ahead functionality is also tested to further confirm the good quality of the selected features. The new algorithm is easy to implement, and given a feature space of size F, it uses only O(F) more space than the original IFS algorithm.
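The idea of re-scoring only the top-ranked candidates can be illustrated with a generic lazy greedy selection loop. The sketch below is not the authors' algorithm: the names `lazy_greedy_select`, `coverage_gain`, and `COVER` are hypothetical, the gain is a toy set-coverage score standing in for the approximate maximum entropy gain, and the loop assumes that a feature's gain never increases as more features are selected.

```python
import heapq

# Toy stand-in for a feature space: each candidate "covers" some items.
# (Hypothetical data; a real system would score log-likelihood gain instead.)
COVER = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
    "e": {7},
}

def coverage_gain(feature, selected):
    """Number of items `feature` covers that `selected` does not already cover."""
    covered = set().union(*(COVER[s] for s in selected))
    return len(COVER[feature] - covered)

def lazy_greedy_select(candidates, gain, k):
    """Pick up to k features, re-scoring only the candidate currently ranked first.

    Each heap entry stores the stage (number of selected features) at which its
    gain was computed. A popped entry whose stage is current is the true best
    candidate, assuming gains never increase as the selected set grows.
    """
    selected = []
    heap = [(-gain(f, selected), i, f, 0) for i, f in enumerate(candidates)]
    heapq.heapify(heap)
    evaluations = len(heap)  # every candidate is scored once up front
    while heap and len(selected) < k:
        neg_gain, i, f, stage = heapq.heappop(heap)
        if stage == len(selected):
            if -neg_gain <= 0:
                break  # no remaining candidate improves the model
            selected.append(f)
        else:
            # Stale score: recompute against the current model and requeue.
            evaluations += 1
            heapq.heappush(heap, (-gain(f, selected), i, f, len(selected)))
    return selected, evaluations

if __name__ == "__main__":
    chosen, n_evals = lazy_greedy_select(list(COVER), coverage_gain, k=3)
    print(chosen, n_evals)
```

The heap holds one entry per candidate, so the extra bookkeeping grows linearly with the number of candidates.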

international conference on multimedia and expo | 2004

Key frame extraction using inter-shot information

Jiawei Rong; Wanjun Jin; Lide Wu

Key frame extraction is one of the basic procedures in video retrieval and summarization. Efficient key frame extraction techniques facilitate video browsing systems, which have wide real-world applications. We establish a new criterion for representative key frames and, correspondingly, create a key frame selection algorithm based on FF-ISF. Its novelty is that it uses not only intra-shot information but also inter-shot information for key frame extraction.

international conference on machine learning | 2009

Sparse higher order conditional random fields for improved sequence labeling

Xian Qian; Xiaoqian Jiang; Qi Zhang; Xuanjing Huang; Lide Wu

In real sequence labeling tasks, the statistics of many higher order features are insufficient because of training data sparseness, so very few of them are useful. We describe Sparse Higher Order Conditional Random Fields (SHO-CRFs), which handle local features and sparse higher order features together using a novel tractable exact inference algorithm. Our main insight is that states and transitions with the same potential functions can be grouped together, and inference is performed on the grouped states and transitions. Though the complexity is not polynomial, SHO-CRFs are still efficient in practice because of the feature sparseness. Experimental results on optical character recognition and Chinese organization name recognition show that, with the same higher order feature set, SHO-CRFs significantly outperform previous approaches.

international joint conference on natural language processing | 2004

BBS based hot topic retrieval using back-propagation neural network

Lan You; Yongping Du; Jiayin Ge; Xuanjing Huang; Lide Wu

A BBS, often referred to as a forum, is a system that offers a large amount of information and where people discuss various topics. Some topics are hot while others are unpopular. It is hard for a person to find the hot topics in this mass of information. In this paper we introduce a system that automatically retrieves hot topics on a BBS. Unlike some topic detection systems, this system not only discovers topics but also judges their hotness. Messages are first clustered into topics based on their lexical similarity. Then a BPNN (Back-Propagation Neural Network) based classification algorithm judges the hotness of each topic according to its popularity, its quality, and its message distribution over time. We conducted experiments on Yahoo! Message Board (Yahoo BBS) and obtained satisfactory results.

international joint conference on natural language processing | 2004

A novel pattern learning method for open domain question answering

Yongping Du; Xuanjing Huang; Xin Li; Lide Wu

Open Domain Question Answering (QA) represents an advanced application of natural language processing. We develop a novel pattern based method for answer extraction in QA. For each type of question, the corresponding answer patterns can be learned from the Web automatically. Given a new question, these answer patterns can be applied to find the answer. Although many other QA systems have used pattern based methods, our method is fully automatic and can handle problems on which other systems failed, and it achieves satisfactory results. Finally, we give a performance analysis of this approach using the TREC-11 question set.

international conference on multimedia and expo | 2004

Boosting image classification scheme

Xipeng Qiu; Zhe Feng; Lide Wu

Image classification is a very active and promising research domain in image retrieval and management. We propose a boosting image classification scheme with automatic selection of discriminative features. First, we present an image feature called the orientational color correlogram (OCC) and apply it to image classification. OCC extends the color correlogram by adding orientational information, which takes into account both the local color correlation and the global context structure of an image. Second, we address feature selection for the very high dimensionality of OCC by using a boosting classification scheme that selects the most discriminative features automatically. In our experiments, only a small number of elements of OCC are selected, which reduces the storage space of classifier models and speeds up the classification process. The experimental results suggest that the proposed method performs favorably.

international acm sigir conference on research and development in information retrieval | 2009

Template-independent wrapper for web forums

Qi Zhang; Yang Shi; Xuanjing Huang; Lide Wu

This paper presents novel work on the task of extracting data from Web forums. Millions of users contribute rich information to Web forums every day, which has made them an important resource for many Web applications, such as product opinion retrieval, social network analysis, and so on. The novelty of the proposed algorithm is that it can not only extract the pure text but also distinguish between the original post and replies. Experimental results on a large number of real Web forums indicate that the proposed algorithm can correctly ex

international world wide web conferences | 2010

Selective recrawling for object-level vertical search

Yaqian Zhou; Mengjing Jiang; Qi Zhang; Xuanjing Huang; Lide Wu

In this paper we propose a novel recrawling method based on navigation patterns, called Selective Recrawling. The goal of selective recrawling is to automatically select page collections that have large coverage and little redundancy for a pre-defined vertical domain. It requires only several seed objects and can select a set of URL patterns that covers most objects. The selected set can be used to recrawl the web pages for a considerable period of time and is renewed periodically. Experiments on local event data show that our method can greatly reduce the downloading of web pages while keeping comparable object coverage.

asia information retrieval symposium | 2008

Graph mutual reinforcement based bootstrapping

Qi Zhang; Yaqian Zhou; Xuanjing Huang; Lide Wu

In this paper, we present a new bootstrapping method based on Graph Mutual Reinforcement (GMR-Bootstrapping) to learn semantic lexicons. The novelties of this work are: 1) we integrate the Graph Mutual Reinforcement method with the bootstrapping structure to sort the candidate words and patterns; 2) pattern uncertainty is defined and used to enhance GMR-Bootstrapping to learn multiple categories simultaneously. Experimental results on the MUC4 corpus show that GMR-Bootstrapping outperforms state-of-the-art algorithms. We also use it to extract the names of automobile manufacturers and models from a Chinese corpus, where it also achieves good results.
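The abstract does not give the update equations, so the following is only a generic illustration of mutual reinforcement on a bipartite pattern-word graph, in the style of HITS: a pattern scores highly if it extracts highly scored words, and a word scores highly if it is extracted by highly scored patterns. The function name `mutual_reinforcement`, the max-normalization, and the toy patterns and words are all assumptions made for this sketch, not details from the paper.

```python
def mutual_reinforcement(edges, seeds, iterations=20):
    """Score words and patterns on a bipartite graph by mutual reinforcement.

    edges: list of (pattern, word) pairs, meaning the pattern extracted the word.
    seeds: set of words known to belong to the target category.
    Returns (word_score, pattern_score) dictionaries with scores in [0, 1].
    """
    words = {w for _, w in edges}
    patterns = {p for p, _ in edges}
    # Seed words start with full score; everything else starts at zero.
    word_score = {w: (1.0 if w in seeds else 0.0) for w in words}
    pattern_score = {p: 0.0 for p in patterns}

    for _ in range(iterations):
        # A pattern accumulates the scores of the words it extracts.
        new_p = {p: 0.0 for p in patterns}
        for p, w in edges:
            new_p[p] += word_score[w]
        norm = max(new_p.values(), default=0.0) or 1.0
        pattern_score = {p: s / norm for p, s in new_p.items()}

        # A word accumulates the scores of the patterns that extract it.
        new_w = {w: 0.0 for w in words}
        for p, w in edges:
            new_w[w] += pattern_score[p]
        norm = max(new_w.values(), default=0.0) or 1.0
        word_score = {w: s / norm for w, s in new_w.items()}

    return word_score, pattern_score

if __name__ == "__main__":
    # Hypothetical extraction results for an "automobile manufacturer" lexicon.
    edges = [
        ("cars made by X", "toyota"),
        ("cars made by X", "honda"),
        ("X released a sedan", "toyota"),
        ("X released a sedan", "ford"),
        ("the weather in X", "london"),
        ("the weather in X", "paris"),
    ]
    word_score, pattern_score = mutual_reinforcement(edges, seeds={"toyota"})
    for word, score in sorted(word_score.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.2f}")
```

In a bootstrapping loop, the top-ranked words from one round would typically be added to the seed set before the next round; that step is omitted here.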
