Lifeng Jia

University of Illinois at Chicago

Publication

Featured research published by Lifeng Jia.

Conference on Information and Knowledge Management | 2008

Improve the effectiveness of the opinion retrieval and opinion polarity classification

Wei Zhang; Lifeng Jia; Clement T. Yu; Weiyi Meng

Opinion retrieval is a document retrieval and ranking process: a relevant document must be relevant to the query and contain opinions toward the query. Opinion polarity classification extends opinion retrieval by classifying each retrieved document as positive, negative, or mixed, according to the overall polarity of the query-relevant opinions in the document. This paper (1) proposes several new techniques that improve the effectiveness of an existing opinion retrieval system, and (2) presents a novel two-stage model for the opinion polarity classification problem. In this model, an SVM classifier first labels each query-relevant opinionated sentence in a document retrieved by our opinion retrieval system as positive or negative. A second classifier then determines the overall opinion polarity of the document. Experimental results show that both the opinion retrieval system with the proposed techniques and the polarity classification model outperformed the best reported systems on their respective tasks.
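The abstract does not describe the features of the second-stage classifier. As a hedged sketch, assume the first stage yields a signed score per query-relevant opinionated sentence (for example, an SVM decision value), and replace the trained second classifier with a simple, hypothetical balance rule over those labels. The names `document_polarity` and `margin` are illustrative, not from the paper.

```python
from typing import Sequence


def document_polarity(sentence_scores: Sequence[float], margin: float = 0.2) -> str:
    """Second-stage sketch: turn sentence-level polarity into a document label.

    sentence_scores: signed scores for one document's query-relevant opinionated
    sentences (> 0 positive, < 0 negative), e.g. decision values that a
    sentence-level SVM might output. `margin` is a hypothetical threshold that
    stands in for the paper's trained second classifier.
    """
    pos = sum(1 for s in sentence_scores if s > 0)
    neg = sum(1 for s in sentence_scores if s < 0)
    if pos + neg == 0:
        raise ValueError("document has no polar query-relevant sentences")
    balance = (pos - neg) / (pos + neg)  # ranges over [-1, 1]
    if balance > margin:
        return "positive"
    if balance < -margin:
        return "negative"
    return "mixed"
```

For example, `document_polarity([0.9, 0.4, -0.1])` returns `"positive"`, while a document with one clearly positive and one clearly negative sentence comes out `"mixed"`. A learned second stage could use richer features (score magnitudes, sentence positions), which this rule ignores.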

Health Information Science | 2014

Design and implementation of Metta, a metasearch engine for biomedical literature retrieval intended for systematic reviewers

Neil R. Smalheiser; Can Lin; Lifeng Jia; Yu Jiang; Aaron M. Cohen; Clement T. Yu; John M. Davis; Clive E Adams; Marian McDonagh; Weiyi Meng

Background: Individuals and groups who write systematic reviews and meta-analyses in evidence-based medicine regularly carry out literature searches across multiple search engines linked to different bibliographic databases, and thus have an urgent need for a suitable metasearch engine to save time spent on repeated searches and to remove duplicate publications from initial consideration. Unlike general users, who typically search for a few highly relevant (or highly recent) articles, systematic reviewers seek a comprehensive set of articles on a given topic that satisfy specific criteria. This creates special requirements and challenges for metasearch engine design and implementation.

Methods: We created a federated search tool connected to five databases: PubMed, EMBASE, CINAHL, PsycINFO, and the Cochrane Central Register of Controlled Trials. Retrieved bibliographic records were displayed online; optionally, results could be de-duplicated and exported in BibTeX and XML formats.

Results: The query interface was extensively modified in response to feedback from users within our team. Besides a general search track and one focused on human-related articles, we also added search tracks optimized to identify case reports and systematic reviews. Although users could modify preset search options, they rarely if ever did so in practice. Up to several thousand retrieved records could be exported within a few minutes. De-duplication of records returned from multiple databases was carried out in a prioritized fashion that favored retaining citations returned from PubMed.

Conclusions: Systematic reviewers are used to formulating complex queries using strategies and search tags that are specific to individual databases. Metta offers a different approach that may save substantial time but requires modifying current search strategies and better indexing of randomized controlled trial articles. We envision Metta as one piece of a multi-tool pipeline that will assist systematic reviewers in retrieving, filtering, and assessing publications. As such, Metta may find wide utility for anyone carrying out a comprehensive search of the biomedical literature.
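The prioritized de-duplication described in the Results can be sketched as follows. Only the rule "prefer the PubMed copy" comes from the abstract; the ranking of the other databases and the match rule (case- and punctuation-insensitive title plus year) are hypothetical choices, and real bibliographic matching would also compare authors, DOIs, and other identifiers.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Only "PubMed first" is from the abstract; the rest of the order is hypothetical.
SOURCE_PRIORITY = {"PubMed": 0, "EMBASE": 1, "CINAHL": 2, "PsycINFO": 3, "CENTRAL": 4}


@dataclass(frozen=True)
class Record:
    source: str
    title: str
    year: int


def dedup_key(rec: Record):
    """Hypothetical match rule: normalized title (letters and digits only) plus year."""
    return (re.sub(r"[^a-z0-9]", "", rec.title.lower()), rec.year)


def priority(rec: Record) -> int:
    # Unknown sources rank after every listed database.
    return SOURCE_PRIORITY.get(rec.source, len(SOURCE_PRIORITY))


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Keep one record per key, preferring the highest-priority source."""
    kept = {}
    for rec in records:
        key = dedup_key(rec)
        if key not in kept or priority(rec) < priority(kept[key]):
            kept[key] = rec
    return list(kept.values())
```

Because the key is computed per record, the whole pass is a single linear scan; swapping in a stricter or fuzzier `dedup_key` changes which records are treated as duplicates without touching the priority logic.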

Applied Mathematics and Computation | 2007

RFIMiner: A regression-based algorithm for recently frequent patterns in multiple time granularity data streams

Lifeng Jia; Zhe Wang; Nan Lu; Xiujuan Xu; Dongbin Zhou; Yan Wang

In this paper, we propose an algorithm for computing and maintaining recently frequent patterns, a summary that is more stable and smaller than the data stream itself, and for dynamically updating them as new transactions arrive. The study makes two main contributions. First, a regression-based data stream model is proposed to differentiate new transactions from old ones. The model maps transactions onto multiple time granularities and automatically adjusts the transactional fading rate through a fading factor, which defines the desired lifetime of transaction information in the data stream. Second, we develop RFIMiner, a single-scan algorithm for mining recently frequent patterns from data streams. The algorithm exploits a special property of suffix-trees, so the suffix-trees need not be traversed when patterns are discovered. To suit the suffix-trees, we also adopt a new method called Depth-first and Bottom-up Inside Itemset Growth to derive further recently frequent patterns from known frequent ones; the method also avoids redundant computation and the generation of candidate patterns. We conduct detailed experiments to evaluate the performance of the algorithm in several respects. The results confirm that the new method scales well and mines recently frequent itemsets in data streams with good quality and efficiency.
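The fading-factor idea can be illustrated with a generic exponential-decay counter. This is not the paper's regression-based multi-granularity model: it assumes one time step per transaction, a hypothetical rule that ties the factor to a desired lifetime (weight falls to `epsilon` after `lifetime` arrivals), and a fixed list of tracked itemsets rather than RFIMiner's candidate-free suffix-tree mining.

```python
from typing import Hashable, Iterable, List


def fading_factor(lifetime: int, epsilon: float = 0.01) -> float:
    """Hypothetical rule: choose f so that f ** lifetime == epsilon."""
    if lifetime <= 0 or not 0.0 < epsilon < 1.0:
        raise ValueError("need lifetime > 0 and 0 < epsilon < 1")
    return epsilon ** (1.0 / lifetime)


class DecayedSupport:
    """Single-scan, time-faded support for a fixed set of itemsets."""

    def __init__(self, itemsets: Iterable[Iterable[Hashable]], factor: float):
        self.factor = factor
        self.total = 0.0  # faded count of all transactions seen so far
        self.weighted = {frozenset(s): 0.0 for s in itemsets}

    def update(self, transaction: Iterable[Hashable]) -> None:
        items = set(transaction)
        # Age every existing count by one step, then add the new transaction
        # with weight 1, so older transactions contribute factor ** age.
        self.total = self.factor * self.total + 1.0
        for itemset in self.weighted:
            hit = 1.0 if itemset <= items else 0.0
            self.weighted[itemset] = self.factor * self.weighted[itemset] + hit

    def support(self, itemset: Iterable[Hashable]) -> float:
        if self.total == 0.0:
            return 0.0
        return self.weighted[frozenset(itemset)] / self.total

    def recently_frequent(self, min_support: float) -> List[frozenset]:
        return [s for s in self.weighted if self.support(s) >= min_support]
```

With `factor=0.5` and the stream `{a}, {a}, {b}, {b}`, the two older `{a}` transactions carry weights 0.125 and 0.25 against 0.5 and 1 for `{b}`, so `{b}` reaches a faded support of 0.8 although both itemsets have a plain support of 0.5.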

ACM Transactions on Information Systems | 2013

The Impacts of Structural Difference and Temporality of Tweets on Retrieval Effectiveness

Lifeng Jia; Clement T. Yu; Weiyi Meng

To explore information-seeking behaviors in the microblogosphere, the Microblog track at TREC 2011 introduced a real-time ad hoc retrieval task that aims at ranking relevant tweets in reverse chronological order. We study this problem via a two-phase approach: 1) retrieving tweets in an ad hoc way; 2) utilizing the temporal information of tweets to enhance retrieval effectiveness. Tweets can be categorized into two types. One type consists of short messages that do not contain any URL of a Web page. The other type has at least one Web page URL in addition to a short message. These two types of tweets have different structures. In the first phase, to address the structural difference of tweets, we propose a method to rank tweets using a divide-and-conquer strategy. Specifically, we first rank the two types of tweets separately, which produces one ranking per type, and then merge these two rankings into one. In the second phase, we first categorize queries into several types by exploring the temporal distributions of their top-retrieved tweets from the first phase; then we calculate the time-related relevance scores of tweets according to the classified query types; finally, we combine the time scores with the IR scores from the first phase to produce a ranking of tweets. Experimental results on the TREC 2011 and TREC 2012 queries over the TREC Tweets2011 collection show that: (i) ranking the two types of tweets separately and then merging them yields better retrieval effectiveness than ranking them together; (ii) incorporating temporal information into the retrieval process yields further improvements; and (iii) our method compares favorably with state-of-the-art methods in retrieval effectiveness.
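The abstract does not specify how the two per-type rankings are merged or how time scores are computed for each query type. The sketch below assumes min-max normalization of each ranking's scores before merging and, for a recency-oriented query type only, an exponential recency score mixed linearly with the IR score. The parameters `url_weight`, `half_life`, and `alpha` are hypothetical.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one ranking's scores to [0, 1] so separate rankings are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {tid: 1.0 for tid in scores}
    return {tid: (s - lo) / (hi - lo) for tid, s in scores.items()}


def merge_rankings(text_only: Dict[str, float], with_url: Dict[str, float],
                   url_weight: float = 1.0) -> Dict[str, float]:
    """Phase 1: merge tweets ranked separately by type (no URL vs. with URL)."""
    merged = dict(min_max(text_only))
    for tid, s in min_max(with_url).items():
        merged[tid] = url_weight * s
    return merged


def rerank_with_time(ir_scores: Dict[str, float], timestamps: Dict[str, float],
                     query_time: float, half_life: float,
                     alpha: float = 0.8) -> List[str]:
    """Phase 2 for a recency-type query: mix IR score with a recency score."""
    final = {}
    for tid, s in ir_scores.items():
        age = query_time - timestamps[tid]
        time_score = 0.5 ** (age / half_life)  # 1.0 at query time, halves every half_life
        final[tid] = alpha * s + (1.0 - alpha) * time_score
    return sorted(final, key=final.get, reverse=True)
```

Normalizing each ranking separately keeps one tweet type from dominating merely because its retrieval scores live on a larger scale; other query types in the paper's scheme would need a different `time_score`.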

Granular Computing | 2006

An agent-based dual-tier algorithm for clustering data streams

Dongbin Zhou; Lifeng Jia; Zhe Wang; Xiujuan Xu; Chunguang Zhou

Characteristics of data streams make it difficult for clustering algorithms to satisfy requirements on both efficiency and effectiveness. This paper proposes a data stream clustering algorithm with a dual-tier structure that employs agents. In the on-line process, a set of agents working simultaneously collects similar data points into sub-clusters by applying a heuristic strategy. In the off-line process, summary information from the on-line component is further analyzed to obtain the final clusters. The algorithm also supports time-window queries on streams. Empirical evidence shows that this method can obtain high-quality clusters with low time complexity.

... analysis over an arbitrary period of the stream, etc. For stream clustering, a common method is to divide the streaming data into chunks so that algorithms for static data sets can be applied to each subset separately (2). In recent years, stream algorithms have developed into a two-phase structure (3), (4). A dual framework usually includes two parts: the on-line component and the off-line component. The former is responsible for fast but rough processing of the streaming data and saves summary information to meet the one-pass restriction, while the latter uses that information to conduct high-level analysis. At present, stream algorithms still face several problems, for example: sensitivity to the initial data points; poor cluster quality due to the loss of global information caused by dividing the stream; and high time complexity. This paper proposes AGCluStream, a novel dual-tier clustering algorithm for data streams. The on-line algorithm uses agents to make similar points denser in local areas and records the temporary distribution of the data according to the pyramidal time frame (3). The off-line algorithm uses these records to conduct time-window analysis and higher-level clustering analysis. AGCluStream does not divide the stream, and it adopts an incomplete-partition strategy to maintain global information more effectively.
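Time-window queries of the kind described above are commonly supported by storing additive summaries at snapshot times. This sketch assumes CluStream-style cluster feature vectors (point count, per-dimension linear sum, and per-dimension squared sum); the abstract does not say AGCluStream uses exactly this structure, and the agents' heuristic is not modeled. Because the vectors are additive, subtracting a snapshot taken at time t - h from the current summary describes only the points that arrived in the window (t - h, t].

```python
import math
from typing import List, Sequence


class ClusterFeature:
    """Additive summary (count, linear sum, squared sum) of a group of points."""

    def __init__(self, dim: int):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension sum of coordinates
        self.ss = [0.0] * dim  # per-dimension sum of squared coordinates

    def absorb(self, point: Sequence[float]) -> None:
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def snapshot(self) -> "ClusterFeature":
        copy = ClusterFeature(len(self.ls))
        copy.n, copy.ls, copy.ss = self.n, list(self.ls), list(self.ss)
        return copy

    def minus(self, older: "ClusterFeature") -> "ClusterFeature":
        """Summary of only the points absorbed after `older` was taken."""
        diff = ClusterFeature(len(self.ls))
        diff.n = self.n - older.n
        diff.ls = [a - b for a, b in zip(self.ls, older.ls)]
        diff.ss = [a - b for a, b in zip(self.ss, older.ss)]
        return diff

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the summarized points from the centroid."""
        var = sum(q / self.n - (s / self.n) ** 2 for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error
```

Absorbing (1, 1) and (3, 3), taking a snapshot, then absorbing (5, 5) and (7, 7) gives a window summary with centroid (6, 6) and radius sqrt(2), computed without revisiting any raw point.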

Fuzzy Systems and Knowledge Discovery | 2005

SuffixMiner: efficiently mining frequent itemsets in data streams by suffix-forest

Lifeng Jia; Chunguang Zhou; Zhe Wang; Xiujuan Xu

We propose a new algorithm, SuffixMiner, which eliminates the need for multiple passes over the data when finding all frequent itemsets in data streams. It takes full advantage of a special property of suffix-trees to avoid generating candidate itemsets and traversing each suffix-tree during itemset growth, and it uses a new itemset growth method to mine all frequent itemsets in data streams. Experimental results show that SuffixMiner not only scales well when mining frequent itemsets over data streams but also outperforms the Apriori and FP-Growth algorithms.
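The abstract does not spell out the suffix-tree property, so the following is one plausible reading, offered as an assumption. If every suffix of each item-sorted transaction is inserted into a tree keyed by its first item, each transaction containing item x contributes exactly one path to the tree rooted at x. The support of an itemset whose smallest item is x can therefore be read from that single tree, in one pass over the data to build it. The paper's own itemset-growth method is not reproduced here, and all names are hypothetical.

```python
from typing import Dict, Iterable, List


class Node:
    __slots__ = ("item", "count", "children")

    def __init__(self, item: str):
        self.item = item
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_suffix_forest(transactions: Iterable[Iterable[str]]) -> Dict[str, Node]:
    """Single pass: insert every suffix of each sorted transaction."""
    forest: Dict[str, Node] = {}
    for t in transactions:
        items = sorted(set(t))
        for i, first in enumerate(items):
            node = forest.setdefault(first, Node(first))
            node.count += 1
            for item in items[i + 1:]:
                node = node.children.setdefault(item, Node(item))
                node.count += 1
    return forest


def _count(node: Node, items: List[str], idx: int) -> int:
    """Sum counts of paths below `node` that contain items[idx:] in order."""
    if node.item == items[idx]:
        idx += 1
        if idx == len(items):
            return node.count  # every transaction through here contains the itemset
    elif node.item > items[idx]:
        return 0  # paths are sorted, so the needed item cannot appear deeper
    return sum(_count(child, items, idx) for child in node.children.values())


def support(forest: Dict[str, Node], itemset: Iterable[str]) -> int:
    """Support of an itemset, using only the tree rooted at its smallest item."""
    items = sorted(set(itemset))
    root = forest.get(items[0])
    if root is None:
        return 0
    if len(items) == 1:
        return root.count
    return sum(_count(child, items, 1) for child in root.children.values())
```

For the transactions `abc`, `ac`, `bc`, `ab`, the tree rooted at `a` alone yields support 2 for `{a, c}` and 1 for `{a, b, c}`, which matches a direct count over the four transactions.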

Advanced Data Mining and Applications | 2005

Mining recent frequent itemsets in data streams by radioactively attenuating strategy

Lifeng Jia; Zhe Wang; Chunguang Zhou; Xiujuan Xu

We propose a novel approach for mining recent frequent itemsets, with three key contributions. First, it is a single-scan algorithm that exploits a special property of suffix-trees, the data structure that stores the summary information of the data, to guarantee that all frequent itemsets are mined; during itemset growth, the suffix-trees need not be traversed. Second, the algorithm adopts a novel itemset growth method with two special kinds of itemset growth operations that avoid generating any candidate itemsets. Third, we devise a new regressive strategy, modeled on the attenuation of radioactive elements in nature, and apply it in the algorithm to distinguish the influence of the latest transactions from that of obsolete transactions. Detailed experiments confirm that the new method scales well and achieves good quality and efficiency.
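The abstract does not give the attenuation formula. Assuming the standard radioactive decay law with a chosen half-life $T_{1/2}$ (a hypothetical parameter), a transaction $T_i$ that arrived at time $t_i$ would carry weight $w_i(t)$ at the current time $t$, and a recency-weighted support of itemset $X$ would follow as below. With unit time steps, $e^{-\lambda}$ plays the role of a per-step fading factor.

```latex
\begin{aligned}
N(t) &= N_0\, e^{-\lambda t}, \qquad \lambda = \frac{\ln 2}{T_{1/2}} \\
w_i(t) &= e^{-\lambda (t - t_i)} = 2^{-(t - t_i)/T_{1/2}} \\
\operatorname{sup}_t(X) &= \frac{\sum_{i \,:\, X \subseteq T_i} w_i(t)}{\sum_{i} w_i(t)}
\end{aligned}
```

A transaction one half-life old thus counts half as much as one arriving now, and one two half-lives old counts a quarter as much.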

Lecture Notes in Computer Science | 2005

DualRank: a dual-phase algorithm for optimal profit mining in retailing market

Xiujuan Xu; Lifeng Jia; Zhe Wang; Chunguang Zhou

We propose a dual-phase algorithm, DualRank, to mine the optimal profit in the retailing market. DualRank has two major phases: a general profit mining phase and a profit optimization phase. In the first phase, a novel sub-algorithm, ItemRank, integrates the random distribution of items into profit mining to improve the quality of the item order. In the second phase, two novel optimization sub-algorithms refine the results generated in the first phase. By accounting for the cross-selling effect and the self-profit of items, DualRank can solve the item-order problem objectively and mechanically. Detailed experiments confirm that the method is effective for profit mining and achieves good quality and efficiency.
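The abstract does not define ItemRank. Purely as an illustration, and as an assumption suggested by the name, the sketch runs a PageRank-style power iteration over a hypothetical cross-selling graph, where each edge weight is the number of baskets in which two items co-occur and the random-jump distribution is proportional to each item's self-profit. The function name, inputs, and `damping` value are illustrative, not from the paper.

```python
from typing import Dict, Tuple


def item_rank(co_purchase: Dict[Tuple[str, str], float], profit: Dict[str, float],
              damping: float = 0.85, max_iter: int = 200,
              tol: float = 1e-10) -> Dict[str, float]:
    """PageRank-style scores on an undirected co-purchase graph (hypothetical)."""
    items = sorted(profit)
    total_profit = sum(profit.values())
    jump = {i: profit[i] / total_profit for i in items}  # profit-biased random jump

    nbrs: Dict[str, Dict[str, float]] = {i: {} for i in items}
    for (a, b), w in co_purchase.items():
        nbrs[a][b] = nbrs[a].get(b, 0.0) + w
        nbrs[b][a] = nbrs[b].get(a, 0.0) + w

    rank = dict(jump)
    for _ in range(max_iter):
        new = {i: (1.0 - damping) * jump[i] for i in items}
        for i in items:
            out = sum(nbrs[i].values())
            if out == 0.0:
                # Item never co-purchased: spread its mass via the random jump.
                for j in items:
                    new[j] += damping * rank[i] * jump[j]
            else:
                for j, w in nbrs[i].items():
                    new[j] += damping * rank[i] * w / out
        delta = sum(abs(new[i] - rank[i]) for i in items)
        rank = new
        if delta < tol:
            break
    return rank
```

With co-purchase counts A-B = 2 and B-C = 1 and equal profits, B, the item linked to both others, ranks first, followed by A and then C; raising an item's profit shifts more of the random-jump mass toward it.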

Conference on Information and Knowledge Management | 2009