Yanan Qian | Researchain

Publication

Featured research published by Yanan Qian.

conference on information and knowledge management | 2011

Combining machine learning and human judgment in author disambiguation

Yanan Qian; Yunhua Hu; Jianling Cui; Qinghua Zheng; Zaiqing Nie

Author disambiguation in digital libraries becomes increasingly difficult as the number of publications, and consequently the number of ambiguous author names, keeps growing. Fully automatic author disambiguation approaches cannot give satisfactory results in many cases because too few signals are available. Furthermore, applying human judgment on top of automatic algorithms is also unsuitable, because the automatically disambiguated results are often mixed and hard for humans to understand. In this paper, we propose a Labeling Oriented Author Disambiguation approach, called LOAD, that combines machine learning and human judgment in author disambiguation. LOAD exploits a framework that consists of high precision clustering, high recall clustering, and top dissimilar clusters selection and ranking. In the framework, supervised learning algorithms are used to train the similarity functions between publications, and a clustering algorithm is then applied to generate clusters. To validate the effectiveness and efficiency of the proposed LOAD approach, comprehensive experiments are conducted. Compared with conventional author disambiguation algorithms, LOAD yields much more accurate results to assist human labeling. Further experiments show that the LOAD approach can save labeling time dramatically.
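The two clustering levels described above can be illustrated with a minimal Python sketch. It is not the LOAD implementation: the Jaccard similarity over coauthor sets stands in for the learned similarity functions, the thresholds `strict` and `loose` are hypothetical, and the ranking step is left out. The sketch only surfaces pairs of high-precision clusters that fall inside the same high-recall cluster, which are the natural questions to put to a human labeler.

```python
from itertools import combinations


def jaccard(a, b):
    """Toy similarity between two coauthor sets (assumed stand-in for a learned function)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_clusters(records, threshold):
    """Single-link clustering: join any two records whose similarity >= threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def labeling_candidates(records, strict=0.5, loose=0.2):
    """Return (high-precision clusters, high-recall clusters, candidate pairs).

    A candidate pair is two high-precision clusters that share one
    high-recall cluster, so a human could decide whether to merge them.
    """
    precise = threshold_clusters(records, strict)
    broad = threshold_clusters(records, loose)
    broad_of = {i: n for n, group in enumerate(broad) for i in group}
    pairs = [
        (a, b)
        for a, b in combinations(precise, 2)
        if broad_of[a[0]] == broad_of[b[0]]
    ]
    return precise, broad, pairs
```

Because the strict threshold is at least as high as the loose one, every high-precision cluster sits wholly inside one high-recall cluster, so checking a single member of each is enough.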

very large data bases | 2011

Mining learning-dependency between knowledge units from text

Jun Liu; Lu Jiang; Zhaohui Wu; Qinghua Zheng; Yanan Qian

Identifying learning-dependencies among knowledge units (KU) is a preliminary requirement of navigation learning. Methods based on link mining cannot discover such dependencies among knowledge units that are arranged linearly in text. In this paper, we propose a method for mining the learning-dependencies among KU from text documents. The method is based on two features that we found and studied in the KU and the learning-dependencies among them: the distributional asymmetry of domain terms and the local nature of the learning-dependency. Our method consists of three stages: (1) build document association relationships by calculating the distributional asymmetry of the domain terms; (2) generate the candidate KU-pairs by measuring the locality of the dependencies; (3) use a classification algorithm to identify the learning-dependency between KU-pairs. Our experimental results show that our method extracts learning-dependencies efficiently and reduces the computational complexity.

Information Retrieval | 2015

Dynamic author name disambiguation for growing digital libraries

Yanan Qian; Qinghua Zheng; Tetsuya Sakai; Junting Ye; Jun Liu

When a digital library user searches for publications by an author name, she often sees a mixture of publications by different authors who have the same name. With the growth of digital libraries and involvement of more authors, this author ambiguity problem is becoming critical. Author disambiguation (AD) often tries to solve this problem by leveraging metadata such as coauthors, research topics, publication venues and citation information, since more personal information such as the contact details is often restricted or missing. In this paper, we study the problem of how to efficiently disambiguate author names given an incessant stream of published papers. To this end, we propose a “BatchAD+IncAD” framework for dynamic author disambiguation. First, we perform batch author disambiguation (BatchAD) to disambiguate all author names at a given time by grouping all records (each record refers to a paper with one of its author names) into disjoint clusters. This establishes a one-to-one mapping between the clusters and real-world authors. Then, for newly added papers, we periodically perform incremental author disambiguation (IncAD), which determines whether each new record can be assigned to an existing cluster, or to a new cluster not yet included in the previous data. Based on the new data, IncAD also tries to correct previous AD results. Our main contributions are: (1) We demonstrate with real data that a small number of new papers often have overlapping author names with a large portion of existing papers, so it is challenging for IncAD to effectively leverage previous AD results. (2) We propose a novel IncAD model which aggregates metadata from a cluster of records to estimate the author’s profile such as her coauthor distributions and keyword distributions, in order to predict how likely it is that a new record is “produced” by the author. (3) Using two labeled datasets and one large-scale raw dataset, we show that the proposed method is much more efficient than state-of-the-art methods while ensuring high accuracy.
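The incremental step can be pictured with a small Python sketch. The abstract does not specify the IncAD model, so everything here is an assumption for illustration: the `AuthorProfile` class, the add-one smoothed log-likelihood score, the `vocab_size` constant, and the `threshold` are all hypothetical. The sketch covers only the assignment of a new record to an existing or a new cluster, not the correction of earlier results.

```python
import math
from collections import Counter


class AuthorProfile:
    """Aggregated coauthor and keyword counts for one cluster of records."""

    def __init__(self):
        self.coauthors = Counter()
        self.keywords = Counter()

    def add(self, record):
        """Fold one record's metadata into the profile."""
        self.coauthors.update(record["coauthors"])
        self.keywords.update(record["keywords"])

    def log_likelihood(self, record, vocab_size=50, alpha=1.0):
        """Average smoothed log-probability of the record's features.

        vocab_size and alpha are assumed smoothing constants.
        """
        total, n = 0.0, 0
        for counts, feats in (
            (self.coauthors, record["coauthors"]),
            (self.keywords, record["keywords"]),
        ):
            denom = sum(counts.values()) + alpha * vocab_size
            for f in feats:
                total += math.log((counts[f] + alpha) / denom)
                n += 1
        return total / n if n else float("-inf")


def assign(record, profiles, threshold):
    """Assign the record to the best-scoring profile, or open a new one.

    Returns the index of the profile that received the record.
    """
    best, best_score = None, float("-inf")
    for idx, profile in enumerate(profiles):
        score = profile.log_likelihood(record)
        if score > best_score:
            best, best_score = idx, score
    if best is None or best_score < threshold:
        profiles.append(AuthorProfile())
        best = len(profiles) - 1
    profiles[best].add(record)
    return best
```

Scoring against aggregated profiles rather than against every earlier record keeps the cost of each new record proportional to the number of candidate clusters, which is one plausible reason for the efficiency the abstract reports.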

conference on information and knowledge management | 2013

Dynamic query intent mining from a search log stream

Yanan Qian; Tetsuya Sakai; Junting Ye; Qinghua Zheng; Cong Li

It has long been recognized that search queries are often broad and ambiguous. Even when submitting the same query, different users may have different search intents. Moreover, the intents evolve dynamically: some are constantly popular with users, while others are more bursty. We propose a method for mining dynamic query intents from search query logs. By regarding the query logs as a data stream, we identify constant intents while quickly capturing new bursty intents. To evaluate the accuracy and efficiency of our method, we conducted experiments using 50 topics from the NTCIR INTENT-9 data and five additional popular topics, all supplemented with six-month query logs from a commercial search engine. Our results show that our method can accurately capture new intents with a short response time.
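The distinction between constant and bursty intents over a stream can be sketched in Python as follows. The abstract gives no algorithmic detail, so this is only one possible windowed heuristic: the `IntentStream` class, the `history`, `burst_ratio`, and `min_count` parameters, and the assumption that each query already carries an intent label are all hypothetical.

```python
from collections import Counter, deque


class IntentStream:
    """Track intent counts over a sliding history of time windows."""

    def __init__(self, history=3, burst_ratio=3.0, min_count=5):
        self.windows = deque(maxlen=history)  # most recent `history` windows
        self.burst_ratio = burst_ratio
        self.min_count = min_count

    def push(self, intents):
        """Consume one window of intent labels; return (constant, bursty) sets."""
        current = Counter(intents)
        constant, bursty = set(), set()
        full = len(self.windows) == self.windows.maxlen
        for intent, count in current.items():
            if count < self.min_count:
                continue  # too rare in this window to classify
            past = [w[intent] for w in self.windows]
            if not past:
                continue  # no history yet, nothing to compare against
            if full and min(past) >= self.min_count:
                # Frequent in every remembered window and in this one.
                constant.add(intent)
            elif count >= self.burst_ratio * max(sum(past) / len(past), 1.0):
                # Sudden jump relative to the historical average.
                bursty.add(intent)
        self.windows.append(current)
        return constant, bursty
```

Each call to `push` touches only the intents seen in the current window and a fixed number of earlier windows, so the per-window cost does not grow with the length of the log.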

international conference on e-business engineering | 2010

Yotta: A Knowledge Map Centric E-Learning System

Qinghua Zheng; Yanan Qian; Jun Liu

Current e-learning systems are primarily resource oriented, rather than cognition oriented. To reduce learners’ cognitive overload in e-learning, we propose Yotta, a novel e-learning system that tackles the problems of knowledge acquisition, knowledge presentation, and knowledge resource management. The granularity for knowledge acquisition in Yotta is based on knowledge units, which are the smallest integral learning objects in a specific domain. Knowledge presentation and navigation in Yotta are based on knowledge maps, which can exhibit knowledge and the intrinsic knowledge relations at both the concept level and the knowledge unit level. Automatic knowledge unit extraction and knowledge map construction techniques are also discussed in this paper. Yotta allows for huge concurrent access and enormous resource storage, as it is deployed on a cloud platform with the Hadoop distributed file system. The Yotta demo has already been implemented on our campus network, and has been approved by hundreds of students. Yotta offers new ideas for building e-learning systems, and its core techniques still require further study.

computer supported cooperative work in design | 2009

ETM Toolkit: A development tool based on Extended Topic Map

Lu Jiang; Jun Liu; Zhaohui Wu; Qinghua Zheng; Yanan Qian

Building on research into the Topic Map standard, the Extended Topic Map (ETM) is proposed as a novel model for organizing and managing massive knowledge resources in e-learning. Based on the model, an Extended Topic Map Toolkit is designed and implemented, which supports operations such as exploration, search, and consistency checking. The ETM Toolkit not only provides learners with visual navigation and search over massive e-learning resources, but also offers instructors an efficient way to build shareable and reusable domain knowledge. Using the ETM Toolkit, an extended topic map of a certain scale on Computer Networks has been built and is currently available to students in our university.

acm symposium on applied computing | 2010

Mining preorder relation between knowledge units from text

Jun Liu; Lu Jiang; Zhaohui Wu; Qinghua Zheng; Yanan Qian

The preorder relation between Knowledge Units (KU) is the precondition for navigation learning. Although existing link mining methods are possible solutions, they cannot mine preorder relations between knowledge units that are linearly arranged in text. Through the analysis of sample data, we discovered and studied two characteristics of knowledge units: the locality of the preorder relation and the distribution asymmetry of domain terms. Based on these two characteristics, a method is presented for mining preorder relations between knowledge units from text documents, which proceeds in three stages. First, the associations between text documents are established according to the distribution asymmetry of domain terms. Second, candidate KU-pairs are generated according to the locality of the preorder relation. Finally, the preorder relations between KU-pairs are identified by using classification methods. The experimental results show that the method can efficiently extract the preorder relation and reduce the computational complexity caused by the quadratic problem of link mining.
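The effect of locality on the number of candidate KU-pairs can be shown with a short Python sketch. It assumes one simple reading of "locality", namely that a unit only depends on units a few positions earlier in reading order; the abstract does not define the measure, and the `candidate_pairs` function and its `window` parameter are hypothetical. The asymmetry and classification stages are not covered.

```python
def candidate_pairs(units, window=2):
    """Pair each knowledge unit with the next `window` units in reading order.

    Considering all pairs costs O(n^2); a fixed window keeps it at O(n * window).
    """
    pairs = []
    for i, earlier in enumerate(units):
        for later in units[i + 1 : i + 1 + window]:
            pairs.append((earlier, later))
    return pairs
```

With four units and `window=1`, only the three adjacent pairs are produced instead of all six, and the gap widens as the number of units grows.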

asia information retrieval symposium | 2012

PLIDMiner: A Quality Based Approach for Researcher’s Homepage Discovery

Junting Ye; Yanan Qian; Qinghua Zheng

Researchers’ high quality homepages are important resources in academic search because they provide comprehensive and up-to-date information about researchers. Meanwhile, low quality homepages are widespread. A case study shows that 57.8% of all homepages retrieved among the top 10 results from Google are low quality and that 95% of top researchers own out-of-date homepages. In addition, some academic portals generate dynamic homepages introducing researchers. These homepages are not maintained by the researchers and may contain incorrect information. Existing work cannot ensure the quality of discovered homepages, which decreases the efficiency of academic search. It is difficult to define a high quality homepage from a quantitative perspective. Instead, on the basis of analyzing labeled high quality homepages, we propose the “informative researcher’s homepage”, which at least consists of identifiable information (introducing a researcher’s basic information) and a publication list (listing his/her corresponding publications), as an estimate of a high quality homepage. Based on the observation that informative researchers’ homepages are organized in two ways, integrated and scattered, we propose an effective discovery model, PLIDMiner, with F1 scores over 0.9 on labeled data. Our model can also be applied to verify homepages’ quality. We crawl thousands of homepage resources from popular academic portals and assess their overall quality. It turns out that nearly 25% of homepage resources in these portals are not informative, which strengthens our motivation.

international acm sigir conference on research and development in information retrieval | 2012