Leilei Kong
Harbin Engineering University
Publication
Featured research published by Leilei Kong.
Journal of Zhejiang University Science C | 2017
Leilei Kong; Zhimao Lu; Haoliang Qi; Zhongyuan Han
Plagiarism source retrieval is the core task of plagiarism detection. It has become the standard for plagiarism detection to use the queries extracted from suspicious documents to retrieve the plagiarism sources. Generating queries from a suspicious document is one of the most important steps in plagiarism source retrieval. Heuristic-based query generation methods are widely used in the current research. Each heuristic-based method has its own advantages, and no one statistically outperforms the others on all suspicious document segments when generating queries for source retrieval. Further improvements on heuristic methods for source retrieval rely mainly on the experience of experts. This leads to difficulties in putting forward new heuristic methods that can overcome the shortcomings of the existing ones. This paper paves the way for a new statistical machine learning approach to select the best queries from the candidates. The statistical machine learning approach to query generation for source retrieval is formulated as a ranking framework. Specifically, it aims to achieve the optimal source retrieval performance for each suspicious document segment. The proposed method exploits learning to rank to generate queries from the candidates. To our knowledge, our work is the first research to apply machine learning methods to resolve the problem of query generation for source retrieval. To solve the essential problem of an absence of training data for learning to rank, the building of training samples for source retrieval is also conducted. We rigorously evaluate various aspects of the proposed method on the publicly available PAN source retrieval corpus. With respect to the established baselines, the experimental results show that applying our proposed query generation method based on machine learning yields statistically significant improvements over baselines in source retrieval effectiveness.
International Conference for Young Computer Scientists | 2016
Zhongyuan Han; Wenhao Qiao; Shuo Cui; Leilei Kong
This demo presents a time-based microblog research system that uses the time profile to estimate the query model, the document model, and the ranking function for microblog search. The system exploits the time profile to boost the performance of microblog search. The time-based query model, time-based document model, and time-based similarity score are briefly introduced, and the indexing strategy for temporal microblog search is described. Examples of microblog search results are demonstrated on the TREC 2011 and TREC 2012 microblog retrieval collections.
International Conference on Computer Science and Network Technology | 2015
Leilei Kong; Zhimao Lu; Haoliang Qi; Zhongyuan Han
The identification of high-obfuscation plagiarism seeds is one of the most difficult problems in plagiarism detection. A single feature type cannot identify plagiarism seeds effectively because of the varied plagiarism methods used in high-obfuscation plagiarism. This paper proposes a multi-feature fusion method based on the Logical Regression model for identifying high-obfuscation plagiarism seeds. The method uses the Logical Regression model to combine lexical, syntactic, semantic, and structural features extracted from pairs of suspicious text fragments. Experiments show that the method is feasible and effective.
International Conference for Young Computer Scientists | 2015
Leilei Kong; Jie Li; Feng Zhao; Haoliang Qi; Zhongyuan Han; Yong Han; Zhimao Lu
High-obfuscation plagiarism detection in big data environments, such as detecting paraphrasing and cross-language plagiarism, is often difficult for anti-plagiarism systems because plagiarism techniques are becoming more and more complex. This paper proposes HawkEyes, a plagiarism detection system built on the source retrieval and text alignment algorithms that were developed for the international competition on plagiarism detection organized by CLEF. The text alignment algorithm in HawkEyes took first place in PAN@CLEF2012. In the demonstration, we present our system running on the PAN@CLEF2014 training corpus.
International Conference for Young Computer Scientists | 2015
Yong Han; Li Min; Yu Zou; Zhongyuan Han; Song Li; Leilei Kong; Haoliang Qi; Wenhao Qiao; Shuo Cui; Hong Deng
Lyrics retrieval is one of the frequently used retrieval functions of search engines. However, existing lyrics retrieval systems neglect diversified information needs. This paper introduces LRC Sousou, a lyrics retrieval system that corrects erroneous characters automatically, supports mixed queries of Chinese words and Pinyin, and handles English phoneme queries effectively. Natural language processing, information retrieval, and machine learning techniques are applied to the system, which enhances the practicability and efficiency of lyrics search and improves the user experience.
Archive | 2014
Leilei Kong; Zhimao Lu; Haoliang Qi; Zhongyuan Han
International Journal of Database Theory and Application | 2016
Leilei Kong; Zicheng Zhao; Zhimao Lu; Haoliang Qi; Feng Zhao
International Journal of Grid and Distributed Computing | 2016
Zhongyuan Han; Muyun Yang; Leilei Kong; Haoliang Qi; Sheng Li
Chinese Journal of Electronics | 2016
Muyun Yang; Sheng Li; Leilei Kong; Zhongyuan Han; Haoliang Qi
International Journal of Database Theory and Application | 2015
Zhongyuan Han; Muyun Yang; Leilei Kong; Haoliang Qi; Sheng Li