Publication


Featured research published by Leilei Kong.


Journal of Zhejiang University Science C | 2017

A machine learning approach to query generation in plagiarism source retrieval

Leilei Kong; Zhimao Lu; Haoliang Qi; Zhongyuan Han

Plagiarism source retrieval is the core task of plagiarism detection. It has become the standard for plagiarism detection to use the queries extracted from suspicious documents to retrieve the plagiarism sources. Generating queries from a suspicious document is one of the most important steps in plagiarism source retrieval. Heuristic-based query generation methods are widely used in the current research. Each heuristic-based method has its own advantages, and no one statistically outperforms the others on all suspicious document segments when generating queries for source retrieval. Further improvements on heuristic methods for source retrieval rely mainly on the experience of experts. This leads to difficulties in putting forward new heuristic methods that can overcome the shortcomings of the existing ones. This paper paves the way for a new statistical machine learning approach to select the best queries from the candidates. The statistical machine learning approach to query generation for source retrieval is formulated as a ranking framework. Specifically, it aims to achieve the optimal source retrieval performance for each suspicious document segment. The proposed method exploits learning to rank to generate queries from the candidates. To our knowledge, our work is the first research to apply machine learning methods to resolve the problem of query generation for source retrieval. To solve the essential problem of an absence of training data for learning to rank, the building of training samples for source retrieval is also conducted. We rigorously evaluate various aspects of the proposed method on the publicly available PAN source retrieval corpus. With respect to the established baselines, the experimental results show that applying our proposed query generation method based on machine learning yields statistically significant improvements over baselines in source retrieval effectiveness.
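
The ranking formulation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pairwise perceptron, the feature vectors, and the names `train_pairwise` and `select_queries` are all assumptions made for illustration. Each candidate query is represented by a hypothetical feature vector, a linear scorer is learned from "better query, worse query" pairs, and the top-scoring candidates are kept for a segment.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(weights: Vector, features: Vector) -> float:
    """Linear score of one candidate query (hypothetical features)."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs: List[Tuple[Vector, Vector]], dim: int,
                   epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Pairwise perceptron. Each pair is (better_query, worse_query),
    where 'better' means it retrieved more sources in the training data."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the current weights rank the pair wrongly.
            if score(weights, better) <= score(weights, worse):
                for i in range(dim):
                    weights[i] += lr * (better[i] - worse[i])
    return weights


def select_queries(weights: Vector, candidates: List[Tuple[str, Vector]],
                   k: int = 1) -> List[str]:
    """Return the k highest-scoring candidate queries for a segment."""
    ranked = sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)
    return [query for query, _ in ranked[:k]]


# Toy training pairs and candidates; all values are made up.
training_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
learned = train_pairwise(training_pairs, dim=2)
candidates = [("query_a", [0.9, 0.2]), ("query_b", [0.1, 0.7])]
best = select_queries(learned, candidates, k=1)
```

In this toy setup the candidates would come from several heuristic query generators, and the ranker would choose among them per document segment.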


International Conference for Young Computer Scientists | 2016

Time-Based Microblog Search System

Zhongyuan Han; Wenhao Qiao; Shuo Cui; Leilei Kong

This demo presents a time-based microblog search system that uses the time profile to estimate the query model, the document model, and the ranking function for microblog search. The system exploits the time profile to boost the performance of microblog search. The time-based query model, time-based document model, and time-based similarity score are briefly described, along with the indexing strategy for temporal microblog search. Examples of microblog search results are demonstrated on the TREC 2011 and TREC 2012 microblog retrieval collections.
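
One way to picture how a time profile could influence ranking is sketched below. The abstract does not give the system's actual models, so everything here is an assumption: the time profile is taken to be a normalized histogram over day buckets, and the final score is a simple linear interpolation between a text score and the profile value. The names `time_profile`, `time_based_score`, and `rerank` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def time_profile(day_stamps: List[int]) -> Dict[int, float]:
    """Normalized histogram over day buckets, e.g. built from the
    timestamps of documents initially retrieved for a query."""
    counts = Counter(day_stamps)
    total = sum(counts.values())
    return {day: n / total for day, n in counts.items()}


def time_based_score(text_score: float, doc_day: int,
                     profile: Dict[int, float], alpha: float = 0.7) -> float:
    """Interpolate a text score with the temporal evidence for the
    document's day (assumed form, not the system's rank function)."""
    temporal = profile.get(doc_day, 0.0)
    return alpha * text_score + (1.0 - alpha) * temporal


def rerank(docs: List[Tuple[str, float, int]], profile: Dict[int, float],
           alpha: float = 0.7) -> List[str]:
    """docs holds (doc_id, text_score, day); returns ids, best first."""
    ranked = sorted(
        docs,
        key=lambda d: time_based_score(d[1], d[2], profile, alpha),
        reverse=True,
    )
    return [doc_id for doc_id, _, _ in ranked]


# Toy data: most activity for the query falls on day 3.
profile = time_profile([3, 3, 3, 5])
order = rerank([("a", 0.50, 5), ("b", 0.45, 3)], profile)
```

Here document "b" overtakes "a" despite a slightly lower text score, because it was posted on the day when the query's topic was most active.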


International Conference on Computer Science and Network Technology | 2015

High obfuscation plagiarism detection using multi-feature fusion based on Logical Regression model

Leilei Kong; Zhimao Lu; Haoliang Qi; Zhongyuan Han

The identification of high-obfuscation plagiarism seeds is one of the most difficult problems in plagiarism detection. A single feature type cannot identify the plagiarism seeds effectively because of the varied plagiarism methods used in high-obfuscation plagiarism. This paper proposes a multi-feature fusion method based on a Logical Regression model for identifying high-obfuscation plagiarism seeds. The method uses the Logical Regression model to combine lexical, syntactic, semantic, and structural features extracted from pairs of suspicious text fragments. Experiments show that the method is feasible and effective.
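
The feature fusion idea can be sketched as follows, assuming that "Logical Regression" refers to standard logistic regression, which the source does not confirm. Each fragment pair is described by four hypothetical similarity scores (lexical, syntactic, semantic, structural), and a logistic model trained by plain stochastic gradient descent combines them into one seed probability. The feature values and function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float,
            features: Sequence[float]) -> float:
    """Probability that a fragment pair is a plagiarism seed."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples: List[Sequence[float]], labels: List[int],
          epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression with stochastic gradient descent."""
    dim = len(samples[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # gradient of the log loss
            for i in range(dim):
                weights[i] -= lr * err * x[i]
            bias -= lr * err
    return weights, bias


# Made-up feature vectors: [lexical, syntactic, semantic, structural].
samples = [
    [0.9, 0.8, 0.9, 0.7],  # seed
    [0.8, 0.9, 0.7, 0.8],  # seed
    [0.1, 0.2, 0.1, 0.3],  # not a seed
    [0.2, 0.1, 0.3, 0.2],  # not a seed
]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)
p_seed = predict(weights, bias, [0.85, 0.85, 0.8, 0.75])
p_other = predict(weights, bias, [0.15, 0.15, 0.2, 0.25])
```

Fusing the features in one model lets a pair that scores low on one feature type, such as lexical overlap in paraphrased text, still be flagged when the other feature types agree.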


International Conference for Young Computer Scientists | 2015

HawkEyes Plagiarism Detection System

Leilei Kong; Jie Li; Feng Zhao; Haoliang Qi; Zhongyuan Han; Yong Han; Zhimao Lu

Detecting high-obfuscation plagiarism in big data environments, such as paraphrasing and cross-language plagiarism, is often difficult for anti-plagiarism systems because plagiarism techniques are becoming more and more complex. This paper proposes HawkEyes, a plagiarism detection system built on the source retrieval and text alignment algorithms that were developed for the international competition on plagiarism detection organized by CLEF. The text alignment algorithm in HawkEyes took first place in PAN@CLEF2012. In the demonstration, we will present our system running on the PAN@CLEF2014 training data corpus.


International Conference for Young Computer Scientists | 2015

LRC Sousou: A Lyrics Retrieval System

Yong Han; Li Min; Yu Zou; Zhongyuan Han; Song Li; Leilei Kong; Haoliang Qi; Wenhao Qiao; Shuo Cui; Hong Deng

Lyrics retrieval is one of the frequently used retrieval functions of search engines. However, existing lyrics retrieval systems neglect diversified information needs. This paper introduces LRC Sousou, a lyrics retrieval system that corrects erroneous characters automatically, supports mixed queries of Chinese words and Pinyin, and handles English phoneme queries effectively. Natural language processing, information retrieval, and machine learning techniques are applied in the system, which enhances the practicability and efficiency of lyrics search and improves the user experience.


Archive | 2014

Detecting High Obfuscation Plagiarism: Exploring Multi-Features Fusion via Machine Learning

Leilei Kong; Zhimao Lu; Haoliang Qi; Zhongyuan Han


International Journal of Database Theory and Application | 2016

A Method of Plagiarism Source Retrieval and Text Alignment Based on Relevance Ranking Model

Leilei Kong; Zicheng Zhao; Zhimao Lu; Haoliang Qi; Feng Zhao


International Journal of Grid and Distributed Computing | 2016

A Temporal Microblog Filtering Model

Zhongyuan Han; Muyun Yang; Leilei Kong; Haoliang Qi; Sheng Li


Chinese Journal of Electronics | 2016

A Hybrid Model for Microblog Real-Time Filtering

Muyun Yang; Sheng Li; Leilei Kong; Zhongyuan Han; Haoliang Qi


International Journal of Database Theory and Application | 2015

A Hyperlink-Extended Language Model for Microblog Retrieval

Zhongyuan Han; Muyun Yang; Leilei Kong; Haoliang Qi; Sheng Li

Collaboration


Top Co-Authors

Haoliang Qi

Harbin Institute of Technology

Zhongyuan Han

Harbin Institute of Technology

Zhimao Lu

Harbin Engineering University

Muyun Yang

Harbin Institute of Technology

Sheng Li

Harbin Institute of Technology

Song Li

Harbin Engineering University

Jie Li

Harbin Institute of Technology

Liuyang Tian

Harbin Engineering University
