Conghui Zhu
Harbin Institute of Technology
Publication
Featured research published by Conghui Zhu.
fuzzy systems and knowledge discovery | 2011
Lexiao Tian; Dequan Zheng; Conghui Zhu
As more text-image co-occurrence data become available on the web, mining those data plays an increasingly important role in web applications. In this paper, we use description information to help image classification and propose a novel image classification method for text-image co-occurrence data. Our system has three main steps: feature extraction, classifier training, and classifier fusion. In the feature extraction phase, we extract both visual features, such as color, shape, and texture, and text features. In the training phase, visual and text classifiers are trained separately with an SVM model. Finally, weight learning is used to build the classifier fusion system. Compared with other methods, we make full use of the unstructured text around images and filter text features by information gain; we also obtain an efficient combination of features by comparing different combination methods. Experimental results show that our method is efficient and improves the accuracy of image classification.
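The fusion step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each classifier outputs per-class scores and that a single weight is chosen by grid search on held-out data; the abstract does not specify the weight-learning procedure, and the names `fuse` and `learn_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse(p_visual: Sequence[float], p_text: Sequence[float], w: float) -> List[float]:
    """Weighted average of two classifiers' per-class scores."""
    return [w * v + (1.0 - w) * t for v, t in zip(p_visual, p_text)]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def learn_weight(visual, text, labels, steps: int = 10) -> Tuple[float, float]:
    """Grid-search the visual weight w in [0, 1] on held-out data.

    Assumption: one global weight, chosen by accuracy. This stands in
    for the paper's weight learning, whose details are not given here.
    """
    best_w, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        w = k / steps
        correct = sum(
            argmax(fuse(v, t, w)) == y for v, t, y in zip(visual, text, labels)
        )
        acc = correct / len(labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

With toy scores where neither classifier alone labels every held-out image correctly, an intermediate weight can still recover all labels, which is the effect the fusion step relies on.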
fuzzy systems and knowledge discovery | 2011
Mo Yu; Shu Wang; Conghui Zhu; Tiejun Zhao
The application of word sense disambiguation (WSD) methods based on supervised machine learning is limited by the difficulty of defining sense tags and acquiring labeled training data. In this paper, these two problems are addressed in a semi-supervised learning framework with the help of parallel corpora. The sense tags are defined automatically according to the results of word alignment on the parallel corpora, and label propagation, a graph-based semi-supervised algorithm, is employed. Experiments show that our method achieves a large improvement on Chinese WSD tasks and that performance grows significantly as the number of monolingual sentences increases.
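As a rough illustration of label propagation, the sketch below spreads sense labels over a small similarity graph while keeping the labeled nodes fixed. The graph, the edge weights, and the function name `label_propagation` are assumptions for illustration; how the paper builds its graph from aligned sentences is not described in the abstract.

```python
def label_propagation(weights, labels, n_classes, iterations=100):
    """Propagate labels over a weighted graph.

    weights: symmetric matrix (list of lists) of edge weights.
    labels:  dict mapping labeled node index -> class index.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # Unlabeled nodes start uniform; labeled nodes start one-hot.
    dist = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, y in labels.items():
        dist[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]

    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if i in labels or total == 0:
                # Clamp labeled nodes; isolated nodes keep their state.
                new_dist.append(dist[i])
                continue
            # Each unlabeled node takes the weighted average of its neighbors.
            new_dist.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        dist = new_dist

    return [max(range(n_classes), key=lambda c: row[c]) for row in dist]
```

On a chain of four nodes with strong edges 0-1 and 2-3 and a weak edge 1-2, labeling node 0 as sense 0 and node 3 as sense 1 assigns node 1 to sense 0 and node 2 to sense 1.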
international conference on natural computation | 2016
Xiaoxue Wang; Conghui Zhu; Sheng Li; Tiejun Zhao; Dequan Zheng
Statistical machine translation (SMT) plays an increasingly important role. The performance of SMT depends largely on the size and quality of the training data. Because translation demands are diverse, how to make the best of limited in-domain data to satisfy translation needs from different domains is one of the hot topics in current SMT. Domain adaptation aims to significantly improve performance in a specific domain by bringing in large out-of-domain parallel corpora when in-domain parallel corpora are lacking. Domain adaptation is one of the keys to putting SMT into practical application. This paper introduces mainstream methods of domain adaptation for SMT, compares the advantages and disadvantages of representative methods based on results on the same data, and gives our views on possible future directions of domain adaptation for SMT.
applications of natural language to data bases | 2013
Yiming Cui; Conghui Zhu; Xiaoning Zhu; Tiejun Zhao; Dequan Zheng
Because a parallel corpus is not always available, a pivot language was introduced to address parallel corpus sparseness in statistical machine translation. In this paper, we carry out several phrase-based SMT experiments and analyze in detail the reasons for the decline in translation performance. Experimental results indicate that both the coverage rate of phrase pairs and the accuracy of translation probabilities affect translation quality.
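The two factors named above can be seen in a small sketch of phrase-table triangulation, one common pivot approach in which p(t|s) is approximated by summing p(p|s) · p(t|p) over pivot phrases p. The abstract does not say which pivot method the experiments used, so triangulation, the toy tables, and the names `triangulate` and `coverage` are assumptions.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source-target table: p(t|s) = sum over p of p(p|s) * p(t|p)."""
    src_tgt = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for t, prob_t_given_p in pivot_tgt.get(p, {}).items():
                src_tgt[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in src_tgt.items()}


def coverage(src_pivot, src_tgt):
    """Fraction of source phrases that still have a translation."""
    return len(src_tgt) / len(src_pivot)


# Toy German -> English -> Spanish tables (hypothetical values).
src_pivot = {
    "katze": {"cat": 1.0},
    "hund": {"dog": 0.5, "hound": 0.5},
    "vogel": {"bird": 1.0},
}
pivot_tgt = {"cat": {"gato": 1.0}, "dog": {"perro": 1.0}}

table = triangulate(src_pivot, pivot_tgt)
```

In this toy case "vogel" drops out entirely because "bird" is absent from the second table (lower coverage), and "hund" keeps only half of its probability mass because "hound" is absent (less accurate probabilities).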
computational intelligence and security | 2012
Chunyan Liu; Conghui Zhu; Tiejun Zhao; Dequan Zheng
Online social media has become one of the most important ways people communicate, and how to find valuable information in huge amounts of data has become a key problem. We present a novel topic extraction method, based on multi-document summarization, that employs the topic value of each word and social model attributes as additional features. The experimental results show that multi-document summarization with topic and social features helps extract topics from social media.
fuzzy systems and knowledge discovery | 2011
Xiaochun He; Conghui Zhu; Tiejun Zhao
The unique characteristics of short text make short text classification quite different from traditional long text processing. The feature space of short text is very sparse, which makes it notoriously difficult to extract sufficient and effective features. In this paper, aiming to accurately classify short texts on web forums, we introduce a novel short-text-processing method based on semantic extension that enriches the content of the original short text and effectively solves the problem of feature sparsity. In addition, we put forward the concept of Key-Pattern (KP) and propose a new text feature representation approach based on KP, which extracts phrases with strong semantic information as text features. Traditional classifier models are applied to classify the texts. Experimental results show that the proposed method effectively improves the accuracy and recall of short text classification.
artificial intelligence and computational intelligence | 2011
Bing Xu; Tiejun Zhao; Jian-Wei Wu; Conghui Zhu
With the rapid development of the Internet and e-commerce, the number of product reviews on the web is growing very fast, but review quality is inconsistent. This paper addresses the problem of automatically ranking reviews. A specification for judging review quality is first defined, and review ranking is then formalized as an ordinal regression problem. We employ Ranking SVM as the ordinal regression model. To improve system performance, we capture many important features, including structural, syntactic, and semantic features. Experimental results indicate that Ranking SVM clearly outperforms baseline methods. For identifying low-quality reviews, the Ranking SVM model is more effective than the SVM regression model. Experimental results also show that unigrams, adjectives, and product features are the more effective features for modeling.
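Ranking SVM is usually trained on pairs of items rather than on single items. The sketch below shows only that pairwise data transformation, not the SVM training itself; the feature vectors, quality ranks, and the name `pairwise_transform` are hypothetical and not taken from the paper.

```python
from itertools import combinations


def pairwise_transform(features, ranks):
    """Turn ranked items into signed difference vectors.

    For every pair of items with different ranks, emit x_i - x_j with
    label +1 if item i ranks higher than item j, else -1. A linear
    classifier trained on these pairs induces a ranking function.
    """
    diffs, signs = [], []
    for i, j in combinations(range(len(features)), 2):
        if ranks[i] == ranks[j]:
            continue  # Ties carry no ordering information.
        diffs.append([a - b for a, b in zip(features[i], features[j])])
        signs.append(1 if ranks[i] > ranks[j] else -1)
    return diffs, signs
```

Three reviews with ranks 2, 0, and 2 yield two training pairs, because the two reviews with equal rank are not compared with each other.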
Journal of intelligent systems | 2013
Lexiao Tian; Dequan Zheng; Conghui Zhu
With more and more text-image co-occurrence data becoming available on the Web, we are interested in how text, especially Chinese context around images, can aid image classification. The goal is to construct a classification system for images, and we used the context of the images to improve it. First, we extracted three kinds of features (global visual features, local visual features, and text features) using both the image content and its context. Then, we tried various feature combination methods and trained classifiers for each kind of feature vector. Finally, we used a classifier fusion strategy based on weight learning to combine the classifier outputs and obtain the category of unlabeled images. In our experiments on a data set extracted from Google Image Search, we demonstrated the benefit of using context to help image classification. By comparing different feature combination methods on our feature set, we adopted the most effective one. In addition, the classifier fusion approach improved the classification accuracy.
international conference on natural computation | 2016
Shanshan Zhao; Yuqing Zheng; Conghui Zhu; Tiejun Zhao; Sheng Li
In this paper, we develop a question answering system for solving single-option geography questions. The system is built along two directions. One computes the semantic similarity between two questions. The other converts the task into binary classification of question sentences by generating distributed representations of sentence semantics. For semantic similarity, we first implement a basic framework based on bag-of-words (BOW) and then extend it to an edit distance variant and a BM25 variant. For classification, we use a convolutional neural network (CNN) and a stacked denoising auto-encoder, respectively, to generate the distributed representation of sentence semantics. Given the semantic representation of a sentence, a logistic regression classifier is employed to classify it. The dataset is a large-scale set of geography questions from the Chinese college entrance examination, crawled from the internet. Experimental results show that the CNN answers single-option geography questions with high accuracy, reaching 0.7310.
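The edit distance variant of question similarity can be sketched as follows. This is a generic token-level Levenshtein distance normalized by length, offered as an assumption about what such a variant might look like; the paper's exact formulation is not given in the abstract, and Chinese text would first need word segmentation instead of the whitespace split used here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (x != y),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def similarity(q1, q2):
    """Similarity in [0, 1]: 1 minus the length-normalized token distance."""
    a, b = q1.split(), q2.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def most_similar(query, candidates):
    """Return the candidate question closest to the query."""
    return max(candidates, key=lambda c: similarity(query, c))
```

A new question could then be answered by looking up the most similar question that already has a known answer, which is one plausible use of the similarity score.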
Archive | 2018
Jingyi Ma; Muyun Yang; Haoyong Wang; Conghui Zhu; Bing Xu
Corpora play an important role in most research fields, especially in natural language processing. Some research demos display detailed corpus content to highlight their contributions while overlooking the security of the corpus. In this paper, we use a crawler to explore the content leakage that results from displaying content. A website that displays a corpus is selected and crawled by a simple crawler algorithm with some strategies we present. It is estimated that over 85% of the corpus can be downloaded, which poses a substantial threat to its intellectual property rights. Finally, we discuss protection measures for content display and give some suggestions for protecting information content through technology and law.