Conghui Zhu

Harbin Institute of Technology

Publication

Featured research published by Conghui Zhu.

fuzzy systems and knowledge discovery | 2011

Research on image classification based on a combination of text and visual features

Lexiao Tian; Dequan Zheng; Conghui Zhu

As more and more text-image co-occurrence data become available on the web, mining such data plays an increasingly important role in web applications. In this paper, we use the descriptive text around images to help image classification and propose a novel image classification method for text-image co-occurrence data. Our system has three main steps: feature extraction, classifier training, and classifier fusion. In the feature extraction phase, we extract not only visual features such as color, shape, and texture but also text features. In the training phase, visual and text classifiers are trained separately with SVM models. Finally, weight learning is used to build the classifier fusion system. Compared with other methods, ours makes full use of the unstructured text around images and filters text features by information gain; an efficient combination of features is also achieved by comparing different combination methods. Experimental results show that our method is efficient and improves the accuracy of image classification.
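The abstract mentions filtering text features by information gain. As a minimal sketch of that general technique (not the authors' code), the block below scores each term by the information gain of its presence or absence with respect to the class labels and keeps the top k. The toy documents, labels, and the tie-breaking by term name are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term present / absent' w.r.t. the labels.

    docs: list of token sets (e.g. words from the text around each image);
    labels: list of class labels, same length as docs.
    """
    h_class = entropy(list(Counter(labels).values()))
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_cond = 0.0
    for subset in (with_term, without_term):
        if subset:
            h_cond += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_class - h_cond


def select_terms(docs, labels, k):
    """Keep the k vocabulary terms with the highest IG (ties broken by name)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical surrounding-text tokens for four images.
docs = [{"beach", "sea"}, {"sea", "sun"}, {"car", "road"}, {"road", "city"}]
labels = ["scenery", "scenery", "vehicle", "vehicle"]
print(select_terms(docs, labels, 2))
```

Here "road" and "sea" each split the two classes perfectly (IG of 1 bit), so they survive the filter, while terms seen in only one document score lower.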

fuzzy systems and knowledge discovery | 2011

Semi-supervised learning for word sense disambiguation using parallel corpora

Mo Yu; Shu Wang; Conghui Zhu; Tiejun Zhao

The application of word sense disambiguation (WSD) methods based on supervised machine learning is limited by the difficulty of defining sense tags and acquiring labeled training data. In this paper, both problems are addressed in a semi-supervised learning framework with the help of parallel corpora. Sense tags are defined automatically from the results of word alignment on the parallel corpora, and label propagation, a graph-based semi-supervised algorithm, is employed. Experiments show that our method achieves great improvement on Chinese WSD tasks and that performance grows significantly as the number of monolingual sentences increases.
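To illustrate label propagation itself, here is a minimal sketch of the classic iterative scheme: seed nodes keep their labels (which, in the setting above, would come from alignment-derived sense tags), and every other node repeatedly takes the weighted average of its neighbors' label distributions. The graph construction, edge weights, and iteration count are assumptions; the paper's exact similarity graph is not given here.

```python
def label_propagation(weights, seed_labels, num_classes, iterations=100):
    """Iterative label propagation with clamped seeds.

    weights: symmetric n x n list of non-negative edge weights
             (e.g. hypothetical similarities between ambiguous-word instances).
    seed_labels: dict mapping node index -> class index for labeled nodes.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # One-hot distributions for seeds, uniform distributions elsewhere.
    dist = []
    for i in range(n):
        if i in seed_labels:
            row = [0.0] * num_classes
            row[seed_labels[i]] = 1.0
        else:
            row = [1.0 / num_classes] * num_classes
        dist.append(row)

    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if i in seed_labels or total == 0:
                new_dist.append(dist[i])  # clamp seeds; isolated nodes unchanged
                continue
            # Weighted average of the neighbours' label distributions.
            new_dist.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        dist = new_dist

    return [max(range(num_classes), key=lambda c: dist[i][c]) for i in range(n)]


# A chain 0-1-2-3-4-5 with one seed of each sense at the two ends.
w = [[0.0] * 6 for _ in range(6)]
for i in range(5):
    w[i][i + 1] = w[i + 1][i] = 1.0
print(label_propagation(w, {0: 0, 5: 1}, num_classes=2))
```

On this toy chain, the unlabeled nodes nearer each seed inherit that seed's sense, which is the intuition behind propagating alignment-derived tags to untagged monolingual sentences.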

international conference on natural computation | 2016

Domain adaptation for statistical machine translation

Xiaoxue Wang; Conghui Zhu; Sheng Li; Tiejun Zhao; Dequan Zheng

Statistical machine translation (SMT) plays an increasingly important role. The performance of SMT depends largely on the size and quality of the training data. Because the demand for translation is broad, making the best use of limited in-domain data to meet translation needs from different domains is one of the hot topics in current SMT research. Domain adaptation aims to noticeably improve performance on a specific domain by drawing on large out-of-domain parallel corpora when in-domain parallel corpora are absent. Domain adaptation is one of the keys to bringing SMT into practical application. This paper introduces mainstream domain adaptation methods for SMT, compares the advantages and disadvantages of representative methods based on results on the same data, and offers the authors' views on possible future directions of domain adaptation for SMT.
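One widely used family of domain adaptation methods is mixture modeling, where in-domain and out-of-domain models are linearly interpolated. The sketch below applies this to toy phrase translation probabilities and picks the mixing weight by maximizing log-likelihood on a small in-domain development set. It is an illustrative example only, not necessarily one of the methods compared in the paper; the table format, the grid search, and the probability floor are assumptions.

```python
import math


def interpolate_phrase_tables(p_in, p_out, lam):
    """Linear mixture p(t|s) = lam * p_in(t|s) + (1 - lam) * p_out(t|s).

    Tables are dicts keyed by (source_phrase, target_phrase) tuples
    (a simplified, hypothetical format).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {
        pair: lam * p_in.get(pair, 0.0) + (1.0 - lam) * p_out.get(pair, 0.0)
        for pair in set(p_in) | set(p_out)
    }


def choose_lambda(dev_pairs, p_in, p_out, steps=10, floor=1e-12):
    """Grid-search the weight that maximizes in-domain dev log-likelihood."""
    best_lam, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        lam = i / steps
        mixed = interpolate_phrase_tables(p_in, p_out, lam)
        # Floor unseen pairs so the log stays finite.
        ll = sum(math.log(max(mixed.get(pair, 0.0), floor)) for pair in dev_pairs)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam


p_in = {("a", "x"): 0.8, ("a", "y"): 0.2}
p_out = {("a", "x"): 0.4, ("a", "y"): 0.6, ("b", "z"): 1.0}
dev = [("a", "x"), ("a", "x"), ("a", "y")]
print(choose_lambda(dev, p_in, p_out))
```

The chosen weight (0.7 on this toy data) leans toward the in-domain table without discarding the out-of-domain one, which keeps coverage of pairs such as ("b", "z") that only the larger corpus provides.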

applications of natural language to data bases | 2013

Phrase Table Combination Deficiency Analyses in Pivot-Based SMT

Yiming Cui; Conghui Zhu; Xiaoning Zhu; Tiejun Zhao; Dequan Zheng

Because parallel corpora are not always available, pivot languages were introduced to address parallel corpus sparseness in statistical machine translation. In this paper, we carried out several phrase-based SMT experiments and analyzed in detail the reasons for the decline in translation performance. Experimental results indicated that both the coverage rate of phrase pairs and the accuracy of translation probabilities affect translation quality.
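A common way to combine phrase tables through a pivot is triangulation, which marginalizes over pivot phrases: p(t|s) = sum over p of p(t|p) * p(p|s). The sketch below implements that formula on toy tables and adds a simple coverage measure (the fraction of reference phrase pairs present in the combined table). Both the dict-based table format and this particular definition of coverage rate are assumptions for illustration, not the paper's exact setup.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Combine tables via p(t|s) = sum_p p(t|p) * p(p|s).

    src_pivot: dict source -> {pivot: p(pivot|source)}
    pivot_tgt: dict pivot  -> {target: p(target|pivot)}
    Sources whose pivots never reach a target are dropped.
    """
    table = {}
    for s, pivots in src_pivot.items():
        scores = defaultdict(float)
        for p, p_ps in pivots.items():
            for t, p_tp in pivot_tgt.get(p, {}).items():
                scores[t] += p_tp * p_ps
        if scores:
            table[s] = dict(scores)
    return table


def coverage_rate(table, reference_pairs):
    """Fraction of reference (source, target) pairs found in the table."""
    hits = sum(1 for s, t in reference_pairs if t in table.get(s, {}))
    return hits / len(reference_pairs)


src_pivot = {"s1": {"p1": 0.5, "p2": 0.5}, "s2": {"p3": 1.0}}
pivot_tgt = {"p1": {"t1": 1.0}, "p2": {"t1": 0.5, "t2": 0.5}}
table = triangulate(src_pivot, pivot_tgt)
print(table)
print(coverage_rate(table, [("s1", "t1"), ("s1", "t3"), ("s2", "t1"), ("s1", "t2")]))
```

The toy case shows both failure modes named in the abstract: "s2" disappears because its pivot phrase has no target-side entry (lower coverage), and probabilities for "s1" are only as accurate as the two component tables they are multiplied from.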

computational intelligence and security | 2012

Extracting Main Content of a Topic on Online Social Network by Multi-document Summarization

Chunyan Liu; Conghui Zhu; Tiejun Zhao; Dequan Zheng

Online social media has become one of the most important ways people communicate, and finding valuable information in huge amounts of data has become a key problem. We present a novel topic extraction method, based on multi-document summarization, that uses the topic value of each word and social model attributes as additional features. Experimental results show that multi-document summarization with topic and social features helps extract topics from social media.
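The abstract does not define "topic value" or the social attributes, so the following is only a hypothetical sketch of how such features could enter an extractive summarizer: each post is scored by a weighted sum of the average topic value of its words and a normalized social signal, and the top-k posts form the summary. Here the topic value is assumed to be a word's document frequency across the topic's posts (scaled to [0, 1]), the social signal is assumed to be a repost count, and the weight `alpha` is arbitrary.

```python
from collections import Counter


def topic_values(posts):
    """Hypothetical topic value: per-word document frequency, scaled to [0, 1]."""
    df = Counter(w for post in posts for w in set(post["tokens"]))
    top = max(df.values())
    return {w: c / top for w, c in df.items()}


def summarize(posts, k=2, alpha=0.7):
    """Rank posts by alpha * topicality + (1 - alpha) * social signal.

    posts: list of dicts with "text", "tokens", and "reposts" keys
    (an assumed format; repost count stands in for a social attribute).
    """
    values = topic_values(posts)
    max_reposts = max(p["reposts"] for p in posts) or 1

    def score(post):
        if not post["tokens"]:
            return 0.0
        topical = sum(values[w] for w in post["tokens"]) / len(post["tokens"])
        social = post["reposts"] / max_reposts
        return alpha * topical + (1 - alpha) * social

    ranked = sorted(posts, key=score, reverse=True)
    return [p["text"] for p in ranked[:k]]


posts = [
    {"text": "A", "tokens": ["quake", "city", "rescue"], "reposts": 100},
    {"text": "B", "tokens": ["quake", "rescue"], "reposts": 10},
    {"text": "C", "tokens": ["lunch", "cafe"], "reposts": 50},
]
print(summarize(posts, k=2))
```

The off-topic post "C" is popular but loses to the two posts built from the topic's shared vocabulary, which is the kind of trade-off that mixing topic and social features is meant to control.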

fuzzy systems and knowledge discovery | 2011

Research on short text classification for web forum

Xiaochun He; Conghui Zhu; Tiejun Zhao

The unique characteristics of short text make short text classification quite different from traditional long text processing. The feature space of short text is so sparse that extracting sufficient and effective features is notoriously difficult. In this paper, aiming to classify short texts on web forums accurately, we introduce a novel short-text processing method based on semantic extension that enriches the content of the original short text, which effectively alleviates feature sparseness. In addition, we put forward the concept of Key-Pattern (KP) and propose a new text feature representation approach based on KP, which extracts phrases with strong semantic information as text features. Traditional classifier models are applied to classify the texts, and experimental results show that the proposed method effectively improves the accuracy and recall of short text classification.
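To make the sparseness problem and the idea of semantic extension concrete, here is a minimal sketch: each token of a short text is expanded with a few related terms, after which two short posts that shared no words gain bag-of-words overlap. The related-term dictionary is a hypothetical stand-in for whatever semantic resource the paper uses, and the Key-Pattern representation is not modeled here.

```python
import math
from collections import Counter


def extend_short_text(tokens, related, max_per_token=2):
    """Append up to max_per_token related terms per token, skipping duplicates.

    related: hypothetical dict word -> list of semantically related words.
    """
    extended = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        for rel in related.get(tok, [])[:max_per_token]:
            if rel not in seen:
                extended.append(rel)
                seen.add(rel)
    return extended


def cosine(a, b):
    """Cosine similarity of two token lists as bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


related = {"gpu": ["graphics", "card"], "overheating": ["hot", "temperature"]}
post1, post2 = ["gpu", "overheating"], ["graphics", "card", "hot"]
print(cosine(post1, post2))
print(cosine(extend_short_text(post1, related), extend_short_text(post2, related)))
```

Before extension the two forum posts have zero similarity; after extension they overlap on three terms, giving a classifier something to generalize from.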

artificial intelligence and computational intelligence | 2011

Automatically ranking reviews based on the ordinal regression model

Bing Xu; Tiejun Zhao; Jian-Wei Wu; Conghui Zhu

With the rapid development of the Internet and e-commerce, the number of product reviews on the web is growing very fast, but review quality is inconsistent. This paper addresses the problem of automatically ranking reviews. A specification for judging review quality is first defined, and review ranking is then formalized as an ordinal regression problem. We employ Ranking SVM as the ordinal regression model. To improve system performance, we capture many important features, including structural, syntactic, and semantic features. Experimental results indicate that Ranking SVM clearly outperforms the baseline methods. For identifying low-quality reviews, the Ranking SVM model is more effective than the SVM regression model. Experimental results also show that unigrams, adjectives, and product features are the more effective features for modeling.
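Ranking SVM is usually trained through a pairwise transform: for every pair of reviews where one has a higher quality grade than the other, the feature difference becomes a training example that a linear scorer should place on the positive side with margin 1. The sketch below shows that transform and trains the linear weights with plain stochastic subgradient descent on the regularized hinge loss. The solver, hyperparameters, and the two toy features are assumptions (a real system would use a dedicated SVM solver and the structural, syntactic, and semantic features described above); all reviews are assumed to belong to one product.

```python
import random


def pairwise_transform(X, y):
    """Emit x_i - x_j for every pair with grade y_i > y_j (implicit label +1)."""
    diffs = []
    for i in range(len(X)):
        for j in range(len(X)):
            if y[i] > y[j]:
                diffs.append([a - b for a, b in zip(X[i], X[j])])
    return diffs


def train_ranking_svm(X, y, reg=0.01, lr=0.05, epochs=100, seed=0):
    """SGD on (reg/2)*||w||^2 + hinge(1 - w.d) over the pairwise differences d."""
    diffs = pairwise_transform(X, y)
    w = [0.0] * len(X[0])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            active = margin < 1.0  # hinge term contributes a subgradient only here
            w = [wi - lr * (reg * wi - (di if active else 0.0)) for wi, di in zip(w, d)]
    return w


def rank(X, w):
    """Indices of reviews sorted from highest to lowest predicted quality."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    return sorted(range(len(X)), key=lambda i: scores[i], reverse=True)


# Hypothetical features per review, e.g. [product features mentioned, has pros/cons].
X = [[3.0, 1.0], [2.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
y = [3, 2, 1, 0]  # ordinal quality grades
w = train_ranking_svm(X, y)
print(rank(X, w))
```

Framing the task as pairwise preferences, rather than regressing the grade directly, is what lets the model focus on getting the ordering right, which matches the abstract's comparison against SVM regression.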

Journal of intelligent systems | 2013

Image Classification Based on the Combination of Text Features and Visual Features

Lexiao Tian; Dequan Zheng; Conghui Zhu

With more and more text-image co-occurrence data becoming available on the Web, we are interested in how text, especially the Chinese context around images, can aid image classification. The goal is to construct a classification system for images, and we used the context of the images to improve it. First, we extracted three kinds of features (global visual features, local visual features, and text features) using both the image content and its context. Then, we tried various feature combination methods and trained classifiers for each kind of feature vector. Finally, we used a classifier fusion strategy based on weight learning to combine the classifier outputs and obtain the category of unlabeled images. In experiments on a data set extracted from Google Image Search, we demonstrated the benefit of using context to help image classification. By comparing different feature combination methods on our feature set, we adopted the most effective one. Meanwhile, the classifier fusion approach improves classification accuracy.
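A simple form of weight-learned late fusion combines each classifier's per-class scores with a weight vector and picks the weights that maximize accuracy on held-out data. The sketch below does this with an exhaustive grid over weights that sum to 1. It illustrates the general idea only; the grid search, the step size, and the toy validation scores are assumptions and not the weight-learning procedure from the paper.

```python
import itertools


def fuse(score_lists, weights):
    """Weighted sum of per-classifier class scores; returns the argmax class.

    score_lists[k][c]: score of classifier k (e.g. visual, text) for class c.
    """
    n_classes = len(score_lists[0])
    fused = [sum(w * s[c] for w, s in zip(weights, score_lists)) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)


def learn_weights(val_scores, val_labels, step=0.1):
    """Grid-search weights on the simplex that maximize validation accuracy.

    val_scores[i][k][c]: score of classifier k for class c on validation item i.
    """
    n_clf = len(val_scores[0])
    ticks = round(1 / step)
    best, best_acc = None, -1.0
    for combo in itertools.product(range(ticks + 1), repeat=n_clf):
        if sum(combo) != ticks:  # keep only weight vectors that sum to 1
            continue
        weights = [c / ticks for c in combo]
        correct = sum(fuse(s, weights) == y for s, y in zip(val_scores, val_labels))
        acc = correct / len(val_labels)
        if acc > best_acc:
            best, best_acc = weights, acc
    return best, best_acc


# Toy validation set: [visual scores, text scores] per image, two classes.
val_scores = [
    [[0.6, 0.4], [0.1, 0.9]],
    [[0.4, 0.6], [0.8, 0.2]],
    [[0.7, 0.3], [0.9, 0.1]],
]
val_labels = [1, 0, 0]
print(learn_weights(val_scores, val_labels))
```

On this toy data the text classifier is right where the visual one fails, so the learned weights shift toward text; with real data the balance would depend on which feature set is more reliable.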

international conference on natural computation | 2016

Semantic computation in geography question answering

Shanshan Zhao; Yuqing Zheng; Conghui Zhu; Tiejun Zhao; Sheng Li

In this paper, we develop a question answering system for single-option geography questions. The system is built in two directions. One computes the semantic similarity between two questions. The other converts the task into binary classification of question sentences by generating distributed representations of sentence semantics. For semantic similarity, we first implement a basic framework based on bag-of-words (BOW) and then extend it with an edit distance variant and a BM25 variant. For the second direction, we use a convolutional neural network (CNN) and a stacked denoising auto-encoder, respectively, to generate distributed representations of sentence semantics. Given the semantic representation of a sentence, a logistic regression classifier is employed to classify it. The dataset is a large-scale set of Chinese college entrance examination geography questions crawled from the Internet. Experimental results show that the CNN-based approach answers single-option geography questions with high accuracy, reaching 0.7310.
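For the BM25 variant of question similarity, a new question can be treated as the query and previously seen questions as the documents. The sketch below is a standard Okapi BM25 scorer with common defaults (k1 = 1.5, b = 0.75) and a smoothed, non-negative idf. The paper's exact variant, parameters, and Chinese word segmentation are not specified here, so the pre-tokenized English toy questions are an assumption.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            # Smoothed idf (the "+1" keeps it non-negative for common terms).
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


# Hypothetical tokenized questions from a question bank.
bank = [
    ["river", "delta", "sediment"],
    ["monsoon", "rainfall", "china"],
    ["delta", "farming"],
]
new_question = ["delta", "sediment"]
print(bm25_scores(new_question, bank))
```

The stored question sharing both terms ranks first; the rarer term "sediment" contributes more than "delta" because of its higher idf.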

Archive | 2018

A Study on Corpus Content Display and IP Protection

Jingyi Ma; Muyun Yang; Haoyong Wang; Conghui Zhu; Bing Xu

Corpora play an important role in most research fields, especially natural language processing. Some research demos display detailed corpus content to highlight their contributions but overlook the security of the corpus. In this paper, we use a crawler to explore the content leakage that results from such display. A website that displays a corpus is crawled with a simple crawler algorithm using several strategies we present. We estimate that over 85% of the corpus can be downloaded, which poses a substantial threat to its intellectual property (IP) rights. Finally, we discuss protection measures for content display and give some practical suggestions for protecting information content through technology and law.
