Publication


Featured research published by Hai-Tao Zheng.


Information Sciences | 2014

A semantic similarity measure based on information distance for ontology alignment

Yong Jiang; Xinmin Wang; Hai-Tao Zheng

Ontology alignment is the key to achieving interoperability across ontologies. In the Semantic Web environment, ontologies are typically distributed and heterogeneous, so an alignment between them must be found before they can be processed together. Many efforts have been made to automate alignment by discovering correspondences between ontology entities. However, several problems remain, the most crucial being that traditional methods can hardly extract the semantic meaning of the lexical label that denotes an entity. In this paper, ontology alignment is formalized as an information distance problem: discovering the optimal alignment is cast as finding the correspondences with minimal information distance. We introduce a novel measure, named link weight, that uses the semantic characteristics of two entities and Google page counts to compute an information-distance similarity between them. The experimental results show that our method is able to create alignments between different lexical entities that denote the same concepts, and that it outperforms typical ontology alignment methods such as PROMPT (Noy and Musen, 2000) [38], QOM (Ehrig and Staab, 2004) [12], and APFEL (Ehrig et al., 2005) [13] in terms of semantic precision and recall.
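
For intuition, the sketch below computes an information-distance-style similarity from page counts, using the normalized Google distance formula as a stand-in; the paper's link-weight measure also incorporates the semantic characteristics of the two entities, and the counts used here are made up.

```python
import math

def information_distance_similarity(count_x, count_y, count_xy, total_pages):
    """Similarity derived from a normalized information distance over page counts.

    A minimal sketch in the spirit of the link-weight measure: it applies the
    normalized Google distance (NGD) formula to (hypothetical) page counts and
    turns the distance into a similarity. The actual measure in the paper also
    uses semantic characteristics of the entities.
    """
    log_x, log_y = math.log(count_x), math.log(count_y)
    log_xy, log_n = math.log(count_xy), math.log(total_pages)
    ngd = (max(log_x, log_y) - log_xy) / (log_n - min(log_x, log_y))
    return max(0.0, 1.0 - ngd)  # larger values mean the two labels are closer

# Illustrative (invented) counts for two lexical labels and their co-occurrence.
print(information_distance_similarity(4_200_000, 3_800_000, 1_100_000, 25_000_000_000))
```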


IEEE Transactions on Signal Processing | 2015

Deterministic Constructions of Binary Measurement Matrices From Finite Geometry

Shu-Tao Xia; Xin-Ji Liu; Yong Jiang; Hai-Tao Zheng

Deterministic constructions of measurement matrices for compressed sensing (CS) are considered in this paper. The constructions are inspired by the recent discovery of Dimakis, Smarandache, and Vontobel that parity-check matrices of good low-density parity-check (LDPC) codes can serve as provably good measurement matrices for compressed sensing under l1-minimization. The performance of the proposed binary measurement matrices is analyzed mainly theoretically, with the help of analysis methods and results from (finite-geometry) LDPC codes. In particular, several lower bounds on the spark (i.e., the smallest number of columns that are linearly dependent, which completely characterizes the recovery performance of l0-minimization) of general binary matrices and finite-geometry matrices are obtained, and they improve the previously known results in most cases. Simulation results show that the proposed matrices perform comparably to, and sometimes better than, the corresponding Gaussian random matrices. Moreover, the proposed matrices are sparse and binary, and most of them have a cyclic or quasi-cyclic structure, which makes hardware realization convenient.
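
As a rough companion to the spark discussion above, the sketch below builds a tiny circulant binary matrix from the perfect difference set {0, 1, 3} mod 7 (the point-line incidence structure of the Fano plane) and evaluates the classical coherence-based bound spark(A) >= 1 + 1/mu(A); the finite-geometry bounds derived in the paper are sharper and are not reproduced here.

```python
import numpy as np

def coherence_spark_bound(A):
    """Classical lower bound spark(A) >= 1 + 1/mu(A), mu being the mutual coherence.

    A generic check for a binary measurement matrix; it is weaker than the
    structure-specific bounds developed in the paper.
    """
    cols = A / np.linalg.norm(A, axis=0)   # normalize columns
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)
    mu = gram.max()                        # mutual coherence
    return 1.0 + 1.0 / mu

# Circulant 7 x 7 binary matrix whose columns are cyclic shifts of the
# {0, 1, 3} difference-set indicator -- a Fano-plane-flavored toy example.
base = np.array([1, 1, 0, 1, 0, 0, 0], dtype=float)
A = np.array([np.roll(base, k) for k in range(7)]).T
print(coherence_spark_bound(A))   # prints 4.0, since mu = 1/3 here
```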


Information Sciences | 2012

An ontology-based approach to Chinese semantic advertising

Hai-Tao Zheng; Jin-Yuan Chen; Yong Jiang

In the web advertising domain, contextual advertising and sponsored search are two of the main channels used to display related advertisements on web pages. A major challenge for contextual advertising is matching advertisements and web pages based on their semantics. When a web page and its semantically related advertisements share few words, the performance of traditional methods can be very poor. In particular, few studies of Chinese contextual advertising are based on semantics. To address these issues, we propose an ontology-based approach to Chinese semantic advertising. We utilize an ontology called the Taobao Ontology and populate it by automatically adding related phrases as instances. The ontology is used to match web pages and advertisements at a conceptual level. Based on the Taobao Ontology, the proposed method exploits seven distance functions to measure the similarities between concepts and web pages or advertisements. The similarities between web pages and advertisements are then calculated by combining the ontology-based similarities with term-based similarities. The empirical experiments indicate that our method matches Chinese web pages and advertisements with relatively high accuracy. Among the seven distance functions, cosine distance and Tanimoto distance show the best performance in terms of precision, recall, and F-measure. In addition, our method outperforms two contextual advertising methods, namely the impedance coupling method and the SVM-based method.
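
The two best-performing distance functions reported above, cosine and Tanimoto, can be written down in a few lines; the sketch below applies them to invented term-weight vectors for a page and an advertisement (the real system works over ontology concepts as well as terms).

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def tanimoto(u, v):
    """Tanimoto (extended Jaccard) similarity between two term-weight vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    return dot / (sum(x * x for x in u.values()) + sum(x * x for x in v.values()) - dot)

# Hypothetical weights, e.g. after mapping page/ad phrases to ontology concepts.
page = {"camera": 0.8, "lens": 0.5, "travel": 0.2}
ad = {"camera": 0.9, "tripod": 0.4, "lens": 0.3}
print(cosine(page, ad), tanimoto(page, ad))
```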


Multimedia Tools and Applications | 2018

Weakly-supervised image captioning based on rich contextual information

Hai-Tao Zheng; Zhe Wang; Ningning Ma; Jin-Yuan Chen; Xi Xiao; Arun Kumar Sangaiah

Automatically generating an image description is a challenging task that attracts broad attention in artificial intelligence. Inspired by methods from computer vision and natural language processing, different approaches have been proposed to solve the problem. However, captions generated by existing approaches lack sufficient contextual information to describe the corresponding images completely, because the labeled captions in training sets describe images only at a basic level and carry few contextual annotations. In this paper, we propose a Weakly-supervised Image Captioning Approach (WICA) to generate captions containing rich contextual information, without requiring complete annotations of the contextual information in datasets. We utilize encoder-decoder neural networks to extract basic captioning features and leverage object detection networks to identify contextual features. We then encode the two levels of features with a phrase-based language model in order to generate captions with rich contextual information. The comprehensive experimental results reveal that the proposed model outperforms the existing baselines in terms of the richness and reasonableness of the contextual information in image captions.
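
A toy sketch of the fusion idea, assuming a base caption from an encoder-decoder model and object labels with confidences from a detector; WICA itself merges the two feature levels with a phrase-based language model, which is not reproduced here, and the caption and labels below are invented.

```python
def enrich_caption(base_caption, detected_objects, max_extras=2):
    """Append the most confident detected objects that the base caption misses.

    A deliberately simple stand-in for combining captioning features with
    contextual object detections; the real model scores phrases with a
    phrase-based language model instead of string concatenation.
    """
    mentioned = set(base_caption.lower().split())
    extras = [label for label, conf in sorted(detected_objects, key=lambda x: -x[1])
              if label not in mentioned][:max_extras]
    return base_caption if not extras else base_caption + " with " + " and ".join(extras)

print(enrich_caption("a man riding a bicycle",
                     [("helmet", 0.92), ("street", 0.81), ("bicycle", 0.99)]))
```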


Web-Age Information Management | 2015

Exploiting Conceptual Relations of Sentences for Multi-document Summarization

Hai-Tao Zheng; Shu-Qin Gong; Ji-Min Guo; Wen-Zhen Wu

Multi-document summarization becomes increasingly important in the age of big data. However, existing summarization systems either ignore the conceptual relations of sentences or consider them only implicitly. In this paper, we propose a novel method called Multi-document Summarization based on Explicit Semantics of Sentences (MDSES), which explicitly takes the conceptual relations of sentences into consideration. It is composed of three components: sentence-concept graph construction, concept clustering, and summary generation. We first obtain sentence-concept semantic relations to construct a sentence-concept graph, and then run a graph-weighting algorithm to obtain weighted rankings of sentences and concepts. In addition, we obtain concept-concept semantic relations and cluster concepts to eliminate redundancy. Finally, we generate an informative summary. Experimental results on the DUC dataset using ROUGE metrics demonstrate the effectiveness of our method.
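
One generic way to weight a sentence-concept graph is HITS-style mutual reinforcement, sketched below on a toy incidence matrix; this illustrates the graph-weighting step under that assumption and is not necessarily the exact algorithm used by MDSES.

```python
import numpy as np

def rank_sentences_and_concepts(incidence, iters=50):
    """Mutually reinforce sentence and concept scores on a bipartite graph.

    incidence[i, j] = 1 if sentence i mentions concept j. Sentence scores are
    propagated to concepts and back, HITS-style, until they stabilize.
    """
    s = np.ones(incidence.shape[0])
    c = np.ones(incidence.shape[1])
    for _ in range(iters):
        s = incidence @ c
        s /= np.linalg.norm(s)
        c = incidence.T @ s
        c /= np.linalg.norm(c)
    return s, c

# Three sentences over four concepts (toy data).
M = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
sentence_scores, concept_scores = rank_sentences_and_concepts(M)
print(sentence_scores, concept_scores)
```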


International Conference on Neural Information Processing | 2014

Multi-document Summarization Based on Sentence Clustering

Hai-Tao Zheng; Shu-Qin Gong; Hao Chen; Yong Jiang; Shu-Tao Xia

A main task in multi-document summarization is sentence selection. However, many existing approaches simply select the top-ranked sentences without detecting redundancy. In addition, some summarization approaches generate summaries with low redundancy, but they are supervised. To address these issues, we propose a novel method named Redundancy Detection-based Multi-document Summarizer (RDMS). The proposed method first generates an informative sentence set and then applies sentence clustering to detect redundancy. After sentence clustering, we conduct cluster ranking, candidate selection, and representative selection to eliminate redundancy. RDMS is an unsupervised multi-document summarization system, and the experimental results on the DUC 2004 and DUC 2005 datasets indicate that RDMS performs better than both unsupervised and supervised systems in terms of ROUGE-1, ROUGE-L, and ROUGE-SU.
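
The clustering-based redundancy detection can be illustrated generically: cluster sentences by TF-IDF similarity and keep the member closest to each centroid as a representative. The sketch below uses scikit-learn and omits RDMS's cluster ranking and candidate selection stages.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def representative_sentences(sentences, n_clusters=2):
    """Cluster sentences and return one representative per cluster.

    A generic sketch of redundancy reduction via sentence clustering; it is not
    the full RDMS pipeline.
    """
    X = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    reps = []
    for k in range(n_clusters):
        members = np.where(km.labels_ == k)[0]
        # pick the member closest to the cluster centroid
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[k], axis=1)
        reps.append(sentences[members[np.argmin(dists)]])
    return reps

print(representative_sentences([
    "The storm caused severe flooding in the city.",
    "Severe flooding hit the city after the storm.",
    "Rescue teams evacuated hundreds of residents.",
    "Hundreds of residents were evacuated by rescue teams.",
]))
```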


Web-Age Information Management | 2017

Boost Clickbait Detection Based on User Behavior Analysis

Hai-Tao Zheng; Xin Yao; Yong Jiang; Shu-Tao Xia; Xi Xiao

Articles on the web are often given misleading titles to attract user clicks and increase click-through rate (CTR). A clickbait title may increase the click-through rate but degrades the user experience. It is therefore important to identify articles with misleading titles and block them for specific users. Existing methods consider only text features and hardly produce satisfactory results. User behavior is useful for clickbait detection: users show different tendencies toward articles with clickbait titles, and their actions within an article usually indicate whether its title is clickbait. In this paper, we design an algorithm that models user behavior in order to improve clickbait detection. Specifically, we use a classifier to produce an initial clickbait score for each article. We then define a loss function on the user behavior and tune the clickbait scores to decrease this loss. Experiments show that precision and recall improve after incorporating user behavior.
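
A minimal sketch of the tuning idea, assuming a hypothetical squared loss between each classifier score and a behavior-derived target (for example, a high target when dwell time is short); the paper's actual loss function on user behavior is defined differently.

```python
def tune_clickbait_scores(initial_scores, behavior_targets, lr=0.1, steps=100, lam=1.0):
    """Nudge classifier scores toward behavior-derived targets by gradient descent.

    Hypothetical objective per article i:
        (s_i - t_i)^2 + lam * (s_i - s0_i)^2
    where s0_i is the initial classifier score and t_i a behavior-based target.
    """
    scores = list(initial_scores)
    for _ in range(steps):
        for i, (s0, t) in enumerate(zip(initial_scores, behavior_targets)):
            grad = 2 * (scores[i] - t) + 2 * lam * (scores[i] - s0)
            scores[i] -= lr * grad
    return scores

# Text classifier says 0.4, but user behavior suggests clickbait (target 0.9), and vice versa.
print(tune_clickbait_scores([0.4, 0.8], [0.9, 0.2]))
```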


Asia-Pacific Web Conference | 2015

Online Feature Selection Based on Passive-Aggressive Algorithm with Retaining Features

Hai-Tao Zheng; Haiyang Zhang

Feature selection is an important topic in data mining and machine learning and has been extensively studied in the literature. Unlike traditional batch learning methods, online learning is more efficient for real-world applications. Most existing studies of online learning require access to all features of the training instances, but in the real world it is often expensive to acquire the full set of attributes. In the online feature selection process, when a training instance arrives, a small fixed number of features is selected and the remaining features are ignored. However, the ignored features may be useful and selected for later instances; considering only the new instances for these features can lead to large errors. To address these issues, we propose a novel algorithm that combines the Passive-Aggressive algorithm with feature retention. We evaluate the proposed algorithm for online feature selection on several public datasets, and the experiments show that it consistently surpasses the baseline algorithms in all settings.
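
For orientation, the sketch below shows a standard Passive-Aggressive (PA-I) online update with hard truncation to the k strongest features; the feature-retention refinement that the paper adds on top of this baseline is deliberately omitted.

```python
import numpy as np

def pa_online_feature_selection(X, y, k, C=1.0):
    """PA-I updates with truncation to the k largest-magnitude weights per round.

    A baseline sketch of online feature selection; labels y are in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        loss = max(0.0, 1.0 - y_t * np.dot(w, x_t))
        if loss > 0:
            tau = min(C, loss / (np.dot(x_t, x_t) + 1e-12))
            w += tau * y_t * x_t
        if np.count_nonzero(w) > k:              # keep only the k strongest features
            keep = np.argsort(np.abs(w))[-k:]
            mask = np.zeros_like(w)
            mask[keep] = 1.0
            w *= mask
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])             # only the first two features matter
print(np.nonzero(pa_online_feature_selection(X, y, k=3))[0])
```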


International Symposium on Computers and Communications | 2013

A traffic localization strategy for peer-to-peer live streaming

Chao Dai; Yong Jiang; Shu-Tao Xia; Hai-Tao Zheng; Laizhong Cui

Current P2P applications are based on randomly connected overlays, which generate a significant amount of inter-ISP traffic. Because of stricter QoS requirements, such as short delay and a stable streaming rate, few studies are dedicated to optimizing P2P live streaming applications, although recent work has proposed solutions for P2P file distribution. In this paper, the current traffic localization strategy is first analyzed and two of its inherent flaws are revealed. We then propose a novel strategy for ISP-friendly live streaming based on a hybrid, two-tier overlay: all ISPs are organized into an ISP tree, and the local peers within each ISP form a mesh overlay. For each tier, a data scheduling scheme is designed to reduce inter-ISP traffic and to guarantee performance, respectively. Compared with R2, a well-known live streaming strategy, simulation results demonstrate that our strategy generates much less inter-ISP traffic and achieves higher system performance.
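
A simple illustration of ISP-aware neighbor selection, the basic intuition behind traffic localization; it is not the paper's two-tier ISP-tree/mesh design or its data scheduling, and the peer records below are invented.

```python
def select_neighbors(candidates, my_isp, max_neighbors=8, local_ratio=0.75):
    """Prefer same-ISP peers to cut inter-ISP traffic, keeping some remote peers
    for data diversity."""
    local = [p for p in candidates if p["isp"] == my_isp]
    remote = [p for p in candidates if p["isp"] != my_isp]
    n_local = min(len(local), int(max_neighbors * local_ratio))
    return local[:n_local] + remote[:max_neighbors - n_local]

peers = [{"id": i, "isp": "ISP-A" if i % 3 else "ISP-B"} for i in range(12)]
print([p["id"] for p in select_neighbors(peers, "ISP-A")])
```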


Advanced Data Mining and Applications | 2013

Exploiting Multiple Features for Learning to Rank in Expert Finding

Hai-Tao Zheng; Qi Li; Yong Jiang; Shu-Tao Xia; Lanshan Zhang

Expert finding is the process of identifying experts on a given topic. In this paper, we propose a method called Learning to Rank for Expert Finding (LREF), which attempts to leverage learning to rank to improve the estimation in expert finding. Learning to rank is an established means of predicting rankings and has recently shown great promise in information retrieval. LREF first defines representations for both topics and experts, and then combines popular language models and basic document features derived from these representations into feature vectors for learning. Finally, LREF adopts RankSVM, a pairwise learning-to-rank algorithm, to generate ranked lists of experts for topics. Extensive experiments comparing LREF with the profile-based and document-based language models, which are state-of-the-art expert finding methods, show that LREF improves expert finding accuracy.
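
The RankSVM step can be sketched with the standard pairwise reduction: train a linear SVM on feature differences of expert pairs with different relevance and rank experts by the learned weight vector. The feature vectors and relevance grades below are toy values, and building features from language models and document statistics is not shown.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ranksvm_fit(X, y):
    """Pairwise reduction behind RankSVM: classify signed feature differences.

    X holds one feature vector per expert, y their (toy) relevance grades for a
    topic; the learned linear weights induce a ranking via the score X @ w.
    """
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] != y[j]:
                diffs.append(X[i] - X[j])
                labels.append(1 if y[i] > y[j] else -1)
    clf = LinearSVC(C=1.0, max_iter=10000).fit(np.array(diffs), np.array(labels))
    return clf.coef_.ravel()

X = np.array([[0.9, 0.2], [0.4, 0.4], [0.1, 0.8]])   # toy expert features
y = np.array([2, 1, 0])                              # toy relevance grades
w = ranksvm_fit(X, y)
print(np.argsort(-(X @ w)))                          # experts ranked by predicted score
```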
