Liu Wenyin | Researchain

Publication

Featured research published by Liu Wenyin.

IEEE Transactions on Dependable and Secure Computing | 2006

Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD)

Anthony Y. Fu; Liu Wenyin; Xiaotie Deng

An effective approach to phishing Web page detection is proposed, which uses Earth Mover's Distance (EMD) to measure Web page visual similarity. We first convert the involved Web pages into low-resolution images and then use color and coordinate features to represent the image signatures. We use EMD to calculate the distances between the signatures of the Web page images. We train an EMD threshold vector for classifying a Web page as phishing or normal. Large-scale experiments with 10,281 suspected Web pages show high classification precision, high phishing recall, and time performance suitable for an online enterprise solution. We also compare our method with two others to demonstrate its advantage. We have also built a real system that is already in online use and has caught many real phishing cases.
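The distance-plus-threshold step described above can be illustrated with a small Python sketch under strong simplifying assumptions. Each signature is assumed to be a list of equally weighted `(r, g, b, x, y)` points, and both signatures are assumed to have the same length. In that special case an optimal EMD transport plan is a one-to-one matching, so the EMD is the cheapest average matching distance. The paper's signatures may carry unequal weights, which would require a general transportation-problem solver. `emd_uniform`, `is_phishing_suspect` and the single scalar threshold are hypothetical illustrations. They are not the authors' implementation, which trains a threshold vector.

```python
from itertools import permutations
from math import dist

# Assumed signature layout: equally weighted points (r, g, b, x, y).
Signature = list[tuple[float, ...]]


def emd_uniform(a: Signature, b: Signature) -> float:
    """EMD between two equal-size signatures whose points all carry the same weight.

    With uniform weights and equal counts, an optimal transport plan is a
    one-to-one matching, so we take the cheapest matching (Euclidean ground
    distance) and divide by the number of points. Brute force over all
    permutations is exponential: only suitable for tiny illustrative inputs.
    """
    if len(a) != len(b) or not a:
        raise ValueError("signatures must be non-empty and of equal size")
    best = min(
        sum(dist(p, b[j]) for p, j in zip(a, perm))
        for perm in permutations(range(len(b)))
    )
    return best / len(a)


def is_phishing_suspect(page: Signature, protected: Signature, threshold: float) -> bool:
    """Flag a page whose EMD to a protected page is below a hypothetical threshold."""
    return emd_uniform(page, protected) < threshold
```

A page that reuses the protected page's colour blocks at the same positions, in any order, gets an EMD of 0 and is flagged. A page with different colours and layout gets a large EMD and is not.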

machine vision applications | 1997

A protocol for performance evaluation of line detection algorithms

Liu Wenyin; Dov Dori

Accurate and efficient vectorization of line drawings is essential for any higher-level processing in document analysis and recognition systems. Despite the prevalence of vectorization and line detection methods, no standard protocol for evaluating their performance exists. We propose a protocol for evaluating both straight and circular line extraction, to help compare, select, improve, and even design line detection algorithms to be incorporated into line drawing recognition and understanding systems. The protocol involves both positive and negative sets of indices, at the pixel and vector levels. Time efficiency is also included in the protocol. The protocol may be extended to handle lines of any shape as well as other classes of graphic objects.
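To make the idea of paired positive and negative pixel-level indices concrete, here is a minimal Python sketch. It assumes ground-truth and detected lines have already been rasterized into sets of `(x, y)` pixels. The two ratios are a generic detection-rate/false-alarm pair chosen for illustration. They are not the exact index definitions of the protocol, and `pixel_level_indices` is a hypothetical name.

```python
Pixel = tuple[int, int]


def pixel_level_indices(ground_truth: set[Pixel], detected: set[Pixel]) -> dict[str, float]:
    """Return one positive and one negative pixel-level index (illustrative only).

    detection_rate   (positive): share of ground-truth line pixels that were detected.
    false_alarm_rate (negative): share of detected pixels lying on no ground-truth line.
    """
    matched = ground_truth & detected
    detection_rate = len(matched) / len(ground_truth) if ground_truth else 1.0
    false_alarm_rate = len(detected - ground_truth) / len(detected) if detected else 0.0
    return {"detection_rate": detection_rate, "false_alarm_rate": false_alarm_rate}
```

For example, consider a detected horizontal segment covering x = 2..11 against a ground-truth segment covering x = 0..9. It finds 8 of the 10 true pixels, and 2 of its 10 pixels are false alarms.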

international world wide web conferences | 2005

Detection of phishing webpages based on visual similarity

Liu Wenyin; Guanglin Huang; Liu Xiaoyue; Zhang Min; Xiaotie Deng

An approach to detecting phishing webpages based on visual similarity is proposed, which can be used as part of an enterprise anti-phishing solution. A legitimate webpage owner can use this approach to search the Web for suspicious webpages that are visually similar to the true webpage. A webpage is reported as a phishing suspect if its visual similarity to the true webpage is higher than the corresponding preset threshold. Preliminary experiments show that the approach can successfully detect phishing webpages in online use.

multimedia information retrieval | 2001

Automatic location of text in video frames

Xian-Sheng Hua; Xiang-Rong Chen; Liu Wenyin; HongJiang Zhang

A new automatic text location approach for videos is proposed. First, the corner points of the selected video frames are detected. After deleting isolated corners, we merge the remaining corners to form candidate text regions. The regions are then decomposed vertically and horizontally using edge maps of the video frames to obtain candidate text lines. Finally, a text box verification step based on features derived from the edge maps significantly reduces false alarms. Experimental results show that the proposed text location scheme is accurate.
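The "delete isolated corners, then merge the rest into candidate regions" step can be sketched in Python as follows. The sketch assumes corner points are already available as `(x, y)` pairs. It also assumes a single hypothetical distance `radius` that decides both isolation and merging, and that merging means grouping corners connected by chains of close neighbours (via union-find) and taking each group's bounding box. The paper's actual merging criteria may differ.

```python
from math import dist

Point = tuple[int, int]
Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def candidate_regions(corners: list[Point], radius: float) -> list[Box]:
    """Group nearby corner points into candidate text boxes (hypothetical helper)."""
    # 1. Delete isolated corners: keep a corner only if another corner lies within `radius`.
    kept = [
        p for i, p in enumerate(corners)
        if any(i != j and dist(p, q) <= radius for j, q in enumerate(corners))
    ]

    # 2. Merge the remaining corners with union-find: two corners share a group if
    #    they are linked by a chain of neighbours no farther apart than `radius`.
    parent = list(range(len(kept)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if dist(kept[i], kept[j]) <= radius:
                parent[find(i)] = find(j)

    groups: dict[int, list[Point]] = {}
    for i, p in enumerate(kept):
        groups.setdefault(find(i), []).append(p)

    # 3. The bounding box of each group is one candidate text region.
    boxes = []
    for pts in groups.values():
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return sorted(boxes)
```

With `radius=5`, the corners `(0, 0), (3, 0), (6, 1)` form one box and `(100, 0), (102, 2)` form another. A lone corner at `(50, 50)` is discarded.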

IEEE Transactions on Circuits and Systems for Video Technology | 2004

An automatic performance evaluation protocol for video text detection algorithms

Xian-Sheng Hua; Liu Wenyin; Hong-Jiang Zhang

Text presented in videos provides important supplemental information for video indexing and retrieval. Many efforts have been made toward text detection in videos. However, performance evaluation protocols for video text detection are still lacking. In this paper, we propose an objective and comprehensive performance evaluation protocol for video text detection algorithms. The protocol includes a positive set and a negative set of indices at the textbox level, which evaluate detection quality in terms of both the location accuracy and the fragmentation of the detected textboxes. In the protocol, we assign a detection difficulty (DD) level to each ground-truth textbox. The performance indices can then be normalized with respect to the textbox DD level and are therefore tolerant, to a certain degree, of differing ground-truth difficulties. We also assign a detectability index (DI) value to each ground-truth textbox. The overall detection rate is the DI-weighted average of the detection qualities of all ground-truth textboxes, so that the detection rate more accurately reflects the real performance. The automatic performance evaluation scheme has been applied to a text detection approach to determine the thresholds that yield the best detection results. The protocol has also been employed to compare the performance of several text detection systems. Hence, we believe that the proposed protocol can be used to compare the performance of different video/image text detection algorithms and systems, and can even help improve, select, and design new text detection methods.

international world wide web conferences | 2002

User Intention Modeling in Web Applications Using Data Mining

Zheng Chen; Fan Lin; Huan Liu; Yin Liu; Wei-Ying Ma; Liu Wenyin

The problem of inferring a user's intentions in machine–human interaction has been a key research issue for providing personalized experiences and services. In this paper, we propose novel approaches to modeling and inferring users' actions on a computer. Two linguistic features, keyword features and concept features, are extracted from the semantic context for intention modeling. Concept features are conceptual generalizations of keywords. Association rule mining is used to find the proper concept for each keyword. A modified Naïve Bayes classifier is used in our intention modeling. Experimental results show that our approach achieves 84% average accuracy in predicting users' intentions, which is close to the precision of human prediction (92%).
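The following Python sketch shows how keyword and concept features can feed a Naïve Bayes intention classifier. It is a plain multinomial Naïve Bayes with Laplace smoothing, not the paper's modified variant. The keyword-to-concept mapping `concept_of` is supplied by hand here, whereas the paper mines it with association rules. `NaiveBayesIntent` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayesIntent:
    """Multinomial Naive Bayes over keyword + concept features (illustrative only)."""

    def __init__(self, concept_of: dict[str, str], alpha: float = 1.0):
        self.concept_of = concept_of  # hypothetical keyword -> concept mapping
        self.alpha = alpha            # Laplace smoothing constant
        self.class_counts: Counter = Counter()
        self.feature_counts: defaultdict = defaultdict(Counter)
        self.vocab: set[str] = set()

    def _features(self, words: list[str]) -> list[str]:
        # Keyword features, plus a concept feature for every word with a known concept.
        feats = [f"kw:{w}" for w in words]
        feats += [f"concept:{self.concept_of[w]}" for w in words if w in self.concept_of]
        return feats

    def fit(self, samples: list[tuple[list[str], str]]) -> "NaiveBayesIntent":
        for words, intent in samples:
            self.class_counts[intent] += 1
            for f in self._features(words):
                self.feature_counts[intent][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, words: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        feats = self._features(words)
        best, best_score = None, float("-inf")
        for intent, n in self.class_counts.items():
            counts = self.feature_counts[intent]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = log(n / n_docs) + sum(log((counts[f] + self.alpha) / denom) for f in feats)
            if score > best_score:
                best, best_score = intent, score
        return best
```

The concept features let the model generalize. A word never seen as a keyword in training, such as "train", can still be assigned to the travel intention through its concept.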

web information and data management | 2002

Ranking user's relevance to a topic through link analysis on web logs

Jidong Wang; Zheng Chen; Li Tao; Wei-Ying Ma; Liu Wenyin

Computing a web user's relevance to a given topic is an important task for any personalization service on the Web. Since a web user's interests and preferences are revealed in their Web browsing history, in this paper we develop a novel approach that uses Web logs to compute the relevance of a web user to a given query. In contrast to traditional methods based purely on textual analysis, our approach calculates web users' relevance through link analysis under a unified framework in which the importance of web pages and the relevance of web users mutually reinforce each other iteratively. The experimental results show that our approach achieves 53% accuracy when ranking web users' relevance to a search topic.
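The mutual reinforcement between web pages and web users can be sketched in Python as a HITS-style iteration on the bipartite user–page visit graph. The sketch makes two assumptions. The visit log is given as a dict from user to visited pages, and topic dependence comes from restricting the graph to a hypothetical set of topic-relevant pages. Both assumptions, and the name `rank_users`, are illustrative; the paper's exact framework may weight or combine scores differently.

```python
from math import sqrt


def rank_users(visits: dict[str, set[str]], topic_pages: set[str],
               iterations: int = 50) -> dict[str, float]:
    """HITS-style mutual reinforcement between users and topic pages (illustrative)."""
    # Keep only visits to pages assumed to be relevant to the topic.
    links = {u: pages & topic_pages for u, pages in visits.items()}
    page_score = {p: 1.0 for p in topic_pages}
    user_score = {u: 0.0 for u in links}

    for _ in range(iterations):
        # A user is relevant if they visit important pages ...
        user_score = {u: sum(page_score[p] for p in ps) for u, ps in links.items()}
        norm = sqrt(sum(s * s for s in user_score.values())) or 1.0
        user_score = {u: s / norm for u, s in user_score.items()}

        # ... and a page is important if relevant users visit it.
        page_score = {p: 0.0 for p in topic_pages}
        for u, ps in links.items():
            for p in ps:
                page_score[p] += user_score[u]
        norm = sqrt(sum(s * s for s in page_score.values())) or 1.0
        page_score = {p: s / norm for p, s in page_score.items()}

    return user_score
```

Users who visit many topic pages rank above users who visit few. A user who visits no topic pages scores zero.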

World Wide Web | 2014

Building emotional dictionary for sentiment analysis of online news

Yanghui Rao; Jingsheng Lei; Liu Wenyin; Qing Li; Mingliang Chen

Sentiment analysis of online documents such as news articles, blogs, and microblogs has received increasing attention in recent years. In this article, we propose an efficient algorithm and three pruning strategies to automatically build a word-level emotional dictionary for social emotion detection. In the dictionary, each word is associated with a distribution over a series of human emotions. In addition, a method based on topic modeling is proposed to construct a topic-level dictionary, in which each topic is correlated with social emotions. Experiments on real-world data sets have validated the effectiveness and reliability of the methods. Compared with other lexicons, the dictionary generated by our approach is language-independent, fine-grained, and unlimited in volume. The generated dictionary has a wide range of applications, including predicting the emotional distribution of news articles and identifying social emotions toward certain entities and news events.
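To illustrate what a word-level emotional dictionary looks like, here is a simple Python sketch that associates each word with a distribution over emotions. It assumes each training document comes with reader emotion votes. A word's distribution is taken as the normalized sum of the emotion distributions of the documents containing it. The sketch then applies one hypothetical pruning rule: drop words that appear in fewer than `min_docs` documents. This aggregation and pruning are stand-ins chosen for clarity, not the paper's algorithm or its three pruning strategies.

```python
from collections import defaultdict


def build_emotion_dictionary(docs: list[tuple[list[str], list[int]]],
                             emotions: list[str],
                             min_docs: int = 2) -> dict[str, dict[str, float]]:
    """Word -> emotion distribution, aggregated from document-level votes (illustrative)."""
    totals: defaultdict = defaultdict(lambda: [0.0] * len(emotions))
    doc_freq: defaultdict = defaultdict(int)

    for words, votes in docs:
        n_votes = sum(votes)
        if n_votes == 0:
            continue  # a document without votes carries no emotion signal
        doc_dist = [v / n_votes for v in votes]
        for w in set(words):  # count each word once per document
            doc_freq[w] += 1
            for k, p in enumerate(doc_dist):
                totals[w][k] += p

    dictionary = {}
    for w, acc in totals.items():
        if doc_freq[w] < min_docs:
            continue  # hypothetical pruning: too rare to be reliable
        s = sum(acc)
        dictionary[w] = {e: a / s for e, a in zip(emotions, acc)}
    return dictionary
```

For example, "win" appears in one article voted mostly joyful and in one evenly split article. It ends up leaning towards joy (0.625 joy, 0.375 anger).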

international conference on multimedia and expo | 2001

A video text detection and recognition system

Jie Xi; Xian-Sheng Hua; Xiang-Rong Chen; Liu Wenyin; HongJiang Zhang

In this paper, we propose a new system for extracting text information from news videos. First, a method that integrates text detection and text tracking is developed to locate text areas in the key frames (images), together with a scheme to evaluate the performance of this approach. To obtain better recognition results, we then enhance the quality of the detected text blocks by multi-frame averaging. Finally, we use an adaptive thresholding method to binarize the text blocks and recognize the text using an off-the-shelf OCR module. The detection and recognition rates of the proposed system are 94.7% and 67.5%, respectively.
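The two enhancement steps before OCR can be sketched in plain Python on small grayscale grids. The first is multi-frame averaging of a tracked text block. The second is binarization with a locally adaptive threshold. The sketch assumes the frames are already aligned and equally sized, and that text is brighter than its surroundings. Local-mean thresholding with the hypothetical parameters `radius` and `offset` stands in for whatever adaptive method the system actually uses.

```python
Image = list[list[float]]


def average_frames(frames: list[Image]) -> Image:
    """Pixel-wise mean of aligned frames: static text stays sharp, changing background blurs."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]


def adaptive_threshold(img: Image, radius: int = 1, offset: float = 0.0) -> list[list[int]]:
    """Mark a pixel as text (1) if it is brighter than its local mean plus `offset`."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood window, clipped at the image border.
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] > local_mean + offset else 0)
        out.append(row)
    return out
```

Averaging first suppresses frame-to-frame background variation. The local threshold then separates text from a background whose brightness may vary across the block.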

international acm sigir conference on research and development in information retrieval | 2003

Building a web thesaurus from web link structure

Zheng Chen; Shengping Liu; Liu Wenyin; Geguang Pu; Wei-Ying Ma

Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The proposed approach is able to identify new terms and reflect the latest relationships between terms as the Web evolves. First, a set of high-quality, representative websites in a specific domain is selected. After navigational links are filtered out, link analysis is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. Experimental results on automatic query expansion using the constructed thesaurus show a 20% improvement in search precision over the baseline.
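The final merging step can be illustrated with a short Python sketch. It assumes each website's content structure has already been reduced to a set of `(broader term, narrower term)` edges. It assumes the merge keeps an edge when at least `min_support` of the selected websites agree on it. The edge representation, the support rule, and `merge_content_structures` are hypothetical simplifications, not the paper's merging procedure.

```python
from collections import Counter

Edge = tuple[str, str]  # (broader term, narrower term)


def merge_content_structures(sites: list[set[Edge]], min_support: int = 2) -> dict[str, set[str]]:
    """Merge per-site content structures into broader -> narrower term relations."""
    # Count how many websites contain each relation (each site is a set, so it votes once).
    support = Counter(edge for site in sites for edge in site)
    thesaurus: dict[str, set[str]] = {}
    for (broader, narrower), n in support.items():
        if n >= min_support:  # hypothetical rule: keep relations confirmed by several sites
            thesaurus.setdefault(broader, set()).add(narrower)
    return thesaurus
```

Requiring agreement across websites filters out relations that reflect one site's idiosyncratic organization rather than the domain's vocabulary.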
