Ziming Zhuang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ziming Zhuang is active.

Explore More

Publication

Featured researches published by Ziming Zhuang.

international acm sigir conference on research and development in information retrieval | 2008

Real-time automatic tag recommendation

Yang Song; Ziming Zhuang; Huajing Li; Qiankun Zhao; Jia Li; Wang-Chien Lee; C. Lee Giles

Tags are user-generated labels for entities. Existing research on tag recommendation either focuses on improving its accuracy or on automating the process, while ignoring the efficiency issue. We propose a highly-automated novel framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags), and represented in two bipartite graphs, which are partitioned into clusters by Spectral Recursive Embedding (SRE). Tags in each topical cluster are ranked by our novel ranking algorithm. A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution into mixture components within each cluster and aggregate words into word clusters simultaneously. A new document is classified by the mixture model based on its posterior probabilities so that tags are recommended according to their ranks. Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages del.icio.us) indicate that our framework is capable of making tag recommendation efficiently and effectively. The average tagging time for testing a document is around 1 second, with over 88% test documents correctly labeled with the top nine tags we suggested.

web search and data mining | 2008

Collaboration over time: characterizing and modeling network evolution

Jian Huang; Ziming Zhuang; Jia Li; C. Lee Giles

A formal type of scientific and academic collaboration is coauthorship which can be represented by a coauthorship network. Coauthorship networks are among some of the largest social networks and offer us the opportunity to study the mechanisms underlying large-scale real world networks. We construct such a network for the Computer Science field covering research collaborations from 1980 to 2005, based on a large dataset of 451,305 papers authored by 283,174 distinct researchers. By mining this network, we first present a comprehensive study of the network statistical properties for a longitudinal network at the overall network level as well as for the intermediate community level. Major observations are that the database community is the best connected while the AI community is the most assortative, and that the Computer Science field as a whole shows a collaboration pattern more similar to Mathematics than to Biology. Moreover, the small world phenomenon and the scale-free degree distribution accompany the growth of the network. To study the individual collaborations, we propose a novel stochastic model, Stochastic Poisson model with Optimization Tree (Spot)to efficiently predict any increment of collaboration based on the local neighborhood structure. Spot models the non-stationary Poisson process by maximizing the log-likelihood with a tree structure. Empirical results show that Spot outperforms Support Vector Regression by better fitting collaboration records and predicting the rate of collaboration

conference on information and knowledge management | 2006

Re-ranking search results using query logs

Ziming Zhuang; Silviu Cucerzan

This work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the users search intent, and fresh and relevant pages may not get high ranks for an underspecified query due to their freshness and to the large number of pages that match the query, despite the fact that a large number of users have searched for parts of their content recently. We propose a novel method, Q-Rank, to effectively refine the ranking of search results for any given query by constructing the query context from search query logs. Evaluation results show that Q-Rank gains a considerable advantage over the current ranking system of a large-scale commercial Web search engine, being able to improve the relevance of search results for 82% of the queries.

acm/ieee joint conference on digital libraries | 2007

Measuring conference quality by mining program committee characteristics

Ziming Zhuang; Ergin Elmacioglu; Dongwon Lee; C. Lee Giles

Bibliometrics are important measures for venue quality in digital libraries. Impacts of venues are usually the major consideration for subscription decision-making, and for ranking and recommending high-quality venues and documents. For digital libraries in the Computer Science literature domain, conferences play a major role as an important publication and dissemination outlet. However, with a recent profusion of conferences and rapidly expanding fields, it is increasingly challenging for researchers and librarians to assess the quality of conferences. We propose a set of novel heuristics to automatically discover prestigious (and low-quality) conferences by mining the characteristics of Program Committee members. We examine the proposed cues both in isolation and combination under a classification scheme. Evaluation on a collection of 2,979 conferences and 16,147 PC members shows that our heuristics, when combined, correctly classify about 92% of the conferences, with a low false positive rate of 0.035 and a recall of more than 73% for identifying reputable conferences. Furthermore, we demonstrate empirically that our heuristics can also effectively detect a set of low-quality conferences, with a false positive rate of merely 0.002. We also report our experience of detecting two previously unknown low-quality conferences. Finally, we apply the proposed techniques to the entire quality spectrum by ranking conferences in the collection.

international world wide web conferences | 2007

A large-scale study of robots.txt

Yang Sun; Ziming Zhuang; C. Lee Giles

Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such crawling activities can be regulated from the server side by deploying the Robots Exclusion Protocol in a file called robots.txt. Although it is not an enforcement standard, ethical robots (and many commercial) will follow the rules specified in robots.txt. With our focused crawler, we investigate 7,593 websites from education, government, news, and business domains. Five crawls have been conducted in succession to study the temporal changes. Through statistical analysis of the data, we present a survey of the usage of Web robots rules at the Web scale. The results also show that the usage of robots.txt has increased over time.

international conference on web engineering | 2007

Efficiently detecting webpage updates using samples

Qingzhao Tan; Ziming Zhuang; Prasenjit Mitra; C. Lee Giles

Due to resource constraints, Web archiving systems and search engines usually have difficulties keeping the local repository completely synchronized with the Web. To address this problem, sampling-based techniques periodically poll a subset of webpages in the local repository to detect changes on the Web, and update the local copies accordingly. The goal of such an approach is to discover as many changed webpages as possible within the boundary of the available resources. In this paper we advance the state-of-art of the sampling-based techniques by answering a challenging question: Given a sampled webpage that has been updated, which other webpages are also likely to have changed? We propose a set of sampling policies with various downloading granularities, taking into account the link structure, the directory structure, and the content-based features. We also investigate the update history and the popularity of the webpages to adaptively model the download probability. We ran extensive experiments on a real web data set of about 300,000 distinct URLs distributed among 210 websites. The results showed that our sampling-based algorithm can detect about three times as many changed webpages as the baseline algorithm. It also showed that the changed webpages are most likely to be found in the same directory and the upper directories of the changed sample. By applying clustering algorithm on all the webpages, pages with similar change pattern are grouped together so that updated webpages can be found in the same cluster as the changed sample. Moreover, our adaptive downloading strategies significantly outperform the static ones in detecting changes for the popular webpages.

international world wide web conferences | 2007

Designing efficient sampling techniques to detect webpage updates

Qingzhao Tan; Ziming Zhuang; Prasenjit Mitra; C. Lee Giles

Due to resource constraints, Web archiving systems and search engines usually have difficulties keeping the entire local repository synchronized with the Web. We advance the state-of-art of the sampling-based synchronization techniques by answering a challenging question: Given a sampled webpage and its change status, which other webpages are also likely to change? We present a study of various downloading granularities and policies, and propose an adaptive model based on the update history and the popularity of the webpages. We run extensive experiments on a large dataset of approximately 300,000 webpages to demonstrate that it is most likely to find more updated webpages in the current or upper directories of the changed samples. Moreover, the adaptive strategies outperform the non-adaptive one in terms of detecting important changes.

european conference on principles of data mining and knowledge discovery | 2006

Network flow for collaborative ranking

Ziming Zhuang; Silviu Cucerzan; C. Lee Giles

In query based Web search, a significant percentage of user queries are underspecified, most likely by naive users. Collaborative ranking helps the naive user by exploiting the collective expertise. We present a novel algorithmic model inspired by the network flow theory, which constructs a search network based on search engine logs to describe the relationship between the relevant entities in search: queries, documents, and users. This formal model permits the theoretical investigation of the nature of collaborative ranking in more concrete terms, and the learning of the dependence relations among the different entities. FlowRank, an algorithm derived from this model through an analysis of empirical usage patterns, is implemented and evaluated. We empirically show its potential in experiments involving real-world user relevance ratings and a random sample of 1,334 documents and 100 queries from a popular document search engine. Definite improvements over two baseline ranking algorithms for approximately 47% of the queries are reported.

Archive | 2006