C. Lee Giles | Researchain

Publication

Featured research published by C. Lee Giles.

Nature | 1999

Accessibility of information on the web.

Steve Lawrence; C. Lee Giles

Search engines do not index sites equally, may not index new pages for months, and no engine indexes more than about 16% of the web. As the web becomes a major communications medium, the data on it must be made more accessible.

knowledge discovery and data mining | 2000

Efficient identification of Web communities

Gary William Flake; Steve Lawrence; C. Lee Giles

We define a community on the web as a set of sites that have more links (in either direction) to members of the community than to non-members. Members of such a community can be efficiently identified in a maximum flow / minimum cut framework, where the source is composed of known members, and the sink consists of well-known non-members. A focused crawler that crawls to a fixed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node. The effectiveness of the approximation algorithm is demonstrated with several crawl results that identify hubs, authorities, web rings, and other link topologies that are useful but not easily categorized. Applications of our approach include focused crawlers and search engines, automatic population of portal categories, and improved filtering.
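The maximum flow / minimum cut idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `find_community`, the unit capacity on every link, the virtual `_source` and `_sink` nodes, and the toy graph are all assumptions made for illustration. The community is taken to be the set of sites still reachable from the known members once a maximum flow has saturated the cut.

```python
from collections import defaultdict, deque

def find_community(links, seeds, non_members):
    """Return the source side of a minimum cut between seeds and non_members.

    links: iterable of (site_a, site_b) pairs, treated as undirected
    with capacity 1 per link (an assumption of this sketch).
    """
    if set(seeds) & set(non_members):
        raise ValueError("a site cannot be both a seed and a non-member")
    inf = float("inf")
    cap = defaultdict(lambda: defaultdict(int))
    for a, b in links:
        cap[a][b] += 1
        cap[b][a] += 1
    src, snk = "_source", "_sink"   # hypothetical virtual nodes
    for s in seeds:
        cap[src][s] = inf           # known members are tied to the source
        cap[s][src] += 0            # make sure the reverse entry exists
    for t in non_members:
        cap[t][snk] = inf           # known non-members are tied to the sink
        cap[snk][t] += 0

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is cut off.
    while True:
        parent = reachable()
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck
    return set(parent) - {src}

# Toy graph: a dense group {a, b, c, d}, a dense group {x, y, z},
# and a single bridging link c-x.
links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("x", "y"), ("x", "z"), ("y", "z"), ("c", "x")]
community = find_community(links, seeds={"a"}, non_members={"z"})
print(sorted(community))  # ['a', 'b', 'c', 'd']
```

In this toy case the only minimum cut is the bridging link, so the seed's dense group is returned. The crawler-based approximation described in the abstract, which links crawled nodes to a virtual sink, is not reproduced here.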

IEEE Computer | 1999

Digital libraries and autonomous citation indexing

Steve Lawrence; C. Lee Giles; Kurt D. Bollacker

The revolution the Web has brought to information dissemination is not so much due to the availability of data (huge amounts of information have long been available in libraries) as to the improved efficiency of access to that information. The Web promises to make more scientific articles more easily available. By making the context of citations easily and quickly browsable, autonomous citation indexing can help to evaluate the importance of individual contributions more accurately and quickly. Digital libraries incorporating ACI can help organize scientific literature and may significantly improve the efficiency of dissemination and feedback. ACI may also help speed the transition to scholarly electronic publishing.

acm international conference on digital libraries | 1998

CiteSeer: an automatic citation indexing system

C. Lee Giles; Kurt D. Bollacker; Steve Lawrence

We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g. the ISI citation indexes), including: literature retrieval by following citation links (e.g. by providing a list of papers that cite a given paper), the evaluation and ranking of papers, authors, journals, etc. based on the number of citations, and the identification of research trends. CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations. Given a particular paper of interest, CiteSeer can display the context of how the paper is cited in subsequent publications. This context may contain a brief summary of the paper, another author’s response to the paper, or subsequent work which builds upon the original article. CiteSeer allows the location of papers by keyword search or by citation links. Papers related to a given paper can be located using common citation information or word vector similarity. CiteSeer will soon be available for public use.
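As a rough illustration of locating related papers through common citation information, the sketch below ranks papers by the overlap of their reference lists. The Jaccard score, the function name `related_by_citations`, and the paper identifiers are assumptions chosen for this example; the abstract does not say which measure CiteSeer uses.

```python
def related_by_citations(cites, paper, top_k=3):
    """Rank other papers by Jaccard overlap of reference lists with `paper`.

    cites: dict mapping a paper id to the set of works it cites.
    Returns up to top_k (paper_id, score) pairs, best first.
    """
    target = cites[paper]
    scored = []
    for other, refs in cites.items():
        if other == paper:
            continue
        shared = target & refs
        if shared:  # papers with no common citation are not related here
            scored.append((len(shared) / len(target | refs), other))
    # Highest score first; ties broken by paper id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(other, round(score, 3)) for score, other in scored[:top_k]]

# Hypothetical reference lists.
cites = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B", "D"},
    "P3": {"C", "E"},
    "P4": {"F"},
}
print(related_by_citations(cites, "P1"))  # [('P2', 0.5), ('P3', 0.25)]
```

P2 shares two of four distinct references with P1 and P3 shares one of four, while P4 shares none and is left out.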

Applied Optics | 1987

Learning, invariance, and generalization in high-order neural networks

C. Lee Giles; Tom Maxwell

High-order neural networks have been shown to have impressive computational, storage, and learning capabilities. This performance is because the order or structure of a high-order neural network can be tailored to the order or structure of a problem. Thus, a neural network designed for a particular class of problems becomes specialized but also very efficient in solving those problems. Furthermore, a priori knowledge, such as geometric invariances, can be encoded in high-order networks. Because this knowledge does not have to be learned, these networks are very efficient in solving problems that utilize this knowledge.

Proceedings of the National Academy of Sciences of the United States of America | 2002

Winners don't take all: Characterizing the competition for links on the web

David M. Pennock; Gary William Flake; Steve Lawrence; Eric J. Glover; C. Lee Giles

As a whole, the World Wide Web displays a striking “rich get richer” behavior, with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic. However, hidden in this skewed global distribution, we discover a qualitatively different and considerably less biased link distribution among subcategories of pages—for example, among all university homepages or all newspaper homepages. Although the connectivity distribution over the entire web is close to a pure power law, we find that the distribution within specific categories is typically unimodal on a log scale, with the location of the mode, and thus the extent of the rich get richer phenomenon, varying across different categories. Similar distributions occur in many other naturally occurring networks, including research paper citations, movie actor collaborations, and United States power grid connections. A simple generative model, incorporating a mixture of preferential and uniform attachment, quantifies the degree to which the rich nodes grow richer, and how new (and poorly connected) nodes can compete. The model accurately accounts for the true connectivity distributions of category-specific web pages, the web as a whole, and other social networks.
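The generative model mentioned in this abstract, a mixture of preferential and uniform attachment, can be sketched as a small simulation. This is an illustrative reading rather than the paper's exact formulation: the mixing parameter `alpha`, the function name `grow_network`, the two-node starting graph, and the default of one link per new node are assumptions.

```python
import random

def grow_network(n_nodes, alpha, links_per_node=1, seed=0):
    """Grow a network and return the list of node degrees.

    Each new node adds links_per_node links. With probability alpha the
    target is chosen in proportion to its current degree (preferential
    attachment); otherwise it is chosen uniformly among existing nodes.
    """
    rng = random.Random(seed)
    degree = [1, 1]       # start from two nodes joined by one link
    endpoints = [0, 1]    # every node appears once per link end it holds
    for new in range(2, n_nodes):
        targets = []
        for _ in range(links_per_node):
            if rng.random() < alpha:
                # Sampling a link end picks a node proportionally to degree.
                targets.append(rng.choice(endpoints))
            else:
                targets.append(rng.randrange(new))  # uniform over older nodes
        degree.append(links_per_node)
        for t in targets:
            degree[t] += 1
            endpoints.extend((t, new))
    return degree

# Compare the best-connected node under the two extremes of the mixture.
for alpha in (0.0, 0.5, 1.0):
    degrees = grow_network(2000, alpha)
    print(alpha, max(degrees))
```

Under these assumptions, values of `alpha` near 1 would be expected to concentrate links on a few nodes, while values near 0 spread them more evenly. The printed maxima depend on the random seed.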

adaptive agents and multi-agents systems | 1998

CiteSeer: an autonomous Web agent for automatic retrieval and identification of interesting publications

Kurt D. Bollacker; Steve Lawrence; C. Lee Giles

Research papers available on the World Wide Web (WWW or Web) are often poorly organized, often exist in forms opaque to search engines (e.g. Postscript), and increase in quantity daily. Significant amounts of time and effort are typically needed in order to find interesting and relevant publications on the Web. We have developed a Web based information agent that assists the user in the process of performing a scientific literature search. Given a set of keywords, the agent uses Web search engines and heuristics to locate and download papers. The papers are parsed in order to extract information features such as the abstract and individually identified citations. The agent's Web interface can be used to find relevant papers in the database using keyword searches, or by navigating the links between papers formed by the citations. Links to both citing and cited publications can be followed. In addition to simple browsing and keyword searches, the agent can find papers which are similar to a given paper using word information and by analyzing common citations made by the papers.

acm/ieee joint conference on digital libraries | 2004

Two supervised learning approaches for name disambiguation in author citations

Hui Han; C. Lee Giles; Hongyuan Zha; Cheng Li; Kostas Tsioutsiouliklis

Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, Web search, database integration, and may cause improper attribution to authors. We investigate two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses support vector machines (SVMs) [V. Vapnik (1995)] and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: coauthor names, the title of the paper, and the title of the journal or proceeding. We illustrate these two approaches on two types of data, one collected from the Web, mainly publication lists from homepages, the other collected from the DBLP citation databases.
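The naive Bayes approach can be illustrated with a small sketch that assigns a citation to one of several candidate authors sharing a name. It is a simplification and not the paper's model: all three attribute types are pooled into a single bag of prefixed tokens with add-one smoothing, and the author labels, coauthor names, and function names `train` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(citations):
    """citations: list of (author_id, tokens) with prefixed feature tokens."""
    doc_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for author, tokens in citations:
        doc_counts[author] += 1
        token_counts[author].update(tokens)
        vocab.update(tokens)
    return doc_counts, token_counts, vocab

def predict(model, tokens):
    """Return the author id with the highest posterior log-probability."""
    doc_counts, token_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_author, best_score = None, -math.inf
    for author in sorted(doc_counts):
        counts = token_counts[author]
        denom = sum(counts.values()) + len(vocab)   # add-one smoothing
        score = math.log(doc_counts[author] / total_docs)
        for tok in tokens:
            if tok in vocab:                        # ignore unseen tokens
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_author, best_score = author, score
    return best_author

# Hypothetical training citations for two people both named "J. Smith".
training = [
    ("J. Smith (ML)", ["co:k. alvarez", "title:neural", "title:networks", "venue:nips"]),
    ("J. Smith (ML)", ["co:k. alvarez", "title:learning", "venue:icml"]),
    ("J. Smith (bio)", ["co:r. patel", "title:protein", "title:folding", "venue:bioinformatics"]),
    ("J. Smith (bio)", ["co:r. patel", "title:gene", "venue:bioinformatics"]),
]
model = train(training)
print(predict(model, ["co:k. alvarez", "title:networks", "venue:kdd"]))  # J. Smith (ML)
```

The prefixes `co:`, `title:`, and `venue:` keep a word in a title distinct from the same word in a venue name, which loosely mirrors the three attribute types listed in the abstract.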

Machine Learning | 2001

Noisy Time Series Prediction using Recurrent Neural Networks and Grammatical Inference

C. Lee Giles; Steve Lawrence; Ah Chung Tsoi

Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. We show that the symbolic representation aids the extraction of symbolic knowledge from the trained recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Automata rules related to well known behavior such as trend following and mean reversal are extracted.
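The effect of rejecting low-confidence examples on the reported error rate can be made concrete with a short sketch. The data below are invented for illustration and do not reproduce the paper's results; the function name `error_rate_with_rejection` and the confidence threshold are assumptions.

```python
def error_rate_with_rejection(predictions, threshold=0.0):
    """Compute the error rate over predictions kept after rejection.

    predictions: list of (predicted, actual, confidence) triples.
    Returns (error_rate, coverage); error_rate is None if nothing is kept.
    """
    kept = [(p, a) for p, a, conf in predictions if conf >= threshold]
    if not kept:
        return None, 0.0
    errors = sum(1 for p, a in kept if p != a)
    return errors / len(kept), len(kept) / len(predictions)

# Made-up direction-of-change predictions with confidence scores.
preds = [
    ("up", "up", 0.90),
    ("down", "up", 0.55),
    ("up", "down", 0.60),
    ("down", "down", 0.80),
    ("up", "up", 0.52),
]
print(error_rate_with_rejection(preds))        # (0.4, 1.0)
print(error_rate_with_rejection(preds, 0.7))   # (0.0, 0.4)
```

Raising the threshold trades coverage for accuracy: fewer days receive a prediction, but the predictions that remain are the ones the system is more confident about.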

international acm sigir conference on research and development in information retrieval | 2008

Real-time automatic tag recommendation

Yang Song; Ziming Zhuang; Huajing Li; Qiankun Zhao; Jia Li; Wang-Chien Lee; C. Lee Giles

Tags are user-generated labels for entities. Existing research on tag recommendation either focuses on improving its accuracy or on automating the process, while ignoring the efficiency issue. We propose a highly-automated novel framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags), and represented in two bipartite graphs, which are partitioned into clusters by Spectral Recursive Embedding (SRE). Tags in each topical cluster are ranked by our novel ranking algorithm. A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution into mixture components within each cluster and aggregate words into word clusters simultaneously. A new document is classified by the mixture model based on its posterior probabilities so that tags are recommended according to their ranks. Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that our framework is capable of making tag recommendation efficiently and effectively. The average tagging time for testing a document is around 1 second, with over 88% test documents correctly labeled with the top nine tags we suggested.
