Jeannette C. M. Janssen

Publication

Featured research published by Jeannette C. M. Janssen.

Knowledge and Information Systems | 2004

Characterizing and Mining the Citation Graph of the Computer Science Literature

Yuan An; Jeannette C. M. Janssen; Evangelos E. Milios

Citation graphs representing a body of scientific literature convey measures of scholarly activity and productivity. In this work we present a study of the structure of the citation graph of the computer science literature. Using a web robot we built several topic-specific citation graphs and their union graph from the digital library ResearchIndex. After verifying that the degree distributions follow a power law, we applied a series of graph theoretical algorithms to elicit an aggregate picture of the citation graph in terms of its connectivity. We discovered the existence of a single large weakly-connected and a single large biconnected component, and confirmed the expected lack of a large strongly-connected component. The large components remained even after removing the strongest authority nodes or the strongest hub nodes, indicating that such tight connectivity is widespread and does not depend on a small subset of important nodes. Finally, minimum cuts between authority papers of different areas did not result in a balanced partitioning of the graph into areas, pointing to the need for more sophisticated algorithms for clustering the graph.
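
The connectivity analysis described in this abstract can be illustrated with a minimal sketch. It assumes the citation graph is given as a list of (citing, cited) pairs and finds weakly connected components by ignoring edge direction. The function name and the toy data are hypothetical; this is not the authors' code and the data is not from ResearchIndex.

```python
from collections import defaultdict, deque

def weakly_connected_components(edges):
    """Return the weakly connected components of a directed graph, largest first."""
    neighbors = defaultdict(set)
    for citing, cited in edges:
        # Weak connectivity ignores direction, so store each edge both ways.
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy citation graph: (citing paper, cited paper).
toy_citations = [("p1", "p2"), ("p3", "p2"), ("p4", "p3"), ("p5", "p6")]
components = weakly_connected_components(toy_citations)
largest = components[0]
```

On a real citation graph, comparing the size of `largest` with the total number of nodes gives the kind of aggregate picture the abstract refers to.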

Journal of Algorithms | 2000

Distributed Online Frequency Assignment in Cellular Networks

Jeannette C. M. Janssen; Danny Krizanc; Lata Narayanan; Sunil M. Shende

A cellular network is generally modeled as a subgraph of the triangular lattice. The distributed online frequency assignment problem can be abstracted as a multicoloring problem on a weighted graph, where the weight vector associated with the vertices models the number of calls to be served at the vertices and is assumed to change over time. In this paper, we develop a framework for studying distributed online frequency assignment in cellular networks. We present the first distributed online algorithms for this problem with proven bounds on their competitive ratios. We show a series of algorithms that use at each vertex information about increasingly larger neighborhoods of the vertex, and that achieve better competitive ratios. In contrast, we show lower bounds on the competitive ratios of some natural classes of online algorithms.
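
To make the multicoloring abstraction concrete, here is a minimal sketch of a centralized greedy multicoloring. It assumes a static weight vector and processes vertices in sorted order, so it is neither distributed nor online and is not one of the algorithms from the paper. The names `greedy_multicoloring`, `cells`, and `calls` are hypothetical.

```python
def greedy_multicoloring(adjacency, demand):
    """Give each vertex v a set of demand[v] frequencies that is disjoint
    from the sets already given to its neighbors."""
    assignment = {}
    for v in sorted(adjacency):
        # Frequencies already used by neighbors that were processed earlier.
        blocked = set()
        for u in adjacency[v]:
            blocked |= assignment.get(u, set())
        # Take the smallest unblocked frequencies until the demand is met.
        chosen = set()
        freq = 0
        while len(chosen) < demand[v]:
            if freq not in blocked:
                chosen.add(freq)
            freq += 1
        assignment[v] = chosen
    return assignment

# Hypothetical example: three mutually adjacent cells (one lattice triangle).
cells = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
calls = {"a": 2, "b": 1, "c": 2}  # number of calls to serve at each cell
frequencies = greedy_multicoloring(cells, calls)
```

Because the three cells are mutually adjacent, any valid assignment needs at least 2 + 1 + 2 = 5 distinct frequencies, which is what the greedy sketch uses here.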

Knowledge and Information Systems | 2006

Node similarity in the citation graph

Wangzhong Lu; Jeannette C. M. Janssen; Evangelos E. Milios; Nathalie Japkowicz; Yongzheng Zhang

Published scientific articles are linked together into a graph, the citation graph, through their citations. This paper explores the notion of similarity based on connectivity alone, and proposes several algorithms to quantify it. Our metrics take advantage of the local neighborhoods of the nodes in the citation graph. Two variants of link-based similarity estimation between two nodes are described, one based on the separate local neighborhoods of the nodes, and another based on the joint local neighborhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on a subgraph of the citation graph of computer science in a retrieval context. The results are compared with text-based similarity, and demonstrate the complementarity of link-based and text-based retrieval.
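
As a rough illustration of similarity based on connectivity alone, the sketch below compares the immediate neighborhoods of two nodes with the Jaccard coefficient. This is a standard stand-in chosen for brevity, not one of the metrics proposed in the paper, and the function names and toy edges are hypothetical.

```python
def neighborhood(edges, node):
    """Return all nodes linked to `node` by a citation in either direction."""
    result = set()
    for citing, cited in edges:
        if citing == node:
            result.add(cited)
        elif cited == node:
            result.add(citing)
    return result

def jaccard_similarity(edges, a, b):
    """Share of neighbors that a and b have in common, between 0.0 and 1.0."""
    na, nb = neighborhood(edges, a), neighborhood(edges, b)
    union = na | nb
    if not union:
        return 0.0  # two isolated nodes: define similarity as zero
    return len(na & nb) / len(union)

# Hypothetical toy graph: x and y both cite r2, z cites something unrelated.
toy_edges = [("x", "r1"), ("x", "r2"), ("y", "r2"), ("y", "r3"), ("z", "r4")]
sim_xy = jaccard_similarity(toy_edges, "x", "y")
sim_xz = jaccard_similarity(toy_edges, "x", "z")
```

Here `x` and `y` share one of three distinct neighbors, while `x` and `z` share none, so the first pair scores higher without any text being compared.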

Workshop on Algorithms and Models for the Web Graph | 2007

A spatial web graph model with local influence regions

William Aiello; Anthony Bonato; Colin Cooper; Jeannette C. M. Janssen; Pawel Pralat

The web graph may be considered as embedded in a topic space, with a metric that expresses the extent to which web pages are related to each other. Using this assumption, we present a new model for the web and other complex networks, based on a spatial embedding of the nodes, called the Spatial Preferred Attachment (SPA) model. In the SPA model, nodes have influence regions of varying size, and new nodes may only link to a node if they fall within its influence region. We prove that our model gives a power law in-degree distribution, with exponent in (2, ∞) depending on the parameters, and with concentration for a wide range of in-degree values. We also show that the model allows for edges that span a large distance in the underlying space, modelling a feature often observed in real-world complex networks.

Web Information and Data Management | 2004

Probabilistic models for focused web crawling

Hongyu Liu; Evangelos E. Milios; Jeannette C. M. Janssen

A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerful modelling of context as well as the current observations. Probabilistic models, such as Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs), can potentially capture both formatting and context. In this paper, we present the use of HMMs for focused web crawling, and compare it with the Best-First strategy. Furthermore, we discuss the concept of using CRFs to overcome the difficulties with HMMs and support the use of many arbitrary and overlapping features. Finally, we describe the design of a system, currently being implemented, that applies CRFs to focused web crawling.

Internet Mathematics | 2012

Model Selection for Social Networks Using Graphlets

Jeannette C. M. Janssen; Matt Hurshman; Nauzer Kalyaniwalla

Several network models have been proposed to explain the link structure observed in online social networks. This paper addresses the problem of choosing the model that best fits a given real-world network. We implement a model-selection method based on unsupervised learning. An alternating decision tree is trained using synthetic graphs generated according to each of the models under consideration. We use a broad array of features, with the aim of representing different structural aspects of the network. Features include the frequency counts of small subgraphs (graphlets) as well as features capturing the degree distribution and small-world property. Our method correctly classifies synthetic graphs, and is robust under perturbations of the graphs. We show that the graphlet counts alone are sufficient in separating the training data, indicating that graphlet counts are a good way of capturing network structure. We tested our approach on four Facebook graphs from various American universities. The models that best fit these data are those that are based on the principle of preferential attachment.
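
The graphlet features mentioned in this abstract can be illustrated for the smallest case. The sketch below counts the two connected 3-node graphlets (triangles and open paths) in an undirected graph by brute force over all node triples. It is suitable only for small graphs, the function name and toy data are hypothetical, and it does not reproduce the paper's feature set.

```python
from itertools import combinations

def three_node_graphlets(edges):
    """Count induced triangles and induced 2-edge paths in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triangles = 0
    paths = 0
    for a, b, c in combinations(sorted(adj), 3):
        # Number of edges present among the three nodes (0 to 3).
        present = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if present == 3:
            triangles += 1
        elif present == 2:
            paths += 1
    return {"triangle": triangles, "path": paths}

# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
counts = three_node_graphlets(toy_graph)
```

Counts like these, computed for each synthetic graph, could form part of a feature vector for a classifier; larger graphlets and faster counting methods would be needed for graphs of realistic size.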

Bulletin of the American Mathematical Society | 1993