Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Marina Danilevsky is active.

Publication


Featured researches published by Marina Danilevsky.


european conference on machine learning | 2010

Graph regularized transductive classification on heterogeneous information networks

Ming Ji; Yizhou Sun; Marina Danilevsky; Jiawei Han; Jing Gao

A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for some objects. Learning from such labeled and unlabeled data via transductive classification can lead to good knowledge extraction of the hidden network structure. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has not been explored until recently. In this paper, we consider the transductive classification problem on heterogeneous networked data which share a common topic. Only some objects in the given network are labeled, and we aim to predict labels for all types of the remaining objects. A novel graph-based regularization framework, GNetMine, is proposed to model the link structure in information networks with arbitrary network schema and arbitrary number of object/link types. Specifically, we explicitly respect the type differences by preserving consistency over each relation graph corresponding to each type of links separately. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves the classification accuracy over existing state-of-the-art methods.


knowledge discovery and data mining | 2011

Ranking-based classification of heterogeneous information networks

Ming Ji; Jiawei Han; Marina Danilevsky

It has been recently recognized that heterogeneous information networks composed of multiple types of nodes and links are prevalent in the real world. Both classification and ranking of the nodes (or data objects) in such networks are essential for network analysis. However, so far these approaches have generally been performed separately. In this paper, we combine ranking and classification in order to perform more accurate analysis of a heterogeneous information network. Our intuition is that highly ranked objects within a class should play more important roles in classification. On the other hand, class membership information is important for determining a quality ranking over a dataset. We believe it is therefore beneficial to integrate classification and ranking in a simultaneous, mutually enhancing process, and to this end, propose a novel ranking-based iterative classification framework, called RankClass. Specifically, we build a graph-based ranking model to iteratively compute the ranking distribution of the objects within each class. At each iteration, according to the current ranking results, the graph structure used in the ranking algorithm is adjusted so that the sub-network corresponding to the specific class is emphasized, while the rest of the network is weakened. As our experiments show, integrating ranking with classification not only generates more accurate classes than the state-of-art classification methods on networked data, but also provides meaningful ranking of objects within each class, serving as a more informative view of the data than traditional classification.


international conference on data mining | 2011

The Joint Inference of Topic Diffusion and Evolution in Social Communities

Cindy Xide Lin; Qiaozhu Mei; Jiawei Han; Yunliang Jiang; Marina Danilevsky

The prevalence of Web 2.0 techniques has led to the boom of various online communities, where topics spread ubiquitously among user-generated documents. Working together with this diffusion process is the evolution of topic content, where novel contents are introduced by documents which adopt the topic. Unlike explicit user behavior (e.g., buying a DVD), both the diffusion paths and the evolutionary process of a topic are implicit, making their discovery challenging. In this paper, we track the evolution of an arbitrary topic and reveal the latent diffusion paths of that topic in a social community. A novel and principled probabilistic model is proposed which casts our task as an joint inference problem, which considers textual documents, social influences, and topic evolution in a unified way. Specifically, a mixture model is introduced to model the generation of text according to the diffusion and the evolution of the topic, while the whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field. Experiments on both synthetic data and real world data show that the discovery of topic diffusion and evolution benefits from this joint inference, and the probabilistic model we propose performs significantly better than existing methods.


knowledge discovery and data mining | 2013

A phrase mining framework for recursive construction of a topical hierarchy

Chi Wang; Marina Danilevsky; Nihit Desai; Yinan Zhang; Phuong Nguyen; Thrivikrama Taula; Jiawei Han

A high quality hierarchical organization of the concepts in a dataset at different levels of granularity has many valuable applications such as search, summarization, and content browsing. In this paper we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. We characterize each topic in the hierarchy by an integrated ranked list of mixed-length phrases. Our mining framework is based on a phrase-centric view for clustering, extracting, and ranking topical phrases. Experiments with datasets from three different domains illustrate our ability to generate hierarchies of high quality topics represented by meaningful phrases.


siam international conference on data mining | 2014

Automatic construction and ranking of topical keyphrases on collections of short documents

Marina Danilevsky; Chi Wang; Nihit Desai; Xiang Ren; Jingyi Guo; Jiawei Han

We introduce a framework for topical keyphrase generation and ranking, based on the output of a topic model run on a collection of short documents. By shifting from the unigramcentric traditional methods of keyphrase extraction and ranking to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. Our method defines a function to rank topical keyphrases so that more highly ranked keyphrases are considered to be more representative phrases for that topic. We study the performance of our framework on multiple real world document collections, and also show that it is more scalable than comparable phrase-generating models.


Knowledge and Information Systems | 2015

Constructing topical hierarchies in heterogeneous information networks

Chi Wang; Jialu Liu; Nihit Desai; Marina Danilevsky; Jiawei Han

Many digital documentary data collections (e.g., scientific publications, enterprise reports, news articles, and social media) can be modeled as a heterogeneous information network, linking text with multiple types of entities. Constructing high-quality hierarchies that can represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work, we present an algorithm for recursively constructing multi-typed topical hierarchies. Contrary to traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high-quality, multi-typed topical hierarchies.


knowledge discovery and data mining | 2013

EventCube: multi-dimensional search and mining of structured and text data

Fangbo Tao; Kin Hou Lei; Jiawei Han; ChengXiang Zhai; Xiao Cheng; Marina Danilevsky; Nihit Desai; Bolin Ding; Jing Ge; Heng Ji; Rucha Kanade; Anne Kao; Qi Li; Yanen Li; Cindy Xide Lin; Jialu Liu; Nikunj C. Oza; Ashok N. Srivastava; Rodney Tjoelker; Chi Wang; Duo Zhang; Bo Zhao

A large portion of real world data is either text or structured (e.g., relational) data. Moreover, such data objects are often linked together (e.g., structured specification of products linking with the corresponding product descriptions and customer comments). Even for text data such as news data, typed entities can be extracted with entity extraction tools. The EventCube project constructs TextCube and TopicCube from interconnected structured and text data (or from text data via entity extraction and dimension building), and performs multidimensional search and analysis on such datasets, in an informative, powerful, and user-friendly manner. This proposed EventCube demo will show the power of the system not only on the originally designed ASRS (Aviation Safety Report System) data sets, but also on news datasets collected from multiple news agencies, and academic datasets constructed from the DBLP and web data. The system has high potential to be extended in many powerful ways and serve as a general platform for search, OLAP (online analytical processing) and data mining on integrated text and structured data. After the system demo in the conference, the system will be put on the web for public access and evaluation.


international conference on management of data | 2011

WINACS: construction and analysis of web-based computer science information networks

Tim Weninger; Marina Danilevsky; Fabio Fumarola; Joshua M. Hailpern; Jiawei Han; Thomas J. Johnston; Surya Kallumadi; Hyungsul Kim; Zhijin Li; David McCloskey; Yizhou Sun; Nathan E. TeGrotenhuis; Chi Wang; Xiao Yu

WINACS (Web-based Information Network Analysis for Computer Science) is a project that incorporates many recent, exciting developments in data sciences to construct a Web-based computer science information network and to discover, retrieve, rank, cluster, and analyze such an information network. With the rapid development of the Web, huge amounts of information are available in the form of Web documents, structures, and links. It has been a dream of the database and Web communities to harvest such information and reconcile the unstructured nature of the Web with the neat, semi-structured schemas of the database paradigm. Taking computer science as a dedicated domain, WINACS first discovers related Web entity structures, and then constructs a heterogeneous computer science information network in order to rank, cluster and analyze this network and support intelligent and analytical queries.


international conference on data mining | 2013

Constructing Topical Hierarchies in Heterogeneous Information Networks

Chi Wang; Marina Danilevsky; Jialu Liu; Nihit Desai; Heng Ji; Jiawei Han

A digital data collection (e.g., scientific publications, enterprise reports, news, and social media) can often be modeled as a heterogeneous information network, linking text with multiple types of entities. Constructing high-quality concept hierarchies that can represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work we present an algorithm for recursively constructing multi-typed topical hierarchies. Contrary to traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high quality, multi-typed topical hierarchies.


knowledge discovery and data mining | 2013

AMETHYST: a system for mining and exploring topical hierarchies of heterogeneous data

Marina Danilevsky; Chi Wang; Fangbo Tao; Son T. Nguyen; Gong Chen; Nihit Desai; Lidan Wang; Jiawei Han

In this demo we present AMETHYST, a system for exploring and analyzing a topical hierarchy constructed from a heterogeneous information network (HIN). HINs, composed of multiple types of entities and links are very common in the real world. Many have a text component, and thus can benefit from a high quality hierarchical organization of the topics in the network dataset. By organizing the topics into a hierarchy, AMETHYST helps understand search results in the context of an ontology, and explain entity relatedness at different granularities. The automatically constructed topical hierarchy reflects a domain-specific ontology, interacts with multiple types of linked entities, and can be tailored for both free text and OLAP queries.

Collaboration


Dive into the Marina Danilevsky's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Yizhou Sun

University of California

View shared research outputs
Top Co-Authors

Avatar

Heng Ji

Rensselaer Polytechnic Institute

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge