Publication


Featured research published by Sangeetha Kutty.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Report on INEX 2008

T. Beckers; Patrice Bellot; Gianluca Demartini; Ludovic Denoyer; C.M. de Vries; Antoine Doucet; Khairun Nisa Fachry; Norbert Fuhr; Patrick Gallinari; Shlomo Geva; Wei-Che Huang; Tereza Iofciu; Jaap Kamps; Gabriella Kazai; Marijn Koolen; Sangeetha Kutty; Monica Landoni; Miro Lehtonen; Véronique Moriceau; Richi Nayak; Ragnar Nordlie; Nils Pharo; Eric SanJuan; Ralf Schenkel; Xavier Tannier; Martin Theobald; James A. Thom; Andrew Trotman; A.P. de Vries

INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2008 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining.


Document Engineering | 2009

HCX: an efficient hybrid clustering approach for XML documents

Sangeetha Kutty; Richi Nayak; Yuefeng Li

This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.


Focused Access to XML Documents | 2008

Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach

Sangeetha Kutty; Tien Tran; Richi Nayak; Yuefeng Li

This paper presents an experimental study conducted over the INEX 2007 Document Mining Challenge corpus employing a frequent subtree-based incremental clustering approach. Using the structural information of the XML documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents. This matrix is used to progressively cluster the XML documents. Despite the large number of documents in the INEX 2007 Wikipedia dataset, the proposed frequent subtree-based incremental clustering approach was successful in clustering the documents.
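The pipeline described in this abstract (mine closed frequent subtrees, build a document-by-pattern matrix, cluster progressively) can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes a hypothetical `(tag, children)` tree form, restricts patterns to root-to-node tag paths instead of general subtrees, and uses a simple threshold-based incremental assignment with Jaccard similarity. All function names and parameters (`min_support`, `threshold`) are illustrative.

```python
from collections import Counter

# Hypothetical minimal XML tree form: (tag, [children]).
def root_paths(tree, prefix=()):
    """Yield every root-to-node tag path in the tree."""
    tag, children = tree
    path = prefix + (tag,)
    yield path
    for child in children:
        yield from root_paths(child, path)

def closed_frequent_paths(trees, min_support):
    """Frequent paths with no longer frequent extension of equal support."""
    doc_paths = [set(root_paths(t)) for t in trees]
    support = Counter(p for paths in doc_paths for p in paths)
    frequent = {p: s for p, s in support.items() if s >= min_support}
    closed = sorted(
        p for p, s in frequent.items()
        if not any(len(q) > len(p) and q[:len(p)] == p and frequent[q] == s
                   for q in frequent)
    )
    return closed, doc_paths

def pattern_matrix(doc_paths, patterns):
    """Binary document x pattern matrix (the pattern distribution)."""
    return [[1 if p in paths else 0 for p in patterns] for paths in doc_paths]

def incremental_cluster(matrix, threshold):
    """Assign documents one at a time to the most similar cluster (Jaccard
    against the cluster's accumulated pattern set) or open a new cluster."""
    clusters, labels = [], []
    for row in matrix:
        cols = {j for j, v in enumerate(row) if v}
        best, best_sim = None, threshold
        for k, seen in enumerate(clusters):
            union = cols | seen
            sim = len(cols & seen) / len(union) if union else 1.0
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            clusters.append(set(cols))
            labels.append(len(clusters) - 1)
        else:
            clusters[best] |= cols
            labels.append(best)
    return labels

docs = [
    ("article", [("title", []), ("body", [("sec", [])])]),
    ("article", [("title", []), ("body", [("sec", [])])]),
    ("movie", [("actor", []), ("year", [])]),
    ("movie", [("actor", [])]),
]
closed, doc_paths = closed_frequent_paths(docs, min_support=2)
matrix = pattern_matrix(doc_paths, closed)
labels = incremental_cluster(matrix, threshold=0.5)
print(closed)  # [('article', 'body', 'sec'), ('article', 'title'), ('movie', 'actor')]
print(labels)  # [0, 0, 1, 1]
```

A path counts as closed here if no longer frequent path extending it has the same support, which is the usual closedness condition specialised to paths. Real closed-subtree miners handle branching patterns and are considerably more involved.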


INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009

Overview of the INEX 2009 XML mining track: clustering and classification of XML documents

Richi Nayak; Christopher M. De Vries; Sangeetha Kutty; Shlomo Geva; Ludovic Denoyer; Patrick Gallinari

This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.


Knowledge Discovery and Data Mining | 2011

XML documents clustering using a tensor space model

Sangeetha Kutty; Richi Nayak; Yuefeng Li

The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then uses it for clustering. Empirical analysis shows that the proposed method scales to large datasets and that the factorized matrices it produces improve cluster quality through a richer document representation that combines structure and content information.
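A minimal sketch of what a tensor representation keeps that the VSM discards, assuming each document has already been flattened into hypothetical `(tag path, text)` pairs. It builds a sparse document × structure × term tensor, unfolds it along the document mode, and sums out the structure mode to recover the plain VSM view. The abstract does not say which decomposition produces the factorized matrices, so that step (for example a CP or Tucker decomposition) is omitted; names such as `build_tensor` are illustrative only.

```python
from collections import defaultdict

def build_tensor(docs):
    """Sparse 3rd-order tensor: (doc, structure path, term) -> count."""
    tensor = defaultdict(int)
    for d, doc in enumerate(docs):
        for path, text in doc:
            for term in text.lower().split():
                tensor[(d, path, term)] += 1
    return dict(tensor)

def unfold_doc_mode(tensor, n_docs):
    """Unfold along the document mode: rows are documents, columns are (path, term) pairs."""
    cols = sorted({(p, t) for (_, p, t) in tensor})
    index = {c: j for j, c in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in range(n_docs)]
    for (d, p, t), count in tensor.items():
        matrix[d][index[(p, t)]] = count
    return cols, matrix

def collapse_structure(tensor, n_docs):
    """Sum out the structure mode, recovering a plain VSM term-frequency view."""
    vsm = [defaultdict(int) for _ in range(n_docs)]
    for (d, _, t), count in tensor.items():
        vsm[d][t] += count
    return [dict(row) for row in vsm]

# Hypothetical input: each document flattened to (tag path, text) pairs.
docs = [
    [("article/title", "river bank"), ("article/body", "money")],
    [("article/title", "money"), ("article/body", "river bank")],
]
tensor = build_tensor(docs)
cols, matrix = unfold_doc_mode(tensor, len(docs))
vsm = collapse_structure(tensor, len(docs))
print(vsm[0] == vsm[1])        # True: the VSM view cannot tell them apart
print(matrix[0] == matrix[1])  # False: the tensor keeps where each term occurs
```

The two toy documents contain exactly the same words, so their VSM vectors coincide, while the tensor records that "river bank" is a title in one document and body text in the other.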


Conference on Information and Knowledge Management | 2009

XCFS: an XML documents clustering approach using both the structure and the content

Sangeetha Kutty; Richi Nayak; Yuefeng Li

This paper introduces a clustering approach, XML Clustering using Frequent Substructures (XCFS), that considers both the structural and the content information of XML documents in clustering. XCFS uses frequent substructures in the form of a novel representation, Closed Frequent Embedded (CFE) subtrees, to constrain the content in the clustering process. Empirical analysis shows that XCFS can effectively cluster even very large XML datasets and outperforms other existing methods.
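The "embedded" in CFE subtrees refers to a standard distinction in frequent tree mining: an embedded subtree only needs to preserve ancestor-descendant relationships, whereas an induced subtree must preserve parent-child edges. The sketch below illustrates that distinction for path-shaped patterns only, under a hypothetical `(tag, children)` tree form. It is not the XCFS mining algorithm, and the function names are illustrative.

```python
# Hypothetical minimal XML tree form: (tag, [children]).
def root_to_leaf_paths(tree, prefix=()):
    """Yield the tag sequence along every root-to-leaf path."""
    tag, children = tree
    path = prefix + (tag,)
    if not children:
        yield path
    for child in children:
        yield from root_to_leaf_paths(child, path)

def embedded_in(pattern, path):
    """Ancestor-descendant order only: pattern is a subsequence of path."""
    it = iter(path)
    return all(tag in it for tag in pattern)

def induced_in(pattern, path):
    """Parent-child order: pattern occurs as a contiguous run in path."""
    n = len(pattern)
    return any(path[i:i + n] == pattern for i in range(len(path) - n + 1))

def support(pattern, trees, match):
    """Number of trees with at least one root-to-leaf path matching pattern."""
    return sum(any(match(pattern, p) for p in root_to_leaf_paths(t)) for t in trees)

trees = [
    ("article", [("section", [("title", [])])]),
    ("article", [("section", [("subsection", [("title", [])])])]),
]
pattern = ("section", "title")
print(support(pattern, trees, induced_in))   # 1
print(support(pattern, trees, embedded_in))  # 2
```

Because `subsection` sits between `section` and `title` in the second tree, only the embedded definition matches it there, so an embedded pattern can reach a higher support than its induced counterpart.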


Advances in Focused Retrieval | 2009

Clustering XML Documents Using Frequent Subtrees

Sangeetha Kutty; Tien Tran; Richi Nayak; Yuefeng Li

This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
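The steps in this abstract (use closed frequent subtrees to select constrained content, build a term matrix from that content, then cluster) can be sketched as follows. Assumptions: the frequent patterns are given as a set of root-to-node tag paths, XML nodes use a hypothetical `(tag, text, children)` form, and spherical k-means seeded with the first k documents stands in for the k-way clustering algorithm, whose exact variant the abstract does not name. All names are illustrative.

```python
import math
from collections import Counter

# Hypothetical XML node form: (tag, text, [children]).
def nodes_with_paths(node, prefix=()):
    tag, text, children = node
    path = prefix + (tag,)
    yield path, text
    for child in children:
        yield from nodes_with_paths(child, path)

def constrained_terms(doc, patterns):
    """Keep only text found under nodes whose tag path is a frequent pattern."""
    return [term for path, text in nodes_with_paths(doc) if path in patterns
            for term in text.lower().split()]

def term_matrix(term_lists):
    vocab = sorted({t for terms in term_lists for t in terms})
    counts = [Counter(terms) for terms in term_lists]
    return vocab, [[c[t] for t in vocab] for c in counts]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else v

def spherical_kmeans(rows, k, iters=10):
    """k-means on unit-length rows with cosine similarity; seeds are the first k rows."""
    vecs = [normalize(r) for r in rows]
    centroids = [list(v) for v in vecs[:k]]
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vecs]
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels

def make_doc(title, sec):
    return ("article", "", [("title", title, []),
                            ("body", "", [("sec", sec, [])]),
                            ("footer", "copyright wiki", [])])

patterns = {("article", "title"), ("article", "body", "sec")}  # assumed mined earlier
docs = [make_doc("jazz music", "jazz piano trio"), make_doc("rock music", "guitar rock band"),
        make_doc("jazz piano", "jazz trio"), make_doc("rock band", "guitar music")]
vocab, rows = term_matrix([constrained_terms(d, patterns) for d in docs])
labels = spherical_kmeans(rows, k=2)
print(len(vocab))  # 7: "copyright" and "wiki" never enter the vocabulary
print(labels)      # [0, 1, 0, 1]
```

Text under the unselected `footer` element never reaches the term matrix, which is a miniature version of the dimensionality reduction the abstract describes.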


INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009

Utilising semantic tags in XML clustering

Sangeetha Kutty; Richi Nayak; Yuefeng Li

This paper presents an overview of the experiments conducted using the Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents. It also presents an experimental study of several data representations, namely structure-only, content-only, and combined structure and content, for clustering XML documents. Unlike in previous years, the XML documents this year were marked up with Wiki tags and contained categories derived from the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.


Advances in Focused Retrieval | 2009

Utilizing the Structure and Content Information for XML Document Clustering

Tien Tran; Sangeetha Kutty; Richi Nayak

This paper reports on the experiments and results of a clustering approach used in the INEX 2008 Document Mining Challenge. The clustering approach utilizes both the structure and the content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. Constructing a latent semantic kernel involves computing a singular value decomposition (SVD). On a large feature-space matrix, computing the SVD is very expensive in time and memory. Thus, in this clustering approach, the dimension of the document space of the term-document matrix is reduced before performing the SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has been shown to be effective on the Wikipedia collection in the INEX 2008 Document Mining Challenge.
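A small plain-Python sketch of a latent semantic kernel built on a reduced document space. It assumes the reduced document subset is already known (`keep_docs` is a hypothetical stand-in for the structure-based selection, which is not shown) and obtains the left singular vectors of the reduced term-document matrix as eigenvectors of its term-by-term Gram matrix, using power iteration with deflation instead of a library SVD routine. The kernel compares two documents after projecting their term vectors onto the top `k` singular directions; all names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigvecs(sym, k, iters=200):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    m = [row[:] for row in sym]
    vecs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                return vecs
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient (eigenvalue)
        vecs.append(v)
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs

def latent_semantic_kernel(term_doc, keep_docs, k):
    """term_doc: rows are terms, columns are documents.
    keep_docs: indices of the reduced document subset used for the decomposition."""
    reduced = [[row[j] for j in keep_docs] for row in term_doc]
    # The eigenvectors of A_r A_r^T are the left singular vectors of A_r.
    gram = [[dot(ri, rj) for rj in reduced] for ri in reduced]
    basis = top_eigvecs(gram, k)

    def kernel(x, y):
        # Project both term vectors onto the k latent directions, then compare.
        return dot([dot(u, x) for u in basis], [dot(u, y) for u in basis])
    return kernel

# Terms: car, automobile, engine, flower, petal. Columns are documents.
term_doc = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
kernel = latent_semantic_kernel(term_doc, keep_docs=[0, 1, 2], k=2)
car, automobile, flower = [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]
print(dot(car, automobile))                # 0: no shared terms
print(round(kernel(car, automobile), 4))   # 0.1667
print(abs(kernel(car, flower)) < 1e-9)     # True
```

Here `car` and `automobile` share no terms, yet the kernel relates them because both co-occur with `engine` in the reduced collection. A sparse truncated SVD routine would replace the power iteration at realistic scale.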


International Conference on Data Mining | 2010

XML Documents Clustering Using Tensor Space Model -- A Preliminary Study

Sangeetha Kutty; Richi Nayak; Yuefeng Li

A hierarchical structure is used to represent the content of semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence, in this paper, we introduce a novel method of representing XML documents in a Tensor Space Model (TSM) and then use it for clustering. Empirical analysis shows that the proposed method is scalable to a real-life dataset and that the factorized matrices it produces help improve the quality of clusters, owing to the enriched document representation that combines both the structure and the content information.

Collaboration

Top Co-Authors

Richi Nayak (Queensland University of Technology)
Yuefeng Li (Queensland University of Technology)
Lin Chen (Queensland University of Technology)
Tien Tran (Queensland University of Technology)
Shlomo Geva (Queensland University of Technology)
Christopher M. De Vries (Queensland University of Technology)
Debra Polson (Queensland University of Technology)
Grant Hamilton (Queensland University of Technology)
Gregory N. Hearn (Queensland University of Technology)
Jared Donovan (Queensland University of Technology)