Sangeetha Kutty
Queensland University of Technology
Publication
Featured research published by Sangeetha Kutty.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012
T. Beckers; Patrice Bellot; Gianluca Demartini; Ludovic Denoyer; C.M. de Vries; Antoine Doucet; Khairun Nisa Fachry; Norbert Fuhr; Patrick Gallinari; Shlomo Geva; Wei-Che Huang; Tereza Iofciu; Jaap Kamps; Gabriella Kazai; Marijn Koolen; Sangeetha Kutty; Monica Landoni; Miro Lehtonen; Véronique Moriceau; Richi Nayak; Ragnar Nordlie; Nils Pharo; Eric SanJuan; Ralf Schenkel; Xavier Tannier; Martin Theobald; James A. Thom; Andrew Trotman; A.P. de Vries
INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2008 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining.
Document Engineering | 2009
Sangeetha Kutty; Richi Nayak; Yuefeng Li
This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.
Focused Access to XML Documents | 2008
Sangeetha Kutty; Tien Tran; Richi Nayak; Yuefeng Li
This paper presents an experimental study conducted over the INEX 2007 Document Mining Challenge corpus employing a frequent subtree-based incremental clustering approach. Using the structural information of the XML documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents. This matrix is used to progressively cluster the XML documents. In spite of the large number of documents in the INEX 2007 Wikipedia dataset, the proposed frequent subtree-based incremental clustering approach was successful in clustering the documents.
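The pipeline in this abstract (structure, then a document-by-substructure matrix, then progressive clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: root-to-node tag paths stand in for closed frequent subtrees, and the Jaccard measure, the `min_support` and `threshold` parameters, and all function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document.

    Assumption: tag paths are a simplified stand-in for frequent subtrees.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(docs_paths, min_support):
    """Keep paths that occur in at least min_support documents."""
    counts = Counter(p for paths in docs_paths for p in paths)
    return sorted(p for p, c in counts.items() if c >= min_support)


def build_matrix(docs_paths, features):
    """Binary matrix: one row per document, one column per frequent path."""
    return [[1 if f in paths else 0 for f in features] for paths in docs_paths]


def jaccard(a, b):
    """Jaccard similarity between two binary rows."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def incremental_cluster(matrix, threshold):
    """Single pass: join the most similar existing cluster, else open a new one."""
    representatives = []  # first row seen in each cluster
    labels = []
    for row in matrix:
        best, best_sim = None, threshold
        for idx, rep in enumerate(representatives):
            sim = jaccard(row, rep)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            representatives.append(row)
            best = len(representatives) - 1
        labels.append(best)
    return labels


# Hypothetical toy corpus: two article-like and two book-like documents.
docs = [
    "<article><title/><body><section/></body></article>",
    "<article><title/><body><section/><section/></body></article>",
    "<book><title/><chapter><para/></chapter></book>",
    "<book><title/><chapter/></book>",
]
docs_paths = [tag_paths(d) for d in docs]
features = frequent_paths(docs_paths, min_support=2)
matrix = build_matrix(docs_paths, features)
labels = incremental_cluster(matrix, threshold=0.5)
print(labels)
```

Because each document is compared only against the clusters seen so far, the corpus is processed in one pass, which is the property that makes an incremental scheme attractive for a collection of this size.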
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009
Richi Nayak; Christopher M. De Vries; Sangeetha Kutty; Shlomo Geva; Ludovic Denoyer; Patrick Gallinari
This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.
Knowledge Discovery and Data Mining | 2011
Sangeetha Kutty; Richi Nayak; Yuefeng Li
The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method scales to large datasets and that the factorized matrices it produces help to improve the quality of clusters through an enriched document representation of both structure and content information.
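To make the contrast with the VSM concrete, the following sketch stores a tiny corpus as a sparse third-order tensor indexed by document, structural feature, and term, and then unfolds it along the document mode. It only illustrates the representation; the factorization used in the paper is not reproduced, and the use of tag paths as the structural dimension as well as the names `build_tensor` and `unfold_by_document` are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tensor(docs):
    """Sparse third-order tensor: (doc index, tag path, term) -> count.

    Assumption: the tag path of the enclosing element is the structural feature.
    """
    tensor = defaultdict(int)
    for d, xml_text in enumerate(docs):
        root = ET.fromstring(xml_text)
        stack = [(root, root.tag)]
        while stack:
            node, path = stack.pop()
            for term in (node.text or "").lower().split():
                tensor[(d, path, term)] += 1
            for child in node:
                stack.append((child, path + "/" + child.tag))
    return dict(tensor)


def unfold_by_document(tensor, n_docs):
    """Mode-1 unfolding: one row per document, one column per (path, term) pair."""
    columns = sorted({(p, t) for (_, p, t) in tensor})
    index = {col: j for j, col in enumerate(columns)}
    rows = [[0] * len(columns) for _ in range(n_docs)]
    for (d, p, t), count in tensor.items():
        rows[d][index[(p, t)]] = count
    return columns, rows


# Hypothetical toy documents.
docs = [
    "<article><title>xml clustering</title><body>clustering xml documents</body></article>",
    "<article><title>tensor models</title><body>xml clustering</body></article>",
]
tensor = build_tensor(docs)
columns, rows = unfold_by_document(tensor, len(docs))
print(len(columns), rows)
```

In a plain term vector the two occurrences of "xml" in the first document would collapse into a single count; here they remain separate entries under `article/title` and `article/body`, which is the kind of structure-plus-content information the abstract refers to.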
Conference on Information and Knowledge Management | 2009
Sangeetha Kutty; Richi Nayak; Yuefeng Li
This paper introduces a clustering approach, XML Clustering using Frequent Substructures (XCFS), that considers both the structural and the content information of XML documents in clustering. XCFS uses frequent substructures in the form of a novel representation, Closed Frequent Embedded (CFE) subtrees, to constrain the content in the clustering process. The empirical analysis ascertains that XCFS can effectively cluster even very large XML datasets and outperforms other existing methods.
Advances in Focused Retrieval | 2009
Sangeetha Kutty; Tien Tran; Richi Nayak; Yuefeng Li
This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009
Sangeetha Kutty; Richi Nayak; Yuefeng Li
This paper presents an overview of the experiments conducted using the Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents. The paper also presents an experimental study using several data representations for clustering: structure only, content only, and both the structure and the content of the XML documents. Unlike previous years, this year the XML documents were marked up using the Wiki tags and contain categories derived by using the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.
Advances in Focused Retrieval | 2009
Tien Tran; Sangeetha Kutty; Richi Nayak
This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. Constructing a latent semantic kernel involves computing a singular value decomposition (SVD). On a large feature space matrix, the computation of SVD is very expensive in terms of time and memory requirements. Thus in this clustering approach, the dimension of the document space of a term-document matrix is reduced before performing SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has been shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge.
International Conference on Data Mining | 2010
Sangeetha Kutty; Richi Nayak; Yuefeng Li
A hierarchical structure is used to represent the content of semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence in this paper, we introduce a novel method of representing the XML documents in a Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable for a real-life dataset and that the factorized matrices produced by the proposed method help to improve the quality of clusters due to the enriched document representation with both the structure and the content information.