Giovanni Yoko Kristianto
University of Tokyo
Publication
Featured research published by Giovanni Yoko Kristianto.
D-lib Magazine | 2014
Giovanni Yoko Kristianto; Goran Topić; Akiko Aizawa
Mathematical concepts and formulations play a fundamental role in many scientific domains. As such, the use of mathematical expressions represents a promising method of interlinking scientific papers. The purpose of this study is to provide guidelines for annotating and detecting natural language descriptions of mathematical expressions, enabling the semantic enrichment of mathematical information in scientific papers. Under the proposed approach, we first manually annotated descriptions of mathematical expressions and assessed the coverage of several types of textual span: fixed context window, apposition, minimal noun phrases, and noun phrases. We then developed a method for automatic description extraction, in which the problem was formulated as binary classification: each mathematical expression was paired with its description candidates, and the pairs were classified as correct or incorrect. Support vector machines (SVMs) with several different features were developed and evaluated for the classification task. Experimental results showed that an SVM model that uses all noun phrases delivers the best performance, achieving an F1-score of 62.25% compared with 41.47% for the baseline (nearest noun) method.
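The pairing formulation and the nearest-noun baseline described in this abstract can be illustrated with a minimal sketch. Everything named here (`Span`, `candidate_pairs`, `nearest_candidate_baseline`, `f1_score`, token offsets as the distance measure) is a hypothetical illustration and not the authors' implementation; the SVM classifier and its features are omitted because the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A text span identified by token offsets (end is exclusive)."""
    text: str
    start: int
    end: int


def candidate_pairs(expressions, noun_phrases):
    """Pair every expression with every candidate; a classifier would label each pair."""
    return [(expr, cand) for expr in expressions for cand in noun_phrases]


def token_distance(expr, cand):
    """Number of tokens between two spans (0 if adjacent or overlapping)."""
    if cand.end <= expr.start:
        return expr.start - cand.end
    if expr.end <= cand.start:
        return cand.start - expr.end
    return 0


def nearest_candidate_baseline(expr, noun_phrases):
    """Assumed baseline: pick the candidate closest to the expression."""
    return min(noun_phrases, key=lambda c: token_distance(expr, c), default=None)


def f1_score(predicted, gold):
    """F1 over sets of (expression, description) pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Tokens: the(0) learning(1) rate(2) α(3) controls(4) the(5) step(6) size(7)
alpha = Span("α", 3, 4)
phrases = [Span("the learning rate", 0, 3), Span("the step size", 5, 8)]
pairs = candidate_pairs([alpha], phrases)
chosen = nearest_candidate_baseline(alpha, phrases)
```

In this toy sentence the baseline selects "the learning rate" because it is adjacent to the symbol; a learned classifier would instead score each of the two candidate pairs.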
Information Retrieval archive | 2017
Giovanni Yoko Kristianto; Goran Topić; Akiko Aizawa
Current mathematical search systems allow math expressions within a document to be queried using math expressions and keywords. To accept such queries, math search systems must index both math expressions and textual information in documents. Each indexed math expression is usually associated with all the words in its surrounding context within a given window size. However, we found that this context is often ineffective for explaining math expressions in scientific papers. The meaning of an expression is usually defined in the early part of a document, and the meaning of each symbol contained in the expression can be useful for explaining the entire expression. This explanation may not be captured within the context of a math expression, unless we set the context to have a very wide window size. However, widening the window size also increases the proportion of words that are unrelated to the expression. This paper proposes the use of dependency relationships between math expressions to enrich the textual information of each expression. We examine the influence of this enrichment in a math search system. The experimental results show that significantly better precision can be obtained using the enriched textual information rather than the math expressions’ own textual information. This indicates that the enrichment of textual information for each math expression using dependency relationships enhances the math search system.
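The enrichment idea in this abstract can be sketched as follows, under the assumption that an expression "depends on" the symbols or sub-expressions it contains and inherits their textual information. The function name `enrich`, the dictionary layout, and the example data are hypothetical; the abstract does not describe how the dependency relationships are actually detected.

```python
def enrich(descriptions, depends_on):
    """Extend each expression's words with those of every expression it depends on.

    descriptions: dict mapping an expression to its own list of words.
    depends_on:   dict mapping an expression to the expressions it depends on
                  (assumed here to be the symbols or sub-expressions it contains).
    """
    enriched = {}
    for expr, own_words in descriptions.items():
        words = list(own_words)
        seen = {expr}
        stack = list(depends_on.get(expr, []))
        # Depth-first walk so indirect dependencies are included as well.
        while stack:
            child = stack.pop()
            if child in seen:
                continue
            seen.add(child)
            words.extend(descriptions.get(child, []))
            stack.extend(depends_on.get(child, []))
        enriched[expr] = words
    return enriched


descriptions = {
    "E = mc^2": ["energy"],
    "m": ["mass"],
    "c": ["speed", "of", "light"],
}
depends_on = {"E = mc^2": ["m", "c"]}
enriched = enrich(descriptions, depends_on)
```

Here the index entry for `E = mc^2` gains the words attached to `m` and `c` without widening its own context window, which is the trade-off the abstract motivates.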
arXiv: Information Retrieval | 2014
Minh-Quoc Nghiem; Giovanni Yoko Kristianto; Goran Topić; Akiko Aizawa
Mathematical content is a valuable information source, and retrieving this content has become an important issue. This paper compares two search strategies for math expressions: presentation-based and content-based approaches. Presentation-based search uses a state-of-the-art math search system, while content-based search uses semantic enrichment to convert math expressions into their content forms and then searches over these content-based expressions. By considering the meaning of math expressions, the quality of the search system is improved over presentation-based systems.
Exploiting Semantic Annotations in Information Retrieval | 2012
Giovanni Yoko Kristianto; Goran Topić; Minh-Quoc Nghiem; Akiko Aizawa
In recent years, growing numbers of scientific papers have been published in XML format, generating a large published base of MathML-style formulas. Although these formulas can be indexed and searched based on their XML tree structures, they generally lack sufficient information for semantic interpretation. We propose an annotation design for linking mathematical formulas to natural language descriptions in the surrounding text. We also introduce potential applications for this annotation framework.
International Conference on Digital Information Management | 2014
Giovanni Yoko Kristianto; Goran Topić; Akiko Aizawa
Mathematical expressions are important for communicating scientific information, for instance, to explain or define concepts written in natural language. Despite their importance, conventional search systems cannot provide access to the mathematical expressions contained in a scientific paper. The major focus of current development of mathematical search systems is mathematical tree structure indexing, but utilizing the textual information surrounding the expressions in these systems is also important. We examine how textual information contributes to a mathematical search system, primarily in the ranking process. We investigate the impact of two types of textual information on the ranking performance of a mathematical search system: words in context windows (baseline), which are easily extracted from sentence tokenization results, and descriptions, which are extracted using a machine learning method. We also examine the improvement in ranking obtained by utilizing the dependency graph of mathematical expressions. The experimental results show that using descriptions together with the dependency graph delivers better ranking performance than using context windows or no textual information. The results also show that the dependency graph is crucial for increasing the number of mathematical expressions that are assigned descriptions, and thus using it together with descriptions yields higher ranking performance than using descriptions alone. This study suggests that descriptions represent mathematical expressions better (more precisely) than context windows, and that even descriptions from child (indirect) expressions represent the target expression better than the context of the target expression itself.
arXiv: Digital Libraries | 2013
Minh-Quoc Nghiem; Giovanni Yoko Kristianto; Goran Topić; Akiko Aizawa
In this paper, we present a new approach to the problem of semantic enrichment of mathematical expressions. Our approach combines statistical machine translation with disambiguation that makes use of the text surrounding the mathematical expressions. We first use a Support Vector Machine classifier to disambiguate mathematical terms using both their presentation form and surrounding text. We then use the disambiguation result to enhance the semantic enrichment of a statistical-machine-translation-based system. Experimental results show that our system achieves improvements over prior systems.
International Conference on Asian Digital Libraries | 2016
Giovanni Yoko Kristianto; Goran Topić; Akiko Aizawa
This paper addresses the challenge of determining the identity of math expressions in scientific documents by linking these expressions to their corresponding Wikipedia articles. Math expressions are frequently used to denote important concepts in scientific documents, yet several of them, for example, famous equations, often have minimal explanation in the documents. This task will allow us to obtain an additional explanation from Wikipedia regarding these math expressions. This paper proposes an approach to this challenge, where the structures and surrounding text of math expressions are used for math entity linking. Our initial evaluation shows that a balanced combination of math structures and textual descriptions is required to obtain reliable linking performance.
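One way to picture a "balanced combination" of math structure and text is a weighted sum of two similarity scores. The sketch below is only an illustration under that assumption: the Jaccard similarity, the `weight` parameter, the function names, and the toy articles are all hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_score(expr_symbols, expr_words, article_symbols, article_words, weight=0.5):
    """Weighted sum of structural and textual similarity (weight=0.5 is balanced)."""
    math_sim = jaccard(expr_symbols, article_symbols)
    text_sim = jaccard(expr_words, article_words)
    return weight * math_sim + (1 - weight) * text_sim


def best_article(expr_symbols, expr_words, articles, weight=0.5):
    """Return the name of the candidate article with the highest combined score."""
    return max(
        articles,
        key=lambda name: link_score(expr_symbols, expr_words, *articles[name], weight=weight),
    )


# Hypothetical candidates: name -> (symbols in the article's formula, description words)
articles = {
    "Mass-energy equivalence": (["E", "=", "m", "c", "2"], ["energy", "mass"]),
    "Pythagorean theorem": (["a", "2", "+", "b", "=", "c"], ["triangle", "side"]),
}
symbols = ["E", "=", "m", "c", "2"]
words = ["energy"]
linked = best_article(symbols, words, articles)
```

Setting `weight` to 1.0 or 0.0 reduces the score to structure only or text only, which makes it easy to compare the balanced setting against either extreme.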
Proceedings of the 1st Workshop on Scholarly Web Mining | 2017
Giovanni Yoko Kristianto; Akiko Aizawa
This paper addresses the challenge of determining the identity of mathematical expressions in documents by linking these expressions to their corresponding Wikipedia articles. Math expressions are frequently used to describe important concepts in scientific documents; however, particularly in the case of famous or well-established equations, they are often minimally explained within the documents themselves. Linking to Wikipedia allows readers to obtain additional explanation of these math expressions. This paper proposes a learning-based approach to solve this challenge using common features, such as math and text similarities, as well as the importance of the math expression within the document. Further, we develop a dataset that allowed us to train and test our proposed approach. Experimental results show that our learning-based approach achieves a precision of 83.40%, compared with 6.22% for the baseline method (a straightforward application of MathIR).
NTCIR | 2014
Goran Topić; Giovanni Yoko Kristianto; Minh-Quoc Nghiem; Akiko Aizawa
IEICE Transactions on Information and Systems | 2013
Minh-Quoc Nghiem; Giovanni Yoko Kristianto; Akiko Aizawa