Publications


Featured research published by Christian Girardi.


International Workshop on Web Site Evolution | 2003

Using keyword extraction for Web site clustering

Paolo Tonella; Filippo Ricca; Emanuele Pianta; Christian Girardi

Reverse engineering techniques have the potential to support Web site understanding, by providing views that show the organization of a site and its navigational structure. However, representing each Web page as a node in the diagrams that are recovered from the source code of a Web site often leads to huge and unreadable graphs. Moreover, since the level of connectivity is typically high, the edges in such graphs make the overall result even less usable. Clustering can be used to produce cohesive groups of pages that are displayed as a single node in reverse engineered diagrams. In this paper, we propose a clustering method based on the automatic extraction of the keywords of a Web page. The presence of common keywords is exploited to decide when it is appropriate to group pages together. A second usage of the keywords is in the automatic labeling of the recovered clusters of pages.
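A minimal sketch of the grouping idea described in the abstract, assuming keywords have already been extracted for each page; the Jaccard similarity, the greedy grouping strategy and the threshold value are illustrative assumptions, not the exact procedure of the paper.

```python
# Illustrative sketch: group Web pages whose extracted keyword sets overlap,
# then label each cluster with its shared keywords (the "second usage").
# Threshold and greedy grouping are assumptions, not the paper's procedure.

def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def cluster_pages(page_keywords: dict, threshold: float = 0.4) -> list:
    """Greedily assign each page to the first sufficiently similar cluster."""
    clusters = []  # each cluster: {"pages": [...], "keywords": set(...)}
    for page, keywords in page_keywords.items():
        for cluster in clusters:
            if jaccard(keywords, cluster["keywords"]) >= threshold:
                cluster["pages"].append(page)
                cluster["keywords"] |= keywords
                break
        else:
            clusters.append({"pages": [page], "keywords": set(keywords)})
    return clusters

pages = {
    "index.html":   {"research", "software", "evolution"},
    "papers.html":  {"research", "publications", "software"},
    "contact.html": {"address", "email", "phone"},
}
for c in cluster_pages(pages):
    print(", ".join(sorted(c["keywords"])), "->", c["pages"])
```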


International Conference on Software Maintenance | 2002

Restructuring multilingual web sites

Paolo Tonella; Filippo Ricca; Emanuele Pianta; Christian Girardi

Current practice of Web site development does not explicitly address the problems related to multilingual sites. The same information, as well as the same navigation paths, page formatting and organization, are expected to be provided by the site independently of the chosen language. This is typically ensured by adopting personal conventions on the way pages are named and on their location in the file system. Updates are then performed manually and consistency depends on the ability of the programmers not to miss any impact of the change. In this paper an extension to XHTML, called MLHTML (MultiLingual XHTML), is proposed as the target representation of a restructuring process aimed at producing a maintainable and consistent multilingual Web site. MLHTML centralizes the language-dependent variants of a page in a single representation, where shared parts are not duplicated. Existing sites can be migrated to MLHTML by means of the algorithms described in this paper. After classifying the pages according to their language, a page alignment technique is exploited to identify corresponding pages and to eliminate inconsistencies. Transformation into MLHTML can then be achieved automatically.
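A purely illustrative sketch of the centralization idea (shared markup kept once, only language-dependent text repeated per language); the element and attribute names below are invented for illustration and are not the MLHTML syntax proposed in the paper.

```python
# Hypothetical sketch: merge two language variants of the same page fragment
# into one structure where shared markup appears once and only the
# language-dependent text is repeated per language. Element/attribute names
# are invented for illustration; they are NOT the MLHTML syntax of the paper.
import xml.etree.ElementTree as ET

def merged_fragment(tag: str, variants: dict) -> ET.Element:
    """Wrap per-language variants of one text fragment in a single element."""
    element = ET.Element(tag)
    for lang, text in variants.items():
        alt = ET.SubElement(element, "alt", {"lang": lang})
        alt.text = text
    return element

page = ET.Element("page")
page.append(merged_fragment("title", {"en": "Contacts", "it": "Contatti"}))
page.append(merged_fragment("p", {"en": "Write to us.", "it": "Scriveteci."}))
print(ET.tostring(page, encoding="unicode"))
```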


Workshop on Program Comprehension | 2004

An empirical study on keyword-based Web site clustering

Filippo Ricca; Paolo Tonella; Christian Girardi; Emanuele Pianta

Web site evolution is characterized by limited support for the understanding activities performed by developers. In fact, design diagrams are often missing or outdated. A potentially interesting option is to reverse engineer high-level views of Web sites from the content of the Web pages. Clustering is a valuable technique that can be used in this respect. Web pages can be clustered together based on the similarity of summary information about their content, represented as a list of automatically extracted keywords. This work presents an empirical study that was conducted to determine the meaningfulness for Web developers of clusters automatically produced from the analysis of the Web page content. Natural language processing (NLP) plays a central role in content analysis and keyword extraction. Thus, a second objective of the study was to assess the contribution of some shallow NLP techniques to the clustering task.
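A rough sketch of the kind of shallow content analysis involved (tokenization, stopword filtering, frequency-based keyword selection); the stopword list and cutoff are placeholders, not the NLP techniques actually evaluated in the study.

```python
# Rough sketch of shallow keyword extraction from page text: lowercase
# tokenization, stopword filtering, frequency-based selection. Stopword list
# and cutoff are placeholders, not the techniques evaluated in the study.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on"}

def extract_keywords(text: str, top_n: int = 5) -> list:
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]

sample = "Reverse engineering of Web sites supports Web site understanding."
print(extract_keywords(sample))  # e.g. ['web', 'reverse', 'engineering', ...]
```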


International Workshop on Web Site Evolution | 2003

Evaluation methods for Web application clustering

Paolo Tonella; Filippo Ricca; Emanuele Pianta; Christian Girardi

Clustering of the entities composing a Web application (static and dynamic pages) can be used to support program understanding. However, several alternative options are available when a clustering technique is designed for Web applications. The entities to be clustered can be described in different ways (e.g., by their structure, by their connectivity, or by their content), different similarity measures are possible, and alternative procedures can be used to form the clusters. The problem is how to evaluate the competing clustering techniques in order to select the best one for program understanding purposes. In this paper, two methods for clustering evaluation are considered: the gold standard and the task-oriented approach. The advantages and disadvantages of both of them are analyzed in detail. Definition of a gold standard (reference clustering) is difficult and prone to subjectivity. On the other hand, an evaluation based on the level of support given to task execution is expensive and requires careful experimental design. Guidelines and examples are provided for the implementation of both methods.
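A minimal sketch of a gold-standard comparison using pairwise precision and recall, where two pages count as a pair if they fall in the same cluster; this is one common way to compare a candidate clustering against a reference, offered only as an illustration of the approach discussed in the abstract.

```python
# Sketch of a gold-standard comparison via pairwise precision/recall:
# two pages form a "pair" when they end up in the same cluster. This is one
# common clustering comparison; it is not prescribed by the paper.
from itertools import combinations

def same_cluster_pairs(clustering) -> set:
    pairs = set()
    for cluster in clustering:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs

def pairwise_scores(candidate, gold):
    cand, ref = same_cluster_pairs(candidate), same_cluster_pairs(gold)
    precision = len(cand & ref) / len(cand) if cand else 1.0
    recall = len(cand & ref) / len(ref) if ref else 1.0
    return precision, recall

gold = [["a.html", "b.html"], ["c.html"]]
candidate = [["a.html", "b.html", "c.html"]]
print(pairwise_scores(candidate, gold))  # (0.333..., 1.0)
```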


International Journal of Web Information Systems | 2006

Web crawlers compared

Christian Girardi; Filippo Ricca; Paolo Tonella

Tools for the assessment of the quality and reliability of Web applications are based on the possibility of downloading the target of the analysis. This is achieved through Web crawlers, which can automatically navigate within a Web site and perform appropriate actions (such as downloading pages) during the visit. The most important performance indicators for a Web crawler are its completeness and robustness, measuring respectively the ability to visit the Web site entirely and without errors. The variety of implementation languages and technologies used for Web site development makes these two indicators hard to maximize. We conducted an evaluation study, in which we tested several of the available Web crawlers.
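A minimal sketch of the two indicators, assuming a known reference site: completeness as the fraction of reference pages actually retrieved, robustness as the fraction of visited pages retrieved without errors; the exact definitions used in the study may differ.

```python
# Sketch of the two indicators against a known reference site:
# completeness = fraction of reference pages actually retrieved,
# robustness  = fraction of visited pages retrieved without errors.
# The exact definitions used in the study may differ.

def completeness(retrieved: set, reference: set) -> float:
    return len(retrieved & reference) / len(reference) if reference else 1.0

def robustness(visited: int, errors: int) -> float:
    return (visited - errors) / visited if visited else 1.0

reference_pages = {"/", "/about", "/papers", "/contact"}
retrieved_pages = {"/", "/about", "/papers"}
print(completeness(retrieved_pages, reference_pages))  # 0.75
print(robustness(visited=3, errors=1))                 # 0.666...
```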


Symposium on Web Systems Evolution | 2001

Recovering traceability links in multilingual Web sites

Paolo Tonella; Filippo Ricca; Emanuele Pianta; Christian Girardi

The problem of verifying the consistency between Web site portions devoted to different languages is investigated. The purpose is to support the activity of the site maintainer, who is responsible for the alignment between different site versions. Anomalies that typically occur in such situations include the absence of pages in some languages, differences in the page structure across languages, missing information and parts not translated. The approach proposed for recovering traceability links, so as to simplify updating the site to a consistent state, is based on a mix of structural and textual information extracted from the pages. The syntax trees of the pages to be compared drive the page matching process. When structurally corresponding nodes are encountered during the tree visit, their text attributes are considered to see if they are each other's translation.
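A minimal sketch of the matching idea: visit two page syntax trees in parallel and, where the structure corresponds, compare text attributes to check whether one is the translation of the other. The tuple-based tree representation and the translation check are placeholders, not the paper's implementation.

```python
# Sketch of the matching idea: walk two page syntax trees in parallel and,
# where structure corresponds, compare text to check for translation.
# The (tag, text, children) tuples and the translation check are placeholders.

def match_trees(node_a, node_b, is_translation, links=None):
    """Return recovered (text_a, text_b) traceability links."""
    if links is None:
        links = []
    tag_a, text_a, children_a = node_a
    tag_b, text_b, children_b = node_b
    if tag_a != tag_b:
        return links  # structural mismatch: a real tool would report an anomaly
    if text_a and text_b and is_translation(text_a, text_b):
        links.append((text_a, text_b))
    for child_a, child_b in zip(children_a, children_b):
        match_trees(child_a, child_b, is_translation, links)
    return links

# Toy usage with a fake translation check (stand-in for a bilingual lexicon).
en = ("body", None, [("h1", "Welcome", []), ("p", "Contact us", [])])
it = ("body", None, [("h1", "Benvenuti", []), ("p", "Contattaci", [])])
fake_dict = {"Welcome": "Benvenuti", "Contact us": "Contattaci"}
print(match_trees(en, it, lambda a, b: fake_dict.get(a) == b))
```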


International Workshop on Evaluation of Natural Language and Speech Tools for Italian | 2013

Exploiting Background Knowledge for Clustering Person Names

Roberto Zanoli; Francesco Corcoglioniti; Christian Girardi

Nowadays, surfing the Web and looking for persons seems to be one of the most common activities of Internet users. However, person names can be highly ambiguous, and consequently search results are often a collection of documents about different people sharing the same name. This paper presents a cross-document coreference system able to identify person names in different documents that refer to the same person entity. The system exploits background knowledge through two mechanisms: (1) the use of a dynamic similarity threshold for clustering person names, which depends on the ambiguity of the name estimated using a phonebook; and (2) the disambiguation of names against a knowledge base containing person descriptions, using an entity linking system and including its output as an additional feature for computing similarity. The paper describes the system and reports its performance in the News People Search (NePS) task at Evalita 2011. A version of the system is being used in a real-world application, which requires coreferring millions of names from multimedia sources.
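A minimal sketch of mechanism (1), assuming name ambiguity is estimated from phonebook frequency; the functional form and the constants are illustrative assumptions.

```python
# Sketch of mechanism (1): the clustering threshold grows with the ambiguity
# of a name, estimated from its frequency in a phonebook. The functional form
# and constants below are illustrative assumptions.

def dynamic_threshold(name: str, phonebook_counts: dict,
                      base: float = 0.5, step: float = 0.05, cap: float = 0.9) -> float:
    """More frequent (more ambiguous) names require higher similarity to merge."""
    ambiguity = phonebook_counts.get(name, 1)
    return min(cap, base + step * (ambiguity - 1))

phonebook = {"Mario Rossi": 120, "Christian Girardi": 3}
print(dynamic_threshold("Mario Rossi", phonebook))       # 0.9 (capped)
print(dynamic_threshold("Christian Girardi", phonebook)) # 0.6
```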


Conference on Software Maintenance and Reengineering | 2004

Experimental results on the alignment of multilingual Web sites

Filippo Ricca; Paolo Tonella; Emanuele Pianta; Christian Girardi

Institutions and companies that are based in countries where the main language is not English typically publish Web sites that offer the same information at least in the local language and in English. However, the evolution of these Web sites may be troublesome if the same pages are replicated for all supported languages. In fact, changes have to be propagated to all translations of a modified page. Algorithms that help ensure the consistency of multilingual Web pages exploit natural language processing (NLP) methods for the comparison of the content in the pages to be aligned. Since such methods are quite expensive, both in terms of the required linguistic resources and of computation time, a trade-off should be considered between the benefits of more advanced techniques and the costs of their implementation. In this paper, an empirical evaluation is conducted to establish which NLP methods, combined with structural comparison methods, are most appropriate for Web page alignment.


Archive | 2013

Anchoring Background Knowledge to Rich Multimedia Contexts in the KnowledgeStore

Roldano Cattoni; Francesco Corcoglioniti; Christian Girardi; Bernardo Magnini; Luciano Serafini; Roberto Zanoli

The recent achievements in Natural Language Processing in terms of scalability and performance, and the large availability of background knowledge within the Semantic Web and the Linked Open Data initiative, encourage researchers to take a further step towards the creation of machines capable of understanding multimedia documents by exploiting background knowledge. To pursue this direction it is necessary to maintain a clear link between knowledge and the documents containing it. This is achieved in the KnowledgeStore, a scalable content management system that supports the tight integration and storage of multimedia resources and of background and extracted knowledge. Integration is done by (i) identifying mentions of named entities in multimedia resources, (ii) establishing mention coreference, and either (iii) linking mentions to entities in the background knowledge, or (iv) extending that knowledge with new entities. We present the KnowledgeStore and describe its use in creating a large-scale repository of knowledge and multimedia resources in the Italian Trentino region, whose interlinking allows us to explore advanced tasks such as entity-based search and semantic enrichment.


NLPXML '06 Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing | 2006

Representing and accessing multilevel linguistic annotation using the MEANING format

Emanuele Pianta; Luisa Bentivogli; Christian Girardi; Bernardo Magnini

We present an XML annotation format (MEANING Annotation Format, MAF) specifically designed to represent and integrate different levels of linguistic annotations, and a tool that provides flexible access to them (MEANING Browser). We describe our experience in integrating linguistic annotations coming from different sources, and the solutions we adopted to implement efficient access to corpora annotated with the MEANING format.

Collaboration


Dive into Christian Girardi's collaborations.

Top Co-Authors

Paolo Tonella

Fondazione Bruno Kessler

Roberto Zanoli

Fondazione Bruno Kessler
