Paolo Atzeni
Sapienza University of Rome
Publications
Featured research published by Paolo Atzeni.
symposium on principles of database systems | 1997
Paolo Atzeni; Giansalvatore Mecca
The paper develops EDITOR, a language for manipulating semi-structured documents, such as those typically available on the Web. EDITOR programs make it possible to search and restructure a document. They are based on two simple ideas, taken from text editors: “search” instructions are used to select regions of interest in a document, and “cut & paste” to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in EDITOR. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java, and is used in the ARANEUS project to build database views over Web sites.
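The two primitives the abstract describes can be illustrated with a short sketch, using a regular expression as the “search” step and string rebuilding as the “cut & paste” step (plain Python standing in for EDITOR's actual syntax; the function and pattern are hypothetical):

```python
import re

def restructure(document: str) -> str:
    """Illustrative EDITOR-style restructuring: 'search' selects
    regions of interest, 'cut & paste' rearranges them."""
    # "Search": select title/author regions in the document
    pattern = re.compile(r"<b>(?P<title>.*?)</b>\s*by\s*(?P<author>\w+)")
    # "Cut & paste": move each matched region into a new layout
    rows = [f"{m.group('author')}: {m.group('title')}"
            for m in pattern.finditer(document)]
    return "\n".join(rows)

page = "<b>Views over the Web</b> by Atzeni <b>Cut and Paste</b> by Mecca"
print(restructure(page))
```

The actual language also guarantees the completeness and safety properties stated above, which a regex one-liner of course does not.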
extending database technology | 1998
Paolo Atzeni; Giansalvatore Mecca; Paolo Merialdo
A methodology for designing and maintaining large Web sites is introduced. It is especially useful when the data to be published in the site are managed using a DBMS. The design process is composed of two intertwined activities: database design and hypertext design. Each of these is further divided into a conceptual phase and a logical phase, based on specific data models proposed in our project. The methodology strongly supports site maintenance: in fact, the various models provide a concise description of the site structure; they make it possible to reason about the overall organization of pages in the site and possibly to restructure it.
Archive | 2004
Paolo Atzeni; Wesley W. Chu; Hongjun Lu; Shuigeng Zhou; Tok Wang Ling
The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this should enable a wealth of query processing and semantic reasoning capabilities using XQuery and logical inference engines. However, we believe that the diversity and uncertainty of terminologies and schema-like annotations will make precise querying on a Web scale extremely elusive if not hopeless, and the same argument holds for large-scale dynamic federations of Deep Web sources. Therefore, ontology-based reasoning and querying needs to be enhanced by statistical means, leading to relevance-ranked lists as query results. This paper presents steps towards such a “statistically semantic” Web and outlines technical challenges. We discuss how statistically quantified ontological relations can be exploited in XML retrieval, how statistics can help in making Web-scale search efficient, and how statistical information extracted from users’ query logs and click streams can be leveraged for better search result ranking. We believe these are decisive issues for improving the quality of next-generation search engines for intranets, digital libraries, and the Web, and they are crucial also for peer-to-peer collaborative Web search.
1 The Challenge of “Semantic” Information Search
The age of information explosion poses tremendous challenges regarding the intelligent organization of data and the effective search of relevant information in business and industry (e.g., market analyses, logistic chains), society (e.g., health care), and virtually all sciences that are more and more data-driven (e.g., gene expression data analyses and other areas of bioinformatics).
The problems arise in intranets of large organizations, in federations of digital libraries and other information sources, and in the most humongous and amorphous of all data collections, the World Wide Web and its underlying numerous databases that reside behind portal pages. The Web bears the potential of being the world’s largest encyclopedia and knowledge base, but we are very far from being able to exploit this potential. Database-system and search-engine technologies provide support for organizing and querying information; but all too often they require excessive manual preprocessing, such as designing a schema and cleaning raw data or manually classifying documents into a taxonomy for a good Web portal, or manual postprocessing such as browsing through large result lists with too many irrelevant items or surfing in the vicinity of promising but not truly satisfactory approximate matches. The following are a few example queries where current Web and intranet search engines fall short. (P. Atzeni et al. (Eds.): ER 2004, LNCS 3288, pp. 3–17, 2004.)
international conference on management of data | 1998
Giansalvatore Mecca; Paolo Atzeni; A. Masci; G. Sindoni; Paolo Merialdo
The paper describes the ARANEUS Web-Base Management System [1, 5, 4, 6], a system developed at Università di Roma Tre, which represents a proposal towards the definition of a new kind of data repository, designed to manage Web data in the database style. We call a Web-base a collection of data of heterogeneous nature, and more specifically: (i) highly structured data, such as the ones typically stored in relational or object-oriented database systems; (ii) semistructured data, in the Web style. We can simplify by saying that it incorporates both databases and Web sites. A Web-Base Management System (WBMS) is a system for managing such Web-bases. More specifically, it should provide functionalities for both database and Web site management. It is natural to think of it as an evolution of ordinary DBMSs, in the sense that it will play in future-generation Web-based Information Systems the same role as the one played by database systems today. Three natural requirements arise here: first, the system should be fully distributed: databases and Web sites may be either local or remote resources; second, it should be platform-independent, i.e., it should not be tied to a specific platform or software environment, coherently with the nature of the Internet; finally, all system functionalities should be accessible through a hypertextual user interface, based on HTML-like markup languages, i.e., the system should be a site itself.
We can list three main classes of applications that a WBMS should support, in the database spirit: (1) queries: the system should allow access to data in a Web-base in a declarative, high-level fashion; this means that not only structured data can be accessed and queried, but also semistructured data in Web sites; (2) views: data coming from heterogeneous sources should be reorganized and integrated into new Web-bases, in order to provide different views over the original data, to be navigated and queried by end users; (3) updates: the process of maintaining Web sites is a delicate one which should be carefully supported.
international conference on management of data | 1997
Paolo Atzeni; Giansalvatore Mecca; Paolo Merialdo
Database systems offer efficient and reliable technology to query structured data. However, because of the explosion of the World Wide Web [11], an increasing amount of information is stored in repositories organized according to less rigid structures, usually as hypertextual documents, and data access is based on browsing and information retrieval techniques. Since browsing and search engines present important limitations [8], several query languages [19, 20, 23] for the Web have been recently proposed. These approaches are mainly based on a loose notion of structure, and tend to see the Web as a huge collection of unstructured objects, organized as a graph. Clearly, traditional database techniques are of little use in this field, and new techniques need to be developed. In this paper, we present the approach to the management of Web data taken in the ARANEUS project carried out by the database group at Università di Roma Tre. Our approach is based on a generalization of the notion of view to the Web framework. In fact, in traditional databases, views represent an essential tool for restructuring and integrating data to be presented to the user. Since the Web is becoming a major computing platform and a uniform interface for sharing data, we believe that also in this field a sophisticated view mechanism is needed, with novel features due to the semi-structured nature of the Web. First, in this context, restructuring and presenting data under different perspectives requires the generation of derived Web hypertexts, in order to re-organize and re-use portions of the Web. To do this, data from existing Web sites must be extracted, and then queried and integrated in order to build new hypertexts, i.e., hypertextual views over the original sites; these manipulations can be better attained in a more structured framework, in which traditional database technology can be leveraged to analyze and correlate information.
Therefore, there seem to be different view levels in this framework: (i) at the first level, data are extracted from the sites of interest and given a database structure, which represents a first structured view over the original semi-structured data; (ii) then, further database views can be built by means of reorganizations and integrations based on traditional database techniques; (iii) finally, a derived hypertext can be generated, offering an alternative or integrated hypertextual view over the original sites. In the process, data go from a loosely structured organization (the Web pages) to a very structured one (the database) and then again to Web structures.
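The three view levels can be sketched as a toy pipeline: extract page content into tuples, build a relational-style view over them, then regenerate a derived hypertext (a minimal illustration under assumed page formats and function names, not the ARANEUS implementation):

```python
import re

# Level (i): extract semi-structured page content into structured tuples
def extract(pages):
    rows = []
    for url, html in pages.items():
        for title, year in re.findall(r"<li>(.+?) \((\d{4})\)</li>", html):
            rows.append({"url": url, "title": title, "year": int(year)})
    return rows

# Level (ii): a database view built with ordinary relational operations
def recent(rows, since):
    return [r for r in rows if r["year"] >= since]

# Level (iii): a derived hypertext generated from the database view
def to_hypertext(rows):
    items = "".join(f"<li><a href='{r['url']}'>{r['title']}</a></li>"
                    for r in rows)
    return f"<ul>{items}</ul>"

site = {"a.html": "<li>EDITOR (1997)</li><li>MDM (1996)</li>"}
print(to_hypertext(recent(extract(site), 1997)))
```

The round trip mirrors the abstract's observation: loosely structured pages become relations, and relations become new Web structures.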
extending database technology | 1996
Paolo Atzeni; Riccardo Torlone
We describe the development of a tool, called MDM, for the management of multiple models and the translation of database schemes. This tool can form the basis of an integrated CASE environment, supporting the analysis and design of information systems, that allows different representations for the same data schemes. We first present a graph-theoretic framework that allows us to formally investigate desirable properties of schema translations. The formalism is based on a classification of the constructs used in the known data models into a limited set of types. Then, on the basis of formal results, we develop general methodologies for deriving “good” translations between schemes and, more generally, between models. Finally, we define the architecture and the functionalities of a first prototype that implements the various features of the approach.
Information & Computation | 1986
Paolo Atzeni; Nicola M. Morfuni
Database relations with incomplete information are considered. The no-information interpretation of null values is adopted, due to its characteristics of generality and naturalness. Coherently with the framework and its motivation, two meaningful classes of integrity constraints are studied: (a) functional dependencies, which have been widely investigated in the classical relational theory, and (b) constraints on null values, which control the presence of nulls in the relations. Specifically, three types of constraints on null values are taken into account (null-free subschemes, existence constraints, disjunctive existence constraints), and the interaction of each of them with functional dependencies is studied. In each of the three cases, the inference problem is solved, the complexity of the algorithms for its solution is analyzed, and the existence of a complete axiomatization is discussed.
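For plain functional dependencies (without the null constraints the paper adds), the inference problem is classically decided via attribute closure: X → Y follows from a set of FDs iff Y ⊆ X⁺. A minimal sketch of that standard algorithm:

```python
def closure(attrs, fds):
    """Attribute closure X+ under a set of functional dependencies.
    fds: iterable of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is contained in X+, add the right-hand side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))  # A -> B and B -> C give A+ = {A, B, C}
```

The paper's contribution is showing how this picture changes when the FDs interact with each of the three kinds of null-value constraints.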
very large data bases | 2008
Paolo Atzeni; Paolo Cappellari; Riccardo Torlone; Philip A. Bernstein; Giorgio Gianforme
We discuss a proposal for the implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from object-oriented to SQL or from SQL to XML schema descriptions. The operator can be used to generate database wrappers (e.g., object-oriented or XML to relational), default user interfaces (e.g., relational to forms), or default database schemas from other representations. The approach translates schemas from one model to another, within a predefined, but large and extensible, set of models: given a source schema S expressed in a source model, and a target model TM, it generates a schema S′ expressed in TM that is “equivalent” to S. A wide family of models is handled by using a metamodel in which models can be succinctly and precisely described. The approach expresses the translation as Datalog rules and exposes the source and target of the translation in a generic relational dictionary. This makes the translation transparent, easy to customize and model-independent. The proposal includes automatic generation of translations as composition of basic steps.
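The flavor of rule-based translation over a generic dictionary can be suggested with a toy example: constructs of the source model are stored as relations, and each translation rule maps one source construct to one target construct (illustrative constructs and names only, not the system's actual metamodel or rules):

```python
# Generic dictionary: schema constructs stored as relations
source = {
    "Class": [("Person",)],
    "Attribute": [("name", "Person"), ("age", "Person")],
}

# Two rules in the spirit of the paper's Datalog:
#   Table(t)     <- Class(t)
#   Column(c, t) <- Attribute(c, t)
def translate(schema):
    target = {"Table": [], "Column": []}
    for (cls,) in schema["Class"]:
        target["Table"].append((cls,))
    for col, owner in schema["Attribute"]:
        target["Column"].append((col, owner))
    return target

print(translate(source))
```

Because both source and target live in the same dictionary format, rules like these can be inspected, customized, and composed into longer translation chains, which is the transparency argument the abstract makes.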
ACM Transactions on Internet Technology | 2003
Paolo Merialdo; Paolo Atzeni; Giansalvatore Mecca
Data-intensive Web sites are large sites based on a back-end database, with a fairly complex hypertext structure. The paper develops two main contributions: (a) a specific design methodology for data-intensive Web sites, composed of a set of steps and design transformations that lead from a conceptual specification of the domain of interest to the actual implementation of the site; (b) a tool called Homer, conceived to support the site design and implementation process, by allowing the designer to move through the various steps of the methodology, and to automate the generation of the code needed to implement the actual site. Our approach to site design is based on a clear separation between several design activities, namely database design, hypertext design, and presentation design. All these activities are carried out by using high-level models, all subsumed by an extension of the nested relational model; the mappings between the models can be nicely expressed using an extended relational algebra for nested structures. Based on the design artifacts produced during the design process, and on their representation in the algebraic framework, Homer is able to generate all the code needed for the actual generation of the site, in a completely automatic way.
extending database technology | 2006
Paolo Atzeni; Paolo Cappellari; Philip A. Bernstein
We describe MIDST, an implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from OO to SQL or from SQL to XSD. It extends past approaches by translating database instances, not just their schemas. The operator can be used to generate database wrappers (e.g. OO or XML to relational), default user interfaces (e.g. relational to forms), or default database schemas from other representations. The approach translates both schemas and data: given a source instance I of a schema S expressed in a source model, and a target model TM, it generates a schema S′ expressed in TM that is “equivalent” to S and an instance I′ of S′ “equivalent” to I. A wide family of models is handled by using a metamodel in which models can be succinctly and precisely described. The approach expresses the translation as Datalog rules and exposes the source and target of the translation in a generic relational dictionary. This makes the translation transparent, easy to customize and model-independent.