Frank Neven | Researchain

Publication

Featured research published by Frank Neven.

international conference on database theory | 2003

XPath Containment in the Presence of Disjunction, DTDs, and Variables

Frank Neven; Thomas Schwentick

XPath is a simple language for navigating an XML tree and returning a set of answer nodes. The focus in this paper is on the complexity of the containment problem for various fragments of XPath. In addition to the basic operations (child, descendant, filter, and wildcard), we consider disjunction, DTDs and variables. With respect to variables, we study two semantics: (1) the value of variables is given by an outer context; (2) the value of variables is defined existentially. We establish an almost complete classification of the complexity of the containment problem with respect to these fragments.

ACM Transactions on Computational Logic | 2004

Finite state machines for strings over infinite alphabets

Frank Neven; Thomas Schwentick; Victor Vianu

Motivated by formal models recently proposed in the context of XML, we study automata and logics on strings over infinite alphabets. These are conservative extensions of classical automata and logics defining the regular languages on finite alphabets. Specifically, we consider register and pebble automata, and extensions of first-order logic and monadic second-order logic. For each type of automaton we consider one-way and two-way variants, as well as deterministic, nondeterministic, and alternating control. We investigate the expressiveness and complexity of the automata and their connection to the logics, as well as standard decision problems. Some of our results answer open questions of Kaminski and Francez on register automata.
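To make the register model concrete, here is a minimal sketch of a deterministic one-way automaton with a single register, assuming the usual reading in which input symbols can only be compared with register contents for equality. It accepts strings whose first symbol occurs again later. The function name and state names are hypothetical and chosen for this illustration; they are not taken from the paper.

```python
def first_value_repeats(word):
    """Simulate a one-register automaton: accept if the first symbol reappears."""
    register = None      # the single register, initially empty
    state = "start"
    for symbol in word:
        if state == "start":
            # Store the first symbol; it can come from an unbounded alphabet.
            register = symbol
            state = "searching"
        elif state == "searching" and symbol == register:
            # Equality with the register is the only test performed on symbols.
            state = "accept"
    return state == "accept"
```

Because the control uses three states and one equality test, the same code works whether the symbols are integers, strings, or any other values drawn from an infinite alphabet.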

international conference on management of data | 2002

Automata theory for XML researchers

Frank Neven

The advent of XML initiated a symbiosis between document research, databases and formal languages (see, e.g., the survey by Vianu [38]). This symbiosis resulted, for instance, in the development of unranked tree automata [3]. In brief, unranked trees are finite labeled trees where nodes can have an arbitrary number of children. So, there is no fixed rank associated with each label. As the structure of XML documents can be adequately represented by unranked trees, unranked tree automata can serve XML research in four different ways: (i) as a basis of schema languages [10, 12, 13, 18, 19] and validation of schemas; (ii) as an evaluation mechanism for pattern languages [4, 32, 24]; (iii) as an algorithmic toolbox (e.g., XPath containment [16] and typechecking [15]); and (iv) as a new paradigm: unranked tree automata use regular string languages to deal with unrankedness. The latter simple but effective paradigm found application in several formalisms [14, 20, 21, 22, 27].
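As a rough illustration of point (iv), the sketch below validates an unranked tree by matching each node's sequence of child labels against a regular string language. It covers only the DTD-like special case in which the state assigned to a node is its label. The Node class, the RULES table, and the conforms function are hypothetical names made up for this example, and labels are assumed to contain neither spaces nor regular expression metacharacters.

```python
import re


class Node:
    """A node of an unranked tree: a label and any number of children."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


# Hypothetical rules: each label maps to a regular expression over the
# sequence of child labels, where every label is followed by one space.
RULES = {
    "book": r"(title )(author )+(chapter )*",
    "chapter": r"(title )(section )*",
    "title": r"",
    "author": r"",
    "section": r"",
}


def conforms(node, rules):
    """Check that every node's child sequence matches the rule for its label."""
    pattern = rules.get(node.label)
    if pattern is None:
        return False  # label not declared in the rules
    child_word = "".join(child.label + " " for child in node.children)
    if re.fullmatch(pattern, child_word) is None:
        return False
    return all(conforms(child, rules) for child in node.children)
```

The number of children is never bounded in advance; the regular expression for each label decides which child sequences, of any length, are acceptable.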

international workshop on the web and databases | 2004

DTDs versus XML schema: a practical study

Geert Jan Bex; Frank Neven; Jan Van den Bussche

Among the various proposals answering the shortcomings of Document Type Definitions (DTDs), XML Schema is the most widely used. Although DTDs and XML Schema Definitions (XSDs) differ syntactically, they are still quite related on an abstract level. Indeed, freed from all syntactic sugar, XML Schemas can be seen as an extension of DTDs with a restricted form of specialization. In the present paper, we inspect a number of DTDs and XSDs harvested from the web and try to answer the following questions: (1) which of the extra features/expressiveness of XML Schema not allowed by DTDs are effectively used in practice; and, (2) how sophisticated are the structural properties (i.e. the nature of regular expressions) of the two formalisms. It turns out that at present real-world XSDs only sparingly use the new features introduced by XML Schema: on a structural level the vast majority of them can already be defined by DTDs. Further, we introduce a class of simple regular expressions and obtain that a surprisingly high fraction of the content models belong to this class. The latter result sheds light on the justification of simplifying assumptions that sometimes have to be made in XML research.

Theoretical Computer Science | 2002

Query automata over finite trees

Frank Neven; Thomas Schwentick

A main task in document transformation and information retrieval is locating subtrees satisfying some pattern. Therefore, unary queries, i.e., queries that map a tree to a set of its nodes, play an important role in the context of structured document databases. The motivation of this work is to understand how the natural and well-studied computation model of tree automata can be used to compute such queries. We define a query automaton (QA) as a deterministic two-way finite automaton over trees that has the ability to select nodes depending on the state and the label at those nodes. We study QAs over ranked as well as over unranked trees. Unranked trees differ from ranked ones in that there is no bound on the number of children of nodes. We characterize the expressiveness of the different formalisms as the unary queries definable in monadic second-order logic (MSO). In contrast to the ranked case, special stay transitions had to be added to QAs over unranked trees to capture MSO. We show that the non-emptiness, containment, and equivalence problems for QAs are EXPTIME-complete.

Information Systems | 2002

A formal model for an expressive fragment of XSLT

Geert Jan Bex; Sebastian Maneth; Frank Neven

The extension of the eXtensible Stylesheet Language (XSL) by variables and passing of data values between template rules has generated a powerful XML query language: eXtensible Stylesheet Language Transformations (XSLT). An informal introduction to XSLT is given, on the basis of which a formal model of a fragment of XSLT is defined. This formal model is in the spirit of tree transducers, and its semantics is defined by rewrite relations. It is shown that the expressive power of the fragment is already beyond that of most other XML query languages. Finally, important properties such as termination and closure under composition are considered.

Journal of Computer and System Sciences | 2003

XML with data values: typechecking revisited

Noga Alon; Tova Milo; Frank Neven; Dan Suciu; Victor Vianu

We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output DTD, for inputs satisfying a given input DTD. This problem had been studied by a subset of the authors in a simplified framework that captured the structure of XML documents but ignored data values. We revisit here the typechecking problem in the more realistic case when data values are present in documents and tested by queries. In this extended framework, typechecking quickly becomes undecidable. However, it remains decidable for large classes of queries and DTDs of practical interest. The main contribution of the present paper is to trace a fairly tight boundary of decidability for typechecking with data values. The complexity of typechecking in the decidable cases is also considered.

Logical Methods in Computer Science | 2006

On the Complexity of XPath Containment in the Presence of Disjunction, DTDs, and Variables

Frank Neven; Thomas Schwentick

XPath is a simple language for navigating an XML-tree and returning a set of answer nodes. The focus in this paper is on the complexity of the containment problem for various fragments of XPath. We restrict attention to the most common XPath expressions which navigate along the child and/or descendant axis. In addition to basic expressions using only node tests and simple predicates, we also consider disjunction and variables (ranging over nodes). Further, we investigate the containment problem relative to a given DTD. With respect to variables we study two semantics, (1) the original semantics of XPath, where the values of variables are given by an outer context, and (2) an existential semantics introduced by Deutsch and Tannen, in which the values of variables are existentially quantified. In this framework, we establish an exact classification of the complexity of the containment problem for many XPath fragments.

ACM Transactions on Database Systems | 2010

Inference of concise regular expressions and DTDs

Geert Jan Bex; Frank Neven; Thomas Schwentick; Stijn Vansummeren

We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML documents, a problem that basically reduces to learning concise regular expressions from positive example strings. We identify two classes of concise regular expressions, the single occurrence regular expressions (SOREs) and the chain regular expressions (CHAREs), that capture the vast majority of expressions used in practical DTDs. For the inference of SOREs we present several algorithms that first infer an automaton for a given set of example strings and then translate that automaton to a corresponding SORE, possibly repairing the automaton when no equivalent SORE can be found. In the process, we introduce a novel automaton-to-regular-expression rewrite technique which is of independent interest. When only a very small amount of XML data is available, however (for instance when the data is generated by Web service requests or by answers to queries), these algorithms produce regular expressions that are too specific. Therefore, we introduce a novel learning algorithm crx that directly infers CHAREs (which form a subclass of SOREs) without going through an automaton representation. We show that crx performs very well within its target class on very small datasets.
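The abstract does not define SOREs. Assuming the usual reading, in which every element name occurs at most once in the expression, membership in the class can be tested with a short sketch like the one below. The function is_sore is a hypothetical helper, not part of the paper's algorithms, and it assumes that element names are separated by operators, commas, or whitespace as in DTD content models.

```python
import re
from collections import Counter


def is_sore(expression):
    """Return True if no element name occurs more than once in the expression."""
    # Element names: a letter or underscore followed by name characters.
    names = re.findall(r"[A-Za-z_][\w.-]*", expression)
    return all(count == 1 for count in Counter(names).values())
```

For example, is_sore("title, author+, chapter*") holds, whereas "a(a|b)*" is rejected because the name a occurs twice. This checks only the single-occurrence condition; it does not infer an expression from example strings.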

international conference on database theory | 2003

Typechecking Top-Down Uniform Unranked Tree Transducers

Wim Martens; Frank Neven

We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output schema, for inputs satisfying a given input schema. As typechecking quickly turns undecidable for query languages capable of testing equality of data values, we return to the limited framework where we abstract XML documents as labeled ordered trees. We focus on simple top-down recursive transformations motivated by XSLT and structural recursion on trees. We parameterize the problem by several restrictions on the transformations (deleting, non-deleting, bounded width) and consider both tree automata and DTDs as output schemas. The complexity of the typechecking problems in this scenario ranges from PTIME to EXPTIME.
