Thure Etzold
European Bioinformatics Institute
Publication
Featured research published by Thure Etzold.
Methods in Enzymology | 1996
Thure Etzold; Anatoly Ulyanov; Patrick Argos
Publisher Summary: This chapter presents a retrieval system called “Sequence Retrieval System (SRS)” that acts on data banks in a flat file or text format. It provides a homogeneous interface to about 80 biological databanks for accessing and querying their contents and for navigating among them. SRS is an integrated system that provides a homogeneous interface to all flat file data banks retained in their original format. It is a retrieval system that allows access to, but not the depositing of, data. Several elements are combined into a system that extends the power of normal retrieval systems and that rivals that of real databases, such as a relational system, without compromising speed. These elements include languages for data bank and syntax definition, a programmable parser, an indexing system, support for subentries, a novel system for exploiting links among data banks, and a query language. The database linking is a unique feature that considerably extends the capability of hypertext links.
Bioinformatics | 1993
Thure Etzold; Patrick Argos
SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns. SRS supports the data structure of these libraries by providing special indices for implementing lists of subentities (e.g. feature tables) or hierarchically structured data-fields (e.g. taxonomic classification). A language (ODD) has been designed for the convenient specification of library format and organization, representation of individual data-fields within the system (design of indices) and structuring other data needed during retrieval. This ensures flexibility required for coping with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull-down menus and windows. SRS supports various input and output formats but is particularly well adapted to the GCG programs.
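The keyword indexing of flat file libraries described in this abstract can be illustrated with a minimal sketch. It is not the SRS implementation and does not use ODD. It assumes an EMBL/SwissProt-style layout in which each line starts with a two-letter line code (ID for the entry name, KW for keywords) and entries end with a `//` line. The sample entries, entry names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-entry library in an EMBL/SwissProt-like flat file layout.
SAMPLE = """\
ID   DEMO1
KW   Hydrolase; Protease; Serine protease.
//
ID   DEMO2
KW   Hydrolase; Protease; Digestion.
//
"""


def parse_entries(text):
    """Yield one dict per entry, mapping line code -> list of field values."""
    entry = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # entry terminator
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif line.strip():
            code, _, value = line.partition(" ")
            entry[code].append(value.strip())
    if entry:  # tolerate a missing final terminator
        yield dict(entry)


def build_keyword_index(text):
    """Map each lower-cased keyword to the set of entry names that carry it."""
    index = defaultdict(set)
    for entry in parse_entries(text):
        entry_id = entry["ID"][0].split()[0]
        for line in entry.get("KW", []):
            for keyword in line.rstrip(".").split(";"):
                keyword = keyword.strip().lower()
                if keyword:
                    index[keyword].add(entry_id)
    return index


index = build_keyword_index(SAMPLE)
```

Looking up `index["protease"]` returns the names of both sample entries, while `index["digestion"]` returns only the second. A real index would be stored on disk and would cover many more data fields.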
Bioinformatics | 2002
Evgeni M. Zdobnov; Rodrigo Lopez; Rolf Apweiler; Thure Etzold
MOTIVATION: The current data explosion is intractable without advanced data management systems. The numerous data sets become really useful when they are interconnected under a uniform interface, representing the domain knowledge. The SRS has become an integration system for both data retrieval and applications for data analysis. It provides capabilities to search multiple databases by shared attributes and to query across databases fast and efficiently. RESULTS: Here we present recent developments at the EBI SRS server (http://srs.ebi.ac.uk). The EBI SRS server today contains more than 130 biological databases and integrates more than 10 applications. It is a central resource for molecular biology data as well as a reference server for the latest developments in data integration. One of the latest additions to the EBI SRS server is the InterPro database (Integrated Resource of Protein Domains and Functional Sites). Distributed in XML format, it became a turning point in low-level XML-SRS integration. We present InterProScan as an example of data analysis applications, describe some advanced features of SRS6, and introduce the SRSQuickSearch JavaScript interfaces to SRS.
Bioinformatics | 1993
Thure Etzold; Patrick Argos
SRS (Sequence Retrieval System), an indexing system for flat file libraries, provides fast access to individual library entries via retrieval by keywords from various data fields. SRS is now also able to build indices using cross-references that most libraries provide. Fifteen libraries of DNA and protein sequences and structures have been selected. These libraries interact with at least one other by means of cross-references. Indexing these cross-references allows a complete network of libraries to be built. In the network an entry from one library can be linked in principle to every other library. If two libraries are not directly cross-referenced, the linkage can be made with a succession of single links between neighbouring, cross-referenced libraries. A new operator has been added to the query language of SRS for convenient specification of links amongst complete libraries or entry sets generated by previous queries on particular libraries. All the information in the network can now be used to retrieve an entry in a specific library, e.g. the full information given in amino acid sequence entries from SwissProt can now be used to retrieve related tertiary structure entries from PDB. Furthermore, a search in a single library can be extended to a search in the complete library network, e.g. all entries in all databases pertaining to elastase can be found.
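The idea of linking two libraries that are not directly cross-referenced through a succession of single links can be sketched as a shortest-path search over a graph of libraries. This is only an illustration under stated assumptions, not the SRS link operator: the cross-reference map below is hypothetical, and each cross-reference is assumed to be usable in both directions.

```python
from collections import deque

# Hypothetical cross-reference map: library -> libraries it references directly.
CROSS_REFS = {
    "SWISSPROT": {"EMBL", "PDB", "PROSITE"},
    "EMBL": {"SWISSPROT"},
    "PDB": set(),
    "PROSITE": {"SWISSPROT"},
}


def neighbours(library):
    """Libraries one link away, treating every cross-reference as bidirectional."""
    linked = set(CROSS_REFS.get(library, ()))
    linked.update(src for src, targets in CROSS_REFS.items() if library in targets)
    return linked


def link_path(start, goal):
    """Return the shortest chain of libraries from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours(path[-1])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In this toy network, `link_path("EMBL", "PDB")` goes through SWISSPROT because EMBL and PDB share no direct cross-reference. Following such a chain for actual entries would additionally require mapping entry sets across each single link in turn.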
Trends in Biochemical Sciences | 1999
David P. Kreil; Thure Etzold
First, select ‘SRS World Wide’ from the SRS home page. This leads to a list of known public SRS servers. From here, users can choose to search DATABANKS, typically by databank name or description. More generally, any field present in the databank information pages, as well as site and server characteristics, can be used in a query.

The results of the search show all the databanks that matched the search request and at which sites they are available. For convenience, the list of results offers direct links to their remote query forms, which feature a uniform interface that is both easy to use and flexible.

Figure 1 shows the result of a request for databanks named ‘ENZYME’. As in most cases, several servers maintain a copy of the database, and the list shows alternative sites. The number of indexed entries and the release number (where assigned by the server maintainers) help users to choose a nearby site that has a current version of the database.

Figure 1. The results of a query for databanks named ‘ENZYME’. The number of indexed entries and the release number (where assigned by the server maintainers) help users to choose a nearby server that offers a current version of the appropriate databank.

When searching for a particular database, users should first restrict the search to a subset of DATABANKS that includes only one site from each group of alternatives. (Currently, this representative site is chosen as the site that has the most extensive databank-information page.) This allows searching for databanks in two steps: (1) identification of the databases of interest; and (2) comparison of the sites at which they are offered (see Fig. 1).

Consider a user who is interested in databases that offer sequence alignments, which hold information on well-characterized protein domains or families and can be used for functional assignments or phylogenetic examinations. Selecting the field ‘Description’ in the query form and asking for ‘sequence’ and ‘align’ yields a list of ∼60 databank copies; modified as shown in Fig. 2, however, the query fetches a more manageable list of only 15 representative databanks. In addition to general databases of protein domains or families [such as PFAM (Ref. 7), PRINTS (Ref. 8) or PIRALN (Ref. 9)], a user will also find specialized databases, such as HOVERGEN (vertebrates; Ref. 10), AMmtDB (vertebrate mitochondria; Ref. 11), RDP (ribosomes; Ref. 12), FSSP and HSSP (protein structure; Refs 13, 14), or TRANSFAC (transcription factors; Ref. 15). The user can now browse the descriptions of the databases retrieved, and refine or broaden the search.

Figure 2. Query for databanks that have a description containing the terms ‘sequence’ and ‘align’. The second line of the query form requests that the results be restricted to one representative databank for each group of alternatives.

Each entry in DATABANKS contains a copy of the SRS databank-information page, as shown by the server it was collected from, and concludes with an overview of alternative sites. A typical entry is shown in Fig. 3. The overview provides direct links for remote queries to each of the sites. If a stable connection to a particular site could not be established, the site is moved to the end of the list of alternatives. In these cases, data from previous runs are used as backup. A record of when the backup was originally retrieved indicates whether it might be out of date.

Figure 3. A typical DATABANKS entry. The entry contains a copy of the respective remote SRS databank information page, which includes a description, references and links, as well as detailed documentation of database fields and indices. It concludes with a listing of alternative sites that offer ‘ENZYME’. Direct links to these sites and the remote query forms for ENZYME are provided. For users in the network vicinity of a particular DATABANKS server, the relative response times compiled by that server give a clue to the net distances to other sites (N/A indicates problems connecting at the specified time).

DATABANKS provides occasional users and experts alike with an up-to-date direct gateway into the ever-growing net of databanks, giving convenient access to a wide range of data and services.

References
7. Sonnhammer, E.L. et al. Nucleic Acids Res. 1998; 26: 320–322
8. Attwood, T.K., Beck, M.E., Bleasby, A.J., and Parry-Smith, D.J. Nucleic Acids Res. 1994; 24: 182–188
9. Barker, W.C. et al. Nucleic Acids Res. 1998; 26: 27–32
10. Duret, L., Mouchiroud, D., and Gouy, M. Nucleic Acids Res. 1994; 22: 2360–2365
11. Lanave, C. et al. Nucleic Acids Res. 1999; 27: 134–137
12. Maidak, B.L. et al. Nucleic Acids Res. 1997; 25: 109–111
13. Holm, L. et al. Protein Sci. 1992; 1: 1691–1698
14. Sander, C. and Schneider, R. Proteins. 1991; 9: 56–68
15. Heinemeyer, T. et al. Nucleic Acids Res. 1998; 26: 362–367
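The choice of one representative site per databank, described in this article as the site with the most extensive databank-information page, can be sketched as follows. This is an illustration only: the record type, the site names, and the use of text length as the measure of "most extensive" are all assumptions, not details taken from DATABANKS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabankCopy:
    """One copy of a databank at one site (hypothetical record layout)."""
    name: str
    site: str
    info_page: str


def representatives(copies):
    """Pick, per databank name, the copy with the longest information page."""
    best = {}
    for copy in copies:
        current = best.get(copy.name)
        # Assumption: "most extensive" is approximated by text length.
        if current is None or len(copy.info_page) > len(current.info_page):
            best[copy.name] = copy
    return best


copies = [
    DatabankCopy("ENZYME", "site-a.example", "Enzyme nomenclature."),
    DatabankCopy("ENZYME", "site-b.example", "Enzyme nomenclature, with field documentation."),
    DatabankCopy("PFAM", "site-a.example", "Protein families."),
]
reps = representatives(copies)
```

Here `reps["ENZYME"].site` is the second site, since its information page is longer. Step (2) of the search, comparing the alternative sites, would then work on the full list of copies rather than on the representatives.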
Journal of Molecular Biology | 1995
Gerhard Vogt; Thure Etzold; Patrick Argos
Bioinformatics | 2002
Evgeny M. Zdobnov; Rodrigo Lopez; Rolf Apweiler; Thure Etzold
Trends in Genetics | 1998
Heikki Lehväslaiho; Michael Ashburner; Thure Etzold
Archive | 2002
Phil Carter; Thierry Coupaye; David P. Kreil; Thure Etzold
Archive | 2000
Thierry Coupaye; Thure Etzold