Eva Huala

Carnegie Institution for Science

Publication

Featured research published by Eva Huala.

Nucleic Acids Research | 2012

The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools

Philippe Lamesch; Tanya Z. Berardini; Donghui Li; David Swarbreck; Christopher Wilks; Rajkumar Sasidharan; Robert J. Muller; Kate Dreher; Debbie L. Alexander; Margarita Garcia-Hernandez; Athikkattuvalasu S. Karthikeyan; Cynthia Lee; William Nelson; Larry Ploetz; Shanker Singh; April Wensel; Eva Huala

The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is a genome database for Arabidopsis thaliana, an important reference organism for many fundamental aspects of biology as well as basic and applied plant biology research. TAIR serves as a central access point for Arabidopsis data, annotates gene function and expression patterns using controlled vocabulary terms, and maintains and updates the A. thaliana genome assembly and annotation. TAIR also provides researchers with an extensive set of visualization and analysis tools. Recent developments include several new genome releases (TAIR8, TAIR9 and TAIR10) in which the A. thaliana assembly was updated, pseudogenes and transposon genes were re-annotated, and new data from proteomics and next-generation transcriptome sequencing were incorporated into gene models and splice variants. Other highlights include progress on functional annotation of the genome and the release of several new tools, including Textpresso for Arabidopsis, which provides the capability to carry out full-text searches on a large body of research literature.

Nucleic Acids Research | 2007

The Arabidopsis Information Resource (TAIR): gene structure and function annotation

David Swarbreck; Christopher Wilks; Philippe Lamesch; Tanya Z. Berardini; Margarita Garcia-Hernandez; Hartmut Foerster; Donghui Li; Tom Meyer; Robert J. Muller; Larry Ploetz; Amie Radenbaugh; Shanker Singh; Vanessa Swing; Christophe Tissier; Peifen Zhang; Eva Huala

The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is the model organism database for the fully sequenced and intensively studied model plant Arabidopsis thaliana. Data in TAIR is derived in large part from manual curation of the Arabidopsis research literature and direct submissions from the research community. New developments at TAIR include the addition of the GBrowse genome viewer to the TAIR site, a redesigned home page, navigation structure and portal pages to make the site more intuitive and easier to use, the launch of several TAIR web services and a new genome annotation release (TAIR7) in April 2007. A combination of manual and computational methods was used to generate this release, which contains 27 029 protein-coding genes, 3889 pseudogenes or transposable elements and 1123 ncRNAs (32 041 genes in all, 37 019 gene models). A total of 681 new genes and 1002 new splice variants were added. Overall, 10 098 loci (one-third of all loci from the previous TAIR6 release) were updated for the TAIR7 release.

Nucleic Acids Research | 2003

The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community

Seung Y. Rhee; William D. Beavis; Tanya Z. Berardini; Guanghong Chen; David A. Dixon; Aisling Doyle; Margarita Garcia-Hernandez; Eva Huala; Gabriel C. Lander; Mary Montoya; Neil Miller; Lukas A. Mueller; Suparna Mundodi; Leonore Reiser; Julie Tacklind; Dan C. Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang

Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms including alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system.

Plant Physiology | 2004

Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies

Tanya Z. Berardini; Suparna Mundodi; Leonore Reiser; Eva Huala; Margarita Garcia-Hernandez; Peifen Zhang; Lukas A. Mueller; Jungwoon Yoon; Aisling Doyle; Gabriel C. Lander; Nick Moseyko; Danny Yoo; Iris Xu; Brandon Zoeckler; Mary Montoya; Neil Miller; Dan C. Weems; Seung Y. Rhee

Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.

PLOS Biology | 2015

Finding Our Way through Phenotypes

Andrew R. Deans; Suzanna E. Lewis; Eva Huala; Salvatore S. Anzaldo; Michael Ashburner; James P. Balhoff; David C. Blackburn; Judith A. Blake; J. Gordon Burleigh; Bruno Chanet; Laurel Cooper; Mélanie Courtot; Sándor Csösz; Hong Cui; Wasila M. Dahdul; Sandip Das; T. Alexander Dececchi; Agnes Dettai; Rui Diogo; Robert E. Druzinsky; Michel Dumontier; Nico M. Franz; Frank Friedrich; George V. Gkoutos; Melissa Haendel; Luke J. Harmon; Terry F. Hayamizu; Yongqun He; Heather M. Hines; Nizar Ibrahim

Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.

Functional & Integrative Genomics | 2002

TAIR: a resource for integrated Arabidopsis data.

Margarita Garcia-Hernandez; Tanya Z. Berardini; Guanghong Chen; Debbie Crist; Aisling Doyle; Eva Huala; Emma M. Knee; Mark Lambrecht; Neil Miller; Lukas A. Mueller; Suparna Mundodi; Leonore Reiser; Seung Y. Rhee; Randy Scholl; Julie Tacklind; Dan C. Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang

The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) provides an integrated view of genomic data for Arabidopsis thaliana. The information is obtained from a battery of sources, including the Arabidopsis user community, the literature, and the major genome centers. Currently TAIR provides information about genes, markers, polymorphisms, maps, sequences, clones, DNA and seed stocks, gene families and proteins. In addition, users can find Arabidopsis publications and information about Arabidopsis researchers. Our emphasis is now on incorporating functional annotations of genes and gene products, genome-wide expression, and biochemical pathway data. Among the tools developed at TAIR, the most notable is the Sequence Viewer, which displays gene annotation, clones, transcripts, markers and polymorphisms on the Arabidopsis genome, and allows zooming in to the nucleotide level. A tool recently released is AraCyc, which is designed for visualization of biochemical pathways. We are also developing tools to extract information from the literature in a systematic way, and building controlled vocabularies to describe biological concepts in collaboration with other database groups. A significant new feature is the integration of the ABRC database functions and stock ordering system, which allows users to place orders for seed and DNA stocks directly from the TAIR site.

Genesis | 2015

The Arabidopsis Information Resource: Making and mining the “gold standard” annotated reference plant genome

Tanya Z. Berardini; Leonore Reiser; Donghui Li; Yarik Mezheritsky; Robert J. Muller; Emily Strait; Eva Huala

The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabidopsis thaliana that provides a global research community with centralized access to data for over 30,000 Arabidopsis genes. TAIR's biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high throughput datasets to present a high quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips, and other experimental resources. TAIR actively engages with its users, who contribute expertise and data that augment the work of the curatorial staff. TAIR's focus in an extensive and evolving ecosystem of online resources for plant biology is on the critically important role of extracting experimentally based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinformatics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription‐based funding while still keeping its data relatively open and accessible. genesis 53:474–485, 2015.

Database | 2012

Text mining for the biocuration workflow

Lynette Hirschman; Gully A. P. C. Burns; Martin Krallinger; Cecilia N. Arighi; K. Bretonnel Cohen; Alfonso Valencia; Cathy H. Wu; Andrew Chatr-aryamontri; Karen G. Dowell; Eva Huala; Anália Lourenço; Robert Nash; Anne-Lise Veuthey; Thomas C. Wiegers; Andrew Winter

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

Plant and Cell Physiology | 2013

The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses

Laurel Cooper; Ramona L. Walls; Justin Elser; Maria A. Gandolfo; Dennis W. Stevenson; Barry Smith; Justin Preece; Balaji Athreya; Christopher J. Mungall; Stefan A. Rensing; Manuel Hiss; Daniel Lang; Ralf Reski; Tanya Z. Berardini; Donghui Li; Eva Huala; Mary L. Schaeffer; Naama Menda; Elizabeth Arnaud; Rosemary Shrestha; Yukiko Yamazaki; Pankaj Jaiswal

The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.

BMC Bioinformatics | 2011

BioCreative III interactive task: an overview

Cecilia N. Arighi; Phoebe M. Roberts; Shashank Agarwal; Sanmitra Bhattacharya; Gianni Cesareni; Andrew Chatr-aryamontri; Simon Clematide; Pascale Gaudet; Michelle G. Giglio; Ian Harrow; Eva Huala; Martin Krallinger; Ulf Leser; Donghui Li; Feifan Liu; Zhiyong Lu; Lois J Maltais; Naoaki Okazaki; Livia Perfetto; Fabio Rinaldi; Rune Sætre; David Salgado; Padmini Srinivasan; Philippe Thomas; Luca Toldo; Lynette Hirschman; Cathy H. Wu

Background: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

Results: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. The document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of correct gene and its identifier, such as contextual information to assist in disambiguation.

Discussion: The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT Task provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
