Daniele Toti
Roma Tre University
Publications
Featured research published by Daniele Toti.
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2011
Paolo Atzeni; Fabio Polticelli; Daniele Toti
We propose a methodology to identify and resolve protein-related abbreviations found in the full texts of scientific papers, as part of a semi-automatic process implemented in our PRAISED framework. The identification of biological acronyms is carried out via an effective syntactical approach, by taking advantage of lexical clues and using mostly domain-independent metrics, resulting in considerably high levels of recall as well as extremely low execution time. The subsequent abbreviation resolution uses both syntactical and semantic criteria in order to match an abbreviation with its potential explanation, as discovered among a number of contiguous words proportional to the abbreviation's length. We have tested our system against the Medstract Gold Standard corpus and a relevant set of manually annotated PubMed papers, obtaining significant results and high performance levels, while at the same time allowing for great customization, lightness and scalability.
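The abstract above describes a syntactic identification step followed by a window-based resolution step. A minimal Python sketch of that general idea is shown below, assuming a simple all-caps heuristic for candidates and an initial-letter match over a window proportional to the abbreviation's length; the function names and thresholds are illustrative assumptions, not PRAISED's actual implementation.

```python
import re

# Illustrative sketch only: a crude syntactic abbreviation detector and a
# window-based resolver; heuristics and names are assumptions, not the
# authors' PRAISED code.

ABBREV_RE = re.compile(r"\b[A-Z][A-Z0-9-]{1,9}\b")  # short all-caps tokens

def find_candidates(text):
    """Return candidate abbreviations found in the text."""
    return set(ABBREV_RE.findall(text))

def resolve(abbrev, text):
    """Search for an expansion among a window of contiguous words whose
    size is proportional to the abbreviation's length."""
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    window = len(abbrev) + 2  # heuristic window size
    for i in range(len(words) - len(abbrev)):
        span = words[i:i + window]
        initials = "".join(w[0].upper() for w in span[:len(abbrev)])
        if initials == abbrev.upper():
            return " ".join(span[:len(abbrev)])
    return None

text = "Green fluorescent protein (GFP) is widely used as a reporter."
for a in find_candidates(text):
    print(a, "->", resolve(a, text))  # GFP -> Green fluorescent protein
```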
Journal of Computer-aided Molecular Design | 2017
Elena Di Muzio; Daniele Toti; Fabio Polticelli
Molecular docking is a powerful technique that helps uncover the structural and energetic bases of the interaction between macromolecules and substrates, endogenous and exogenous ligands, and inhibitors. Moreover, this technique plays a pivotal role in accelerating the screening of large libraries of compounds for drug development purposes. The need to promote community-driven drug development efforts, especially as far as neglected diseases are concerned, calls for user-friendly tools that allow non-expert users to exploit the full potential of molecular docking. Along this path, this work describes the implementation of DockingApp, a freely available, extremely user-friendly, platform-independent application for performing docking simulations and virtual screening tasks using AutoDock Vina. DockingApp features an intuitive graphical user interface which greatly facilitates both the input phase and the analysis of the results, which can be visualized in graphical form using the embedded JMol applet. The application comes with the DrugBank set of more than 1400 ready-to-dock, FDA-approved drugs, to facilitate virtual screening and drug repurposing initiatives. Furthermore, other databases of compounds such as ZINC, available also in AutoDock format, can be readily and easily plugged in.
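DockingApp wraps AutoDock Vina behind a GUI; the sketch below shows, purely for illustration, how a wrapper might generate a standard Vina configuration file and launch a run from the command line. The file names and box parameters are placeholders, and this is not DockingApp's actual code.

```python
import subprocess
from pathlib import Path

def run_vina(receptor, ligand, center, size, out="docked.pdbqt"):
    """Write a standard AutoDock Vina configuration file and launch a run.
    All paths and parameters here are placeholders for illustration."""
    cfg = Path("vina.conf")
    cfg.write_text(
        f"receptor = {receptor}\n"
        f"ligand = {ligand}\n"
        f"center_x = {center[0]}\ncenter_y = {center[1]}\ncenter_z = {center[2]}\n"
        f"size_x = {size[0]}\nsize_y = {size[1]}\nsize_z = {size[2]}\n"
        f"out = {out}\n"
        "exhaustiveness = 8\n"
    )
    # Requires the 'vina' executable to be available on the PATH.
    subprocess.run(["vina", "--config", str(cfg)], check=True)

# Hypothetical usage:
# run_vina("receptor.pdbqt", "ligand.pdbqt", center=(10.0, 12.5, -3.0), size=(20, 20, 20))
```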
International Conference on Legal Knowledge and Information Systems | 2013
Gaia Arosio; Giuliana Bagnara; Nicola Capuano; Elisabetta Fersini; Daniele Toti
We describe a system for computer-assisted writing of legal documents via a question-based mechanism. The system relies upon an underlying ontological structure meant to represent the data flow from the user's input, and a corresponding resolution algorithm, implemented within a local engine based on a Last-State/Next-State model, for navigating the structure and providing the user with meaningful domain-specific support and insight. The system has been successfully applied to the scenario of civil liability for motor vehicles and is part of a larger framework for self-litigation and legal support.
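As a rough illustration of a question-driven, last-state/next-state flow, the sketch below walks a user through a tiny hand-made transition table; the states, questions and transitions are invented and do not reproduce the paper's ontology or resolution engine.

```python
# Toy last-state/next-state question flow; states and questions are invented.
QUESTIONS = {
    "start": ("Were you involved in a motor vehicle accident? (yes/no)",
              {"yes": "damage", "no": "end"}),
    "damage": ("Did the accident cause material damage? (yes/no)",
               {"yes": "claim", "no": "end"}),
    "claim": ("Do you want to draft a compensation claim? (yes/no)",
              {"yes": "end", "no": "end"}),
}

def run_dialogue():
    state, answers = "start", {}
    while state != "end":
        prompt, transitions = QUESTIONS[state]
        answer = input(prompt + " ").strip().lower()
        answers[state] = answer
        state = transitions.get(answer, state)  # stay put on invalid input
    return answers  # the collected answers would feed a document template

if __name__ == "__main__":
    print(run_dialogue())
```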
International Conference on Data Engineering | 2011
Paolo Atzeni; Fabio Polticelli; Daniele Toti
We propose a framework for identifying, disambiguating and storing protein-related abbreviations as found in the full texts of scientific papers, in order to build and maintain a publicly available abbreviation repository via a semi-automatic process. This process involves information extraction methods and techniques for acronym identification and resolution, based on lexical clues and syntactical, largely domain-independent criteria. A dictionary and an ontology for proteins provide the means for matching and disambiguating the biological entities. User feedback is gathered at the end of the process and the confirmed entries are then stored and made available to the scientific community for further reviewing.
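The dictionary-based disambiguation step can be pictured as a lookup of the resolved expansion in a protein name table. The sketch below is a stand-in for that idea; the tiny in-memory dictionary takes the place of a local UniProt-derived resource and is not the framework's actual data model.

```python
# Illustrative filtering step: keep only expansions that match a known
# protein name. The dictionary is a toy stand-in for a UniProt-derived lookup.
PROTEIN_DICTIONARY = {
    "green fluorescent protein": "P42212",
    "bovine serum albumin": "P02769",
}

def confirm_protein(abbreviation, expansion):
    """Return (abbreviation, expansion, accession) if the expansion is a
    known protein name, otherwise None (entry discarded or sent to review)."""
    accession = PROTEIN_DICTIONARY.get(expansion.lower())
    return (abbreviation, expansion, accession) if accession else None

print(confirm_protein("GFP", "Green fluorescent protein"))
print(confirm_protein("USA", "United States of America"))  # filtered out
```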
Bio-Algorithms and Med-Systems | 2012
Daniele Toti; Paolo Atzeni; Fabio Polticelli
This paper describes a methodology for discovering and resolving protein name abbreviations from the full-text versions of scientific articles, implemented in the PRAISED framework with the ultimate purpose of building up a publicly available abbreviation repository. Three processing steps lie at the core of the framework: i) an abbreviation identification phase, carried out via domain-independent metrics, whose purpose is to identify all possible abbreviations within a scientific text; ii) an abbreviation resolution phase, which takes into account a number of syntactical and semantic criteria in order to match an abbreviation with its potential explanation; and iii) a dictionary-based protein name identification phase, which is meant to select only those abbreviations belonging to the protein science domain. A local copy of the UniProt database is used as a source repository for all the known proteins. The PRAISED implementation has been tested against several known annotated corpora, such as the Medstract Gold Standard Corpus, the AB3P Corpus, the BioText Corpus and the Ao and Takagi Corpus, obtaining significantly high levels of recall and extremely fast performance, while also keeping promising levels of precision and overall F-measure in comparison with the most relevant similar methods. This comparison has been carried out up to Phase 2, since those methods stop at expanding abbreviations, without performing any entity recognition. Instead, the entity recognition performed in the last phase provides PRAISED with an effective strategy for protein discovery, thus moving beyond existing context-free techniques. Furthermore, this implementation also addresses the complexity of full-text papers, instead of the simpler abstracts more generally used. As such, the whole PRAISED process (Phases 1, 2 and 3) has also been tested against a manually annotated subset of full-text papers retrieved from the PubMed repository, with significant results as well.
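For clarity, the three phases can be chained as in the skeleton below; the phase functions are trivial stand-ins so that the skeleton runs end to end, and none of this is PRAISED's actual implementation.

```python
def identify_abbreviations(text):
    # Phase 1 stand-in: naive all-caps token filter.
    return [w.strip(".,()") for w in text.split() if w.isupper() and 1 < len(w) <= 10]

def resolve_abbreviation(abbrev, text):
    # Phase 2 stand-in: a real resolver would scan nearby words for an expansion.
    return None

def is_protein_name(expansion):
    # Phase 3 stand-in: a real check would query a UniProt-derived dictionary.
    return False

def process_paper(full_text):
    results = []
    for abbrev in identify_abbreviations(full_text):
        expansion = resolve_abbreviation(abbrev, full_text)
        if expansion and is_protein_name(expansion):
            results.append((abbrev, expansion))
    return results

print(process_paper("Green fluorescent protein (GFP) was imaged."))  # [] with these stubs
```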
Bioinformatics | 2015
Le Viet Hung; Silvia Caprari; Massimiliano Bizai; Daniele Toti; Fabio Polticelli
MOTIVATION: In recent years, structural genomics and ab initio molecular modeling activities have been leading to the availability of a large number of structural models of proteins whose biochemical function is not known. The aim of this study was the development of a novel software tool that, given a protein's structural model, predicts the presence and identity of active sites and/or ligand binding sites. RESULTS: The algorithm implemented by the Ligand Binding site Recognition Application (LIBRA) is based on a graph theory approach to find the largest subset of similar residues between an input protein and a collection of known functional sites. The algorithm makes use of two predefined databases for active sites and ligand binding sites, respectively, derived from the Catalytic Site Atlas and the Protein Data Bank. Tests indicate that LIBRA is able to identify the correct binding/active site in 90% of the cases analyzed, 90% of which feature the identified site as ranking first. As far as ligand binding site recognition is concerned, LIBRA outperforms other structure-based ligand binding site detection tools with which it has been compared. AVAILABILITY AND IMPLEMENTATION: The application, developed in Java SE 7 with a Swing GUI embedding a JMol applet, can be run on any OS equipped with a suitable Java Virtual Machine (JVM), and is available at the following URL: http://www.computationalbiology.it/software/LIBRAv1.zip.
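The "largest subset of similar residues" search can be framed as a maximum-clique problem on a correspondence graph, a common graph-theoretic formulation for structural site matching. The sketch below illustrates that general idea with toy residues and thresholds; it relies on the third-party networkx package and is not LIBRA's code or data model.

```python
import itertools
import networkx as nx  # third-party package used for clique enumeration

def match_site(query, template, tol=1.0):
    """query/template: lists of (residue_name, (x, y, z)) tuples.
    Returns the largest set of mutually distance-consistent residue pairings."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    # Nodes: pairings of same-type residues; edges: geometrically compatible pairings.
    pairs = [(i, j) for i, (rn, _) in enumerate(query)
                    for j, (tn, _) in enumerate(template) if rn == tn]
    g = nx.Graph()
    g.add_nodes_from(pairs)
    for (i1, j1), (i2, j2) in itertools.combinations(pairs, 2):
        if i1 == i2 or j1 == j2:
            continue
        if abs(dist(query[i1][1], query[i2][1]) - dist(template[j1][1], template[j2][1])) < tol:
            g.add_edge((i1, j1), (i2, j2))
    return max(nx.find_cliques(g), key=len) if g.number_of_nodes() else []

# Toy example: a His/Asp/Ser arrangement shifted by (1, 1, 0) matches fully.
query = [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.0, 0.0, 0.0)), ("SER", (0.0, 4.0, 0.0))]
template = [("HIS", (1.0, 1.0, 0.0)), ("ASP", (4.0, 1.0, 0.0)), ("SER", (1.0, 5.0, 0.0))]
print(match_site(query, template))  # three consistent residue pairings
```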
International Conference on Bioinformatics | 2011
Paolo Atzeni; Fabio Polticelli; Daniele Toti
We report and comment on the experimental results of the PRAISED system, which implements an automatic method for discovering and resolving a wide range of protein name abbreviations from the full-text versions of scientific articles. This system has been recently proposed as part of a framework for creating and maintaining a publicly accessible abbreviation repository. The testing phase was carried out against the widely used Medstract Gold Standard Corpus and a relevant subset of real scientific papers extracted from the PubMed database. As far as the Medstract corpus is concerned, we obtained significantly high results in terms of recall, precision and overall correctness. As for the full-text papers, results inevitably varied, due to the complex and often chaotic nature of the domain at hand; even so, we detected encouraging levels of recall and extremely fast execution times. The major strength of the system lies in addressing the unstructured nature of scientific publications and in saving time and effort when extracting protein-related information in an automatic fashion, while at the same time keeping computational overhead to a minimum thanks to its lightweight approach.
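The reported recall, precision and overall correctness figures follow the usual definitions over an annotated corpus; the short sketch below makes the arithmetic explicit, with gold and predicted (abbreviation, expansion) pairs represented as plain Python sets.

```python
def precision_recall_f1(gold, predicted):
    """Standard metrics over sets of (abbreviation, expansion) pairs."""
    tp = len(gold & predicted)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("GFP", "green fluorescent protein"), ("BSA", "bovine serum albumin")}
pred = {("GFP", "green fluorescent protein"), ("PDB", "protein data bank")}
print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```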
Signal-Image Technology and Internet-Based Systems | 2012
Daniele Toti; Paolo Atzeni; Fabio Polticelli
We describe a methodology for identifying characterizing terms from a source text or paper and automatically building an ontology around them, with the purpose of semantically categorizing a paper corpus where documents sharing similar subjects may subsequently be clustered together by means of ontology alignment. We first employ a Natural Language Processing pipeline to extract relevant terms from the source text, and then use a combination of pattern-based and machine-learning approaches to establish semantic relationships among those terms, with some user feedback required in between. This methodology for discovering characterizing knowledge from textual sources originated as an extension of PRAISED, our abbreviation discovery framework, in order to enhance its resolution capabilities. By moving from a paper-by-paper, mainly syntactical process to a corpus-based, semantic approach, it was in fact possible to overcome earlier limits of the system related to abbreviations whose explanation could not be found within the same paper they were cited in. At the same time, though, the methodology we present is not tied to this specific task, but is instead of relevance for a variety of contexts, and might therefore be used to build a stand-alone system for advanced knowledge extraction and semantic categorization.
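As a rough illustration of the pattern-based part of the relationship-discovery step, the sketch below extracts is-a relations with a single Hearst-style "X such as Y" pattern; the actual methodology combines an NLP pipeline, several patterns and a machine-learning component that are not reproduced here.

```python
import re

# One Hearst-style lexical pattern ("X such as Y, Z and W") used for illustration.
HEARST_SUCH_AS = re.compile(
    r"(?P<hyper>\w+(?:\s\w+)?)\s+such as\s+"
    r"(?P<hypos>\w+(?:,\s\w+)*(?:\s(?:and|or)\s\w+)?)",
    re.IGNORECASE,
)

def extract_isa_relations(text):
    relations = []
    for m in HEARST_SUCH_AS.finditer(text):
        hyper = m.group("hyper")
        for hypo in re.split(r",\s|\sand\s|\sor\s", m.group("hypos")):
            relations.append((hypo.strip(), "is_a", hyper))
    return relations

print(extract_isa_relations("Enzymes such as kinases and proteases were studied."))
# [('kinases', 'is_a', 'Enzymes'), ('proteases', 'is_a', 'Enzymes')]
```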
Intelligent Networking and Collaborative Systems | 2016
Daniele Toti; Marco Rinelli
This work introduces CONCEPTUM, an advanced knowledge discovery system for speed-reading natural language texts and allowing faster and more effective learning. CONCEPTUM offers a wide range of features, from language detection and conceptualization up to semantic categorization, named entity recognition and automatic ontology building, effectively turning an unstructured textual source into concepts, topics, relationships and summaries so that it can be quickly and easily browsed and classified. The system does not require any training or configuration and at present can be applied as-is to general-purpose English and Italian texts, providing disparate kinds of users with a powerful means to significantly speed up and improve their learning and research activities. In this work, a challenging experiment in the biochemistry field is reported to highlight and discuss the critical issues that arise when applying the system to a highly technical domain.
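To give a concrete flavour of one of the simpler processing steps such a system performs, the sketch below implements a toy frequency-based extractive summarizer; CONCEPTUM's actual pipeline (language detection, NER, conceptualization, ontology building) is far richer and is not reproduced here.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on"}

def summarize(text, n_sentences=2):
    """Score sentences by the frequency of their content words and keep the top ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    top = set(scored[:n_sentences])
    return " ".join(s for s in sentences if s in top)  # preserve original order

print(summarize("Proteins fold into structures. Protein structure determines function. "
                "The weather was nice.", 2))
```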
Bioinformatics | 2018
Daniele Toti; Le Viet Hung; Valentina Tortosa; Valentina Brandi; Fabio Polticelli
Summary: Recently, LIBRA, a tool for active/ligand binding site prediction, was described. LIBRA's effectiveness was comparable to similar state-of-the-art tools; however, its scoring scheme, output presentation, dependence on local resources and overall convenience were amenable to improvements. To solve these issues, LIBRA-WA, a web application based on an improved LIBRA engine, has been developed, featuring a novel scoring scheme that consistently improves LIBRA's performance, and a refined algorithm that can identify binding sites hosted at the interface between different subunits. LIBRA-WA also offers additional functionalities such as ligand clustering and a completely redesigned interface for an easier analysis of the output. Extensive tests on 373 apoprotein structures indicate that LIBRA-WA is able to identify the biologically relevant ligand/ligand binding site in 357 cases (~96%), with the correct prediction ranking first in 349 cases (~98% of the latter, ~94% of the total). The earlier stand-alone tool has also been updated and dubbed LIBRA+ by integrating LIBRA-WA's improved engine for cross-compatibility purposes. Availability and implementation: LIBRA-WA and LIBRA+ are available at: http://www.computationalbiology.it/software.html. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
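Purely to illustrate what ranking predicted sites by a combined score looks like, the sketch below orders candidate sites by matched-residue count penalized by geometric deviation; this is a generic, assumed scoring example, not LIBRA-WA's actual scoring scheme, which is described in the paper and its supplementary material.

```python
def rank_sites(predictions):
    """predictions: list of dicts with 'site_id', 'matched_residues' and 'rmsd' keys.
    More matched residues and lower RMSD rank higher (toy scoring, for illustration)."""
    return sorted(predictions, key=lambda p: p["matched_residues"] - p["rmsd"], reverse=True)

preds = [
    {"site_id": "site_A", "matched_residues": 7, "rmsd": 1.2},
    {"site_id": "site_B", "matched_residues": 5, "rmsd": 0.4},
]
print([p["site_id"] for p in rank_sites(preds)])  # ['site_A', 'site_B']
```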