Attila Kertesz-Farkas
International Centre for Genetic Engineering and Biotechnology
Publication
Featured research published by Attila Kertesz-Farkas.
Nature | 2015
Bruna Marini; Attila Kertesz-Farkas; Hashim Ali; Bojana Lucic; Kamil Lisek; Lara Manganaro; Sándor Pongor; Roberto Luzzati; Fulvio Mavilio; Mauro Giacca; Marina Lusic
Long-standing evidence indicates that human immunodeficiency virus type 1 (HIV-1) preferentially integrates into a subset of transcriptionally active genes of the host cell genome. However, the reason why the virus selects only certain genes among all transcriptionally active regions in a target cell remains largely unknown. Here we show that HIV-1 integration occurs in the outer shell of the nucleus in close correspondence with the nuclear pore. This region contains a series of cellular genes, which are preferentially targeted by the virus, and characterized by the presence of active transcription chromatin marks before viral infection. In contrast, the virus strongly disfavours the heterochromatic regions in the nuclear lamin-associated domains and other transcriptionally active regions located centrally in the nucleus. Functional viral integrase and the presence of the cellular Nup153 and LEDGF/p75 integration cofactors are indispensable for the peripheral integration of the virus. Once integrated at the nuclear pore, the HIV-1 DNA makes contact with various nucleoporins; this association takes part in the transcriptional regulation of the viral genome. These results indicate that nuclear topography is an essential determinant of the HIV-1 life cycle.
Bioinformatics | 2011
Emily Doughty; Attila Kertesz-Farkas; Olivier Bodenreider; Gary Thompson; Asa Adadey; Thomas A. Peterson; Maricel G. Kann
MOTIVATION A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations. RESULTS We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool to extract point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases. DISCUSSION Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles. AVAILABILITY Freely available at: http://bioinf.umbc.edu/EMU/ftp.
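The two ideas in this abstract, pattern-based extraction of point mutations and a sequence-based filter, can be illustrated with a small sketch. This is not EMU's or MutationFinder's actual pattern set: the regular expression, the function names `find_point_mutations` and `matches_reference`, and the filter rule (the wild-type residue must occur at the stated position of a candidate protein sequence) are all assumptions made for illustration.

```python
import re

# One-letter codes of the 20 standard amino acids.
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pattern: wild-type residue, position, mutant residue (e.g. "T877A").
POINT_MUTATION = re.compile(rf"\b([{AA1}])(\d+)([{AA1}])\b")


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for mentions such as 'T877A'."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in POINT_MUTATION.finditer(text)
    ]


def matches_reference(mutation, protein_sequence):
    """Assumed filter: keep a mutation only if the wild-type residue is
    actually found at the given 1-based position of the sequence."""
    wild_type, position, _mutant = mutation
    return (
        1 <= position <= len(protein_sequence)
        and protein_sequence[position - 1] == wild_type
    )
```

A filter of this kind discards matches that merely look like mutations (cell-line names, for example), which is one plausible way a sequence check could raise precision.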
Journal of Proteome Research | 2014
Sean McIlwain; Kaipo Tamura; Attila Kertesz-Farkas; Charles E. Grant; Benjamin J. Diament; Barbara Frewen; J. Jeffry Howbert; Michael R. Hoopmann; Lukas Käll; Jimmy K. Eng; Michael J. MacCoss; William Stafford Noble
Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data.
Journal of Proteome Research | 2015
Uri Keich; Attila Kertesz-Farkas; William Stafford Noble
Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.
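For orientation, the basic target-decoy competition estimate discussed above can be sketched as follows. This is a simplified illustration, not the mix-max procedure and not code from the paper; the function name `tdc_fdr`, the tie-breaking rule (ties go to the decoy), and the cap at 1.0 are assumptions.

```python
def tdc_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target PSMs at a score threshold using
    target-decoy competition (higher score = better match).

    target_scores[i] and decoy_scores[i] are the best target and best
    decoy scores for spectrum i.
    """
    if len(target_scores) != len(decoy_scores):
        raise ValueError("need one target and one decoy score per spectrum")

    targets = 0
    decoys = 0
    for t, d in zip(target_scores, decoy_scores):
        # Competition: each spectrum keeps only its better-scoring match.
        # Assumption: ties are resolved in favour of the decoy.
        if t > d:
            if t >= threshold:
                targets += 1
        elif d >= threshold:
            decoys += 1

    if targets == 0:
        return 0.0
    # The number of winning decoys estimates the number of false targets.
    return min(1.0, decoys / targets)
```

Note that a spectrum whose decoy match wins the competition is lost as a target identification even if its target score is high, which is the forfeiting effect the abstract describes.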
Journal of Proteome Research | 2015
Attila Kertesz-Farkas; Uri Keich; William Stafford Noble
Accurate assignment of peptide sequences to observed fragmentation spectra is hindered by the large number of hypotheses that must be considered for each observed spectrum. A high score assigned to a particular peptide–spectrum match (PSM) may not end up being statistically significant after multiple testing correction. Researchers can mitigate this problem by controlling the hypothesis space in various ways: considering only peptides resulting from enzymatic cleavages, ignoring possible post-translational modifications or single nucleotide variants, etc. However, these strategies sacrifice identifications of spectra generated by rarer types of peptides. In this work, we introduce a statistical testing framework, cascade search, that directly addresses this problem. The method requires that the user specify a priori a statistical confidence threshold as well as a series of peptide databases. For instance, such a cascade of databases could include fully tryptic, semitryptic, and nonenzymatic peptides or peptides with increasing numbers of modifications. Cascaded search then gradually expands the list of candidate peptides from more likely peptides toward rare peptides, sequestering at each stage any spectrum that is identified with a specified statistical confidence. We compare cascade search to a standard procedure that lumps all of the peptides into a single database, as well as to a previously described group FDR procedure that computes the FDR separately within each database. We demonstrate, using simulated and real data, that cascade search identifies more spectra at a fixed FDR threshold than with either the ungrouped or grouped approach. Cascade search thus provides a general method for maximizing the number of identified spectra in a statistically rigorous fashion.
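The control flow of the cascade described above can be sketched in a few lines. This is a schematic under stated assumptions, not the published implementation: `search` and `accept` are hypothetical callables supplied by the caller, standing in for a database search engine and for a confidence filter (for example an FDR threshold) respectively.

```python
def cascade_search(spectra, databases, search, accept):
    """Search spectra against an ordered series of databases.

    spectra   -- iterable of spectrum identifiers
    databases -- ordered list, most likely peptides first
    search    -- hypothetical callable: search(spectrum, database) -> match
    accept    -- hypothetical callable: accept({spectrum: match}) -> set of
                 spectra identified with the required confidence

    Returns (identified, remaining), where identified maps each accepted
    spectrum to (stage_index, match).
    """
    remaining = list(spectra)
    identified = {}
    for stage_index, database in enumerate(databases):
        # Only spectra not yet identified are searched at this stage.
        matches = {s: search(s, database) for s in remaining}
        confident = accept(matches)
        for s in confident:
            identified[s] = (stage_index, matches[s])
        # Sequester confident spectra so later stages never see them.
        remaining = [s for s in remaining if s not in confident]
    return identified, remaining
```

Because accepted spectra are removed before the next stage, a spectrum explained by a likely peptide is never reassigned to a rarer peptide from a later, larger database.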
Database | 2013
Roberto Vera; Yasset Perez-Riverol; Sonia Perez; Balázs Ligeti; Attila Kertesz-Farkas; Sándor Pongor
The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh
PLOS ONE | 2014
János Juhász; Attila Kertesz-Farkas; Dóra Szabó; Sándor Pongor
Multispecies bacterial communities such as the microbiota of the gastrointestinal tract can be remarkably stable and resilient even though they consist of cells and species that compete for resources and also produce a large number of antimicrobial agents. Computational modeling suggests that horizontal transfer of resistance genes may greatly contribute to the formation of stable and diverse communities capable of protecting themselves with a battery of antimicrobial agents while preserving a varied metabolic repertoire of the constituent species. In other words, horizontal transfer of resistance genes makes a community compatible in terms of exoproducts and capable of maintaining a varied and mature metagenome. The same property may allow microbiota to protect a host organism or, if used as a microbial therapy, to purge pathogens and restore a protective environment.
Current Protein & Peptide Science | 2010
Somdutta Dhir; Mircea Pacurar; Dino Franklin; Zoltán Gáspári; Attila Kertesz-Farkas; András Kocsor; Frank Eisenhaber; Sándor Pongor
SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences, the SBASE domain library, and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.
Bioinformatics | 2014
Attila Kertesz-Farkas; Beáta Reiz; Roberto Vera; Michael P. Myers; Sándor Pongor
MOTIVATION Tandem mass spectrometry has become a standard tool for identifying post-translational modifications (PTMs) of proteins. Algorithmic searches for PTMs from tandem mass spectrum data (MS/MS) tend to be hampered by noisy data as well as by a combinatorial explosion of search space. This leads to high uncertainty and long search-execution times. RESULTS To address this issue, we present PTMTreeSearch, a new algorithm that uses a large database of known PTMs to identify PTMs from MS/MS data. For a given peptide sequence, PTMTreeSearch builds a computational tree wherein each path from the root to the leaves is labeled with the amino acids of a peptide sequence. Branches then represent PTMs. Various empirical tree pruning rules have been designed to decrease the search-execution time by eliminating biologically unlikely solutions. PTMTreeSearch first identifies a relatively small set of high confidence PTM types, and in a second stage, performs a more exhaustive search on this restricted set using relaxed search parameter settings. An analysis of experimental data shows that using the same criteria for false discovery, PTMTreeSearch annotates more peptides than the current state-of-the-art methods and PTM identification algorithms, and achieves this at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring function in the X!Tandem search engine. AVAILABILITY The source code of PTMTreeSearch and a demo server application can be found at http://net.icgeb.org/ptmtreesearch
Archive | 2009
Attila Kertesz-Farkas; András Kocsor; Sándor Pongor
Text compressor algorithms can be used to construct metric distance measures (CBDs) suitable for character sequences. Here we review the principle of various types of compressor algorithms and describe their general behaviour with respect to the comparison of protein and DNA sequences. We employ reduced and enlarged alphabets, and model biological rearrangements like domain shuffling. In the classification experiments evaluated with ROC analysis, CBDs perform less well than substring-based methods such as the BLAST and the Smith–Waterman algorithms, but perform better than distances based on word composition. CBDs outperformed substring methods with respect to domain shuffling, and in some cases showed an increased performance when the alphabet was reduced.
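A compression-based distance of the kind reviewed here can be sketched with a general-purpose compressor. The example below uses the normalized compression distance with Python's `zlib`; the choice of compressor, the formula variant, and the function names are assumptions for illustration and are not necessarily those evaluated in the chapter.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Values near 0 indicate similar
    sequences; values near 1 indicate unrelated ones.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Because the compressor can reuse any earlier substring regardless of its position, a distance built this way is largely insensitive to the order of shared segments, which is consistent with the reported advantage on domain shuffling.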