Publication


Featured research published by Markus Bundschus.


BMC Bioinformatics | 2008

Extraction of semantic biomedical relations from text using conditional random fields

Markus Bundschus; Mathaeus Dejori; Martin Stetter; Volker Tresp; Hans-Peter Kriegel

Background: The increasing amount of published literature in biomedicine represents an immense source of knowledge, which can only efficiently be accessed by a new generation of automated information extraction tools. Named entity recognition of well-defined objects, such as genes or proteins, has achieved a sufficient level of maturity such that it can form the basis for the next step: the extraction of relations that exist between the recognized entities. Whereas most early work focused on the mere detection of relations, the classification of the type of relation is also of great importance and this is the focus of this work. In this paper we describe an approach that extracts both the existence of a relation and its type. Our work is based on Conditional Random Fields, which have been applied with much success to the task of named entity recognition.
Results: We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather, the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. The gene-disease network is publicly available as a machine-readable RDF graph.
Conclusion: We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive with leading approaches. The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining. Current work is focused on improving the accuracy of detection of entities as well as entity boundaries, which will also greatly improve the relation extraction performance.
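
The abstract gives no implementation details. As a rough illustration of how a linear-chain Conditional Random Field picks a label for each token once it has been trained, the following sketch runs Viterbi decoding over hand-set scores. The label set, both score tables, and the function name `viterbi` are hypothetical; in a real CRF the scores come from learned feature weights rather than fixed tables.

```python
from typing import Dict, List, Tuple


def viterbi(
    tokens: List[str],
    labels: List[str],
    emission: Dict[Tuple[str, str], float],
    transition: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model.

    Scores are additive (log-space); missing table entries count as 0.0.
    """
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[:i + 1] that ends in y
    best = [{y: emission.get((tokens[0], y), 0.0) for y in labels}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transition.get((p, y), 0.0))
            scores[y] = (
                best[-1][prev]
                + transition.get((prev, y), 0.0)
                + emission.get((tok, y), 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy scores; "O" marks tokens outside any entity.
LABELS = ["O", "GENE", "DISEASE"]
EMISSION = {
    ("BRCA1", "GENE"): 2.0,
    ("mutations", "O"): 1.0,
    ("cause", "O"): 1.0,
    ("breast", "DISEASE"): 1.0,
    ("cancer", "DISEASE"): 2.0,
}
TRANSITION = {("DISEASE", "DISEASE"): 0.5}

tagged = viterbi(
    ["BRCA1", "mutations", "cause", "breast", "cancer"], LABELS, EMISSION, TRANSITION
)
```

With these toy scores, `tagged` labels the first token as a gene and the last two as a disease mention. Typing the relation as well as the entities would need a richer label set, which this sketch does not attempt.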


PLOS ONE | 2011

Gene-Disease Network Analysis Reveals Functional Modules in Mendelian, Complex and Environmental Diseases

Anna Bauer-Mehren; Markus Bundschus; Michael Rautschka; Miguel Angel Mayer; Ferran Sanz; Laura I. Furlong

Background: Scientists have been trying to understand the molecular mechanisms of diseases to design preventive and therapeutic strategies for a long time. For some diseases, it has become evident that it is not enough to obtain a catalogue of the disease-related genes; it is also necessary to uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such a catalogue is extremely difficult.
Principal Findings: We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell.
Conclusions: For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies for all of them. We furthermore provide a functional analysis of disease-related modules providing important new biological insights, which might not be discovered when considering each of the gene-disease association repositories independently. Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases.
Availability: The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download.
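
One way to picture the "shared genetic origin" finding is to project a bipartite gene-disease network onto its diseases, linking two diseases whenever they share at least one gene. The sketch below does this with plain dictionaries. The association pairs are invented placeholders, not DisGeNET data, and `disease_projection` is a hypothetical helper name rather than anything from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def disease_projection(
    associations: Iterable[Tuple[str, str]],
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each pair of diseases to the set of genes they have in common."""
    diseases_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for gene, disease in associations:
        diseases_by_gene[gene].add(disease)

    shared: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for gene, diseases in diseases_by_gene.items():
        # Every pair of diseases attached to the same gene gets an edge.
        for a, b in combinations(sorted(diseases), 2):
            shared[frozenset((a, b))].add(gene)
    return dict(shared)


# Placeholder (gene, disease) pairs, for illustration only.
ASSOCIATIONS = [
    ("G1", "disease_A"), ("G1", "disease_B"),
    ("G2", "disease_A"), ("G2", "disease_B"), ("G2", "disease_C"),
    ("G3", "disease_C"),
]
projection = disease_projection(ASSOCIATIONS)
```

The number of shared genes per disease pair can then serve as an edge weight when looking for clusters of related diseases.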


Knowledge Discovery and Data Mining | 2008

Anticipating annotations and emerging trends in biomedical literature

Fabian Mörchen; Mathäus Dejori; Dmitriy Fradkin; Julien Etienne; Bernd Wachmann; Markus Bundschus

The BioJournalMonitor is a decision support system for the analysis of trends and topics in the biomedical literature. Its main goal is to identify potential diagnostic and therapeutic biomarkers for specific diseases. Several data sources are continuously integrated to provide the user with up-to-date information on current research in this field. State-of-the-art text mining technologies are deployed to provide added value on top of the original content, including named entity detection, relation extraction, classification, clustering, ranking, summarization, and visualization. We present two novel technologies that are related to the analysis of temporal dynamics of text archives and associated ontologies. Currently, the MeSH ontology is used to annotate the scientific articles entering the PubMed database with medical terms. Both the maintenance of the ontology and the annotation of new articles are performed largely manually. We describe how probabilistic topic models can be used to annotate recent articles with the most likely MeSH terms. This provides our users with a competitive advantage because, when searching for MeSH terms, articles are found long before they are manually annotated. We further present a study on how to predict the inclusion of new terms in the MeSH ontology. The results suggest that early prediction of emerging trends is possible. The trend ranking functions are deployed in our system to enable interactive searches for the hottest new trends relating to a disease.


International Conference on Data Mining | 2009

Hierarchical Bayesian Models for Collaborative Tagging Systems

Markus Bundschus; Shipeng Yu; Volker Tresp; Achim Rettinger; Mathaeus Dejori; Hans-Peter Kriegel

Collaborative tagging systems with user generated content have become a fundamental element of websites such as Delicious, Flickr or CiteULike. By sharing common knowledge, massively linked semantic data sets are generated that provide new challenges for data mining. In this paper, we reduce the data complexity in these systems by finding meaningful topics that serve to group similar users and serve to recommend tags or resources to users. We propose a well-founded probabilistic approach that can model every aspect of a collaborative tagging system. By integrating both user information and tag information into the well-known Latent Dirichlet Allocation framework, the developed models can be used to solve a number of important information extraction and retrieval tasks.


International Semantic Web Conference | 2008

Towards Machine Learning on the Semantic Web

Volker Tresp; Markus Bundschus; Achim Rettinger; Yi Huang

In this paper we explore some of the opportunities and challenges for machine learning on the Semantic Web. The Semantic Web provides standardized formats for the representation of both data and ontological background knowledge. Semantic Web standards are used to describe metadata but also have great potential as a general data format for data communication and data integration. Within a broad range of possible applications, machine learning will play an increasingly important role: machine learning solutions have been developed to support the management of ontologies, for the semi-automatic annotation of unstructured data, and to integrate semantic information into web mining. Machine learning will increasingly be employed to analyze distributed data sources described in Semantic Web formats and to support approximate Semantic Web reasoning and querying. In this paper we discuss existing and future applications of machine learning on the Semantic Web with a strong focus on learning algorithms that are suitable for the relational character of the Semantic Web's data structure. We discuss some of the particular aspects of learning that we expect will be of relevance for the Semantic Web such as scalability, missing and contradicting data, and the potential to integrate ontological background knowledge. In addition we review some of the work on the learning of ontologies and on the population of ontologies, mostly in the context of textual data.


Inductive Logic Programming | 2010

Multivariate prediction for learning on the semantic web

Yi Huang; Volker Tresp; Markus Bundschus; Achim Rettinger; Hans-Peter Kriegel

One of the main characteristics of Semantic Web (SW) data is that it is notoriously incomplete: in the same domain a great deal might be known for some entities and almost nothing might be known for others. A popular example is the well-known friend-of-a-friend data set where some members document exhaustive private and social information whereas, because of privacy concerns and other reasons, almost nothing is known for other members. Although deductive reasoning can be used to complement factual knowledge based on the ontological background, a tremendous number of potentially true statements still remain to be uncovered. The paper is focused on the prediction of potential relationships and attributes by exploiting regularities in the data using statistical relational learning algorithms. We argue that multivariate prediction approaches are most suitable for dealing with the resulting high-dimensional sparse data matrix. Within the statistical framework, the approach scales up to large domains and is able to deal with highly sparse relationship data. A major goal of the presented work is to formulate an inductive learning approach that can be used by people with little machine learning background. We present experimental results using a friend-of-a-friend data set.


Computer Software and Applications Conference | 2009

Towards a Next-Generation Matrix Library for Java

Holger Arndt; Markus Bundschus; Andreas Naegele

Matrices are essential in many fields of computer science, especially when large amounts of data must be handled efficiently. Despite this demand for matrix software, we were unable to find a Java library which was flexible enough to match all our needs. In this paper, we present the Universal Java Matrix Package (UJMP), an innovative software architecture in Java to store and process matrices with interfaces to external data sources such as Excel files or SQL databases, making it possible to handle data which would not fit into main memory. In contrast to all other approaches which we are aware of, our package is the only one to support very large matrices with up to 2^63 rows or columns. In addition, the use of variable argument lists provides a convenient way of accessing multi-dimensional data without the need to specify dimensionality at compile time. Arbitrary data types can be handled through the use of Java Generics. Another key feature is the strict separation of interfaces and classes, making data storage implementation and math engine exchangeable at runtime. This flexible architecture allows the user to decide whether operations should be optimized for speed or memory usage and makes the system easily extendable through existing libraries, incorporating their individual strengths.
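
The variable-argument access pattern can be illustrated outside Java as well. The following Python sketch is not UJMP's API: `SparseMatrix`, `set`, `get`, and `stored` are hypothetical names chosen for this example. It keeps only non-zero entries, keyed by coordinate tuples, so the number of dimensions is fixed when an object is created at runtime and very large index ranges cost nothing until cells are written.

```python
from typing import Dict, Tuple


class SparseMatrix:
    """Dictionary-backed sparse matrix with any number of dimensions."""

    def __init__(self, *size: int) -> None:
        if not size or any(s <= 0 for s in size):
            raise ValueError("every dimension must have a positive size")
        self.size = size
        self._cells: Dict[Tuple[int, ...], float] = {}

    def _check(self, coords: Tuple[int, ...]) -> None:
        if len(coords) != len(self.size):
            raise IndexError(
                f"expected {len(self.size)} coordinates, got {len(coords)}"
            )
        if any(not 0 <= c < s for c, s in zip(coords, self.size)):
            raise IndexError(f"coordinates {coords} outside size {self.size}")

    def set(self, value: float, *coords: int) -> None:
        """Store value at the given coordinates; zeros are kept implicit."""
        self._check(coords)
        if value == 0:
            self._cells.pop(coords, None)
        else:
            self._cells[coords] = value

    def get(self, *coords: int) -> float:
        """Return the stored value, or 0.0 for cells never written."""
        self._check(coords)
        return self._cells.get(coords, 0.0)

    def stored(self) -> int:
        """Number of explicitly stored (non-zero) cells."""
        return len(self._cells)


# A matrix with 2**63 rows occupies memory only for the cells that are set.
huge = SparseMatrix(2**63, 3)
huge.set(1.5, 2**62, 1)

# The same class handles three dimensions without any change.
cube = SparseMatrix(4, 4, 4)
cube.set(7.0, 1, 2, 3)
```

Passing the wrong number of coordinates raises an `IndexError` at call time, which is the runtime counterpart of not fixing dimensionality at compile time.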


Journal of Biomedical Semantics | 2013

Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

Daniel Eisinger; George Tsatsaronis; Markus Bundschus; Ulrich Wieneke; Michael Schroeder

Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option to expand patent search beyond pure keywords is the inclusion of classification information: since every patent is assigned at least one class code, it should be possible for these assignments to be automatically used in a similar way as the MeSH annotations in PubMed. In order to develop a system for this task, it is necessary to have a good understanding of the properties of both classification systems. This report describes our comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms/classes respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows a strong structural similarity of the hierarchies, but significant differences of terms and annotations. The low number of IPC class assignments and the lack of occurrences of class labels in patent texts imply that current patent search is severely limited. To overcome these limits, we evaluate a method for the automated assignment of additional classes to patent documents, and we propose a system for guided patent search based on the use of class co-occurrence information and external resources.
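
The abstract names class co-occurrence information as one ingredient of guided patent search without saying how it is computed. A minimal sketch, assuming co-occurrence is simply counted over the sets of class codes assigned to each patent, might look as follows. The patent records are made up, the helpers `cooccurrence` and `suggest_classes` are hypothetical, and this is not the authors' system.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set


def cooccurrence(patents: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count, for each class code, how often every other code appears with it."""
    counts: Dict[str, Counter] = {}
    for codes in patents:
        for a, b in permutations(sorted(codes), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest_classes(query: str, counts: Dict[str, Counter], k: int = 2) -> List[str]:
    """Return up to k codes that most often co-occur with the query code."""
    ranked = sorted(
        counts.get(query, Counter()).items(),
        key=lambda item: (-item[1], item[0]),  # by count, then alphabetically
    )
    return [code for code, _ in ranked[:k]]


# Made-up class assignments for four patents.
PATENTS = [
    {"A61K", "A61P"},
    {"A61K", "A61P", "C07D"},
    {"A61K", "C07D"},
    {"G06F"},
]
COUNTS = cooccurrence(PATENTS)
suggestions = suggest_classes("A61K", COUNTS)
```

A search interface could offer such suggestions as optional query expansions, much as MeSH terms are used to expand PubMed queries.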


arXiv: Computers and Society | 2016

Going Digital: A Survey on Digitalization and Large-Scale Data Analytics in Healthcare

Volker Tresp; J. Marc Overhage; Markus Bundschus; Shahrooz Rabizadeh; Peter A. Fasching; Shipeng Yu

We provide an overview of the recent trends toward digitalization and large-scale data analytics in healthcare. These trends are expected to be instrumental in the dramatic changes in the way healthcare will be organized in the future. We discuss the recent political initiatives designed to shift care delivery processes from paper to electronic, with the goals of more effective treatments with better outcomes; cost pressure is a major driver of innovation. We describe newly developed networks of healthcare providers, research organizations, and commercial vendors to jointly analyze data for the development of decision support systems. We address the trend toward continuous healthcare where health is monitored by wearable and stationary devices; a related development is that patients increasingly assume responsibility for their own health data. Finally, we discuss recent initiatives toward personalized medicine, based on advances in molecular medicine, data management, and data analytics.


Drug Discovery Today | 2016

Reflection of successful anticancer drug development processes in the literature

Fabian Heinemann; Torsten Huber; Christian Meisel; Markus Bundschus; Ulf Leser

The development of cancer drugs is time-consuming and expensive. In particular, failures in late-stage clinical trials are a major cost driver for pharmaceutical companies. This puts a high demand on methods that provide insights into the success chances of new potential medicines. In this study, we systematically analyze publication patterns emerging along the drug discovery process of targeted cancer therapies, starting from basic research to drug approval - or failure. We find clear differences in the patterns of approved drugs compared with those that failed in Phase II/III. Feeding these features into a machine learning classifier allows us to predict the approval or failure of a targeted cancer drug significantly better than educated guessing. We believe that these findings could lead to novel measures for supporting decision making in drug development.

Collaboration


Top co-authors of Markus Bundschus.

Achim Rettinger (Karlsruhe Institute of Technology)

Daniel Eisinger (Dresden University of Technology)

Michael Schroeder (Dresden University of Technology)