Matko Bošnjak
University College London
Publication
Featured research published by Matko Bošnjak.
PLOS ONE | 2011
Fran Supek; Matko Bošnjak; Nives Škunca; Tomislav Šmuc
Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
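The idea of picking a representative subset of terms by semantic similarity can be illustrated with a minimal sketch. This is a simplified greedy filter, not REVIGO's actual procedure: the function name `reduce_terms`, the threshold of 0.7, and all p-values and similarity scores below are assumptions made up for illustration.

```python
def reduce_terms(pvalues, similarity, threshold=0.7):
    """Greedy redundancy filter: keep a term unless it is too similar to a kept one."""
    kept = []
    # Visit terms from most to least significant (smallest p-value first).
    for term in sorted(pvalues, key=pvalues.get):
        if all(similarity(term, other) <= threshold for other in kept):
            kept.append(term)
    return kept


# Hypothetical toy input: p-values and similarity scores are invented.
pvalues = {"DNA repair": 1e-8, "DNA recombination": 1e-5, "translation": 1e-6}
toy_sim = {frozenset(("DNA repair", "DNA recombination")): 0.9}


def similarity(a, b):
    # Pairs not listed in toy_sim are treated as weakly related.
    return toy_sim.get(frozenset((a, b)), 0.1)


representatives = reduce_terms(pvalues, similarity)
print(representatives)
```

In this toy run the less significant of the two closely related DNA terms is dropped, while the unrelated term is kept.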
empirical methods in natural language processing | 2016
Ben Eisner; Tim Rocktäschel; Isabelle Augenstein; Matko Bošnjak; Sebastian Riedel
Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly available, pre-trained sets of word embeddings, but they contain few or no emoji representations even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all Unicode emoji which are learned from their description in the Unicode emoji standard. The resulting emoji embeddings can be readily used in downstream social natural language processing applications alongside word2vec. We demonstrate, for the downstream task of sentiment analysis, that emoji embeddings learned from short descriptions outperform a skip-gram model trained on a large collection of tweets, while avoiding the need for contexts in which emoji need to appear frequently in order to estimate a representation.
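One way to picture learning an emoji vector from its description is to represent the description as a combination of word vectors and score how well an emoji vector matches it. The sketch below assumes a summed bag-of-words description vector and a sigmoid of the dot product as the match score; the abstract does not specify these details, and the 3-dimensional word vectors and function names are invented for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors; real pre-trained vectors are much larger.
word_vectors = {
    "face": [0.2, 0.1, 0.0],
    "with": [0.0, 0.0, 0.1],
    "tears": [0.1, 0.4, 0.0],
    "of": [0.0, 0.1, 0.0],
    "joy": [0.5, 0.3, 0.2],
}


def description_vector(description):
    """Sum the vectors of the known words in an emoji description."""
    total = [0.0, 0.0, 0.0]
    for word in description.lower().split():
        vec = word_vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


def match_probability(emoji_vec, desc_vec):
    """Sigmoid of the dot product: a score for 'this description fits this emoji'."""
    dot = sum(e * d for e, d in zip(emoji_vec, desc_vec))
    return 1.0 / (1.0 + math.exp(-dot))


desc = description_vector("face with tears of joy")
score = match_probability(desc, desc)
```

In a training setup along these lines, the emoji vector would be adjusted so that the score is high for its true description and low for mismatched ones.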
meeting of the association for computational linguistics | 2014
Tim Rocktäschel; Matko Bošnjak; Sameer Singh; Sebastian Riedel
Many machine reading approaches, from shallow information extraction to deep semantic parsing, map natural language to symbolic representations of meaning. Representations such as first-order logic capture the richness of natural language and support complex reasoning, but often fail in practice due to their reliance on logical background knowledge and the difficulty of scaling up inference. In contrast, low-dimensional embeddings (i.e. distributional representations) are efficient and enable generalization, but it is unclear how reasoning with embeddings could support the full power of symbolic representations such as first-order logic. In this proof-of-concept paper we address this by learning embeddings that simulate the behavior of first-order logic.
PLOS Computational Biology | 2013
Nives Škunca; Matko Bošnjak; Anita Kriško; Panče Panov; Sašo Džeroski; Tomislav Šmuc; Fran Supek
New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs (homologs separated by a speciation and a duplication event, respectively) provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ∼400000 specific annotations with the estimated Precision of 90%, ∼19000 of which are highly specific (e.g. “penicillin binding,” “tRNA aminoacylation for protein translation,” or “pathogenesis”), and are freely available at http://gorbi.irb.hr/.
empirical methods in natural language processing | 2015
Ellery Smith; Nicola Greco; Matko Bošnjak; Andreas Vlachos
Machine comprehension of text is the overarching goal of a great deal of research in natural language processing. The Machine Comprehension Test (Richardson et al., 2013) was recently proposed to assess methods on an open-domain, extensible, and easy-to-evaluate task consisting of two datasets. In this paper we develop a lexical matching method that takes into account multiple context windows, question types and coreference resolution. We show that the proposed method outperforms the baseline of Richardson et al. (2013), and despite its relative simplicity, is comparable to recent work using machine learning. We hope that our approach will inform future work on this task. Furthermore, we argue that MC500 is harder than MC160 due to the way question-answer pairs were created.
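Lexical matching over a context window can be sketched as follows. This is a bare-bones illustration with a single window size and unweighted word overlap, not the method of the paper (which also uses multiple windows, question types and coreference resolution); the function names, the window size of 5, and the toy passage are assumptions.

```python
def sliding_window_score(passage, question, answer, window=5):
    """Best overlap between any passage window and the question+answer words."""
    tokens = passage.lower().split()
    target = set((question + " " + answer).lower().split())
    best = 0
    # Slide a fixed-size window over the passage and count matching tokens.
    for start in range(max(1, len(tokens) - window + 1)):
        overlap = sum(1 for tok in tokens[start:start + window] if tok in target)
        best = max(best, overlap)
    return best


def pick_answer(passage, question, candidates, window=5):
    """Choose the candidate whose words best match some window of the passage."""
    return max(candidates, key=lambda a: sliding_window_score(passage, question, a, window))


# Hypothetical toy example, invented for illustration.
passage = "tom went to the park and fed the ducks near the pond"
question = "what did tom feed"
choice = pick_answer(passage, question, ["the ducks", "the dog", "a horse"])
print(choice)
```

Combining scores from several window sizes, as the abstract suggests, would amount to calling `sliding_window_score` with different `window` values and aggregating the results.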
discovery science | 2014
Nino Antulov-Fantulin; Matko Bošnjak; Vinko Zlatić; Miha Grčar; Tomislav Šmuc
Personalized recommender systems rely on each user’s personal usage data in the system, in order to assist in decision making. However, privacy policies protecting users’ rights prevent these highly personal data from being publicly available to a wider research audience. In this work, we propose a memory-biased random walk model on a multilayer sequence network, as a generator of synthetic sequential data for recommender systems. We demonstrate the applicability of the generated synthetic data in training recommender system models in cases when privacy policies restrict clickstream publishing.
international conference on machine learning | 2017
Matko Bošnjak; Tim Rocktäschel; Jason Naradowsky; Sebastian Riedel
european conference on principles of data mining and knowledge discovery | 2011
Nino Antulov-Fantulin; Matko Bošnjak; Martin Žnidaršič; Miha Grčar; Mikołaj Morzy; Tomislav Šmuc
meeting of the association for computational linguistics | 2018
Dirk Weissenborn; Pasquale Minervini; Isabelle Augenstein; Johannes Welbl; Tim Rocktäschel; Matko Bošnjak; Jeff Mitchell; Thomas Demeester; Tim Dettmers; Pontus Stenetorp; Sebastian Riedel