Publication


Featured research published by Meelis Kull.


Nucleic Acids Research | 2007

g:Profiler—a web-based toolset for functional profiling of gene lists from large-scale experiments

Jüri Reimand; Meelis Kull; Hedi Peterson; Jaanus Hansen; Jaak Vilo

g:Profiler (http://biit.cs.ut.ee/gprofiler/) is a public web server for characterising and manipulating gene lists resulting from mining high-throughput genomic data. g:Profiler has a simple, user-friendly web interface with powerful visualisation for capturing Gene Ontology (GO), pathway, or transcription factor binding site enrichments down to individual gene levels. Besides standard multiple testing corrections, a new improved method for estimating the true effect of multiple testing over complex structures like GO has been introduced. Interpreting ranked gene lists is supported from the same interface with very efficient algorithms. Such ordered lists may arise when studying the most significantly affected genes from high-throughput data or genes co-expressed with the query gene. Other important aspects of practical data analysis are supported by modules tightly integrated with g:Profiler. These are: g:Convert for converting between different database identifiers; g:Orth for finding orthologous genes from other species; and g:Sorter for searching a large body of public gene expression data for co-expression. g:Profiler supports 31 different species, and underlying data is updated regularly from sources like the Ensembl database. Bioinformatics communities wishing to integrate with g:Profiler can use alternative simple textual outputs.


Nucleic Acids Research | 2004

Expression Profiler: next generation—an online platform for analysis of microarray data

Misha Kapushesky; Patrick Kemmeren; Aedín C. Culhane; Steffen Durinck; Jan Ihmels; Christine Körner; Meelis Kull; Aurora Torrente; Ugis Sarkans; Jaak Vilo; Alvis Brazma

Expression Profiler (EP, http://www.ebi.ac.uk/expressionprofiler) is a web-based platform for microarray gene expression and other functional genomics-related data analysis. The new architecture, Expression Profiler: next generation (EP:NG), modularizes the original design and allows individual analysis-task-related components to be developed by different groups and yet still seamlessly to work together and share the same user interface look and feel. Data analysis components for gene expression data preprocessing, missing value imputation, filtering, clustering methods, visualization, significant gene finding, between group analysis and other statistical components are available from the EBI (European Bioinformatics Institute) web site. The web-based design of Expression Profiler supports data sharing and collaborative analysis in a secure environment. Developed tools are integrated with the microarray gene expression database ArrayExpress and form the exploratory analytical front-end to those data. EP:NG is an open-source project, encouraging broad distribution and further extensions from the scientific community.


Genomics | 2009

ASTD: The Alternative Splicing and Transcript Diversity database

Gautier Koscielny; Vincent Le Texier; Chellappa Gopalakrishnan; Vasudev Kumanduri; Jean-Jack Riethoven; Francesco Nardone; Eleanor Stanley; Christine Fallsehr; Oliver Hofmann; Meelis Kull; Eoghan D. Harrington; Stephanie Boue; Eduardo Eyras; Mireya Plass; Fabrice Lopez; William Ritchie; Virginie Moucadel; Takeshi Ara; Heike Pospisil; Alexander M. Herrmann; Jens G. Reich; Roderic Guigó; Peer Bork; Magnus von Knebel Doeberitz; Jaak Vilo; Winston Hide; Rolf Apweiler; Thangavel Alphonse Thanaraj; Daniel Gautheret

The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.


Genome Biology | 2009

Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods

Priit Adler; Meelis Kull; Aleksandr Tkachenko; Hedi Peterson; Jüri Reimand; Jaak Vilo

We present MEM (Multi-Experiment Matrix), a web resource for gene expression similarity searches across many datasets. MEM features large collections of microarray datasets and utilizes rank aggregation to merge information from different datasets into a single global ordering with simultaneous statistical significance estimation. Unique features of MEM include automatic detection, characterization and visualization of the datasets that include the strongest coexpression patterns. MEM is freely available at http://biit.cs.ut.ee/mem/.
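The idea of merging per-dataset rankings into one global ordering can be illustrated with a minimal sketch. It assumes a simple mean of normalized ranks, which is not MEM's actual aggregation method and performs no significance estimation; the function name aggregate_ranks and the gene labels are hypothetical.

```python
def aggregate_ranks(rankings):
    """Merge several best-first rankings into one global ordering.

    Illustrative assumption: each gene is scored by the mean of its
    normalized rank (position / list length) over the rankings that
    contain it. Lower scores mean stronger overall similarity.
    """
    totals = {}
    counts = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking, start=1):
            totals[gene] = totals.get(gene, 0.0) + position / n
            counts[gene] = counts.get(gene, 0) + 1
    mean_rank = {gene: totals[gene] / counts[gene] for gene in totals}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(mean_rank, key=lambda gene: (mean_rank[gene], gene))


# Three hypothetical datasets, each ranking genes by similarity to a query.
per_dataset = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
global_order = aggregate_ranks(per_dataset)
```

Normalizing by list length keeps datasets of different sizes comparable; a production method would also need to handle genes missing from some datasets more carefully than this sketch does.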


Genome Biology | 2010

Comprehensive transcriptome analysis of mouse embryonic stem cell adipogenesis unravels new processes of adipocyte development

Nathalie Billon; Jüri Reimand; Miguel C. Monteiro; Meelis Kull; Hedi Peterson; Konstantin Tretyakov; Priit Adler; Brigitte Wdziekonski; Jaak Vilo; Christian Dani

Background: The current epidemic of obesity has caused a surge of interest in the study of adipose tissue formation. While major progress has been made in defining the molecular networks that control adipocyte terminal differentiation, the early steps of adipocyte development and the embryonic origin of this lineage remain largely unknown.
Results: Here we performed genome-wide analysis of gene expression during adipogenesis of mouse embryonic stem cells (ESCs). We then pursued comprehensive bioinformatic analyses, including de novo functional annotation and curation of the generated data within the context of biological pathways, to uncover novel biological functions associated with the early steps of adipocyte development. By combining in-depth gene regulation studies and in silico analysis of transcription factor binding site enrichment, we also provide insights into the transcriptional networks that might govern these early steps.
Conclusions: This study supports several biological findings: firstly, adipocyte development in mouse ESCs is coupled to blood vessel morphogenesis and neural development, just as it is during mouse development. Secondly, the early steps of adipocyte formation involve major changes in signaling and transcriptional networks. A large proportion of the transcription factors that we uncovered in mouse ESCs are also expressed in the mouse embryonic mesenchyme and in adipose tissues, demonstrating the power of our approach to probe for genes associated with early developmental processes on a genome-wide scale. Finally, we reveal a plethora of novel candidate genes for adipocyte development and present a unique resource that can be further explored in functional assays.


International Conference on Machine Learning and Applications | 2014

LaCova: A Tree-Based Multi-label Classifier Using Label Covariance as Splitting Criterion

Reem M Al-Otaibi; Meelis Kull; Peter A. Flach

Dealing with multiple labels is a supervised learning problem of increasing importance. Multi-label classifiers face the challenge of exploiting correlations between labels. While in existing work these correlations are often modelled globally, in this paper we use the divide-and-conquer approach of decision trees, which enables taking local decisions about how best to model label dependency. The resulting algorithm establishes a tree-based multi-label classifier called LaCova which dynamically interpolates between two well-known baseline methods: Binary Relevance, which assumes all labels are independent, and Label Powerset, which learns the joint label distribution. The key idea is a splitting criterion based on the label covariance matrix at that node, which allows us to choose between a horizontal split (branching on a feature) and a vertical split (separating the labels). Empirical results on 12 data sets show strong performance of the proposed method, particularly on data sets with hundreds of labels.


Biodata Mining | 2008

Fast approximate hierarchical clustering using similarity heuristics

Meelis Kull; Jaak Vilo

Background: Agglomerative hierarchical clustering (AHC) is a common unsupervised data analysis technique used in several biological applications. Standard AHC methods require that all pairwise distances between data objects be known. With ever-increasing data sizes this quadratic complexity poses problems that cannot be overcome by simply waiting for faster computers.
Results: We propose an approximate AHC algorithm, HappieClust, which can output a biologically meaningful clustering of a large dataset more than an order of magnitude faster than full AHC algorithms. The key to the algorithm is to limit the number of calculated pairwise distances to a carefully chosen subset of all possible distances. We choose distances using a similarity heuristic based on a small set of pivot objects. The heuristic efficiently finds pairs of similar objects and these help to mimic the greedy choices of full AHC. Quality of approximate AHC as compared to full AHC is studied with three measures. The first measure evaluates the global quality of the achieved clustering, while the second compares biological relevance using enrichment of biological functions in every subtree of the clusterings. The third measure studies how well the contents of subtrees are conserved between the clusterings.
Conclusion: The HappieClust algorithm is well suited for large-scale gene expression visualization and analysis both on personal computers and in public online web applications. The software is available from the URL http://www.quretec.com/HappieClust
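One way to picture a pivot-based similarity heuristic is the sketch below. It is not the HappieClust implementation; it only illustrates the general idea, under the assumption that objects with similar distances to a pivot are promising candidates, since the triangle inequality gives |d(x, p) - d(y, p)| <= d(x, y). The function names, the window parameter and the random pivot choice are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_pairs(points, num_pivots=2, window=2, seed=0):
    """Select a small subset of object pairs worth computing exactly.

    For each pivot, objects are sorted by their distance to the pivot,
    and each object is paired with the next `window` objects in that
    order. Pairs far apart in the order cannot be close to each other,
    because the difference of pivot distances lower-bounds d(x, y).
    """
    rng = random.Random(seed)
    pivots = rng.sample(range(len(points)), num_pivots)
    pairs = set()
    for p in pivots:
        order = sorted(range(len(points)),
                       key=lambda i: euclidean(points[i], points[p]))
        for pos, i in enumerate(order):
            for j in order[pos + 1: pos + 1 + window]:
                pairs.add((min(i, j), max(i, j)))
    return pairs


# 20 hypothetical objects on a line: at most 2 * 20 * 2 candidate pairs
# are kept, instead of all 190 pairwise distances.
points = [(float(i), 0.0) for i in range(20)]
selected = candidate_pairs(points)
```

The number of selected pairs grows linearly with the number of objects rather than quadratically, which is the property an approximate AHC needs; how the selected distances are then used to mimic the greedy merges is outside this sketch.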


Machine Learning | 2016

Cost-sensitive boosting algorithms: Do we really need them?

Nikolaos Nikolaou; Narayanan Unny Edakunni; Meelis Kull; Peter A. Flach; Gavin Brown

We provide a unifying perspective for two decades of work on cost-sensitive Boosting algorithms. When analyzing the literature 1997–2016, we find 15 distinct cost-sensitive variants of the original algorithm; each of these has its own motivation and claims to superiority—so who should we believe? In this work we critique the Boosting literature using four theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. Our finding is that only three algorithms are fully supported—and the probabilistic model view suggests that all require their outputs to be calibrated for best performance. Experiments on 18 datasets across 21 degrees of imbalance support the hypothesis—showing that once calibrated, they perform equivalently, and outperform all others. Our final recommendation—based on simplicity, flexibility and performance—is to use the original Adaboost algorithm with a shifted decision threshold and calibrated probability estimates.
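The shifted decision threshold in the final recommendation follows from Bayesian decision theory, and a minimal sketch makes the arithmetic concrete. It assumes that the classifier's scores have already been calibrated into probabilities of the positive class and that correct decisions cost nothing; the function names and the cost values are hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Predicting positive costs (1 - p) * cost_fp in expectation, and
    predicting negative costs p * cost_fn. Positive is the cheaper
    decision when p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def decide(prob_positive, cost_fp, cost_fn):
    """Return True (predict positive) for a calibrated probability."""
    return prob_positive >= cost_sensitive_threshold(cost_fp, cost_fn)


# A false negative that is four times as costly as a false positive
# lowers the threshold from 0.5 to 0.2.
threshold = cost_sensitive_threshold(cost_fp=1, cost_fn=4)  # 0.2
```

The threshold is only meaningful if the probabilities are calibrated, which is why calibration and threshold shifting appear together in the recommendation.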


Nucleic Acids Research | 2009

VisHiC—hierarchical functional enrichment analysis of microarray data

Darya Krushevskaya; Hedi Peterson; Jüri Reimand; Meelis Kull; Jaak Vilo

Measuring gene expression levels with microarrays is one of the key technologies of modern genomics. Clustering of microarray data is an important application, as genes with similar expression profiles may be regulated by common pathways and involved in related functions. Gene Ontology (GO) analysis and visualization allows researchers to study the biological context of discovered clusters and characterize genes with previously unknown functions. We present VisHiC (Visualization of Hierarchical Clustering), a web server for clustering and compact visualization of gene expression data combined with automated function enrichment analysis. The main output of the analysis is a dendrogram and visual heatmap of the expression matrix that highlights biologically relevant clusters based on enriched GO terms, pathways and regulatory motifs. Clusters with most significant enrichments are contracted in the final visualization, while less relevant parts are hidden altogether. Such a dense representation of microarray data gives a quick global overview of thousands of transcripts in many conditions and provides a good starting point for further analysis. VisHiC is freely available at http://biit.cs.ut.ee/vishic.


European Conference on Machine Learning | 2014

Reliability maps: a tool to enhance probability estimates and improve classification accuracy

Meelis Kull; Peter A. Flach

We propose a general method to assess the reliability of two-class probabilities in an instance-wise manner. This is relevant, for instance, for obtaining calibrated multi-class probabilities from two-class probability scores. The LS-ECOC method approaches this by performing least-squares fitting over a suitable error-correcting output code matrix, where the optimisation resolves potential conflicts in the input probabilities. While this gives all input probabilities equal weight, we would like to spend less effort fitting unreliable probability estimates. We introduce the concept of a reliability map to accompany the more conventional notion of calibration map; and LS-ECOC-R which modifies LS-ECOC to take reliability into account. We demonstrate on synthetic data that this gets us closer to the Bayes-optimal classifier, even if the base classifiers are linear and hence have high bias. Results on UCI data sets demonstrate that multi-class accuracy also improves.

Collaboration


Top Co-Authors

Hao Song

University of Bristol

Tom Diethe

University College London
