Janez Brank | Researchain

Publication

Featured research published by Janez Brank.

international acm sigir conference on research and development in information retrieval | 2004

Feature selection using linear classifier weights: interaction with classification models

Dunja Mladenic; Janez Brank; Marko Grobelnik; Natasa Milic-Frayling

This paper explores feature scoring and selection based on weights from linear classification models. It investigates how these methods combine with various learning models. Our comparative analysis includes three learning algorithms (Naïve Bayes, Perceptron, and Support Vector Machines (SVM)) in combination with three feature weighting methods: Odds Ratio, Information Gain, and weights from linear models (the linear SVM and the Perceptron). Experiments show that feature selection using weights from linear SVMs yields better classification performance than the other feature weighting methods when combined with the three explored learning algorithms. The results support the conjecture that it is the sophistication of the feature weighting method, rather than its apparent compatibility with the learning algorithm, that improves classification performance.
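As a rough illustration of scoring features by the weights of a linear model, the sketch below trains a plain perceptron on a toy two-class corpus and ranks features by absolute weight. It is a minimal sketch and not the authors' implementation: the function names, the toy documents, and the number of epochs are all assumptions made for this example.

```python
from collections import defaultdict


def train_perceptron(docs, labels, epochs=10):
    """Train a plain perceptron on sparse documents.

    docs   -- list of dicts mapping feature name -> value
    labels -- list of +1 / -1 class labels
    Returns (weights, bias).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for doc, y in zip(docs, labels):
            score = bias + sum(weights[f] * v for f, v in doc.items())
            if y * score <= 0:  # misclassified (or on the boundary): update
                for f, v in doc.items():
                    weights[f] += y * v
                bias += y
                mistakes += 1
        if mistakes == 0:  # training data separated, stop early
            break
    return dict(weights), bias


def rank_features(weights):
    """Order features by decreasing absolute weight (ties broken by name)."""
    return sorted(weights, key=lambda f: (-abs(weights[f]), f))


# Hypothetical toy corpus: two sports documents (+1), two politics documents (-1).
toy_docs = [
    {"goal": 1, "match": 1, "the": 1},
    {"goal": 1, "team": 1, "the": 1},
    {"vote": 1, "party": 1, "the": 1},
    {"vote": 1, "election": 1, "the": 1},
]
toy_labels = [1, 1, -1, -1]

toy_weights, toy_bias = train_perceptron(toy_docs, toy_labels)
top_features = rank_features(toy_weights)[:4]
```

On this toy data the word "the", which occurs in every document, ends with weight zero and falls outside the top-ranked features, while class-specific words such as "goal" and "vote" are kept.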

WIT Transactions on Information and Communication Technologies | 2002

Feature Selection Using Support Vector Machines

Janez Brank; Marko Grobelnik; Natasa Milic-Frayling; Dunja Mladenic

Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training classifiers on large collections of documents, both the time and memory requirements connected with processing these vectors may be prohibitive. This calls for a feature selection method, not only to reduce the number of features but also to increase the sparsity of document vectors. We propose a feature selection method based on linear Support Vector Machines (SVMs). First, we train the linear SVM on a subset of training data and retain only those features that correspond to highly weighted components (in the absolute value sense) of the normal to the resulting hyperplane that separates positive and negative examples. This reduced feature space is then used to train a classifier over a larger training set, because more documents now fit into the same amount of memory. In our experiments we compare the effectiveness of SVM-based feature selection with that of more traditional feature selection methods, such as odds ratio and information gain, in achieving the desired tradeoff between vector sparsity and classification performance. Experimental results indicate that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than odds ratio-based or information gain-based feature selection when linear SVM classifiers are used.
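The selection step described in this abstract can be sketched as follows. The sketch assumes the normal of a trained linear SVM is already available as a feature-to-weight mapping; the weights, documents, and function names below are hypothetical and only illustrate how keeping the highest-weighted components increases vector sparsity.

```python
def select_top_features(normal, keep):
    """Return the `keep` features with the largest absolute weight in the normal."""
    ranked = sorted(normal, key=lambda f: (-abs(normal[f]), f))
    return set(ranked[:keep])


def project(doc, kept):
    """Drop every component of a sparse document that is not a kept feature."""
    return {f: v for f, v in doc.items() if f in kept}


def average_nonzeros(docs):
    """Average number of nonzero components per document (lower = sparser)."""
    return sum(len(d) for d in docs) / len(docs)


# Hypothetical normal vector; in practice it would come from a linear SVM
# trained on a subset of the training data.
example_normal = {"goal": 0.9, "vote": -1.1, "the": 0.02, "match": 0.4, "party": -0.3}

example_docs = [
    {"goal": 2, "the": 5, "match": 1},
    {"vote": 1, "the": 4, "party": 1},
]

kept_features = select_top_features(example_normal, keep=2)
reduced_docs = [project(d, kept_features) for d in example_docs]
```

Here the two retained features are "vote" and "goal", and the average number of nonzero components per document drops from 3 to 1, which is what lets more documents fit into the same amount of memory.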

international world wide web conferences | 2014

Event registry: learning about world events from news

Gregor Leban; Blaz Fortuna; Janez Brank; Marko Grobelnik

Event Registry is a system that analyzes news articles and identifies the world events mentioned in them. The system is able to identify groups of articles that describe the same event. It can also identify groups of articles in different languages that describe the same event and represent them as a single event. From the articles in each event it can then extract the event's core information, such as the event location, the date, who is involved, and what it is about. The extracted information is stored in a database. A user interface allows users to search for events using extensive search options, to visualize and aggregate the search results, to inspect individual events, and to identify related events.

Archive | 2007

Automatic Evaluation of Ontologies

Janez Brank; Marko Grobelnik; Dunja Mladenic

We can observe that the focus of modern information systems is moving from “data processing” towards “concept processing,” meaning that the basic unit of processing is less and less an atomic piece of data and is increasingly a semantic concept that carries an interpretation and exists in a context with other concepts. An ontology is commonly used as a structure capturing knowledge about a certain area by providing relevant concepts and relations between them. Analysis of textual data plays an important role in the construction and usage of ontologies, especially with the growing popularity of semi-automated ontology construction (here also referred to as ontology learning). Different knowledge discovery methods have been adopted for the problem of semi-automated ontology construction [10], including unsupervised, semi-supervised and supervised learning over a collection of text documents, using natural language processing to obtain a semantic graph of a document, visualization of documents, information extraction to find relevant concepts, and visualization of the context of named entities in a document collection.

information technology interfaces | 2006

Using DMoz for constructing ontology from data stream

Marko Grobelnik; Janez Brank; Dunja Mladenic; B. Novak; Blaž Fortuna

This paper presents an approach for constructing an ontology from a stream of documents. Named entities extracted from the documents are used as instances of the ontology. Entities and co-occurring entity pairs are represented by feature vectors based on the content of the documents where they occurred. In general, concepts and relations can be formed into an ontological structure either by clustering or by classification into an existing topic hierarchy. We propose the latter using DMoz as an existing topic hierarchy. The approach is efficient and can scale to large data sets. We propose a framework that incorporates the stream mining process into a formal definition of the ontology. We describe a software component implementing this approach, and present experiments using a large collection of news

asian semantic web conference | 2009

Modeling Common Real-Word Relations Using Triples Extracted from n-Grams

Ruben Sipos; Dunja Mladenic; Marko Grobelnik; Janez Brank

In this paper, we present an approach that provides generalized relations for automatic ontology building based on frequent word n-grams. Using publicly available Google n-grams as our data source, we can extract relations in the form of triples and compute generalized, more abstract models. We propose an algorithm for building abstractions of the extracted triples using WordNet as background knowledge. We also present a novel approach to triple extraction using heuristics, which achieves notably better results than deep parsing applied to n-grams. This allows us to represent information gathered from the web as a set of triples modeling the common and frequent relations expressed in natural language. Our results have potential for use in different settings, including providing a knowledge base for reasoning or simply serving as statistical data useful in improving the understanding of natural languages.

asian semantic web conference | 2008

Predicting Category Additions in a Topic Hierarchy

Janez Brank; Marko Grobelnik; Dunja Mladenic

This paper discusses the problem of predicting structural changes in an ontology. It addresses ontologies that contain instances in addition to concepts. The focus is on an ontology where the instances are textual documents, but the approach presented is general enough to also work with other kinds of instances, as long as a similarity measure can be defined over them. We examine the changes in the Open Directory Project ontology of Web pages over a period of several years and analyze the most common types of structural changes that took place during that time. We then present an approach for predicting one of the more common types of structural changes, namely the addition of a new concept that becomes a subconcept of an existing parent concept and adopts a few instances of this parent concept. We describe how this task can be formulated as a machine-learning problem and present an experimental evaluation that shows promising results.

information technology interfaces | 2005

Experimental evaluation of documents classification represented using concept decomposition

J. Dobsa; Dunja Mladenic; Marko Grobelnik; Janez Brank

The paper presents an experimental evaluation of a dimensionality reduction technique based on concept indexing, applied to document categorization. The experiments were conducted on three collections of documents: a standard Reuters news collection in English, and two hierarchies of Web documents (Slovenian and Croatian). In the classification experiments on the Reuters collection, concept indexing was more successful than latent semantic indexing when the number of vectors onto which the documents are projected was small. Concept indexing improved the classification performance of the Support Vector Machines classifier used on some categories in all three document collections.
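A simplified sketch of the idea behind concept indexing: each cluster of documents is summarized by a unit-length centroid (a concept vector), and a document is then described by one coordinate per concept instead of one per word. The sketch makes several assumptions that the abstract does not state: the clusters are given in advance rather than computed, the coordinates are plain dot products with the concept vectors rather than a full least-squares projection, and all names and data are hypothetical.

```python
import math


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()} if norm else dict(vec)


def concept_vector(cluster):
    """Unit-length centroid of the normalized documents in one cluster."""
    total = {}
    for doc in cluster:
        for f, v in normalize(doc).items():
            total[f] = total.get(f, 0.0) + v
    return normalize(total)


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(f, 0.0) for f, v in a.items())


def to_concept_space(doc, concepts):
    """Represent a document by its similarity to each concept vector."""
    unit = normalize(doc)
    return [dot(unit, c) for c in concepts]


# Hypothetical clusters; a real system would obtain them by clustering.
sports_cluster = [{"goal": 1, "match": 1}, {"goal": 1, "team": 1}]
politics_cluster = [{"vote": 1, "party": 1}, {"vote": 1, "election": 1}]

concepts = [concept_vector(sports_cluster), concept_vector(politics_cluster)]
reduced = to_concept_space({"goal": 1, "team": 1}, concepts)
```

With two concept vectors, every document is reduced to two coordinates; the sports-like test document scores high on the first concept and zero on the second.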

Archive | 2005