Publication


Featured research published by Michael Cochez.


web intelligence, mining and semantics | 2017

Biased graph walks for RDF graph embeddings

Michael Cochez; Petar Ristoski; Simone Paolo Ponzetto; Heiko Paulheim

Knowledge Graphs have been recognized as a valuable source of background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way feature vector representations are computed by comparing twelve different edge weighting functions for performing biased walks on the RDF graph, in order to generate higher-quality graph embeddings. We evaluate our approach on different machine learning benchmarks, as well as on entity and document modeling benchmark data sets, and show that the naive RDF2Vec approach can be improved by exploiting biased graph walks.
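The abstract describes the walk-generation step in prose; as a rough illustration, the sketch below implements biased neighbour selection on a toy RDF graph. The predicate-frequency weighting and the dbr:/dbo: identifiers are stand-ins of ours, not one of the twelve weighting functions compared in the paper.

```python
import random
from collections import defaultdict

# Toy RDF graph as adjacency lists of (predicate, object) pairs.
graph = defaultdict(list)
triples = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:leader", "dbr:Michael_Mueller"),
    ("dbr:Germany", "dbo:capital", "dbr:Berlin"),
    ("dbr:Germany", "dbo:language", "dbr:German_language"),
]
for s, p, o in triples:
    graph[s].append((p, o))

# One illustrative weighting: prefer frequent predicates. This is a
# stand-in of the same shape as the paper's weighting functions.
pred_freq = defaultdict(int)
for _, p, _ in triples:
    pred_freq[p] += 1

def biased_walk(start, depth):
    """One walk: at each step pick an outgoing edge proportionally to its weight."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break
        weights = [pred_freq[p] for p, _ in edges]
        p, node = random.choices(edges, weights=weights, k=1)[0]
        walk.extend([p, node])
    return walk

# The collected walks are then treated as sentences for a word2vec-style
# model (e.g., gensim's Word2Vec) to learn entity embeddings.
walks = [biased_walk("dbr:Berlin", depth=4) for _ in range(100)]
print(walks[0])
```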


international semantic web conference | 2017

Global RDF Vector Space Embeddings

Michael Cochez; Petar Ristoski; Simone Paolo Ponzetto; Heiko Paulheim

Vector space embeddings have been shown to perform well when using RDF data in data mining and machine learning tasks. Existing approaches, such as RDF2Vec, use local information, i.e., they rely on local sequences generated for nodes in the RDF graph. For word embeddings, global techniques, such as GloVe, have been proposed as an alternative. In this paper, we show how the idea of global embeddings can be transferred to RDF embeddings, and show that the results are competitive with traditional local techniques like RDF2Vec.
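For readers unfamiliar with GloVe's use of global statistics, here is a minimal sketch of how co-occurrence counts could be gathered from graph walks before fitting GloVe-style vectors. The inverse-distance weighting follows GloVe's convention; everything else is an illustrative assumption, not the paper's code.

```python
from collections import defaultdict

def cooccurrence(walks, window=2):
    """Aggregate global co-occurrence counts over all walks, weighting
    each context token by the inverse of its distance, as GloVe does."""
    counts = defaultdict(float)
    for walk in walks:
        for i, center in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    counts[(center, walk[j])] += 1.0 / abs(i - j)
    return counts

# counts[(a, b)] plays the role of X_ij in the GloVe objective
#   sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2,
# which would then be minimized to obtain the entity vectors.
```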


international conference on application of information and communication technologies | 2013

How Do Computer Science Students Use Distributed Version Control Systems?

Michael Cochez; Ville Isomöttönen; Ville Tirronen; Jonne Itkonen

The inclusion of version control systems in computing curricula enables educators to promote competences needed in real-life situations. The use of a version control system also has several potential benefits for the teacher, who might, for instance, use the tool to monitor students’ progress and to give feedback efficiently. This study analyzes how students used the distributed version control system Git in several computing courses. We analyzed students’ commit log data in two advanced programming courses, a second-year introductory software engineering course, and two courses where students developed software products. This enables us to compare Git usage between introductory-level and master’s-level students, and between exercise-driven and product-driven courses. We found that students who used the version control system in a software product development setting used it in a more graceful manner. Students who were additionally introduced to branching used this feature so that they did not have to wait until the practical session to commit their changes. We also found that the amount of garbage in the repositories is strongly related to the students’ awareness of the version control process and of the need to keep the workspace clean.
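As a hedged illustration of the kind of commit-log analysis described, the following snippet pulls machine-readable metadata out of a repository with git log and aggregates commits per author and per day. It is our own toy example, not the study's instrumentation.

```python
import subprocess
from collections import Counter

# Extract machine-readable commit metadata from a (student) repository
# and aggregate activity per author and per day. Run inside a git repo.
log = subprocess.run(
    ["git", "log", "--pretty=format:%H|%an|%ad", "--date=short"],
    capture_output=True, text=True, check=True,
).stdout

commits_per_author = Counter()
commits_per_day = Counter()
for line in log.splitlines():
    sha, author, day = line.split("|", 2)
    commits_per_author[author] += 1
    commits_per_day[day] += 1

print(commits_per_author.most_common(5))
print(sorted(commits_per_day.items())[-5:])
```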


international semantic web conference | 2016

Knowledge Representation on the Web Revisited: The Case for Prototypes

Michael Cochez; Stefan Decker; Eric Prud’hommeaux

Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing and reuse on the Web. Prototypes avoid the distinction between classes and instances and provide a means for object-based data sharing and reuse.
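A minimal sketch of the prototype idea as the abstract presents it: every object is a prototype that derives from a base prototype by adding and removing property values, with no class/instance split. The field names and resolution order here are our reading, not the paper's formal syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Prototype:
    id: str
    base: "Prototype | None" = None
    add: dict = field(default_factory=dict)     # property -> set of values to add
    remove: dict = field(default_factory=dict)  # property -> set of values to drop

    def resolve(self):
        """Compute effective properties by walking up the base chain."""
        props = {} if self.base is None else self.base.resolve()
        for p, vals in self.remove.items():
            props[p] = props.get(p, set()) - vals
        for p, vals in self.add.items():
            props[p] = props.get(p, set()) | vals
        return props

empty = Prototype("proto:empty")
car = Prototype("ex:car", empty, add={"wheels": {"4"}})
trike = Prototype("ex:trike", car, add={"wheels": {"3"}}, remove={"wheels": {"4"}})
print(trike.resolve())  # {'wheels': {'3'}}
```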


Lecture Notes in Computer Science | 2016

TB-Structure: Collective Intelligence for Exploratory Keyword Search

Vagan Y. Terziyan; Mariia Golovianko; Michael Cochez

In this paper we address an exploratory search challenge by presenting a new (structure-driven) collaborative filtering technique. The aim is to increase search effectiveness by predicting a seeker’s implicit intents at an early stage of the search process. This is achieved by uncovering behavioral patterns within large datasets of preserved collective search experience. We apply a specific tree-based data structure called a TB (There-and-Back) structure for compact storage of search history in the form of merged query trails, i.e., sequences of queries that iteratively approach a seeker’s goal. The organization of TB-structures allows inferring new implicit trails for the prediction of a seeker’s intents. Experiments demonstrate both the storage compactness and the inference potential of the proposed structure.
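The TB-structure itself is not specified in the abstract; the sketch below only illustrates the general ingredient it builds on, a trie in which query trails with shared prefixes are merged, so that observed continuations can be used to predict a seeker's next queries.

```python
from collections import defaultdict

class TrailTrie:
    def __init__(self):
        self.children = defaultdict(TrailTrie)
        self.count = 0  # number of stored trails passing through this node

    def insert(self, trail):
        node = self
        for query in trail:
            node = node.children[query]
            node.count += 1

    def continuations(self, prefix):
        """Observed next queries after `prefix`, with their frequencies."""
        node = self
        for query in prefix:
            if query not in node.children:
                return {}
            node = node.children[query]
        return {q: child.count for q, child in node.children.items()}

trie = TrailTrie()
trie.insert(["jaguar", "jaguar car", "jaguar xf price"])
trie.insert(["jaguar", "jaguar car", "jaguar dealer"])
print(trie.continuations(["jaguar", "jaguar car"]))
```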


international conference on application of information and communication technologies | 2014

Challenges and Confusions in Learning Version Control with Git

Ville Isomöttönen; Michael Cochez

Scholars agree on the importance of incorporating the use of version control systems (VCSs) into computing curricula, so as to prepare students for today’s distributed and collaborative workplaces. One of the present-day distributed version control systems (DVCSs) is Git, the system we have used in several courses. In this paper, we report on the challenges of learning and using the system, based on survey data collected from a project-based course and on our own teaching experiences in several different kinds of computing courses. The results of this analysis are discussed and recommendations are made.


frontiers of information technology | 2016

Using Distributional Semantics for Automatic Taxonomy Induction

Bushra Zafar; Michael Cochez; Usman Qamar

Semantic taxonomies are powerful tools that provide structured knowledge to Natural Language Processing (NLP), Information Retrieval (IR), and general Artificial Intelligence (AI) systems. These taxonomies are extensively used for solving knowledge-rich problems such as textual entailment and question answering. In this paper, we present a taxonomy induction system and evaluate it using the benchmarks provided in the Taxonomy Extraction Evaluation (TExEval2) Task. The task is to identify hyponym-hypernym relations and to construct a taxonomy from a given domain-specific list. Our approach combines a word embedding trained on a large corpus with string-matching approaches, and is semi-supervised overall. We propose a generic algorithm that effectively utilizes the vectors from the embedding to identify hyponym-hypernym relations and to induce the taxonomy. The system generated English-language taxonomies for three domains (environment, food, and science), which were evaluated against gold-standard taxonomies. The system achieved good results for hyponym-hypernym identification and taxonomy induction, especially when compared to other tools using similar background knowledge.
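A hedged sketch of the two signals the abstract names, string matching and embedding similarity, combined into a naive edge-proposal routine. The vec mapping (term to vector) is assumed to come from some pretrained embedding; the selection rules are ours, not the paper's algorithm.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def induce_edges(terms, vec):
    """Propose (hyponym, hypernym) edges for a flat list of terms."""
    edges = []
    for t in terms:
        # String rule: a multiword term ending in another term is taken
        # as a hyponym of that term ("apple juice" -> "juice").
        heads = [h for h in terms if h != t and t.endswith(" " + h)]
        if heads:
            edges.append((t, max(heads, key=len)))
            continue
        # Embedding fallback: attach the term under its most similar
        # neighbour in vector space.
        cands = [h for h in terms if h != t and h in vec]
        if t in vec and cands:
            edges.append((t, max(cands, key=lambda h: cos(vec[t], vec[h]))))
    return edges
```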


Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) on | 2014

Locality-Sensitive Hashing for Massive String-Based Ontology Matching

Michael Cochez

This paper reports initial research results on the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed, and experimental results are reported. The experiments show that using LSH for ontology matching can lead to a very fast matching process. The alignments achieved in these experiments are comparable in quality to those of state-of-the-art matchers, while being produced much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results.
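As a self-contained illustration of string-based matching with LSH (the general technique, not the paper's two problem transformations), the sketch below MinHashes character trigrams of labels and bands the signatures so that similar labels collide in a bucket. Note that Python's built-in hash is process-salted, so results vary across runs.

```python
import itertools
import random
from collections import defaultdict

random.seed(0)
NUM_HASHES, BANDS = 32, 8          # 8 bands of 4 rows each
SALTS = [random.getrandbits(32) for _ in range(NUM_HASHES)]

def shingles(label, n=3):
    label = label.lower()
    return {label[i:i + n] for i in range(max(1, len(label) - n + 1))}

def minhash(label):
    sh = shingles(label)
    return [min(hash((salt, s)) for s in sh) for salt in SALTS]

def candidate_pairs(labels):
    """Labels landing in the same bucket in any band become candidates."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for label in labels:
        sig = minhash(label)
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(label)
    pairs = set()
    for bucket in buckets.values():
        pairs.update(itertools.combinations(sorted(bucket), 2))
    return pairs

labels = ["Conference Paper", "ConferencePaper", "Journal Article", "Review"]
print(candidate_pairs(labels))
```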


ieee symposium series on computational intelligence | 2015

Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination

Michael Cochez; Ferrante Neri

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity; this is a generally accepted fact that cannot be improved upon without exploiting the specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) resembling the one produced by AHC, while needing only linear space and time. However, twister tries are sensitive to rare, but still possible, unfavorable hash evaluations, which can have a disastrous effect on the final outcome. We propose the use of a metaheuristic algorithm to overcome this sensitivity and show how approximate computations of dendrogram quality can help to evaluate the heuristic within reasonable time. The proposed metaheuristic is based on an evolutionary framework and integrates a surrogate model of the fitness to enhance performance in terms of computational time.
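The paper's evolutionary metaheuristic and surrogate model are not detailed in the abstract; the sketch below captures only the surrounding "a posteriori elimination" idea: build several hash-seeded candidates and keep the dendrogram that scores best under a cheap, sampled quality estimate. Both callables here are hypothetical stand-ins.

```python
import random

def a_posteriori_elimination(data, build_twister_trie, dendrogram_quality,
                             candidates=10, sample_size=200):
    """Build several randomized candidates, keep the best-scoring one.
    `build_twister_trie` and `dendrogram_quality` are assumed stand-ins;
    the paper's actual evolutionary loop and surrogate model are richer."""
    sample = random.sample(data, min(sample_size, len(data)))
    best, best_score = None, float("-inf")
    for _ in range(candidates):
        seed = random.getrandbits(32)               # one hash configuration
        dendrogram = build_twister_trie(data, seed=seed)
        score = dendrogram_quality(dendrogram, sample)  # approximate, sampled
        if score > best_score:
            best, best_score = dendrogram, score
    return best
```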


Lecture Notes in Computer Science | 2015

Balanced Large Scale Knowledge Matching Using LSH Forest

Michael Cochez; Vagan Y. Terziyan; Vadim Ermolayev

Evolving Knowledge Ecosystems were recently proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems, so the inner workings of a knowledge ecosystem can be modeled on natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, which in turn originate from the streams of information flowing into the ecosystem. In this article we investigate the use of LSH Forest, a self-tuning indexing schema based on locality-sensitive hashing, for solving the problem of placing new knowledge tokens in the right contexts of the environment. We argue and show experimentally that LSH Forest possesses the required properties and could be used for large distributed set-ups.
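LSH Forest indexes variable-length hash-signature prefixes; as a rough illustration of how that supports token placement, the sketch below assigns a knowledge token to the context whose signature shares the longest prefix with the token's. The signature function is assumed, e.g., a MinHash like the one sketched for the ontology matching paper above.

```python
def longest_common_prefix(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def place_token(token_sig, context_sigs):
    """Place a knowledge token in the context whose signature shares the
    longest prefix with the token's signature."""
    return max(context_sigs,
               key=lambda ctx: longest_common_prefix(token_sig, context_sigs[ctx]))

# Toy usage with hand-made signatures standing in for real LSH output.
contexts = {"medicine": (3, 1, 4, 1), "sports": (2, 7, 1, 8)}
print(place_token((3, 1, 9, 9), contexts))  # -> "medicine"
```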

Collaboration


Dive into Michael Cochez's collaborations.

Top Co-Authors

Ville Tirronen
University of Jyväskylä

Jacques Periaux
University of Jyväskylä

Jonne Itkonen
University of Jyväskylä

Tero Tuovinen
University of Jyväskylä