Zsolt Tivadar Kardkovács
Budapest University of Technology and Economics
Publication
Featured research published by Zsolt Tivadar Kardkovács.
Journal of the American Medical Informatics Association | 2009
Illés Solt; Domonkos Tikk; Viktor Gál; Zsolt Tivadar Kardkovács
OBJECTIVE Automated, disease-specific classification of textual clinical discharge summaries is of great importance in the life sciences, as it helps physicians conduct medical studies by providing statistically relevant data for analysis. This can be further facilitated if, when discharge summaries are labeled, semantic labels are also extracted from the text, such as whether a given disease is present, absent, or questionable in a patient, or is unmentioned in the document. The authors present a classification technique that successfully solves this semantic classification task. DESIGN The authors introduce a context-aware, rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in successive steps. First, some misleading parts are removed from the text; then the text is partitioned into positive, negative, and uncertain context segments; finally, a sequence of binary classifiers is applied to assign the appropriate semantic labels. MEASUREMENT For evaluation, the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures, F1-macro and F1-micro. RESULTS On the two subtasks of the Obesity Challenge (textual and intuitive classification), the system achieved F1-macro = 0.80 on the textual task and F1-macro = 0.67 on the intuitive task, placing second on the textual and first on the intuitive subtask of the challenge. CONCLUSIONS The authors show that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques if the training data are limited and some semantic labels are very sparse.
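The three steps named in the DESIGN part can be illustrated with a minimal Python sketch. It is not the authors' system: the cue lists, the "family history" filter, and the function names are hypothetical and chosen only to show the shape of the pipeline (filter, partition into contexts, then a fixed sequence of yes/no checks).

```python
import re

# Hypothetical cue lists, chosen only for illustration; the abstract does not
# give the actual rules used by the authors.
MISLEADING_PREFIXES = ("family history",)
NEGATION_CUES = re.compile(r"\b(no|denies|without)\b")
UNCERTAINTY_CUES = re.compile(r"\b(possible|questionable|suspected)\b")


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def partition_contexts(text):
    """Steps 1 and 2: drop misleading sentences, then sort the rest by context."""
    contexts = {"positive": [], "negative": [], "uncertain": []}
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if lowered.startswith(MISLEADING_PREFIXES):
            continue  # step 1: remove misleading parts
        if NEGATION_CUES.search(lowered):
            contexts["negative"].append(lowered)
        elif UNCERTAINTY_CUES.search(lowered):
            contexts["uncertain"].append(lowered)
        else:
            contexts["positive"].append(lowered)
    return contexts


def classify(text, disease):
    """Step 3: apply binary checks in a fixed order and return the first hit."""
    contexts = partition_contexts(text)
    disease = disease.lower()
    checks = (
        ("present", "positive"),
        ("absent", "negative"),
        ("questionable", "uncertain"),
    )
    for label, key in checks:
        if any(disease in segment for segment in contexts[key]):
            return label
    return "unmentioned"


summary = "Family history of diabetes. Patient has obesity. No asthma. Possible gout."
for name in ("obesity", "asthma", "gout", "diabetes"):
    print(name, classify(summary, name))
```

In this toy example "diabetes" comes out as unmentioned because its only occurrence sits in a sentence that the first step discards.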
SIGKDD Explorations | 2005
Zsolt Tivadar Kardkovács; Domonkos Tikk; Zoltán Bánsághi
In this paper, we present a general solution for the KDD Cup 2005 problem. It uses the Internet as a source of knowledge and extends it to categorize very short (fewer than 5 words) documents with reasonable accuracy. Our approach consists of three main parts: (i) a central knowledge filter, (ii) an on-demand web crawler, and (iii) a very efficient categorizer system. Our solution obtained the Creativity and Precision Runner-up Awards at the competition. The main idea of the Ferrety algorithm can be generalized for mapping one taxonomy to another if training documents are available.
SIGKDD Explorations | 2006
Domonkos Tikk; Zsolt Tivadar Kardkovács; Ferenc Szidarovszky
This paper presents our winning solution for the KDD Cup 2006 problem. It is based on the results of three different supervised learning techniques, which are combined in a classifier committee; a single solution is then obtained with a voting procedure. The voting procedure assigns a weight to each member of the committee according to its average performance in a ten-fold cross-validation test, and it also takes into account the confidence values returned by the three algorithms. The final decision of the committee is determined by means of a parameterized veto strategy, which takes into consideration the maximal allowed error rate besides the confidence values of the committee members. The solution presented here won Task 2 and was runner-up in Task 1 of the competition.
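A weighted committee vote with a veto can be sketched as follows. This is only an illustration of the general idea: the abstract does not specify the veto rule or how the maximal allowed error rate enters it, so the rule below (a sufficiently confident negative vote overrides the committee) and the parameter name `veto_confidence` are assumptions.

```python
def committee_decision(votes, weights, veto_confidence=0.9):
    """Combine binary votes from committee members.

    votes   -- list of (label, confidence) pairs, label is +1 or -1,
               confidence is in [0, 1]
    weights -- one weight per member, e.g. its mean cross-validation accuracy
    veto_confidence -- hypothetical threshold for the assumed veto rule
    """
    # Assumed veto rule: a single member voting -1 with high enough
    # confidence overrides the rest of the committee.
    for label, confidence in votes:
        if label == -1 and confidence >= veto_confidence:
            return -1

    # Otherwise take a vote weighted by member weight and confidence.
    score = sum(w * label * conf for (label, conf), w in zip(votes, weights))
    return 1 if score > 0 else -1


weights = [0.9, 0.8, 0.85]  # illustrative values, not results from the paper
print(committee_decision([(1, 0.8), (1, 0.6), (-1, 0.7)], weights))
print(committee_decision([(1, 0.8), (1, 0.6), (-1, 0.95)], weights))
```

The first call is decided by the weighted vote, the second by the veto of the third member.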
Lecture Notes in Computer Science | 2003
Zsolt Tivadar Kardkovács; Gábor Mihály Surányi; Sándor Gajdos
The problem of integrating independent data banks is manifold. One of the arising issues is that data models of the same or related subjects vary among implementations. Therefore, identification of analogous, (fundamentally) identical or just similar concepts is a must. In this paper we deal with this problem. Based on preliminary descriptions of concepts, our novel integration method reorganises all data by means of a so-called catalogue. The strict mathematical basis of the catalogue enables 1) discovering the correspondence among concepts of the sources irrespective of their language, and 2) effective searching for exactly matching, parallel or alternative, most similar elements in the unified data bank. Our approach gives those databases the ability to transparently provide extended, personalisable services of better quality to clients, which is in high demand and usable by modern web agents.
international conference natural language processing | 2005
Zsolt Tivadar Kardkovács
In our ongoing project called “In the Web of Words” (WoW) we aimed to create a complex search interface that incorporates a deep web search engine module based on a Hungarian question processor. One of the most crucial parts of the system was the transformation of genitive relations into adequate SQL queries, since, for example, questions beginning with “Who” and “What” mostly contain such a relation. The genitive relation is one of the most complex semantic structures, since it can express a wide range of different connection types between entities, even in a single language. Thus, the transformation of its syntactic form into a formal computer language is far from clear. In the last decade, several natural language database interfaces (NLIDBs) have been proposed; however, a detailed or general description of this problem is still missing in the literature. In this paper, we describe how to translate genitive phrases into SQL queries in general, i.e. we omit Hungarian-dependent optimizations.
advances in databases and information systems | 2004
Gábor Mihály Surányi; Zsolt Tivadar Kardkovács; Sándor Gajdos
If vast quantities of data elements are considered, catalogues provide an intuitive way of organisation. While in common use the term catalogue refers to a tree-shaped specialisation hierarchy, we allow any transitively reduced acyclic digraph of a transitive relation such as the specialisation relationship to be a representation of a catalogue. This conforms to real-life scenarios, where each element can be classified differently depending on the actual point of view. As shown by the application of catalogues for database integration purposes, the inherent definition of similarity among categories is extremely useful. In this paper, we investigate whether catalogues as a physical organisation method also have some benefit. We accomplish this task by precisely defining the data structure and thoroughly analysing the time and space complexity properties of its management routines. The results are also compared to those of relevant alternative organisation methods.
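The notion of a transitively reduced acyclic digraph can be made concrete with a small Python sketch. It is not the data structure analysed in the paper; the function name and the example categories are hypothetical. It removes every direct edge that is already implied by a longer path, which is the standard transitive reduction of a DAG.

```python
def transitive_reduction(edges):
    """Return the transitive reduction of a DAG.

    edges -- dict mapping each node to the set of its direct successors
             (here: more general category -> more specific category)
    """
    def reachable(start):
        # All nodes reachable from start by one or more edges.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced = {}
    for node, successors in edges.items():
        # An edge node -> s is redundant if s can be reached
        # through some other direct successor of node.
        indirect = set()
        for s in successors:
            indirect |= reachable(s)
        reduced[node] = set(successors) - indirect
    return reduced


# Hypothetical specialisation hierarchy; "novel" has two parents, so the
# result is a digraph rather than a tree.
catalogue = {
    "publication": {"book", "novel"},
    "book": {"novel"},
    "fiction": {"novel"},
    "novel": set(),
}
print(transitive_reduction(catalogue))
```

Here the edge from "publication" to "novel" is dropped because it follows from "publication" to "book" to "novel", while both parents of "novel" are kept.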
international conference on intelligent engineering systems | 2006
Domonkos Tikk; Zsolt Tivadar Kardkovács; Zoltán Bánsághi
Capturing the meaning of Internet search queries can significantly improve the effectiveness of search retrieval. Users often face the problem of finding the relevant answer on the result pages of their Internet search, particularly when the posted query is ambiguous. The user's orientation can be greatly facilitated if answers are grouped into topics of a fixed subject taxonomy. In this manner, the original problem can be transformed into the labelling of queries, and consequently of the answers, with the topic names. This is clearly a categorization problem, i.e. it requires supervised machine learning. This paper introduces our approach, called the Ferrety algorithm, which performs topic assignment and also works when there is no directly available training data that describes the semantics of the subject taxonomy. It is presented via the example of the ACM KDD Cup 2005 problem, where Ferrety received awards for precision and creativity.
congress of the italian association for artificial intelligence | 2005
Domonkos Tikk; Ferenc Szidarovszky; Zsolt Tivadar Kardkovács; Gábor Magyar
In our ongoing research and development project, called “In the Web of Words” (WoW), funded by the National R+D Program in Hungary, we aim to create a complex search interface that incorporates, besides the usual keyword-based search functionality, (1) deep web search, (2) Hungarian natural language question processing, and (3) image search support by a visual thesaurus. This paper focuses on a particular and crucial part of the question processing problem (2): recognition of entities. Entities are expressions that have a fixed form and that are assigned context-specific information in the dictionary. Due to the agglutinative nature of the Hungarian language, they often appear in the text in a different form than in the dictionary; therefore, their detection requires special algorithms during processing.
Lecture Notes in Computer Science | 2004
Zsolt Tivadar Kardkovács; Gábor Mihály Surányi; Sándor Gajdos
The amount of structured information published on the World Wide Web is huge and steadily increases. The demand for uniform management of these heterogeneous and autonomous sources is also increasing. In this paper we present a method which is capable of virtually integrating data sources with logical rules into a catalogue and more importantly, discovering similar entities across the information sources. We prove that the technique is algorithmically decidable. The resulting system is primarily suitable for applications which require an integrated view of all the distributed database components for querying only, such as web shops.
international conference on web engineering | 2003
Zsolt Tivadar Kardkovács; Gábor Mihály Surányi
Current research activities in the computer science area focus on integration of systems with diverse purposes and architecture. They include the usage of mobile devices just like any immobile computer and the efficient, combined presentation and utilisation of contents from multiple providers together on the web. Even though these two tasks are naturally tackled in different ways, in this paper we propose a single solution to both problems based on logic, which, furthermore, makes their seamless integration possible. That is, without scrapping the well known models that have proven good, we offer a scalable architecture that supports wireless access to the most valuable pieces of the Internet keeping parts of the existing infrastructure and with no duplication of functionality.