Yaakov HaCohen-Kerner
Jerusalem College of Technology
Publications
Featured research published by Yaakov HaCohen-Kerner.
International Conference on Computational Linguistics | 2005
Yaakov HaCohen-Kerner; Zuriel Gross; Asaf Masa
Many academic journals and conferences require that each article include a list of keyphrases. These keyphrases should provide general information about the contents and topics of the article, and they can save precious time in tasks such as filtering, summarization, and categorization. In this paper, we investigate the automatic extraction and learning of keyphrases from scientific articles written in English. First, we introduce various baseline extraction methods; some of them, which we formalized, are very successful for academic papers. Then, we combine these methods using several machine learning algorithms. The best results were achieved by J48, an improved variant of C4.5, and are significantly better than those of previous extraction systems regarded as the state of the art.
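The paper's baseline methods are not reproduced in this abstract, so the following is only a minimal Python sketch of what a frequency- and position-based baseline keyphrase extractor of this general kind can look like. The stopword list, the candidate definition, and the tie-breaking rule are illustrative assumptions, not the paper's.

```python
# Illustrative sketch only: a frequency-based keyphrase baseline that
# breaks ties by the position of the phrase's first mention.
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "we", "this"}

def candidate_phrases(text, max_len=3):
    """Generate stopword-free word n-grams (n <= max_len) as candidates."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i : i + n]
            if not (set(gram) & STOPWORDS):
                yield " ".join(gram)

def score_keyphrases(text, top_k=5):
    """Baseline: rank candidates by frequency; earlier first mention wins ties."""
    lower = text.lower()
    counts = Counter(candidate_phrases(lower))
    def pos(p):
        i = lower.find(p)
        return i if i >= 0 else len(lower)   # phrases split by punctuation sort last
    return sorted(counts, key=lambda p: (-counts[p], pos(p)))[:top_k]
```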
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2003
Yaakov HaCohen-Kerner
The rapid growth of online information is hard to handle. Summaries such as abstracts help to reduce this problem, and keywords, which can be regarded as very short summaries, may help even more. Filtering documents by keywords can save precious time when searching; however, most documents do not include keywords. In this paper we present a model that extracts keywords from abstracts and titles. The model has been implemented in a prototype system and tested on a set of abstracts of academic papers containing keywords composed by their authors. Results show that keywords extracted from abstracts and titles can be a primary tool for researchers.
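As a rough illustration of the title-plus-abstract idea, the sketch below scores words by their abstract frequency and boosts words that also appear in the title. The token filter and the title_boost weight are invented for the example and are not taken from the paper.

```python
# Hypothetical sketch: title words as strong keyword candidates.
import re
from collections import Counter

def extract_keywords(title, abstract, top_k=5, title_boost=3.0):
    """Score words by abstract frequency, boosting words that also
    appear in the title (illustrative weighting, not the paper's)."""
    tokenize = lambda s: re.findall(r"[a-z]{3,}", s.lower())
    title_words = set(tokenize(title))
    scores = Counter(tokenize(abstract))
    for w in scores:
        if w in title_words:
            scores[w] *= title_boost
    return [w for w, _ in scores.most_common(top_k)]
```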
Meeting of the Association for Computational Linguistics | 2008
Yaakov HaCohen-Kerner; Ariel Kass; Ariel Peretz
We present a process that attempts to resolve abbreviation ambiguity. Various context-related and statistical features have been explored; almost all of them are domain independent and language independent. The application domain is Jewish law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. Several implementations of the one-sense-per-discourse hypothesis are used, improving the features with new variants. An accuracy of 96.09% has been achieved by SVM.
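A hedged sketch of the general approach the abstract describes: a linear SVM over bag-of-words context windows around the ambiguous abbreviation, followed by a one-sense-per-discourse majority vote within each document. scikit-learn stands in for the paper's actual toolchain, and the feature design is illustrative.

```python
# Sketch under assumptions: contexts are the words surrounding an
# abbreviation occurrence; senses are the correct expansions.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_disambiguator(contexts, senses):
    """Fit a linear SVM on bag-of-words context features."""
    vec = CountVectorizer()
    clf = LinearSVC().fit(vec.fit_transform(contexts), senses)
    return vec, clf

def predict_with_ospd(vec, clf, doc_contexts):
    """Predict each occurrence in one document, then let the
    document-level majority sense override individual predictions
    (one sense per discourse)."""
    preds = clf.predict(vec.transform(doc_contexts))
    majority = Counter(preds).most_common(1)[0][0]
    return [majority] * len(preds)
```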
International Conference on Natural Language Processing | 2004
Yaakov HaCohen-Kerner; Ariel Kass; Ariel Peretz
In many languages, abbreviations are widely used in both writing and speech. However, abbreviations are likely to be ambiguous, so there is a need for disambiguation: abbreviations should be expanded correctly. Disambiguating abbreviations is critical for correct understanding, not only of the abbreviations themselves but of the whole text. Little research has been done on disambiguating abbreviations in English and Latin documents, and nothing has been done for Hebrew. In this ongoing work, we investigate a basic model that expands abbreviations contained in Jewish law documents written in Hebrew. The model has been implemented in a prototype system. Current experimental results show that abbreviations are expanded correctly at a rate of almost 60%.
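The paper's basic model is not published here, so for flavor this is the kind of most-frequent-expansion baseline such a model can start from; the function names and data layout are assumptions for the example.

```python
# Assumption-laden sketch: expand each abbreviation with its most
# frequently observed expansion in the training data.
from collections import Counter, defaultdict

def build_expansion_table(training_pairs):
    """training_pairs: (abbreviation, observed expansion) tuples.
    Returns each abbreviation's most frequent expansion."""
    counts = defaultdict(Counter)
    for abbr, expansion in training_pairs:
        counts[abbr][expansion] += 1
    return {a: c.most_common(1)[0][0] for a, c in counts.items()}

def expand(text_tokens, table):
    """Replace known abbreviations with their majority expansion."""
    return [table.get(tok, tok) for tok in text_tokens]
```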
Information Retrieval Facility Conference | 2014
Dimitris Liparas; Yaakov HaCohen-Kerner; Anastasia Moumtzidou; Stefanos Vrochidis; Ioannis Kompatsiaris
This research investigates the problem of news article classification. The classification is performed using N-gram textual features extracted from the text and visual features generated from one representative image. The application domain is news articles written in English that belong to four categories: Business-Finance, Lifestyle-Leisure, Science-Technology, and Sports, downloaded from three well-known news websites (BBC, Reuters, and The Guardian). Various classification experiments were performed with the Random Forests machine learning method. Using the N-gram textual features alone led to much better accuracy (84.4%) than using the visual features alone (53%), while using both together led to slightly better accuracy still (86.2%). The main contribution of this work is a news article classification framework based on Random Forests and multimodal (textual and visual) features, together with a late fusion strategy that exploits Random Forests' operational capabilities.
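A minimal sketch of late fusion with Random Forests as the abstract describes it: one forest per modality, combined at the prediction level. Simple probability averaging is assumed here; the paper's fusion exploits RF operational capabilities that this sketch does not reproduce.

```python
# Sketch, assuming X_text and X_visual are per-article feature matrices
# over the same articles, with shared labels y.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_late_fusion(X_text, X_visual, y):
    """Train one Random Forest per modality."""
    rf_text = RandomForestClassifier(n_estimators=100).fit(X_text, y)
    rf_visual = RandomForestClassifier(n_estimators=100).fit(X_visual, y)
    return rf_text, rf_visual

def predict_late_fusion(rf_text, rf_visual, X_text, X_visual):
    """Average the two forests' class-probability outputs (late fusion)."""
    proba = (rf_text.predict_proba(X_text)
             + rf_visual.predict_proba(X_visual)) / 2
    return rf_text.classes_[np.argmax(proba, axis=1)]
```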
Applied Artificial Intelligence | 2010
Yaakov HaCohen-Kerner; Hananya Beck; Elchai Yehudai; Dror Mughaz
This research investigates the classification of documents according to the ethnic group of their authors and/or the historical period in which the documents were written. The classification uses various combinations of six sets of stylistic features: quantitative, orthographic, topographic, lexical, function, and vocabulary richness. The application domain is Jewish law articles written in Hebrew-Aramaic, languages that are rich in morphological forms. Four popular machine learning methods were applied. Logistic regression led to the best accuracy: about 99.6% when classifying by the authors' ethnic group or by the historical period, and about 98.3% when classifying by both at once. The quantitative feature set proved very successful and superior to all other sets; the lexical and function feature sets were also found useful. The quantitative and function features are domain independent and language independent, so these two feature sets might generalize to similar classification tasks in other languages and can therefore be useful for the text classification community at large.
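To make the quantitative-feature idea concrete, the sketch below computes a few document-level statistics and feeds them to logistic regression. The specific features are illustrative choices (the type/token ratio, for instance, belongs to the vocabulary-richness family), not the paper's exact feature sets.

```python
# Illustrative quantitative stylistic features fed to logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def quantitative_features(doc):
    words = doc.split()
    sentences = [s for s in doc.split(".") if s.strip()]
    return [
        len(words),                                        # document length
        sum(len(w) for w in words) / max(len(words), 1),   # avg word length
        len(words) / max(len(sentences), 1),               # words per sentence
        len(set(words)) / max(len(words), 1),              # type/token ratio
    ]

def train(docs, labels):
    X = np.array([quantitative_features(d) for d in docs])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```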
Cybernetics and Systems | 1999
Yaakov HaCohen-Kerner; Nahum Cohen; Erez Shasha
Computer chess programs achieve outstanding results at playing chess; however, no existing program can compose adequate chess problems. In this paper, we present a model capable of improving the quality of some existing chess problems. We have formalized a major part of the knowledge needed to evaluate the quality of chess problems. The model attempts to improve a given problem through a series of meaningful chess transformations, using a hill-climbing search, while satisfying several criteria at each step. It has been implemented in a working system called Improver of Chess Problems (ICP). Our experiment shows that the majority of the problems examined were already optimal; nevertheless, the system improved almost one-third of the tested problems, most of them needing only slight changes. General lessons learned from this research may be useful in other composition domains.
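The hill-climbing loop the abstract mentions can be sketched generically as follows; quality() and transformations() are placeholders for the chess-specific knowledge the paper formalizes, not implementations of it.

```python
# Generic hill-climbing improvement loop of the kind ICP is described
# as using: apply transformations, keep a change only if it raises the
# problem's quality score, stop at a local optimum.
def improve_problem(problem, quality, transformations, max_steps=100):
    best, best_score = problem, quality(problem)
    for _ in range(max_steps):
        improved = False
        for candidate in transformations(best):
            score = quality(candidate)
            if score > best_score:          # greedy ascent
                best, best_score = candidate, score
                improved = True
                break
        if not improved:                    # local optimum reached
            return best
    return best
```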
Language, Culture, Computation (3) | 2014
Ephraim Nissan; Yaakov HaCohen-Kerner
In the present paper, we use animal names (zoonyms) to illustrate the specification and design of the phono-semantic matching (PSM) module, which, within the architecture of GALLURA, should be upstream in the control flow. The PSM module takes a word (e.g., a zoonym, or a place-name) and an indication of a target language (in practice, Hebrew). Its desired output is a set of alternative segmentations, i.e., sets of native (Hebrew) words or roots that are derivationally "relevant" in a folk-etymological sense. That output then serves as input for GALLURA's story-telling module. The desired output of GALLURA as a whole is a combination of folk etymology and storytelling: a humorous aetiological (i.e., explanatory) tale. A story is sought that, by bridging the input and output of the PSM module through some narrative trajectory, would back up the folk etymology proposed by the PSM output. Phono-semantic matching, itself not an easy task, is only part of the skills required; here, however, we focus on the processing in the designed PSM module.
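One ingredient of the PSM module, the search for alternative segmentations, can be illustrated with a small recursive sketch that splits an input word into every sequence of substrings found in a native-language lexicon. The lexicon, the transliteration step, and any scoring of derivational "relevance" are placeholders; this is not the module's actual design.

```python
# Hypothetical sketch: enumerate all segmentations of `word` into
# lexicon entries (e.g., transliterated Hebrew words or roots).
def segmentations(word, lexicon):
    """Return all ways to split `word` into lexicon entries."""
    if not word:
        return [[]]                      # one segmentation: the empty one
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in lexicon:
            for rest in segmentations(word[i:], lexicon):
                results.append([prefix] + rest)
    return results
```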
Cybernetics and Systems | 2008
Yaakov HaCohen-Kerner; Dror Mughaz; Hananya Beck; Elchai Yehudai
Text classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigate whether using words as features is appropriate for classifying documents by the ethnic group of their authors and/or by the historical period in which they were written. To the best of our knowledge, these kinds of classification have not been explored before. In addition, we examine Forman's (2003) claim that common words should not be used in classification tasks. The application domain was articles on Jewish law written in Hebrew-Aramaic, which have been little studied. Experiments using SVM and InfoGain yielded highly successful results (more than 95%). The results indicate that using common words as features contributes to making the learning task more efficient and accurate.
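A sketch of the InfoGain side of such an experiment: rank word features by their information gain with the class label and inspect whether common words rank highly. scikit-learn's mutual_info_classif stands in here for the InfoGain measure; this is an illustration, not the paper's code.

```python
# Sketch: rank bag-of-words features by mutual information with the class.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

def rank_word_features(docs, labels, top_k=20):
    vec = CountVectorizer()
    X = vec.fit_transform(docs)
    scores = mutual_info_classif(X, labels, discrete_features=True)
    order = np.argsort(scores)[::-1][:top_k]   # highest information gain first
    vocab = np.array(vec.get_feature_names_out())
    return list(zip(vocab[order], scores[order]))
```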
Cybernetics and Systems | 2007
Yaakov HaCohen-Kerner; Ittay Stern; David Korkus; Erick Fredj
Keyphrases extracted from documents can save precious time in tasks such as filtering, summarization, and categorization. A few such systems are available for documents written in English. In this paper, we propose a model called LEH_KEY (Learning to Extract Hebrew KEYphrases) that, for the first time, learns to extract keyphrases from documents written in Hebrew. We introduce a relatively high number (15) of baseline extraction methods, whereas other related systems combine only a small number (two or three) of such methods, and we investigate various combinations of these baselines with various machine learning methods. The best results were achieved by a combination of six baseline methods using J48 (an improved variant of C4.5). Our results are at least of equal quality to those achieved by extraction systems for documents written in English, which are regarded as the state of the art.
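The combination step can be sketched as follows: each baseline method scores a candidate keyphrase, and the vector of scores becomes the input to a decision-tree learner (scikit-learn's DecisionTreeClassifier standing in for J48/C4.5). The baseline scoring functions themselves are placeholders.

```python
# Sketch: turn per-baseline scores into features for a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def combine_baselines(candidates, baselines, labels):
    """candidates: keyphrase candidates; baselines: list of scoring
    functions, one per baseline method; labels: 1 if the candidate
    is a true keyphrase, else 0."""
    X = np.array([[b(c) for b in baselines] for c in candidates])
    return DecisionTreeClassifier().fit(X, labels)
```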