Publication


Featured research published by Dmitriy Meyerzon.


ACM/IEEE Joint Conference on Digital Libraries | 2005

Automatic extraction of titles from general documents using machine learning

Yunhua Hu; Hang Li; Yunbo Cao; Dmitriy Meyerzon; Qinghua Zheng

We propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of specific genres, including presentations, book chapters, technical papers, brochures, reports, and letters. Previously, methods have been proposed mainly for title extraction from research papers. It has not been clear whether automatic title extraction from general documents is possible. As a case study, we consider extraction from Office documents, including Word and PowerPoint. In our approach, we annotate titles in sample documents (for Word and PowerPoint respectively) and take them as training data, train machine learning models, and perform title extraction using the trained models. Our method is unique in that we mainly utilize formatting information, such as font size, as features in the models. It turns out that the use of formatting information can lead to quite accurate extraction from general documents. Precision and recall for title extraction from Word are 0.810 and 0.837 respectively, and precision and recall for title extraction from PowerPoint are 0.875 and 0.895 respectively, in an experiment on intranet data. Other important new findings in this work include that we can train models in one domain and apply them to another domain, and, more surprisingly, that we can even train models in one language and apply them to another language. Moreover, we can significantly improve search ranking results in document retrieval by using the extracted titles.
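
The pipeline described in the abstract (annotate titles, train a model on formatting features, then extract) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `Line` record, the three features (relative font size, bold, centered), the toy training documents, and the use of a plain perceptron as the learner are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """One line of a document with its formatting (hypothetical schema)."""
    text: str
    font_size: float
    bold: bool
    centered: bool
    is_title: bool = False  # human annotation, used only during training


def features(line: Line, doc: List[Line]) -> List[float]:
    """Formatting-only features; font size is relative to the document maximum."""
    max_size = max(l.font_size for l in doc)
    return [
        1.0,                           # bias term
        line.font_size / max_size,     # relative font size
        1.0 if line.bold else 0.0,
        1.0 if line.centered else 0.0,
    ]


def score(weights: List[float], x: List[float]) -> float:
    return sum(w * v for w, v in zip(weights, x))


def train(docs: List[List[Line]], epochs: int = 20) -> List[float]:
    """Plain perceptron: adjust weights whenever a line is misclassified."""
    weights = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for doc in docs:
            for line in doc:
                x = features(line, doc)
                predicted = score(weights, x) > 0
                if predicted != line.is_title:
                    sign = 1.0 if line.is_title else -1.0
                    weights = [w + sign * v for w, v in zip(weights, x)]
    return weights


def extract_title(weights: List[float], doc: List[Line]) -> str:
    """Return the text of the highest-scoring line in the document."""
    return max(doc, key=lambda l: score(weights, features(l, doc))).text


# Toy annotated training data (invented for this sketch).
training_docs = [
    [
        Line("Quarterly Report", 24, True, True, is_title=True),
        Line("Prepared by the finance team", 12, True, True),
        Line("Revenue grew in the third quarter.", 11, False, False),
    ],
    [
        Line("Internal memo", 10, True, False),
        Line("Migration Plan", 20, False, False, is_title=True),
        Line("We will move the servers in May.", 10, False, False),
    ],
]

weights = train(training_docs)

new_doc = [
    Line("Confidential", 9, True, False),
    Line("Search Ranking Overview", 18, False, True),
    Line("This document describes ranking.", 10, False, False),
]
print(extract_title(weights, new_doc))
```

Because the features describe formatting rather than words, a model of this kind has no direct dependence on vocabulary, which is one plausible reading of why the abstract reports that models transfer across domains and languages.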


Archive | 1998

Automatic tagging of documents and exclusion by content

Dmitriy Meyerzon; William G. Nichols


Archive | 1999

Method and system for detecting duplicate documents in web crawls

Dmitriy Meyerzon; Srikanth Shoroff; F. Soner Terek; Scott Norin


Archive | 1998

Synchronizing crawler with notification source

Dmitriy Meyerzon; Sankrant Sanu


Archive | 1997

Method of web crawling utilizing address mapping

Sankrant Sanu; Dmitriy Meyerzon


Archive | 1999

Method and system for incremental web crawling

Dmitriy Meyerzon; Srikanth Shoroff; F. Soner Terek; Sankrant Sanu


Archive | 2004

Scoping queries in a search engine

Kyle G. Peltonen; Dmitriy Meyerzon


Archive | 2004

Ranking search results using feature extraction

Dmitriy Meyerzon; Hang Li


Archive | 2008

Techniques to perform relative ranking for search results

Vladimir Tankovich; Dmitriy Meyerzon; Michael J. Taylor; Stephen E. Robertson


Archive | 1998

Method of web crawling utilizing crawl numbers

Dmitriy Meyerzon; Sankrant Sanu
