Balint Miklos | Researchain

Publication

Featured research published by Balint Miklos.

knowledge discovery and data mining | 2016

Smart Reply: Automated Response Suggestion for Email

Anjuli Kannan; Karol Kurach; Sujith Ravi; Tobias Kaufmann; Andrew Tomkins; Balint Miklos; Greg Corrado; László Lukács; Marina Ganea; Peter Young; Vivek Ramavajjala

In this paper we propose and investigate a novel end-to-end method for automatically generating short email responses, called Smart Reply. It generates semantically diverse suggestions that can be used as complete email responses with just one tap on mobile. The system is currently used in Inbox by Gmail and is responsible for assisting with 10% of all mobile responses. It is designed to work at very high throughput and process hundreds of millions of messages daily. The system exploits state-of-the-art, large-scale deep learning. We describe the architecture of the system as well as the challenges that we faced while building it, like response diversity and scalability. We also introduce a new method for semantic clustering of user-generated content that requires only a modest amount of explicitly labeled data.
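The abstract names response diversity as a key challenge but does not spell out the selection rule. The sketch below shows one simple way to enforce diversity: take candidates in score order and allow at most one per semantic cluster. It assumes responses have already been scored by some model and assigned to clusters. The function `select_diverse`, the scores, and the cluster labels are hypothetical illustrations, not the paper's method.

```python
from typing import Dict, List, Tuple


def select_diverse(
    scored: List[Tuple[str, float]],
    cluster_of: Dict[str, str],
    k: int = 3,
) -> List[str]:
    """Pick up to k highest-scoring responses, at most one per semantic cluster.

    Hypothetical sketch: `scored` holds (response, model score) pairs and
    `cluster_of` maps each response to an assumed semantic-cluster id.
    """
    chosen: List[str] = []
    used_clusters = set()
    # Visit candidates from best to worst score.
    for response, _score in sorted(scored, key=lambda rs: rs[1], reverse=True):
        # A response without a cluster id is treated as its own cluster.
        cluster = cluster_of.get(response, response)
        if cluster in used_clusters:
            continue  # skip near-duplicates of an already chosen suggestion
        chosen.append(response)
        used_clusters.add(cluster)
        if len(chosen) == k:
            break
    return chosen


scored = [("Thanks!", 0.9), ("Thank you!", 0.85), ("Sounds good.", 0.7), ("No, sorry.", 0.4)]
cluster_of = {"Thanks!": "thanks", "Thank you!": "thanks",
              "Sounds good.": "agree", "No, sorry.": "decline"}
print(select_diverse(scored, cluster_of))  # ['Thanks!', 'Sounds good.', 'No, sorry.']
```

Here "Thank you!" outscores "No, sorry." but is dropped because it falls in the same cluster as "Thanks!", which leaves room for a semantically different option.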

web search and data mining | 2016

Hierarchical Label Propagation and Discovery for Machine Generated Email

James B. Wendt; Michael Bendersky; Lluis Garcia-Pueyo; Vanja Josifovski; Balint Miklos; Ivo Krka; Amitabh Saikia; Jie Yang; Marc-Allen Cartright; Sujith Ravi

Machine-generated documents such as email or dynamic web pages are single instantiations of a pre-defined structural template. As such, they can be viewed as a hierarchy of template and document specific content. This hierarchical template representation has several important advantages for document clustering and classification. First, templates capture common topics among the documents, while filtering out potentially noisy variability such as personal information. Second, template representations scale far better than document representations since a single template captures numerous documents. Finally, since templates group together structurally similar documents, they can propagate properties between all the documents that match the template. In this paper, we use these advantages for document classification by formulating an efficient and effective hierarchical label propagation and discovery algorithm. The labels are propagated first over a template graph (constructed based on either term-based or topic-based similarities), and then to the matching documents. We evaluate the performance of the proposed algorithm using a large donated email corpus and show that the resulting template graph is significantly more compact than the corresponding document graph and the hierarchical label propagation is both efficient and effective in increasing the coverage of the baseline document classification algorithm. We demonstrate that the template label propagation achieves more than 91% precision and 93% recall, while increasing the label coverage by more than 11%.

international world wide web conferences | 2017

Template Induction over Unstructured Email Corpora

Julia Proskurnia; Marc-Allen Cartright; Lluis Garcia-Pueyo; Ivo Krka; James B. Wendt; Tobias Kaufmann; Balint Miklos

Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g. machine generated HTML emails. However, much less work has been done in performing the same task over unstructured email data. We propose a technique for inducing high quality templates from plain text emails at scale based on the suffix array data structure. We evaluate this method against an industry-standard approach for finding similar content based on shingling, running both algorithms over two corpora: a synthetically created email corpus for a high level of experimental control, as well as user-generated emails from the well-known Enron email corpus. Our experimental results show that the proposed method is more robust to variations in cluster quality than the baseline, and that its templates contain more text from the emails, which would benefit extraction tasks by identifying transient parts of the emails. Our study indicates templates induced using suffix arrays contain approximately half as much noise (measured as entropy) as templates induced using shingling. Furthermore, the suffix array approach is substantially more scalable, proving to be an order of magnitude faster than shingling even for modestly sized training clusters. Public corpus analysis shows that email clusters contain on average 4 segments of common phrases, where each of the segments contains on average 9 words, thus showing that templatization could help users reduce the email writing effort by an average of 35 words per email in an assistance or auto-reply related task.
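To illustrate the primitive this approach builds on, the sketch below uses a word-level suffix array to find the longest run of consecutive words shared by two emails. This is the textbook longest-common-substring method: concatenate both texts with a unique separator, sort all suffixes, and take the longest common prefix among adjacent suffixes that come from different emails. The paper's actual template-induction procedure over whole clusters is not described in the abstract, so this is only an assumed building block. The function `longest_common_phrase` and the example emails are hypothetical. Suffixes are sorted naively for clarity; production systems use linear or near-linear construction algorithms.

```python
from typing import List


def longest_common_phrase(a: str, b: str) -> List[str]:
    """Longest run of consecutive words shared by two emails (hypothetical sketch)."""
    wa, wb = a.split(), b.split()
    vocab = {w: i for i, w in enumerate(sorted(set(wa) | set(wb)))}
    # Encode words as ints; -1 is a unique separator that no word maps to.
    seq = [vocab[w] for w in wa] + [-1] + [vocab[w] for w in wb]
    words = wa + [""] + wb  # same indexing as seq
    n = len(seq)
    split = len(wa)  # positions < split come from email a
    # Naive suffix array: suffix start positions in lexicographic order.
    sa = sorted(range(n), key=lambda i: seq[i:])
    best_len, best_start = 0, 0
    for i, j in zip(sa, sa[1:]):
        if (i < split) == (j < split):
            continue  # both suffixes from the same email: not a shared phrase
        # Longest common prefix of two adjacent suffixes.
        k = 0
        while i + k < n and j + k < n and seq[i + k] == seq[j + k]:
            k += 1
        if k > best_len:
            best_len, best_start = k, i
    return words[best_start:best_start + best_len]


print(longest_common_phrase(
    "Hi Bob, your order 123 has shipped. Thanks, Acme",
    "Hi Ann, your order 456 has shipped. Thanks, Acme",
))  # ['has', 'shipped.', 'Thanks,', 'Acme']
```

The words that differ between the two emails (names, order numbers) are the transient parts. The shared runs ("Hi", "your order", "has shipped. Thanks, Acme") are the kind of fixed segments a template would capture.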

Archive | 2014

SYSTEMS AND METHODS FOR ESTIMATING MESSAGE SIMILARITY

Ivo Krka; Itamar Gilad; Karol Kurach; Andrew M. Dai; Liam MacDermed; Peter J. Liu; Balint Miklos; Alexandru Damian

arXiv: Computation and Language | 2017

Efficient Natural Language Response Suggestion for Smart Reply

Matthew Henderson; Rami Al-Rfou; Brian Strope; Yun-hsuan Sung; László Lukács; Ruiqi Guo; Sanjiv Kumar; Balint Miklos; Ray Kurzweil

Archive | 2013

SYSTEMS AND METHODS FOR MESSAGE CATEGORIZATION MANAGEMENT

Itamar Gilad; Greg Bullock; Thompson Alexander Ivor Gawley; Andrew Ward Moedinger; Kevin Smilak; Jeroen Daniël Jillissen; Balint Miklos; Jason Briggs Cornwell

Archive | 2017

METHODS AND APPARATUS FOR DETERMINING NON-TEXTUAL REPLY CONTENT FOR INCLUSION IN A REPLY TO AN ELECTRONIC COMMUNICATION

Balint Miklos; Ijeoma Emeagwali; Phillip Sharp

Archive | 2017

METHODS AND APPARATUS FOR DETERMINING, BASED ON FEATURES OF AN ELECTRONIC COMMUNICATION AND SCHEDULE DATA OF A USER, REPLY CONTENT FOR INCLUSION IN A REPLY BY THE USER TO THE ELECTRONIC COMMUNICATION

Balint Miklos; Ijeoma Emeagwali; Stella Schieffer; Katie Hart; Jonathan Aroner; Phillip Sharp; Jung-won Shin

Archive | 2016

CLASSIFYING DOCUMENTS BY CLUSTER

Mike Bendersky; Jie Yang; Amitabh Saikia; Marc-Allen Cartright; Sujith Ravi; Balint Miklos; Ivo Krka; Vanja Josifovski; James B. Wendt; Luis Garcia Pueyo

Archive | 2016