Balint Miklos
Publications
Featured research published by Balint Miklos.
Knowledge Discovery and Data Mining | 2016
Anjuli Kannan; Karol Kurach; Sujith Ravi; Tobias Kaufmann; Andrew Tomkins; Balint Miklos; Greg Corrado; László Lukács; Marina Ganea; Peter Young; Vivek Ramavajjala
In this paper we propose and investigate a novel end-to-end method for automatically generating short email responses, called Smart Reply. It generates semantically diverse suggestions that can be used as complete email responses with just one tap on mobile. The system is currently used in Inbox by Gmail and is responsible for assisting with 10% of all mobile responses. It is designed to work at very high throughput and process hundreds of millions of messages daily. The system exploits state-of-the-art, large-scale deep learning. We describe the architecture of the system as well as the challenges that we faced while building it, like response diversity and scalability. We also introduce a new method for semantic clustering of user-generated content that requires only a modest amount of explicitly labeled data.
Web Search and Data Mining | 2016
James B. Wendt; Michael Bendersky; Lluis Garcia-Pueyo; Vanja Josifovski; Balint Miklos; Ivo Krka; Amitabh Saikia; Jie Yang; Marc-Allen Cartright; Sujith Ravi
Machine-generated documents such as email or dynamic web pages are single instantiations of a pre-defined structural template. As such, they can be viewed as a hierarchy of template and document-specific content. This hierarchical template representation has several important advantages for document clustering and classification. First, templates capture common topics among the documents, while filtering out potentially noisy variability such as personal information. Second, template representations scale far better than document representations, since a single template captures numerous documents. Finally, since templates group together structurally similar documents, they can propagate properties between all the documents that match the template. In this paper, we use these advantages for document classification by formulating an efficient and effective hierarchical label propagation and discovery algorithm. The labels are propagated first over a template graph (constructed based on either term-based or topic-based similarities), and then to the matching documents. We evaluate the performance of the proposed algorithm using a large donated email corpus and show that the resulting template graph is significantly more compact than the corresponding document graph, and that the hierarchical label propagation is both efficient and effective in increasing the coverage of the baseline document classification algorithm. We demonstrate that the template label propagation achieves more than 91% precision and 93% recall, while increasing the label coverage by more than 11%.
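The two-stage idea in the abstract above (labels spread over a template graph first, then down to the matching documents) can be illustrated with a minimal sketch. This is not the paper's algorithm: the weighted-vote update rule, the function names `propagate_labels` and `label_documents`, the template ids, and the similarity weights are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, rounds=10):
    """Spread labels over a template graph by weighted neighbor vote.

    edges: iterable of (template_a, template_b, similarity) tuples.
    seed_labels: dict mapping some templates to known labels.
    The voting rule is an assumption, not the published method.
    """
    neighbors = defaultdict(list)
    for u, v, weight in edges:
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))

    labels = dict(seed_labels)
    for _ in range(rounds):
        updates = {}
        for node in neighbors:
            if node in labels:
                continue  # already labeled templates are left unchanged
            votes = defaultdict(float)
            for other, weight in neighbors[node]:
                if other in labels:
                    votes[labels[other]] += weight
            if votes:
                updates[node] = max(votes, key=votes.get)
        if not updates:
            break  # nothing new was labeled in this round
        labels.update(updates)
    return labels


def label_documents(doc_to_template, template_labels):
    """Second stage: each document inherits its template's label, if any."""
    return {
        doc: template_labels[template]
        for doc, template in doc_to_template.items()
        if template in template_labels
    }


# Hypothetical template graph with two seeded templates.
edges = [("t1", "t2", 0.9), ("t2", "t3", 0.8), ("t3", "t4", 0.2), ("t4", "t5", 0.7)]
template_labels = propagate_labels(edges, {"t1": "receipt", "t5": "travel"})
doc_labels = label_documents({"d1": "t3", "d2": "t4", "d3": "t9"}, template_labels)
```

Because one template stands in for many documents, the graph being iterated over is much smaller than a document-level graph would be, which is the compactness advantage the abstract describes.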
International World Wide Web Conferences | 2017
Julia Proskurnia; Marc-Allen Cartright; Lluis Garcia-Pueyo; Ivo Krka; James B. Wendt; Tobias Kaufmann; Balint Miklos
Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g. machine-generated HTML emails. However, much less work has been done on performing the same task over unstructured email data. We propose a technique for inducing high-quality templates from plain-text emails at scale based on the suffix array data structure. We evaluate this method against an industry-standard approach for finding similar content based on shingling, running both algorithms over two corpora: a synthetically created email corpus for a high level of experimental control, as well as user-generated emails from the well-known Enron email corpus. Our experimental results show that the proposed method is more robust to variations in cluster quality than the baseline, and that its templates contain more text from the emails, which would benefit extraction tasks by identifying transient parts of the emails. Our study indicates that templates induced using suffix arrays contain approximately half as much noise (measured as entropy) as templates induced using shingling. Furthermore, the suffix array approach is substantially more scalable, proving to be an order of magnitude faster than shingling even for modestly sized training clusters. Public corpus analysis shows that email clusters contain on average 4 segments of common phrases, where each of the segments contains on average 9 words, thus showing that templatization could help users reduce the email writing effort by an average of 35 words per email in an assistance or auto-reply related task.
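To make the suffix array idea in the abstract above concrete, here is a minimal sketch that finds the longest phrase shared by two plain-text emails using a word-level suffix array. It is an illustration only, not the paper's method: the function name `longest_common_phrase`, the sentinel token, the whitespace tokenization, and the naive sort-based construction are all assumptions, and a scalable system would need a faster construction algorithm.

```python
def longest_common_phrase(email_a, email_b):
    """Return the longest run of words that occurs in both emails.

    Builds a naive word-level suffix array over the two emails joined by a
    sentinel, then compares neighboring suffixes that start in different
    emails. Assumes no real word equals the sentinel.
    """
    a, b = email_a.split(), email_b.split()
    sentinel = "\x00"  # assumed never to appear as a word
    tokens = a + [sentinel] + b

    # Naive construction: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(len(tokens)), key=lambda i: tokens[i:])

    best_start, best_len = 0, 0
    for prev, curr in zip(suffix_array, suffix_array[1:]):
        # Only pairs with one suffix from each email can yield a shared phrase.
        if (prev < len(a)) == (curr < len(a)):
            continue
        length = 0
        while (
            prev + length < len(tokens)
            and curr + length < len(tokens)
            and tokens[prev + length] == tokens[curr + length]
        ):
            length += 1
        if length > best_len:
            best_start, best_len = prev, length
    return tokens[best_start:best_start + best_len]


# Hypothetical emails: the shared phrase is a candidate template segment,
# and the words that differ are candidate variable fields.
phrase = longest_common_phrase(
    "Hi Ann your order 123 has shipped today",
    "Hi Bob your order 456 has shipped today",
)
```

Here the shared run "has shipped today" would be kept as fixed template text, while tokens such as the name and order number are the parts that vary between emails.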
Archive | 2014
Ivo Krka; Itamar Gilad; Karol Kurach; Andrew M. Dai; Liam MacDermed; Peter J. Liu; Balint Miklos; Alexandru Damian
arXiv: Computation and Language | 2017
Matthew Henderson; Rami Al-Rfou; Brian Strope; Yun-hsuan Sung; László Lukács; Ruiqi Guo; Sanjiv Kumar; Balint Miklos; Ray Kurzweil
Archive | 2013
Itamar Gilad; Greg Bullock; Thompson Alexander Ivor Gawley; Andrew Ward Moedinger; Kevin Smilak; Jeroen Daniël Jillissen; Balint Miklos; Jason Briggs Cornwell
Archive | 2017
Balint Miklos; Ijeoma Emeagwali; Phillip Sharp
Archive | 2017
Balint Miklos; Ijeoma Emeagwali; Stella Schieffer; Katie Hart; Jonathan Aroner; Phillip Sharp; Jung-won Shin
Archive | 2016
Mike Bendersky; Jie Yang; Amitabh Saikia; Marc-Allen Cartright; Sujith Ravi; Balint Miklos; Ivo Krka; Vanja Josifovski; James B. Wendt; Luis Garcia Pueyo
Archive | 2016
Phillip Sharp; Prabhakar Raghavan; Thompson Alexander Ivor Gawley; Balint Miklos; Karol Kurach; Tobias Kaufmann; Gregory S. Corrado; László Lukács