Publication


Featured research published by Noam Shazeer.


Artificial Intelligence | 2002

A probabilistic approach to solving crossword puzzles

Michael L. Littman; Greg A. Keim; Noam Shazeer

We attacked the problem of solving crossword puzzles by computer: given a set of clues and a crossword grid, try to maximize the number of words correctly filled in. After an analysis of a large collection of puzzles, we decided to use an open architecture in which independent programs specialize in solving specific types of clues, drawing on ideas from information retrieval, database search, and machine learning. Each expert module generates a (possibly empty) candidate list for each clue, and the lists are merged together and placed into the grid by a centralized solver. We used a probabilistic representation as a common interchange language between subsystems and to drive the search for an optimal solution. PROVERB, the complete system, averages 95.3% words correct and 98.1% letters correct in under 15 minutes per puzzle on a sample of 370 puzzles taken from the New York Times and several other puzzle sources. This corresponds to missing roughly 3 words or 4 letters on a daily 15×15 puzzle, making PROVERB a better-than-average cruciverbalist (crossword solver).
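The probabilistic merging step described above is easy to illustrate. The sketch below assumes hypothetical expert modules that each return a dict of scored candidates for one clue; a central routine mixes them, weighted by per-module trust, into a single normalized distribution. The fixed weights and the function merge_candidates are illustrative assumptions, not PROVERB's actual mechanism (the full system also places words into the grid under crossing constraints).

from collections import defaultdict

def merge_candidates(expert_lists, expert_weights):
    # expert_lists: one dict per expert mapping candidate word -> raw score.
    # expert_weights: a trust weight per expert (an assumption here; the
    # paper's solver calibrates module confidences rather than fixing them).
    combined = defaultdict(float)
    for weight, candidates in zip(expert_weights, expert_lists):
        total = sum(candidates.values()) or 1.0
        for word, score in candidates.items():
            combined[word] += weight * score / total  # weighted, normalized mixture
    z = sum(combined.values()) or 1.0
    return {word: p / z for word, p in combined.items()}

# Toy usage: two hypothetical experts propose answers for one 5-letter clue.
dictionary_expert = {"PIANO": 0.6, "ORGAN": 0.4}
database_expert = {"ORGAN": 0.9, "VIOLA": 0.1}
probs = merge_candidates([dictionary_expert, database_expert], [0.5, 0.5])
print(max(probs, key=probs.get))  # -> ORGAN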


Conference of the International Speech Communication Association | 2016

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition

Babak Damavandi; Shankar Kumar; Noam Shazeer; Antoine Bruguier

We present NN-grams, a novel, hybrid language model integrating n-grams and neural networks (NN) for speech recognition. The model takes as input both word histories and n-gram counts. Thus, it combines the memorization capacity and scalability of an n-gram model with the generalization ability of neural networks. We report experiments where the model is trained on 26B words. NN-grams are efficient at run-time since they do not include an output softmax layer. The model is trained using noise contrastive estimation (NCE), an approach that transforms the estimation problem of neural networks into one of binary classification between data samples and noise samples. We present results with noise samples derived from either an n-gram distribution or from speech recognition lattices. NN-grams outperforms an n-gram model on an Italian speech recognition dictation task.
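The NCE objective mentioned in the abstract reduces language-model training to telling data words apart from noise words, which is why no output softmax over the vocabulary is needed. Below is a minimal PyTorch sketch of a generic NCE loss under assumed tensor shapes and a toy uniform noise distribution; the function nce_loss and its signature are illustrative assumptions, not the NN-grams training code.

import math
import torch
import torch.nn.functional as F

def nce_loss(score_data, score_noise, logq_data, logq_noise, k):
    # score_data: (batch,) unnormalized model scores s(w, h) for true next words.
    # score_noise: (batch, k) scores for k noise words drawn per example.
    # logq_data / logq_noise: log-probabilities of those words under the noise
    # distribution q (the paper derives q from an n-gram model or from
    # speech recognition lattices).
    log_k = math.log(k)
    # NCE classifies data vs. noise: P(data | w, h) = sigmoid(s(w, h) - log(k * q(w))).
    data_term = F.logsigmoid(score_data - logq_data - log_k)     # true words -> label 1
    noise_term = F.logsigmoid(logq_noise + log_k - score_noise)  # noise words -> label 0
    return -(data_term.sum() + noise_term.sum()) / score_data.numel()

# Toy usage: random scores, uniform noise distribution over a 1000-word vocabulary.
batch, k, vocab = 4, 8, 1000
logq_d = torch.full((batch,), -math.log(vocab))
logq_n = torch.full((batch, k), -math.log(vocab))
loss = nce_loss(torch.randn(batch), torch.randn(batch, k), logq_d, logq_n, k)
print(loss.item())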


Neural Information Processing Systems | 2017

Attention Is All You Need

Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N. Gomez; Lukasz Kaiser; Illia Polosukhin


Archive | 2003

Serving content-relevant advertisements with client-side device support

Darrell Anderson; Paul Buchheit; Jeffrey A. Dean; Georges R. Harik; Carl Laurence Gonsalves; Noam Shazeer; Narayanan Shivakumar


arXiv: Computation and Language | 2016

Exploring the Limits of Language Modeling

Rafal Jozefowicz; Oriol Vinyals; Mike Schuster; Noam Shazeer; Yonghui Wu


Archive | 2004

Suggesting and/or providing targeting criteria for advertisements

Ross Koningstein; Valentin I. Spitkovsky; Georges R. Harik; Noam Shazeer


Archive | 2003

Method and apparatus for characterizing documents based on clusters of related words

Georges R. Harik; Noam Shazeer


Archive | 2004

Using concepts for ad targeting

Ross Koningstein; Valentin I. Spitkovsky; Georges R. Harik; Noam Shazeer


International Conference on Learning Representations | 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer; Azalia Mirhoseini; Krzysztof Stanislaw Maziarz; Andy Davis; Quoc V. Le; Geoffrey E. Hinton; Jeffrey Dean


Archive | 2003

Ranking documents based on large data sets

Jeremy Bem; Georges R. Harik; Joshua L. Levenberg; Noam Shazeer; Simon Tong
