Umar Syed | Researchain

Publication

Featured research published by Umar Syed.

Methods in Molecular Biology | 2009

Enzyme function prediction with interpretable models.

Umar Syed; Golan Yona

Enzymes play central roles in metabolic pathways, and the prediction of metabolic pathways in newly sequenced genomes usually starts with the assignment of genes to enzymatic reactions. However, genes with similar catalytic activity are not necessarily similar in sequence, and therefore the traditional sequence similarity-based approach often fails to identify the relevant enzymes, thus hindering efforts to map the metabolome of an organism. Here we study the direct relationship between basic protein properties and their function. Our goal is to develop a new tool for functional prediction (e.g., prediction of Enzyme Commission number), which can be used to complement and support other techniques based on sequence or structure information. In order to define this mapping we collected a set of 453 features and properties that characterize proteins and are believed to be related to structural and functional aspects of proteins. We introduce a mixture model of stochastic decision trees to learn the set of potentially complex relationships between features and function. To study these correlations, trees are created and tested on the Pfam classification of proteins, which is based on sequence, and the EC classification, which is based on enzymatic function. The model is very effective in learning highly diverged protein families or families that are not defined on the basis of sequence. The resulting tree structures highlight the properties that are strongly correlated with structural and functional aspects of protein families, and can be used to suggest a concise definition of a protein family.

meeting of the association for computational linguistics | 2008

Using Automatically Transcribed Dialogs to Learn User Models in a Spoken Dialog System

Umar Syed; Jason D. Williams

We use an EM algorithm to learn user models in a spoken dialog system. Our method requires automatically transcribed (with ASR) dialog corpora, plus a model of transcription errors, but does not otherwise need any manual transcription effort. We tested our method on a voice-controlled telephone directory application, and show that our learned models better replicate the true distribution of user actions than those trained by simpler methods and are very similar to user models estimated from manually transcribed dialogs.
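
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, under strong simplifying assumptions: the user model is a single distribution over a few discrete actions (ignoring dialog state), and the transcription error model is a known confusion matrix. The function name `em_user_action_model`, the matrix values, and the counts are all hypothetical.

```python
def em_user_action_model(obs_counts, confusion, iters=500):
    """Estimate P(true user action) from counts of ASR-observed actions.

    confusion[a][o] is the assumed-known probability that true action a
    is transcribed as observed action o (a hypothetical error model).
    """
    k = len(confusion)
    p = [1.0 / k] * k  # start from a uniform guess
    total = float(sum(obs_counts))
    for _ in range(iters):
        expected = [0.0] * k
        for o, count in enumerate(obs_counts):
            if count == 0:
                continue
            # E-step: posterior over the true action given observation o
            joint = [p[a] * confusion[a][o] for a in range(k)]
            norm = sum(joint)
            for a in range(k):
                expected[a] += count * joint[a] / norm
        # M-step: re-estimate the action distribution from expected counts
        p = [e / total for e in expected]
    return p


# Hypothetical data: counts chosen to be consistent with a true
# action distribution of (0.7, 0.2, 0.1) under this confusion matrix.
confusion = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
obs_counts = [590, 240, 170]
estimate = em_user_action_model(obs_counts, confusion)
```

Simply normalizing the observed counts would give (0.59, 0.24, 0.17); the EM estimate instead corrects for the assumed transcription errors.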

algorithmic learning theory | 2015

Learning with Deep Cascades

Giulia DeSalvo; Mehryar Mohri; Umar Syed

We introduce a broad learning model formed by cascades of predictors, Deep Cascades, that is structured as general decision trees in which leaf predictors or node questions may be members of rich function families. We present new data-dependent theoretical guarantees for learning with Deep Cascades with complex leaf predictors and node questions in terms of the Rademacher complexities of the sub-families composing these sets of predictors and the fraction of sample points reaching each leaf that are correctly classified. These guarantees can guide the design of a variety of different algorithms for deep cascade models, and we give a detailed description of two such algorithms. Our second algorithm uses SVM predictors as node and leaf classifiers, and we report the results of experiments comparing its performance with that of SVM combined with polynomial kernels.
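
As a rough illustration of the structure described above (a decision tree whose node questions and leaf predictors are arbitrary functions), here is a minimal sketch. The class names `Node` and `Leaf`, the function `cascade_predict`, and the example tree are hypothetical; this shows prediction only, not either of the paper's learning algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Point = Sequence[float]


@dataclass
class Leaf:
    predictor: Callable[[Point], int]  # leaf predictor from some function family


@dataclass
class Node:
    question: Callable[[Point], bool]  # node question from some function family
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def cascade_predict(tree: Union[Node, Leaf], x: Point) -> int:
    # Route x down the tree by answering node questions,
    # then apply the predictor stored at the leaf it reaches.
    while isinstance(tree, Node):
        tree = tree.if_true if tree.question(x) else tree.if_false
    return tree.predictor(x)


# Hypothetical cascade: one threshold question, one linear leaf, one constant leaf.
tree = Node(
    question=lambda x: x[0] > 0.0,
    if_true=Leaf(lambda x: 1 if x[0] + x[1] > 1.0 else -1),
    if_false=Leaf(lambda x: -1),
)
```

In the second algorithm mentioned in the abstract, both the questions and the leaf predictors would be SVM classifiers rather than the hand-written lambdas used here.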

symposium on cloud computing | 2017

SQML: large-scale in-database machine learning with pure SQL

Umar Syed; Sergei Vassilvitskii

Many enterprises have migrated their data from an on-site database to a cloud-based database-as-a-service that handles all database-related administrative tasks while providing a simple SQL interface to the end user. Businesses are also increasingly relying on machine learning to understand their customers and develop new products. Given these converging trends, there is a pressing need for database-as-a-service providers to add support for sophisticated machine learning algorithms to the core functionality of their products.

economics and computation | 2016

Where to Sell: Simulating Auctions From Learning Algorithms

Hamid Nazerzadeh; Renato Paes Leme; Afshin Rostamizadeh; Umar Syed

Ad exchange platforms connect online publishers and advertisers and facilitate the sale of billions of impressions every day. We study these environments from the perspective of a publisher who wants to find the profit-maximizing exchange in which to sell his inventory. Ideally, the publisher would run an auction among exchanges. However, this is not usually possible due to practical business considerations. Instead, the publisher must send each impression to only one of the exchanges, along with an asking price. We model the problem as a variation of the multi-armed bandits problem in which exchanges (arms) can behave strategically in order to maximize their own profit. We propose mechanisms that find the best exchange with sub-linear regret and have desirable incentive properties.
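
For readers unfamiliar with the bandit framing, the sketch below runs the standard UCB1 algorithm over three exchanges. It is not the paper's mechanism: it assumes non-strategic arms with fixed sale probabilities and ignores asking prices. The function name `ucb1` and the `mean_revenue` values are hypothetical.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1; pull(arm) returns a reward in [0, 1]. Returns per-arm pull counts."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialize its estimate
        else:
            # Pick the arm with the highest optimistic estimate:
            # empirical mean plus a confidence bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts


rng = random.Random(0)
mean_revenue = [0.2, 0.5, 0.8]  # hypothetical per-exchange sale probabilities
counts = ucb1(lambda a: 1.0 if rng.random() < mean_revenue[a] else 0.0, 3, 5000)
```

With non-strategic arms, the number of impressions sent to the weaker exchanges grows only logarithmically with the horizon, which is what sub-linear regret means here. Handling exchanges that respond strategically is the part the paper adds and this sketch omits.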

Social Science Research Network | 2016

Where to Sell: Simulating Auctions from Learning Algorithms

Hamid Nazerzadeh; Renato Paes Leme; Afshin Rostamizadeh; Umar Syed

Ad Exchange platforms connect online publishers and advertisers and facilitate selling billions of impressions every day. We study these environments from the perspective of a publisher who wants to find the profit-maximizing exchange in which to sell his inventory. Ideally, the publisher would run an auction among exchanges. However, this is not possible due to technological and other practical considerations. The publisher needs to send each impression to one of the exchanges with an asking price. We model the problem as a variation of multi-armed bandits where exchanges (arms) can behave strategically in order to maximize their own profit. We propose a mechanism that finds the best exchange with sub-linear regret and has desirable incentive properties.

conference on information and knowledge management | 2015

An Optimal Online Algorithm For Retrieving Heavily Perturbed Statistical Databases In The Low-Dimensional Querying Model

Krzysztof Choromanski; Afshin Rostamizadeh; Umar Syed

We give the first Õ(1/√T)-error online algorithm for reconstructing noisy statistical databases, where T is the number of (online) sample queries received. The algorithm is optimal up to a poly(log(T)) factor in terms of the error and requires only O(log T) memory. It aims to learn a hidden database-vector w* ∈ ℝ^D in order to accurately answer a stream of queries regarding the hidden database, which arrive in an online fashion from some unknown distribution 𝒟. We assume the distribution 𝒟 is defined on the neighborhood of a low-dimensional manifold. The presented algorithm runs in O(dD) time per query, where d is the dimensionality of the query-space. Contrary to the classical setting, there is no separate training set that is used by the algorithm to learn the database; the stream on which the algorithm will be evaluated must also be used to learn the database-vector. The algorithm only has access to a binary oracle O that answers whether a particular linear function of the database-vector plus random noise is larger than a threshold, which is specified by the algorithm. We note that we allow for a significant O(D) amount of noise to be added, while other works focused on the low-noise o(√D) setting. For a stream of T queries our algorithm achieves an average error of Õ(1/√T) by filtering out random noise, adapting threshold values given to the oracle based on its previous answers and, as a consequence, recovering with high precision a projection of the database-vector w* onto the manifold defining the query-space. Our algorithm may also be applied in the adversarial machine learning context to compromise machine learning engines by heavily exploiting the vulnerabilities of systems that output only a binary signal, even in the presence of significant noise.
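
To make the oracle setting concrete, the sketch below recovers a single hidden scalar from noisy yes/no threshold answers by adapting the threshold after each answer. It is a textbook stochastic-approximation (Robbins-Monro style) update, not the paper's algorithm, and it assumes symmetric zero-mean Gaussian noise and a one-dimensional hidden value. The names `make_oracle` and `estimate_by_thresholds` and all parameter values are hypothetical.

```python
import random


def make_oracle(hidden_value, noise_std, rng):
    """Hypothetical binary oracle: answers whether hidden_value + noise > threshold."""
    def oracle(threshold):
        return hidden_value + rng.gauss(0.0, noise_std) > threshold
    return oracle


def estimate_by_thresholds(oracle, n_queries, step=2.0):
    """Adapt the threshold toward the point where 'yes' and 'no' are equally likely.

    With symmetric zero-mean noise, that point is the hidden value itself.
    """
    t = 0.0
    for k in range(1, n_queries + 1):
        answer = 1.0 if oracle(t) else 0.0
        # Raise the threshold after a 'yes', lower it after a 'no',
        # with step sizes that shrink as 1/k.
        t += (step / k) * (answer - 0.5)
    return t


rng = random.Random(0)
oracle = make_oracle(hidden_value=3.0, noise_std=1.0, rng=rng)
estimate = estimate_by_thresholds(oracle, n_queries=20000)
```

The sketch keeps only a constant amount of state (the current threshold and a counter), which loosely mirrors the low-memory, adaptive-threshold flavor of the abstract, but it says nothing about the high-dimensional manifold setting the paper addresses.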

neural information processing systems | 2007