Mehran Sahami | Researchain

Publication

Featured research published by Mehran Sahami.

Conference on Information and Knowledge Management | 1998

Inductive learning algorithms and representations for text categorization

Susan T. Dumais; John Platt; David Heckerman; Mehran Sahami

Text categorization – the assignment of natural language texts to one or more predefined categories based on their content – is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.

International World Wide Web Conferences | 2006

A web-based kernel function for measuring the similarity of short text snippets

Mehran Sahami; Timothy Dharma Heilman

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by leveraging web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function, mathematically analyze some of its properties, and provide examples of its efficacy. We also show the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
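The idea in this abstract can be illustrated with a minimal sketch. It assumes a toy in-memory lookup (`TOY_RESULTS`, hypothetical) in place of a real web search engine, and plain term counts in place of whatever term weighting the paper actually uses. Each snippet is expanded into a normalized centroid of its retrieved texts, and the similarity is the dot product of two such expansions.

```python
import math
from collections import Counter

# Hypothetical stand-in for web search: snippet -> texts "retrieved" for it.
TOY_RESULTS = {
    "ai": ["artificial intelligence research",
           "machine learning and artificial intelligence"],
    "artificial intelligence": ["artificial intelligence research lab",
                                "intelligence in machines"],
    "pizza": ["pizza delivery near me", "italian pizza recipes"],
}

def retrieve(snippet):
    """Return context texts for a snippet (a real system would query a search engine)."""
    return TOY_RESULTS.get(snippet, [])

def unit(vec):
    """Scale a sparse term vector to unit L2 length (empty if the norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def expand(snippet):
    """Represent a snippet by the normalized centroid of its retrieved texts."""
    centroid = Counter()
    for doc in retrieve(snippet):
        for term, weight in unit(Counter(doc.lower().split())).items():
            centroid[term] += weight
    return unit(centroid)

def kernel(a, b):
    """Similarity of two snippets: dot product of their expanded vectors."""
    ea, eb = expand(a), expand(b)
    return sum(w * eb.get(t, 0.0) for t, w in ea.items())
```

In this toy setup, "ai" and "artificial intelligence" share no terms of their own, yet their expansions overlap, so `kernel("ai", "artificial intelligence")` is positive while `kernel("ai", "pizza")` is zero.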

International Conference on Management of Data | 2001

Probe, count, and classify: categorizing hidden web databases

Panagiotis G. Ipeirotis; Luis Gravano; Mehran Sahami

The contents of many valuable web-accessible databases are only accessible through search interfaces and are hence invisible to traditional web “crawlers.” Recent studies have estimated the size of this “hidden web” to be 500 billion pages, while the size of the “crawlable” web is only an estimated two billion pages. Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. In this paper, we introduce a method for automating this classification process by using a small number of query probes. To classify a database, our algorithm does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of our technique over collections of real documents, including over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
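A simplified sketch of classification from match counts alone follows. The probe lists, the `min_share` threshold, and the flat (non-hierarchical) category set are all illustrative assumptions, not the paper's actual probes or decision rule. The only thing the classifier sees is a function that returns how many documents match a query.

```python
# Hypothetical probe queries per category (a real system would derive these
# from a trained document classifier).
PROBES = {
    "Health": ["cancer", "diabetes"],
    "Sports": ["soccer", "tennis"],
    "Computers": ["linux", "compiler"],
}

def classify(match_count, probes=PROBES, min_share=0.4):
    """Assign categories whose probes account for at least min_share of all matches.

    match_count is a callable query -> number of matching documents; no
    documents are ever retrieved or inspected.
    """
    totals = {cat: sum(match_count(q) for q in queries)
              for cat, queries in probes.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(cat for cat, n in totals.items() if n / grand_total >= min_share)

def make_toy_database(docs):
    """Build a toy 'hidden' database that only reports match counts."""
    def match_count(query):
        return sum(query in doc.lower().split() for doc in docs)
    return match_count
```

For example, a toy database whose documents mostly mention "cancer" and "diabetes" would be assigned to "Health", because those probes produce most of the reported matches.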

Communications of the ACM | 2013

Reflections on Stanford's MOOCs

Steve Cooper; Mehran Sahami

New possibilities in online education create new challenges.

Technical Symposium on Computer Science Education | 2012

Modeling how students learn to program

Chris Piech; Mehran Sahami; Daphne Koller; Stephen Cooper; Paulo Blikstein

Despite the potential wealth of educational indicators expressed in a student's approach to homework assignments, how students arrive at their final solution is largely overlooked in university courses. In this paper we present a methodology which uses machine learning techniques to autonomously create a graphical model of how students in an introductory programming course progress through a homework assignment. We subsequently show that this model is predictive of which students will struggle with material presented later in the class.

International Conference on Data Mining | 2005

Adaptive product normalization: using online learning for record linkage in comparison shopping

Mikhail Bilenko; Sugato Basu; Mehran Sahami

The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data. The learned similarity function is subsequently used in clustering to determine which records are co-referent and should be linked. We present an online machine learning method for addressing this problem, where a composite similarity function based on a linear combination of basis functions is learned incrementally. We illustrate the efficacy of this approach on several real-world datasets from an Internet comparison shopping site, and show that our method is able to effectively learn various distance functions for product data with differing characteristics. We also provide experimental results that show the importance of considering multiple performance measures in record linkage evaluation.
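The incremental learning of a composite similarity function can be sketched as below. This is a minimal illustration, not the paper's method: the two basis functions, the record fields (`name`, `price`), the learning rate, and the plain perceptron update are all assumptions chosen for brevity, and the subsequent clustering step is omitted.

```python
def token_jaccard(a, b):
    """Basis function: Jaccard overlap of the name tokens of two records."""
    sa, sb = set(a["name"].lower().split()), set(b["name"].lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def price_closeness(a, b):
    """Basis function: 1 minus the relative price difference."""
    high = max(a["price"], b["price"])
    return 1.0 - abs(a["price"] - b["price"]) / high if high else 1.0

class OnlineLinker:
    """Linear combination of basis similarities, updated one labeled pair at a time."""

    def __init__(self, basis, learning_rate=0.5):
        self.basis = basis
        self.weights = [0.0] * len(basis)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def features(self, a, b):
        return [f(a, b) for f in self.basis]

    def score(self, a, b):
        """Composite similarity; positive means 'predicted to be the same product'."""
        return sum(w * x for w, x in zip(self.weights, self.features(a, b))) + self.bias

    def update(self, a, b, same):
        """Perceptron step: adjust weights only when the current prediction is wrong."""
        y = 1.0 if same else -1.0
        x = self.features(a, b)
        if y * self.score(a, b) <= 0:
            self.weights = [w + self.learning_rate * y * xi
                            for w, xi in zip(self.weights, x)]
            self.bias += self.learning_rate * y
```

Calling `update` on each labeled pair as it arrives from the stream keeps the model current without retraining from scratch; `score` can then be used as the pairwise similarity in whatever linkage or clustering step follows.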

The Journal of the Learning Sciences | 2014

Programming Pluralism: Using Learning Analytics to Detect Patterns in the Learning of Computer Programming

Paulo Blikstein; Marcelo Worsley; Chris Piech; Mehran Sahami; Steven Cooper; Daphne Koller

New high-frequency, automated data collection and analysis algorithms could offer new insights into complex learning processes, especially for tasks in which students have opportunities to generate unique open-ended artifacts such as computer programs. These approaches should be particularly useful because the need for scalable project-based and student-centered learning is growing considerably. In this article, we present studies focused on how students learn computer programming, based on data drawn from 154,000 code snapshots of computer programs under development by approximately 370 students enrolled in an introductory undergraduate programming course. We use methods from machine learning to discover patterns in the data and try to predict final exam grades. We begin with a set of exploratory experiments that use fully automated techniques to investigate how much students change their programming behavior throughout all assignments in the course. The results show that students’ change in programming patterns is only weakly predictive of course performance. We subsequently home in on a single assignment, trying to map students’ learning process and trajectories and automatically identify productive and unproductive (sink) states within these trajectories. Results show that our process-based metric has better predictive power for final exams than the midterm grades. We conclude with recommendations about the use of such methods for assessment, real-time feedback, and course improvement.

International Conference on Image Processing | 2004

Efficient face orientation discrimination

Shumeet Baluja; Mehran Sahami; Henry A. Rowley

The paper presents efficient methods to address the problem of discriminating between five facial orientations. We present the most efficient methods for this task to date, which can accurately discriminate between five facial orientations with approximately 92% accuracy using fewer than 30 pixel comparisons and greater than 99% accuracy using 150 pixel comparisons. We achieve these rates by using a boosting method to select from a large set of extremely simple features. Comparisons to other methods are given.

Pacific Rim International Conference on Artificial Intelligence | 2004

The happy searcher: challenges in Web information retrieval

Mehran Sahami; Vibhu Mittal; Shumeet Baluja; Henry A. Rowley

Search has arguably become the dominant paradigm for finding information on the World Wide Web. In order to build a successful search engine, there are a number of challenges that arise where techniques from artificial intelligence can be used to have a significant impact. In this paper, we explore a number of problems related to finding information on the web and discuss approaches that have been employed in various research programs, including some of those at Google. Specifically, we examine issues such as web graph analysis, statistical methods for inferring meaning in text, and the retrieval and analysis of newsgroup postings, images, and sounds. We show that by leveraging the vast amounts of data on the web, it is possible to successfully address problems in innovative ways that vastly improve on standard, but often data-impoverished, methods. We also present a number of open research problems to help spur further research in these areas.

International Workshop on the Web and Databases | 2000

Automatic Classification of Text Databases Through Query Probing

Panagiotis G. Ipeirotis; Luis Gravano; Mehran Sahami

Many text databases on the web are “hidden” behind search interfaces, and their documents are only accessible through querying. Traditional search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier’s rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial exploratory experiments that show that our approach is promising for automatically characterizing the contents of text databases accessible on the web.
