Himabindu Lakkaraju | Researchain

Publication

Featured research published by Himabindu Lakkaraju.

knowledge discovery and data mining | 2016

Interpretable Decision Sets: A Joint Framework for Description and Prediction

Himabindu Lakkaraju; Stephen H. Bach; Jure Leskovec

One of the most important obstacles to deploying predictive models is the fact that humans do not understand and trust them. Knowing which variables are important in a model's prediction and how they are combined can be very powerful in helping people understand and trust automatic decision-making systems. Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable. We formalize decision set learning through an objective function that simultaneously optimizes accuracy and interpretability of the rules. In particular, our approach learns short, accurate, and non-overlapping rules that cover the whole feature space and pay attention to small but important classes. Moreover, we prove that our objective is a non-monotone submodular function, which we efficiently optimize to find a near-optimal set of rules. Experiments show that interpretable decision sets are as accurate at classification as state-of-the-art machine learning techniques. They are also three times smaller on average than rule-based models learned by other methods. Finally, results of a user study show that people are able to answer multiple-choice questions about the decision boundaries of interpretable decision sets and write descriptions of classes based on them faster and more accurately than with other rule-based models that were designed for interpretability. Overall, our framework provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency.
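To make the idea of "sets of independent if-then rules" concrete, here is a minimal sketch of how a decision set could be represented and applied. It is not the authors' implementation and does not include the submodular learning objective. The `Rule` class, the `predict` function, the example features (`age`, `smoker`), the labels, and the tie-breaking policy (fall back to a default label when no rule fires or when fired rules disagree) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """One independent if-then rule: if condition(record) holds, predict label."""
    description: str
    condition: Callable[[Record], bool]
    label: str


def predict(rules: List[Rule], record: Record, default: str) -> str:
    """Apply every rule independently (order does not matter).

    Assumed policy for this sketch: return the label when all fired rules
    agree; otherwise (no rule fired, or fired rules conflict) return default.
    """
    fired_labels = {rule.label for rule in rules if rule.condition(record)}
    if len(fired_labels) == 1:
        return fired_labels.pop()
    return default


# Hypothetical, hand-written rules; a real decision set would be learned from data.
EXAMPLE_RULES = [
    Rule("age >= 50 and smoker",
         lambda r: r["age"] >= 50 and r["smoker"], "high_risk"),
    Rule("age < 30 and not smoker",
         lambda r: r["age"] < 30 and not r["smoker"], "low_risk"),
]

if __name__ == "__main__":
    for person in ({"age": 62, "smoker": True},
                   {"age": 25, "smoker": False},
                   {"age": 40, "smoker": False}):
        print(person, "->", predict(EXAMPLE_RULES, person, default="unknown"))
```

Because no rule depends on the position of any other, each rule can be read and checked in isolation, which is the property the abstract contrasts with ordered rule lists.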

conference on information and knowledge management | 2011

Attention prediction on social media brand pages

Himabindu Lakkaraju; Jitendra Ajmera

In this paper, we deal with the problem of predicting how much attention a newly submitted post would receive from fellow community members of closed communities in social networking sites. Though the concept of attention is subjective, the number of comments received by a post serves as a very good indicator of it. Unlike previous work, which primarily made use of either content features or network features (friendship links on the network), we exploit both content features and community-level features (for instance, the time of day at which the community is most active) to tackle this problem. Further, we focus on dedicated pages of corporate brands on social media websites and accordingly extract important features from the content and community activity of such brand pages. The attention prediction task finds direct application in the listening, monitoring, and engaging activities of the businesses that have such brand pages. In this paper, we formulate the problem of attention prediction on social media brand pages. We further propose the Attention Prediction (AP) framework, which integrates the various features that influence the attention received by a post using classification- and regression-based approaches. Experimental results on real-world data extracted from some highly active brand pages on Facebook demonstrate the efficacy of the proposed framework.

Quarterly Journal of Economics | 2017

Human Decisions and Machine Predictions

Jon M. Kleinberg; Himabindu Lakkaraju; Jure Leskovec; Jens Ludwig; Sendhil Mullainathan

Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior).

knowledge discovery and data mining | 2015

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes

Himabindu Lakkaraju; Everaldo Aguiar; Carl Shan; David Miller; Nasir Bhanpuri; Rayid Ghani; Kecia L. Addison

Many school districts have developed successful intervention programs to help students graduate high school on time. However, identifying and prioritizing students who need those interventions the most remains challenging. This paper describes a machine learning framework to identify such students, discusses features that are useful for this task, applies several classification algorithms, and evaluates them using metrics important to school administrators. To help test this framework and make it practically useful, we partnered with two U.S. school districts with a combined enrollment of approximately 200,000 students. Together, we designed several evaluation metrics to assess the goodness of machine learning algorithms from an educator's perspective. This paper focuses on students at risk of not finishing high school on time, but our framework lays a strong foundation for future work on other adverse academic outcomes.

learning analytics and knowledge | 2015

Who, when, and why: a machine learning approach to prioritizing students at risk of not graduating high school on time

Everaldo Aguiar; Himabindu Lakkaraju; Nasir Bhanpuri; David Miller; Ben Yuhas; Kecia L. Addison

Several hundred thousand students drop out of high school every year in the United States. Interventions can help those who are falling behind in their educational goals, but given limited resources, such programs must focus on the right students, at the right time, and with the right message. In this paper, we describe an incremental approach that can be used to select and prioritize students who may be at risk of not graduating high school on time, and to suggest what may be the predictors of particular students going off-track. These predictions can then be used to inform targeted interventions for these students, hopefully leading to better outcomes.

international conference on data mining | 2012

Dynamic Multi-relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media

Himabindu Lakkaraju; Indrajit Bhattacharya; Chiranjib Bhattacharyya

We study the problem of analyzing the influence of various factors affecting individual messages posted in social media. The problem is challenging because of various types of influences propagating through the social media network that act simultaneously on any user. Additionally, the topic composition of the influencing factors and the susceptibility of users to these influences evolve over time. This problem has not been studied before, and off-the-shelf models are unsuitable for this purpose. To capture the complex interplay of these various factors, we propose a new non-parametric model called the Dynamic Multi-Relational Chinese Restaurant Process. This accounts for the user network for data generation and also allows the parameters to evolve over time. Designing inference algorithms for this model suited for large-scale social media data is another challenge. To this end, we propose a scalable and multi-threaded inference algorithm based on online Gibbs sampling. Extensive evaluations on large-scale Twitter and Facebook data show that the extracted topics, when applied to authorship and commenting prediction, outperform state-of-the-art baselines. More importantly, our model produces valuable insights on topic trends and user personality trends beyond the capability of existing approaches.
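For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of the plain (static, single-relation) Chinese Restaurant Process only. It is background, not the Dynamic Multi-Relational model proposed in the paper, which adds network structure and time-evolving parameters that are not reproduced here. The function name `sample_crp` and its parameters are assumptions for illustration.

```python
import random
from typing import List


def sample_crp(num_customers: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw table assignments from a standard Chinese Restaurant Process.

    Each new customer joins existing table k with probability proportional
    to the number of customers already seated there, or opens a new table
    with probability proportional to alpha (alpha must be > 0).
    """
    rng = random.Random(seed)
    table_sizes: List[int] = []   # table_sizes[k] = customers at table k
    assignments: List[int] = []   # assignments[i] = table of customer i
    for _ in range(num_customers):
        # Unnormalized weights: one per existing table, plus alpha for a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments


if __name__ == "__main__":
    seating = sample_crp(num_customers=20, alpha=1.0, seed=42)
    print("table assignments:", seating)
    print("number of tables:", len(set(seating)))
```

The number of tables is not fixed in advance and grows with the data, which is what "non-parametric" refers to in this context.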

knowledge discovery and data mining | 2017

The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables

Himabindu Lakkaraju; Jon M. Kleinberg; Jure Leskovec; Jens Ludwig; Sendhil Mullainathan

Evaluating whether machines improve on human performance is one of the central questions of machine learning. However, there are many domains where the data is selectively labeled, in the sense that the observed outcomes are themselves a consequence of the existing choices of the human decision-makers. For instance, in the context of judicial bail decisions, we observe the outcome of whether a defendant fails to return for their court appearance only if the human judge decides to release the defendant on bail. This selective labeling makes it harder to evaluate predictive models as the instances for which outcomes are observed do not represent a random sample of the population. Here we propose a novel framework for evaluating the performance of predictive models on selectively labeled data. We develop an approach called contraction which allows us to compare the performance of predictive models and human decision-makers without resorting to counterfactual inference. Our methodology harnesses the heterogeneity of human decision-makers and facilitates effective evaluation of predictive models even in the presence of unmeasured confounders (unobservables) which influence both human decisions and the resulting outcomes. Experimental results on real world datasets spanning diverse domains such as health care, insurance, and criminal justice demonstrate the utility of our evaluation metric in comparing human decisions and machine predictions.

international world wide web conferences | 2011

Smart news feeds for social networks using scalable joint latent factor models

Himabindu Lakkaraju; Angshu Rai; Srujana Merugu

Social networks such as Facebook and Twitter offer a huge opportunity to tap the collective wisdom (both published and yet to be published) of all the participating users in order to address the information needs of individual users in a highly contextualized fashion using rich user-specific information. Realizing this opportunity, however, requires addressing two key limitations of current social networks: (a) difficulty in discovering relevant content beyond the immediate neighborhood, and (b) lack of support for information filtering based on semantics, content source, and linkage. We propose a scalable framework for constructing smart news feeds based on predicting user-post relevance using multiple signals such as text content and attributes of users and posts, and various user-user, post-post, and user-post relations (e.g., friend, comment, and author relations). Our solution comprises two steps. The first step ensures scalability by selecting a small set of user-post dyads with potentially interesting interactions using inverted feature indexes. The second step models the interactions associated with the selected dyads via a joint latent factor model, which assumes that the user/post content and relationships can be effectively captured by a common latent representation of the users and posts. Experiments on a Facebook dataset using the proposed model lead to improved precision/recall on relevant posts, indicating potential for constructing superior-quality news feeds.

international world wide web conferences | 2012

TEM: a novel perspective to modeling content on microblogs

Himabindu Lakkaraju; Hyung-il Ahn

In recent times, microblogging sites like Facebook and Twitter have gained a lot of popularity. Millions of users worldwide have been using these sites to post content that interests them and also to voice their opinions on several current events. In this paper, we present a novel non-parametric probabilistic model, the Temporally driven Theme Event Model (TEM), for analyzing the content on microblogs. We also describe an online inference procedure for this model that enables its usage on large-scale data. Experimentation carried out on real-world data extracted from Facebook and Twitter demonstrates the efficacy of the proposed approach.

international conference on weblogs and social media | 2013