Rayid Ghani | Researchain

Publication

Featured research published by Rayid Ghani.

Knowledge Discovery and Data Mining | 2015

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes

Himabindu Lakkaraju; Everaldo Aguiar; Carl Shan; David Miller; Nasir Bhanpuri; Rayid Ghani; Kecia L. Addison

Many school districts have developed successful intervention programs to help students graduate high school on time. However, identifying and prioritizing students who need those interventions the most remains challenging. This paper describes a machine learning framework to identify such students, discusses features that are useful for this task, applies several classification algorithms, and evaluates them using metrics important to school administrators. To help test this framework and make it practically useful, we partnered with two U.S. school districts with a combined enrollment of approximately 200,000 students. Together, we designed several evaluation metrics to assess the goodness of machine learning algorithms from an educator's perspective. This paper focuses on students at risk of not finishing high school on time, but our framework lays a strong foundation for future work on other adverse academic outcomes.

International Conference on Data Mining | 2013

Online Active Learning with Imbalanced Classes

Zahra Ferdowsi; Rayid Ghani; Raffaella Settimi

This paper proposes an online algorithm for active learning that switches between different candidate instance selection strategies (ISS) for classification in imbalanced data sets. This is important for two reasons: 1) many real-world problems have imbalanced class distributions and 2) there is no ISS that always outperforms all the other techniques. We first empirically compare the performance of existing techniques on imbalanced data sets and show that different strategies work better on different data sets and that some techniques even hurt performance compared to random selection. We then propose an unsupervised score to track and predict the performance of individual instance selection techniques, allowing us to select an effective technique without using a holdout set and wasting valuable labeled data. This score is used in a simple online learning approach that switches between different ISS at each iteration. The proposed approach performs better than the best individual strategy available to the online algorithm over the data sets in this paper and provides a way to build practical and effective active learning systems for imbalanced data sets.

Knowledge Discovery and Data Mining | 2015

Early Prediction of Cardiac Arrest (Code Blue) using Electronic Medical Records

Sriram Somanchi; Samrachana Adhikari; Allen Lin; Elena Eneva; Rayid Ghani

Code Blue is an emergency code used in hospitals to indicate when a patient goes into cardiac arrest and needs resuscitation. When Code Blue is called, an on-call medical team staffed by physicians and nurses is paged and rushes in to try to save the patient's life. It is an intense, chaotic, and resource-intensive process, and despite the considerable effort, survival rates are still less than 20% [4]. Research indicates that patients actually start showing clinical signs of deterioration some time before going into cardiac arrest [1][2][3], making early prediction, and possibly intervention, feasible. In this paper, we describe our work, in partnership with NorthShore University HealthSystem, that preemptively flags patients who are likely to go into cardiac arrest, using signals extracted from demographic information, hospitalization history, vitals, and laboratory measurements in patient-level electronic medical records. We find that early prediction of Code Blue is possible and that, when compared with the state-of-the-art existing method used by hospitals (MEWS, the Modified Early Warning Score) [4], our methods perform significantly better. Based on these results, this system is now being considered for deployment in hospital settings.

Knowledge Discovery and Data Mining | 2015

Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning

Eric Potash; Joe Brew; Alexander Loewi; Subhabrata Majumdar; Andrew Reece; Joe Walsh; Eric William Davis Rozier; Emile Jorgenson; Raed Mansour; Rayid Ghani

Lead poisoning is a major public health problem that affects hundreds of thousands of children in the United States every year. A common approach to identifying lead hazards is to test all children for elevated blood lead levels and then investigate and remediate the homes of children with elevated tests. This can prevent lead exposure for future residents, but only after a child has been poisoned. This paper describes joint work with the Chicago Department of Public Health (CDPH) in which we build a model that predicts the risk of a child being poisoned so that an intervention can take place before that happens. Using two decades of blood lead level tests, home lead inspections, property value assessments, and census data, our model allows inspectors to prioritize houses on an intractably long list of potential hazards and identify children who are at the highest risk. This work has been described by CDPH as pioneering in the use of machine learning and predictive analytics in public health and has the potential to have a significant impact on both health and economic outcomes for communities across the US.

Criminal Justice Policy Review | 2018

Early Intervention Systems: Predicting Adverse Interactions Between Police and the Public

Jennifer Helsby; Samuel Carton; Kenneth Joseph; Ayesha Mahmud; Youngsoo Park; Andrea Navarrete; Klaus Ackermann; Joe Walsh; Lauren Haynes; Crystal Cody; Major Estella Patterson; Rayid Ghani

Adverse interactions between police and the public hurt police legitimacy, cause harm to both officers and the public, and result in costly litigation. Early intervention systems (EISs) that flag officers considered most likely to be involved in one of these adverse events are an important tool for police supervision and for targeting interventions such as counseling or training. However, the EISs that exist are not data-driven and are based on supervisor intuition. We have developed a data-driven EIS that uses a diverse set of data sources from the Charlotte-Mecklenburg Police Department and machine learning techniques to more accurately predict the officers who will have an adverse event. Our approach is able to significantly improve accuracy compared with their existing EIS: preliminary results indicate a 20% reduction in false positives and a 75% increase in true positives.

Knowledge Discovery and Data Mining | 2016

The Legislative Influence Detector: Finding Text Reuse in State Legislation

Matthew Burgess; Eugenia Giraudy; Julian Katz-Samuels; Joe Walsh; Derek Willis; Lauren Haynes; Rayid Ghani

State legislatures introduce at least 45,000 bills each year. However, we lack a clear understanding of who is actually writing those bills. As legislators often lack the time and staff to draft each bill, they frequently copy text written by other states or interest groups. However, existing approaches to detect text reuse are slow, biased, and incomplete. Journalists or researchers who want to know where a particular bill originated must perform a largely manual search. Watchdog organizations even hire armies of volunteers to monitor legislation for matches. Given the time-consuming nature of the analysis, journalists and researchers tend to limit their analysis to a subset of topics (e.g. abortion or gun control) or a few interest groups. This paper presents the Legislative Influence Detector (LID). LID uses the Smith-Waterman local alignment algorithm to detect sequences of text that occur in both model legislation and state bills. As it is computationally too expensive to run this algorithm on a large corpus of data, we use a search engine built using Elasticsearch to limit the number of comparisons. We show how the system has found 45,405 instances of bill-to-bill text reuse and 14,137 instances of model-legislation-to-bill text reuse. The system reduces the time it takes to find text reuse from the days a manual search requires to seconds.
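
As a rough illustration of the Smith-Waterman local alignment mentioned in the abstract above, the sketch below aligns two whitespace-tokenized passages and returns the best-scoring shared span. It is not LID's implementation: the function name `smith_waterman`, the word-level tokenization, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are all assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of token lists a and b.

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    Scoring values are illustrative assumptions, not LID's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 is what makes the alignment local
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    bill = "x shall not restrict y".split()
    model = "z w shall not restrict".split()
    score, span_a, span_b = smith_waterman(bill, model)
    print(score, " ".join(span_a))  # 6 shall not restrict
```

The dynamic-programming table costs time proportional to the product of the two document lengths, which is consistent with the abstract's point that a search engine is used first to limit how many document pairs are aligned.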

Knowledge Discovery and Data Mining | 2016

Designing Policy Recommendations to Reduce Home Abandonment in Mexico

Klaus Ackermann; Eduardo Blancas Reyes; Sue He; Thomas Anderson Keller; Paul van der Boor; Romana Khan; Rayid Ghani; José Carlos González

Infonavit, the largest provider of mortgages in Mexico, assists working families to obtain low-interest-rate housing solutions. An increasingly prevalent problem is home abandonment: when a homeowner decides to leave their property and forgo their investment. A major causal factor of this outcome is a mismatch between the homeowner's needs, in terms of access to services and employment, and the location characteristics of the home. This paper describes our collaboration with Infonavit to reduce home abandonment at two levels: develop policy recommendations for targeted improvements in location characteristics, and develop a decision-support tool to assist the homeowner in the home location decision. Using 20 years of mortgage history data combined with surveys, census, and location information, we develop a model to predict the probability of home abandonment based on both individual and location characteristics. The model is used to develop a tool that provides Infonavit the ability to give advice to Mexican workers when they apply for a loan, evaluate and improve the locations of new housing developments, and provide data-driven recommendations to the federal government to influence local development initiatives and infrastructure investments. The result is improving economic outcomes for the citizens of Mexico by pre-emptively identifying at-risk home mortgages, thereby allowing them to be altered or remedied before they result in abandonment.

Knowledge Discovery and Data Mining | 2016

Identifying Police Officers at Risk of Adverse Events

Samuel Carton; Jennifer Helsby; Kenneth Joseph; Ayesha Mahmud; Youngsoo Park; Joe Walsh; Crystal Cody; Cpt Estella Patterson; Lauren Haynes; Rayid Ghani

Adverse events between police and the public, such as deadly shootings or instances of racial profiling, can cause serious or deadly harm, damage police legitimacy, and result in costly litigation. Evidence suggests these events can be prevented by targeting interventions based on an Early Intervention System (EIS) that flags police officers who are at a high risk for involvement in such adverse events. Today's EISs are not data-driven and typically rely on simple thresholds based entirely on expert intuition. In this paper, we describe our work with the Charlotte-Mecklenburg Police Department (CMPD) to develop a machine learning model to predict which officers are at risk for an adverse event. Our approach significantly outperforms CMPD's existing EIS, increasing true positives by ~12% and decreasing false positives by ~32%. Our work also sheds light on features related to officer characteristics, situational factors, and neighborhood factors that are predictive of adverse events. This work provides a starting point for police departments to take a comprehensive, data-driven approach to improve policing and reduce harm to both officers and members of the public.

International Conference on Big Data | 2016

Detecting fraud, corruption, and collusion in international development contracts: The design of a proof-of-concept automated system

Emily Grace; Ankit Rai; Elissa M. Redmiles; Rayid Ghani

International development banks provide low-interest loans to developing countries in an effort to stimulate social and economic development. These loans support key infrastructure projects including the building of roads, schools, and hospitals. However, despite the best efforts of development banks, these loan funds are often lost to fraud, corruption, and collusion. In an effort to sanction and deter this wrongdoing and to ensure proper use of funds, development banks conduct extensive, costly investigations that can take over a year to complete. This paper describes a proof-of-concept of a fully automated fraud, corruption, and collusion classification system for identifying risk in international development contracts. We developed this system in conjunction with the World Bank Group — the largest international development bank — to improve the time and cost efficiency of their investigation process. Using historical monetary award data and past investigation outcomes, our classifier assigns a “risk score” to World Bank contracts. This risk score is designed to enable World Bank investigators to identify the contracts most likely to lead to a substantiated investigation. If implemented, our automated system is predicted to successfully identify fraud, corruption, and collusion in 70% of cases.

Knowledge Discovery and Data Mining | 2018

Deploying Machine Learning Models for Public Policy: A Framework

Klaus Ackermann; Joe Walsh; Adolfo De Unánue; Hareem Naveed; Andrea Navarrete Rivera; Sun-Joo Lee; Jason Bennett; Michael Defoe; Crystal Cody; Lauren Haynes; Rayid Ghani

Machine learning research typically focuses on optimization and testing on a few criteria, but deployment in a public policy setting requires more. Technical and non-technical deployment issues get relatively little attention. However, for machine learning models to have real-world benefit and impact, effective deployment is crucial. In this case study, we describe our implementation of a machine learning early intervention system (EIS) for police officers in the Charlotte-Mecklenburg (North Carolina) and Metropolitan Nashville (Tennessee) Police Departments. The EIS identifies officers at high risk of having an adverse incident, such as an unjustified use of force or sustained complaint. We deployed the same code base at both departments, which have different underlying data sources and data structures. Deployment required us to solve several new problems, covering technical implementation, governance of the system, the cost to use the system, and trust in the system. In this paper we describe how we addressed and solved several of these challenges and provide guidance and a framework of important issues to consider for future deployments.
