Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Muhammad Bilal Zafar is active.

Publication


Featured research published by Muhammad Bilal Zafar.


Conference on Recommender Systems | 2014

Inferring user interests in the Twitter social network

Parantapa Bhattacharya; Muhammad Bilal Zafar; Niloy Ganguly; Saptarshi Ghosh; Krishna P. Gummadi

We propose a novel mechanism to infer topics of interest of individual users in the Twitter social network. We observe that in Twitter, a user generally follows experts on various topics of her interest in order to acquire information on those topics. We use a methodology based on social annotations (proposed earlier by us) to first deduce the topical expertise of popular Twitter users, and then transitively infer the interests of the users who follow them. This methodology is a sharp departure from the traditional techniques of inferring interests of a user from the tweets that she posts or receives. We show that the topics of interest inferred by the proposed methodology are far superior to the topics extracted by state-of-the-art techniques such as using topic models (Labeled LDA) on tweets. Based upon the proposed methodology, we build a system, Who Likes What, which can infer the interests of millions of Twitter users. To our knowledge, this is the first system that can infer interests for Twitter users at this scale. Hence, this system would be particularly beneficial in developing personalized recommender services over the Twitter platform.
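
The transitive inference described above can be illustrated with a short, self-contained Python sketch. The `expert_topics` and `follows` dictionaries below are hypothetical stand-ins for the crowdsourced expertise annotations and the follow graph used in the paper; this is a toy sketch of the idea, not the Who Likes What system.

```python
from collections import Counter

# Hypothetical inputs: topics attributed to popular "expert" accounts
# (derived from social annotations in the paper), and the experts each
# ordinary user follows.
expert_topics = {
    "expert_a": {"politics", "economics"},
    "expert_b": {"astrophysics"},
    "expert_c": {"politics"},
}
follows = {
    "user_1": ["expert_a", "expert_c"],
    "user_2": ["expert_b"],
}

def infer_interests(user, top_k=3):
    """Transitively infer a user's interests from the experts they follow."""
    counts = Counter()
    for expert in follows.get(user, []):
        counts.update(expert_topics.get(expert, set()))
    return [topic for topic, _ in counts.most_common(top_k)]

print(infer_interests("user_1"))  # e.g. ['politics', 'economics']
```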


International World Wide Web Conference | 2017

Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment

Muhammad Bilal Zafar; Isabel Valera; Manuel Gomez Rodriguez; Krishna P. Gummadi

Automated data-driven decision making systems are increasingly being used to assist, or even replace, humans in many settings. These systems function by learning from historical decisions, often taken by humans. In order to maximize the utility of these systems (or, classifiers), their training involves minimizing the errors (or, misclassifications) over the given historical data. However, it is quite possible that the optimally trained classifier makes decisions for people belonging to different social groups with different misclassification rates (e.g., misclassification rates for females are higher than for males), thereby placing these groups at an unfair disadvantage. To account for and avoid such unfairness, in this paper, we introduce a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. We then propose intuitive measures of disparate mistreatment for decision boundary-based classifiers, which can be easily incorporated into their formulation as convex-concave constraints. Experiments on synthetic as well as real-world datasets show that our methodology is effective at avoiding disparate mistreatment, often at a small cost in terms of accuracy.
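
The notion itself is straightforward to compute once a classifier's predictions are in hand. Below is a minimal sketch that measures disparate mistreatment as the gaps in group-wise false-positive and false-negative rates; the `mistreatment_gaps` helper and the toy arrays are illustrative, and this is not the paper's convex-concave training formulation.

```python
import numpy as np

def mistreatment_gaps(y_true, y_pred, sensitive):
    """Absolute difference in false-positive and false-negative rates between
    the two groups encoded in `sensitive` (0/1). Zero gaps mean no disparate
    mistreatment in the sense defined above."""
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    gaps = {}
    for name, cond in (("fpr", y_true == 0), ("fnr", y_true == 1)):
        rates = []
        for g in (0, 1):
            mask = cond & (sensitive == g)
            rates.append((y_pred[mask] != y_true[mask]).mean() if mask.any() else 0.0)
        gaps[name + "_gap"] = float(abs(rates[0] - rates[1]))
    return gaps

# Toy example: group 1 suffers far more false negatives than group 0.
print(mistreatment_gaps(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0],
    y_pred=[1, 1, 0, 0, 0, 0, 0, 1],
    sensitive=[0, 0, 1, 1, 0, 0, 1, 1],
))  # {'fpr_gap': 0.5, 'fnr_gap': 1.0}
```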


Conference on Information and Knowledge Management | 2013

On sampling the wisdom of crowds: random vs. expert sampling of the Twitter stream

Saptarshi Ghosh; Muhammad Bilal Zafar; Parantapa Bhattacharya; Naveen Kumar Sharma; Niloy Ganguly; Krishna P. Gummadi

Several applications today rely upon content streams crowd-sourced from online social networks. Since real-time processing of large amounts of data generated on these sites is difficult, analytics companies and researchers are increasingly resorting to sampling. In this paper, we investigate the crucial question of how to sample the data generated by users in social networks. The traditional method is to randomly sample all the data. We analyze a different sampling methodology, where content is gathered only from a relatively small subset (<1%) of the user population, namely the expert users. Over the duration of a month, we gathered tweets from over 500,000 Twitter users who are identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% randomly sampled tweets provided publicly by Twitter. We compared the sampled datasets along several dimensions, including the diversity, timeliness, and trustworthiness of the information contained within them, and found important differences between the datasets. Our observations have major implications for applications such as topical search, trustworthy content recommendations, and breaking news detection.


Conference on Computer Supported Cooperative Work | 2014

Deep Twitter diving: exploring topical groups in microblogs at scale

Parantapa Bhattacharya; Saptarshi Ghosh; Juhi Kulshrestha; Mainack Mondal; Muhammad Bilal Zafar; Niloy Ganguly; Krishna P. Gummadi

We present a semantic methodology to identify topical groups in Twitter on a large number of topics, each consisting of users who are experts on or interested in a specific topic. Early studies investigating the nature of Twitter suggest that it is a social media platform consisting of a relatively small section of elite users, producing information on a few popular topics such as media, politics, and music, and the general population consuming it. We show that this characterization ignores a rich set of highly specialized topics, ranging from geology and neurology to astrophysics and karate, each discussed by its own topical group. We present a detailed characterization of these topical groups based on their network structures and tweeting behaviors. Analyzing these groups against the backdrop of the common identity and bond theory in the social sciences shows that these groups exhibit characteristics of topical-identity based groups, rather than social-bond based ones.


ACM Transactions on the Web | 2015

Sampling Content from Online Social Networks: Comparing Random vs. Expert Sampling of the Twitter Stream

Muhammad Bilal Zafar; Parantapa Bhattacharya; Niloy Ganguly; Krishna P. Gummadi; Saptarshi Ghosh

Analysis of content streams gathered from social networking sites such as Twitter has several applications, ranging from content search, recommendation, and news detection to business analytics. However, processing large amounts of data generated on these sites in real-time poses a difficult challenge. To cope with the data deluge, analytics companies and researchers are increasingly resorting to sampling. In this article, we investigate the crucial question of how to sample content streams generated by users in online social networks. The traditional method is to randomly sample all the data. For example, most studies using Twitter data today rely on the 1% and 10% randomly sampled streams of tweets that are provided by Twitter. We analyze a different sampling methodology, one where content is gathered only from a relatively small sample (<1%) of the user population, namely, the expert users. Over the duration of a month, we gathered tweets from over 500,000 Twitter users who are identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% randomly sampled tweets provided publicly by Twitter. We compared the sampled datasets along several dimensions, including the popularity, topical diversity, trustworthiness, and timeliness of the information contained within them, and the sentiment/opinion expressed on specific topics. Our analysis reveals several important differences in data obtained through the different sampling methodologies, which have serious implications for applications such as topical search, trustworthy content recommendations, breaking news detection, and opinion mining.
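
As a toy illustration of the kind of comparison performed in this study, the sketch below contrasts two tweet samples on one simple diversity proxy, the Shannon entropy of their hashtag distributions. The tweet data and the `hashtag_entropy` helper are hypothetical and much simpler than the metrics used in the article.

```python
import math
from collections import Counter

def hashtag_entropy(tweets):
    """Shannon entropy of the hashtag distribution: one crude proxy for
    the topical diversity of a tweet sample."""
    counts = Counter(tag for t in tweets for tag in t.get("hashtags", []))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical samples; in the article these are the 1% random stream and
# tweets collected from ~500,000 expert accounts over a month.
random_sample = [{"hashtags": ["news"]}, {"hashtags": ["news", "lol"]}]
expert_sample = [{"hashtags": ["geology"]}, {"hashtags": ["neurology", "policy"]}]
print(hashtag_entropy(random_sample), hashtag_entropy(expert_sample))
```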


Conference on Online Social Networks | 2015

Strength in Numbers: Robust Tamper Detection in Crowd Computations

Bimal Viswanath; Muhammad Ahmad Bashir; Muhammad Bilal Zafar; Simon Bouget; Saikat Guha; Krishna P. Gummadi; Aniket Kate; Alan Mislove

Popular social and e-commerce sites increasingly rely on crowd computing to rate and rank content, users, products and businesses. Today, attackers who create fake (Sybil) identities can easily tamper with these computations. Existing defenses that largely focus on detecting individual Sybil identities have a fundamental limitation: Adaptive attackers can create hard-to-detect Sybil identities to tamper with arbitrary crowd computations. In this paper, we propose Stamper, an approach for detecting tampered crowd computations that significantly raises the bar for evasion by adaptive attackers. Stamper's design is based on two key insights: First, Sybil attack detection gains strength in numbers: we propose statistical analysis techniques that can determine if a large crowd computation has been tampered with by Sybils, even when it is fundamentally hard to infer which of the participating identities are Sybil. Second, Sybil identities cannot forge the timestamps of their activities as they are recorded by system operators; Stamper analyzes these unforgeable timestamps to foil adaptive attackers. We applied Stamper to detect tampered computations in Yelp and Twitter. We not only detected previously known tampered computations with high accuracy, but also uncovered tens of thousands of previously unknown tampered computations in these systems.
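
The second insight lends itself to a small sketch: if account-creation timestamps are available for the participants of a suspect crowd computation and for a reference population, a distributional test can flag suspicious bursts. The `looks_tampered` helper and the synthetic data are illustrative assumptions; a two-sample Kolmogorov-Smirnov test stands in for Stamper's actual statistical machinery.

```python
import numpy as np
from scipy.stats import ks_2samp

def looks_tampered(crowd_join_days, reference_join_days, alpha=0.01):
    """Flag a crowd computation whose participants' account-creation times
    deviate significantly from a reference population."""
    stat, p_value = ks_2samp(crowd_join_days, reference_join_days)
    return p_value < alpha, stat

# Synthetic data: Sybil-heavy crowds often show a burst of recently created
# accounts, while organic crowds span years of join dates.
rng = np.random.default_rng(0)
organic_crowd = rng.uniform(0, 3000, size=500)     # join days spread over ~8 years
bursty_crowd = rng.uniform(2900, 3000, size=500)   # accounts created within ~3 months
print(looks_tampered(bursty_crowd, organic_crowd))  # (True, ...)
```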


Privacy Enhancing Technologies | 2016

Listening to Whispers of Ripple: Linking Wallets and Deanonymizing Transactions in the Ripple Network

Pedro Moreno-Sanchez; Muhammad Bilal Zafar; Aniket Kate

The decentralized I owe you (IOU) transaction network Ripple is gaining prominence as a fast, low-cost and efficient method for performing same- and cross-currency payments. Ripple keeps track of IOU credit its users have granted to their business partners or friends, and settles transactions between two connected Ripple wallets by appropriately changing credit values on the connecting paths. Similar to cryptocurrencies such as Bitcoin, while the ownership of the wallets is implicitly pseudonymous in Ripple, IOU credit links and transaction flows between wallets are publicly available in an online ledger. In this paper, we present the first thorough study that analyzes this globally visible log and characterizes the privacy issues with the current Ripple network. In particular, we define two novel heuristics and perform heuristic clustering to group wallets based on observations of the Ripple network graph. We then propose reidentification mechanisms to deanonymize the operators of those clusters and show how to reconstruct the financial activities of deanonymized Ripple wallets. Our analysis motivates the need for better privacy-preserving payment mechanisms for Ripple and characterizes the privacy challenges faced by the emerging credit networks.
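
The clustering step can be pictured with a small union-find sketch: wallet pairs that some linking heuristic deems related are merged into clusters. The wallet names and `linked_pairs` below are purely hypothetical; the paper's actual heuristics are defined on the Ripple IOU graph and are considerably more involved.

```python
# Minimal union-find over hypothetically linked wallet pairs.
parent = {}

def find(w):
    parent.setdefault(w, w)
    while parent[w] != w:
        parent[w] = parent[parent[w]]  # path compression
        w = parent[w]
    return w

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

# Output of a hypothetical linking heuristic: pairs of wallets believed
# to be operated by the same entity.
linked_pairs = [("rW1", "rW2"), ("rW2", "rW3"), ("rW9", "rW10")]
for a, b in linked_pairs:
    union(a, b)

clusters = {}
for w in parent:
    clusters.setdefault(find(w), []).append(w)
print(list(clusters.values()))  # [['rW1', 'rW2', 'rW3'], ['rW9', 'rW10']]
```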


Conference on Computer Supported Cooperative Work | 2016

On the Wisdom of Experts vs. Crowds: Discovering Trustworthy Topical News in Microblogs

Muhammad Bilal Zafar; Parantapa Bhattacharya; Niloy Ganguly; Saptarshi Ghosh; Krishna P. Gummadi

Extracting news on specific topics from the Twitter microblogging site poses formidable challenges, which include handling millions of tweets posted daily, judging topicality and importance of tweets, and ensuring trustworthiness of results in the face of spam. To date, all scalable approaches have relied on crowd wisdom, i.e., keyword-matching on the global tweet stream to gather relevant tweets, and crowd endorsements to judge the importance of tweets. We propose a fundamentally different methodology: for a given topic, we identify trustworthy experts on the topic, and extract news-stories that are most popular among the experts. Comparing the crowd-based and expert-based methodologies, we demonstrate that the news-stories obtained by our methodology (i) have higher relevance for a wide variety of topics, (ii) achieve very high coverage of important news-stories posted globally in Twitter, and (iii) are far more trustworthy. Using our methodology, we implemented and publicly deployed a topical news system for Twitter, which can extract news-stories on thousands of topics.
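
A minimal sketch of the expert-endorsement idea, assuming we already know which accounts are experts on a topic and which story URLs each of them posted; the data and the `top_expert_stories` helper are hypothetical and ignore spam filtering, topicality scoring, and everything else a deployed system needs.

```python
from collections import Counter

# Hypothetical data: story URLs posted by accounts identified as topic experts.
expert_posts = {
    "expert_a": ["url1", "url2"],
    "expert_b": ["url1"],
    "expert_c": ["url1", "url3"],
}

def top_expert_stories(posts_by_expert, k=2):
    """Rank stories by the number of distinct experts who posted them."""
    endorsements = Counter()
    for urls in posts_by_expert.values():
        for url in set(urls):  # count each expert at most once per story
            endorsements[url] += 1
    return endorsements.most_common(k)

print(top_expert_stories(expert_posts))  # url1 leads with 3 expert endorsements
```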


Knowledge Discovery and Data Mining | 2018

A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices

Till Speicher; Hoda Heidari; Nina Grgić-Hlača; Krishna P. Gummadi; Adish Singla; Adrian Weller; Muhammad Bilal Zafar

Discrimination via algorithmic decision making has received considerable attention. Prior work largely focuses on defining conditions for fairness, but does not define satisfactory measures of algorithmic unfairness. In this paper, we focus on the following question: Given two unfair algorithms, how should we determine which of the two is more unfair? Our core idea is to use existing inequality indices from economics to measure how unequally the outcomes of an algorithm benefit different individuals or groups in a population. Our work offers a justified and general framework to compare and contrast the (un)fairness of algorithmic predictors. This unifying approach enables us to quantify unfairness both at the individual and the group level. Further, our work reveals overlooked tradeoffs between different fairness notions: using our proposed measures, the overall individual-level unfairness of an algorithm can be decomposed into a between-group and a within-group component. Earlier methods are typically designed to tackle only between-group unfairness, which may be justified for legal or other reasons. However, we demonstrate that minimizing exclusively the between-group component may, in fact, increase the within-group, and hence the overall unfairness. We characterize and illustrate the tradeoffs between our measures of (un)fairness and the prediction accuracy.
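
The inequality-index machinery is concrete enough to sketch. Below, a generalized entropy index (with alpha = 2) measures overall unfairness over per-individual benefits and is decomposed exactly into a between-group and a within-group term. The benefit values are toy numbers and the helpers are illustrative; the paper's exact benefit definition and index choices may differ.

```python
import numpy as np

def generalized_entropy(b, alpha=2):
    """Generalized entropy index of individual benefits b_i (alpha not in {0, 1})."""
    b = np.asarray(b, dtype=float)
    return np.mean((b / b.mean()) ** alpha - 1) / (alpha * (alpha - 1))

def decompose(b, groups, alpha=2):
    """Split overall unfairness into between-group and within-group parts."""
    b, groups = np.asarray(b, dtype=float), np.asarray(groups)
    mu, n = b.mean(), len(b)
    # Between-group term: every individual is assigned their group's mean benefit.
    between = generalized_entropy([b[groups == g].mean() for g in groups], alpha)
    within = sum(
        (np.sum(groups == g) / n) * (b[groups == g].mean() / mu) ** alpha
        * generalized_entropy(b[groups == g], alpha)
        for g in np.unique(groups)
    )
    return between, within

# Toy benefits for six individuals in two groups.
benefits = [2, 1, 1, 0, 1, 1]
groups = [0, 0, 0, 1, 1, 1]
total = generalized_entropy(benefits)
between, within = decompose(benefits, groups)
print(round(total, 4), round(between + within, 4))  # the two values coincide
```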


Information Retrieval Journal | 2018

Search bias quantification: investigating political bias in social media and web search

Juhi Kulshrestha; Motahhare Eslami; Johnnatan Messias; Muhammad Bilal Zafar; Saptarshi Ghosh; Krishna P. Gummadi

Users frequently use search systems on the Web as well as online social media to learn about ongoing events and public opinion on personalities. Prior studies have shown that the top-ranked results returned by these search engines can shape user opinion about the topic (e.g., event or person) being searched. In the case of polarizing topics like politics, where multiple competing perspectives exist, the political bias in the top search results can play a significant role in shaping public opinion towards (or away from) certain perspectives. Given the considerable impact that search bias can have on the user, we propose a generalizable search bias quantification framework that not only measures the political bias in the ranked list output by the search system but also decouples the bias introduced by its two different sources: the input data and the ranking system. We apply our framework to study the political bias in searches related to the 2016 US presidential primaries in Twitter social media search and find that both the input data and the ranking system matter in determining the final search output bias seen by the users. Finally, we use the framework to compare the relative bias of two popular search systems, Twitter social media search and Google web search, for queries related to politicians and political events. We end by discussing some potential solutions for signaling the bias in search results to make users more aware of it.
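
To make the decoupling idea concrete, here is an illustrative sketch in which each result carries a political-lean score in [-1, 1]: the input bias is the mean lean of the candidate pool, the output bias is a rank-weighted mean of the returned list, and the difference is attributed to the ranking system. The scores, the DCG-style weighting, and the `rank_weighted_bias` helper are assumptions for illustration, not the paper's actual measures.

```python
import numpy as np

def rank_weighted_bias(lean_scores):
    """Rank-weighted average political lean of a ranked result list
    (scores in [-1, 1]); higher-ranked results get more weight."""
    lean = np.asarray(lean_scores, dtype=float)
    weights = 1.0 / np.log2(np.arange(2, len(lean) + 2))  # DCG-style discount
    return float(np.sum(weights * lean) / np.sum(weights))

# Hypothetical lean scores for the candidate pool (input data) and for the
# top-10 results returned by the ranker.
input_pool = [-0.8, -0.2, 0.0, 0.1, 0.3, 0.5, -0.4, 0.6, 0.2, -0.1]
ranked_top = [0.6, 0.5, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2, -0.4, -0.8]

input_bias = float(np.mean(input_pool))
output_bias = rank_weighted_bias(ranked_top)
ranking_bias = output_bias - input_bias  # bias attributable to the ranker
print(f"input={input_bias:+.2f} output={output_bias:+.2f} ranking={ranking_bias:+.2f}")
```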

Collaboration


Dive into Muhammad Bilal Zafar's collaborations.

Top Co-Authors

Saptarshi Ghosh

Indian Institute of Technology Kharagpur

Niloy Ganguly

Indian Institute of Technology Kharagpur

Parantapa Bhattacharya

Indian Institute of Technology Kharagpur