Manoj Apte
Tata Consultancy Services
Publication
Featured research published by Manoj Apte.
Data Mining and Knowledge Discovery | 2008
Girish Keshav Palshikar; Manoj Apte
Many malpractices in stock market trading (e.g., circular trading and price manipulation) use the modus operandi of collusion. Informally, a set of traders is a candidate collusion set when they have “heavy trading” among themselves, as compared to their trading with others. We formalize the problem of detection of collusion sets, if any, in the given trading database. We show that naïve approaches are inefficient for real-life situations. We adapt and apply two well-known graph clustering algorithms for this problem. We also propose a new graph clustering algorithm, specifically tailored for detecting collusion sets. A novel feature of our approach is the use of Dempster–Shafer theory of evidence to combine the candidate collusion sets detected by individual algorithms. Treating individual experiments as evidence, this approach allows us to quantify the confidence (or belief) in the candidate collusion sets. We present detailed simulation experiments to demonstrate the effectiveness of the proposed algorithms.
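As an illustration only, and not the paper's implementation, the sketch below shows Dempster's rule of combination, the standard way to merge two bodies of evidence in Dempster–Shafer theory. The two-element frame ("collusive" vs. "honest"), the mass values, and the name `combine_masses` are assumptions made for this example; the abstract does not say how candidate collusion sets are turned into mass functions.

```python
def combine_masses(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses in each function are expected to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            common = b & c
            if common:
                combined[common] = combined.get(common, 0.0) + mass_b * mass_c
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical evidence from two detection algorithms about one trader set.
COLLUSIVE = frozenset({"collusive"})
FRAME = frozenset({"collusive", "honest"})  # "don't know"

algo_a = {COLLUSIVE: 0.6, FRAME: 0.4}
algo_b = {COLLUSIVE: 0.7, FRAME: 0.3}

belief = combine_masses(algo_a, algo_b)
print(round(belief[COLLUSIVE], 2))  # 0.88
```

Two sources that each give only moderate support to the same hypothesis produce a higher combined mass, which is the sense in which independent experiments can be treated as accumulating evidence.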
Data and Knowledge Engineering | 2007
Girish Keshav Palshikar; Mandar S. Kale; Manoj Apte
A well-known problem that limits the practical usage of association rule mining algorithms is the extremely large number of rules generated. Such a large number of rules makes the algorithms inefficient and makes it difficult for the end users to comprehend the discovered rules. We present the concept of a heavy itemset. An itemset A is heavy (for given support and confidence values) if all possible association rules made up of items only in A are present. We prove a simple necessary and sufficient condition for an itemset to be heavy. We present a formula for the number of possible rules for a given heavy itemset, and show that a heavy itemset compactly represents an exponential number of association rules. Along with two simple search algorithms, we present an efficient greedy algorithm to generate a collection of disjoint heavy itemsets in a given transaction database. We then present a modified apriori algorithm that starts with a given collection of disjoint heavy itemsets and discovers more heavy itemsets, not necessarily disjoint with the given ones.
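The definition of a heavy itemset can be checked directly by brute force, as in the minimal sketch below. It assumes that "all possible association rules made up of items only in A" means every rule X → Y with X and Y non-empty, disjoint subsets of A, and that a rule is "present" when it meets both the support and the confidence threshold. The function names are hypothetical, and this is neither the paper's necessary-and-sufficient condition nor any of its algorithms.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def rules_within(itemset):
    """Yield every rule (lhs, rhs) with lhs, rhs non-empty and disjoint,
    using only items from `itemset`."""
    items = sorted(itemset)
    for size in range(2, len(items) + 1):
        for union in combinations(items, size):
            for lhs_size in range(1, size):
                for lhs in combinations(union, lhs_size):
                    rhs = tuple(i for i in union if i not in lhs)
                    yield lhs, rhs


def is_heavy(itemset, transactions, min_support, min_confidence):
    """Brute-force check: every rule inside `itemset` must be present."""
    for lhs, rhs in rules_within(itemset):
        rule_support = support(lhs + rhs, transactions)
        lhs_support = support(lhs, transactions)
        if rule_support < min_support:
            return False
        if lhs_support == 0 or rule_support / lhs_support < min_confidence:
            return False
    return True


# Toy transaction database (assumed data, for illustration only).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"d"}]

print(len(list(rules_within({"a", "b", "c"}))))            # 12
print(is_heavy({"a", "b", "c"}, transactions, 0.5, 0.6))   # True
print(is_heavy({"a", "b", "c"}, transactions, 0.5, 0.7))   # False
```

Under this reading, a three-item set already stands for 12 rules, and the count grows exponentially with the size of the set, which is why a single heavy itemset is a compact summary.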
Applications of Natural Language to Data Bases | 2016
Nitin Ramrakhiyani; Sachin Pawar; Girish Keshav Palshikar; Manoj Apte
Performance appraisal (PA) is an important Human Resources exercise conducted by most organizations. The text data generated during the PA process can be a source of valuable insights for management. As a new application area, analysis of a large PA dataset (100K sentences) of supervisor feedback text is carried out. As the first contribution, the paper redefines the notion of an aspect in the feedback text. Aspects in PA text are like activities characterized by verb-noun pairs. These activities vary dynamically from employee to employee (e.g. conduct training, improve coding) and can be more challenging to identify than the static properties of products like a camera (e.g. price, battery life). Another important contribution of the paper is a novel enhancement to the Label Propagation (LP) algorithm to identify aspects from PA text. It involves induction of a prior distribution for each node and iterative identification of new aspects starting from a seed set. Evaluation using a manually labelled set of 500 verb-noun pairs suggests an improvement over multiple baselines.
Emerging Trends in ICT Security | 2014
Girish Keshav Palshikar; Manoj Apte
Money laundering (ML) is a serious problem for economies and financial institutions around the world. Financial institutions get used by organized criminals and terrorists as vehicles of large-scale money laundering, which presents the institutions with challenges of regulatory compliance, maintaining financial security, preserving goodwill and reputation, and avoiding operational risks like liquidity crunch and lawsuits. Hence prevention, detection, and control of ML are crucial for the financial security and risk management of financial institutions. Realizing the gravity of ML, various nations have started anti-ML (AML) activities, along with cooperative international efforts, including the Financial Action Task Force, Egmont Group and Wolfsberg Group. This chapter begins with an overview of ML and then discusses commonly used methods of ML and the anti-ML efforts worldwide. After surveying some analytics techniques used to estimate the extent of ML, the chapter surveys some data-mining techniques reported in the literature for detection of ML episodes (instances).
Information Systems Frontiers | 2018
Girish Keshav Palshikar; Manoj Apte; Deepak Pandita
Social media has quickly established itself as an important means that people, NGOs and governments use to spread information during natural or man-made disasters, mass emergencies and crisis situations. Given this important role, real-time analysis of social media contents to locate, organize and use valuable information for disaster management is crucial. In this paper, we propose self-learning algorithms that, with minimal supervision, construct a simple bag-of-words model of information expressed in the news about various natural disasters. Such a model is human-understandable, human-modifiable and usable in a real-time scenario. Since tweets are a different category of documents than news, we next propose a model transfer algorithm, which essentially refines the model learned from news by analyzing a large unlabeled corpus of tweets. We show empirically that model transfer improves the predictive accuracy of the model. We demonstrate empirically that our model learning algorithm is better than several state-of-the-art semi-supervised learning algorithms. Finally, we present an online algorithm that learns the weights for words in the model and demonstrate the efficacy of the model with word weights.
Archive | 2019
Manoj Apte; Girish Keshav Palshikar; Sriram Baskaran
With the widespread use of computers, communications infrastructure, and the Internet, online social networks (OSN) have gained huge popularity in recent years. Unfortunately, the very nature and popularity of OSN have brought about their own share of frauds and misuse. Frauds in OSN refer to activities that result in harassment, loss of money, loss of reputation of a person or an entity, loss of trust in the system or an individual, etc. Due to the complex structure and information flow in OSN, as well as the relative anonymity of the identity, detection, control and prevention of frauds in OSN is difficult, time-consuming, error-prone and demands an unusually high level of technical finesse from the investigators. In this paper, we begin with a simple typology of OSN frauds and then follow up by describing in detail the nature of each fraud and by reviewing some of the state-of-the-art research done so far (mostly in machine learning, data mining, and text mining) to detect them. Where possible, we stress the scale and impact of these frauds. We identify manipulation of identities and diffusion of misinformation as two important aspects in the modus operandi of most types of OSN frauds.
International Conference Data Science and Management | 2018
Devendra Kumar Luna; Girish Keshav Palshikar; Manoj Apte; Arnab Bhattacharya
Money laundering refers to activities pertaining to hiding the true income, evading taxes, or converting illegally earned money for normal use. These activities are often performed through shell companies that masquerade as real companies but whose actual purpose is to launder money. Shell companies are used in all three phases of money laundering, namely, placement, layering, and integration, often simultaneously. In this paper, we aim to identify shell companies. We propose to use only bank transactions since they are easily available. In particular, we look at all incoming and outgoing transactions from a particular bank account along with its various attributes, and use anomaly detection techniques to identify the accounts that pertain to shell companies. Our aim is to create an initial list of potential shell company candidates which can be investigated by financial experts later. Due to the lack of real data, we propose a banking transactions simulator (BTS) to simulate both honest and shell company transactions by studying a host of actual real-world fraud cases. We apply anomaly detection algorithms to detect candidate shell companies. Results indicate that we are able to identify the shell companies with a high degree of precision and recall.
IEEE International Conference on Data Science and Advanced Analytics | 2017
Girish Keshav Palshikar; Manoj Apte; Sachin Pawar; Nitin Ramrakhiyani
Performance appraisal (PA) is a crucial HR process that enables an organization to periodically measure and evaluate every employee’s performance and also to drive performance improvements. In this paper, we describe a novel system called HiSPEED to analyze PA data using automated statistical, data mining and text mining techniques, to generate novel and actionable insights/patterns and to help in improving the quality and effectiveness of the PA process. The goal is to produce insights that can be used to answer (in part) the crucial “business questions” that HR executives and business leadership face in talent management. The business questions pertain to (1) improving the quality of the goal setting process, (2) improving the quality of the self-appraisal comments and supervisor feedback comments, (3) discovering high-quality supervisor suggestions for performance improvements, (4) discovering evidence provided by employees to support their self-assessments, (5) measuring the quality of supervisor assessments, (6) understanding the root causes of poor and exceptional performances, (7) detecting instances of personal and systemic biases and so forth. The paper discusses specially designed algorithms to answer these business questions and illustrates them by reporting the insights produced on a real-life PA dataset from a large multinational IT services organization.
Bangalore Annual Compute Conference | 2014
Sriram Baskaran; Manoj Apte
Visualizing raw data as meaningful information is necessary for a proper understanding of the concepts at hand. Many real-life scenarios require visualization of data as a statistical chart. Creating such charts with dynamic data brings challenges such as impact at different levels in the code, redundancy of code, and server restarts every time a new build is released. In this paper, we study the issues with the conventional way of development and propose a framework for charts in which specifications are given in a separate eXtensible Markup Language (XML) file, called the Single File Specification System (SFS), and the creation of charts and tables is automated. We present RAPIDCharts, a framework that reduces the impact of changing or adding charts, the redundancy of code, and server restarts by using an external file where the specifications are stated.
Conference on Management of Data | 2016
Manoj Apte; Sachin Pawar; Sangameshwar Patil; Sriram Baskaran; Apoorv Shrivastava; Girish Keshav Palshikar