Publication


Featured research published by Vijil Chenthamarakshan.


Conference on Information and Knowledge Management | 2010

PROSPECT: a system for screening candidates for recruitment

Amit Singh; Catherine Rose; Karthik Visweswariah; Vijil Chenthamarakshan; Nandakishore Kambhatla

Companies often receive thousands of resumes for each job posting and employ dedicated screeners to shortlist qualified applicants. In this paper, we present PROSPECT, a decision support tool to help these screeners shortlist resumes efficiently. PROSPECT mines resumes to extract salient aspects of candidate profiles, such as skills, experience in each skill, education details, and past experience. The extracted information is presented in the form of facets to aid recruiters in the task of screening. We also employ Information Retrieval techniques to rank all applicants for a given job opening. In our experiments, we show that the extracted information improves ranking by 30%, thereby making the screening task simpler and more efficient.
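
As a rough illustration of the retrieval step, the sketch below ranks candidates by TF-IDF similarity between a job posting and text assembled from extracted facets. It assumes facet extraction has already happened; the facet fields and example data are hypothetical, and the real system's ranking model is more elaborate.

```python
# A minimal sketch of IR-style candidate ranking, not the PROSPECT system
# itself: resumes are assumed to be already mined into facet strings, and the
# facet fields below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resumes = [
    {"skills": "java spring sql", "education": "BS computer science"},
    {"skills": "python machine learning nlp", "education": "MS statistics"},
]
job_posting = "python developer with machine learning experience"

# Concatenate the extracted facets into one searchable document per candidate.
docs = [" ".join(r.values()) for r in resumes]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform([job_posting])

# Rank candidates by cosine similarity between the posting and the facet text.
scores = cosine_similarity(query_vec, doc_matrix).ravel()
for idx in scores.argsort()[::-1]:
    print(idx, round(float(scores[idx]), 3))
```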


Analytics for Noisy Unstructured Text Data | 2008

Rule based synonyms for entity extraction from noisy text

Rema Ananthanarayanan; Vijil Chenthamarakshan; Prasad M. Deshpande; Raghuram Krishnapuram

Identification of named entities such as person, organization, and product names from text is an important task in information extraction. In many domains, the same entity can be referred to in multiple ways because of variations introduced by different user groups, variations of spelling across regions or cultures, use of abbreviations, typographical errors, and other reasons associated with conventional usage. Identifying a piece of text as a mention of an entity in such noisy data is difficult, even with a dictionary of possible entities. Previous approaches treat the synonym problem as part of entity disambiguation and use learning-based methods that exploit the context of the words to identify synonyms. In this paper, we show that existing domain knowledge, encoded as rules, can be used effectively to address the synonym problem to a considerable extent. This makes the disambiguation task simpler, without the need for much training data. We look at a subset of application scenarios in named entity extraction, categorize the possible variations in entity names, and define rules for each category. Using these rules, we generate synonyms for the canonical list and match these synonyms to their actual occurrences in the data sets. In particular, we describe the rule categories that we developed for several named entities and report the results of applying our technique of extracting named entities by generating synonyms in two different domains.
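
The core idea, sketched below under simplifying assumptions, is that each rule category deterministically expands a canonical name into surface variants, which are then matched against noisy text by dictionary lookup. The two rules shown (abbreviation and case folding) are illustrative; they are not the paper's actual rule categories.

```python
# A minimal sketch of rule-based synonym generation; the rules shown are
# illustrative, not the categories developed in the paper.
def generate_synonyms(canonical: str) -> set[str]:
    variants = {canonical}
    tokens = canonical.split()
    if len(tokens) > 1:
        # Abbreviation rule: "International Business Machines" -> "IBM", "I.B.M."
        acronym = "".join(t[0] for t in tokens).upper()
        variants.add(acronym)
        variants.add(".".join(acronym) + ".")
    # Case-folding rule: tolerate lowercased mentions in noisy text.
    variants |= {v.lower() for v in list(variants)}
    return variants

# Dictionary matching: map every generated synonym back to the canonical name.
dictionary = {syn: "International Business Machines"
              for syn in generate_synonyms("International Business Machines")}

text = "she joined ibm in 2008"
mentions = [dictionary[tok] for tok in text.split() if tok in dictionary]
print(mentions)  # ['International Business Machines']
```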


Knowledge Discovery and Data Mining | 2014

Predicting employee expertise for talent management in the enterprise

Kush R. Varshney; Vijil Chenthamarakshan; Scott W. Fancher; Jun Wang; Dongping Fang; Aleksandra Mojsilovic

Strategic planning and talent management in large enterprises composed of knowledge workers require a complete, accurate, and up-to-date representation of the expertise of employees in a form that integrates with business processes. Like other similar organizations operating in dynamic environments, the IBM Corporation strives to maintain such current and correct information, specifically assessments of employees against the job roles and skill sets in its expertise taxonomy. In this work, we deploy an analytics-driven solution that infers the expertise of employees by mining enterprise and social data that was not specifically generated or collected for expertise inference. We consider job role and specialty prediction, posing both as supervised classification problems. We evaluate a large number of feature sets, predictive models, and postprocessing algorithms, and choose a combination for deployment. This expertise analytics system has been deployed for key employee population segments, yielding large reductions in manual effort and the ability to continually and consistently serve up-to-date and accurate data for several business functions; it is now being deployed throughout the corporation.
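
A minimal sketch of the core formulation follows: role prediction posed as supervised text classification over signals mined for each employee. The feature text, role labels, and single model choice here are hypothetical stand-ins for the many feature sets and models the paper evaluates.

```python
# Toy version of job-role prediction as supervised classification; all data
# and the model choice are illustrative, not the deployed system's.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Each employee is represented by text mined from enterprise and social data.
employee_texts = [
    "patents on databases, forum answers about SQL tuning",
    "wiki edits on UX guidelines, design review comments",
    "commits to compiler backend, posts about LLVM passes",
    "answers on query optimizers and indexing strategies",
]
job_roles = ["Data Engineer", "Designer", "Software Engineer", "Data Engineer"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(employee_texts, job_roles)

# Infer the role of an employee from signals never labeled for this purpose.
print(model.predict(["answers questions about SQL indexing"]))
```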


IBM Journal of Research and Development | 2009

Leveraging social networks for corporate staffing and expert recommendation

Vijil Chenthamarakshan; Kuntal Dey; Jianying Hu; Aleksandra Mojsilovic; W. Riddle; Vikas Sindhwani

Effective management of human resources is a significant challenge faced by most organizations. In this paper, we look at two problems that arise in large, globally distributed organizations: staffing projects with the required subject matter experts and connecting subject matter experts to other employees who can benefit from their expertise. Several approaches based on automated skill matching have been suggested in the past to solve these problems. However, we argue that social relationships play an important role in both of these functions, and that better matches can be obtained by combining skill matching with rich social interaction data. We describe two systems that exploit social networking data to solve these problems and report the results of real-life experiments performed using these systems.
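
One way to read the paper's argument is as score fusion: a skill-match score is combined with a social-proximity score so that well-connected experts rank higher. The sketch below is an illustrative combination (the weighting, graph, and scores are invented), not the deployed systems.

```python
# Toy fusion of skill matching and social-network proximity for expert
# recommendation; the alpha weighting and all data are illustrative.
import networkx as nx

# Social graph over employees (edges = observed interactions).
g = nx.Graph()
g.add_edges_from([("alice", "bob"), ("bob", "carol"), ("carol", "dave")])

skill_scores = {"bob": 0.9, "dave": 0.95}  # from a skill-matching engine

def recommend(seeker: str, alpha: float = 0.7) -> list[tuple[str, float]]:
    ranked = []
    for expert, skill in skill_scores.items():
        # Experts closer to the seeker in the social graph score higher.
        dist = nx.shortest_path_length(g, seeker, expert)
        proximity = 1.0 / (1.0 + dist)
        ranked.append((expert, alpha * skill + (1 - alpha) * proximity))
    return sorted(ranked, key=lambda x: x[1], reverse=True)

print(recommend("alice"))  # bob's proximity offsets dave's higher skill score
```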


International Joint Conference on Artificial Intelligence | 2011

Concept labeling: building text classifiers with minimal supervision

Vijil Chenthamarakshan; Prem Melville; Vikas Sindhwani; Richard D. Lawrence

The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.
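
A minimal sketch of the shift in supervision: a human labels concepts, not documents, and the concepts' descriptions become the training set. The concepts, descriptions, and classes below are illustrative rather than drawn from the Wikipedia ontology used in the paper.

```python
# Toy concept labeling: train on concept descriptions plus human-supplied
# concept->class associations, with no directly labeled documents.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

concept_texts = {
    "Association football": "sport played with a spherical ball between teams",
    "Basketball": "team sport, players shoot a ball through a hoop",
    "Stock market": "exchange where shares of companies are traded",
    "Inflation": "inflation is a general increase in prices and a fall in purchasing power",
}
concept_classes = {
    "Association football": "sports", "Basketball": "sports",
    "Stock market": "finance", "Inflation": "finance",
}

texts = list(concept_texts.values())
labels = [concept_classes[name] for name in concept_texts]

# Train on concept descriptions; classify real documents at no labeling cost.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["share prices rose despite inflation fears"]))  # ['finance']
```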


Knowledge Discovery and Data Mining | 2013

Amplifying the voice of youth in Africa via text analytics

Prem Melville; Vijil Chenthamarakshan; Richard D. Lawrence; James Powell; Moses Mugisha; Sharad Sapra; Rajesh Anandan; Solomon Assefa

U-report is an open-source SMS platform operated by UNICEF Uganda, designed to give community members a voice on issues that impact them. Data received by the system are either SMS responses to a poll conducted by UNICEF, or unsolicited reports of a problem occurring within the community. There are currently 200,000 U-report participants, and they send up to 10,000 unsolicited text messages a week. The objective of the program in Uganda is to understand the data in real-time, and have issues addressed by the appropriate department in UNICEF in a timely manner. Given the high volume and velocity of the data streams, manual inspection of all messages is no longer sustainable. This paper describes an automated message-understanding and routing system deployed by IBM at UNICEF. We employ recent advances in data mining to get the most out of labeled training data, while incorporating domain knowledge from experts. We discuss the trade-offs, design choices and challenges in applying such techniques in a real-world deployment.
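
A minimal sketch of message understanding plus routing appears below; the issue categories, training messages, and routing table are hypothetical, and the deployed system additionally incorporates expert domain knowledge.

```python
# Toy SMS classification and routing; categories and data are invented, not
# UNICEF's actual configuration.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_messages = [
    "no clean water in our village since monday",
    "the borehole pump is broken and there is no water",
    "teachers have not come to school for two weeks",
    "classrooms are overcrowded and lack books",
]
train_issues = ["water", "water", "education", "education"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_messages, train_issues)

# Route each incoming message to the department handling its issue type.
route_to = {"water": "WASH team", "education": "Education section"}
incoming = "the water pump in our village is broken"
issue = classifier.predict([incoming])[0]
print(f"routing to: {route_to[issue]}")
```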


International Conference on Data Mining | 2010

ALPOS: A Machine Learning Approach for Analyzing Microblogging Data

Dan Zhang; Yan Liu; Richard D. Lawrence; Vijil Chenthamarakshan

With the growth of the Internet, the increasing volume of information posted on micro-blogging sites such as Twitter creates a need for efficient information filtering. In conventional text classification problems, it is assumed that the feature vectors extracted from the available documents are sufficient to learn good classifiers. However, this conventional approach is not likely to work for Twitter because of the limited number of characters in each tweet. At a higher level, each tweet can be viewed as an abbreviated abstraction of a long document, of which we have only a partial observation. To solve the problem caused by these partial observations, we introduce a novel domain adaptation/transfer learning approach called Assisted Learning for Partial Observation (ALPOS). The basic idea is to use a large number of multi-labeled examples (the source domain) to improve learning on the partial observations (the target domain). In particular, we learn a hidden, higher-level abstraction space that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error of a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space for recovery and classification. We compare the performance of this method with existing approaches on synthetic data and the well-known Reuters-21578 dataset. We also present experimental results on Twitter classification.
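
A heavily simplified sketch of the setup: learn a latent space from fully observed source documents, train a classifier in that space, and project short target texts (partial observations) into the same space. Note the substitution: ALPOS learns the space by jointly minimizing reconstruction and classification error, whereas this sketch uses a plain SVD projection; all data shown are invented.

```python
# Simplified ALPOS-style pipeline: a shared latent space bridges long source
# documents and short target texts. The SVD projection below replaces the
# paper's joint reconstruction/classification objective.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

source_docs = [  # long, fully observed, labeled documents
    "the central bank raised interest rates to curb inflation across markets",
    "the striker scored a brilliant goal as the team won the final match",
    "quarterly earnings beat forecasts and the stock rallied in heavy trading",
    "the coach praised the defense after a hard fought away victory",
]
source_labels = ["finance", "sports", "finance", "sports"]
target_tweets = ["rates up again smh", "what a goal!!"]  # partial observations

vec = TfidfVectorizer().fit(source_docs)
svd = TruncatedSVD(n_components=2, random_state=0)
z_source = svd.fit_transform(vec.transform(source_docs))

# Classifier trained in the hidden space using source-domain labels.
clf = LogisticRegression().fit(z_source, source_labels)

# Map target tweets into the same hidden space and classify them there.
z_target = svd.transform(vec.transform(target_tweets))
print(clf.predict(z_target))
```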


IEEE International Conference on Services Computing | 2009

Dependency Analysis Framework for Software Service Delivery

Rema Ananthanarayanan; Vijil Chenthamarakshan; Heng Chu; Prasad M. Deshpande; Raghu Krishnapuram; Shajeer K. Mohammed

Various phases in the delivery of software services, such as solution design, application deployment, and maintenance, require analysis of the dependencies of the software products that form the solution. As software systems become more complex and involve a large number of software products from multiple vendors, the availability of correct and up-to-date system requirement information becomes critical to ensure the proper functioning of managed and maintained software solutions. System requirement information is mostly made available in unstructured formats, from sources such as websites or product documents, and is not amenable to programmatic analysis. In this paper, we motivate the benefits of capturing this information in a structured format for software service delivery, and present a dependency analysis system that collects and integrates software dependency/interoperability information from multiple unstructured sources using text mining techniques. The information thus collected is used to support analytics useful in software service delivery. We report the results of our experiments on mining millions of web pages to collect dependency information for more than 700 software products.
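
A toy sketch of the extraction step: pattern rules turn free-text dependency statements into structured (product, dependency) records that can be queried. The patterns and sentences are illustrative; the paper's pipeline mines millions of pages with far richer techniques.

```python
# Toy mining of software dependency statements into structured records;
# the single regex rule here is illustrative only.
import re

PATTERN = re.compile(
    r"(?P<product>[\w .]+?) (?:requires|depends on|is supported on) "
    r"(?P<dependency>[\w .]+? \d[\w.]*)"
)

pages = [
    "WebSphere Application Server 7.0 requires Java SE 6",
    "The connector depends on DB2 9.5",
]

records = []
for sentence in pages:
    m = PATTERN.search(sentence)
    if m:
        # Store a structured (product, dependency) tuple for later analytics.
        records.append((m.group("product").strip(),
                        m.group("dependency").strip()))

print(records)
```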


Web-Age Information Management | 2012

WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction

Vijil Chenthamarakshan; Ramakrishna Varadarajan; Prasad M. Deshpande; Raghuram Krishnapuram; Knut Stolze

The visual layout of a webpage can provide valuable clues for certain types of Information Extraction (IE) tasks. In traditional rule-based IE frameworks, these layout cues are mapped to rules that operate on the HTML source of the webpages. In contrast, we have developed a framework in which the rules can be specified directly at the layout level. This has many advantages: the higher level of abstraction leads to simpler extraction rules that are largely independent of the source code of the page and therefore more robust, and it enables the specification of new types of rules that are not otherwise possible. To the best of our knowledge, there is no general framework that allows declarative specification of information extraction rules based on spatial layout. Our framework is complementary to traditional text-based rule frameworks and allows a seamless combination of spatial layout-based rules with traditional text-based rules. We describe the algebra that enables such a system and its efficient implementation using the standard relational and text indexing features of a relational database. We demonstrate the simplicity and efficiency of this system on a task involving the extraction of software system requirements from software product pages.
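
A toy sketch of what a layout-level rule looks like: extraction predicates are evaluated over rendered bounding boxes instead of HTML tags. The Region type and the single "directly below" operator are hypothetical illustrations, not the WYSIWYE algebra.

```python
# Toy spatial predicate over rendered page regions; coordinates are invented.
from dataclasses import dataclass

@dataclass
class Region:
    text: str
    x1: float; y1: float; x2: float; y2: float  # rendered bounding box

def directly_below(a: Region, b: Region, max_gap: float = 20.0) -> bool:
    """True if region b sits just below region a with horizontal overlap."""
    overlaps = a.x1 < b.x2 and b.x1 < a.x2
    return overlaps and 0 <= b.y1 - a.y2 <= max_gap

# Rule: the value is whatever region sits directly below the "System
# Requirements" header, regardless of the page's underlying HTML structure.
header = Region("System Requirements", 10, 100, 200, 120)
cells = [Region("4 GB RAM, 2 GHz CPU", 10, 130, 200, 150),
         Region("Pricing", 300, 100, 400, 120)]

matches = [c.text for c in cells if directly_below(header, c)]
print(matches)  # ['4 GB RAM, 2 GHz CPU']
```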


IEEE International Conference on Services Computing | 2010

Measuring Compliance and Deviations in a Template-Based Service Contract Development Process

Vijil Chenthamarakshan; Rafah A. Hosn; Shajith Ikbal; Nandakishore Kambhatla; Debapriyo Majumdar; Soumitra Sarkar

Asset-based approaches, involving the use of standardized reusable components (as opposed to building custom solutions), are increasingly being adopted by IT service industries to achieve higher standardization, quality and cost reduction goals. In this paper, we address issues related to the use of an asset-based approach for authoring service contracts, where standard templates are defined for each type of service offered. The success of such an approach relies on a compliance checking system. We focus on three key components of such a system. The first measures how well actual contracts comply with the standard templates. The second analyzes compliant contracts containing moderate deviations and reports on the consistent patterns of deviations observed for each template to help identify necessary modifications required in templates to keep them up-to-date with evolving business requirements and customer needs. The third analyzes noncompliant contracts and identifies groups within them such that members of each group have enough similarity to each other to warrant consideration for development of new templates for each group. We describe the architecture of the proposed system, our experience in the use of various text analysis techniques to prototype different system components, and the lessons learned.
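
A minimal sketch of the first and third components follows: scoring contracts against their standard template, and grouping noncompliant contracts as candidates for new templates. The similarity measure, threshold, and clustering choice are illustrative stand-ins for the text analysis techniques used in the prototype.

```python
# Toy compliance scoring and grouping; threshold and data are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

template = "provider shall deliver monitoring services with monthly reports"
contracts = [
    "provider shall deliver monitoring services with weekly reports",
    "customer will host quarterly workshops on cloud migration",
    "customer will host annual workshops on data center migration",
    "supplier agrees to ship replacement hardware parts within five days",
]

vec = TfidfVectorizer().fit([template] + contracts)
c = vec.transform(contracts)

# Component 1: compliance score of each contract against its template.
scores = cosine_similarity(c, vec.transform([template])).ravel()
compliant = scores >= 0.5
print([round(float(s), 2) for s in scores], compliant.tolist())

# Component 3: cluster noncompliant contracts; a cohesive group suggests
# that a new standard template may be warranted.
idx = [i for i, ok in enumerate(compliant) if not ok]
if len(idx) > 2:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(c[idx])
    print(list(zip(idx, labels)))
```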
