Publication


Featured research published by Faizan Javed.


International Conference on Big Data | 2015

Carotene: A Job Title Classification System for the Online Recruitment Domain

Faizan Javed; Qinlong Luo; Matt McNair; Ferosh Jacob; Meng Zhao; Tae Seung Kang

In the online job recruitment domain, accurate classification of jobs and resumes to occupation categories is important for matching job seekers with relevant jobs. One way to build such a job title classification system is as an automatic text document classifier that utilizes machine learning. Machine learning-based classification techniques for images, text, and related entities have been well researched in academia and successfully applied in many industrial settings. In this paper, we present Carotene, a machine learning-based semi-supervised job title classification system that is currently in production at CareerBuilder. Carotene leverages a varied collection of classification and clustering tools and techniques to tackle the challenges of designing a scalable classification system for a large taxonomy of job categories, combining these techniques in a cascade classifier architecture. We first present the architecture of Carotene, which consists of a two-stage coarse- and fine-level classifier cascade. We then compare Carotene to an early version based on a flat classifier architecture, and contrast it with a third-party occupation classification system. The paper concludes with experimental results on real-world industrial data using both machine learning metrics and actual user experience surveys.
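
To make the cascade concrete, here is a minimal two-stage sketch: a coarse classifier routes a title to a broad occupation group, and a per-group fine classifier picks the final category. The toy taxonomy, training titles, and scikit-learn model choices are illustrative assumptions, not Carotene's actual components.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: (job title, coarse group, fine category).
data = [
    ("senior java developer", "tech", "software_engineer"),
    ("python backend engineer", "tech", "software_engineer"),
    ("database administrator", "tech", "dba"),
    ("sql server dba", "tech", "dba"),
    ("registered nurse icu", "health", "nurse"),
    ("staff nurse", "health", "nurse"),
    ("physical therapist", "health", "therapist"),
    ("occupational therapist", "health", "therapist"),
]
titles = [t for t, _, _ in data]
coarse_labels = [c for _, c, _ in data]

# Stage 1: coarse classifier over all titles.
coarse_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
coarse_clf.fit(titles, coarse_labels)

# Stage 2: one fine classifier per coarse group.
fine_clfs = {}
for group in set(coarse_labels):
    rows = [(t, f) for t, c, f in data if c == group]
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit([t for t, _ in rows], [f for _, f in rows])
    fine_clfs[group] = clf

def classify(title):
    """Route through the cascade: coarse group first, then fine category."""
    group = coarse_clf.predict([title])[0]
    return group, fine_clfs[group].predict([title])[0]

print(classify("junior python developer"))  # e.g. ('tech', 'software_engineer')
```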


Collaboration Technologies and Systems | 2014

sCooL: A system for academic institution name normalization

Ferosh Jacob; Faizan Javed; Meng Zhao; Matt McNair

Named entity normalization involves resolving recognized entities to a concrete, unambiguous real-world entity. Within the purview of the online job posting domain, academic institution name normalization provides a beneficial opportunity for CareerBuilder (CB): accurate and detailed normalization of academic institutions is important for sophisticated labor market dynamics analysis. In this paper, we present the design and implementation of sCooL, an academic institution name normalization system designed to supplant the existing manually maintained mapping system at CB, and discuss the specific challenges that led to its design. sCooL leverages Wikipedia to create academic institution name mappings from a school database built from job applicant resumes posted on our website. The resulting mappings are used to build a database that drives normalization. sCooL provides the flexibility to integrate mappings collected from different curated and non-curated sources, and it can identify malformed data and distinguish K-12 schools from universities and colleges. We conduct an extensive comparative evaluation of the semi-automated sCooL system against the existing manual mapping implementation and show that sCooL provides better coverage with improved accuracy.
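
As a rough illustration of mapping-based normalization, the sketch below resolves noisy institution strings against an alias table and flags malformed input and K-12 schools. The alias entries, cleanup rules, and K-12 heuristics are invented for the example; sCooL's actual mappings are mined from Wikipedia and resume data.

```python
import re

# Canonical alias table, e.g. as mined from Wikipedia (invented entries).
ALIASES = {
    "uga": "University of Georgia",
    "univ of georgia": "University of Georgia",
    "georgia tech": "Georgia Institute of Technology",
    "ga institute of technology": "Georgia Institute of Technology",
}

K12_HINTS = ("high school", "middle school", "elementary")

def clean(name):
    """Lowercase, strip punctuation, collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def normalize(raw):
    """Return (canonical_name, status) for a raw institution string."""
    key = clean(raw)
    if not key:
        return None, "malformed"
    if any(hint in key for hint in K12_HINTS):
        return None, "k12"  # flagged and kept out of the university KB
    if key in ALIASES:
        return ALIASES[key], "ok"
    return raw, "unmapped"  # candidate for review or a new mapping

print(normalize("Univ. of Georgia"))       # ('University of Georgia', 'ok')
print(normalize("Riverdale High School"))  # (None, 'k12')
```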


Knowledge Discovery and Data Mining | 2016

CompanyDepot: Employer Name Normalization in the Online Recruitment Industry

Qiaoling Liu; Faizan Javed; Matt McNair

Entity linking maps entity mentions in text to the corresponding entities in a knowledge base (KB) and has many applications in both open and specific domains. For example, in the recruitment domain, linking employer names in job postings or resumes to entities in an employer KB is very important to many business applications. In this paper, we focus on this employer name normalization task, which has several unique challenges: handling employer names from both job postings and resumes, leveraging the corresponding location context, and handling name variations, irrelevant input data, and noise in the KB. We present a system called CompanyDepot, which contains a machine learning-based approach, CompanyDepot-ML, and a heuristic approach, CompanyDepot-H, addressing these challenges in three steps: (1) searching for candidate entities using a customized search engine over the KB; (2) ranking the candidate entities using learning-to-rank methods or heuristics; and (3) validating the top-ranked entity via binary classification or heuristics. While CompanyDepot-ML offers better extensibility and flexibility, CompanyDepot-H serves as a strong baseline and a useful way to collect training data for CompanyDepot-ML. The proposed system achieves 2.5%-21.4% higher coverage at the same precision level compared to an existing system used at CareerBuilder over multiple real-world datasets. Applying the system to the similar task of academic institution name normalization further demonstrates the generalization ability of the method.
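
The three-step flow can be sketched as follows; fuzzy string scoring stands in for the customized search engine, the learning-to-rank ranker, and the trained validator, and the tiny KB is invented.

```python
from difflib import SequenceMatcher

# A tiny invented employer KB.
KB = [
    {"id": 1, "name": "International Business Machines", "location": "Armonk, NY"},
    {"id": 2, "name": "IBM Credit LLC", "location": "Armonk, NY"},
    {"id": 3, "name": "Internet Brands", "location": "El Segundo, CA"},
]

def name_sim(a, b):
    """Fuzzy string similarity standing in for the KB search engine."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def search(mention, kb, k=10):
    """Step 1: retrieve candidate entities for the mention."""
    return sorted(kb, key=lambda e: name_sim(mention, e["name"]), reverse=True)[:k]

def rank(mention, location, candidates):
    """Step 2: rank candidates; a location match boosts the score.
    (The paper uses learning-to-rank instead of this heuristic.)"""
    def score(e):
        s = name_sim(mention, e["name"])
        if location and location.lower() in e["location"].lower():
            s += 0.2
        return s
    return sorted(candidates, key=score, reverse=True)

def validate(mention, entity, threshold=0.5):
    """Step 3: accept the top candidate only if it clears a threshold."""
    return entity if name_sim(mention, entity["name"]) >= threshold else None

candidates = search("IBM Credit", KB)
top = rank("IBM Credit", "Armonk, NY", candidates)[0]
print(validate("IBM Credit", top))  # e.g. the 'IBM Credit LLC' entity
```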


International Conference on Big Data | 2015

A pipeline for extracting and deduplicating domain-specific knowledge bases

Mayank Kejriwal; Qiaoling Liu; Ferosh Jacob; Faizan Javed

Building a knowledge base (KB) describing domain-specific entities is an important problem in industry, examples including KBs built over companies (e.g. Dun & Bradstreet), skills (LinkedIn, CareerBuilder) and people (inome). The task involves several engineering challenges, including devising effective procedures for data extraction, aggregation and deduplication. Data extraction involves processing multiple information sources in order to extract domain-specific data instances. The extracted instances must be aggregated and deduplicated; that is, instances referring to the same underlying entity must be identified and merged. This paper describes a pipeline developed at CareerBuilder LLC for building a KB describing employers, by first extracting entities from both global, publicly available data sources (Wikipedia and Freebase) and a proprietary source (Infogroup), and then deduplicating the instances to yield an employer-specific KB. We conduct a range of pilot experiments over three independently labeled datasets sampled from the extracted KB, and comment on some lessons learned.
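
A minimal sketch of the aggregate-and-deduplicate step, assuming prefix blocking and a token-Jaccard matcher with suffix canonicalization; the records and thresholds are invented and much simpler than the pipeline's actual logic.

```python
from itertools import combinations

# Invented records extracted from the three source types.
records = [
    {"src": "wikipedia", "name": "Acme Corporation"},
    {"src": "freebase",  "name": "Acme Corp"},
    {"src": "infogroup", "name": "Acme Corp."},
    {"src": "infogroup", "name": "Zenith Labs"},
]

SUFFIXES = {"corporation": "corp", "incorporated": "inc"}

def tokens(name):
    """Lowercase, strip periods, canonicalize common suffixes."""
    return {SUFFIXES.get(t, t) for t in name.lower().replace(".", "").split()}

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

# Blocking: only compare records sharing a 4-character name prefix.
blocks = {}
for i, rec in enumerate(records):
    blocks.setdefault(rec["name"].lower()[:4], []).append(i)

# Union-find to merge records matched within a block.
parent = list(range(len(records)))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for idxs in blocks.values():
    for i, j in combinations(idxs, 2):
        if jaccard(records[i]["name"], records[j]["name"]) >= 0.5:
            parent[find(i)] = find(j)

clusters = {}
for i in range(len(records)):
    clusters.setdefault(find(i), []).append(records[i]["name"])
print(list(clusters.values()))
# [['Acme Corporation', 'Acme Corp', 'Acme Corp.'], ['Zenith Labs']]
```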


International Conference on Big Data | 2016

Quantifying skill relevance to job titles

Wenjun Zhou; Yun Zhu; Faizan Javed; Mahmudur Rahman; Janani Balaji; Matt McNair

Eliminating or reducing skill gaps in the job market is critical to putting people back to work, reducing the unemployment rate, and increasing the labor market participation rate. A key element in closing the skills gap is accurately identifying the mismatch between the skills expected by employers and those possessed by job seekers. In this study, our goal was to profile job titles by effectively quantifying the relevance of skills. We started with a naive, frequency-based skill ranking approach, which ranked the most generic skills at the top. We then adapted a number of alternative metrics and compared their performance across a number of job titles. The outcome of this study can support CareerCoach, an analytical solution CareerBuilder has piloted to provide insights and data dashboards to job seekers.
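
The core problem with frequency ranking, and one simple alternative, can be shown in a few lines: a tf-idf-style reweighting discounts skills that appear under many job titles. The counts below are invented, and this metric is only in the spirit of the alternatives the paper compares.

```python
import math

# Skill counts per job title (invented toy data).
title_skills = {
    "java developer": {"java": 90, "sql": 40, "communication": 80},
    "nurse":          {"patient care": 95, "cpr": 60, "communication": 85},
    "truck driver":   {"cdl": 70, "logistics": 30, "communication": 75},
}
n_titles = len(title_skills)

def relevance(title):
    """tf-idf-style score: a skill's frequency within the title,
    discounted by how many titles the skill appears under."""
    scores = {}
    for skill, tf in title_skills[title].items():
        df = sum(1 for skills in title_skills.values() if skill in skills)
        scores[skill] = tf * math.log((1 + n_titles) / (1 + df))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# 'communication' has the highest raw count but drops to the bottom
# once its presence under every title is discounted.
print(relevance("java developer"))
```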


International Conference on Big Data | 2015

WebScalding: A Framework for Big Data Web Services

Ferosh Jacob; Aaron Johnson; Faizan Javed; Meng Zhao; Matt McNair

CareerBuilder (CB) currently has 50 million active resumes and 2 million active job postings. Our team has been working to provide the most relevant jobs for job seekers and the most relevant resumes for employers and recruiters; these goals often lead to Big Data problems. In this paper, we introduce WebScalding, a Big Data framework designed and developed to solve some of the common large-scale data challenges at CB. The WebScalding framework raises the level of abstraction of Twitter's Scalding framework to adapt to CB's unique challenges. It helps users by ensuring that: 1) all internal web services are available as Cascading pipe operations; 2) these pipe operations can read from our common data sources to create a pipe assembly; and 3) the pipe assembly thus created can be executed on the CB Hadoop cluster as well as on local machines without any changes. We describe WebScalding using three case studies taken from actual internal projects, explaining how data scientists at CB who are not well versed in Big Data tools and methodologies leverage WebScalding to design, implement, and test Big Data applications. We also compare the execution time of a WebScalding program with its sequential Python counterpart to illustrate the superlinear speedup of WebScalding programs.
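
WebScalding itself is built on Scalding (Scala), but the composition pattern it relies on, pipe operations chained into an assembly that runs unchanged in different environments, can be sketched in Python. The stage names and records below are invented.

```python
from functools import reduce

def pipe(*stages):
    """Compose stages into one assembly: each stage maps a record
    iterator to a record iterator."""
    def assembly(records):
        return reduce(lambda recs, stage: stage(recs), stages, records)
    return assembly

# Stand-ins for web-service-backed pipe operations (invented names).
def parse_resume(records):
    for raw in records:
        yield {"raw": raw, "title": raw.split(",")[0]}

def normalize_title(records):
    for rec in records:
        rec["title_norm"] = rec["title"].strip().lower()
        yield rec

assembly = pipe(parse_resume, normalize_title)

# Here the assembly runs over a local list; the analogous WebScalding
# pipe assembly runs unchanged on the CB Hadoop cluster.
for rec in assembly(["Java Developer, 5 yrs", "Registered Nurse, 10 yrs"]):
    print(rec["title_norm"])
```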


Knowledge Discovery and Data Mining | 2018

Lessons Learned from Developing and Deploying a Large-Scale Employer Name Normalization System for Online Recruitment

Qiaoling Liu; Josh Chao; Thomas Mahoney; Alan Chern; Chris Min; Faizan Javed; Valentin Jijkoun

Employer name normalization, or linking employer names in job postings or resumes to entities in an employer knowledge base (KB), is important for many downstream applications in the online recruitment domain. Key challenges for employer name normalization include handling employer names from both job postings and resumes, leveraging the corresponding location and URL context, and handling name variations and duplicates in the KB. In this paper, we describe the CompanyDepot system developed at CareerBuilder, which uses machine learning techniques to address these challenges. We discuss the main challenges and share our lessons learned in deployment, maintenance, and utilization of the system over the past two years. We also share several examples of how the system has been used in applications at CareerBuilder to deliver value to end customers.


International Conference on Management of Data | 2018

Avatar: Large Scale Entity Resolution of Heterogeneous User Profiles

Janani Balaji; Chris Min; Faizan Javed; Yun Zhu

Entity Resolution (ER), also known as record linkage or de-duplication, has been a long-standing problem in the data management space. Though an ER system follows an established pipeline of Blocking -> Matching -> Clustering components, Matching forms its core element. At CareerBuilder, we perform de-duplication of massive datasets of people profiles collected from disparate sources with varying informational content. In this paper, we discuss the challenges of de-duplicating inherently heterogeneous data and illustrate the end-to-end process of building a functional and scalable machine learning-based matching platform. We also provide an incremental framework to enable differential ER assimilation for continuous de-duplication workflows.
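
A toy version of the Matching stage: pairwise features over profile fields feed a learned matcher, and matched pairs would then be clustered. The profiles, features, and training pairs are invented, and a small logistic regression stands in for the production model.

```python
from itertools import combinations
from sklearn.linear_model import LogisticRegression

profiles = [
    {"id": 0, "name": "Jon Smith",  "email": "jsmith@x.com", "phone": "555-0101"},
    {"id": 1, "name": "John Smith", "email": "jsmith@x.com", "phone": ""},
    {"id": 2, "name": "Jane Doe",   "email": "jdoe@y.com",   "phone": "555-0202"},
]

def features(a, b):
    """Pairwise features: non-empty email match, non-empty phone
    match, and name-token overlap."""
    name_overlap = len(set(a["name"].lower().split()) & set(b["name"].lower().split()))
    return [
        float(a["email"] != "" and a["email"] == b["email"]),
        float(a["phone"] != "" and a["phone"] == b["phone"]),
        float(name_overlap),
    ]

# Train the stand-in matcher on a few labeled pairs (invented).
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
y = [1, 1, 0, 0]
matcher = LogisticRegression().fit(X, y)

# Score all pairs (real blocking would restrict which pairs we score);
# matched pairs would then be clustered by transitive closure.
matches = [(a["id"], b["id"]) for a, b in combinations(profiles, 2)
           if matcher.predict([features(a, b)])[0] == 1]
print(matches)  # e.g. [(0, 1)]
```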


Data Science and Engineering | 2018

Automatically Detecting Errors in Employer Industry Classification Using Job Postings

Alan Chern; Qiaoling Liu; Josh Chao; Mahak Goindani; Faizan Javed

In the recruitment domain, knowing the employer industry of a job is important for insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. However, errors may occur during the computation of the job's employer and in the construction of the employer KB with its industry attributes. Since the KB is huge, it is not possible to detect the errors manually. Therefore, in this paper we use machine learning techniques to detect the errors automatically. Based on the observation that the main jobs posted by an employer often relate to the employer's industry, e.g., truck driver jobs often correspond to employers in the transportation industry, we develop a system that classifies the industry of an employer using job posting data. We aggregate job postings from an employer and derive features from employer names, employer descriptions, job titles, and job descriptions to predict the industry of the employer. Two models are used for classification: (1) support vector machine (SVM) and (2) random forest. Our experiments show that random forest is more effective than SVM at identifying errors in the existing industry classification system, achieving a precision of 0.69, a recall of 0.78, and an F-score of 0.73. In particular, it better handles mixed feature vectors when normalization errors occur. We also observe that our models generally perform better at detecting errors for industries that have higher error rates.
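
The detection idea reduces to: predict an employer's industry from its aggregated postings and flag disagreement with the KB attribute. The sketch below uses a random forest over tf-idf features, mirroring the paper's better-performing model, but the employers, postings, and labels are invented.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented training data: aggregated posting text per employer,
# labeled with the (trusted) industry.
train_docs = [
    "truck driver cdl freight routes logistics",
    "otr driver long haul delivery trucking",
    "registered nurse patient care hospital shifts",
    "icu nurse clinical care medication",
]
train_industry = ["transportation", "transportation", "healthcare", "healthcare"]

vec = TfidfVectorizer()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(vec.fit_transform(train_docs), train_industry)

# One employer to audit: concatenate its postings into one document
# and compare the prediction against the KB's industry attribute.
employer_doc = "cdl class a truck driver regional freight"
kb_industry = "healthcare"

predicted = clf.predict(vec.transform([employer_doc]))[0]
if predicted != kb_industry:
    print(f"possible KB error: predicted {predicted}, KB says {kb_industry}")
```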


International Conference on Big Data | 2015

Macau: Large-scale skill sense disambiguation in the online recruitment domain

Qinlong Luo; Meng Zhao; Faizan Javed; Ferosh Jacob

Named entity sense disambiguation is a problem with important natural language processing applications. In the online recruitment industry, normalization and recognition of occupational skills play a key role in linking the right candidate with the right job, and disambiguating multi-sense skills helps improve this normalization and recognition process. In this paper, we discuss an automatic large-scale system to identify and disambiguate multi-sense skills, comprising: (1) feature selection: employing word embeddings to quantify the skills and their contexts as vectors; (2) clustering: applying Markov Chain Monte Carlo (MCMC) methods to aggregate vectors into clusters that represent respective senses; (3) scaling: parallelizing the processing of text blobs at large scale; and (4) pruning: cleaning clusters by analyzing intra-cluster cosine similarities. In experiments on sample datasets, the MCMC-based clustering algorithm outperforms other clustering algorithms for the disambiguation problem. Based on data-driven in-house evaluations, our disambiguation system achieves 84% precision.
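
The per-skill flow can be sketched as: embed each occurrence of a skill by its context, cluster the occurrence vectors into senses, and score clusters by intra-cluster cosine similarity for pruning. KMeans stands in here for the paper's MCMC-based clustering, and the toy 2-D vectors replace learned word embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Toy 2-D context vectors for occurrences of a skill like "java":
# two near a "programming" direction, two near a "coffee" direction.
contexts = np.array([
    [0.9, 0.1], [0.8, 0.2],   # programming-like contexts
    [0.1, 0.9], [0.2, 0.8],   # food-service-like contexts
])

# KMeans stands in for the paper's MCMC-based clustering.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(contexts)

for sense in range(2):
    members = contexts[km.labels_ == sense]
    # Mean pairwise cosine similarity within the cluster; a low value
    # would mark the cluster for pruning.
    sims = cosine_similarity(members)
    n = len(members)
    avg = (sims.sum() - n) / (n * n - n)
    print(f"sense {sense}: {n} occurrences, intra-cluster cosine {avg:.2f}")
```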

Collaboration


Dive into Faizan Javed's collaboration.

Top Co-Authors

Janani Balaji (Georgia State University)
Alan Chern (Georgia Institute of Technology)
Mayank Kejriwal (University of Texas at Austin)
Wenjun Zhou (University of Tennessee)