Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning
Denghui Zhang, Junming Liu, Hengshu Zhu, Yanchi Liu, Lichen Wang, Pengyang Wang, Hui Xiong
Management Science and Information Technology Department, Rutgers University, USA
Baidu Talent Intelligence Center, Baidu Inc., China
Electrical and Computer Engineering Department, Northeastern University, USA
Computer Science Department, University of Central Florida, USA
ABSTRACT
Job Title Benchmarking (JTB) aims at matching job titles with similar expertise levels across various companies. JTB can provide precise guidance and considerable convenience to both talent recruiters and job seekers for position and salary calibration/prediction. Traditional JTB approaches mainly rely on manual market surveys, which are expensive and labor-intensive. Recently, the rapid development of online professional networks has accumulated a large number of talent career records, which opens a promising avenue for data-driven solutions. However, JTB remains a challenging task since (1) job title and job transition (job-hopping) data is messy and contains many subjective, non-standard naming conventions for the same position (e.g., Programmer, Software Development Engineer, SDE, Implementation Engineer), (2) a large amount of title/transition information is missing, and (3) each talent holds only a limited number of jobs, which brings incompleteness and randomness to the modeling of job transition patterns. To overcome these challenges, we aggregate all the records to construct a large-scale Job Title Benchmarking Graph (Job-Graph), where nodes denote job titles affiliated with specific companies and links denote the correlations between jobs. We reformulate JTB as a link prediction task over the Job-Graph, under the assumption that matched job titles should be linked. Along this line, we propose a collective multi-view representation learning method (Job2Vec) that examines the Job-Graph jointly in (1) the graph topology view (the structure of relationships among job titles), (2) the semantic view (the semantic meaning of job descriptions), (3) the job transition balance view (the numbers of bidirectional transitions between two similar-level jobs are close), and (4) the job transition duration view (the shorter the average duration of transitions, the more similar the job titles).
We fuse the multi-view representations in the encode-decode paradigm to obtain a unified optimal representation for the task of link prediction. Finally, we conduct extensive experiments to validate the effectiveness of our proposed method.

∗ Hui Xiong and Hengshu Zhu are corresponding authors. This work is supported by NSFC 91746301. The code is available at: https://github.com/zdh2292390/Job2Vec-Job-Title-Benchmarkingwith-Collective-Multi-View-Representation-Learning
CIKM '19, November 3–7, 2019, Beijing, China
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6976-3/19/11...$15.00
https://doi.org/10.1145/3357384.3357825
CCS CONCEPTS
• Information systems → Information systems applications; Web mining.

KEYWORDS
Talent Intelligence, Job Title Benchmarking, Multi-view Learning, Auto-encoder, Representation Learning
ACM Reference Format:
Denghui Zhang, Junming Liu, Hengshu Zhu, Yanchi Liu, Lichen Wang, Pengyang Wang, and Hui Xiong. 2019. Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning. In The 28th ACM International Conference on Information and Knowledge Management (CIKM '19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3357384.3357825
Recent years have witnessed the increasing popularity of using data mining techniques to address human resource management (HRM) tasks (e.g., intelligent job-person fit and intelligent interview assessment [14, 15]). However, few research efforts have been made on intelligent Job Title Benchmarking (JTB), which aims at matching job titles with similar expertise levels across various companies. For both job seekers and employers, JTB is important for talent recruitment and salary calibration. With appropriate JTB insights, employers can recruit relevant talent with the right title and salary, while for job seekers, JTB can provide guidance for their career development. In this paper, we study the problem of JTB from the data mining perspective.

Traditional JTB relies heavily on manual market surveys, which are expensive and labor-intensive. Recently, the emergence of online professional networks (e.g., LinkedIn) has helped to accumulate a large number of career records, which provides an unparalleled opportunity for a data-driven solution. However, JTB is still a challenging task due to the following three aspects. First, job title and job transition (job-hopping) data is messy and contains many subjective, non-standard naming conventions for the same position. For example, as shown in Figure 1, Software Engineer, SDE, Software Development Engineer, and Computer Programmer are same-level jobs across different IT companies. Second, a large amount of title/transition information is missing: many users on online professional networks do not update their information in time, and too much missing information hinders the applicability of data mining algorithms. Third, in an individual career, one talent holds only a limited number of jobs compared to the total set of job titles on the job market, which brings incompleteness and randomness to the modeling of job transition patterns.

Figure 1: Job transitions of different subjects across companies and titles. Our approach aims to explore multiple clues for high performance on the job title benchmarking task.

To tackle these challenges, we propose to construct a Job Title Benchmarking Graph (Job-Graph), where nodes denote job titles affiliated with specific companies and links denote the numbers of transitions between job titles. We hold the assumption that benchmarked job title pairs should have strong correlations, i.e., there exist links between the job titles. Along this line, we reformulate JTB as the task of link prediction over the Job-Graph.

Representation learning methods achieve outstanding performance on the link prediction task [24, 30]. However, due to three unique properties of the Job-Graph (the topology structure, the rich semantic information of job titles, and the job transition patterns), existing representation learning methods are unable to model all of these properties at the same time. Therefore, we propose a collective multi-view representation learning method to learn representations of job titles for the task of link prediction.

Specifically, we first model four views of representations: (1) Graph Structure View, which refers to the topology structure of the Job-Graph and encodes the graph structure and neighborhood information; (2)
Semantic View, which refers to the semantic meaning of job titles; (3) Job Transition Balance View, which is compliant with the observation that the numbers of bidirectional transitions between two similar-level jobs are close; and (4) Job Transition Duration View, which reveals the fact that the shorter the average duration of transitions, the more similar the job titles.

Then, to obtain a unified representation, we design a representation fusion process based on the encode-decode paradigm. The multi-view representations are fed into associated multi-layer perceptrons, followed by a representation ensemble layer, which together work as an encoder. The ensembled representation is dispatched to the corresponding decoders by a representation dispatching layer to reconstruct the multi-view representations. The loss between the original and reconstructed multi-view representations is minimized to guarantee an optimal unified representation. Moreover, we train the multi-view representation learning procedure and the representation fusion procedure in an alternating way: the losses from the four views and the representation fusion procedure are minimized simultaneously to generate high-quality job title representations for the task of link prediction.
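The encode-decode fusion just described can be sketched in a few lines. The following is a minimal, illustrative NumPy forward pass, not the paper's implementation: the layer counts (one linear layer per side instead of multi-layer perceptrons), the dimensions, and the random weights are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
N_g = N_s = N_b = N_d = 8          # per-view embedding sizes (hypothetical)
D_in = N_g + N_s + N_b + N_d       # concatenated multi-view input
D_z = 16                           # size of the fused representation

# Single linear layer per side for brevity; the paper uses
# multi-layer perceptrons for the encoder and decoder.
W_enc = rng.normal(scale=0.1, size=(D_in, D_z))
W_dec = rng.normal(scale=0.1, size=(D_z, D_in))

def relu(x):
    return np.maximum(x, 0.0)

def encode(X):
    """F(X): fuse the concatenated multi-view input into one dense vector."""
    return relu(X @ W_enc)

def reconstruction_loss(X):
    """Mean squared reconstruction error ||X - G(F(X))||^2."""
    X_hat = encode(X) @ W_dec      # G(F(X)): restore the multi-view input
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```

Minimizing this loss with respect to the encoder and decoder weights (jointly with the per-view losses, as in the alternating training described above) yields the fused representation F(X).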
Table 1: An example of job transitions.
Job Title                          Company      Duration
Production Engineer                Square Inc   2011/7–2016/10
Senior Site Reliability Engineer   Google       2010/10–2011/7
Architect                          Yahoo!       2009/7–2010/6
Systems Engineer                   Yahoo!       2006/6–2009/7
Systems Engineer                   IBM          2006/2–2006/6
Table 2: Statistics of Job-Graphs in IT and Finance.
Dataset IT Finance
In summary, in this paper we propose a data-driven solution for the problem of JTB. Specifically, we first construct the Job Title Benchmarking Graph based on job transition records. Then, we reformulate the problem of JTB as the task of link prediction. We propose a collective multi-view representation learning method that learns a unified job title representation from the graph structure view, the semantic view, the job transition balance view, and the job transition duration view. Finally, we conduct extensive experiments to evaluate our proposed method on a real-world dataset. The promising results validate the effectiveness of our proposed method.
In this section, we first briefly introduce the real-world dataset we collected for the JTB task. Then, we introduce some essential definitions. Following the definitions, we present our problem statement. Finally, we give an overview of the proposed framework.
In this study, we analyze real-world talent job title transition data collected from a major commercial online professional network. The data covers two main categories, IT-related and finance-related job titles. Table 2 shows that the Job-Graphs constructed from these two datasets are very sparse. Table 1 presents an example of job transition records from an individual talent. Each line consists of a job title, a company name, and the duration of holding this position.
Here, we introduce some essential definitions, which will be usedthroughout this paper.
Definition 2.1. Job Title Benchmarking (JTB). JTB is the process of matching job titles with similar expertise levels across various companies. Formally, given two job title-company pairs, (Title_i, Company_i) and (Title_j, Company_j), the objective is to determine whether the given paired job titles are on the same level. JTB can provide precise guidance and considerable convenience for both talent recruitment and job seekers for position and salary calibration/prediction.

Definition 2.2.
Job Title Transition Graph (Job-Graph).
The Job-Graph is defined as a directed graph G = (V, E), where each node v_i ∈ V represents a job title affiliated with a company (i.e., v_i = (Title_i, Company_i)), each link e_ij ∈ E between two nodes v_i and v_j indicates that there exist job transitions from (Title_i, Company_i) to (Title_j, Company_j), and the weight of edge e_ij represents the number of transitions observed from (Title_i, Company_i) to (Title_j, Company_j).

Figure 2: Overview of our job title benchmarking framework: (a) individual career profiles; (b) job title aggregation; (c) a snapshot of the Job-Graph; (d) collective multi-view representation learning (network, semantic, transition balance, and transition duration views with multi-view fusion); (e) link prediction on the Job-Graph.

In this paper, we study the problem of Job Title Benchmarking (JTB). We first construct the job title transition graph to depict job transition patterns. We formulate JTB as the task of link prediction over the Job-Graph, based on the assumption that similar-level job titles should have strong correlations that enable a link between them. To enable the link prediction task, we push the problem formulation forward to representation learning over the Job-Graph, in order to learn unified and optimal representations of job titles.

Formally, given the Job-Graph G = (V, E), we aim to find a mapping function f : v → z that takes a node (job title) v as input and outputs the vectorized representation z of the job title, while preserving the properties of the Job-Graph and the job transition patterns. The generated node (job title) representation z is then utilized to solve the problem of link prediction.

Figure 2 shows an overview of our proposed framework, which includes the following essential tasks: (i) constructing the job title transition graph; (ii) developing a collective multi-view representation learning method for learning job title representations; (iii) applying the learned job title representations for link prediction on the Job-Graph. In the first task, given the job transition records of talents, we construct a job title transition graph. In the second task, a collective multi-view representation learning method is developed for jointly modeling the graph structure, the semantic meaning of job titles, and the job transition patterns. In the last task, we apply our proposed method to learn job title representations for link prediction on the Job-Graph to benchmark job titles.
In this section, we show how to construct the Job-Graph. Intuitively, the Job-Graph can be constructed directly from the raw job transition records. However, the messy, noisy, and non-standard naming conventions of job titles make such a Job-Graph extremely sparse and redundant, which hinders further analysis. Therefore, we refine the Job-Graph in the following steps: (1) extract job transitions from the raw career records; (2) map and aggregate all the transitions into the Job-Graph, where each node represents a job title and each edge weight represents the number of transitions between the nodes.
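The two construction steps can be sketched as follows. The input format (a chronologically ordered list of (title, company) positions per talent) and the function name are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict

def build_job_graph(careers):
    """Aggregate per-talent careers into a weighted, directed Job-Graph.

    careers: list of careers, each a chronologically ordered list of
             (title, company) tuples.
    Returns a dict mapping (src_node, dst_node) -> transition count,
    where a node is a (title, company) pair and the count is the
    edge weight.
    """
    edges = defaultdict(int)
    for career in careers:
        # Each pair of consecutive positions is one job transition.
        for src, dst in zip(career, career[1:]):
            if src != dst:              # skip degenerate self-transitions
                edges[(src, dst)] += 1  # edge weight = transition count
    return dict(edges)
```

Summing transition counts across all talents gives the edge weights w_ij used throughout the rest of the paper.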
As shown in Table 1, each transition consists of a source job and a destination job, i.e., (job_src, job_des). We first set all the job titles existing in the raw data as nodes. Then we sum the transition frequencies to obtain the weight of the link from job_src to job_des. The constructed graph serves as the base graph for further refinement.

Generally speaking, job titles consist of three parts:
(1) Title level, such as Senior, Principal, and Director.
(2) Title core function, such as Software Engineer and Product Manager.
(3) Unique additional information, such as Software Engineer in Big Data or Sales Rep on Small and Medium Businesses.

Studying the word frequencies of job titles yields an interesting observation: the word frequency distribution follows a power law, as shown in Figure 3. It can also be observed that noise words and user-specific additional words usually appear in the long tail. We also show the 10 most frequent and the 10 least frequent words: the most frequent words, such as manager and engineer, usually describe the core function of job titles, while the less frequent words look more like users' unique information. With this observation, we aggregate job titles by filtering out low-frequency words, and thus obtain a normalized and denser Job-Graph. Specifically, in this work, we filtered out words with a frequency lower than 30.

Table 3 shows three real examples of job titles aggregated by filtering out low-frequency words (words in bold indicate the low-frequency words). For example, "Tactical Sourcing Buyer (Unilever)" and "Sourcing Buyer, MARCOM & FSOS" are originally treated as different titles; after filtering low-frequency words, they are aggregated into the same title, "Sourcing Buyer". Note that we filter low-frequency words instead of clustering the job titles because there are no standard target classes for job titles, and it is hard to decide the number of clusters if a clustering algorithm is applied.

Table 3: Examples of aggregating job titles.

Original Job Titles                                 Aggregated Job Title
Tactical Sourcing Buyer (Unilever)                  Sourcing Buyer
Sourcing Buyer, MARCOM & FSOS                       Sourcing Buyer
Software Design Engineer - (Azure)                  Software Design Engineer
Software Design Engineer - WindowsXP                Software Design Engineer
Software Design Engineer - (Contracting) Encarta    Software Design Engineer
Cyber Security Architect                            Security Architect
Security Architect                                  Security Architect

With this aggregated Job-Graph, we can obtain concise job title matching insights such as: a Senior Software Engineer of LinkedIn can match a Software Engineer of Google, since most Senior Software Engineers of LinkedIn obtained the title of Software Engineer when they just made a career transition to Google. However, the sparsity of the Job-Graph still limits the performance of traditional representation learning methods. Therefore, in the next section, we introduce our collective multi-view representation learning method, Job2Vec, and show how to perform link prediction to enrich the Job-Graph based on the proposed method.
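The aggregation step above reduces to a word-frequency filter. A minimal sketch, with the function name and input format as assumptions (the paper's frequency threshold is 30):

```python
from collections import Counter

def aggregate_titles(titles, min_freq=30):
    """Normalize job titles by dropping words rarer than min_freq.

    Rationale from the paper: rare words in the long tail tend to be
    user-specific noise, while frequent words carry the level and core
    function of the title. The threshold is data-dependent.
    """
    freq = Counter(w for t in titles for w in t.split())
    aggregated = []
    for t in titles:
        kept = [w for w in t.split() if freq[w] >= min_freq]
        # Fall back to the original title if every word was filtered out.
        aggregated.append(" ".join(kept) if kept else t)
    return aggregated
```

Titles that collapse to the same normalized string are then merged into a single Job-Graph node, with their transition counts summed.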
In this section, we introduce our collective multi-view representation learning method for job titles.
We learn representations of job titles with the following intuitions.
Intuition 1: Topology Structure Preservation.
The Job-Graph is built to depict job transitions on the job market. The topology structure of the Job-Graph reveals the connectivity and neighborhood information of job titles, which helps describe the latent structures among them. We should therefore preserve the topology structure of the Job-Graph in representation learning.
Intuition 2: Semantics Preservation.
Job titles contain rich semantic descriptions which can further enhance the quality of job title representations. Therefore, we should preserve the semantic meanings of job titles.
Intuition 3: Job Transition Patterns Preservation.
Job transitions have unique latent patterns, and transitions among different job title pairs differ. Consequently, we should preserve the job transition patterns.

Therefore, we model the job title representations from multiple views. Specifically, we introduce the graph topology view for Intuition 1, the semantic view for Intuition 2, and the job transition balance and job transition duration views for Intuition 3. In addition, we propose a collective method to fuse the multi-view representations into a unified representation. We introduce the details as follows.
The Job-Graph structure explicitly illustrates the similarity and correlations between different titles. It is the most crucial and effective information, providing comprehensive and accurate title connections. However, the topology information is hidden in the graph structure.

Figure 3: Word frequency distribution. Top 10 words: Manager, Engineer, Software, Senior, Sales, Business, Lead, Specialist, Development, Program. Bottom 10 words: Paralegal, Gps, Especialista, Tam, Adoption, Lumia, Sa, Custom, Atlantic, Mobilefirst.

Motivated by the success of graph representation learning methods in link prediction on social graphs and knowledge graphs [1, 4, 7, 19, 29], the first view we use in Job2Vec is the Graph Topology view, which encodes the graph structure and neighborhood information into the representations. In the Graph Topology view, we aim to learn a low-dimensional representation of each job title that preserves the neighborhood structure, i.e., job titles that share similar neighbors in the Job-Graph should be close to each other in the graph-view representation space.

To achieve this, we first assign each job title v_i two representation vectors: a "self representation" e_i ∈ R^{N_g} and a "neighbor representation" e'_i ∈ R^{N_g}, where N_g is the dimension of the Graph Topology View representation. Both e_i and e'_i are randomly initialized. We utilize the "self representation" of v_i directly, while the "neighbor representation" is used when v_i is a neighbor of the node in focus. Then, to enforce embeddings to be close to each other if they share similar graph neighbors, we define the loss function O_N as follows:

    O_N = − Σ_{(i,j)∈E} w_ij log p(v_j | v_i),    (1)

where w_ij is the weight between v_i and v_j, and E is the set of all edges in the Job-Graph; to incorporate high-order proximity, we extend E by adding edges of k-step paths into the set (when k = 1, O_N is the same as the second-order proximity in LINE). p(v_j | v_i) is the probability of v_j occurring as a neighbor given v_i, defined as a softmax function:

    p(v_j | v_i) = exp(e'_j^T e_i) / Σ_{k=1}^{|V|} exp(e'_k^T e_i),    (2)

where v_i is the current job title, v_j is a neighbor of v_i, e_i is the "self representation" of v_i, e'_j is the "neighbor representation" of v_j, V is the set of all job titles (i.e., all nodes in the Job-Graph), and |V| is the cardinality of V. Minimizing O_N is equivalent to maximizing the conditional probability of v_j given v_i.
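Equations (1)-(2) can be checked with a small, didactic implementation. The sketch below uses a full softmax over all nodes for clarity; at Job-Graph scale one would approximate it (e.g., with negative sampling, as is standard for LINE-style objectives). The function name is our own.

```python
import numpy as np

def topology_loss(E_self, E_nbr, edges):
    """O_N of Eqs. (1)-(2): weighted negative log-likelihood of
    observing neighbor j given node i under a full softmax.

    E_self: (|V|, N_g) array of "self representations" e_i
    E_nbr:  (|V|, N_g) array of "neighbor representations" e'_i
    edges:  iterable of (i, j, w_ij) weighted directed edges
    """
    loss = 0.0
    for i, j, w in edges:
        scores = E_nbr @ E_self[i]          # e'_k^T e_i for every node k
        scores = scores - scores.max()      # shift for numerical stability
        log_p = scores[j] - np.log(np.exp(scores).sum())
        loss -= w * log_p                   # accumulate -w_ij * log p(v_j|v_i)
    return loss
```

With all-zero embeddings, p(v_j | v_i) is uniform, so each unit-weight edge contributes exactly log |V| to the loss, a handy sanity check.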
Since the conditional probability p(v_j | v_i) is parameterized by e'_j and e_i, the "self representations" of job titles that share similar neighbors will be similar, i.e., close in the graph-view space. Note that in the testing stage, we use the "self representation" of each job title to calculate the similarity score.

Normally, each job title consists of several keywords which describe the basic function and duty of the job (e.g., Project Manager and Computer Engineer). Therefore, the semantic information contained in these keywords is crucial to explore in the representation learning process, for two reasons: (1) talents tend to transition between functionally similar jobs, so semantic information guides the model to learn better representations for the complex job transition patterns; and (2) shared keywords can connect job titles even though the Job-Graph is extremely sparse, which alleviates the sparsity issue and improves the predictive capability of the learned representations. We consider this view the semantic view of the Job-Graph. In the semantic view, we aim to learn a low-dimensional representation of each job title that preserves the semantic information, i.e., job titles that share similar keywords should be close to each other in the semantic-view representation space. To achieve this, we first assign each job title v_i a vector s_i ∈ R^{N_s}, and each word w_j in the Job-Graph vocabulary a vector s'_j ∈ R^{N_s}; both are randomly initialized, and N_s is the dimension of the Semantic View representation.
Then, we enforce the s_i of job titles that share similar words to be close to each other, based on the loss function O_S defined as follows:

    O_S = − Σ_{w_j ∈ v_i} f_ij log p(w_j | v_i),    (3)

where f_ij is the frequency of the word w_j occurring in v_i, v_i is a job title, w_j is a word in v_i, and p(w_j | v_i) is the probability of w_j occurring in v_i, defined as a softmax function:

    p(w_j | v_i) = exp(s'_j^T s_i) / Σ_{k=1}^{|W|} exp(s'_k^T s_i),    (4)

where W is the vocabulary of the Job-Graph, s_i is the semantic representation of job title v_i, and s'_j is the semantic representation of word w_j. Minimizing O_S is equivalent to maximizing the conditional probability of w_j given v_i; as a result, job titles v_i and v_j will have similar representations s_i and s_j if they are semantically similar.

The numbers of bidirectional transitions between two similar-level jobs are close. For example, a software engineer at Apple is on the same level as an SDE at Facebook, so the number of transitions from software engineer (Apple) to SDE (Facebook) and the number of transitions from SDE (Facebook) to software engineer (Apple) should be close. However, for two different-level jobs, like junior software engineer and senior software engineer, the transition is usually in one direction, from the junior to the senior; in other words, the transition numbers in the two directions will be very different. To this end, we further consider Job Transition Balance as an important factor for JTB, which effectively indicates the matches between pairs of titles. The intuition of Transition Balance is that if comparable amounts of talent transitions can be found in both directions between two job titles, then the two job titles are highly likely to be on the same level. To model Transition Balance, we first assign each job title v_i a randomly initialized vector b_i ∈ R^{N_b}, where N_b is the dimension of the Transition Balance View representation.
Then, given two job titles v_i and v_j, we define the Transition Balance (TB) between them as:

    TB(v_i, v_j) = exp(−|w_ij − w_ji| / (w_ij · w_ji)),    (5)
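The Transition Balance score amounts to one line of code. In the sketch below, the exponent's denominator (the product w_ij · w_ji) is our reading of Eq. (5) and should be treated as an assumption; the qualitative behavior (score 1 for perfectly balanced counts, decaying toward 0 as the counts diverge) is what the view relies on.

```python
import math

def transition_balance(w_ij, w_ji):
    """TB score between two job titles with directed transition
    counts w_ij and w_ji (both assumed positive).

    Returns 1.0 when the two directions are perfectly balanced and
    decays toward 0 as the counts become more lopsided.
    """
    return math.exp(-abs(w_ij - w_ji) / (w_ij * w_ji))
```

For instance, a 5-vs-6 transition pair scores near 1, while a 1-vs-10 pair (a likely junior-to-senior promotion edge) scores much lower.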
Figure 4: Collective multi-view representation learning with the multi-view fusion encoder and decoder.

Here w_ij is the weight of the edge from v_i to v_j. Then, based on the loss function O_B, we enforce the b_i of job titles with balanced transitions between each other to be similar. O_B is defined as follows:

    O_B = − Σ_{(i,j)∈E} TB(v_i, v_j) log p(v_i, v_j),    (6)

where p(v_i, v_j) is the joint probability of v_i and v_j, defined as:

    p(v_i, v_j) = 1 / (1 + exp(−b_i^T b_j)).    (7)

Minimizing O_B will "drag" the representation vectors of "balanced" job title pairs close together in the representation space.

Most people require a relatively long time (e.g., one or several years) to get a promotion. In contrast, if a person can change jobs quickly and frequently, then there is a high possibility that these jobs are similar titles requiring similar expertise and working experience. In short, the shorter the average duration of transitions, the more similar the job titles. To this end, we define the Job Transition Duration as the average duration time between two job titles. To include the Transition Duration property in our model, we first assign each job title v_i a randomly initialized vector d_i ∈ R^{N_d}, where N_d is the dimension of the Transition Duration View representation. Given two job titles v_i and v_j, we define the Transition Duration (TD) between them as:

    TD(v_i, v_j) = exp(−t_ij),    (8)

where t_ij is the average duration time from v_i to v_j. Then we design a loss function O_D which enforces d_i and d_j to be closer to each other if the average transition time between them is short, i.e., if it is easy to transit from v_i to v_j. O_D is defined as:

    O_D = − Σ_{(i,j)∈E} TD(v_i, v_j) log p(v_i, v_j),    (9)

where p(v_i, v_j) is the joint probability of v_i and v_j, defined as:

    p(v_i, v_j) = 1 / (1 + exp(−d_i^T d_j)).    (10)

Minimizing O_D will "drag" the representation vectors of "easily transited" job title pairs close together in the representation space, which hopefully further improves the learning performance.

Each view provides a comprehensive and unique aspect of the correlations between titles, and more informative and sophisticated correlation knowledge resides across the views. However, naively combining all the views cannot efficiently utilize this information, and may even hurt performance due to the dramatically different scales and formats of the views. To this end, we propose a Collective Multi-View Auto-Encoder (CMVAE) framework to compress the multiple representations into a single, denser representation. As shown in Figure 4, the four representations obtained by learning from the above objectives are fed into the CMVAE. To avoid losing information from any representation, we directly concatenate the representations and feed them into the Fusion Encoder F(·), which consists of two fully-connected layers and outputs a single, denser representation. This intermediate representation is then fed to the Fusion Decoder G(·), also a two-layer fully-connected neural network, which outputs the restored representation. The objective function of the CMVAE is:

    L = (1/N) Σ_{i=1}^{N} ∥X_i − G(F(X_i))∥²,    (11)

where N is the number of training samples, X_i = [e_i; s_i; b_i; d_i] is the ensembled multi-view representation of job title v_i, and G(F(X_i)) is the restored representation. Minimizing the difference between the raw representation X_i and the restored representation G(F(X_i)) enforces the model to learn a denser, unified representation F(X_i). The CMVAE hopefully captures the distinctive aspects of the different views and further reveals the latent correlations across them. We jointly optimize the CMVAE with the individual-view representation objectives; this joint training strategy lets each view assist the others and further enhances the learning performance. Finally, we use F(X_i) as the fused multi-view representation for the subsequent link prediction task.

This section details our empirical evaluation of the proposed method on real-world data.
Table 4 presents the statistics of our datasets from the Information Technology and Finance industries. We provide more details about the real-world data as follows:
IT Data.
To construct the IT data, we randomly sampled the career records of one million users who have worked at several well-known IT companies in the US. For ease of analysis, we chose 15 of the most famous and leading IT companies and only kept transition records involving these companies. The 15 companies are Facebook, Google, Amazon, Microsoft, Apple, IBM, LinkedIn, Cisco, Oracle, Airbnb, Uber, Yahoo, Nokia, Intel, and HP. We then used the methods described in Section 3 to construct and aggregate the Job-Graph, and finally obtained the IT Job-Graph shown in Table 4.
Table 4: Statistical details of the dataset.

          IT              Finance
          train   test    train   test
Finance Data.
For the finance data, we randomly sampled one million records according to the rule that a record contains finance-related keywords such as
Finance, Asset Manager, Financial Research Analyst, Investment Banking Analyst, Equity Research Analyst, Trust Officer, Commercial Banker, etc.
We then again used the methods described in Section 3 to construct and aggregate the Job-Graph, and finally obtained the Finance Job-Graph shown in Table 4. The time span of the data is from 03/20/2004 to 12/01/2018.
We compare our proposed method with the following representative representation learning baselines:
DeepWalk [13]:
DeepWalk adopts truncated random walks on a graph to generate walk sequences and trains Skip-Gram on these sequences. It only considers graph topology.
Node2Vec [2]:
Node2Vec generalizes DeepWalk by defining a more flexible notion of a node's graph neighborhood. It only considers graph topology.
LINE(1st order) [17]:
In LINE, first-order and second-order proximity are modeled by the joint probability distribution between two nodes and the similarity between their neighborhoods, respectively. LINE (1st order) keeps only first-order proximity. It only considers graph topology.
LINE(1st+2nd order):
This is the full model of LINE. It keeps both first-order and second-order proximity. It only considers graph topology.
Word2Vec [9]:
Word2Vec uses only the semantic view. Specifically, we treat each job title as a sentence and train Word2Vec on all the job titles in the Job-Graph. We then obtain the embedding vector of a job title by averaging the vectors of its words.
Job2Vec:
The model proposed in this paper, which considers four crucial aspects of the Job-Graph: graph topology, semantics, transition balance, and transition duration.
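The Word2Vec baseline's title embedding (averaging word vectors) can be sketched as below. The `word_vectors` lookup table and the handling of out-of-vocabulary words are our assumptions for illustration.

```python
def title_embedding(title, word_vectors):
    """Embed a job title by averaging its word vectors, as in the
    Word2Vec baseline. `word_vectors` maps lowercase words to
    equal-length lists of floats; unknown words are skipped."""
    vecs = [word_vectors[w] for w in title.lower().split() if w in word_vectors]
    if not vecs:
        return None  # no known word: the title cannot be embedded
    dim = len(vecs[0])
    # Component-wise mean over the words present in the vocabulary.
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

For example, "Software Engineer" becomes the mean of the vectors for "software" and "engineer", which is why semantically similar titles land close together even when the graph is sparse.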
Metrics:
We use two metrics, namely MRR and MP@K, to evaluate the link prediction performance.
• For each test i, let rank[i] be the position at which the correct job title appears in the list of closest job titles. The Mean Reciprocal Rank (MRR) is

MRR = (1/N) · Σ_{i=1}^{N} 1/rank[i].    (12)

Higher MRR means that correct answers appear closer to the query job title.
• Additionally, for test i, consisting of a query job title and a target job title pair, consider the K job titles closest to the query embedding. If the correct target job title is among these K titles, then the Precision@K for test i (denoted P@K[i]) is 1; otherwise, it is 0. The Mean Precision@K is then defined as

MP@K = (1/N) · Σ_{i=1}^{N} P@K[i].    (13)

Higher precision indicates a better ability to retrieve correct answers using close embeddings.

Table 5: Link Prediction Performance Comparison.

                      IT Dataset                            Finance Dataset
                MRR    MP@5   MP@10  MP@15  MP@20    MRR    MP@5   MP@10  MP@15  MP@20
DeepWalk        0.0688 0.0858 0.1070 0.1198 0.1293   0.1044 0.1164 0.1444 0.1600 0.1675
Node2Vec        0.0645 0.0785 0.0925 0.1042 0.1153   0.0979 0.1065 0.1235 0.1335 0.1460
LINE (1st)      0.0644 0.0752 0.0947 0.1081 0.1198   0.0983 0.1071 0.1245 0.1341 0.1428
LINE (1st+2nd)  0.0651 0.0791 0.0958 0.1064 0.1125   0.0943 0.0994 0.1150 0.1274 0.1381
Word2Vec        0.1295 0.2135 0.2792 0.3194 0.3334   0.1110 0.1239 0.1452 0.1643 0.1775
Job2Vec         -      -      -      -      -        -      -      -      -      -

We performed link prediction on both the IT and Finance datasets, i.e., predicting missing links on the Job-Graph. Since edges with larger weights in the Job-Graph indicate a better match between job titles, we kept edges with weight larger than a threshold (here, 5) for training, and we only tried to predict links with weight larger than 5 (predicted links with weight lower than 5 are considered wrong). We then randomly split the Job-Graph links into 10 equal parts: 8 as the training set, 1 as the validation set, and 1 as the testing set; no data in the validation or testing set was used for training the embeddings. To avoid "cold start", we kept only job titles that occur in the training data.
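For reference, the two evaluation metrics in Eqs. (12) and (13) can be computed from the 1-based ranks of the correct answers as follows; this is a minimal sketch and the variable names are our own.

```python
def mrr(ranks):
    """Mean Reciprocal Rank over N test queries (Eq. 12).
    `ranks` holds the 1-based position of the correct job title
    in each query's similarity-sorted candidate list."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def mp_at_k(ranks, k):
    """Mean Precision@K (Eq. 13): P@K[i] is 1 if the correct
    title appears within the top K candidates, else 0."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For instance, with correct-answer ranks [1, 2, 10] over three queries, MRR is (1 + 1/2 + 1/10)/3 and MP@5 is 2/3, since two of the three correct titles fall in the top 5.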
We trained all the baseline models and our Job2Vec on the training set to obtain the embeddings; to avoid overfitting, we tuned parameters on the validation set, and finally we predicted links on the testing set using the learned embeddings. Given a job title job_i, to predict which jobs may have links with it, we calculated the cosine similarity between the embedding of job_i and those of all other jobs, and then ranked them by similarity score. Higher-ranked jobs have a higher probability of being matched with job_i.

We obtained the best hyperparameters of our model on the validation set. The dimensions of the four views' representations are N_g, N_s, N_b, and N_d. Table 5 reports MRR and MP@K. Word2Vec (averaging word vectors in the job title) achieves nearly 100% improvements on MRR and MP@K compared with DeepWalk, which proves that the semantic view is tremendously helpful for link prediction on the Job-Graph. Job2Vec achieves further improvements on MRR compared with Word2Vec, which shows the effectiveness of preserving more views than the semantic view alone. Job2Vec achieves the best performance on MRR, MP@5, MP@10, MP@15, and MP@20 among all the models. Specifically, it improves 200% over DeepWalk and 50% over Word2Vec. This confirms the superiority of the multi-view representation and the effectiveness of the encode-decode paradigm for fusing multiple views. Similar conclusions can be drawn from the results on the Finance dataset.

Figure 5: Robustness Comparison ((a) DeepWalk, (b) Job2Vec).

In this subsection, we explore the robustness of different models against the sparsity of the job transition graph. Specifically, we make the original graph sparser by subsampling the training edges at different rates r = {0.9, 0.8, 0.7, 0.6} (i.e., only keeping 90%, 80%, 70%, and 60% of the edges). We then retrain our model and the baseline models on the subsampled graphs and compare the degradation in link prediction performance. Here we use the IT dataset as an example and, for conciseness, compare our model Job2Vec with DeepWalk. Job2Vec takes four views (graph topology, semantic, transition balance, and transition duration) into account, while DeepWalk considers only the graph view. From Figure 5, we observe that the performance of DeepWalk degrades sharply as r decreases, while Job2Vec holds steady. This again confirms the effectiveness of incorporating more views, especially the semantic view, against the sparsity of the job transition graph. This can also be well explained: when the graph is sparse, many nodes have poor connectivity, and existing graph embedding models cannot learn sufficient representations from the graph view. In the semantic view, however, shared keywords can connect different job titles in the Job-Graph even when it is sparse. Learning feasible representations from the semantic view does not rely on graph connectivity and thus performs excellently on sparse graphs.

Table 6: Job Title Benchmarking Cases.

IT
Project Manager - IBM                       |  Product Manager PC Accessories - Microsoft
IT Support Lead and System Trainer - HP     |  IT Support Lead - IBM
SWE - Google                                |  Machine Learning Engineer - Airbnb
Software Engineer - Facebook                |  Data Scientist - Microsoft

Finance
Investment Banking Analyst - Citi           |  Investment Banking Analyst - J.P. Morgan
Equity Research Analyst - Nomura            |  Equity Research Analyst - Goldman Sachs
Financial Analyst - Goldman Sachs           |  Financial Analyst - Rushmark Properties
Portfolio Manager - WellsFargo              |  Portfolio Manager - ReMark Capital Group
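The cosine-similarity ranking used for prediction can be sketched as below; the embedding dictionary format is an assumption for illustration.

```python
import math

def rank_candidates(query_vec, candidate_vecs):
    """Rank candidate job titles by cosine similarity to the query
    embedding; higher-ranked titles are more likely matches.
    `candidate_vecs` maps each title to its embedding (list of floats)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = [(title, cosine(query_vec, v)) for title, v in candidate_vecs.items()]
    # Sort descending: the top of the list is the best-matching title.
    return sorted(scored, key=lambda pair: -pair[1])
```

Because cosine similarity is scale-invariant, the ranking depends only on the direction of the learned embeddings, not their magnitudes.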
We visualize the learned representations in Figure 6 to show the promising benchmarking results of our proposed model. For convenience, we select four categories of job title representations: engineer, sales, consultant, and manager. We randomly sampled 1,000 job titles for each category and used t-SNE [18] to reduce the representation dimensions for visualization. Each color corresponds to one category of job titles. Figure 6 shows that our proposed Job2Vec achieves the best results: the representations learned by our model are clustered into four well-separated groups. In other words, job titles are benchmarked effectively by our proposed model. In contrast, the representations learned by the baselines are distributed randomly in the space and cannot reveal the benchmarking relations among job titles. The potential explanation is that our proposed method jointly models four views to preserve the topology structure, semantics, and job transition patterns, which play an essential role in the task of job title benchmarking.
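The t-SNE projection behind this visualization can be sketched with scikit-learn as below; the perplexity and initialization settings are our assumptions, not the paper's reported configuration.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(embeddings, random_state=0):
    """Reduce job-title embeddings to 2-D with t-SNE for plotting.
    `embeddings` is an (n_titles, dim) array; perplexity here is a
    guess for illustration, not the paper's setting."""
    tsne = TSNE(n_components=2, perplexity=5, init="random",
                random_state=random_state)
    return tsne.fit_transform(np.asarray(embeddings))
```

The 2-D points can then be scattered with one color per job category; well-learned embeddings should form four visually separated clusters.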
In this subsection, we show some Job Title Benchmarking (JTB) results extracted from the existing job transition graph, as well as some link prediction JTB results generated by our model and the baseline models. Table 6 shows eight JTB cases extracted from the aggregated Job-Graph; evidently, job titles that have similar responsibilities and expertise levels, and that come from companies of comparable volume, are matched. Interestingly, JTB can find matching pairs that are not literally similar, such as (Software Engineer, Data Scientist) and (SWE, Machine Learning Engineer). Table 7 shows the top 3 link prediction results of different models for "Software Engineer-Facebook". Titles in bold font are wrong predictions. We observe that DeepWalk tends to make predictions that have a similar neighborhood structure in the Job-Graph, while Word2Vec inclines toward job titles that are semantically similar. With only the graph topology view, DeepWalk may make anomalous predictions such as "Product Manager-Microsoft". With only the semantic view, Word2Vec is likely to predict repetitive job titles and miss some interesting matching pairs that are not literally similar, such as (Software Engineer-Facebook, SDE-Microsoft).

Figure 6: Visualization of the learned representations ((a) DeepWalk, (b) Word2Vec, (c) LINE-1st, (d) LINE-1st+2nd, (e) Node2Vec, (f) Job2Vec (Ours)).

Our model, Job2Vec, incorporating the graph topology, semantic, transition balance, and transition duration views, makes more reasonable predictions.
In this section, we review two categories of literature related to this paper: research on data mining for career trajectory analysis, and research on graph embedding.
Data Mining for Career Trajectory Analysis.
With the rise of Online Professional Networks (OPNs), career trajectory analysis has proven useful in many human resource management (HRM) problems [5, 28]. For example, Xu et al. built an organization-level job transition graph from OPN data and proposed a talent circle detection method to identify the right talent sources for recruitment [27]. To better assess the expertise level or rank of a job title, Xu et al. proposed a Gaussian Bayesian model to extract the job title hierarchy of an organization from employees' career trajectory data [26]. In [6], a contextual LSTM model is proposed to accurately predict an employee's next career move. However, few existing works pay attention to the problem of Job Title Benchmarking (JTB), which has broad application prospects in human resource management. To the best of our knowledge, we are the first to extract JTB insights from job transition graphs.
Graph Representation Learning. Graph representation learning assigns nodes in a graph to low-dimensional representations that effectively preserve the graph structure. Recently, significant progress has been made toward this emerging graph analysis paradigm [2, 4, 12, 16, 17, 20, 21]. Inspired by the success of representation learning in natural language processing [3, 8, 9], DeepWalk [13] is the first extension of Word2Vec to graph analysis. It uses random walks to sample paths from a graph and treats the paths as "sentences" to train a Skip-Gram model that encodes graph proximity into the learned embeddings. Node2Vec [2] generalizes DeepWalk by designing a more flexible random walk strategy. LINE [17] proposes first-order and second-order proximity to keep graph properties and combines both by concatenating the first-order and second-order vectors. To better model the asymmetric property of graphs, Ou et al. propose asymmetric proximity preserving (APP) graph embedding [12]. In our problem setting, however, we need to jointly model multi-view representations and obtain a unified representation fused from the multiple views. Current graph representation learning methods cannot be directly applied to the JTB scenario. To the best of our knowledge, our work is the first attempt to solve the JTB problem via multi-view graph representation learning.

Table 7: Top 3 Results of Link Prediction Comparison (titles marked ** are wrong predictions).

Software Engineer-Facebook
  Job2Vec:   SDE-Microsoft, Software Development Engineer-Amazon, Software Developer-Google
  DeepWalk:  Software Developer-Google, **Product Manager-Microsoft**, Software Test Engineer-Google
  Word2Vec:  **Senior Software Engineer-Facebook**, Software Engineer-IBM, Software Engineer-Apple

Project Manager-IBM
  Job2Vec:   Product Manager-Microsoft, Project Manager-Microsoft, Program Manager-IBM
  DeepWalk:  Product Manager-Microsoft, **Senior Consultant-IBM**, Storage Administrator-IBM
  Word2Vec:  **Project Manager Lead-IBM**, Advisory Project Manager-IBM, Project Manager-Microsoft
Multi-View Representation Learning has become increasingly attractive as multi-modal sensors are widely deployed in a great number of real-world applications [25]. It explores the complementary information among different views, where the views refer to various feature representations, modalities, or sensors. Most existing approaches focus on multi-view data including features [11], images [10], and videos [22, 23], while our approach extracts information from multiple views of graph-structured data, which is challenging.
In this paper, we propose a data-driven solution for the problem of job title benchmarking (JTB). We construct a job title transition graph (Job-Graph) to represent job transitions, and reformulate the JTB problem as the task of link prediction over the Job-Graph. Specifically, we propose a collective multi-view representation learning method that jointly learns four views of representations: the graph topology view, the semantic view, the job transition balance view, and the job transition duration view. Besides, we propose an encode-decode based fusion method to obtain a unified representation from the multi-view representations. We present extensive experimental results on IT and finance related job transition data to demonstrate the effectiveness of our method.
REFERENCES
[1] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In Proc. ACM SIGKDD. 119–128.
[2] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proc. ACM SIGKDD. 855–864.
[3] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proc. ICML. 1188–1196.
[4] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Signed networks in social media. In Proc. ACM SIGCHI. 1361–1370.
[5] Huayu Li, Yong Ge, Hengshu Zhu, Hui Xiong, and Hongke Zhao. 2017. Prospecting the career development of talents: A survival analysis perspective. In Proc. ACM SIGKDD. 917–925.
[6] Liangyue Li, How Jing, Hanghang Tong, Jaewon Yang, Qi He, and Bee-Chung Chen. 2017. Nemo: Next career move prediction with contextual embedding. In Proc. International Conference on World Wide Web Companion. 505–513.
[7] Manling Li, Denghui Zhang, Yantao Jia, Yuanzhuo Wang, and Xueqi Cheng. 2018. Link Prediction in Knowledge Graphs: A Hierarchy-Constrained Approach. IEEE Trans. on Big Data (2018).
[8] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[9] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proc. NIPS. 3111–3119.
[10] Feiping Nie, Guohao Cai, and Xuelong Li. 2017. Multi-view clustering and semi-supervised classification with adaptive neighbours. In Proc. AAAI.
[11] Feiping Nie, Jing Li, Xuelong Li, et al. 2016. Parameter-Free Auto-Weighted Multiple Graph Learning: A Framework for Multiview Clustering and Semi-Supervised Classification. In Proc. IJCAI. 1881–1887.
[12] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In Proc. ACM SIGKDD. 1105–1114.
[13] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proc. ACM SIGKDD. 701–710.
[14] Chuan Qin, Hengshu Zhu, Tong Xu, Chen Zhu, Liang Jiang, Enhong Chen, and Hui Xiong. 2018. Enhancing person-job fit for talent recruitment: An ability-aware neural network approach. In Proc. ACM SIGIR. 25–34.
[15] Dazhong Shen, Hengshu Zhu, Chen Zhu, Tong Xu, Chao Ma, and Hui Xiong. 2018. A Joint Learning Approach to Intelligent Job Interview Assessment. In Proc. IJCAI. 3542–3548.
[16] Han Hee Song, Tae Won Cho, Vacha Dave, Yin Zhang, and Lili Qiu. 2009. Scalable proximity estimation and link prediction in online social networks. In Proc. ACM SIGCOMM. 322–335.
[17] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proc. WWW. 1067–1077.
[18] Laurens Van Der Maaten. 2014. Accelerating t-SNE using tree-based algorithms. JMLR 15, 1 (2014), 3221–3245.
[19] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proc. ACM SIGKDD. 1225–1234.
[20] Lichen Wang, Zhengming Ding, and Yun Fu. 2018. Learning Transferable Subspace for Human Motion Segmentation.
[21] Lichen Wang, Zhengming Ding, and Yun Fu. 2019. Low-Rank Transfer Human Motion Segmentation. IEEE TIP 28, 2 (2019), 1023–1034.
[22] Lichen Wang, Zhengming Ding, Zhiqiang Tao, Yunyu Liu, and Yun Fu. 2019. Generative Multi-View Human Action Recognition. In Proc. IEEE ICCV.
[23] Lichen Wang, Bin Sun, Joseph Robinson, Taotao Jing, and Yun Fu. 2019. EV-Action: Electromyography-Vision Multi-Modal Action Dataset. arXiv preprint arXiv:1904.12602 (2019).
[24] Zhitao Wang, Chengyao Chen, and Wenjie Li. 2017. Predictive network representation learning for link prediction. In Proc. ACM SIGIR. 969–972.
[25] Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. arXiv preprint arXiv:1304.5634 (2013).
[26] Huang Xu, Zhiwen Yu, Bin Guo, Mingfei Teng, and Hui Xiong. 2018. Extracting Job Title Hierarchy from Career Trajectories: A Bayesian Perspective. In Proc. IJCAI. 3599–3605.
[27] Huang Xu, Zhiwen Yu, Jingyuan Yang, Hui Xiong, and Hengshu Zhu. 2016. Talent circle detection in job transition networks. In Proc. ACM SIGKDD. 655–664.
[28] Huang Xu, Zhiwen Yu, Jingyuan Yang, Hui Xiong, and Hengshu Zhu. 2018. Dynamic Talent Flow Analysis with Deep Sequence Prediction Modeling. IEEE TKDE (2018).
[29] Denghui Zhang, Manling Li, Yantao Jia, Yuanzhuo Wang, and Xueqi Cheng. 2017. Efficient parallel translating embedding for knowledge graphs. arXiv preprint arXiv:1703.10316 (2017).
[30] Jiawei Zhang, Congying Xia, Chenwei Zhang, Limeng Cui, Yanjie Fu, and S Yu Philip. 2017. BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder. In