CER: Complementary Entity Recognition via Knowledge Expansion on Large Unlabeled Product Reviews
Hu Xu∗, Sihong Xie†, Lei Shu∗, Philip S. Yu∗‡
∗Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
†Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA
‡Institute for Data Science, Tsinghua University, Beijing
[email protected], [email protected], [email protected], [email protected]
Abstract—Product reviews contain a lot of useful information about product features and customer opinions. One important product feature is the complementary entity (product) that may potentially work together with the reviewed product. Knowing the complementary entities of the reviewed product is very important because customers want to buy compatible products and avoid incompatible ones. In this paper, we address the problem of Complementary Entity Recognition (CER). Since no existing method can solve this problem, we first propose a novel unsupervised method that utilizes syntactic dependency paths to recognize complementary entities. Then we expand category-level domain knowledge about complementary entities using only a few general seed verbs on a large amount of unlabeled reviews. The domain knowledge helps the unsupervised method adapt to different products and greatly improves the precision of the CER task. The advantage of the proposed method is that it does not require any labeled data for training. We conducted experiments on 7 popular products with about 1200 reviews in total to demonstrate that the proposed approach is effective.
Keywords: Entity Recognition; Relation Extraction; Product Relation; Complementary Entity; Complementary Product
I. INTRODUCTION
E-commerce websites (e.g., Amazon.com) contain a huge amount of product reviews, and most existing work on sentiment analysis [1] (or opinion mining) over reviews focuses on extracting opinion targets (aspects or features) of the reviewed product and the associated opinions [2]–[4] (e.g., extracting "battery" and a positive polarity from "It has a good battery"). Besides features of the reviewed product itself (e.g., "battery" or "screen"), one important feature is whether the reviewed product is compatible or incompatible with another product. We call the reviewed product the target entity and the other product the complementary entity. A pair of a target entity and its complementary entity forms a complementary relation. They may work together to fulfill some shared functionality, so they are usually co-purchased. For example, in Figure 1, we assume there are some reviews of several accessories (on the left) talking about compatibility issues. We consider these accessories as the target entities, and they have some complementary entities (on the right) mentioned in reviews. The target entities are one micro SD card, one tablet stand and one mouse; the complementary entities are one Nikon DSLR, one iPhone, one Samsung Galaxy S6 and one MS Surface Pro. An arrow pointing from a target entity to a complementary entity indicates that they have a complementary relation and shall work together. For example, the micro SD card can help the Samsung Galaxy S6 expand its memory capacity. Knowing these complementary entities is important because compatible products are preferred over incompatible ones. Thus, recognizing complementary entities is an important task in text mining.

Figure 1: Several target entities (reviewed products), their complementary entities and complementary relations mentioned in reviews.
Problem Statement: In this paper, we study the problem of Complementary Entity Recognition (CER) from reviews (e.g., extracting "Samsung Galaxy S6" from "It works with my Samsung Galaxy S6"). We observe that compatibility issues are more frequently discussed in reviews of electronics accessories, so we choose reviews of accessories for experiments. To the best of our knowledge, accessory reviews have not been well studied before.

Predicting complementary entities was pioneered by McAuley et al. [5] as a link prediction problem in a social network. Their method mostly predicts category-level compatible products based on the learned representations of the products. However, we observe that reviews contain many complementary entities based on firsthand user experiences, which provide practical fine-grained complementary entities. We detail the discussion of their method in Section II.

The proposed problem has a few challenges and also provides more research opportunities:

• To the best of our knowledge, the linguistic patterns of complementary relations have not been studied in computer science. There is no large annotated dataset for supervised methods. We propose an unsupervised method, which does not require any labeled data to solve this problem (we only annotate a small amount of data for evaluation purposes).

• Similar to the aspect (feature) extraction problem in reviews [4], CER is also a domain-specific problem. We leverage domain knowledge to help the unsupervised method adapt to different products. This novel product domain knowledge is expanded using a few seed words on a large amount of unlabeled reviews under the same category as the target entity. The idea of using reviews under the same category as the target entity is that the number of reviews for one target entity is small. We observe that products (target entities) under the same category share similar complementary entities (i.e., two different micro SD cards may share complementary entities like phone or tablet). So the domain knowledge expanded on reviews from the same category is larger than that expanded on reviews of a single target entity. Therefore, there is almost no labor-intensive effort required to get the domain knowledge. Our domain knowledge contains candidate complementary entities and domain-specific verbs.

• Although the problem may appear closely related to the well-known Named Entity Recognition (NER) problem on the surface [6], recognizing a complementary entity requires more context. For example, given a review of a micro SD card, we should not treat "Samsung Galaxy S6" in "Samsung Galaxy S6 is great" as a complementary entity. However, we should consider the same entity in "It works with my Samsung Galaxy S6" as a complementary entity. The domain knowledge contains domain-specific verbs, which greatly help to detect the contexts of complementary entities.

• We further notice that some linguistic patterns of complementary relations are similar to other extraction patterns (e.g., patterns for aspect extraction). Candidate complementary entities in the domain knowledge can help to filter out non-complementary entities extracted by similar patterns.

The main contributions of this paper can be summarized as follows: we propose a novel problem called Complementary Entity Recognition (CER). Then we propose a novel unsupervised method utilizing dependency paths to identify complementary relations and extract entities simultaneously. We further leverage domain knowledge to improve the precision of extraction. The domain knowledge is expanded on a large amount of unlabeled reviews from only a few seed words (general complementary verbs) via a novel set of dependency paths. The expanded domain knowledge can greatly improve the precision of the unsupervised method.
We conduct thorough experiments and provide case studies to demonstrate that the proposed method is effective.

II. RELATED WORKS
The proposed problem is closely related to product recommender systems that are able to separate substitutes and complements [5], [7]. Zheng et al. [7] first proposed to incorporate the concepts of substitutes and complements into recommender systems by analyzing navigation logs. More specifically, predicting complementary relations was pioneered by McAuley et al. [5]. They utilize topic models and customer purchase information (e.g., the products in the "items also viewed" and "items also bought" sections of a product page) to predict category-level substitutes and complements. However, we observe that the purchase information generated by the unknown algorithm from Amazon.com tends to be noisy and inaccurate for complementary entities, since co-purchased products may not be complementary to each other. We demonstrate that their predictions are non-complementary entities for the products that we use for experiments in Section VI. Also, category-level predictions are not good enough for specific pairs of products (i.e., a DSLR lens and a webcam are not complements). Furthermore, their predictions do not provide information about incompatible entities, which are valuable buying warnings for customers. Thus, fine-grained extraction of complementary entities from reviews that express firsthand user experience is important. To the best of our knowledge, the linguistic patterns of complementary relations have not been studied in computer science.

The proposed problem is closely related to aspect extraction [2]–[4], [8], which is to extract product features from reviews. More specifically, extracting comparable products (i.e., one type of substitutes, or products that can replace each other) from reviews was studied by Jindal and Liu [9]. Recently, dependency paths [10] have been used for aspect extraction [8], [11]. Shu et al. [12] use an unsupervised graph labeling method to identify entities from opinion targets. However, since aspects are mostly context independent and the same aspect may appear multiple times, aspect extraction in general does not need to extract each occurrence of an aspect (as long as the same aspect can be extracted at least once). In contrast, the CER problem is context dependent and many complementary entities are infrequent (i.e., Samsung Galaxy S6 is less frequent than the aspect price). We use dependency paths to accurately identify each occurrence of complementary entities. Since extracting each complementary entity can be inaccurate, we further utilize domain knowledge to improve the precision.

CER is closely related to Named Entity Recognition (NER) [6] and relation extraction [13]. NER methods utilize annotated data to train a sequential tagger [14]–[16]. However, our task is quite different from NER since we care about the context of a complementary entity and many complementary entities are not named entities (e.g., phone). CER is also different from relation extraction [13], [17]–[19], which assumes that two entities are identified in advance. In reviews, the target entity is unfortunately missing in many cases (e.g., "Works with my phone"). The proposed method only cares about the relation context of a complementary entity rather than a full relation.

III. PRELIMINARIES
In this section, we first formally define our problem. Then we introduce the basic ideas of the proposed method. Lastly, we describe the dependency paths used in later sections.
A. Problem Formalization
Our problem is to recognize entities that functionally complement the reviewed product. Several definitions are involved in this problem.
Definition 1 (Target Entity):
We define the target entity e_T as the reviewed product. We do not extract target entities from reviews, but assume that the target entity can be retrieved from the metadata (product title) of the reviews. This is because many mentions of the target entity are co-referenced or implicitly assumed in reviews. For example, if the reviewed product is a tablet stand, "It works with my Samsung Galaxy S6" uses "It" to refer to the target entity tablet stand; "Works well with Samsung Galaxy S6" completely omits the target entity.

Definition 2 (Complementary Entity):
Given a set of reviews R_T of a target entity e_T, a complementary entity e_C is an entity mentioned in the reviews that is functionally complementary to the target entity e_T. A target entity has a set of complementary entities: e_C ∈ E_C.

A complementary entity can either be a single noun (e.g., iPhone) or a noun phrase (e.g., Samsung Galaxy S6). There are two types of complementary entities: named entities and general entities. A named entity is usually a specific product name containing a brand name and a model name (e.g., Samsung Galaxy S6 or Apple iPhone). A general entity (e.g., phone or tablet) represents a set of named entities. General entities are informative. For example, in a review of a tablet stand, "phone" in "It also works with my phone" is a good assurance for phone owners who want to use this tablet stand as a phone stand.

Definition 3 (Complementary Relation):
Each complementary entity e_C ∈ E_C forms a complementary relation (e_T, e_C) with the target entity e_T.

Definition 4 (Complementary Entity Recognition): Given a set of reviews R_T for a target entity e_T, the problem of Complementary Entity Recognition (CER) is to identify the set of complementary entities E_C, where each e_C ∈ E_C has a complementary relation (e_T, e_C) with the target entity e_T.

We do not extract an entity without a complementary context (e.g., "Samsung Galaxy S6" in "Samsung Galaxy S6 is great", even though Samsung Galaxy S6 may be a complementary entity).
Definition 5 (Domain):
We assume that every target entity e_T belongs to a pre-defined domain (or category) Dom(e_T) = d ∈ D. A review corpus R_Dom(e_T) is all reviews under the same category as the target entity e_T.

Definition 6 (Domain Knowledge): Each domain d has its own domain knowledge. We consider two types of domain knowledge: candidate complementary entities e_C^d ∈ E_C^d and domain-specific verbs v^d ∈ V^d. All target entities e_T under the same domain share the same domain knowledge.

B. Basic Ideas
The basic idea of the proposed method is to use dependency paths to identify complementary entities. Due to different linguistic patterns, these dependency paths may have different extraction performance. Some dependency paths may have high precision but low recall, and vice versa. To ensure the quality of extraction, high precision dependency paths are preferred. The idea of using domain knowledge is that high precision dependency paths can expand high quality (precision) domain knowledge on a large amount of unlabeled reviews, which in turn helps low precision but high recall dependency paths improve their precision. In the end, the domain knowledge serves as a filter to remove noise from low precision paths. This framework can potentially be generalized to any extraction task where a large amount of unlabeled data is accessible. We describe the proposed method in the following two parts:
Basic Entity Recognition: We analyze the linguistic patterns and leverage multiple dependency paths to recognize complementary entities. The major goal of basic entity recognition is to get high recall, because each complementary entity can be infrequent and we care about each mention of a complementary entity. Due to their similarity with other noisy patterns, these paths tend to have low precision.
Recognition via Domain Knowledge Expansion: We expand the domain knowledge on a large amount of unlabeled reviews using a set of high precision dependency paths to compensate for the low precision (noisy) dependency paths. First, we extract candidate complementary entities for each domain using only the verbs fit and work. Then we use the extracted candidate complementary entities to induce domain-specific verbs (e.g., insert for micro SD card, or hold for tablet stand). Finally, we integrate these two types of domain knowledge into the dependency paths of basic entity recognition to improve the precision.

C. Dependency Paths
In this subsection, we briefly review the concepts used by dependency paths. We further describe how to match a dependency path with a sentence.
Definition 7 (Dependency Relation): A dependency relation is a typed relation between two words in a sentence with the following format of attributes:

type(gov, govidx, govpos, dep, depidx, deppos),

where type is the type of the dependency relation, gov is the governor word, govidx is the index (position) of the gov word in the sentence, govpos is the POS (Part-Of-Speech) tag of the gov word, dep is the dependent word, depidx is the index of the dep word in the sentence and deppos is the POS tag of the dep word. The direction of a dependency relation is from the gov word to the dep word.

A sentence can be parsed into a set of dependency relations through dependency parsing [10], [20]. For example, "It works with my phone" can be parsed into the set of dependency relations in Table I, which is further illustrated in Figure 2.

Figure 2: Visualization of the dependency relations of "It works with my phone": numbers indicate indices.

Definition 8 (Dependency Segment): A dependency segment is an abstract form of a dependency relation. A dependency segment has the following format of attributes, which is similar to a dependency relation:

(src, srcpos) --pathtype--> (dst, dstpos),

where src is the source word, srcpos is the POS tag of the source word, dst is the destination word, dstpos is the POS tag of the destination word and pathtype is the dependency type of the segment. Similarly, the direction of a segment is from the src word to the dst word.

Definition 9 (Dependency Segment Matching):
A dependency segment can have a dependency segment matching with a dependency relation. To have such a match, we must ensure that the attributes src, srcpos, dst, dstpos and pathtype in the segment match the attributes gov, govpos, dep, deppos and type in the dependency relation, respectively. The direction of the dependency segment must also match the direction of the dependency relation. (We utilize Stanford CoreNLP as the tool for dependency parsing.)

To allow a matching to cover more specific dependency relations, we further define a set of rules for matching the attributes, which are summarized in Table II. Please note that we finally want to extract the complementary entity covered by the tag CETT. The other kinds of attributes are defined to make the dependency paths more compact.

Table II: Rules for matching the attributes of dependency segments and dependency relations (all unspecified attributes must have an exact match): [lem. word] means lemmatized word, which matches multiple specific forms of the same word (e.g., "work" matches "works" and "working"); CETT indicates the complementary entity we want to extract.

Path Attr.      Value         Rel. Attr.      Value
src/dst         [lem. word]   gov/dep         [specific form]
src/dst         * / CETT      gov/dep         [any word]
srcpos/dstpos   N             govpos/deppos   NN NNP NNPS NP
srcpos/dstpos   V             govpos/deppos   VB VBD VBG VBN VBP VBZ
srcpos/dstpos   J             govpos/deppos   JJ JJR JJS
pathtype        nmod:cmprel   type            nmod:with nmod:for nmod:in nmod:on nmod:to nmod:inside nmod:into
Example 1: The segment

("work", V) --nmod:cmprel--> (CETT, N)     (1)

can match dependency relation 5 in Table I. This is because the source word "work" is the lemmatized form of the governor word "works"; V covers VBZ; N covers NN; and nmod:cmprel covers the dependency type nmod:with. Since the tag CETT as the destination word of the segment covers the dependent word "phone" in dependency relation 5, this segment indicates that "phone" is a possible complementary entity.
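To make Definition 9 and the Table II rules concrete, the following is a minimal Python sketch of segment matching over relation tuples (not the authors' implementation); the toy `lemma` function is a hypothetical stand-in for a real lemmatizer such as the one in an NLP toolkit.

```python
# A minimal sketch of dependency-segment matching (Definition 9),
# assuming relations are stored as plain tuples.

# Coarse tag classes and preposition set from Table II.
POS_CLASSES = {
    "N": {"NN", "NNP", "NNPS", "NP"},
    "V": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    "J": {"JJ", "JJR", "JJS"},
}
CMPREL_TYPES = {"nmod:with", "nmod:for", "nmod:in", "nmod:on",
                "nmod:to", "nmod:inside", "nmod:into"}

def lemma(word):
    # Hypothetical toy lemmatizer; a real system would use a proper one.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def word_matches(seg_word, rel_word):
    if seg_word in ("*", "CETT"):            # wildcard / extraction slot
        return True
    return seg_word == lemma(rel_word)        # [lem. word] rule

def pos_matches(seg_pos, rel_pos):
    return rel_pos in POS_CLASSES.get(seg_pos, {seg_pos})

def type_matches(seg_type, rel_type):
    if seg_type == "nmod:cmprel":             # cmprel covers all prepositions
        return rel_type in CMPREL_TYPES
    return seg_type == rel_type

def segment_matches(segment, relation):
    """segment:  (src, srcpos, pathtype, dst, dstpos)
       relation: (type, gov, govidx, govpos, dep, depidx, deppos)"""
    src, srcpos, pathtype, dst, dstpos = segment
    rtype, gov, _, govpos, dep, _, deppos = relation
    return (type_matches(pathtype, rtype)
            and word_matches(src, gov) and pos_matches(srcpos, govpos)
            and word_matches(dst, dep) and pos_matches(dstpos, deppos))

# Example 1: ("work", V) --nmod:cmprel--> (CETT, N) vs. relation 5 of Table I.
seg = ("work", "V", "nmod:cmprel", "CETT", "N")
rel = ("nmod:with", "works", 2, "VBZ", "phone", 5, "NN")
print(segment_matches(seg, rel))  # True
```

Under this sketch, relation 1 of Table I (nsubj) would fail the `type_matches` check against the same segment, which is exactly the behavior Definition 9 requires.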
Definition 10 (Dependency Path): A dependency path is a finite sequence of dependency segments connected by a sequence of src/dst attributes. Given the different directions of two adjacent dependency segments, there are 4 possible types of connection: →→, →←, ←→ and ←←.

Definition 11 (Dependency Path Matching): The procedure of dependency path matching is specified as follows: when matching a dependency path with a sentence, we first check whether there is at least one dependency relation for each segment. If so, we further check whether the two directions of the dependency segments at each connection match the directions of the two corresponding dependency relations, and whether the connected governor/dependent words from the two matched dependency relations have the same index (i.e., they are the same word in the original sentence).

Table I: Dependency relations parsed from "It works with my phone."

ID  Dependency Relation                      Type                                    Explanation
1   nsubj(works, 2, VBZ, It, 1, PRP)         nsubj: nominal subject                  Relates the 1st word "It" to the 2nd word "works"
2   root(ROOT, 0, None, works, 2, VBZ)       root: root relation                     Relates the 2nd word "works" to the virtual word ROOT
3   case(phone, 5, NN, with, 3, IN)          case: case-marking                      Relates the 3rd word "with" to the 5th word "phone"
4   nmod:poss(phone, 5, NN, my, 4, PRP$)     nmod:poss: possessive nominal modifier  Relates the 4th word "my" to the 5th word "phone"
5   nmod:with(works, 2, VBZ, phone, 5, NN)   nmod:with: nominal modifier via with    Relates the 5th word "phone" to the 2nd word "works"

Finally, after we have a successful dependency path matching, we extract the gov/dep in the dependency relations labeled as CETT by the dependency path.
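The path-matching procedure of Definition 11 can be sketched as follows, assuming sentences are already parsed into relation tuples; this minimal illustration handles only forward (→→) connections and simplifies the Table II rules to plain word-equality checks, so it is a sketch rather than the authors' implementation.

```python
# A minimal sketch of dependency-path matching (Definition 11)
# over the Table I relations of "It works with my phone".

RELATIONS = [  # (type, gov, govidx, dep, depidx) from Table I
    ("nsubj", "works", 2, "It", 1),
    ("case", "phone", 5, "with", 3),
    ("nmod:poss", "phone", 5, "my", 4),
    ("nmod:with", "works", 2, "phone", 5),
]

def match_path(path, relations):
    """path: list of (srcword, pathtype, dstword); '*' and 'CETT' are wildcards.
    Returns the word tagged CETT if the whole path matches, else None.
    Only the forward (src-to-dst) connection type is handled here."""
    cett = None
    prev_dst_idx = None
    for src, ptype, dst in path:
        hit = None
        for rtype, gov, gidx, dep, didx in relations:
            if rtype != ptype:
                continue
            if src not in ("*", "CETT") and src != gov:
                continue
            if dst not in ("*", "CETT") and dst != dep:
                continue
            # Connected segments must meet at the same word index.
            if prev_dst_idx is not None and gidx != prev_dst_idx:
                continue
            hit = (gov, gidx, dep, didx)
            break
        if hit is None:
            return None
        if src == "CETT":
            cett = hit[0]
        if dst == "CETT":
            cett = hit[2]
        prev_dst_idx = hit[3]
    return cett

# A two-segment path: (*) --nmod:with--> (CETT) --nmod:poss--> ("my")
path = [("*", "nmod:with", "CETT"), ("CETT", "nmod:poss", "my")]
print(match_path(path, RELATIONS))  # "phone"
```

The index check (`gidx != prev_dst_idx`) is what enforces that the word ending one segment and the word starting the next are the same token of the sentence.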
Example 2: The path

(*, V) --nmod:with--> (CETT, N) --nmod:poss--> ("my", PRP$)     (2)

can match the sentence "It works with my phone", since the two segments match dependency relations 5 and 4, respectively. Here the wildcard * matches the word "works". Furthermore, the dependent word "phone" of dependency relation 5 has the same index (the 5th word, as described in Table I) as the governor word of dependency relation 4.

IV. BASIC ENTITY RECOGNITION
A. Syntactic Patterns of Complementary Relation
There are many ways to mention complementary relations in reviews. Complementary relations are usually expressed with or without a preposition. In the first case, the preposition is used to bring out the complementary entity and is usually associated with a verb, a noun, an adjective or a determiner; in the second case, without a preposition, reviewers only use transitive verbs to bring out the complementary entities. The verbs used in both cases can either be general verbs such as "fit" or "work", or domain-specific verbs such as "insert" for micro SD card or "hold" for tablet stand. Complementary relations can also be expressed through nouns, adjectives or determiners. We discuss the syntactic patterns of complementary relations as follows:
Verb+Prep: The majority of complementary relations are expressed through a verb followed by a preposition. For example, "It works with my phone" falls into this pattern, where the verb "works" and the preposition "with" work together to relate the pronoun "It" to "phone". The target entity can appear in this pattern either as the subject or as the object of the verb. In the previous example, the subject "It" indicates the target entity. In "I insert the card into my phone", "the card" is the object of the verb "insert". The target entity can also be implicitly assumed, as in "Works with my phone."
Noun+Prep: Complementary relations can be expressed through nouns. Those nouns typically carry opinions. For example, "No problem" in "No problem with my phone" expresses a positive opinion on "phone".
Adjective+Prep: Complementary relations can also be expressed through adjectives with prepositions. For example, the adjective "useful" together with the preposition "for" in "It is useful for my phone" expresses a positive opinion on a complementary relation.
Determiner+Prep: The determiner "this" in "I use this for my phone" refers to the target entity. It is associated with the preposition "for" in dependency parsing.
Verb: Complementary relations can be expressed through verbs alone, without any preposition. For example, in "It fits my phone", the subject "It" is related to the object "phone" via only the transitive verb "fits". This pattern has low precision on extraction, since almost every sentence has a subject, a verb and an object. We improve the precision of this pattern using the domain knowledge in Section V.
B. Dependency Paths for Extraction
According to the discussed patterns, we implement dependency paths, which are summarized in Table III. For the patterns with a preposition (Verb+Prep, Noun+Prep, Adjective+Prep, Determiner+Prep), we use the dependency type nmod:cmprel to encode all prepositions, because cmprel represents with, for, in, on, to, inside and into, as described in Section III. The type nmod:cmprel can then relate verbs, nouns, adjectives or determiners to the complementary entities. As shown in Examples 1 and 2, nmod:cmprel can match nmod:with and relates the verb "works" to the complementary entity "phone" for dependency relation 5 in Table I. This path is defined as Path 1 in Table III.

Table III: Summary of dependency paths: CETT indicates the complementary entity we want to extract; verb indicates any verb for Section IV or domain-specific verbs for Section V.

Path Type        ID  Path                                                           Example
Verb+Prep        1   (verb, V) --nmod:cmprel--> (CETT, N)                           It works/V with my phone[CETT].
Noun+Prep        2   (*, N) --nmod:cmprel--> (CETT, N)                              No problem/N with my phone[CETT].
Adjective+Prep   3   (*, J) --nmod:cmprel--> (CETT, N)                              It is compatible/J with my phone[CETT].
Determiner+Prep  4   (*, DT) --nmod:cmprel--> (CETT, N)                             I use this/DT for my phone[CETT].
Verb             5   (verb, V) --dobj--> (CETT, N) --nmod:poss--> ("my", PRP$)      It fits my phone[CETT].
                 6   ("it"/"this", DT) <--nsubj-- (verb, V) --dobj--> (CETT, N)     It fits iPhone[CETT].

For the pattern Verb, we use the dependency type dobj to relate a verb to the complementary entity. Since this pattern tends to have low precision, we further constrain it by connecting an nmod:poss relation or an nsubj relation, as described in Path 5 and Path 6 of Table III, respectively. For example, "It fits iPhone" has the following two dependency relations: nsubj("fits", VBZ, 2, "It", PRP, 1) and dobj("fits", VBZ, 2, "iPhone", NNP, 3). Path 6 can match these two dependency relations separately and then check that the two occurrences of "fits" have the same index in the two dependency relations. So "iPhone", tagged as CETT, can be extracted.

Finally, these paths may appear multiple times in a sentence, so multiple complementary entities can be extracted from one sentence. For example, "It works with my phone, laptop and tablet" has 3 complementary entities. It has the following 3 dependency relations: nmod:with("works", VBZ, 2, "phone", NN, 5), nmod:with("works", VBZ, 2, "laptop", NN, 7) and nmod:with("works", VBZ, 2, "tablet", NN, 9). So Path 1 can have 3 matches to extract "phone", "laptop" and "tablet".

Please note that Table III does not list all possible dependency paths. For example, complementary entities can also serve as the subject of a sentence: "My phone likes this card". We simply demonstrate typical dependency paths, and new dependency paths can easily be added to the system to improve the recall.

C. Post-processing
Since a dependency relation can only handle the relation between two individual words, a complementary entity (labeled by CETT) extracted in Subsection B can only contain a single word. In reality, many complementary entities are named entities that represent product names, such as "Samsung/NNP Galaxy/NNP S6/NNP". Dependency relations usually pick a single noun (e.g., "S6") and relate it to the other words in the phrase via other dependency relations (e.g., type compound). We use the regular expression pattern ⟨N⟩⟨N|CD⟩* to chunk a single noun into a noun phrase. This pattern means one noun (N) followed by 0 to many nouns or numbers (CD). Nouns and numbers (model numbers) are typical POS tags of words in a product name.

V. RECOGNITION VIA DOMAIN KNOWLEDGE EXPANSION
Using the paths defined in Section IV alone tends to yield low precision (noisy) extractions, since syntactic patterns may not distinguish a complementary relation from other relations. For example, Path 6 can match any sentence with type dobj. A sentence like "It has fast speed" uses type dobj to bring out "speed", which is a feature of the target entity itself. To improve the precision, we incorporate category-level domain knowledge (candidate complementary entities and domain-specific verbs) into the extraction process. This knowledge can help to constrain the possible choices of CETT and verb in the dependency paths defined in Section IV.

We mine domain knowledge from a large amount of unlabeled reviews under the same category. We obtain the two types of domain knowledge by bootstrapping them from only the general verbs fit and work. We randomly select 6000 reviews for each domain (category) to accumulate enough knowledge (knowledge from the reviews of a single target entity may not be sufficient). One important observation is that products under the same domain share similar complementary entities and use similar domain-specific verbs. For example, all micro SD cards have camera, camcorder, phone, tablet, etc. as their complementary entities and use verbs like insert to express complementary relations. But these complementary entities and domain-specific verbs do not make sense for the category tablet stand. To ensure the quality of the domain knowledge, we utilize several high precision dependency paths. These paths have low recall, so applying them directly to the testing reviews of the target entity gives poor performance. Instead, the high precision paths leverage big data to improve the precision of the other paths from Section IV.

A. Exploiting Candidate Complementary Entities
Knowing category-level candidate complementary entities is important for extracting complementary entities for a target entity under that category. For example, the sentences "It works in iPhone", "It works in practice" and "It works in 4G" have similar dependency relations nmod:in("works", VBZ, 2, "iPhone"/"practice"/"4G", NN, 4). But only the first sentence mentions a complementary entity; the second sentence has the common phrase "in practice" with the preposition "in"; the third sentence expresses an aspect of the target entity. The key idea is that if we know that iPhone is a potential complementary entity under the category of micro SD card and "practice" and "4G" are not, we are confident to extract "iPhone" as a complementary entity.

Table IV: Summary of dependency paths for extracting Candidate Complementary Entities (CCE) and Domain-Specific Verbs (DSV)

Type  ID  Path                                                                       Example
CCE   7   ("fit"/"work", V) --nmod:cmprel--> (CETT, N) --nmod:poss--> ("my", PRP$)   It works with my phone[CETT].
DSV   8   (verb, V) --nmod:cmprel--> (CETT, N) --nmod:poss--> ("my", PRP$)           I insert[verb] the card into my phone[CETT].
      9   ("this", DT) <--dobj-- (verb, V) --nmod:poss--> ("my", PRP$)               This holds[verb] my phone[CETT] well.

We use Path 7 to extract candidate complementary entities, as described in Table IV. It has high precision because given a verb like "fit" or "work", a preposition that relates to another entity and the possessive pronoun "my", we are confident that the entity modified by "my" is a complementary entity. Lastly, all extracted candidate complementary entities are stored as domain knowledge for each category.
B. Exploiting Domain-Specific Verbs
Similarly, knowing category level domain-specific verbsis also important. This is because each category of productsmay have its own domain verbs to describe a complementaryrelation. If we only use general verbs (e.g., fit and work ),we may miss many complementary entities that are bring outvia domain-specific verbs (e.g., insert for micro SD card or hold for tablet stand ), and this leads to poor recall rate.In contrast, if we consider all verbs into the paths withoutdistinguishing them as in Section IV, we may bring in lotsof noisy false positives. For example, if the target entityis a tablet stand , “It holds my tablet” and “It preventsmy finger going numb” have similar dependency relations( dobj(“holds”/“prevents”, VB, 2, “tablet”/“finger”, NN,4) ). The former one has a complementary entity since“holds” indicates a functionality that a tablet stand can have.The latter one does not have one. So if we know hold (we lemmatize the verbs) is a domain-specific verb underthe category of tablet stand and “prevents” is not, we aremore confident to get rid of the latter one. Therefore, wedesign dependency paths to extract high quality domain-specific verbs. This time, candidate complementary entitiescan help to identify whether a verb has a semantic meaningof complement . So we leverage the domain knowledgeextracted in Subsection A to extract domain-specific verbs.In the end, we get domain-specific verbs from general seedverbs fit and work .Path 8 and 9 in Table IV are used to get verbs in patternVerb+Prep and Verb respectively. These paths also have highprecision because given possessive modifier “my” modifyinga complementary entity or determiner “this” indicating atarget entity it is almost certain that the verb between themindicates a complementary relation. Then we keep the wordstagged by verb more than once (to reduce the noise) andstore them as domain knowledge. 
Please note that we do not further expand domain knowledge, to avoid reducing the quality of the domain knowledge.

Table V: Statistics of the annotated dataset: the number of reviews, the number of sentences, the number of complementary relations, and the number of reviews with complementary relations.

Product          Revs.  Sents.  Rel.  Revs. w/ Rels.
Stylus           216    892     165   116
Micro SD Card    216    802     193   149
Mouse            216    1158    221   136
Tablet Stand     218    784     154   115
Keypad           114    618     113   76
Notebook Sleeve  109    405     125   84
Compact Flash    113    347     99    82
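The "kept only if tagged more than once" noise filter from the previous subsection amounts to a frequency threshold over the lemmatized verbs harvested by Paths 8 and 9. A minimal sketch (the harvested list and threshold are illustrative):

```python
# Keep only verbs tagged by the patterns at least min_count times;
# rare one-off matches are treated as noise. Illustrative sketch.
from collections import Counter

def filter_domain_verbs(tagged_verbs, min_count=2):
    """tagged_verbs: lemmatized verbs harvested from a category's reviews."""
    counts = Counter(tagged_verbs)
    return {v for v, c in counts.items() if c >= min_count}

harvested = ["hold", "hold", "prevent", "support", "support", "hold"]
print(sorted(filter_domain_verbs(harvested)))  # ['hold', 'support']
```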
C. Entity Extraction using Domain Knowledge
We use the same dependency paths as in Section IV to perform extraction, but this time we utilize the knowledge of candidate complementary entities and domain-specific verbs under the same category as the target entity. During matching, we look up candidate complementary entities and domain-specific verbs for the tags CETT and verb, respectively. But there is an exception for CETT. Since a named entity serving as a complementary entity may rarely appear again in a large amount of reviews, we skip this check if the word covered by CETT can be expanded into a noun phrase (more than one word) during post-processing. Furthermore, we notice that knowledge about target entities is also useful. For example, "I insert this card into my phone" uses "this" to bring out the target entity, which may indicate that nearby entities are complementary entities. However, knowledge about a target entity must be expanded on reviews of that target entity (test data) rather than on reviews under the same category, because target entities differ within the same category.

VI. EXPERIMENTAL RESULTS
A. Dataset
We select reviews of 7 products that have frequent mentions of complementary relations from the Amazon review datasets [5]. We choose accessories because compatibility issues are more frequently discussed in accessory reviews. The products are stylus, micro SD card, mouse, tablet stand, keypad, notebook sleeve and compact flash. We select nearly 220 reviews for the first 4 products and 110 reviews for the last 3 products. We select 50% of the reviews of the first 4 products as the training data for Conditional Random Field (CRF) (one supervised baseline). The remaining reviews of the first 4 products and all reviews of the last 3 products are test data. We split the training/testing data 5 times and average the results. We label complementary entities in each sentence. The whole dataset is labeled by 3 annotators independently. The initial agreement is 82%; disagreements are then discussed and final agreements are reached. The statistics of the dataset can be found in Table V. We observe that more than half of the reviews have at least one mention of complementary entities and more than 10% of the sentences have at least one mention of complementary entities.

We also utilize the category information in the metadata of each review to group reviews under the same category together. Then we randomly select 1000 (1K), 3000 (3K) and 6000 (6K) reviews from each category and use them for extracting domain knowledge. We choose different scales of reviews to see the performance of CER with the help of different sizes of domain reviews, and the scalability of the running time of domain knowledge expansion.

Table VI: Comparison of different methods in precision (P), recall (R) and F1-score (F)

Product          NP Chunker        OpenNLP           UIUC NER          CRF               Sceptre
                 P    R    F       P    R    F       P    R    F       P    R    F       P@25
Stylus           0.21 0.96 0.35    0.03 0.13 0.05    0.41 0.21 0.28    0.69 0.46 0.55    0.04
Micro SD Card    0.26 0.99 0.41    0.04 0.14 0.07    0.34 0.39 0.36    0.85 0.47 0.60    0.16
Mouse            0.22 0.98 0.36    0.10 0.40 0.15    0.30 0.26 0.28    0.65 0.40 0.49    0.16
Tablet Stand     0.25 0.97 0.40    0.06 0.21 0.09    0.82 0.16 0.27    0.73 0.44 0.55    0.04
Keypad           0.20 0.98 0.33    0.05 0.21 0.08    0.40 0.25 0.31    0.63 0.24 0.35    0.04
Notebook Sleeve  0.33 0.97 0.50    0.05 0.10 0.06    0.79 0.26 0.40    0.64 0.26 0.37    0.00
Compact Flash    0.30 0.95 0.46    0.06 0.16 0.09    0.56 0.36 0.44    0.77 0.33 0.46    0.04

Product          "My" Entity       CER               CER1K+            CER3K+            CER6K+
                 P    R    F       P    R    F       P    R    F       P    R    F       P    R    F
Stylus           0.50 0.54 0.52    0.35 0.89 0.50    0.89 0.64 0.75    0.88 0.69 0.77    0.86 0.71 —
Micro SD Card    0.63 0.51 0.56    0.39 0.80 0.52    0.81 0.64 0.71    0.79 0.66 0.72    0.80 0.67 —
Mouse            0.54 0.37 0.44    0.35 0.91 0.50    0.69 0.69 0.69    0.66 0.70 0.68    0.66 0.72 —
Tablet Stand     0.58 0.43 0.49    0.41 0.84 0.55    0.68 0.39 0.50    0.75 0.69 0.72    0.75 0.72 —
Keypad           0.54 0.46 0.50    0.33 0.92 0.49    0.66 0.67 0.66    0.67 0.73 0.70    0.69 0.82 —
Notebook Sleeve  0.69 0.38 0.49    0.46 0.71 0.56    0.93 0.50 0.65    0.93 0.65 0.76    0.92 0.66 —
Compact Flash    0.75 0.61 0.67    0.46 0.88 0.60    0.86 0.63 0.73    0.86 0.68 0.76    0.85 0.70 —

Table VII: Running time (in seconds (s)) of expanding domain knowledge from 1K, 3K and 6K reviews, with samples of candidate complementary entities and domain-specific verbs

Category             1K(s)  3K(s)  6K(s)  Candidate Complementary Entity               Domain-Specific Verbs
Cat:Stylus           1.16   4.53   7.49   ipad 2, tablet, iPhone, Samsung Galaxy 2     scratch, match, press, draw, sketch, sign
Cat:Micro SD Card    1.23   3.67   5.58   laptop, psp, galaxy s4, Galaxy tab           add, insert, plug, transfer, store, stick
Cat:Mouse            1.61   5.10   7.71   Macbook pro, laptop bag, MacBook Air         move, rest, carry, connect, click
Cat:Tablet Stand     1.51   4.08   6.93   Nook, ipad 2, Kindle Fire, Galaxy tab, fire  rest, insert, stand, support, hold, sit
Cat:Keypad           1.25   2.93   6.17   MacBook, MacBook pro, Mac                    hook, connect, go, need, use, fit, plug
Cat:Notebook Sleeve  1.11   2.79   5.46   backpack, Macbook pro, Lenovo x220           show, scratch, bring, feel, protect
Cat:Compact Flash    1.49   3.29   6.45   dslr, Canon rebel, Nikon d700                load, pop, format, insert, put

B. Compared Methods and Evaluation
Since the proposed problem is novel, there are few existing baselines that can directly solve it. Except for CRF, we compare existing trained models or unsupervised methods with the proposed methods.
NP Chunker: Since most product names are noun phrases (NP), we use the same noun phrase chunker (⟨N⟩⟨N|CD⟩*) as the proposed method to extract nouns or noun phrases and take them as names of complementary entities. This baseline is used to illustrate close-to-random results.

OpenNLP NP Chunker: We utilize the trained noun phrase chunking model from OpenNLP (https://opennlp.apache.org/) to tag noun phrases. We only consider chunks of words tagged as NP as predictions of complementary entities.

UIUC NER: We use the UIUC Named Entity Tagger [21] to perform Named Entity Recognition (NER) on product reviews. It has 18 labels in total, and we consider entities labeled as PRODUCT and ORG as complementary entities. We use this baseline to demonstrate the performance of a named entity tagger.
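The ⟨N⟩⟨N|CD⟩* chunk pattern used by the NP Chunker baseline (a noun followed by any number of nouns or cardinal numbers, so that "ipad 2" or "tablet stand" forms one chunk) can be sketched by matching a regular expression over a POS-tag string. POS tagging itself is assumed to have been done upstream; this is an illustration, not the paper's implementation.

```python
# Chunk <N><N|CD>* over POS-tagged tokens: map each tag to a single
# letter (N = noun, C = cardinal number, O = other), then find runs
# that start with a noun. Illustrative sketch.
import re

def np_chunks(tagged):
    """tagged: list of (word, tag) pairs; returns noun-phrase chunks."""
    tags = "".join("N" if t.startswith("NN") else ("C" if t == "CD" else "O")
                   for _, t in tagged)
    return [" ".join(w for w, _ in tagged[m.start():m.end()])
            for m in re.finditer(r"N[NC]*", tags)]

sent = [("It", "PRP"), ("fits", "VBZ"), ("my", "PRP$"), ("ipad", "NN"), ("2", "CD")]
print(np_chunks(sent))  # ['ipad 2']
```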
CRF: We retrain a Conditional Random Field (CRF) model using 50% of the reviews of the first 4 products. We use BIO tags. For example, "Works with my Apple iPhone" should be trained/predicted as "Works/O with/O my/O Apple/B iPhone/I". We use MALLET (http://mallet.cs.umass.edu/) as the implementation of CRF.

Sceptre: We also retrieve the top 25 complements for the same 7 products from Sceptre [5] and adapt their results for a comparison. Direct comparison is impossible, since their task is a link prediction problem with different labeled ground truths. We label and compute the precision of the top 25 predictions, assuming annotators have the same background knowledge for both datasets. We observe that the predicted products are mostly non-complementary products (e.g., network cables, mother board) and all 7 products have similar predictions.

"My" Entity: This baseline extracts complementary entities by finding all nouns/noun phrases modified by the word "my" via the dependency type nmod:poss (e.g., "It works with my phone"). The word "my" usually indicates a product already purchased, so the modified nouns/noun phrases are very likely complementary entities. We use the path (CETT, N) --nmod:poss--> ("my", PRP$) to extract complementary entities and apply the same post-processing step as CER/CER1K/3K/6K+.

CER: This method uses all paths described in Section IV without using any domain knowledge.
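The BIO encoding used for the CRF baseline can be sketched as a small helper that maps an annotated entity span onto per-token tags (B for the first token of the entity, I for the rest, O elsewhere); the function and its interface are illustrative, not the MALLET pipeline itself.

```python
# BIO-encode one annotated complementary-entity span over a token list.
# Illustrative sketch of the tagging scheme, not the paper's code.

def to_bio(tokens, entity_span):
    """entity_span: (start, end) token indices, end exclusive."""
    start, end = entity_span
    tags = []
    for i, _ in enumerate(tokens):
        if i == start:
            tags.append("B")
        elif start < i < end:
            tags.append("I")
        else:
            tags.append("O")
    return tags

tokens = ["Works", "with", "my", "Apple", "iPhone"]
print(to_bio(tokens, (3, 5)))  # ['O', 'O', 'O', 'B', 'I']
```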
CER1K+, CER3K+, CER6K+: These methods incorporate domain knowledge extracted from 1000/3000/6000 domain reviews, respectively, as described in Sections IV and V.

We perform our evaluation on each mention of complementary entities and compute the precision and recall of extraction. We first count the true positives (tp), false positives (fp) and false negatives (fn) of each prediction. For each sentence, an extracted complementary entity that is contained in the annotated complementary entities of the sentence counts as one tp; an extracted complementary entity that is not found contributes one count to fp; any annotated complementary entity that is not extracted contributes one count to fn. We run the system on an i5 laptop with 4GB of memory. The system is implemented in Python. All reviews are preprocessed via dependency parsing [20].

C. Result Analysis
Table VI shows the results of the different methods. We can see that CER6K+ performs well on all products. It significantly outperforms CER for each product, which shows that domain knowledge can successfully reduce the noise and improve the precision. More importantly, we notice that using just 3K reviews already gives good performance. This is important for categories with fewer than 6K reviews. We notice that the F1-scores of CER are close to or worse than baselines such as CRF or "My" Entity. The major reason for its low precision is that Paths 5 and 6 in Table III can introduce many false positives, as we expected. Please note that removing Paths 5 and 6 would increase the F1-score of CER; but to allow a fair comparison with CER1K/3K/6K+ and demonstrate the room for improvement, we keep the noisy Paths 5 and 6 in CER. "My" Entity has better precision but lower recall than the CER baselines, since not all complementary entities are modified by "my". CRF performs relatively well on the first 4 products, but its performance drops for the last 3 products because of the domain adaptation problem; in reality, it is impractical to have training data for each product. Sceptre performs poorly; we suspect the reason is that the products in "Items also bought" are noisy as training labels. The overall recall of UIUC NER is low because many complementary entities (e.g., general entities like tablet) are not named entities. Please note that domain knowledge (or unlabeled data) might also help other baselines, but those baselines cannot easily adopt domain knowledge. The running time of all testing is short (less than 1 second), so we omit its discussion here.

Next, we report the running time of domain knowledge expansion and samples of the resulting domain knowledge in Table VII. We observe that expanding knowledge is fast and scales well as the number of reviews grows. We can see that for each category most entities and verbs are reasonable based on common sense.
For example, for the category Cat:Stylus, the system successfully detects capacitive-screen devices as candidate complementary entities and most drawing actions as domain-specific verbs.
D. Case Studies
We notice that category-level domain knowledge is useful for extraction. Knowing candidate complementary entities can successfully remove many words that are not complementary entities, or not even entities. In the reviews of micro SD card, many features such as speed, data, etc. are mentioned; common phrases like "in practice", "in reality" and "in the long run" also appear. Handling these cases one by one is impractical, since different types of false positives require different techniques to identify. But knowing candidate complementary entities easily removes these false positives.

Domain-specific verbs such as draw, insert and hold are successfully mined for stylus, micro SD card and tablet stand, respectively. Taking tablet stand as an example, the significant improvement in the precision of CER1K/3K/6K+ comes from taking hold as a domain-specific verb. Reviewers are less likely to use general verbs such as fit or work for a tablet stand; the reason could be that a tablet is only loosely attached to a tablet stand, so people tend to use "It holds tablet well" a lot. However, this sentence has a dobj relation, which usually relates a verb to an object and can appear in almost any sentence. Knowing that hold is a domain-specific verb is important for improving the precision. The major remaining errors come from parsing errors, since reviews are informal texts.

VII. CONCLUSION
In this paper, we propose the problem of CER. We then propose an unsupervised method using dependency paths to solve this problem. It further incorporates domain knowledge mined from a large amount of unlabeled reviews to improve its precision. Applications of our work can be found in mining compatible/incompatible products, which is useful for customers, manufacturers and recommender systems. A future direction of our work is mining opinions on complementary relations.

ACKNOWLEDGMENT
This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

REFERENCES

[1] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up?: sentiment classification using machine learning techniques," in Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10. Association for Computational Linguistics, 2002, pp. 79–86.
[2] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004, pp. 168–177.
[3] A.-M. Popescu and O. Etzioni, "Extracting product features and opinions from reviews," in Natural Language Processing and Text Mining. Springer, 2007, pp. 9–28.
[4] B. Liu, Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, 2015.
[5] J. McAuley, R. Pandey, and J. Leskovec, "Inferring networks of substitutable and complementary products," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 785–794.
[6] D. Nadeau and S. Sekine, "A survey of named entity recognition and classification," Lingvisticae Investigationes, vol. 30, no. 1, pp. 3–26, 2007.
[7] J. Zheng, X. Wu, J. Niu, and A. Bolivar, "Substitutes or complements: another step forward in recommendations," in Proceedings of the 10th ACM Conference on Electronic Commerce. ACM, 2009, pp. 139–146.
[8] G. Qiu, B. Liu, J. Bu, and C. Chen, "Opinion word expansion and target extraction through double propagation," Computational Linguistics, vol. 37, no. 1, pp. 9–27, 2011.
[9] N. Jindal and B. Liu, "Mining comparative sentences and relations," in AAAI, vol. 22, 2006, pp. 1331–1336.
[10] S. Kübler, R. McDonald, and J. Nivre, "Dependency parsing," Synthesis Lectures on Human Language Technologies, vol. 1, no. 1, pp. 1–127, 2009.
[11] Q. Liu, Z. Gao, B. Liu, and Y. Zhang, "Automated rule selection for aspect extraction in opinion mining," in International Joint Conference on Artificial Intelligence (IJCAI), 2015.
[12] L. Shu, B. Liu, H. Xu, and A. Kim, "Lifelong-RL: Lifelong relaxation labeling for separating entities and aspects in opinion targets," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
[13] N. Bach and S. Badaskar, "A review of relation extraction," Literature Review for Language and Statistics II, 2007.
[14] L. R. Rabiner and B.-H. Juang, "An introduction to hidden Markov models," ASSP Magazine, IEEE, vol. 3, no. 1, pp. 4–16, 1986.
[15] A. McCallum, D. Freitag, and F. C. Pereira, "Maximum entropy Markov models for information extraction and segmentation," in ICML, vol. 17, 2000, pp. 591–598.
[16] J. Lafferty, A. McCallum, and F. C. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28 - July 1, 2001, pp. 282–289.
[17] A. Culotta and J. Sorensen, "Dependency tree kernels for relation extraction," in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2004, p. 423.
[18] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, "Distant supervision for relation extraction without labeled data," in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, 2009, pp. 1003–1011.
[19] R. C. Bunescu and R. J. Mooney, "A shortest path dependency kernel for relation extraction," in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005, pp. 724–731.
[20] M.-C. De Marneffe and C. D. Manning, "Stanford typed dependencies manual," Technical Report, Stanford University, 2008.
[21] L. Ratinov and D. Roth, "Design challenges and misconceptions in named entity recognition," in