Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity
Paheli Bhattacharya
IIT Kharagpur, India
Kripabandhu Ghosh
Tata Research Development and Design Centre, Pune, India
Arindam Pal
Data61, CSIRO and Cyber Security CRC, Sydney, NSW, Australia
Saptarshi Ghosh
IIT Kharagpur, India
ABSTRACT
Computing similarity between two legal case documents is an important and challenging task in Legal IR, for which text-based and network-based measures have been proposed in the literature. All prior network-based similarity methods considered only a precedent citation network among case documents (PCNet). However, this approach misses an important source of legal knowledge: the hierarchy of legal statutes that are applicable in a given legal jurisdiction (e.g., country). We propose to augment the PCNet with the hierarchy of legal statutes, to form a heterogeneous network Hier-SPCNet, having citation links between case documents and statutes, as well as citation and hierarchy links among the statutes. Experiments over a set of Indian Supreme Court case documents show that our proposed heterogeneous network enables significantly better document similarity estimation, as compared to existing approaches using PCNet. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.
CCS CONCEPTS
• Information systems → Information retrieval; • Applied computing → Law.
KEYWORDS
Legal document similarity; citation network; Statute hierarchy; Heterogeneous network; Network embeddings; Legal IR
Many countries such as India, Australia, United States and United Kingdom follow the Common Law System, wherein there are two primary sources of law: (1) Statutes or written laws (e.g., Section 302 of the Indian Penal Code, which describes the punishment for murder), and (2) Precedents or prior cases decided by important courts (e.g., the Supreme Court, High Courts). In such a system, law practitioners have to look up a huge number of prior cases that match a given situation or a particular case. This calls for developing legal IR systems, such as recommendation and prior-case search systems. A key step in developing these legal IR systems is to estimate the similarity between two legal case documents, which is challenging because legal documents are long, complicated and unstructured [3, 4, 6, 8]. Also, there is no well-defined notion of legal similarity: two legal case documents are considered similar if legal experts judge them to be similar. In this work, we focus on the challenge of automating this similarity computation.

Although there exist several supervised methods for general document similarity (e.g., for measuring similarity of news articles [5]), having such supervised methods for legal document similarity is not practical. This is because training such supervised models needs a gold standard containing thousands of similar document pairs. Since legal document similarity can be verified only by legal experts, developing such a gold standard is prohibitively expensive. Existing methodologies for finding similar legal documents are hence unsupervised [3, 4, 6, 8].

The existing methods for computing legal document similarity can be broadly classified into network-based methods that rely on citations to prior case documents [3, 8], text-based methods that rely on the textual content of the documents [6], and hybrid methods [4]. In this paper, we focus on network-based approaches.
All existing network-based methods (including the hybrid ones [4]) rely on a precedent citation network (PCNet) that captures citations from one case document to prior-case documents (see Section 2). However, PCNet misses an important source of legal information that is inherent in the statutes of a particular jurisdiction (e.g., country). Based on our discussions with law practitioners in India (faculty members from the Rajiv Gandhi School of Intellectual Property Law, India), statutes represent the written laws and are hence a valuable source of legal knowledge that can be used in several tasks, including estimating similarity between legal documents. Hence, in this work, we augment PCNet to construct a heterogeneous network Hier-SPCNet (Hierarchical Statute and Precedent Citation Network, see Figure 1) that encompasses the structure of the statutes as well as the citation information present in them.

To estimate the similarity between legal documents, we propose to apply the graph embedding algorithm Metapath2vec [1] on the heterogeneous Hier-SPCNet. Our method relies on the key idea that if two documents cite a common statute/precedent, or if they cite different statutes/precedents that are themselves structurally similar in the network, then the two documents may be discussing similar legal issues, which is a strong signal for estimating document similarity. We evaluate our approach on a set of 100 document pairs comprising case judgments from the Supreme Court of India, whose similarities have been annotated by legal experts. Results show that our proposed method achieves significant improvement over prior methods that use the PCNet alone.

We also compare our proposed network-based method with a state-of-the-art text-based method for computing legal document similarity using document embeddings [6].
We observe that the proposed network-based method can give complementary insights compared to those given by the text-similarity method. Combining the two is a promising way of estimating legal document similarity from multiple aspects.

To our knowledge, this is the first work that proposes a network to capture the domain information inherent in both statutes and precedents (the two main pillars of a Common Law system) and shows its utility in capturing the similarity of two legal documents. Also note that, though we have focused on Indian legal documents, our method can be extended to any jurisdiction that defines statutes/codes in its judicial system (e.g., France [7]).

Existing network-based similarity methods construct a Precedent Citation Network (PCNet) in which the vertices are case documents, and there is a directed edge d_i → d_j if document d_i cites another document d_j. The greyed box in Figure 1 shows PCNet for a small example. The following existing similarity measures have been applied on PCNet for finding legal document similarity:
• Bibliographic Coupling [3]: the Jaccard similarity index between the sets of precedent citations (out-citations) of the two documents whose similarity is to be inferred.
• Co-citation [3]: similar to bibliographic coupling, but defined on the sets of in-citations of the two documents (i.e., the documents that cite each of them).
• Dispersion [8]: this measure quantifies to what extent the out-neighbours (out-citation documents) of the two documents are themselves similar, i.e., occur in the same community/cluster. We use the NetworkX implementation for this measure.

We now describe how we augment PCNet using information from the legal statutes, to obtain Hier-SPCNet (Hierarchical Statute and Precedent Citation Network, shown in Figure 1), and how we use Hier-SPCNet for legal document similarity.
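Before turning to the statutes, the two Jaccard-based measures of Section 2 (bibliographic coupling and co-citation) can be sketched in a few lines. This is a minimal illustration, not the implementation used in [3]: it assumes the citation network is given as a dictionary mapping each document id to the set of documents it cites, the document ids are hypothetical, and returning 0.0 when both sets are empty is our own convention.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; 0.0 when both sets are empty (assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def out_citations(cites: dict, doc: str) -> set:
    """Documents cited by `doc` (its out-neighbours in PCNet)."""
    return set(cites.get(doc, ()))


def in_citations(cites: dict, doc: str) -> set:
    """Documents that cite `doc` (its in-neighbours in PCNet)."""
    return {src for src, targets in cites.items() if doc in targets}


def bibliographic_coupling(cites: dict, d1: str, d2: str) -> float:
    return jaccard(out_citations(cites, d1), out_citations(cites, d2))


def co_citation(cites: dict, d1: str, d2: str) -> float:
    return jaccard(in_citations(cites, d1), in_citations(cites, d2))


# Toy PCNet with hypothetical ids: an entry d -> {p, ...} means "d cites p".
pcnet = {
    "d1": {"p1", "p2"},
    "d2": {"p2", "p3"},
    "d3": {"d1", "d2"},
    "d4": {"d1"},
}

print(bibliographic_coupling(pcnet, "d1", "d2"))  # shared out-citation p2: 1/3
print(co_citation(pcnet, "d1", "d2"))             # shared citing document d3: 1/2
```

Because co-citation looks only at in-citations, adding statute nodes that never cite a document leaves it unchanged, which is consistent with the co-citation results reported later.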
Modeling the hierarchy of statutes:
In most common law countries, an act has its own hierarchy. For instance, in the Indian legal system, an act can be divided into ‘parts’; each ‘part’ can be divided into ‘chapters’; each ‘chapter’ can be further divided into ‘topics’; and under a ‘topic’ are finally the ‘sections’/‘articles’. An example of the Act → Part → Chapter → Topic → Section/Article hierarchy is:
Constitution of India, 1950 → Part VI: The States → Chapter III: The State Legislature → Topic: Disqualification of members → Article 192: Decision on questions as to disqualification of members.
Sometimes, for smaller acts, parts of this hierarchy may not be explicitly specified. For instance, we may have sections/articles directly under an act. An example is:
Dowry Prohibition Act, 1961 → Section 3: Penalty for giving or taking dowry.

Figure 1: The proposed heterogeneous network Hier-SPCNet consisting of case documents and statutes. Existing methods have considered only PCNet (greyed box).

NetworkX dispersion implementation: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.centrality.dispersion.html
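The two examples above, one with the full hierarchy and one with sections directly under the act, can be turned into hierarchy edges as sketched below. This is a hypothetical helper, not the authors' code: it assumes each statute is given as a root-to-leaf list of (level, label) pairs, treats articles as the "section" level, and builds node ids by joining labels with "/" so that, e.g., "Chapter III" of two different acts stays distinct. That naming scheme is purely an assumption of this sketch.

```python
# Levels of the statute hierarchy, root first; "section" also stands for articles.
LEVELS = ("act", "part", "chapter", "topic", "section")


def hierarchy_edges(path):
    """Turn one root-to-leaf statute path into parent -> child hierarchy edges.

    `path` is a list of (level, label) pairs starting at the act. Intermediate
    levels may be skipped (smaller acts), but the order must follow LEVELS.
    Node ids are the '/'-joined labels from the act downwards (hypothetical scheme).
    """
    if not path or path[0][0] != "act":
        raise ValueError("a statute path must start at an act")
    ranks = [LEVELS.index(level) for level, _ in path]
    if any(a >= b for a, b in zip(ranks, ranks[1:])):
        raise ValueError("levels must go strictly downwards in the hierarchy")
    node_ids, prefix = [], ""
    for _, label in path:
        prefix = f"{prefix}/{label}" if prefix else label
        node_ids.append(prefix)
    return list(zip(node_ids, node_ids[1:]))


constitution = [
    ("act", "Constitution of India, 1950"),
    ("part", "Part VI"),
    ("chapter", "Chapter III"),
    ("topic", "Disqualification of members"),
    ("section", "Article 192"),
]
dowry = [("act", "Dowry Prohibition Act, 1961"), ("section", "Section 3")]

for edge in hierarchy_edges(constitution) + hierarchy_edges(dowry):
    print(" -> ".join(edge))
```

Allowing skipped levels, rather than inserting empty placeholder nodes, keeps smaller acts such as the Dowry Prohibition Act as shallow trees in the network.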
For the construction of Hier-SPCNet, we extract the hierarchy from the text of the statutes, and then represent each act as a hierarchical structure of nodes (act / parts / chapters / topics / sections) and hierarchy links. Figure 1 shows a pictorial representation of one act having the complete hierarchy and another act having a smaller hierarchy.
Extraction of citations from text:
Extracting statute/precedent citations from legal text is non-trivial, since the citations are written in various forms. We extract the citations using regular-expression-based patterns; e.g., the pattern <[section or article number] of the [Act]> is used to extract citations such as ‘Section 47 of the Code of Criminal Procedure, 1973’. An internal evaluation showed that this methodology correctly extracts more than 90% of all citations that are identified by human annotators (details omitted for brevity).
Hier-SPCNet:
The network consists of six (6) types of nodes: case documents, acts, parts, chapters, topics, and sections (or articles). There are also two types of links/edges: hierarchy links (orange, solid lines in Figure 1) and citation links (blue, dotted lines in Figure 1). The types of edges are described below.
• Citation edges:
These edges are of three types. (1) document → document: if one document cites another document. These edges are the ones in PCNet (the grey coloured box in Figure 1); existing methods have considered only this network. (2) document → statute: if a document cites a statute. For example, in Figure 1, a document cites section s_i. A document can also cite an act as a whole, without referring to a particular section (one of the documents in Figure 1 cites an act directly). (3) statute → statute: if a statute cites another statute. Note that the two statutes can be part of the same Act or of different Acts; e.g., in Figure 1, statute s_k of one act cites statute s_n of the other act.
• Hierarchy edges:
The hierarchy links (shown as orange, solid arrows in Fig. 1) represent the hierarchy within each Act, as described in Section 3.1. These edges can be of various types, such as act → part (e.g., part p is under the act with the complete hierarchy in Fig. 1), act → chapter, part → section (e.g., sections s_m and s_n are under part b of the act with the smaller hierarchy), topic → section (e.g., s_i and s_j are under topic s), and so on. Note that, as stated in Section 3.1, not all levels of the hierarchy exist uniformly in all the Acts.

3.2 Document similarity using Hier-SPCNet
The existing measures of bibliographic coupling, co-citation and dispersion (see Section 2) can be applied over Hier-SPCNet, similar to how they are applied over PCNet. However, when applied over Hier-SPCNet, these measures also include statute information; e.g., bibliographic coupling over Hier-SPCNet accounts for common citations not only to prior cases but also to statutes.

Additionally, we apply the graph embedding techniques Node2Vec [2] and Metapath2Vec [1] over Hier-SPCNet. These embedding techniques map the nodes of the graph to a vector space, such that nodes having similar neighbourhoods in the network have similar representations (embeddings). We then compute the cosine similarity between these node embeddings to estimate the similarity between the documents (nodes).
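To make concrete how statute citations change bibliographic coupling, the toy sketch below extracts statute citations from text and adds them to each document's out-citations. The regular expression is our own hypothetical rendering of the <[section or article number] of the [Act]> pattern described under "Extraction of citations from text" (the exact patterns are not given); it only accepts act names that end in a year and will miss many real citation forms. The statute node-id format "Act, Year::Section N" and all documents and snippets are likewise assumptions of this sketch.

```python
import re

# Hypothetical rendering of the "<[section or article number] of the [Act]>" pattern.
# It only accepts act names ending in ", <year>", so a citation such as
# "Article 16 of the Constitution of India" is (knowingly) missed.
CITATION_RE = re.compile(
    r"\b(Section|Article)\s+(\d+[A-Z]?)\s+of\s+the\s+([A-Z][A-Za-z ]*?,\s*\d{4})"
)


def extract_statute_citations(text: str) -> set:
    """Return statute node ids of the (assumed) form 'Act, Year::Section N'."""
    return {f"{m.group(3)}::{m.group(1)} {m.group(2)}" for m in CITATION_RE.finditer(text)}


def jaccard(a: set, b: set) -> float:
    """Jaccard index; 0.0 when both sets are empty (assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: precedent out-citations (the PCNet part) and judgment snippets.
precedents = {"d1": {"p1"}, "d2": {"p2"}}
texts = {
    "d1": "The accused was convicted under Section 302 of the Indian Penal Code, 1860.",
    "d2": "The charge under Section 302 of the Indian Penal Code, 1860 was upheld.",
}

# Out-citations in Hier-SPCNet = precedent citations plus statute citations.
hier_out = {d: precedents[d] | extract_statute_citations(texts[d]) for d in precedents}

print(jaccard(precedents["d1"], precedents["d2"]))  # PCNet: no shared precedent, 0.0
print(jaccard(hier_out["d1"], hier_out["d2"]))      # Hier-SPCNet: shared Section 302, 1/3
```

A practical extractor would need many more patterns (e.g., abbreviations such as "IPC" or "Sec.", or acts cited without a year).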
Node2Vec [2]: Given a network, Node2vec generates node embeddings (vectors) from random walks that are biased to interpolate between Breadth-First Search (BFS)-like and Depth-First Search (DFS)-like exploration of each node's neighbourhood. We apply Node2Vec on both PCNet and Hier-SPCNet. Note that Node2vec assumes the network to be homogeneous (all nodes and edges of the same type). While PCNet is indeed homogeneous, Hier-SPCNet is not; nevertheless, we treat Hier-SPCNet as homogeneous when applying Node2vec.
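What "biased" means here can be illustrated with a single node2vec-style walk. This is a simplified sketch, not the reference implementation used in the experiments: it assumes an unweighted, undirected graph given as an adjacency dictionary (node types are simply ignored, as when Hier-SPCNet is treated as homogeneous), and it samples each step directly instead of using precomputed alias tables. The return parameter p and in-out parameter q follow their definitions in [2]; the toy graph is made up.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order (node2vec-style) random walk.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    p:   return parameter; a small p makes stepping back to the previous node likely.
    q:   in-out parameter; q < 1 favours moving away (DFS-like),
         q > 1 favours staying near the previous node (BFS-like).
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted so that a seeded rng is reproducible
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # distance 0 from prev: return
            elif x in adj[prev]:
                weights.append(1.0)      # distance 1 from prev: stay close
            else:
                weights.append(1.0 / q)  # distance 2 from prev: move outward
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy undirected graph: a 4-cycle a-b-c-d-a plus an isolated node e.
graph = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}, "e": set()}
print(node2vec_walk(graph, "a", length=6, p=4.0, q=0.5))
```

In node2vec, many such walks per node are then fed to a skip-gram (word2vec) model to learn the embeddings; that training step is omitted here.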
Metapath2Vec [1]: Metapath2Vec is meant for heterogeneous networks, where nodes are of different types and the edges have different semantics. Its basic working mechanism is similar to that of Node2Vec, but while Node2Vec's walks ignore node and edge types, Metapath2vec constrains its random walks to follow certain user-defined metapaths. A metapath is a path between two nodes where the edges can have different semantics. For Hier-SPCNet, we define 14 different metapaths to capture situations where two documents cite the same or related statutes, whereby some signal of similarity between the documents can be inferred. Some of the metapaths we defined are as follows:
• doc-sec-doc: when two documents cite the same section/article. E.g., in Figure 1, two documents cite the same section s_j.
• doc-sec-topic-sec-doc: when two documents cite different sections/articles, and the sections are under the same topic. E.g., in Fig. 1, one document cites section s_j and another cites s_i, and both s_i and s_j are under the same topic s.
• doc-sec-topic-chap-topic-sec-doc: when two documents cite different sections, and the sections are under different topics of the same chapter. E.g., in Fig. 1, one document cites section s_j and another cites s_k, and s_j and s_k are under different topics under the same chapter c.
• doc-doc-doc: when two documents cite a common document. This is the standard precedent citation, which is the only metapath used when applying Metapath2vec over PCNet.
Descriptions of the 10 other metapaths are omitted for brevity.

We now describe the experiments to compare the performance of various network-based methods over PCNet and Hier-SPCNet. We used the Node2vec implementation at https://github.com/aditya-grover/node2vec with embedding size of … and other hyperparameters set to default. We used the implementation of Metapath2vec from https://pypi.org/project/stellargraph/, with walk length of 5, 2000 random walks per root node, embedding size of 200, and other hyperparameters set to default.
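The key difference from Node2Vec, restricting walks to a metapath, can be sketched as follows. This is a hypothetical, simplified walker in the spirit of metapath2vec [1], not the StellarGraph implementation used in the experiments. It assumes edges are treated as undirected (the metapaths above traverse citation and hierarchy links in both directions), that the metapath starts and ends with the same node type so that it can be repeated, and that each step picks uniformly among neighbours of the required type. The toy graph mirrors the doc-sec-topic-sec-doc example with made-up node names.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng=None):
    """Sample one random walk whose node types follow `metapath` cyclically.

    adj:       dict node -> set of neighbours (edges treated as undirected).
    node_type: dict node -> type label such as "doc", "sec", "topic".
    metapath:  list of types whose first and last entries coincide,
               e.g. ["doc", "sec", "topic", "sec", "doc"].
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("metapath must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the first metapath type")
    rng = rng or random.Random(0)
    cycle = metapath[:-1]  # the last type equals the first, so the pattern repeats
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = sorted(n for n in adj[walk[-1]] if node_type[n] == wanted)
        if not candidates:
            break  # no neighbour of the required type: stop early
        walk.append(rng.choice(candidates))
    return walk


# Toy Hier-SPCNet fragment (made-up names): d1 cites s_i, d2 cites s_j,
# and both sections sit under topic t.
edges = [("d1", "s_i"), ("d2", "s_j"), ("s_i", "t"), ("s_j", "t")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
node_type = {"d1": "doc", "d2": "doc", "s_i": "sec", "s_j": "sec", "t": "topic"}

print(metapath_walk(adj, node_type, "d1", ["doc", "sec", "topic", "sec", "doc"], length=5))
```

A walk from d1 can reach d2 only through the shared topic, which is how such metapaths let two documents citing different but related sections end up close in the embedding space.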
Dataset used: […] publicly available full texts, and did not use any proprietary information. The Hier-SPCNet used for the experiments consists of 1,[…] 309 edges in the network. The PCNet contains the same 1,806 case documents as nodes and 542 citation edges among the documents.
Developing a gold standard for document similarity:
For evaluating methods for legal document similarity, we need a gold standard consisting of similarity scores given by legal experts for a set of document-pairs. To this end, two legal experts were asked to annotate the similarity of 100 document-pairs. Each expert assigned a similarity score in the range [0.0, 1.0] to each document-pair, where a higher score indicates greater similarity.
Evaluation metric:
For evaluating the performance of a particular similarity computation method, we use the Pearson correlation coefficient (ρ) between the mean expert similarity scores and the similarity values inferred by the said method, on the 100 document-pairs. This metric has been used in multiple prior works on legal document similarity [3, 4, 6].

Table 1 shows the performance of various network-based methods on both PCNet and Hier-SPCNet. All the methods except co-citation show a statistically significant improvement (Student's t-test at 95%, p < 0.05) when applied over Hier-SPCNet, as compared to when applied over PCNet. The value of co-citation remains the same for both networks since it depends on the common in-citations, and the in-citations of documents are the same in PCNet and Hier-SPCNet (since no document is cited by a statute). In particular, the higher value of bibliographic coupling over Hier-SPCNet highlights that, for accurately estimating legal document similarity, it is important to consider citations not only to common prior cases but also to common statutes.

There is also a substantial improvement for Node2Vec-based similarity on Hier-SPCNet. Although Node2Vec considers the graph to be homogeneous, adding the hierarchical structure of statutes to PCNet helps, since the leaf nodes, i.e., the section nodes, are structurally similar.

The best performance is observed using Metapath2vec over Hier-SPCNet (correlation of 0.674 with the mean expert similarity score), which captures document similarity well through the metapaths among the nodes. Thus, we have effectively encoded the legal knowledge inherent in the statutes through hierarchical and citation links by defining the metapath schemas.

(The legal experts were senior law students from the Rajiv Gandhi School of Intellectual Property Law, India.)

Table 1: Pearson correlation coefficient (ρ) with mean expert similarity score, for similarity values inferred by various methods over the two networks. Proposed Hier-SPCNet enables statistically significantly better inference of similarity than PCNet (by Student's T-Test at 95%).

Method                    ρ over PCNet    ρ over Hier-SPCNet
Bibliographic Coupling    0.279           0.574
Co-citation               0.221           0.221
Dispersion                0.229           0.287
Node2Vec                  0.448           0.586
Metapath2Vec              0.215           0.674
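The evaluation metric can be made concrete with a short sketch: average the two experts' scores for each document-pair, then compute Pearson's ρ against a method's similarity values. The score lists below are made-up placeholders, not the paper's data, and the paper's significance tests are not reproduced. Python 3.10+ also offers `statistics.correlation`; the formula is written out here for clarity.

```python
import math


def mean_expert_scores(expert_a, expert_b):
    """Per-pair mean of two experts' similarity scores."""
    if len(expert_a) != len(expert_b):
        raise ValueError("both experts must score the same document-pairs")
    return [(a + b) / 2 for a, b in zip(expert_a, expert_b)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sx * sy)


# Made-up scores for five document-pairs (placeholders, not the paper's data).
expert_a = [0.9, 0.2, 0.6, 0.4, 0.8]
expert_b = [0.8, 0.1, 0.5, 0.5, 0.9]
method = [0.7, 0.3, 0.5, 0.4, 0.6]
gold = mean_expert_scores(expert_a, expert_b)
print(round(pearson(gold, method), 3))
```

Since ρ is unaffected by shifting or positively rescaling either list, a method's raw similarity values need not lie in the experts' [0.0, 1.0] range to be compared this way.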
Apart from network-based similarity, important signals for legal document similarity also come from the textual content of legal documents [4, 6]. In this section, we compare the network-based and text-based methods for legal document similarity.

We consider a text-based similarity method using document embeddings (Doc2Vec), which has been shown to estimate legal document similarity better than many other methods [6]. Following the methodology in [6], we trained a Doc2Vec model on a large set of Indian Supreme Court case judgments (which do not contain the documents in our evaluation set of 100 document pairs). We then infer Doc2vec embeddings for the document pairs in our evaluation set, and compute the cosine similarity between the embeddings of the documents in each pair.
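The last step shared by this text-based method and the embedding-based network methods is a cosine similarity between two vectors. A minimal sketch follows, assuming the embeddings are already available as plain lists of floats (e.g., inferred Doc2Vec vectors or learned node embeddings); the vectors shown are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up 4-dimensional "document embeddings" for one document-pair.
emb_a = [0.2, 0.1, 0.0, 0.7]
emb_b = [0.1, 0.3, 0.1, 0.6]
print(round(cosine_similarity(emb_a, emb_b), 3))
```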
Comparing network-based and text-based similarity:
The text-based method (Doc2vec) achieves a correlation of 0.734 with the mean expert similarity score (see Table 2), which is slightly better than the correlation of 0.674 achieved by the network-based method (Metapath2vec over Hier-SPCNet). The difference is not statistically significant (p = 0.34, paired Student's t-test at 95%). In fact, for 58 out of the 100 document-pairs, the similarity estimated by the network-based method is numerically closer to the mean expert similarity score than the similarity estimated by the text-based method, while for the other 42 document-pairs, the text-based similarity is closer to the mean expert similarity score.

We examined the document-pairs for which the text-based similarity performs better (i.e., is closer to the mean expert similarity score), and the document-pairs for which the network-based similarity performs better. We discuss below one example document-pair of each of the two types.

For the document pair 1972_31 and 1984_115, both documents are about reservation in admission to medical colleges, and the experts have assigned a high mean similarity score of 0.85. The legal issues of contention are somewhat different: while in 1972_31 the admission criteria consider ‘reservation for backward classes’, in 1984_115 the criterion in argument is ‘domicile’. Hence, there are differences in the text, which lead to a moderate textual similarity of 0.44. With respect to the statutes cited, 1984_115 cites the ‘Public Employment Requirement as to Residence Act, 1957’, which cites ‘Article 16 of the Constitution of India’, which is in turn cited by 1972_31. This matches one of our metapaths, ‘doc-act-sec-doc’. Also, both documents cite other articles that are either the same (metapath: ‘doc-sec-doc’), under the same part (metapath: ‘doc-sec-part-sec-doc’), or under the same act (metapath: ‘doc-sec-act-sec-doc’). As a result, Metapath2vec over Hier-SPCNet estimates a high similarity of 0.73, which is much closer to the mean expert similarity score of 0.85 […] explanation to the measured similarity (elucidated by the examples above), which was duly appreciated by our legal experts.

Table 2: Pearson correlation coefficient (ρ) with mean expert similarity score, for a text-based method [6], the proposed network-based method, and combinations of the two. None of the pairwise differences in ρ is statistically significant (paired Student's T-test at 95%).

Method                                          ρ
Network-based (Metapath2vec on Hier-SPCNet)     0.674
Text-based (Doc2Vec)                            0.734
max (text, network)                             …
average (text, network)                         0.754

Combining network-based and text-based similarity:
The above discussion shows that the text-based and network-based methods complement each other. Hence, a combination of these two measures seems promising. We tried some simple combinations using the functions average (a pair gets the average of its text-based and network-based similarity values) and max (a pair gets either the text-based or the network-based similarity, whichever is larger). The results, shown in Table 2, suggest that combining network-based and text-based measures can be beneficial, since the two methods probably capture complementary signals of legal document similarity, although none of the differences in Table 2 is statistically significant. Devising better methods of combination is left as future work.
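The two combination functions, together with the per-pair comparison behind the 58-versus-42 count reported above, are simple enough to sketch directly. The inputs are assumed to be aligned lists of per-pair similarity values; the numbers are made-up placeholders, and counting ties separately is our own choice.

```python
def combine(text_sims, net_sims, how="average"):
    """Combine per-pair text-based and network-based similarities."""
    if len(text_sims) != len(net_sims):
        raise ValueError("both methods must score the same document-pairs")
    if how == "average":
        return [(t + n) / 2 for t, n in zip(text_sims, net_sims)]
    if how == "max":
        return [max(t, n) for t, n in zip(text_sims, net_sims)]
    raise ValueError(f"unknown combination: {how}")


def closer_counts(text_sims, net_sims, expert):
    """Count pairs where each method is numerically closer to the mean expert score."""
    net_better = text_better = ties = 0
    for t, n, e in zip(text_sims, net_sims, expert):
        if abs(n - e) < abs(t - e):
            net_better += 1
        elif abs(t - e) < abs(n - e):
            text_better += 1
        else:
            ties += 1
    return net_better, text_better, ties


# Made-up per-pair scores for two document-pairs.
text = [0.25, 0.75]
net = [0.5, 0.25]
expert = [0.5, 0.75]
print(combine(text, net, "average"))     # [0.375, 0.5]
print(combine(text, net, "max"))         # [0.5, 0.75]
print(closer_counts(text, net, expert))  # (1, 1, 0)
```

Neither combination needs training data, which fits the unsupervised setting motivated earlier; the resulting lists can be scored with Pearson's ρ exactly like a single method.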
In this work, we achieved significantly better estimation of similarity between legal documents by developing a hierarchical network (Hier-SPCNet) comprising the hierarchy of statutes, and then applying network embedding methods. To our knowledge, this is the first attempt to computationally model the legal domain knowledge inherent in the statutes to measure legal document similarity. Our method would be applicable to any other jurisdiction that defines a hierarchy of statutes [7]. As future work, we would like to develop better techniques for combining network-based and text-based similarity for legal documents.
Acknowledgements:
The authors thank the law students who helped in developing the gold standard data. The research is partially supported by SERB, Government of India, through the project ‘NYAYA: A Legal Assistance System for Legal Experts and the Common Man in India’. P. Bhattacharya is supported by a Fellowship from Tata Consultancy Services.

REFERENCES
[1] Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proc. ACM SIGKDD.
[2] Aditya Grover and Jure Leskovec. 2016. Node2Vec: Scalable Feature Learning for Networks. In Proc. ACM SIGKDD.
[3] Sushanta Kumar, P. Krishna Reddy, V. Balakista Reddy, and Aditya Singh. 2011. Similarity analysis of legal judgments. In Proc. ACM India COMPUTE Conference.
[4] Sushanta Kumar, P. Krishna Reddy, V. Balakista Reddy, and Malti Suri. 2013. Similar Legal Judgements under Common Law System. In International Workshop on Databases in Networked Information Systems.
[5] Bang Liu, Di Niu, Haojie Wei, Jinghong Lin, Yancheng He, Kunfeng Lai, and Yu Xu. 2019. Matching Article Pairs with Graphical Decomposition and Convolutions. In Proc. ACL.
[6] Arpan Mandal, Raktim Chaki, Sarbajit Saha, Kripabandhu Ghosh, Arindam Pal, and Saptarshi Ghosh. 2017. Measuring similarity among legal court case documents. In Proc. ACM India COMPUTE Conference.
[7] Pierre Mazzega, Danièle Bourcier, and Romain Boulet. 2009. The Network of French Legal Codes. In Proc. Int’l Conf. on Artificial Intelligence and Law (ICAIL).
[8] Akshay Minocha, Navjyoti Singh, and Arjit Srivastava. 2015. Finding Relevant Indian Judgments using Dispersion of Citation Network. In Proc. World Wide Web.