AI-lead Court Debate Case Investigation
Changzhen Ji
Harbin Institute of Technology, Harbin, Heilongjiang
[email protected]

Xin Zhou
Alibaba Group, Hangzhou, Zhejiang
[email protected]

Conghui Zhu
Harbin Institute of Technology, Harbin, Heilongjiang
[email protected]

Tiejun Zhao
Harbin Institute of Technology, Harbin, Heilongjiang
[email protected]
ABSTRACT
The multi-role judicial debate among the plaintiff, the defendant, and the judge is an important part of a judicial trial. Unlike other types of dialogue, questions are raised by the judge, and the plaintiff, the plaintiff's agent, the defendant, and the defendant's agent debate them so that the trial can proceed in an orderly manner. Question generation is an important task in Natural Language Generation. In a judicial trial, it can help the judge raise efficient questions and thus gain a clearer understanding of the case. In this work, we propose an innovative end-to-end question generation model, the Trial Brain Model (TBM), to build a Trial Brain that can generate the questions the judge wants to ask from the historical dialogue between the plaintiff and the defendant. Unlike prior efforts in natural language generation, our model can learn the judge's questioning intention through predefined knowledge. We conduct experiments on real-world datasets, and the experimental results show that our model can provide more accurate questions in the multi-role court debate scene.
KEYWORDS
Natural Language Generation, multi-role, Trial Brain
INTRODUCTION

The contradiction between people's growing demand for social justice and relatively scarce public resources is one of the prominent contradictions in today's society. In a legal context, a lengthy and expertise-demanding trial can be a high threshold for a litigant, while the judge has to spend significant effort to investigate the case and explore all questionable factors exhaustively. This can be very challenging for junior judges.
Table 1: Example Dialog in Court Debate Dataset

Role        Dialogue
...         ...
Judge       Defendant, is there any evidence to provide to the court?
Defendant   No.
Judge       Plaintiff, what's your relationship with ...
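To make the data format concrete, the following minimal Python sketch shows one way such a dialogue fragment could be represented; the class and field names (Turn, role, text) are our own illustrative assumptions, not part of the released dataset.

```python
# A minimal sketch of one dialogue fragment from the court debate dataset.
# All names here (Turn, role, text) are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    role: str  # speaker role, e.g. "judge", "plaintiff", "defendant"
    text: str  # the utterance content

# The (truncated) fragment from Table 1; the task is to generate the
# judge's next question given this context.
context: List[Turn] = [
    Turn("judge", "Defendant, is there any evidence to provide to the court?"),
    Turn("defendant", "No."),
]
```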
• Role Embedding. We use dense vectors to represent the different roles (e.g., presiding judge, plaintiff, defendant, and witness) in the debate dialogue.

• Utterance Layer. In the utterance layer, we utilize a Bidirectional Long Short-Term Memory network (Bi-LSTM) [4] to encode the semantics of each utterance while preserving its syntactic structure.

• Dialogue Layer. To represent the global context of a dialogue, we use another Bi-LSTM to encode the dependencies between utterances, obtaining a global representation of each utterance as the dialogue representation, denoted as X.

Legal knowledge consists of elements marked by the judge, such as the borrowing time or the loan amount. We also use dense vectors to represent the different elements and then encode them with an LSTM, yielding h_p.

When the judge constructs the forthcoming question, he or she should consider three kinds of information: (1) the intent of the question, (2) the content of the question, and (3) the litigant role being asked. This motivates us to learn the intent transition to navigate the forthcoming question generation. To represent the intent of the judge, we rely on the legal knowledge to learn the navigation among different sequences of legal concepts (see Eq. 1). At the same time, we learn a role transfer matrix to represent the role to be asked, in other words, the role who answers the generated question (see Eq. 2). Note that the role used in Sec. 3.1.1 is the speaker's role of the corresponding utterance, while the role mentioned here is the responder's role of the current utterance:

I = \sigma(k_I * h_p)    (1)
R = \sigma(k_R * r)    (2)

The two parameters k_I and k_R stand for the learnable hidden matrices that simulate the intent transfer and the response-role transfer, respectively; their elements are all values between 0 and 1. We merge the intent information and the next-role information as follows:

H = ([I_1, R_2], [I_2, R_3], \ldots, [I_i, R_{i+1}])    (3)

where R_{i+1} represents the next role of the current utterance. We mainly focus on the response role of the judge's questions, since the next role after a litigant's answer is almost always the judge.
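To make the encoder layers and the transfer equations concrete, here is a minimal PyTorch sketch of the role/utterance/dialogue layers and Eqs. (1)-(2); all module names, dimensions, and wiring details are assumptions read off the text above, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultiRoleDialogueEncoder(nn.Module):
    """Sketch of the role/utterance/dialogue layers and Eqs. (1)-(2).
    Hidden size d and the exact wiring are illustrative assumptions."""
    def __init__(self, vocab_size, n_roles, n_elements, d=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d)   # word vectors
        self.role_emb = nn.Embedding(n_roles, d)      # role embedding (dense vectors)
        self.elem_emb = nn.Embedding(n_elements, d)   # legal-knowledge elements
        self.utt_lstm = nn.LSTM(d, d, bidirectional=True, batch_first=True)
        self.dlg_lstm = nn.LSTM(3 * d, d, bidirectional=True, batch_first=True)
        self.elem_lstm = nn.LSTM(d, d, batch_first=True)
        self.k_I = nn.Linear(d, d, bias=False)        # intent-transfer matrix k_I
        self.k_R = nn.Linear(d, d, bias=False)        # response-role-transfer matrix k_R

    def forward(self, words, roles, elements):
        # words: (n_utt, max_len) token ids; roles: (n_utt,); elements: (n_elem,)
        _, (h_u, _) = self.utt_lstm(self.word_emb(words))       # utterance layer
        utt = torch.cat([h_u[0], h_u[1]], dim=-1)               # (n_utt, 2d)
        r = self.role_emb(roles)                                # (n_utt, d)
        X, _ = self.dlg_lstm(torch.cat([utt, r], dim=-1).unsqueeze(0))  # dialogue layer
        _, (h_p, _) = self.elem_lstm(self.elem_emb(elements).unsqueeze(0))
        I = torch.sigmoid(self.k_I(h_p[-1]))   # Eq. (1): intent transfer
        R = torch.sigmoid(self.k_R(r))         # Eq. (2): response-role transfer
        return X.squeeze(0), I, R              # dialogue repr. X, intent I, role R
```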
Figure 1: Network Architecture of the Proposed Method. (The figure shows the Multi-Role Dialogue Encoder, built from Bi-LSTMs with attention, and the legal-knowledge-aware (LKG-aware) Intent Navigation module; the dialogue representation X and the intent representation Y are fused via concatenation, element-wise product X * Y, and difference X - Y.)
We further compress the original redundant information and assign more weight to the important information via an attention mechanism:

Y = \sum_{i=1}^{n} \frac{\exp(I_i)}{\sum_{i=1}^{n} \exp(I_i)} * H_i    (4)

Next, we fuse the original information with the intent/role transformation information:

Z = [X, Y, X * Y, X - Y]    (5)

For question generation learning, for each dialogue D we use cross-entropy to formulate the problem as follows:

loss = -\log P(S_q \mid D) = -\sum_{j=1}^{l} \log P(w_{i,j} \mid w_{i,1:j-1}, D)

Denoting all the parameters in our model as \delta, we obtain the following objective function:

\min_{\delta} loss = loss + \lambda \|\delta\|^2    (6)

To minimize the objective function, we use the diagonal adaptive learning-rate method of [11]. At time step t, a parameter \delta is updated as follows:

\delta_t \leftarrow \delta_{t-1} - \frac{\mu}{\sqrt{\sum_{i=1}^{t} f_i^2}} f_t    (7)

where \mu is the initial learning rate and f_t is the sub-gradient at time step t. All hyperparameters are tuned on the training set according to the evaluation results on the development set.
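The fusion of Eqs. (4)-(5), the loss of Eq. (6), and the per-parameter update of Eq. (7) can be sketched as follows; the tensor shapes, the regularization weight lam, and the smoothing constant eps are our assumptions, and F.cross_entropy averages over tokens rather than summing.

```python
import torch
import torch.nn.functional as F

def fuse(X, I, H):
    """Eqs. (4)-(5): attention over intent scores, then feature fusion.
    Assumed shapes: X (d,), I (n,) scalar scores, H (n, d)."""
    alpha = torch.softmax(I, dim=0)           # exp(I_i) / sum_i exp(I_i)
    Y = (alpha.unsqueeze(-1) * H).sum(dim=0)  # Eq. (4): weighted sum, shape (d,)
    return torch.cat([X, Y, X * Y, X - Y])    # Eq. (5): Z, shape (4d,)

def qg_loss(logits, targets, params, lam=1e-4):
    """Eq. (6): token-level cross-entropy plus L2 regularization.
    logits: (l, vocab) decoder outputs; targets: (l,) gold question tokens."""
    ce = F.cross_entropy(logits, targets)      # -log P(S_q | D), mean over tokens
    l2 = sum((p ** 2).sum() for p in params)   # ||delta||^2
    return ce + lam * l2

def adagrad_step(param, grad, grad_sq_sum, mu=0.1, eps=1e-8):
    """Eq. (7): delta_t <- delta_{t-1} - mu / sqrt(sum_i f_i^2) * f_t.
    Plain tensors assumed (wrap in torch.no_grad() for model parameters)."""
    grad_sq_sum += grad ** 2
    param -= mu / (grad_sq_sum.sqrt() + eps) * grad
    return param, grad_sq_sum
```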
In the experiment, we collected 136,019 court debate records of civil Private Loan Dispute cases, from which we randomly extracted 302,650 continuous dialogue fragments as independent samples for training, development, and testing. In total, the dataset contains more than 4 million sentences, and each dialogue fragment contains 13.38 sentences on average. The details of the dataset are shown in Table 2.
Table 2: Statistics of the Processed Dialogue Fragments
Dataset   #Fragments   #Sentences   Avg. #Sentences per Fragment
Total     302,650      4,048,659    13.38
We only selected fragments in which there are at least five historical utterances as context for our task of next-question generation. The entire dataset is divided in a ratio of 8:1:1 for training, development, and testing, respectively.

In order to demonstrate the validity of our model, we selected some traditional classical methods and the latest mainstream methods for text generation as baselines. The tested baselines are illustrated as follows:
• LSTM [4]: We replace all bidirectional LSTMs in our proposed model with unidirectional LSTMs.
• ConvS2S [3]: The LSTMs in the encoder are replaced with CNNs.
• ByteNet [5]: A one-dimensional convolutional neural network composed of two parts, one encoding the source sequence and the other decoding the target sequence.
• S2S+attention [7]: The Seq2Seq framework relies on the encoder-decoder paradigm: the encoder encodes the input sequence, while the decoder produces the target sequence. An attention mechanism is added to force the model to focus on specific parts of the input sequence when decoding.
• PGN [9]: Another commonly used framework for text generation, which adds a copy mechanism to aid accurate reproduction of information while retaining the ability to produce novel words through the generator.
• Transformer [10]: A neural network architecture based on the self-attention mechanism.

Table 3: Main Results of All Tested Methods. Note that the results shown in the TBM (ours) row are statistically significantly different from the corresponding values of all the baseline models (p-value < 0.05).

Model            R-1     R-2     R-3     R-L     BLEU
LSTM             29.33   14.97   10.34   26.65   11.59
ByteNet          35.57   19.47   14.27   32.56   17.73
ConvS2S          36.35   20.42   15.98   33.03   17.97
S2S+attention    36.54   20.72   16.40   33.29   17.96
PGN              37.67   21.93   17.42   34.39   18.75
Transformer      37.59   23.26   18.71   35.38   18.58
TBM (ours)       39.02   24.56   21.03   38.12   24.17
To automatically assess the quality of the generated questions, we used ROUGE [6] and BLEU [8] scores to compare the different models. We report ROUGE-1, ROUGE-2, and ROUGE-3 as measures of informativeness, and ROUGE-L as well as BLEU-4 as measures of fluency.
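As a reference for reproducing these scores, here is a hedged sketch using the off-the-shelf rouge-score and nltk packages (our tooling assumptions; the paper does not name its scoring implementation), with two made-up example sentences:

```python
# pip install rouge-score nltk
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "plaintiff what is your relationship with the defendant"      # hypothetical gold question
generated = "plaintiff what relationship do you have with the defendant"  # hypothetical model output

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rouge3", "rougeL"])
rouge = scorer.score(reference, generated)  # precision/recall/F per ROUGE variant

bleu4 = sentence_bleu([reference.split()], generated.split(),
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)

print({name: round(s.fmeasure, 4) for name, s in rouge.items()}, round(bleu4, 4))
```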
The performance of all tested methods is reported in Table 3. As Table 3 shows, the proposed TBM significantly (p-value < 0.05) outperforms all the baseline methods on every reported metric.

Figure 2: The performance of the tested methods.

CONCLUSION

Dialogue generation has been well studied in NLP, and considerable progress has been made. However, it still faces great challenges: many strong models achieve good results only in specific domains. In this paper, we define a new task of question generation in the judicial trial. The proposed Trial Brain Model can learn the judge's questioning intention through predefined knowledge. Judge-centered debate-context heterogeneity is the landmark of this model, i.e., a delicately designed multi-role dialogue encoding mechanism via legal knowledge, with representation enhancement through intent navigation that simulates the intention switch across different conversations. The empirical findings validate the hypothesis of this bionic design for reproducing judge logic. An extensive set of experiments on a large civil trial dataset shows that the proposed model can generate more accurate and readable questions than several alternatives in the multi-role court debate scene.
ACKNOWLEDGMENTS
This work is supported by the National Key R&D Program of China (2018YFC0830200; 2018YFC0830206).
REFERENCES
[1] Judge Information Center. [n. d.]. As Workloads Rise in Federal Courts, Judge Counts Remain Flat. https://trac.syr.edu/tracreports/judge/364/. Accessed: 2019-05-06.
[2] Judge Information Center. [n. d.]. Some Federal Judges Handle Inordinate Caseloads. https://trac.syr.edu/tracreports/judge/501/. Accessed: 2019-05-06.
[3] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional Sequence to Sequence Learning. JMLR.org, 1243–1252.
[4] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
[5] Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. Neural Machine Translation in Linear Time. arXiv preprint arXiv:1610.10099 (2016).
[6] Chin-Yew Lin and Eduard Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics.
[7] Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. 2016. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. arXiv preprint arXiv:1602.06023 (2016).
[8] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. Association for Computational Linguistics, 311–318.
[9] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the Point: Summarization with Pointer-Generator Networks. arXiv preprint arXiv:1704.04368 (2017).
[10] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. 5998–6008.
[11] Matthew D. Zeiler. 2012. ADADELTA: An Adaptive Learning Rate Method. arXiv preprint arXiv:1212.5701 (2012).