AI-lead Court Debate Case Investigation
Changzhen Ji
Harbin Institute of Technology, Harbin, Heilongjiang
[email protected]

Xin Zhou
Alibaba Group, Hangzhou, Zhejiang
[email protected]

Conghui Zhu
Harbin Institute of Technology, Harbin, Heilongjiang
[email protected]

Tiejun Zhao
Harbin Institute of Technology, Harbin, Heilongjiang
[email protected]
ABSTRACT
The multi-role judicial debate among the plaintiff, the defendant, and the judge is an important part of a judicial trial. Unlike other types of dialogue, questions are raised by the judge, and the plaintiff, the plaintiff's agent, the defendant, and the defendant's agent debate them so that the trial can proceed in an orderly manner. Question generation is an important task in Natural Language Generation. In a judicial trial, it can help the judge raise efficient questions and thus gain a clearer understanding of the case. In this work, we propose an innovative end-to-end question generation model, the Trial Brain Model (TBM), to build a Trial Brain that can generate the questions the judge wants to ask from the historical dialogue between the plaintiff and the defendant. Unlike prior efforts in natural language generation, our model can learn the judge's questioning intention through predefined knowledge. We conduct experiments on real-world datasets, and the experimental results show that our model can provide more accurate questions in the multi-role court debate scene.
KEYWORDS
Natural Language Generation, multi-role, Trial Brain
INTRODUCTION

The contradiction between people's growing demand for social justice and relatively scarce public resources is one of the prominent contradictions in today's society. In a legal context, a lengthy and expertise-demanding trial can be a high threshold for a litigant, while the judge has to spend significant effort to investigate the case and explore all questionable factors exhaustively. This can be very challenging for junior judges.
Table 1: Example Dialog in Court Debate Dataset

Role        Dialogue
...         ...
Judge       Defendant, is there any evidence to provide to the court?
Defendant   No.
Judge       Plaintiff, what's your relationship with ...
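To make the data format concrete, the following minimal Python sketch shows one way such a dialogue fragment could be represented; the class and field names (Turn, role, text) are our own illustrative assumptions, not part of the released dataset.

```python
# A minimal sketch of one dialogue fragment from the court debate dataset.
# All names here (Turn, role, text) are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    role: str  # speaker role, e.g. "judge", "plaintiff", "defendant"
    text: str  # the utterance content

# The (truncated) fragment from Table 1; the task is to generate the
# judge's next question given this context.
context: List[Turn] = [
    Turn("judge", "Defendant, is there any evidence to provide to the court?"),
    Turn("defendant", "No."),
]
```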
• Role Embedding. We use dense vectors to represent the different roles (e.g., presiding judge, plaintiff, defendant, and witness) in the debate dialogue.

• Utterance Layer. In the utterance layer, we utilize a Bidirectional Long Short-Term Memory network (Bi-LSTM) [4] to encode the semantics of each utterance while preserving its syntactic structure.

• Dialogue Layer. To represent the global context of a dialogue, we use another Bi-LSTM to encode the dependencies between utterances, obtaining a global representation of each utterance as the dialogue representation, denoted as X.

Legal knowledge consists of elements marked by the judge, such as the borrowing time or the loan amount. We also use dense vectors to represent the different elements and then encode them with an LSTM, yielding h_p.

When the judge constructs the forthcoming question, he or she should consider three kinds of information: (1) the intent of the question, (2) the content of the question, and (3) the litigant role being asked. This motivates us to learn the intent transition to navigate the forthcoming question generation. To represent the intent of the judge, we rely on the legal knowledge to learn the navigation among different sequences of legal concepts (see Eq. 1). At the same time, we learn a role transfer matrix to represent the role to be asked, in other words, the role who answers the generated question (see Eq. 2). Note that the role used in Sec. 3.1.1 is the speaker's role of the corresponding utterance, while the role mentioned here is the responder's role of the current utterance:

I = \sigma(k_I * h_p)    (1)
R = \sigma(k_R * r)    (2)

The two parameters k_I and k_R stand for the learnable hidden matrices that simulate the intent transfer and the response-role transfer, respectively; their elements are all values between 0 and 1. We merge the intent information and the next-role information as follows:

H = ([I_1, R_2], [I_2, R_3], \ldots, [I_i, R_{i+1}])    (3)

where R_{i+1} represents the next role of the current utterance. We mainly focus on the response role of the judge's questions, since the next role after a litigant's answer is almost always the judge.
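To make the encoder layers and the transfer equations concrete, here is a minimal PyTorch sketch of the role/utterance/dialogue layers and Eqs. (1)-(2); all module names, dimensions, and wiring details are assumptions read off the text above, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultiRoleDialogueEncoder(nn.Module):
    """Sketch of the role/utterance/dialogue layers and Eqs. (1)-(2).
    Hidden size d and the exact wiring are illustrative assumptions."""
    def __init__(self, vocab_size, n_roles, n_elements, d=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d)   # word vectors
        self.role_emb = nn.Embedding(n_roles, d)      # role embedding (dense vectors)
        self.elem_emb = nn.Embedding(n_elements, d)   # legal-knowledge elements
        self.utt_lstm = nn.LSTM(d, d, bidirectional=True, batch_first=True)
        self.dlg_lstm = nn.LSTM(3 * d, d, bidirectional=True, batch_first=True)
        self.elem_lstm = nn.LSTM(d, d, batch_first=True)
        self.k_I = nn.Linear(d, d, bias=False)        # intent-transfer matrix k_I
        self.k_R = nn.Linear(d, d, bias=False)        # response-role-transfer matrix k_R

    def forward(self, words, roles, elements):
        # words: (n_utt, max_len) token ids; roles: (n_utt,); elements: (n_elem,)
        _, (h_u, _) = self.utt_lstm(self.word_emb(words))       # utterance layer
        utt = torch.cat([h_u[0], h_u[1]], dim=-1)               # (n_utt, 2d)
        r = self.role_emb(roles)                                # (n_utt, d)
        X, _ = self.dlg_lstm(torch.cat([utt, r], dim=-1).unsqueeze(0))  # dialogue layer
        _, (h_p, _) = self.elem_lstm(self.elem_emb(elements).unsqueeze(0))
        I = torch.sigmoid(self.k_I(h_p[-1]))   # Eq. (1): intent transfer
        R = torch.sigmoid(self.k_R(r))         # Eq. (2): response-role transfer
        return X.squeeze(0), I, R              # dialogue repr. X, intent I, role R
```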
Figure 1: Network Architecture of the Proposed Method. (The figure shows the Multi-Role Dialogue Encoder, built from Bi-LSTMs with attention, and the legal-knowledge-aware (LKG-aware) Intent Navigation module; the dialogue representation X and the intent representation Y are fused via concatenation, element-wise product X * Y, and difference X - Y.)
We further compress the original redundant information and assign more weight to the important information via an attention mechanism:

Y = \sum_{i=1}^{n} \frac{\exp(I_i)}{\sum_{i=1}^{n} \exp(I_i)} * H_i    (4)

Next, we fuse the original information with the intent/role transformation information:

Z = [X, Y, X * Y, X - Y]    (5)

For question generation learning, for each dialogue D we use cross-entropy to formulate the problem as follows:

loss = -\log P(S_q \mid D) = -\sum_{j=1}^{l} \log P(w_{i,j} \mid w_{i,1:j-1}, D)

Denoting all the parameters in our model as \delta, we obtain the following objective function:

\min_{\delta} loss = loss + \lambda \|\delta\|^2    (6)

To minimize the objective function, we use the diagonal adaptive learning-rate method of [11]. At time step t, a parameter \delta is updated as follows:

\delta_t \leftarrow \delta_{t-1} - \frac{\mu}{\sqrt{\sum_{i=1}^{t} f_i^2}} f_t    (7)

where \mu is the initial learning rate and f_t is the sub-gradient at time step t. All hyperparameters are tuned on the training set according to the evaluation results on the development set.
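The fusion of Eqs. (4)-(5), the loss of Eq. (6), and the per-parameter update of Eq. (7) can be sketched as follows; the tensor shapes, the regularization weight lam, and the smoothing constant eps are our assumptions, and F.cross_entropy averages over tokens rather than summing.

```python
import torch
import torch.nn.functional as F

def fuse(X, I, H):
    """Eqs. (4)-(5): attention over intent scores, then feature fusion.
    Assumed shapes: X (d,), I (n,) scalar scores, H (n, d)."""
    alpha = torch.softmax(I, dim=0)           # exp(I_i) / sum_i exp(I_i)
    Y = (alpha.unsqueeze(-1) * H).sum(dim=0)  # Eq. (4): weighted sum, shape (d,)
    return torch.cat([X, Y, X * Y, X - Y])    # Eq. (5): Z, shape (4d,)

def qg_loss(logits, targets, params, lam=1e-4):
    """Eq. (6): token-level cross-entropy plus L2 regularization.
    logits: (l, vocab) decoder outputs; targets: (l,) gold question tokens."""
    ce = F.cross_entropy(logits, targets)      # -log P(S_q | D), mean over tokens
    l2 = sum((p ** 2).sum() for p in params)   # ||delta||^2
    return ce + lam * l2

def adagrad_step(param, grad, grad_sq_sum, mu=0.1, eps=1e-8):
    """Eq. (7): delta_t <- delta_{t-1} - mu / sqrt(sum_i f_i^2) * f_t.
    Plain tensors assumed (wrap in torch.no_grad() for model parameters)."""
    grad_sq_sum += grad ** 2
    param -= mu / (grad_sq_sum.sqrt() + eps) * grad
    return param, grad_sq_sum
```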
In the experiment, we collected 136,019 court debate records of civil Private Loan Dispute cases, from which we randomly extracted 302,650 continuous dialogue fragments as independent samples for training, development, and testing. In total, the dataset contains more than 4 million sentences, and each dialogue fragment contains 13.38 sentences on average. The details of the dataset are shown in Table 2.
Table 2: Statistics of the Processed Dialogue Fragments
Dataset   #Fragments   #Sentences   Avg. #Sentences per Fragment
Total     302,650      4,048,659    13.38
We only selected fragments in which there are at least five historical utterances as context for our task of next-question generation. The entire dataset is divided in a ratio of 8:1:1 for training, development, and testing, respectively.

In order to demonstrate the validity of our model, we selected some traditional classical methods and the latest mainstream methods for text generation as baselines. The tested baselines are illustrated as follows:
• LSTM [4]: We replace all bidirectional LSTMs in our proposed model with unidirectional LSTMs.
• ConvS2S [3]: The LSTMs in the encoder are replaced with CNNs.
• ByteNet [5]: A one-dimensional convolutional neural network composed of two parts, one encoding the source sequence and the other decoding the target sequence.
• S2S+attention [7]: The Seq2Seq framework relies on the encoder-decoder paradigm: the encoder encodes the input sequence, while the decoder produces the target sequence. An attention mechanism is added to force the model to focus on specific parts of the input sequence when decoding.
• PGN [9]: Another commonly used framework for text generation, which adds a copy mechanism to aid accurate reproduction of information while retaining the ability to produce novel words through the generator.
• Transformer [10]: A neural network architecture based on the self-attention mechanism.

Table 3: Main Results of All Tested Methods. Note that the results shown in the TBM (ours) row are statistically significantly different from the corresponding values of all the baseline models (p-value < 0.05).

Model            R-1     R-2     R-3     R-L     BLEU
LSTM             29.33   14.97   10.34   26.65   11.59
ByteNet          35.57   19.47   14.27   32.56   17.73
ConvS2S          36.35   20.42   15.98   33.03   17.97
S2S+attention    36.54   20.72   16.40   33.29   17.96
PGN              37.67   21.93   17.42   34.39   18.75
Transformer      37.59   23.26   18.71   35.38   18.58
TBM (ours)       39.02   24.56   21.03   38.12   24.17
To automatically assess the quality of the generated questions, we used ROUGE [6] and BLEU [8] scores to compare the different models. We report ROUGE-1, ROUGE-2, and ROUGE-3 as measures of informativeness, and ROUGE-L as well as BLEU-4 as measures of fluency.
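As a reference for reproducing these scores, here is a hedged sketch using the off-the-shelf rouge-score and nltk packages (our tooling assumptions; the paper does not name its scoring implementation), with two made-up example sentences:

```python
# pip install rouge-score nltk
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "plaintiff what is your relationship with the defendant"      # hypothetical gold question
generated = "plaintiff what relationship do you have with the defendant"  # hypothetical model output

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rouge3", "rougeL"])
rouge = scorer.score(reference, generated)  # precision/recall/F per ROUGE variant

bleu4 = sentence_bleu([reference.split()], generated.split(),
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)

print({name: round(s.fmeasure, 4) for name, s in rouge.items()}, round(bleu4, 4))
```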
The performance of all tested methods is reported in Table 3. As Table 3 shows, the proposed TBM significantly (p-value < 0.05) outperforms all the baseline methods on every reported metric.

Figure 2: The performance of the tested methods.

CONCLUSION

Dialogue generation has been well studied in NLP, and considerable progress has been made. However, it still faces great challenges: many strong models achieve good results only in specific domains. In this paper, we define a new task of question generation in the judicial trial. The proposed Trial Brain Model can learn the judge's questioning intention through predefined knowledge. Judge-centered debate-context heterogeneity is the landmark of this model, i.e., a delicately designed multi-role dialogue encoding mechanism via legal knowledge, with representation enhancement through intent navigation that simulates the intention switch across different conversations. The empirical findings validate the hypothesis of this bionic design for reproducing judge logic. An extensive set of experiments on a large civil trial dataset shows that the proposed model can generate more accurate and readable questions than several alternatives in the multi-role court debate scene.
ACKNOWLEDGMENTS
This work is supported by the National Key R&D Program of China (2018YFC0830200; 2018YFC0830206).
REFERENCES
[1] Judge Information Center. [n. d.]. As Workloads Rise in Federal Courts, Judge Counts Remain Flat. https://trac.syr.edu/tracreports/judge/364/. Accessed: 2019-05-06.
[2] Judge Information Center. [n. d.]. Some Federal Judges Handle Inordinate Caseloads. https://trac.syr.edu/tracreports/judge/501/. Accessed: 2019-05-06.
[3] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional Sequence to Sequence Learning. JMLR.org, 1243–1252.
[4] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
[5] Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. Neural Machine Translation in Linear Time. arXiv preprint arXiv:1610.10099 (2016).
[6] Chin-Yew Lin and Eduard Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics.
[7] Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. 2016. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. arXiv preprint arXiv:1602.06023 (2016).
[8] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. Association for Computational Linguistics, 311–318.
[9] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the Point: Summarization with Pointer-Generator Networks. arXiv preprint arXiv:1704.04368 (2017).
[10] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. 5998–6008.
[11] Matthew D. Zeiler. 2012. ADADELTA: An Adaptive Learning Rate Method. arXiv preprint arXiv:1212.5701 (2012).