Judicial knowledge-enhanced magnitude-aware reasoning for numerical legal judgment prediction

Bi, Sheng; Zhou, Zhiyao; Pan, Lu; Qi, Guilin

doi:10.1007/s10506-022-09337-4


Original Research
Published: 02 November 2022

Volume 31, pages 773–806, (2023)

Artificial Intelligence and Law

Sheng Bi¹,²,
Zhiyao Zhou¹,
Lu Pan³ &
Guilin Qi¹ (ORCID: orcid.org/0000-0003-0150-7236)


Abstract

Legal Judgment Prediction (LJP) is an essential component of legal assistant systems; it aims to automatically predict judgment results from a given criminal fact description. Numerical LJP, i.e., the prediction of imprisonment and penalty, is a vital subtask of LJP, yet it has received little attention from researchers. Existing methods ignore the numerical information in criminal facts, so their performance is far from satisfactory. For instance, the amount stolen in a theft varies from case to case, and so do the resulting prison terms and penalties. The major challenge is enabling the model to compare numerals and perceive their magnitude, e.g., 400 < 500 < 800, and 500 is closer to 400 than to 800. To this end, we propose a judicial knowledge-enhanced magnitude-aware reasoning architecture, called NumLJP, for the numerical LJP task. Specifically, we first implement a contrastive learning-based judicial knowledge selector to efficiently distinguish confusing criminal cases. Unlike the law articles that previous approaches employ as external knowledge, judicial knowledge is a quantitative guideline used in real scenarios. It contains many numerals (called anchors) that together construct a reference frame. Then we design a masked numeral prediction task that helps the model memorize these anchors and thereby acquire legal numerical commonsense from the selected judicial knowledge. We construct a scale-based numerical graph from the anchors and the numerals in facts to perform magnitude-aware numerical reasoning. Finally, the representations of the fact description, judicial knowledge, and numerals are fused to make decisions. We conduct extensive experiments on three real-world datasets against several competitive baselines. The results demonstrate that NumLJP improves macro-F1 by at least 9.53% and 11.57% on the prediction of penalty and imprisonment, respectively.
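The abstract does not spell out the selector's training objective. As a minimal sketch, assuming a standard InfoNCE-style contrastive loss (van den Oord et al. 2018) with the temperature τ that the notes list among the hyperparameters, the fact description acts as the query, the judicial knowledge for its true charge as the positive, and knowledge for confusing charges as negatives. The function names and the toy two-dimensional vectors are hypothetical; real inputs would be encoder outputs.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(fact: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], tau: float = 0.1) -> float:
    """InfoNCE loss pulling the fact toward its positive knowledge, away from negatives."""
    logits = [cosine(fact, positive) / tau] + [cosine(fact, n) / tau for n in negatives]
    top = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_denominator - logits[0]  # equals -log softmax probability of the positive


fact = [1.0, 0.0]
confusing = [[0.0, 1.0], [-1.0, 0.0]]  # hypothetical knowledge of confusing charges
print(info_nce(fact, [0.9, 0.1], confusing))  # near zero: positive already closest
```

A lower loss means the fact embedding already sits closer to the correct knowledge than to the confusing alternatives; a smaller τ sharpens that contrast.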
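The masked numeral prediction task can be illustrated with a small data-preparation sketch. It assumes, following the notes, that prediction targets are drawn from the numeral vocabulary of a single piece of judicial knowledge, and it treats the task as classification over that sorted vocabulary. The regular expression, the function names, and the English paraphrase of a theft guideline are hypothetical stand-ins for the Chinese source text and the authors' actual pipeline.

```python
import re
from typing import List, Tuple

# Matches comma-grouped numerals such as "30,000" or plain digit runs such as "500".
NUMERAL = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def to_int(text: str) -> int:
    """Convert a matched numeral string like "30,000" to an integer."""
    return int(text.replace(",", ""))


def numeral_vocabulary(knowledge: str) -> List[int]:
    """All distinct numerals (anchors) in one piece of judicial knowledge, ascending."""
    return sorted({to_int(m.group()) for m in NUMERAL.finditer(knowledge)})


def mask_numeral(knowledge: str, index: int, mask: str = "[MASK]") -> Tuple[str, int]:
    """Mask the index-th numeral; return the masked text and the target's vocabulary id."""
    match = list(NUMERAL.finditer(knowledge))[index]
    vocab = numeral_vocabulary(knowledge)
    masked = knowledge[: match.start()] + mask + knowledge[match.end():]
    return masked, vocab.index(to_int(match.group()))


guideline = ("An amount of 1,000 to 3,000 yuan or more is relatively large; "
             "30,000 to 100,000 yuan or more is huge.")
masked, label = mask_numeral(guideline, 2)
print(masked)  # the third numeral, 30,000, is replaced by [MASK]
print(label)   # its index in the vocabulary [1000, 3000, 30000, 100000]
```

Restricting the output space to the anchors of one guideline keeps the task small and forces the model to recall which specific threshold belongs in each slot.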
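To make the abstract's 400 < 500 < 800 example concrete, the sketch below merges anchors and fact numerals into one ascending comparison chain (the ordered numerals of a numerical graph, as the notes define it), links neighbouring nodes with comparison edges, and reads off which neighbour is closer in magnitude. It is a simplified illustration under stated assumptions: the node tags, the edge labels, and the helper names are hypothetical, and the paper's scale-based interval division (governed by the multiplier N^t) is not reproduced.

```python
from typing import List, Tuple

Node = Tuple[float, str]  # (value, source), where source is "anchor" or "fact"


def comparison_chain(anchors: List[float], fact_numerals: List[float]) -> List[Node]:
    """Merge anchors and fact numerals into one chain sorted by value (stable on ties)."""
    nodes = [(v, "anchor") for v in anchors] + [(v, "fact") for v in fact_numerals]
    return sorted(nodes, key=lambda node: node[0])


def chain_edges(chain: List[Node]) -> List[Tuple[int, int, str]]:
    """Directed edges between neighbouring nodes, labelled 'less' or 'equal'."""
    return [(i, i + 1, "equal" if chain[i][0] == chain[i + 1][0] else "less")
            for i in range(len(chain) - 1)]


def closer_neighbour(chain: List[Node], i: int) -> float:
    """Value of the adjacent node whose absolute difference from chain[i] is smallest."""
    value = chain[i][0]
    neighbours = [chain[j][0] for j in (i - 1, i + 1) if 0 <= j < len(chain)]
    return min(neighbours, key=lambda n: abs(n - value))


chain = comparison_chain([400, 800], [500])
print(chain)                       # [(400, 'anchor'), (500, 'fact'), (800, 'anchor')]
print(chain_edges(chain))          # [(0, 1, 'less'), (1, 2, 'less')]
print(closer_neighbour(chain, 1))  # 400
```

Only neighbouring nodes are linked because the ordering is transitive: 400 < 500 and 500 < 800 already imply 400 < 800.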


Notes

These numerals may change as the legal system is reformed, but they remain fixed over a considerable period. Therefore, we assume that they are fixed.
Legal numerical commonsense refers to a judge's knowledge of the numerical features in the fact description, such as the amount of property stolen or the quantity of drugs sold. Each of these numerals has its own range and probability distribution.
Here, the numeral vocabulary refers to all numerical anchors that appear in the same piece of judicial knowledge.
https://github.com/thunlp/CAIL.
https://www.datafountain.cn/competitions/277.
Existing Chinese LJP datasets are usually divided in this manner.
https://github.com/j30206868/numnet-chinese.
https://pytorch.org
PLMs use the WordPiece tokenizer to split words into either full forms or word pieces (Devlin et al. 2019).
For example, the anchors for Theft are 1,000, 3,000, 30,000, 100,000, 300,000, and 500,000.
https://www.statsmodels.org/
Among all hyperparameters, the learning rate lr, the gradient clipping value clipping, the weight of the contrastive learning loss \(\lambda \), and the temperature \(\tau \) are set empirically following previous work and are not discussed further in this paper. \(N^t\) is the multiplier used for interval division; its setting principle is detailed in Section 4.3.1.
The comparison chain is the ordered sequence of numerals in a numerical graph.
https://wenshu.court.gov.cn.
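The WordPiece note explains why numerals need dedicated handling: a subword tokenizer splits a number into pieces whose surface form says little about its magnitude. The sketch below implements greedy longest-match-first WordPiece splitting over a tiny hypothetical vocabulary; real BERT vocabularies (Devlin et al. 2019) are far larger and will split numerals differently.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:  # shrink the candidate until it is in the vocabulary
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"3", "300", "##0", "##00"}  # hypothetical toy vocabulary
print(wordpiece("3000", toy_vocab))   # ['300', '##0']
print(wordpiece("30000", toy_vocab))  # ['300', '##00']
```

Here 3,000 and 30,000 share the leading piece `300`, so nothing in the token sequence directly signals that one is ten times the other.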
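The Theft anchors listed in the notes partition the amount axis into intervals. As a minimal sketch, a fact's amount can be placed between its bracketing anchors with a binary search. The function name, the currency unit, and the convention that an amount equal to an anchor falls into the interval starting at that anchor are assumptions, not details taken from the paper.

```python
import bisect
from typing import List, Optional, Tuple

# Theft anchors from the notes (currency unit assumed to be yuan).
THEFT_ANCHORS = [1_000, 3_000, 30_000, 100_000, 300_000, 500_000]


def bracket(amount: float,
            anchors: List[int] = THEFT_ANCHORS) -> Tuple[Optional[int], Optional[int]]:
    """Return the (lower, upper) anchors around amount; None marks an open end."""
    i = bisect.bisect_right(anchors, amount)  # number of anchors <= amount
    lower = anchors[i - 1] if i > 0 else None
    upper = anchors[i] if i < len(anchors) else None
    return lower, upper


print(bracket(2_500))    # (1000, 3000)
print(bracket(500))      # (None, 1000)
print(bracket(600_000))  # (500000, None)
```

Binary search keeps the lookup logarithmic in the number of anchors, and the bracketing pair is exactly the reference frame against which a fact numeral's magnitude can be judged.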

References

Amini A, Gabriel S, Lin S, Koncel-Kedziorski R, Choi Y, Hajishirzi H (2019) MathQA: Towards interpretable math word problem solving with operation-based formalisms. In: NAACL, pp. 2357–2367
Bakalov A, Fuxman A, Talukdar PP, Chakrabarti S (2011) SCAD: Collective discovery of attribute values. In: WWW, pp. 447–456
Baly R, Karadzhov G, Saleh A, Glass JR, Nakov P (2019) Multi-task ordinal regression for jointly predicting the trustworthiness and the leading political ideology of news media. In: NAACL-HLT, pp. 2109–2116
Banerjee S, Chakrabarti S, Ramakrishnan G (2009) Learning to rank for quantity consensus queries. In: SIGIR, pp. 243–250
Bi S, Huang Y, Cheng X, Wang M, Qi G (2019) Building Chinese legal hybrid knowledge network. KSEM 11775:628–639
Bi S, Cheng X, Chen J, Qi G, Wang M, Zhou Y, Wang L (2019) Dispute generation in law documents via joint context and topic attention. In: JIST, pp. 116–129
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Neural Inf Process Syst 33:1877–1901
Cao W, Mirjalili V, Raschka S (2020) Rank consistent ordinal regression for neural networks with application to age estimation. Pattern Recognit Lett 140:325–331
Chalkidis I, Androutsopoulos I, Aletras N (2019) Neural legal judgment prediction in English. In: ACL, pp. 4317–4323
Chen H, Cai D, Dai W, Dai Z, Ding Y (2019) Charge-based prison term prediction with deep gating network. In: EMNLP, pp. 6361–6366
Chen K, Xu W, Cheng X, Xiaochuan Z, Zhang Y, Song L, Wang T, Qi Y, Chu W (2020) Question directed graph attention network for numerical reasoning over text. In: EMNLP, pp. 6759–6768
Cheng X, Bi S, Qi G, Wang Y (2020) Knowledge-aware method for confusing charge prediction. NLPCC 12430:667–679
Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL, pp. 4171–4186
Diaz R, Marathe A (2019) Soft labels for ordinal regression. In: CVPR, pp. 4738–4747
Dong Q, Niu S (2021) Legal judgment prediction via relational learning. In: SIGIR, pp. 983–992
Dua D, Wang Y, Dasigi P, Stanovsky G, Singh S, Gardner M (2019) DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In: NAACL, pp. 2368–2378
Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378
Ge J, Huang Y, Shen X, Li C, Hu W (2021) Learning fine-grained fact-article correspondence in legal cases. TASLP 29:3694–3706
George TE, Epstein L (1992) On the nature of supreme court decision making. APSR 86(2):323–337
Geva M, Gupta A, Berant J (2020) Injecting numerical reasoning skills into language models. In: ACL, pp. 946–958
Gunel B, Du J, Conneau A, Stoyanov V (2021) Supervised contrastive learning for pre-trained language model fine-tuning. In: ICLR
Guo Z, Zhang Y, Teng Z, Lu W (2019) Densely connected graph convolutional networks for graph-to-sequence learning. TACL 7:297–312
Gutmann M, Hyvärinen A (2010) Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. AISTATS 9:297–304
Hamilton WL, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: NeurIPS, pp. 1024–1034
Hu Z, Li X, Tu C, Liu Z, Sun M (2018) Few-shot charge prediction with discriminative legal attributes. In: COLING, pp. 487–498
Huang D, Shi S, Lin C, Yin J, Ma W (2016) How well do computers solve math word problems? Large-scale dataset construction and evaluation. In: ACL
Huber PJ (1992) Robust estimation of a location parameter. In: Breakthroughs in Statistics, pp. 492–518
Hénaff OJ (2020) Data-efficient image recognition with contrastive predictive coding. ICML 119:4182–4192
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2021) A survey on contrastive self-supervised learning. Technologies 9(1):2
Jiang C, Nian Z, Guo K, Chu S, Zhao Y, Shen L, Tu K (2019) Learning numeral embeddings. arXiv preprint arXiv:2001.00003
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Neural Inf Process Syst 33
Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: ICLR
Kort F (1957) Predicting supreme court decisions mathematically: a quantitative analysis of the “right to counsel” cases. APSR 51(1):1–12
Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2020) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: ACL, pp. 7871–7880
Li S, Zhang H, Ye L, Su S, Guo X, Yu H, Fang B (2020) Prison term prediction on criminal case description with deep learning. Comput Mater Contin 62(3):1217–1231
Lin BY, Lee S, Khanna R, Ren X (2020) Birds have four legs?! NumerSense: Probing numerical commonsense knowledge of pre-trained language models. In: EMNLP, pp. 6862–6868
Liu YH, Chen YL, Ho WL (2015) Predicting associated statutes for legal problems. IPM 51(1):194–211
Liu C-L, Chang C-T, Ho J-H (2004) Case instance generation and refinement for case-based criminal summary judgments in Chinese. JISE, 783–800
Liu CL, Liao TM (2005) Classifying criminal charges in Chinese for web-based legal services. In: APCCMI
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692
Luo B, Feng Y, Xu J, Zhang X, Zhao D (2017) Learning to predict charges for criminal cases with legal basis. In: EMNLP, pp. 2727–2736
Nie Y, Williams A, Dinan E, Bansal M, Weston J, Kiela D (2020) Adversarial NLI: A new benchmark for natural language understanding. In: ACL, pp. 4885–4901
Niu Z, Zhou M, Wang L, Gao X, Hua G (2016) Ordinal regression with multiple output CNN for age estimation. In: CVPR, pp. 4920–4928
Parikh N, Boyd SP (2014) Proximal algorithms. Found Trends Optim 1(3):127–239
Patel A, Bhattamishra S, Goyal N (2021) Are NLP models really able to solve simple math word problems? In: NAACL, pp. 2080–2094
Qin J, Lin L, Liang X, Zhang R, Lin L (2020) Semantically-aligned universal tree-structured solver for math word problems. In: EMNLP, pp. 3780–3789
Ran Q, Lin Y, Li P, Zhou J, Liu Z (2019) NumNet: Machine reading comprehension with numerical reasoning. In: EMNLP, pp. 2474–2484
Ribeiro MT, Wu T, Guestrin C, Singh S (2020) Beyond accuracy: Behavioral testing of NLP models with checklist. In: ACL, pp. 4902–4912
Robinson JD, Chuang C, Sra S, Jegelka S (2021) Contrastive learning with hard negative samples. In: ICLR
Saha A, Joty SR, Hoi SCH (2021) Weakly supervised neuro-symbolic module networks for numerical reasoning. CoRR abs/2101.11802
Sanh V, Debut L, Chaumond J, Wolf T (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108
Segal JA (1984) Predicting supreme court cases probabilistically: The search and seizure cases, 1962–1981. APSR 78
Sermanet P, Lynch C, Chebotar Y, Hsu J, Jang E, Schaal S, Levine S (2018) Time-contrastive networks: Self-supervised learning from video. In: ICRA, pp. 1134–1141
Shi X, Cao W, Raschka S (2021) Deep neural networks for rank-consistent ordinal regression based on conditional probabilities. CoRR abs/2111.08851
Shorten C, Khoshgoftaar TM, Furht B (2021) Text data augmentation for deep learning. J Big Data 8(1):101
Spithourakis GP, Riedel S (2018) Numeracy for language models: Evaluating and improving their ability to predict numbers. In: ACL, pp. 2104–2115
Thawani A, Pujara J, Ilievski F, Szekely PA (2021) Representing numbers in NLP: a survey and a vision. In: NAACL, pp. 644–656
Tian Y, Krishnan D, Isola P (2020) Contrastive multiview coding. In: ECCV, vol. 12356, pp. 776–794. Springer
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. JMLR 9(11)
van den Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. CoRR abs/1807.03748
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. Neural Inf Process Syst, pp. 5998–6008
Wu Z, Xiong Y, Yu SX, Lin D (2018) Unsupervised feature learning via non-parametric instance discrimination. In: CVPR, pp. 3733–3742
Xiao C, Zhong H, Guo Z, Tu C, Liu Z, Sun M, Feng Y, Han X, Hu Z, Wang H, Xu J (2018) CAIL2018: A large-scale legal dataset for judgment prediction. CoRR abs/1807.02478
Xu N, Wang P, Chen L, Pan L, Wang X, Zhao J (2020) Distinguish confusing law articles for legal judgment prediction. In: ACL, pp. 3086–3095
Yang W, Jia W, Zhou X, Luo Y (2019) Legal judgment prediction via multi-perspective bi-feedback network. In: IJCAI, pp. 4085–4091
Yoran O, Talmor A, Berant J (2022) Turning tables: Generating examples from semi-structured tables for endowing language models with reasoning skills. In: ACL, pp. 6016–6031
Yue L, Liu Q, Jin B, Wu H, Zhang K, An Y, Cheng M, Yin B, Wu D (2021) NeurJudge: A circumstance-aware neural framework for legal judgment prediction. In: SIGIR, pp. 973–982
Zhong H, Guo Z, Tu C, Xiao C, Liu Z, Sun M (2018) Legal judgment prediction via topological learning. In: EMNLP, pp. 3540–3549
Zhong H, Xiao C, Tu C, Zhang T, Liu Z, Sun M (2020) How does NLP benefit legal system: A summary of legal artificial intelligence. In: ACL, pp. 5218–5230


Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, Nanjing, 211189, Jiangsu, China
Sheng Bi, Zhiyao Zhou & Guilin Qi
Judicial Big Data Research Centre, School of Law, Southeast University, Nanjing, 211189, Jiangsu, China
Sheng Bi
Tencent Technology (Shenzhen) Co., Ltd., Shenzhen, 518057, Guangdong, China
Lu Pan


Corresponding author

Correspondence to Guilin Qi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Bi, S., Zhou, Z., Pan, L. et al. Judicial knowledge-enhanced magnitude-aware reasoning for numerical legal judgment prediction. Artif Intell Law 31, 773–806 (2023). https://doi.org/10.1007/s10506-022-09337-4


Accepted: 05 October 2022
Published: 02 November 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10506-022-09337-4
