Identification of Rhetorical Roles for Segmentation and Summarization of a Legal Judgment

Artificial Intelligence and Law

Abstract

Legal judgments are complex, so experts write a brief summary of each judgment, known as a headnote, to enable quick perusal. Headnote generation is time-consuming, and attempts have been made to automate it. Automatically generated summaries are difficult to interpret because they are not coherent and do not convey the relative relevance of the various components of the judgment. A legal judgment can be segmented into coherent chunks based on the rhetorical roles played by its sentences. In this paper, a comprehensive system is proposed for labeling sentences with their rhetorical roles and automatically extracting structured headnotes from legal judgments. An annotated data set was created with the help of legal experts and used as training data. A machine learning technique, the Conditional Random Field, is applied to segment documents by identifying the rhetorical roles. The present work also describes the application of probabilistic models to extract key sentences and compose the relevant chunks into a headnote. Understanding the basic structure and distinct segments of a judgment is shown to improve the final presentation of the summary. Moreover, with the addition of simple features, the system can be extended to other legal sub-domains. The proposed system has been empirically evaluated and found to be highly effective on both the segmentation and summarization tasks. The final summary, generated with its underlying rhetorical roles, improves the readability and efficiency of the system.
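
As a rough illustration of the sequence-labeling idea described in the abstract, the sketch below assigns one role to each sentence by Viterbi decoding over a linear-chain model, which is the inference step a Conditional Random Field uses at prediction time. It is not the authors' system: the role names, cue phrases, and scores are hypothetical and hand-set for illustration, whereas a real CRF would learn its feature weights from the expert-annotated training data.

```python
from typing import Dict, List, Tuple

# Hypothetical role labels and cue phrases, chosen only for illustration.
ROLES: List[str] = ["FACTS", "ARGUMENT", "DECISION"]
CUES: Dict[str, List[str]] = {
    "FACTS": ["filed", "appellant was"],
    "ARGUMENT": ["contended", "submitted"],
    "DECISION": ["dismissed", "allowed"],
}

# Hand-set transition scores (previous role, current role). A trained CRF
# would estimate these weights from annotated judgments instead.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("FACTS", "FACTS"): 1.0, ("FACTS", "ARGUMENT"): 0.5, ("FACTS", "DECISION"): -1.5,
    ("ARGUMENT", "FACTS"): -1.0, ("ARGUMENT", "ARGUMENT"): 0.8, ("ARGUMENT", "DECISION"): 0.5,
    ("DECISION", "FACTS"): -2.0, ("DECISION", "ARGUMENT"): -2.0, ("DECISION", "DECISION"): 1.0,
}


def emission_score(sentence: str, role: str) -> float:
    """Score a sentence for a role by counting that role's cue phrases."""
    text = sentence.lower()
    return 2.0 * sum(cue in text for cue in CUES[role])


def viterbi(sentences: List[str]) -> List[str]:
    """Return the highest-scoring role sequence for the given sentences."""
    if not sentences:
        return []
    # scores[t][role] = best total score of any path ending in `role` at step t
    scores = [{r: emission_score(sentences[0], r) for r in ROLES}]
    backpointers: List[Dict[str, str]] = []
    for sentence in sentences[1:]:
        prev = scores[-1]
        current: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for role in ROLES:
            best_prev = max(ROLES, key=lambda p: prev[p] + TRANSITIONS[(p, role)])
            current[role] = (
                prev[best_prev]
                + TRANSITIONS[(best_prev, role)]
                + emission_score(sentence, role)
            )
            pointer[role] = best_prev
        scores.append(current)
        backpointers.append(pointer)
    # Trace the best path backwards from the best final role.
    path = [max(ROLES, key=lambda r: scores[-1][r])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


judgment = [
    "The suit was filed in 1998.",
    "The property lies in Chennai.",
    "Counsel contended that the notice was invalid.",
    "The appeal is dismissed.",
]
print(viterbi(judgment))  # ['FACTS', 'FACTS', 'ARGUMENT', 'DECISION']
```

The second sentence contains no cue phrase, yet it is labeled FACTS because the transition scores favor staying in the current segment. This is the property that lets a sequence model group sentences into coherent chunks rather than classifying each one in isolation.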

Corresponding author

Correspondence to M. Saravanan.

Cite this article

Saravanan, M., Ravindran, B. Identification of Rhetorical Roles for Segmentation and Summarization of a Legal Judgment. Artif Intell Law 18, 45–76 (2010). https://doi.org/10.1007/s10506-010-9087-7

  • DOI: https://doi.org/10.1007/s10506-010-9087-7
