Abstract
We describe research carried out as part of a text summarisation project for the legal domain, for which we use a new XML corpus of judgments of the UK House of Lords. These judgments are a particularly important part of public discourse because of the role that precedents play in English law. We present experimental results using a range of features and machine learning techniques for two tasks: predicting the rhetorical status of sentences, and selecting the most summary-worthy sentences from a document. Results for these components are encouraging: they achieve state-of-the-art accuracy using robust, automatically generated cue phrase information. Sample output from the system illustrates the potential of summarisation technology for legal information management systems. It also highlights the utility of our rhetorical annotation scheme as a model of legal discourse, since the scheme provides a clear means of structuring summaries and tailoring them to different types of users.
About this article
Cite this article
Hachey, B., Grover, C. Extractive summarisation of legal texts. Artif Intell Law 14, 305–345 (2006). https://doi.org/10.1007/s10506-007-9039-z