Innovative techniques for legal text retrieval

Artificial Intelligence and Law

Abstract

Legal text retrieval traditionally relies on external knowledge sources such as thesauri and classification schemes, and accurate indexing of the documents is often done manually. As a result, not all legal documents can be retrieved effectively. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support the acquisition of knowledge and the knowledge-rich processing of the content of document texts and of the information need, as well as the matching of the two. Techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

Cite this article

Moens, MF. Innovative techniques for legal text retrieval. Artificial Intelligence and Law 9, 29–57 (2001). https://doi.org/10.1023/A:1011297104922

  • DOI: https://doi.org/10.1023/A:1011297104922