
Automatically classifying case texts and predicting outcomes


Abstract

Work on a computer program called SMILE + IBP (SMart Index Learner plus Issue-Based Prediction) bridges case-based reasoning and extracting information from texts. The program addresses a task that is technologically challenging and also very relevant from a legal viewpoint: to extract information from textual descriptions of the facts of decided cases and apply that information to predict the outcomes of new cases. The program attempts to automatically classify textual descriptions of the facts of legal problems in terms of Factors, a set of classification concepts that capture stereotypical fact patterns that affect the strength of a legal claim, here trade secret misappropriation. Using these classifications, the program can evaluate and explain predictions about a problem’s outcome, given a database of previously classified cases. This paper provides an extended example illustrating both functions, prediction by IBP and text classification by SMILE, and reports empirical evaluations of each. While IBP’s results are quite strong and SMILE’s much weaker, SMILE + IBP still has some success in predicting and explaining the outcomes of case scenarios input as texts. To our knowledge, it marks the first time that a program can reason automatically about legal case texts.


Notes

  1. In 1995, U.S. case law comprised about 50 gigabytes of text and was growing by 2 gigabytes per year (Turtle 1995, pp. 6, 47).

  2. “The appellate courts of the United States hand down about 500 decisions a day, so there is considerable pressure upon legal publishers to process case opinions in a timely fashion.” (Jackson et al. 2003).

  3. A nearest-neighbor classifier has in memory “all the documents in the training set and their associated features. Later, when classifying a new document, D, the classifier first selects the k documents in the training set that are closest to D [using some distance metric like cosine similarity], then picks one or more categories to assign to D, based on the categories assigned to the selected k documents.” (Jackson and Moulinier 2007, p. 144).
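To make the scheme concrete, here is a minimal sketch of a k-nearest-neighbor text classifier using cosine similarity over bag-of-words vectors. The training sentences, Factor labels, and choice of k are invented for illustration; SMILE's actual classifiers are trained on annotated case squibs.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(new_doc, training, k=3):
    """Assign the majority label among the k training docs closest to new_doc."""
    v = bow(new_doc)
    ranked = sorted(training, key=lambda t: cosine(v, bow(t[0])), reverse=True)
    labels = [label for _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training set of (sentence, Factor label) pairs.
training = [
    ("employees signed nondisclosure agreements", "F4"),
    ("plaintiff required visitors to sign confidentiality forms", "F4"),
    ("defendant obtained the drawings by deception", "F26"),
]
print(knn_classify("former employees had signed nondisclosure agreements", training))
# -> "F4"
```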

  4. In an IR program with a relevance-feedback module, given feedback from a user in the form of examples of retrieved documents that are relevant to the user’s query, the IR system retrieves additional documents similar to the examples.
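One classic way to implement such a relevance-feedback module is the Rocchio update, sketched below; the note does not say which method the IR system uses, and the weights here are conventional defaults rather than values from the paper.

```python
# Rocchio relevance feedback (a standard method, not necessarily the one the
# note's IR system uses): nudge the query vector toward documents the user
# marked relevant and away from non-relevant ones. Vectors are dicts of
# term -> weight.
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query) | {t for d in relevant for t in d} | {t for d in nonrelevant for t in d}
    updated = {}
    for t in terms:
        rel = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        non = sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        # Negative weights are clipped to zero, a common convention.
        updated[t] = max(0.0, alpha * query.get(t, 0.0) + beta * rel - gamma * non)
    return updated

q = rocchio({"trade": 1.0, "secret": 1.0},
            relevant=[{"trade": 1.0, "secret": 1.0, "misappropriation": 1.0}],
            nonrelevant=[{"patent": 1.0}])
print(q)  # "misappropriation" now carries weight in the expanded query
```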

  5. “‘Trade secret’ means information, […] that: (1) derives independent economic value, […] from not being generally known to, and not being readily ascertainable by proper means […] and (2) is the subject of efforts that are reasonable under the circumstances to maintain its secrecy.” Uniform Trade Secrets Act § 1(4) (1985).

  6. “One […] is liable [for trade secret misappropriation if] (a) he discovered the secret by improper means, or (b) his disclosure or use constitutes a breach of confidence […]” Restatement (First) of Torts § 757 (1939).

  7. 409 S.W.2d 1 (Mo. 1966). This case was in the original CATO database. As in CATO, we focus only on the decision of the trade secret misappropriation claim regarding one defendant, Trieman. Other claims are ignored. In CATO, representing inconsistent results of trade secret claims against other defendants would require entering additional versions of the case.

  8. IBP’s list of KO-Factors includes: F8 Competitive-Advantage (P) (i.e., defendant saved development time and money by using plaintiff's information), F17 Info-Independently-Generated (D), F19 No-Security-Measures (D), F20 Info-Known-to-Competitors (D), F26 Deception (P), F27 Disclosure-In-Public-Forum (D).

  9. The other Weak Factors include F1 Disclosure-in-Negotiations (D), and F16 Info-Reverse-Engineerable (D).

  10. For other examples of the kind of explanations IBP generates to justify its predictions, see (Ashley and Brüninghaus 2006, p. 331).

  11. Integrated Cash Management Services, Inc. v. Digital Transactions Inc., 732 F.Supp. 370 (S.D.N.Y. 1989).

  12. Henry Hope X-Ray Products v. Marron Carrel, Inc., 674 F.2d 1336, 1339 (9th Cir. 1982).

  13. E. I. duPont de Nemours v. American Potash & Chemical Corp., 200 A.2d 428, 535 (Del. Ch. 1964).

  14. Surgidev Corp. v. Eye Technology, Inc., 648 F.Supp. 661, 675 (D.Minn. 1986). In this case, the opinion indicates that the employee individual defendants did sign nondisclosure agreements, but the squib-writer did not indicate that the employees were also defendants.

  15. USM Corp. v. Marson Fastener Corp., 379 Mass. 90, 393 N.E.2d 895, 899 (1979).

  16. For a description of CATO and additional examples of its squibs, see (Ashley 2000, 284–288; Ashley and Brüninghaus 2006).

  17. In Table 1, the representation of the National Rejectors case as a training instance for the SMILE program differs slightly from the original CATO version (shown in the human column and input to the IBP program) by the addition of F6 Security-Measures (P). There is an inconsistency between F6 Security-Measures (P) and F19 No-Security-Measures (D); in the same opinion, courts sometimes focus both on the security measures plaintiff took, justifying application of F6, and on the (sometimes many) security measures that could have been taken but were not, justifying F19. In IBP, the inconsistency was resolved in favor of whichever factor the court seemed to emphasize more; for SMILE, the inconsistency was allowed to stand.

  18. In generating the Factor classifiers, two other machine learning algorithms were tried, C4.5 (Quinlan 2004), and Naïve Bayes as implemented in Rainbow (McCallum 2004). The nearest-neighbor algorithm worked best.

  19. The most commonly used text representation techniques in information retrieval include feature lists of words (i.e., bag-of-words), syntactic structure, frames/scripts, logic propositions, and network structures (Turtle 1995, p. 18).

  20. 445 N.E.2d 418 (Ill. App. 1 Dist. 1983).

  21. The roles-replaced and propositional patterns text representations were introduced in (Brüninghaus and Ashley 2001, pp. 47–48).

  22. We developed automated information extraction techniques that use heuristics to perform the role substitution task for parties’ names and products (Brüninghaus and Ashley 2001). For the experiments reported here, however, we carried out all of the role replacements manually in order to make sure that the texts were as accurate as possible. Since the goal of our experiments here was to test whether role replacement improved Factor assignment, we believed it was important to eliminate automated role replacements as a source of errors. If SMILE were to be used in larger-scale applications, automatic role replacements would be necessary. Other commercial and research approaches exist for performing the role replacements, which is a subset of the long-studied problems of named entity recognition and coreference resolution (Jackson and Moulinier 2007, pp. 170–183).
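A minimal sketch of the role substitution step, assuming the parties' names are already known (say, from the case caption); the party lists here are hypothetical, and SMILE's heuristics for identifying them are more involved.

```python
import re

# Hypothetical party lists; in practice these would come from the case
# caption or from named-entity recognition.
ROLES = {
    "plaintiff": ["National Rejectors, Inc.", "National Rejectors"],
    "defendant": ["Trieman"],
}

def replace_roles(text, roles=ROLES):
    """Substitute party names with their litigation roles, longest names first
    so that 'National Rejectors, Inc.' wins over 'National Rejectors'."""
    pairs = sorted(((name, role) for role, names in roles.items() for name in names),
                   key=lambda p: len(p[0]), reverse=True)
    for name, role in pairs:
        text = re.sub(re.escape(name), role, text)
    return text

print(replace_roles("Trieman copied National Rejectors, Inc.'s drawings."))
# -> "defendant copied plaintiff's drawings."
```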

  23. In the parsing process, some identification of common phrases needs to take place. For instance, common constituents of trade secret cases described with multiword phrases, such as “nondisclosure agreement,” need to be recognized as individual entities (i.e., as a single head word like “nondisclosure_agreement”) for purposes of parsing and constructing ProPs. So far, we have handled that on an ad hoc basis, in effect manually substituting “nondisclosure_agreement” when a parse identifies “nondisclosure” as an adjective modifying “agreement.” Automating the common phrase recognition task should be possible, but we have not done it. See (Kim and Wilbur 2000).
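A sketch of the kind of substitution described, using a small hand-made phrase lexicon applied before parsing; the phrase list itself is illustrative, not SMILE's actual lexicon.

```python
# Join known multiword terms into single tokens before parsing, so that
# e.g. "nondisclosure agreement" is treated as one head word. The phrase
# list is a hypothetical stand-in for a domain lexicon.
PHRASES = ["nondisclosure agreement", "trade secret", "security measures"]

def join_phrases(text, phrases=PHRASES):
    # Longest phrases first, so overlapping phrases resolve sensibly.
    for p in sorted(phrases, key=len, reverse=True):
        text = text.replace(p, p.replace(" ", "_"))
    return text

print(join_phrases("the employees signed a nondisclosure agreement"))
# -> "the employees signed a nondisclosure_agreement"
```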

  24. 29T Antitrust and Trade Regulation, 29TIV Trade Secrets and Proprietary Information, 29TIV(B) Actions, 29Tk429 Evidence, 29Tk432 k. Weight and Sufficiency of Evidence. “… Evidence established that there were no actual trade secrets with respect to slug rejectors and electrical coin changers, in action based upon alleged misappropriation of trade secrets.”

  25. 212 Injunction …, 212II Subjects of Protection and Relief, 212II(B) Matters Relating to Property, 212k56 k. Disclosure or Use of Trade Secrets. “… Although individual defendants, former employees of plaintiff, had improperly used plaintiff's materials and drawings in production of products to compete with plaintiff's products, where plaintiff had not considered information regarding its products to be trade secrets, no warning had been given against use of information, and key patent relating to plaintiff's devices had expired, plaintiff was not entitled to injunctive relief but was entitled to damages based upon profits which it lost by reason of competition of defendant corporation formed by individual defendants, during time that defendant corporation would not otherwise have been in production but for assistance obtained from use of plaintiff's drawings and materials.”

  26. The vectors are in a multi-dimensional space with as many dimensions as there are different words in all of the sentences (Jackson and Moulinier 2007, pp. 30–33).
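A toy illustration of that vector space, with two invented sentences; each distinct word across the sentences contributes one dimension.

```python
# Two sentences embedded in a space with one dimension per distinct word.
s1 = "defendant used plaintiff drawings".split()
s2 = "defendant copied plaintiff materials".split()

vocab = sorted(set(s1) | set(s2))        # the dimensions of the space
v1 = [s1.count(w) for w in vocab]
v2 = [s2.count(w) for w in vocab]

print(vocab)   # ['copied', 'defendant', 'drawings', 'materials', 'plaintiff', 'used']
print(v1, v2)  # [0, 1, 1, 0, 1, 1] [1, 1, 0, 1, 1, 0]
```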

  27. Jackson and Moulinier (2007, pp. 92–93) identify a number of complexities in case opinions that bedevil information extraction: 1. The facts of the case may be intermingled with its procedural history. 2. Rulings in precedents are reported in much the same way as the ruling in the current case. 3. The opinions contain extensive quotations from other sources. 4. The opinions may contain extensive discussions of hypothetical, counter-factual, or qualified propositions. 5. The opinions often deal with many diverse points of law.

  28. The F-Measure is the harmonic mean of the two rates, precision and recall. We used a version of the F-Measure that assigns equal weight to precision and recall (Jackson and Moulinier 2007, p. 48).

  29. We tested the results of our experiments for statistical significance in order to show whether the observed differences were caused by true differences between the representations and algorithms, and not merely by chance. Because our experiments were run as cross-validations, the commonly used t-test may not lead to reliable results (Dietterich 1996; Salzberg 1997). Since the cross-validation experiments compared more than two different parameter choices (specifically, the three treatments reported here: 1 machine learning algorithm × 3 representations), we followed the procedure recommended in (Dietterich 1996). We applied a non-parametric test, Friedman’s Test, to determine whether there was a difference among the combinations of representations and algorithms. When this test showed significance, we used Wilcoxon’s Signed-Rank Test to determine whether the difference between two variants was significant (e.g., between results obtained with ProPs and with Roles-Replaced using Nearest-Neighbor). Following convention, we say that results with probability p < 0.05 are statistically significant, and with p < 0.1 marginally significant. See (Cohen 1995a, b).
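With standard statistics routines, the two-stage procedure looks roughly like this; the per-fold scores below are invented placeholders, not the paper's results.

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Invented per-fold F-measures for three treatments
# (1 machine learning algorithm x 3 representations).
bow_scores   = [0.22, 0.25, 0.21, 0.24, 0.23]
roles_scores = [0.28, 0.30, 0.27, 0.29, 0.31]
props_scores = [0.26, 0.27, 0.25, 0.28, 0.27]

# Stage 1: omnibus non-parametric test across all treatments.
stat, p = friedmanchisquare(bow_scores, roles_scores, props_scores)
if p < 0.05:
    # Stage 2: pairwise follow-up on two variants of interest.
    stat2, p2 = wilcoxon(roles_scores, props_scores)
    print(f"Friedman p={p:.3f}; Wilcoxon Roles-Replaced vs ProPs p={p2:.3f}")
```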

  30. For a variety of reasons, it is difficult to compare F-measures across the legal text categorization work described supra in Section 2. Not only do the categorization tasks, classification concepts, and types of legal documents differ, but so do the relative importance of recall and precision in the particular application. For a different task, automatically categorizing opinion texts by WESTLAW topic categories, (Thompson 2001) reports F-measures with β = 2, indicating that twice as much weight is given to recall as to precision. (The formula is F = ((β² + 1) × P × R)/(β² × P + R). In evaluating SMILE, we use β = 1, treating recall and precision as having equal weight.) For different categorization methods, he reports average F-measures (β = 2) across eighteen topics ranging from .495 to .637, and for individual topics using the two best categorization methods, the measures range from .253 to .860. In other work extracting the criminal offenses (based on a standardized list) and the legal principles applied from the text of legal opinions in criminal cases, the Salomon program achieved F-measures (β = 1) of .82 and .46, respectively (Uyttendaele et al. 1998). (Daniels and Rissland (1997) do not report precision, recall, or F-measures for the SPIRE program, favoring a different metric, expected search length.) It should be remembered, however, that the concepts for categorizing cases in the above work were not specific enough to support a program’s reasoning about the cases as SMILE + IBP does.
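The weighted F-Measure from notes 28 and 30 in one small function; at β = 1 it reduces to the harmonic mean of precision and recall.

```python
def f_measure(precision, recall, beta=1.0):
    """F = ((beta^2 + 1) * P * R) / (beta^2 * P + R); beta > 1 favors recall."""
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

print(f_measure(0.5, 0.5))          # 0.5: harmonic mean at beta = 1
print(f_measure(0.4, 0.8, beta=2))  # ~0.667: recall-weighted, as in Thompson (2001)
```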

  31. National Rejectors, Inc. v. Trieman, 409 S.W.2d 1 (Sup. Ct. Mo., 1966).

  32. Dynamics Research Corp. v. Analytic Sciences Corp. 400 N.E.2d 1274 (Mass.App.Ct., 1980).

  33. “For current extraction technology to work, the information sought must be explicitly stated in the text. It cannot be merely implied by the text.” (Jackson and Moulinier 2007, p. 106).

  34. These issues may interact with issues IBP’s model does address. For instance, if a defendant copied information fixed in a tangible medium of expression and covered by the subject matter of copyright, a trade secret claim may be preempted under § 301 of the Copyright Act. A trade secret claim that involved an extra element of breach of confidence would not be preempted, but it could be if it involved only improper means. Although IBP’s model does not address preemption, some of the same Factors (e.g., regarding confidential relationship) would be relevant to the preemption analysis. One would need to modify the model to add preemption.

  35. Currently, a legal IR system ranks the cases it retrieves according to statistical criteria that involve the frequencies with which the query’s terms appear in the retrieved cases and in the corpus.
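A minimal tf-idf-style sketch of such frequency-based ranking; the corpus and query are invented, and production legal IR systems use far more elaborate scoring.

```python
from math import log

# Toy corpus; real legal IR systems rank over millions of opinions.
corpus = [
    "plaintiff claimed misappropriation of trade secrets",
    "defendant breached a nondisclosure agreement",
    "the court granted injunctive relief to plaintiff",
]

def tf_idf_score(query, doc, corpus):
    """Sum of tf * idf over query terms: rewards terms frequent in the
    document but rare across the corpus."""
    words = doc.split()
    score = 0.0
    for term in query.split():
        tf = words.count(term)                     # frequency in the document
        df = sum(term in d.split() for d in corpus)  # documents containing term
        if tf and df:
            score += tf * log(len(corpus) / df)
    return score

ranked = sorted(corpus, reverse=True,
                key=lambda d: tf_idf_score("trade secrets plaintiff", d, corpus))
print(ranked[0])  # -> "plaintiff claimed misappropriation of trade secrets"
```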

  36. Alternatively, a SMILE-type system might automatically classify cases drawn from legal IR systems and index them in a specialized database.

  37. This AI & Law research on automatically processing case texts assumes that the opinion texts are reasonably complete and candid descriptions of courts’ decisions. SMILE + IBP’s focus on Factors for and against a court’s decision, for example, assumes that courts adequately disclose the facts in a case that favor a contradictory decision, a point about which legal scholars differ. See (Delgado and Stefancic 2007, fn. 100).

References

  • Aleven V (1997) Teaching case-based argumentation through a model and examples. Ph.D. dissertation, University of Pittsburgh

  • Aleven V (2003) Using background knowledge in case-based legal reasoning: a computational model and an intelligent learning environment. Artif Intell 150:183–237

  • Ashley K (1988) Modeling legal argument: reasoning with cases and hypotheticals. Ph.D. dissertation, COINS Technical Report No. 88-01, University of Massachusetts, Amherst

  • Ashley K (1990) Modeling legal argument: reasoning with cases and hypotheticals. MIT Press, Cambridge

  • Ashley K (2000) Designing electronic casebooks that talk back: the CATO program. Jurimetrics J 40:275–319

  • Ashley K (2002) An AI model of case-based argument from a jurisprudential viewpoint. Artif Intell Law 10:163–218

  • Ashley K, Brüninghaus S (2006) Computer models for legal prediction. Jurimetrics J 46:309–352

  • Bench-Capon T, Sartor G (2001) Theory based explanation of case law domains. In: Proceedings of the 8th international conference on artificial intelligence and law. ACM Press, pp 12–21

  • Bench-Capon T, Sartor G (2003) A model of legal reasoning with cases incorporating theories and values. Artif Intell 150:97–143

  • Boswell J, Chapman R, Fleeman J, Rogers P (1988) Life of Johnson. Oxford University Press, Oxford

  • Branting L (1999) Reasoning with rules and precedents: a computational model of legal analysis. Kluwer, Dordrecht

  • Brüninghaus S, Ashley K (1999) Bootstrapping case base development with annotated case summaries. In: Proceedings of the third international conference on case-based reasoning. LNAI 1650, pp 59–73

  • Brüninghaus S, Ashley K (2001) Improving the representation of legal case texts with information extraction methods. In: Proceedings of the eighth international conference on artificial intelligence and law, pp 42–51

  • Brüninghaus S, Ashley K (2003) Predicting the outcome of case-based legal arguments. In: Sartor G (ed) Proceedings of the 9th international conference on artificial intelligence and law (ICAIL-03). ACM Press, pp 234–242

  • Brüninghaus S, Ashley K (2005) Reasoning with textual cases. In: Proceedings of the sixth international conference on case-based reasoning. Springer, pp 137–151

  • Burke R, Hammond K, Kulyukin V, Lytinen S, Tomuro N, Schonberg S (1997) Question answering from frequently-asked question files: experiences with the FAQ Finder system. AI Magazine 18:57–66

  • Cardie C, Howe N (1997) Improving minority class prediction using case-specific feature weights. In: Proceedings of the fourteenth international conference on machine learning. Morgan Kaufmann, pp 57–65

  • Chapman W, Bridewell W, Hanbury P, Cooper G, Buchanan B (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34:301–310

  • Chorley A, Bench-Capon T (2005) An empirical investigation of reasoning with legal cases through theory construction and application. Artif Intell Law 13:323–371

  • Cohen P (1995a) Empirical methods for artificial intelligence. MIT Press, Cambridge, MA

  • Cohen W (1995b) Text categorization and relational learning. In: Proceedings of the twelfth international conference on machine learning, pp 124–132

  • Cunningham C, Weber R, Proctor J, Fowler C, Murphy M (2004) Investigating graphs in textual case-based reasoning. In: Proceedings of the seventh European conference on case-based reasoning, pp 573–586

  • Daelemans W, Zavrel J, van der Sloot K, van den Bosch A (2004, 2007) TiMBL: Tilburg Memory Based Learner, version 5.02 (now 6.0). http://ilk.uvt.nl/timbl/

  • Dale R (2000) Handbook of natural language processing. Marcel Dekker, New York

  • Daniels J, Rissland E (1997) Finding legally relevant passages in case opinions. In: Proceedings of the sixth international conference on artificial intelligence and law. ACM Press, pp 39–46

  • Delgado R, Stefancic J (2007) Why do we ask the same questions? The triple helix dilemma revisited. Law Libr J 99:307–328

  • Dietterich T (1996) Statistical tests for comparing supervised classification learning algorithms. Technical report, Oregon State University

  • Fürnkranz J, Mitchell T, Riloff E (1998) A case study in using linguistic phrases for text categorization on the WWW. In: Proceedings of the ICML/AAAI-98 workshop on learning for text classification. Technical Report WS-98-05, pp 5–12

  • Gonçalves T, Quaresma P (2005) Is linguistic information relevant for the legal text classification problem? In: Proceedings of the tenth international conference on artificial intelligence and law. ACM Press, pp 168–176

  • Gordon T, Prakken H, Walton D (2007) The Carneades model of argument and burden of proof. Artif Intell 171(10–11):875–896

  • Grover C, Hachey B, Hughson I, Korycinski C (2003) Automatic summarisation of legal documents. In: Proceedings of the ninth international conference on artificial intelligence and law. ACM Press, pp 243–251

  • Hachey B, Grover C (2006) Extractive summarisation of legal texts. Artif Intell Law 14:305–345

  • Hanson A (2002) From key numbers to keywords: how automation has transformed the law. Law Libr J 94:563

  • Hunter D (2000) Near knowledge: inductive learning systems in law. Va J L & Tech 5:9

  • Jackson P, Moulinier I (2007) Natural language processing for online applications: text retrieval, extraction and categorization, 2nd edn. John Benjamins, Amsterdam

  • Jackson P, Al-Kofahi K, Tyrrell A, Vacher A (2003) Information extraction from case law and retrieval of prior cases. Artif Intell 150:239–290

  • Kim W, Wilbur W (2000) Corpus-based statistical screening for phrase identification. J Am Med Inform Assoc 7:499–511

  • Lenz M (1999) Case retrieval nets as a model for building flexible information systems. Ph.D. dissertation, Humboldt University, Berlin

  • Lewis D (1992) Representation and learning in information retrieval. Ph.D. dissertation, University of Massachusetts, Amherst

  • Lewis D, Sparck Jones K (1996) Natural language processing for information retrieval. Commun ACM 39:92–101

  • Mackaay E, Robillard P (1974) Predicting judicial decisions: the nearest neighbour rule and visual representation of case patterns. Datenverarbeitung im Recht 3:302

  • McCallum A (2004) Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow

  • McCarty LT (2007) Deep semantic interpretations of legal texts. In: Proceedings of the eleventh international conference on artificial intelligence and law, pp 217–224

  • Mitchell T (1997) Machine learning. McGraw-Hill, New York

  • Mitra M, Buckley C, Singhal A, Cardie C (1997) An analysis of statistical and syntactic phrases. In: Proceedings of the fifth international conference “Recherche d’Information Assistée par Ordinateur” (RIAO), pp 200–214

  • Moens M-F (2006) Information extraction: algorithms and prospects in a retrieval context. Springer, Dordrecht

  • Moens M-F, Boiy E, Palau R, Reed C (2007) Automatic detection of arguments in legal texts. In: Proceedings of the eleventh international conference on artificial intelligence and law (ICAIL-07), pp 225–236

  • Popple J (1996) A pragmatic legal expert system. Dartmouth (Ashgate), Farnham, UK

  • Provost F, Aronis J, Buchanan B (1999) Rule-space search for knowledge-based discovery. CIIO Working Paper #IS 99-012, Stern School of Business, New York University. http://pages.stern.nyu.edu/~fprovost/Papers/rule-search.pdf (visited March 23, 2009)

  • Quinlan R (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco

  • Quinlan R (2004) C4.5 Release 8. http://www.rulequest.com/Personal/

  • Riloff E (1996) Automatically generating extraction patterns from untagged text. In: Proceedings of the thirteenth national conference on artificial intelligence, pp 1044–1049

  • Riloff E, Phillips W (2004) An introduction to the Sundance and AutoSlog systems. Technical Report UUCS-04-015, University of Utah School of Computing. http://www.cs.utah.edu/~riloff/pdfs/official-sundance-tr.pdf (visited March 23, 2009)

  • Rose D (1994) A symbolic and connectionist approach to legal information retrieval. Lawrence Erlbaum Associates, Philadelphia

  • Salzberg S (1997) On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min Knowl Disc 1(3):317–328

  • Thompson P (2001) Automatic categorization of case law. In: Proceedings of the eighth international conference on artificial intelligence and law. ACM Press, pp 70–77

  • Turtle H (1995) Text retrieval in the legal world. Artif Intell Law 3:5–54

  • Uyttendaele C, Moens M-F, Dumortier J (1998) SALOMON: automatic abstracting of legal cases for effective access to court decisions. Artif Intell Law 6:59–79

  • Vossos G (1995) Incorporating inductive case-based reasoning into an object-oriented deductive legal knowledge based system. Ph.D. dissertation, La Trobe University, pp 146, 157

  • Wardeh M, Bench-Capon T, Coenen F (2008) Argument based moderation of benefit assessment. In: Legal knowledge and information systems. Proceedings of JURIX 2008: the twenty-first annual conference, pp 128–137

  • Weber R (1998) Intelligent jurisprudence research. Doctoral dissertation, Federal University of Santa Catarina, Florianópolis, Brazil

  • Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

  • Zeleznikow J, Hunter D (1994) Building intelligent legal information systems: representations and reasoning in law. Kluwer, Amsterdam


Acknowledgment

The research described here has been supported by Grant No. IDM-9987869 from the National Science Foundation.

Author information

Correspondence to Kevin D. Ashley.

Cite this article

Ashley, K.D., Brüninghaus, S. Automatically classifying case texts and predicting outcomes. Artif Intell Law 17, 125–165 (2009). https://doi.org/10.1007/s10506-009-9077-9
