
Evaluation of information retrieval for E-discovery


Abstract

The effectiveness of information retrieval technology in electronic discovery (E-discovery) has become the subject of judicial rulings and practitioner controversy. The scale and nature of E-discovery tasks, however, have pushed traditional information retrieval evaluation approaches to their limits. This paper reviews the legal and operational context of E-discovery and the approaches to evaluating search technology that have evolved in the research community. It then describes a multi-year effort carried out as part of the Text Retrieval Conference to develop evaluation methods for responsive review tasks in E-discovery. This work has led to new approaches to measuring effectiveness in both batch and interactive frameworks, to new large data sets, and to some surprising results for the recall and precision of Boolean and statistical information retrieval methods. The paper concludes by offering some thoughts about future research in both the legal and technical communities toward the goal of reliable, effective use of information retrieval in E-discovery.


Notes

  1. See, e.g., Sarbanes-Oxley Act, Title 18 of the U.S. Code, Section 1519 (U.S. securities industry requirement to preserve email for 7 years); National Archives and Records Administration regulations, 36 Code of Federal Regulations Part 1236.22 (all email that is considered to fall within the definition of “federal records” under Title 44 of the U.S. Code, Section 3301, must be archived in either paper or electronic systems).

  2. See, e.g., Qualcomm v. Broadcom, 539 F. Supp. 2d 1214 (S.D. Cal 2007), rev’d 2010 WL 1336937 (S.D. Cal. Apr. 2, 2010); In re Fannie Mae Litigation, 552 F.3d 814 (D.C. Cir. 2009).

  3. http://www.forrester.com/Research/Document/Excerpt/0,7211,40619,00.html

  4. See also Report of Anton R. Valukas, Examiner, In re Lehman Brothers Holdings Inc. (U.S. Bankruptcy Ct. S.D.N.Y. March 11, 2010), vol. 7, Appx. 5 (350 billion pages subjected to dozens of Boolean searches), available at http://lehmanreport.jenner.com/

  5. See Practice Point 1 in (The Sedona Conference 2007b) (referred to herein as the “Sedona Search Commentary”).

  6. Zubulake v. UBS Warburg LLC, 217 F.R.D. 309, 311 (2003); see generally (The Sedona Conference 2007a).

  7. See Pension Committee of the University of Montreal Pension Plan et al. v Banc of America Securities LLC, et al., 2010 WL 184312, *1 (S.D.N.Y. Jan. 15, 2010) (“Courts cannot and do not expect that any party can reach a standard of perfection.”).

  8. See (The Sedona Conference 2007b) at 202.

  9. There is, at virtually all times, an admitted asymmetry of knowledge as between the requesting party (who does not own and therefore does not know what is in the target data collection), and the receiving or responding party (who does own the collection and thus in theory could know its contents). For an exploration into ethical questions encountered when the existence of documents is not reached by a given keyword search method, see (Baron 2009).

  10. For example, see People of the State of California v. Philip Morris, et al., Case No. J.C.C.P. 4041 (Sup. Ct. Cal.) (December 9, 1998 consent decree incorporating terms of Master Settlement Agreement or “MSA”). These documents have for the most part been digitized using Optical Character Recognition (OCR) technology and are available online on various Web sites. See the Legacy Tobacco Documents Library, available at http://legacy.library.ucsf.edu/. Portions of the MSA collection have been used in the TREC Legal Track.

  11. As used by E-discovery practitioners, “keyword search” most often refers to the use of a single query term to identify the set of all documents containing that term, as part of a pre-processing step to identify documents that merit manual review.

  12. (The Sedona Conference 2007b) at 201. See also (Paul and Baron 2007).

  13. Id. at 202–03; 217 (Appendix describing alternative search methods at greater length).

  14. Id. at 202–03.

  15. In re Lorazepam & Clorazepate Antitrust Litigation, 300 F. Supp. 2d 43 (D.D.C. 2004).

  16. J.C. Associates v. Fidelity & Guaranty Ins. Co., 2006 WL 1445173 (D.D.C. 2006).

  17. For example, see Medtronic Sofamor Danck, Inc. v. Michelson, 229 F.R.D. 550 (W.D. Tenn. 2003); Treppel v. Biovail, 233 F.R.D. 363, 368–69 (S.D.N.Y. 2006) (court describes plaintiff’s refusal to cooperate with defendant in the latter’s suggestion to enter into a stipulation defining the keyword search terms to be used as a “missed opportunity” and goes on to require that certain terms be used); see also Alexander v. FBI, 194 F.R.D. 316 (D.D.C. 2000) (court places limitations on the scope of plaintiffs’ proposed keywords in a case involving White House email).

  18. In addition to cases discussed infra, see, e.g., Dunkin Donuts Franchised Restaurants, Inc. v. Grand Central Donuts, Inc, 2009 WL 175038 (E.D.N.Y. June 19, 2009) (parties directed to meet and confer on developing a workable search protocol); ClearOne Communications, Inc. v. Chiang, 2008 WL 920336 (D. Utah April 1, 2008) (court adjudicates dispute over conjunctive versus disjunctive Boolean operators).

  19. 242 F.R.D. 139 (D.D.C. 2007)

  20. Id. at 148 (citing to (Paul and Baron 2007), supra).

  21. 537 F. Supp. 2d 14, 24 (D.D.C. 2008).

  22. Id. at 16 (quoting U.S. v. O’Keefe, 2007 WL 1239204, at *3 (D.D.C. April 27, 2007)) (internal quotations omitted).

  23. 537 F. Supp. 2d at 16.

  24. Based only on what is known from the opinion, it is admittedly somewhat difficult to parse the syntax used in this search string. One is left to surmise that the ambiguity present on the face of the search protocol may have contributed to the court finding the matter of adjudicating a proper search string to be too difficult a task.

  25. 537 F. Supp. 2d at 24.

  26. Equity Analytics v. Lundin, 248 F.R.D. 331 (D.D.C. 2008) (stating that in O’Keefe “I recently commented that lawyers express as facts what are actually highly debatable propositions as to efficacy of various methods used to search electronically stored information,” and requiring an expert to describe scope of proposed search); see also discussion of Victor Stanley, Inc. v. Creative Pipe, Inc., infra.

  27. 250 F.R.D. 251 (D. Md. 2008).

  28. Id. at 254.

  29. Id. at 256–57.

  30. Id. at 259 n.9.

  31. Id. at 260 n.10.

  32. William A. Gross Construction Assocs., Inc. v. Am. Mftrs. Mutual Ins. Co., 256 F.R.D. 134, 135 (S.D.N.Y. 2009).

  33. We note that at least one important decision has been rendered by a court in the United Kingdom, which in sophisticated fashion similarly has analyzed keyword choices by parties at some length. See Digicel (St. Lucia) Ltd. & Ors. v. Cable & Wireless & Ors., [2008] EWHC 2522 (Ch.).

  34. Strictly speaking, unlike for a set, it is not meaningful to refer to the “recall” or “precision” of a ranking of documents. The popular rank-based measures of Recall@K and Precision@K (which measure the recall and precision of the set of the K top-ranked documents) nominally suggest a recall or precision orientation for ranking, but actually compare ranked retrieval systems identically on individual topics. One can observe the recall-precision tradeoff in a ranking, however, by varying the cutoff K; e.g., increasing K will tend to increase recall at the expense of precision (a small worked illustration follows these notes).

  35. http://ag.ca.gov/tobacco/msa.php

  36. Clearwell Systems obtained the collection from Aspen Systems and performed the processing described in this section.

  37. A pilot Interactive task was run in 2007 (Tomlinson et al. 2008), but with a very different task design.

  38. As is common, we use “Boolean query” somewhat loosely to mean queries built using not just the three basic Boolean operators (and, or, not), but also truncation and (unordered) proximity operators (a toy illustration follows these notes).

  39. http://edrm.net/

  40. Software Engineering Institute, http://www.sei.cmu.edu/cmmi/general
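
To make the cutoff effect described in note 34 concrete, the following minimal Python sketch (our illustration, not code or data from the TREC Legal Track) computes Precision@K and Recall@K for a hypothetical ranking against a hypothetical set of relevance judgments; as the cutoff K grows, recall rises while precision typically falls.

    # Illustrative sketch only: toy ranking and relevance judgments.
    def precision_recall_at_k(ranking, relevant, k):
        """Return (Precision@K, Recall@K) for the top-k ranked document IDs."""
        top_k = ranking[:k]
        hits = sum(1 for doc in top_k if doc in relevant)
        precision = hits / k
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    if __name__ == "__main__":
        ranking = ["d7", "d2", "d9", "d4", "d1", "d8", "d3", "d6"]  # hypothetical system output
        relevant = {"d2", "d4", "d6", "d5"}                         # hypothetical judgments
        for k in (1, 3, 5, 8):
            p, r = precision_recall_at_k(ranking, relevant, k)
            print(f"K={k}: Precision@K={p:.2f}  Recall@K={r:.2f}")

On this toy data, precision peaks at an intermediate cutoff while recall increases monotonically as more of the ranking is included, which is the tradeoff the note describes.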
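
The extended operators mentioned in note 38 can likewise be illustrated with a toy matcher. The sketch below is our own simplified example (the operator behavior and the sample query are assumptions for illustration, not the syntax actually negotiated in the track): it combines AND/OR/NOT with trailing-asterisk truncation and an unordered proximity window.

    # Illustrative sketch only: a toy matcher for extended "Boolean" operators.
    import re

    def tokenize(text):
        """Lowercase word tokens."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def term_matches(term, token):
        """Exact match, or prefix match when the term ends with '*' (truncation)."""
        if term.endswith("*"):
            return token.startswith(term[:-1])
        return token == term

    def contains(term, tokens):
        return any(term_matches(term, t) for t in tokens)

    def near(term_a, term_b, tokens, window):
        """Unordered proximity: both terms occur within `window` tokens of each other."""
        pos_a = [i for i, t in enumerate(tokens) if term_matches(term_a, t)]
        pos_b = [i for i, t in enumerate(tokens) if term_matches(term_b, t)]
        return any(abs(i - j) <= window for i in pos_a for j in pos_b)

    if __name__ == "__main__":
        tokens = tokenize("The broker agreed to settle the claim within thirty days.")
        # Hypothetical query: (settle* OR resolv*) AND NOT dismiss*,
        # with settle* required within 5 tokens of claim*.
        hit = ((contains("settle*", tokens) or contains("resolv*", tokens))
               and not contains("dismiss*", tokens)
               and near("settle*", "claim*", tokens, 5))
        print("responsive" if hit else "not responsive")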

References

  • American Institute of Certified Public Accountants (2009) Statement on auditing standards no. 70: Service organizations. SAS 70

  • Aslam JA, Pavlu V, Yilmaz E (2006) A statistical method for system evaluation using incomplete judgments. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 541–548

  • Bales S, Wang P (2006) Consolidating user relevance criteria: a meta-ethnography of empirical studies. In: Proceedings of the 42nd annual meeting of the American society for information science and technology

  • Baron JR (2005) Toward a federal benchmarking standard for evaluation of information retrieval products used in E-discovery. Sedona Conf J 6:237–246

  • Baron JR (2007) The TREC legal track: origins and reports from the first year. Sedona Conf J 8:237–246

  • Baron JR (2008) Towards a new jurisprudence of information retrieval: what constitutes a ‘reasonable’ search for digital evidence when using keywords? Digit Evid Electronic Signature Law Rev 5:173–178

  • Baron JR (2009) E-discovery and the problem of asymmetric knowledge. Mercer Law Rev 60:863

  • Baron JR, Thompson P (2007) The search problem posed by large heterogeneous data sets in litigation: possible future approaches to research. In: Proceedings of the 11th international conference on artificial intelligence and law, pp 141–147

  • Baron JR, Lewis DD, Oard DW (2007) TREC-2006 Legal Track overview. In: The fifteenth text retrieval conference proceedings (TREC 2006), pp 79–98

  • Bauer RS, Brassil D, Hogan C, Taranto G, Brown JS (2009) Impedance matching of humans and machines in high-Q information retrieval systems. In: Proceedings of the IEEE international conference on systems, man and cybernetics, pp 97–101

  • Blair D (2006) Wittgenstein, language and information: back to the rough ground. Springer, New York

  • Blair D, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun ACM 28(3):289–299

  • Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Computer Netw ISDN Syst 30(1–7):107–117

  • Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international conference on research and development in information retrieval, pp 25–32

  • Buckley C, Voorhees EM (2005) Retrieval system evaluation. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 53–75

  • Buckley C, Dimmick D, Soboroff I, Voorhees E (2006) Bias and the limits of pooling. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 619–620

  • Büttcher S, Clarke CLA, Yeung PCK, Soboroff I (2007) Reliable information retrieval evaluation with incomplete and biased judgements. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, pp 63–70

  • Carmel D, Yom-Tov E, Darlow A, Pelleg D (2006) What makes a query difficult? In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 390–397

  • Carterette B, Pavlu V, Kanoulas E, Aslam JA, Allan J (2008) Evaluation over thousands of queries. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 651–658

  • Clarke C, Craswell N, Soboroff I (2005) The TREC terabyte retrieval track. SIGIR Forum 39(1):25–25

  • Cleverdon C (1967) The Cranfield tests on index language devices. Aslib Proceed 19(6):173–194

  • Cormack GV, Lynam TR (2006) TREC 2005 spam track overview. In: The fourteenth text retrieval conference (TREC 2005), pp 91–108

  • Cormack GV, Palmer CR, Clarke CLA (1998) Efficient construction of large test collections. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, pp 282–289

  • Dumais ST, Belkin NJ (2005) The TREC interactive tracks: putting the user into search. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 123–152

  • Fox EA (1983) Characterization of two new experimental collections in computer and information science containing textual and bibliographic concepts. Tech. rep. TR83-561, Cornell University

  • Harman DK (2005) The TREC test collections. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 21–52

  • Hedin B, Oard DW (2009) Replication and automation of expert judgments: information engineering in legal E-discovery. In: SMC’09: Proceedings of the 2009 IEEE international conference on systems, man and cybernetics, pp 102–107

  • Hedin B, Tomlinson S, Baron JR, Oard DW (2010) Overview of the TREC 2009 Legal Track. In: The eighteenth text retrieval conference (TREC 2009)

  • Ingwersen P (1992) Information retrieval interaction. Taylor Graham, London

  • Ingwersen P, Järvelin K (2005) The turn: integration of information seeking and retrieval in context. Springer

  • International Organization for Standards (2005) Quality management systems—fundamentals and vocabulary. ISO 9000:2005

  • Jensen JH (2000) Special issues involving electronic discovery. Kansas J Law Public Policy 9:425

  • Kando N, Mitamura T, Sakai T (2008) Introduction to the NTCIR-6 special issue. ACM Trans Asian Lang Inform Process 7(2):1–3

  • Kazai G, Lalmas M, Fuhr N, Gövert N (2004) A report on the first year of the initiative for the evaluation of XML retrieval (INEX’02). J Am Soc Inform Sci Technol 55(6):551–556

  • Lewis D, Agam G, Argamon S, Frieder O, Grossman D, Heard J (2006) Building a test collection for complex document information processing. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 665–666

  • Lewis DD (1996) The TREC-4 filtering track. In: The fourth text retrieval conference (TREC-4), pp 165–180

  • Lynam TR, Cormack GV (2009) Multitext legal experiments at TREC 2008. In: The sixteenth text retrieval conference (TREC 2008)

  • Majumder P, Mitra M, Pal D, Bandyopadhyay A, Maiti S, Mitra S, Sen A, Pal S (2008) Text collections for FIRE. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 699–700

  • Moffat A, Zobel J (2008) Rank-biased precision for measurement of retrieval effectiveness. ACM Trans Inf Syst 27(1)

  • Oard DW, Hedin B, Tomlinson S, Baron JR (2009) Overview of the TREC 2008 Legal Track. In: The seventeenth text retrieval conference (TREC 2008)

  • Oot P, Kershaw A, Roitblat HL (2010) Mandating reasonableness in a reasonable inquiry. Denver Univ Law Rev 87:533

  • Paul GL, Baron JR (2007) Information inflation: can the legal system adapt? Richmond J Law Technol 13(3)

  • PCI Security Standards Council (2009) Payment card industry (PCI) data security standard: requirements and security assessment procedures, version 1.2.1. http://www.pcisecuritystandards.org

  • Peters C, Braschler M (2001) European research letter: cross-language system evaluation: the CLEF campaigns. J Am Soc Inf Sci Technol 52(12):1067–1072

  • Roitblat HL, Kershaw A, Oot P (2010) Document categorization in legal electronic discovery: computer classification vs. manual review. J Am Soc Inf Sci Technol 61(1):70–80

  • Sakai T, Kando N (2008) On information retrieval metrics designed for evaluation with incomplete relevance assessments. Inf Retr 11(5):447–470

  • Sanderson M, Joho H (2004) Forming test collections with no system pooling. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, pp 33–40

  • Sanderson M, Zobel J (2005) Information retrieval system evaluation: effort, sensitivity, and reliability. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 162–169

  • Schmidt H, Butter K, Rider C (2002) Building digital tobacco document libraries at the University of California, San Francisco Library/Center for Knowledge Management. D-Lib Mag 8(2)

  • Singhal A, Salton G, Buckley C (1995) Length normalization in degraded text collections. In: Proceedings of fifth annual symposium on document analysis and information retrieval, pp 15–17

  • Soboroff I (2007) A comparison of pooled and sampled relevance judgments. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, pp 785–786

  • Solomon RD, Baron JR (2009) Bake offs, demos & kicking the tires: a practical litigator’s brief guide to evaluating early case assessment software & search & review tools. http://www.kslaw.com/Library/publication/BakeOffs_Solomon.pdf

  • Spärck Jones K, van Rijsbergen CJ (1975) Report on the need for and provision of an ideal information retrieval test collection. Tech. Rep. 5266, Computer Laboratory, University of Cambridge, Cambridge (UK)

  • Taghva K, Borsack J, Condit A (1996) Effects of OCR errors on ranking and feedback using the vector space model. Inf Process Manage 32(3):317–327

  • The Sedona Conference (2007a) The Sedona Principles, second edition: best practice recommendations and principles for addressing electronic document production. http://www.thesedonaconference.org

  • The Sedona Conference (2007b) The Sedona Conference best practices commentary on the use of search and information retrieval methods in E-discovery. The Sedona Conf J 8:189–223

  • The Sedona Conference (2009) The Sedona Conference commentary on achieving quality in the E-discovery process. The Sedona Conf J 10:299–329

  • Tomlinson S (2007) Experiments with the negotiated Boolean queries of the TREC 2006 legal discovery track. In: The fifteenth text retrieval conference (TREC 2006)

  • Tomlinson S, Oard DW, Baron JR, Thompson P (2008) Overview of the TREC 2007 Legal Track. In: The sixteenth text retrieval conference (TREC 2007)

  • Turpin A, Scholer F (2006) User performance versus precision measures for simple search tasks. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 11–18

  • Voorhees EM, Garofolo JS (2005) Retrieving noisy text. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 183–197

  • Voorhees EM, Harman DK (2005) The text retrieval conference. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 3–19

  • Wayne CL (1998) Multilingual topic detection and tracking: Successful research enabled by corpora and evaluation. In: Proceedings of the first international conference on language resources and evaluation

  • Yilmaz E, Aslam JA (2006) Estimating average precision with incomplete and imperfect judgments. In: Proceedings of the 15th international conference on information and knowledge management (CIKM), pp 102–111

  • Zhao FC, Oard DW, Baron JR (2009) Improving search effectiveness in the legal E-discovery process using relevance feedback. In: ICAIL 2009 DESI III Global E-Discovery/E-Disclosure Workshop. http://www.law.pitt.edu/DESI3_Workshop/DESI_III_papers.htm

  • Zobel J (1998) How reliable are the results of large-scale information retrieval experiments? In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, pp 307–314

Acknowledgments

The authors first wish to thank a number of individuals who in discussions with the authors contributed ideas and suggestions that found their way into portions of the present paper, including Thomas Bookwalter, Gordon Cormack, Todd Elmer, Maura Grossman and Richard Mark Soley. Additionally, the TREC Legal Track would not have been possible without the support of Ellen Voorhees and Ian Soboroff of NIST; the faculty, staff and students of IIT, UCSF, Tobacco Documents Online, and Roswell Park Cancer Institute who helped build IIT CDIP or the LTDL on which it was based; Celia White (the 2006 Track expert interactive searcher); Venkat Rangan of Clearwell Systems who helped to build the TREC Enron test collection; Richard Braman of The Sedona Conference® and the hundreds of law students, lawyers and Sedona colleagues who have contributed pro bono time to the project. Finally, the authors wish to thank Kevin Ashley and Jack Conrad for their support of and participation in the First and Third DESI Workshops, held as part of the Eleventh and Twelfth International Conferences on Artificial Intelligence and Law, at which many of the ideas herein were discussed.

Author information

Correspondence to Douglas W. Oard.

Additional information

The first three sections of this article draw upon material in the introductory sections of two papers presented at events associated with the 11th and 12th International Conferences on Artificial Intelligence and Law (ICAIL) (Baron and Thompson 2007; Zhao et al. 2009) as well as material first published in (Baron 2008), with permission.

Cite this article

Oard, D.W., Baron, J.R., Hedin, B. et al. Evaluation of information retrieval for E-discovery. Artif Intell Law 18, 347–386 (2010). https://doi.org/10.1007/s10506-010-9093-9
