Automation of legal sensemaking in e-discovery

Artificial Intelligence and Law

Abstract

Retrieval of relevant unstructured information from the ever-increasing textual communications of individuals and businesses has become a major barrier to effective litigation/defense, mergers/acquisitions, and regulatory compliance. Such e-discovery requires simultaneously high precision and high recall (high-P/R) and is therefore a prototype for many legal reasoning tasks. The requisite exhaustive information retrieval (IR) system must employ techniques very different from those applicable in the hyper-precise consumer search task, where insignificant recall is the accepted norm. We apply Russell et al.’s cognitive task analysis of sensemaking by intelligence analysts to develop a semi-autonomous system that achieves high IR accuracy of F1 ≥ 0.8, compared with the F1 < 0.4 typical of computer-assisted human-assessment (CAHA) or alternative approaches such as Roitblat et al.’s. By understanding the ‘Learning Loop Complexes’ of lawyers engaged in successful small-scale document review, we have used socio-technical design principles to create roles, processes, and technologies for scalable human-assisted computer-assessment (HACA). Results from the NIST-TREC Legal Track’s interactive task in both 2008 and 2009 validate the efficacy of this sensemaking approach to the high-P/R IR task.
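
For reference, the F1 figures quoted in the abstract can be read against the conventional definition of F1 as the harmonic mean of precision P and recall R. This definition is the standard one in the IR literature and is assumed here rather than taken from the abstract; the numbers in the worked example are purely illustrative and are not results from the paper.

```latex
F_1 = \frac{2PR}{P + R}, \qquad
\text{e.g. } P = 0.9,\ R = 0.25 \;\Rightarrow\;
F_1 = \frac{2(0.9)(0.25)}{0.9 + 0.25} = \frac{0.45}{1.15} \approx 0.39
```

Because the harmonic mean is dominated by the smaller of its two arguments, F1 ≥ 0.8 is attainable only when precision and recall are both high: under this definition each must be at least 2/3.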

Notes

  1. Often “sense-making”.

  2. A key consideration at this stage in the legal process is that discovery will form the evidential basis supporting legal argumentation in the case. To the extent that this evidence is included in the documentary evidence in the possession of the litigator’s own client, it should be considered for production, even if not requested within a purely adversarial response to opposing requests.

  3. This process, termed User Modeling in the IR field, is further explored in Hogan et al. (2009) and Belkin (1980).

  4. Much of the Electronic Discovery Reference Model (EDRM 2010) is actually concerned with the pre-processing steps by which data is made ready for sensemaking.

  5. The regular expression b ?[1-9lIi] ?g ?[1-9lIi] ?f matches patterns of the form ‘b1g1f’, with or without spaces, in which each digit position holds either a digit (1-9) or one of the letters that OCR often substitutes for digits (l, I, i). For example, ‘b 1 g 1 f’, ‘b2g1f’, and ‘blglf’ all match.

  6. Systems whose type (HACA or non-HACA) cannot be determined because of insufficient documentation are marked as “Unknown”.
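
The regular expression in note 5 can be tried out directly. The following is a minimal sketch in Python, assuming the pattern is used exactly as printed in the note; the names OCR_PATTERN and find_matches and the sample strings are illustrative and do not come from the article.

```python
import re
from typing import List

# Pattern copied verbatim from note 5. Each " ?" allows one optional space,
# and [1-9lIi] accepts a digit 1-9 or a letter that OCR commonly
# substitutes for the digit 1 (l, I, i).
OCR_PATTERN = re.compile(r"b ?[1-9lIi] ?g ?[1-9lIi] ?f")


def find_matches(text: str) -> List[str]:
    """Return every substring of `text` matched by the note-5 pattern."""
    return OCR_PATTERN.findall(text)


if __name__ == "__main__":
    # Illustrative inputs: the first four fit the pattern, the last does not
    # because 0 is outside the 1-9 range.
    for sample in ["b1g1f", "b 1 g 1 f", "b2g1f", "blglf", "b0g1f"]:
        print(repr(sample), bool(OCR_PATTERN.fullmatch(sample)))
```

Using fullmatch checks a whole token against the pattern, whereas find_matches scans running text, which is closer to how such a pattern would be applied to OCR output.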

References

  • Bauer RS, Jade T, Hedin B, Hogan C (2008) Automated legal sensemaking: the centrality of relevance and intentionality. In: Proceedings of the second international workshop on supporting search and sensemaking for electronically stored information in discovery proceedings (DESI II)

  • Bauer RS, Brassil D, Hogan C, Taranto G, Brown JS (2009) Impedance matching of humans ⇔ machines in high-Q information retrieval systems. In: Proceedings of the 2009 IEEE international conference on systems, man, and cybernetics

  • Belkin N (1980) Anomalous states of knowledge as a basis for information retrieval. Can J Inf Sci 5:133–143

  • Blair DC, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document retrieval system. Commun ACM 28(3):289–299

  • Card SK (2005) The science of analytical reasoning. In: Illuminating the path: the research and development agenda for visual analytics. National Visualization and Analytics Center, Richland, WA. http://nvac.pnl.gov/agenda.stm#book. Accessed 20 Dec 2009

  • Cormack GV, Mojdeh M (2010) Machine learning for information retrieval: TREC 2009 web, relevance feedback and legal tracks. In: The eighteenth text retrieval conference (TREC 2009) proceedings

  • Dervin B (1983) An overview of sense-making research: concepts methods and results. Presented at the International Communication Association annual meeting, Dallas

  • Dervin B (1992) From the mind’s eye of the user: the sense-making qualitative-quantitative methodology. In: Glazier JD, Powell RR (eds) Qualitative research in information management. Libraries Unlimited CO, Englewood, pp 61–84

  • EDRM (2010) Electronic discovery reference model. http://edrm.net/. Accessed 4 Jan 2010

  • Fein BE, Merrell BL, Nelson FE (2010) Backstop LLP and Cleary Gottlieb Steen and Hamilton LLP at TREC legal track 2009. In: The eighteenth text retrieval conference (TREC 2009) proceedings

  • Hogan C, Brassil D, Rugani SM, Reinhart J, Gerber M, Jade T (2009) H5 at TREC 2008 legal interactive: user modeling, assessment & measurement. In: The seventeenth text retrieval conference (TREC 2008) proceedings

  • Kershaw A (2005) Automated document review proves its reliability. Digit Discov Evid 5(11):10–12

  • Klein G, Phillips JK, Rall EL, Peluso DA (2006a) A data-frame theory of sensemaking. In: Expertise out of context: proceedings of the sixth international conference on naturalistic decision making

  • Klein G, Moon B, Hoffman RR (2006b) Making sense of sensemaking 2: a macrocognitive model. IEEE Intell Syst 21(5):88–92

  • Koenemann J, Belkin NJ (1996) A case for interaction: a study of interactive information retrieval behavior and effectiveness. In: Proceedings of the human factors in computing systems conference (CHI’96). ACM Press, New York

  • Kuropka D (2004) Modelle zur Repräsentation natürlichsprachlicher Dokumente. Ontologie-basiertes Information-Filtering und -Retrieval mit relationalen Datenbanken. Advances in information systems and management science, Bd. 10. Logos Verlag, Berlin

  • Lewis DD (1998) Naive (Bayes) at forty: the independence assumption in information retrieval. In: Proceedings ECML, pp 4–15. Springer

  • Linderman A (2005) Using sense-making methodology in legal and law enforcement investigations. Presented at a non-divisional workshop held at the meeting of the International Communication Association, New York City

  • Marchionini G (2006) Toward human-computer information retrieval. In: June/July 2006 bulletin of the American society for information science

  • Marcus S et al. (eds) (2004) Manual for complex litigation, fourth. Federal Judicial Center

  • Oard DW, Hedin B, Tomlinson S, Baron JR (2009) Overview of the TREC 2008 legal track. In: The seventeenth text retrieval conference (TREC 2008) proceedings

  • Rangan V, Jiang M (2010) Clearwell systems at TREC 2009 legal interactive. In: The eighteenth text retrieval conference (TREC 2009) proceedings

  • Roitblat HL, Kershaw A, Oot P (2010) Document categorization in legal electronic discovery: computer classification vs manual review. J Am Soc Inf Sci Technol 61(1):1–11

  • Rosenfeld L, Morville P (2002) Information architecture for the World Wide Web, 2nd edn. O’Reilly Media, Sebastopol

  • Russell DM, Stefik MJ, Pirolli PL, Card SK (1993) The cost structure of sensemaking. In: Proceedings of the INTERACT ’93 and CHI ’93 conference on human factors in computing systems, pp 269–276

  • Saracevic T, Spink A, Wu MW (1997) Users and intermediaries in information retrieval: what are they talking about? In: Proceedings of the sixth international conference on user modeling (UM97), pp 43–54

  • Schaffer TL, Elkins JR (1987) Legal interviewing and counseling in a nutshell, 2nd edn. West Publishing, Rochester

  • Sterenzy T (2010) EQUIVIO at TREC 2009 legal interactive. In: The eighteenth text retrieval conference (TREC 2009) proceedings

  • Takayama L, Card SK (2008) Tracing the microstructure of sensemaking. In: Proceedings of the CHI 2008 workshop on sensemaking

  • Thompson P, Turtle H, Yang B, Flood J (1995) TREC-3 Ad Hoc retrieval and routing experiments using the WIN System. In: Proceedings of the third text retrieval conference (TREC-3)

  • Voorhees EM, Harman DK (2005) TREC: experiment and evaluation in information retrieval. The MIT Press, Cambridge, MA

  • Wang J, Coles C, Elliot R, Adrianakou S (2010a) ZL technologies at TREC 2009 legal interactive: comparing exclusionary and investigative approaches for electronic discovery using the TREC Enron Corpus. In: The eighteenth text retrieval conference (TREC 2009) proceedings

  • Wang J, Sun Y, Thompson P (2010b) TREC 2009 at the University of Buffalo: interactive legal e-discovery with Enron Emails. In: The eighteenth text retrieval conference (TREC 2009) proceedings

  • Willging TE, Shapard J, Stienstra D, Miletich D (1997) Discovery and disclosure practice, problems, and proposals for change: a case-based national survey of counsel in closed federal civil cases. Reports on discovery for the advisory committee on civil rules of the judicial conference of the United States, Federal Judicial Center

Author information

Correspondence to Christopher Hogan.

About this article

Cite this article

Hogan, C., Bauer, R.S. & Brassil, D. Automation of legal sensemaking in e-discovery. Artif Intell Law 18, 431–457 (2010). https://doi.org/10.1007/s10506-010-9100-1

  • DOI: https://doi.org/10.1007/s10506-010-9100-1
