Automation of legal sensemaking in e-discovery

Hogan, Christopher; Bauer, Robert S.; Brassil, Dan

doi:10.1007/s10506-010-9100-1


Published: 06 October 2010

Volume 18, pages 431–457 (2010)

Artificial Intelligence and Law

Christopher Hogan¹,
Robert S. Bauer¹ &
Dan Brassil¹


Abstract

Retrieval of relevant unstructured information from the ever-increasing textual communications of individuals and businesses has become a major barrier to effective litigation/defense, mergers/acquisitions, and regulatory compliance. Such e-discovery requires simultaneously high precision with high recall (high-P/R) and is therefore a prototype for many legal reasoning tasks. The requisite exhaustive information retrieval (IR) system must employ very different techniques than those applicable in the hyper-precise, consumer search task where insignificant recall is the accepted norm. We apply Russell et al.’s cognitive task analysis of sensemaking by intelligence analysts to develop a semi-autonomous system that achieves high IR accuracy of F1 ≥ 0.8 compared to F1 < 0.4 typical of computer-assisted human-assessment (CAHA) or alternative approaches such as Roitblat et al.’s. By understanding the ‘Learning Loop Complexes’ of lawyers engaged in successful small-scale document review, we have used socio-technical design principles to create roles, processes, and technologies for scalable human-assisted computer-assessment (HACA). Results from the NIST-TREC Legal Track’s interactive task from both 2008 and 2009 validate the efficacy of this sensemaking approach to the high-P/R IR task.
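
The F1 thresholds quoted in the abstract can be made concrete with a small calculation. The sketch below assumes F1 denotes the usual harmonic mean of precision and recall; the function name and the two operating points are hypothetical, chosen only to illustrate why a system needs both measures to be high to reach F1 ≥ 0.8.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical operating points, not results reported in the article.
balanced = f1_score(0.8, 0.8)    # precision and recall both high
lopsided = f1_score(0.9, 0.25)   # very precise, but most relevant documents missed

print(f"balanced: {balanced:.2f}, lopsided: {lopsided:.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a precise but low-recall search scores poorly on F1 however precise it is.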


Notes

Often “sense-making”.
A key consideration at this stage in the legal process is that discovery will form the evidential basis supporting legal argumentation in the case. To the extent that this evidence is included in the documentary evidence in possession of the litigator’s own client, this should be considered for production, even if not requested within a purely adversarial response to opposing requests.
This process, termed User Modeling in the IR field, is further explored in (Hogan et al. 2009) and (Belkin 1980).
Much of the Electronic Discovery Reference Model (EDRM 2010) is actually concerned with the pre-processing steps by which data is made ready for sensemaking.
The regular expression b ?[1-9lIi] ?g ?[1-9lIi] ?f matches patterns of the form ‘b1g1f’ with and without spaces and with various digits (1-9) or letters that are often substituted for digits under OCR (lIi). For example, ‘b 1 g 1 f’, ‘b2g1f’, and ‘blglf’ all match.
Systems whose type (HACA or non-HACA) cannot be determined because of insufficient documentation are marked as “Unknown”.
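
The regular expression given in the note on OCR variants can be checked directly. This is a minimal sketch using Python's standard `re` module; the pattern is copied from the note, while the helper name `matches` and the sample strings are illustrative assumptions rather than material from the article.

```python
import re

# Pattern copied from the note; each " ?" allows one optional space.
PATTERN = re.compile(r"b ?[1-9lIi] ?g ?[1-9lIi] ?f")


def matches(text: str) -> bool:
    """Return True if the whole string fits the pattern."""
    return PATTERN.fullmatch(text) is not None


# Illustrative samples: the first four should match, the last two should not
# ("0" is outside 1-9, and "bg1f" lacks the character between "b" and "g").
samples = ["b1g1f", "b2g1f", "blglf", "b 1 g 1 f", "b0g1f", "bg1f"]
results = {s: matches(s) for s in samples}

for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
```

`fullmatch` is used so that each sample is tested as a whole token; scanning running text for the same pattern would use `PATTERN.search` or `PATTERN.finditer` instead.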

References

Bauer RS, Jade T, Hedin B, Hogan C (2008) Automated legal sensemaking: the centrality of relevance and intentionality. In: Proceedings of the second international workshop on supporting search and sensemaking for electronically stored information in discovery proceedings (DESI II)
Bauer RS, Brassil D, Hogan C, Taranto G, Brown JS (2009) Impedance matching of humans ⇔ machines in high-Q information retrieval systems. In: Proceedings of the 2009 IEEE international conference on systems, man, and cybernetics
Belkin N (1980) Anomalous states of knowledge as a basis for information retrieval. Can J Inf Sci 5:133–143
Blair DC, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document retrieval system. Commun ACM 28(3):289–299
Card SK (2005) The science of analytical reasoning. In: Illuminating the path: the research and development agenda for visual analytics. National Visualization and Analytics Center, Richland, WA. http://nvac.pnl.gov/agenda.stm#book. Accessed 20 Dec 2009
Cormack GV, Mojdeh M (2010) Machine learning for information retrieval: TREC 2009 web, relevance feedback and legal tracks. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Dervin B (1983) An overview of sense-making research: concepts methods and results. Presented at the International Communication Association annual meeting, Dallas
Dervin B (1992) From the mind’s eye of the user: the sense-making qualitative-quantitative methodology. In: Glazier JD, Powell RR (eds) Qualitative research in information management. Libraries Unlimited CO, Englewood, pp 61–84
EDRM (2010) Electronic discovery reference model. http://edrm.net/. Accessed 4 Jan 2010
Fein BE, Merrell BL, Nelson FE (2010) Backstop LLP and Cleary Gottlieb Steen and Hamilton LLP at TREC legal track 2009. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Hogan C, Brassil D, Rugani SM, Reinhart J, Gerber M, Jade T (2009) H5 at TREC 2008 legal interactive: user modeling, assessment & measurement. In: Proceedings of the seventeenth text retrieval conference proceedings (TREC 2008)
Kershaw A (2005) Automated document review proves its reliability. Digit Discov Evid 5(11):10–12
Klein G, Phillips JK, Rall EL, Peluso DA (2006a) A data-frame theory of sensemaking. In: Expertise out of context: proceedings of the sixth international conference on naturalistic decision making
Klein G, Moon B, Hoffman RR (2006b) Making sense of sensemaking 2: a macrocognitive model. IEEE Intell Syst 21(5):88–92
Koenemann J, Belkin NJ (1996) A case for interaction: a study of interactive information retrieval behavior and effectiveness. In: Proceedings of the human factors in computing systems conference (CHI’96). ACM Press, New York
Kuropka D (2004) Modelle zur Repräsentation natürlichsprachlicher Dokumente. Ontologie-basiertes Information-Filtering und -Retrieval mit relationalen Datenbanken. Advances in information systems and management science, Bd. 10. Logos Verlag, Berlin
Lewis DD (1998) Naive (Bayes) at forty: the independence assumption in information retrieval. In: Proceedings ECML, pp 4–15. Springer
Linderman A (2005) Using sense-making methodology in legal and law enforcement investigations. Presented at a non-divisional workshop held at the meeting of the International Communication Association, New York City
Marchionini G (2006) Toward human-computer information retrieval. In: June/July 2006 bulletin of the American society for information science
Marcus S et al. (eds) (2004) Manual for complex litigation, fourth. Federal Judicial Center
Oard DW, Hedin B, Tomlinson S, Baron JR (2009) Overview of the TREC 2008 legal track. In: Proceedings of the seventeenth text retrieval conference proceedings (TREC 2008)
Rangan V, Jiang M (2010) Clearwell systems at TREC 2009 legal interactive. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Roitblat HL, Kershaw A, Oot P (2010) Document categorization in legal electronic discovery: computer classification vs manual review. J Am Soc Inf Sci Technol 61(1):1–11
Rosenfeld L, Morville P (2002) Information architecture for the World Wide Web, 2nd edn. O’Reilly Media, Sebastopol
Russell DM, Stefik MJ, Pirolli PL, Card SK (1993) The cost structure of sensemaking. In: Proceedings of the INTERACT ’93 and CHI ’93 conference on human factors in computing systems, pp 269–276
Saracevic T, Spink A, Wu MW (1997) Users and intermediaries in information retrieval: What are they talking about? In: Proceedings of the sixth international conference on user modeling (UM97), pp 43–54
Shaffer TL, Elkins JR (1987) Legal interviewing and counseling in a nutshell, 2nd edn. West Publishing, Rochester
Sterenzy T (2010) EQUIVIO at TREC 2009 legal interactive. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Takayama L, Card SK (2008) Tracing the microstructure of sensemaking. In: Proceedings of the CHI 2008 workshop on sensemaking
Thompson P, Turtle H, Yang B, Flood J (1995) TREC-3 Ad Hoc retrieval and routing experiments using the WIN System. In: Proceedings of the third text retrieval conference (TREC-3)
Voorhees EM, Harman DK (2005) TREC: experiment and evaluation in information retrieval. The MIT Press, Cambridge, MA
Wang J, Coles C, Elliot R, Adrianakou S (2010a) ZL technologies at TREC 2009 legal interactive: comparing exclusionary and investigative approaches for electronic discovery using the TREC Enron Corpus. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Wang J, Sun Y, Thompson P (2010b) TREC 2009 at the University of Buffalo: interactive legal e-discovery with Enron Emails. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Willging TE, Shapard J, Stienstra D, Miletich D (1997) Discovery and disclosure practice, problems, and proposals for change: a case-based national survey of counsel in closed federal civil cases. Reports on discovery for the Advisory Committee on Civil Rules of the Judicial Conference of the United States, Federal Judicial Center


Author information

Authors and Affiliations

H5, San Francisco, CA
Christopher Hogan, Robert S. Bauer & Dan Brassil


Corresponding author

Correspondence to Christopher Hogan.


About this article

Cite this article

Hogan, C., Bauer, R.S. & Brassil, D. Automation of legal sensemaking in e-discovery. Artif Intell Law 18, 431–457 (2010). https://doi.org/10.1007/s10506-010-9100-1


Published: 06 October 2010
Issue Date: December 2010
DOI: https://doi.org/10.1007/s10506-010-9100-1
