Abstract
Retrieval of relevant unstructured information from the ever-increasing textual communications of individuals and businesses has become a major barrier to effective litigation/defense, mergers/acquisitions, and regulatory compliance. Such e-discovery requires high precision and high recall simultaneously (high-P/R) and is therefore a prototype for many legal reasoning tasks. The requisite exhaustive information retrieval (IR) system must employ techniques very different from those applicable to the hyper-precise consumer search task, where insignificant recall is the accepted norm. We apply Russell et al.’s cognitive task analysis of sensemaking by intelligence analysts to develop a semi-autonomous system that achieves high IR accuracy of F1 ≥ 0.8, compared to the F1 < 0.4 typical of computer-assisted human-assessment (CAHA) or alternative approaches such as that of Roitblat et al. By understanding the ‘Learning Loop Complexes’ of lawyers engaged in successful small-scale document review, we have used socio-technical design principles to create roles, processes, and technologies for scalable human-assisted computer-assessment (HACA). Results from the NIST-TREC Legal Track’s interactive task in both 2008 and 2009 validate the efficacy of this sensemaking approach to the high-P/R IR task.
Notes
Often “sense-making”.
A key consideration at this stage in the legal process is that discovery will form the evidential basis supporting legal argumentation in the case. To the extent that this evidence is included in the documentary evidence in possession of the litigator’s own client, this should be considered for production, even if not requested within a purely adversarial response to opposing requests.
Much of the Electronic Discovery Reference Model (EDRM 2010) is actually concerned with the pre-processing steps by which data is made ready for sensemaking.
The regular expression b ?[1-9lIi] ?g ?[1-9lIi] ?f matches patterns of the form ‘b1g1f’, with or without spaces between the characters, where each digit position may hold a digit (1-9) or a letter that OCR often substitutes for a digit (l, I, i). For example, ‘b 1 g 1 f’, ‘b2g1f’, and ‘blglf’ all match.
Systems whose type (HACA or non-HACA) cannot be determined because of insufficient documentation are marked as “Unknown”.
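The regular expression given in the notes above can be checked with a short script. This is a minimal sketch, not the authors' implementation: the helper name matches_pattern is hypothetical, and the use of whole-string matching (rather than searching inside longer text) is an assumption made only to keep the illustration unambiguous.

```python
import re

# Pattern copied from the note: 'b', digit-or-OCR-lookalike, 'g',
# digit-or-OCR-lookalike, 'f', each separated by an optional single space.
PATTERN = re.compile(r"b ?[1-9lIi] ?g ?[1-9lIi] ?f")


def matches_pattern(text: str) -> bool:
    """Return True if the whole string fits the pattern (assumed usage)."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Hypothetical sample strings chosen for illustration only.
    for sample in ["b1g1f", "b2g1f", "blglf", "b 1 g 1 f", "b0g1f", "bgf"]:
        print(repr(sample), matches_pattern(sample))
```

In a review setting the pattern would more plausibly be applied with a search over document text than with whole-string matching. The character class [1-9lIi] excludes 0, so a string such as ‘b0g1f’ is not matched.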
References
Bauer RS, Jade T, Hedin B, Hogan C (2008) Automated legal sensemaking: the centrality of relevance and intentionality. In: Proceedings of the second international workshop on supporting search and sensemaking for electronically stored information in discovery proceedings (DESI II)
Bauer RS, Brassil D, Hogan C, Taranto G, Brown JS (2009) Impedance matching of humans ⇔ machines in high-Q information retrieval systems. In: Proceedings of the 2009 IEEE international conference on systems, man, and cybernetics
Belkin N (1980) Anomalous states of knowledge as a basis for information retrieval. Can J Inf Sci 5:133–143
Blair DC, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document retrieval system. Commun ACM 28(3):289–299
Card SK (2005) The science of analytical reasoning. In: Illuminating the path: the research and development agenda for visual analytics. National Visualization and Analytics Center, Richland, WA. http://nvac.pnl.gov/agenda.stm#book. Accessed 20 Dec 2009
Cormack GV, Mojdeh M (2010) Machine learning for information retrieval: TREC 2009 web, relevance feedback and legal tracks. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Dervin B (1983) An overview of sense-making research: concepts methods and results. Presented at the International Communication Association annual meeting, Dallas
Dervin B (1992) From the mind’s eye of the user: the sense-making qualitative-quantitative methodology. In: Glazier JD, Powell RR (eds) Qualitative research in information management. Libraries Unlimited CO, Englewood, pp 61–84
EDRM (2010) Electronic discovery reference model. http://edrm.net/. Accessed 4 Jan 2010
Fein BE, Merrell BL, Nelson FE (2010) Backstop LLP and Cleary Gottlieb Steen and Hamilton LLP at TREC legal track 2009. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Hogan C, Brassil D, Rugani SM, Reinhart J, Gerber M, Jade T (2009) H5 at TREC 2008 legal interactive: user modeling, assessment & measurement. In: Proceedings of the seventeenth text retrieval conference proceedings (TREC 2008)
Kershaw A (2005) Automated document review proves its reliability. Digit Discov Evid 5(11):10–12
Klein G, Phillips JK, Rall EL, Peluso DA (2006a) A data-frame theory of sensemaking. In: Expertise out of context: proceedings of the sixth international conference on naturalistic decision making
Klein G, Moon B, Hoffman RR (2006b) Making sense of sensemaking 2: a macrocognitive model. IEEE Intell Syst 21(5):88–92
Koenemann J, Belkin NJ (1996) A case for interaction: a study of interactive information retrieval behavior and effectiveness. In: Proceedings of the human factors in computing systems conference (CHI’96). ACM Press, New York
Kuropka D (2004) Modelle zur Repräsentation natürlichsprachlicher Dokumente. Ontologie-basiertes Information-Filtering und -Retrieval mit relationalen Datenbanken. Advances in information systems and management science, Bd. 10. Logos Verlag, Berlin
Lewis DD (1998) Naive (Bayes) at forty: the independence assumption in information retrieval. In: Proceedings ECML, pp 4–15. Springer
Linderman A (2005) Using sense-making methodology in legal and law enforcement investigations. Presented at a non-divisional workshop held at the meeting of the International Communication Association, New York City
Marchionini G (2006) Toward human-computer information retrieval. In: June/July 2006 bulletin of the American society for information science
Marcus S et al. (eds) (2004) Manual for complex litigation, fourth. Federal Judicial Center
Oard DW, Hedin B, Tomlinson S, Baron JR (2009) Overview of the TREC 2008 legal track. In: Proceedings of the seventeenth text retrieval conference proceedings (TREC 2008)
Rangan V, Jiang M (2010) Clearwell systems at TREC 2009 legal interactive. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Roitblat HL, Kershaw A, Oot P (2010) Document categorization in legal electronic discovery: computer classification vs manual review. J Am Soc Inf Sci Technol 61(1):1–11
Rosenfeld L, Morville P (2002) Information architecture for the World Wide Web, 2nd edn. O’Reilly Media, Sebastopol
Russell DM, Stefik MJ, Pirolli PL, Card SK (1993) The cost structure of sensemaking. In: Proceedings of the INTERACT ‘93 and CHI ‘93 conference on human factors in computing systems, pp 269–276
Saracevic T, Spink A, Wu MW (2007) Users and intermediaries in information retrieval: What are they talking about? In: Proceedings of the sixth international conference on user modeling (UM97), pp 43–54
Schaffer TL, Elkins JR (1987) Legal interviewing and counseling in a nutshell, 2nd edn. West Publishing, Rochester
Sterenzy T (2010) EQUIVIO at TREC 2009 legal interactive. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Takayama L, Card SK (2008) Tracing the microstructure of sensemaking. In: Proceedings of the CHI 2008 workshop on sensemaking
Thompson P, Turtle H, Yang B, Flood J (1995) TREC-3 Ad Hoc retrieval and routing experiments using the WIN System. In: Proceedings of the third text retrieval conference (TREC-3)
Voorhees EM, Harman DK (2005) TREC: experiment and evaluation in information retrieval. The MIT Press, Cambridge, MA
Wang J, Coles C, Elliot R, Adrianakou S (2010a) ZL technologies at TREC 2009 legal interactive: comparing exclusionary and investigative approaches for electronic discovery using the TREC Enron Corpus. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Wang J, Sun Y, Thompson P (2010b) TREC 2009 at the University of Buffalo: interactive legal e-discovery with Enron Emails. In: The eighteenth text retrieval conference (TREC 2009) proceedings
Willging TE, Shapard J, Stienstra D, Miletich D (1997) Discovery and disclosure practice, problems, and proposals for change: a case-based national survey of counsel in closed federal civil cases. Reports on discovery for the Advisory Committee on Civil Rules of the Judicial Conference of the United States. Federal Judicial Center
Cite this article
Hogan, C., Bauer, R.S. & Brassil, D. Automation of legal sensemaking in e-discovery. Artif Intell Law 18, 431–457 (2010). https://doi.org/10.1007/s10506-010-9100-1
DOI: https://doi.org/10.1007/s10506-010-9100-1