Abstract
Information overload in E-Discovery proceedings makes review expensive and increases the risk of failing to produce results on time and consistently. New interactive techniques have been introduced to increase reviewer productivity. In contrast, this article proposes an alternative method that reduces the volume of information during culling, so that less needs to be reviewed. The method first maps the email collection universe using straightforward statistical methods based on keyword filtering combined with date and time and custodian identities. Subsequently, a social network is constructed from the email collection and analyzed by filtering on date and time and keywords. By using the network context, we expect to gain a better understanding of the keyword hits and the ability to discard certain parts of the collection.
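The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the `Email` record, the function names, the case-insensitive substring matching, the per-month grouping, and the sample messages are all assumptions made for illustration. The first function counts keyword hits per custodian and month; the second builds a weighted sender-to-recipient network restricted to the same keyword and date filter.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Email:
    # Hypothetical record layout; the article does not prescribe one.
    custodian: str                 # mailbox the message was collected from
    sender: str
    recipients: tuple[str, ...]
    sent: datetime
    text: str


def matches(email, keywords, start, end):
    """True if the email was sent within [start, end] and contains any keyword."""
    body = email.text.lower()
    in_window = start <= email.sent <= end
    return in_window and any(k.lower() in body for k in keywords)


def hits_per_custodian_month(emails, keywords, start, end):
    """Step 1 (assumed form): keyword hits per (custodian, year-month)."""
    counts = Counter()
    for e in emails:
        if matches(e, keywords, start, end):
            counts[(e.custodian, e.sent.strftime("%Y-%m"))] += 1
    return counts


def filtered_network(emails, keywords, start, end):
    """Step 2 (assumed form): weighted directed edges sender -> recipient,
    counted only over emails that pass the keyword and date filter."""
    edges = Counter()
    for e in emails:
        if matches(e, keywords, start, end):
            for r in e.recipients:
                edges[(e.sender, r)] += 1
    return edges


# Invented sample data, for illustration only.
emails = [
    Email("alice", "alice@example.com", ("bob@example.com",),
          datetime(2001, 5, 3), "The contract draft is attached."),
    Email("alice", "alice@example.com", ("carol@example.com",),
          datetime(2001, 5, 9), "Lunch on Friday?"),
    Email("bob", "bob@example.com", ("alice@example.com", "carol@example.com"),
          datetime(2001, 6, 1), "Contract signed."),
    Email("bob", "bob@example.com", ("alice@example.com",),
          datetime(2002, 1, 1), "Old contract archived."),
]
window = (datetime(2001, 1, 1), datetime(2001, 12, 31))
hits = hits_per_custodian_month(emails, ["contract"], *window)
edges = filtered_network(emails, ["contract"], *window)
```

In this sketch, sender and recipient pairs that never appear in `edges` for a given filter are candidates for closer inspection before any part of the collection is set aside; how such a decision is actually made is left to the article's method.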
Cite this article
Henseler, H. Network-based filtering for large email collections in E-Discovery. Artif Intell Law 18, 413–430 (2010). https://doi.org/10.1007/s10506-010-9099-3
DOI: https://doi.org/10.1007/s10506-010-9099-3