Network-based filtering for large email collections in E-Discovery

Henseler, Hans

doi:10.1007/s10506-010-9099-3

Published: 23 December 2010

Artificial Intelligence and Law, volume 18, pages 413–430 (2010)

Hans Henseler¹

Abstract

Information overload in E-Discovery proceedings makes review expensive and increases the risk of failing to produce results on time and consistently. New interactive techniques have been introduced to increase reviewer productivity. In contrast, this article proposes an alternative method that aims to reduce the volume of information during culling, so that less information needs to be reviewed. The method first maps the email collection universe using straightforward statistical methods based on keyword filtering combined with date-time and custodian identities. Subsequently, a social network is constructed from the email collection and analyzed by filtering on date-time and keywords. By using the network context, we expect to provide a better understanding of the keyword hits and the ability to discard certain parts of the collection.
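
The two steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the `Email` record layout, the function names, the sample messages, and the rule that a sender-recipient edge without keyword hits is a candidate for discarding are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Email:
    # Hypothetical record layout; a real collection would be parsed from mail archives.
    custodian: str
    sender: str
    recipients: tuple
    sent: datetime
    text: str


def has_keyword(email, keywords):
    """True if the message text contains any keyword (case-insensitive substring match)."""
    body = email.text.lower()
    return any(k.lower() in body for k in keywords)


def hits_per_custodian_month(emails, keywords, start, end):
    """Step 1: count keyword hits per (custodian, year-month) within [start, end)."""
    return Counter(
        (e.custodian, e.sent.strftime("%Y-%m"))
        for e in emails
        if start <= e.sent < end and has_keyword(e, keywords)
    )


def build_network(emails, keywords, start, end):
    """Step 2: directed sender -> recipient edges, each mapped to (total, keyword_hits)."""
    edges = {}
    for e in emails:
        if not (start <= e.sent < end):
            continue  # date-time filter
        hit = int(has_keyword(e, keywords))
        for r in e.recipients:
            total, hits = edges.get((e.sender, r), (0, 0))
            edges[(e.sender, r)] = (total + 1, hits + hit)
    return edges


def cullable_edges(edges):
    """Assumed culling rule: edges that carry traffic but no keyword hits."""
    return sorted(pair for pair, (total, hits) in edges.items() if hits == 0)


# Invented sample data, for illustration only.
emails = [
    Email("alice", "alice", ("bob",), datetime(2001, 5, 3), "Please review the merger draft"),
    Email("alice", "alice", ("carol",), datetime(2001, 5, 9), "Lunch on Friday?"),
    Email("bob", "bob", ("alice", "carol"), datetime(2001, 6, 1), "Merger numbers attached"),
    Email("bob", "bob", ("alice",), datetime(2002, 1, 1), "Happy new year"),
]
start, end = datetime(2001, 1, 1), datetime(2002, 1, 1)

stats = hits_per_custodian_month(emails, ["merger"], start, end)
edges = build_network(emails, ["merger"], start, end)
print(stats)
print(cullable_edges(edges))
```

In this sketch the per-custodian, per-month counts give the statistical overview of the collection, while the edge counts place each keyword hit in its communication context. Whether an edge without hits can actually be discarded is a judgment for the reviewing party; the rule used here is only a placeholder.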

Author information

Authors and Affiliations

Amsterdam University of Applied Sciences, Amsterdam, The Netherlands
Hans Henseler

Corresponding author

Correspondence to Hans Henseler.

About this article

Cite this article

Henseler, H. Network-based filtering for large email collections in E-Discovery. Artif Intell Law 18, 413–430 (2010). https://doi.org/10.1007/s10506-010-9099-3

Issue Date: December 2010
DOI: https://doi.org/10.1007/s10506-010-9099-3
