Semi-automatic knowledge population in a legal document management system

Artificial Intelligence and Law

Abstract

Every organization has to deal with operational risks arising from the execution of its primary business functions. In this paper, we describe a legal knowledge management system that helps users understand the meaning of legislative text and the relationships between norms. While much of this knowledge requires the input of legal experts, we focus here on Natural Language Processing (NLP) applications that semi-automate essential but time-consuming, lower-skill tasks: classifying legal documents, identifying cross-references and legislative amendments, linking legal terms to the most relevant definitions, and extracting key elements of legal provisions to support clarity and advanced search. Using NLP tools to semi-automate these tasks makes the proposal a realistic commercial prospect, because it keeps costs down while allowing greater coverage.

Notes

  1. http://www.normattiva.it.

  2. http://arianna.consiglioregionale.piemonte.it/.

  3. www.xmleges.org

  4. The Arianna portal already exports documents to NIR XML format.

  5. https://ec.europa.eu/jrc/en/publication/contributions-conferences/jrc-acquis-multilingual-aligned-parallel-corpus-20-languages.

  6. We specified a maximum distance of 2 words in order to encompass both sentences of the form ‘Il rif1 è soppresso’ (The rif1 is suppressed) and sentences of the form ‘Il rif1 è stato soppresso’ (The rif1 has been suppressed). In Italian, the lemma of both words ‘è’ and ‘stato’ is ‘essere’.

  7. http://www.cirsfid.unibo.it.
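
Note 6 describes a distance-bounded pattern over lemmas. The following is a minimal sketch of one possible reading of it, not the authors' implementation. The note does not say between which two words the distance is measured, so the sketch assumes it is the distance from the first form of 'essere' after the reference to the participle. The lemma table, the function names `lemma` and `is_suppression`, and the parameter `max_distance` are all hypothetical; a real system would take lemmas from a parser.

```python
# Toy lemma table (an assumption): a real pipeline would take lemmas from a parser.
LEMMAS = {"è": "essere", "stato": "essere", "soppresso": "sopprimere"}


def lemma(token: str) -> str:
    """Return the lemma of a token, falling back to the lowercased token."""
    return LEMMAS.get(token.lower(), token.lower())


def is_suppression(sentence: str, ref: str = "rif1", max_distance: int = 2) -> bool:
    """Check for `ref`, then a form of 'essere', then a form of 'sopprimere'
    at most `max_distance` words after that first form of 'essere'."""
    tokens = sentence.split()  # whitespace tokenization, enough for the sketch
    if ref not in tokens:
        return False
    start = tokens.index(ref) + 1
    for i in range(start, len(tokens)):
        if lemma(tokens[i]) == "essere":
            # Words at distance 1..max_distance from the auxiliary.
            window = tokens[i + 1 : i + 1 + max_distance]
            return any(lemma(t) == "sopprimere" for t in window)
    return False


if __name__ == "__main__":
    for s in ("Il rif1 è soppresso", "Il rif1 è stato soppresso"):
        print(s, "->", is_suppression(s))
```

Under this reading, a distance of 1 would accept only 'Il rif1 è soppresso', while a distance of 2 also accepts 'Il rif1 è stato soppresso', because 'stato' sits between the first auxiliary and the participle.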

Author information

Corresponding author

Correspondence to Luigi Di Caro.

About this article

Cite this article

Boella, G., Di Caro, L. & Leone, V. Semi-automatic knowledge population in a legal document management system. Artif Intell Law 27, 227–251 (2019). https://doi.org/10.1007/s10506-018-9239-8

  • DOI: https://doi.org/10.1007/s10506-018-9239-8
