Abstract
Every organization has to manage operational risks arising from the execution of its primary business functions. In this paper, we describe a legal knowledge management system that helps users understand the meaning of legislative text and the relationships between norms. While much of this knowledge requires input from legal experts, in this article we focus on NLP applications that semi-automate essential, time-consuming, lower-skill tasks: classifying legal documents, identifying cross-references and legislative amendments, linking legal terms to their most relevant definitions, and extracting key elements of legal provisions to improve clarity and enable advanced search. Using NLP tools to semi-automate these tasks makes the proposal a realistic commercial prospect, since it keeps costs down while allowing greater coverage.
Notes
The Arianna portal already exports documents to NIR XML format.
We specified a maximum distance of 2 words in order to encompass both sentences of the form ‘Il rif1 è soppresso’ (The rif1 is suppressed) and sentences of the form ‘Il rif1 è stato soppresso’ (The rif1 has been suppressed). In Italian, the lemma of both words ‘è’ and ‘stato’ is ‘essere’.
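The sketch below illustrates how a window of at most two intervening words lets one pattern cover both sentence forms. It assumes sentences arrive as hand-lemmatized (form, lemma) pairs. The function name `match_amendment`, the `rif` placeholder prefix and the target lemma `sopprimere` are hypothetical and are not taken from the paper. The window only limits distance and does not check what the intervening words are. In the two examples these words are the auxiliaries 'è' and 'stato', which both have the lemma 'essere'.

```python
from typing import List, Optional, Tuple

# A token is a (surface form, lemma) pair. Lemmas are supplied by hand here;
# in a real pipeline they would come from a parser/lemmatizer (assumption).
Token = Tuple[str, str]


def match_amendment(
    tokens: List[Token],
    ref_prefix: str = "rif",         # hypothetical placeholder for a normalized reference
    target_lemma: str = "sopprimere",  # lemma of 'soppresso'
    max_gap: int = 2,                # at most this many words between reference and verb
) -> Optional[Tuple[int, int]]:
    """Return (ref_index, verb_index) if a reference token is followed by a
    token with lemma `target_lemma`, with at most `max_gap` words in between."""
    for i, (form, _lemma) in enumerate(tokens):
        if not form.startswith(ref_prefix):
            continue
        # Positions i+1 .. i+1+max_gap leave 0..max_gap intervening tokens.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j][1] == target_lemma:
                return i, j
    return None


present = [("Il", "il"), ("rif1", "rif1"), ("è", "essere"), ("soppresso", "sopprimere")]
perfect = [("Il", "il"), ("rif1", "rif1"), ("è", "essere"),
           ("stato", "essere"), ("soppresso", "sopprimere")]
print(match_amendment(present))  # (1, 3): one intervening word
print(match_amendment(perfect))  # (1, 4): two intervening words
```

If a third word were inserted before 'soppresso' (for example, 'è stato poi soppresso'), the verb would fall outside the window and the function would return `None`.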
About this article
Cite this article
Boella, G., Di Caro, L. & Leone, V. Semi-automatic knowledge population in a legal document management system. Artif Intell Law 27, 227–251 (2019). https://doi.org/10.1007/s10506-018-9239-8