Semi-automatic knowledge population in a legal document management system

Boella, Guido; Di Caro, Luigi; Leone, Valentina

doi:10.1007/s10506-018-9239-8

Published: 13 December 2018

Artificial Intelligence and Law, volume 27, pages 227–251 (2019)

Abstract

Every organization has to deal with operational risks arising from the execution of its primary business functions. In this paper, we describe a legal knowledge management system which helps users understand the meaning of legislative text and the relationship between norms. While much of the knowledge requires the input of legal experts, we focus in this article on NLP applications that semi-automate essential time-consuming and lower-skill tasks: classifying legal documents, identifying cross-references and legislative amendments, linking legal terms to the most relevant definitions, and extracting key elements of legal provisions to facilitate clarity and advanced search options. The use of Natural Language Processing tools to semi-automate such tasks makes the proposal a realistic commercial prospect, as it helps keep costs down while allowing greater coverage.

Notes

http://www.normattiva.it.
http://arianna.consiglioregionale.piemonte.it/.
www.xmleges.org
The Arianna portal already exports documents to NIR XML format.
https://ec.europa.eu/jrc/en/publication/contributions-conferences/jrc-acquis-multilingual-aligned-parallel-corpus-20-languages.
We specified a maximum distance of 2 words in order to encompass both sentences of the form ‘Il rif1 è soppresso’ (The rif1 is suppressed) and sentences of the form ‘Il rif1 è stato soppresso’ (The rif1 has been suppressed). In Italian, the lemma of both words ‘è’ and ‘stato’ is ‘essere’.
http://www.cirsfid.unibo.it.
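
The note on the maximum distance of 2 words can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lemma table `LEMMAS`, the placeholder pattern `rif<number>`, and the function name `find_suppressed_refs` are all assumptions made for illustration, and "distance" is read here as the number of words allowed between the reference and the participle. A real system would take lemmas from a parser rather than a hand-written dictionary.

```python
import re

# Hypothetical toy lemma table (assumption); a real pipeline would use a lemmatizer.
LEMMAS = {"è": "essere", "stato": "essere", "soppresso": "sopprimere"}

# Reference placeholders such as "rif1", as in the example sentences of the note.
REF = re.compile(r"^rif\d+$")


def find_suppressed_refs(sentence, max_distance=2):
    """Return reference placeholders followed by 'soppresso' with at most
    max_distance intervening words, all of which have the lemma 'essere'."""
    tokens = sentence.lower().rstrip(".").split()
    lemmas = [LEMMAS.get(tok, tok) for tok in tokens]
    hits = []
    for i, tok in enumerate(tokens):
        if not REF.match(tok):
            continue
        # Window: up to max_distance intervening words plus the participle itself.
        window = lemmas[i + 1 : i + 2 + max_distance]
        if "sopprimere" not in window:
            continue
        gap = window[: window.index("sopprimere")]
        if gap and all(lemma == "essere" for lemma in gap):
            hits.append(tok)
    return hits


if __name__ == "__main__":
    for s in ("Il rif1 è soppresso", "Il rif1 è stato soppresso"):
        print(s, "->", find_suppressed_refs(s))
```

With a window of 2, both the present-tense and the present-perfect forms are captured by a single pattern, because the intervening words share the lemma 'essere'.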

Author information

Authors and Affiliations

Department of Computer Science, University of Turin, Turin, Italy
Guido Boella, Luigi Di Caro & Valentina Leone

Corresponding author

Correspondence to Luigi Di Caro.

About this article

Cite this article

Boella, G., Di Caro, L. & Leone, V. Semi-automatic knowledge population in a legal document management system. Artif Intell Law 27, 227–251 (2019). https://doi.org/10.1007/s10506-018-9239-8

Issue Date: 15 June 2019
