1 Introduction

One of the most important tasks of law is to provide mechanisms for resolving conflicts. For this purpose, there are numerous laws permeating almost all areas of daily life. When legal disputes arise between state institutions and citizens or between private individuals, courts must decide these conflicts on the basis of the relevant legal norms and their interpretation. It is thus the primary task of the courts, especially in public law, to rule on the lawfulness of state decisions. According to legal theory, argumentation therefore forms the backbone of rational and objective decision-making in court proceedings.

Legal argumentation research also spans a wide variety of disciplines and has a rich history dating back to the early days of 'classical' scholars in the second century BC, with some modern twists by contemporary philosophers (Toulmin 1958). The end of the 20th century brought legal argumentation onto the research agenda of AI (Skalak and Rissland 1992). Attempts to automatically identify, classify, and analyze arguments in legal cases stood at the beginning of early work in the field of 'argument mining' (Mochales and Moens 2008).

To date, however, there has been a major discrepancy between the way legal experts analyze legal argumentation and the way natural language processing (NLP) researchers model, annotate, and mine legal arguments. While computational approaches typically treat arguments as structures of premises and claims (Stede and Schneider 2018), arguments in legal research usually exhibit a rich typology that is important for understanding how parties argue (Trachtman 2013).

This paper aims to fill this gap by addressing the following research questions. First, we ask how reliably we can operationalize models of arguments from legal theory in terms of discourse annotations. Second, we want to explore how we can develop robust argument mining models that outperform the state of the art under the constraints of extremely expensive, expert-labeled data.

Our work makes several important contributions. First, we design a new annotation scheme for legal arguments in proceedings of the European Court of Human Rights (ECHR) that is deeply rooted in the theory and practice of legal argumentation research (Grabenwarter and Pabel 2021; Rüthers et al. 2022; Schabas 2015; Ammann 2019; Barak 2012). Second, we compile and annotate a large corpus of 373 court decisions (2,395,100 tokens and 15,205 annotated argument spans) from the ECHR covering Articles 3, 7, and 8 of the European Convention on Human Rights. Third, we develop an argument mining model that outperforms state-of-the-art models in the legal NLP domain and provide a thorough expert-based evaluation. Finally, we run preliminary experiments with supervised linear models to investigate whether particular legal argument patterns affect the overall importance of the case as determined by the ECHR Bureau. All datasets and source code are available at https://github.com/trusthlt/mining-legal-arguments.

2 Related work


We review existing work in the legal domain, in particular argument mining in legal texts and work dealing with ECHR judgments. Additionally, we review domain-specific pre-training, which provides the background for our experiments with language modeling.


Argument mining in court decisions

In their earliest work, Moens et al. (2007) proposed a binary classification of isolated sentences as either argumentative or non-argumentative on a corpus containing, among others, five court reports from the UK, US and Canada. However, no further details are known about the exact way the data were selected, the expertise of the annotators, or the agreement scores.

This work was later extended by Mochales-Palau and Moens (2007), who refined the previously binary classification into three classes, namely premise, conclusion, or non-argumentative sentence. Moreover, each sentence was classified in the context of the preceding and the succeeding sentence. This work also introduced the ECHR for the first time: two lawyers annotated roughly 12k sentences from 29 admissibility reports and 25 “legal cases” over the course of four weeks, with the majority (10k) being non-arguments, alongside 2,335 conclusions and 419 premises. Despite being pioneering in analyzing the ECHR, two questions remain open. The minor one is that the paper lacks any information on annotator agreement. The major one is the very motivation for using this particular premise/conclusion/non-argument scheme for analyzing legal argumentation. The authors developed their annotation scheme by taking indirect inspiration from Walton (1996) and claim that this would “enable one to identify and evaluate common types of argumentation in everyday discourse”. However, the referenced work by Reed and Walton (2001) hardly deals with legal argumentation. The utility of this scheme for legal argument analysis thus remains unaddressed.

A similar motivation for using Walton's schemes and the premise/conclusion model was later adopted in subsequent works by the same authors (Mochales and Moens 2008, 2011), where the latter defines an argument as a set of propositions that adhere to one of Walton's argumentation schemes and can thus be challenged by critical questions. Mochales and Moens (2008) studied 10 judgments and decisions from the ECHR and obtained a Kappa agreement of 0.58 between two independent lawyers. In a follow-up, Mochales and Moens (2011) experimented with 47 annotated ECHR documents from which only The Law section had been selected for annotation. While the authors' original aim was to annotate a tree structure over the entire document, the actual corpus consists of implicit relations between sentences, where a list of consecutive sentences is grouped into an argument and each sentence is labeled as either support, against, or conclusion. It is unclear whether Walton's schemes were assigned to the arguments.

More recently, Poudyal et al. (2020) published an annotated corpus of 42 ECHR decisions based on the corpus previously annotated by Mochales and Moens (2008). They evaluated three tasks on their published dataset using a simple RoBERTa model. First, clause detection identifies whether a clause belongs to an argument or not. Second, argument relation prediction decides for each pair of arguments whether they are related. Third, premise and conclusion recognition decides which clause is a premise and which is a conclusion. The last two tasks require perfect recognition of the clauses from the first task. The dataset is in a JSON format with pre-extracted sentences, so the full original texts of the judgments are not available.

With the goal of summarizing court decisions, Yamada et al. (2019a) annotated 89 Japanese civil case judgments (37k sentences) with a tree-structured argument representation, a feature typical of these particular legal documents. They used four phases of annotation performed by a Japanese law Ph.D. student and reported experiments on classifying each sentence into one of seven classes (Yamada et al. 2019b).

Xu et al. (2020) explored argument mining to improve case summaries, which should contain the following key information: (1) the main issues the court addressed in the case, (2) the court’s conclusion on each issue, and (3) a description of the reasons the court gave for its conclusion. They called these key pieces of information “legal argument triples” and intended to use them to create concise summaries, for which they annotated the human summaries with these triples, followed by annotating the text in the full court case corresponding to the summary triples. Finally, they performed argument mining to extract these triples from both the summaries and the full court case files.

Compared to previous work, we depart from the usual premise-conclusion scheme by using a novel annotation scheme taken from a legal research perspective. Moreover, our dataset on the ECHR is much larger, with a total of 393 annotated decisions.


Pretraining large language models

Gururangan et al. (2020) examined the effects of continued pre-training of language models such as RoBERTa on domain- and task-specific data. To this end, they studied four domains, namely news, reviews, biomedical papers, and computer science papers. First, they continued pretraining with masked language modeling on a large corpus of unlabeled domain-specific text, which they called domain-adaptive pretraining. They found that this consistently improved performance on tasks in the target domain. Second, they investigated continuing pretraining only on the task-specific training data, which they called task-adaptive pretraining. They found that it consistently outperformed the RoBERTa baseline and matched the performance of domain-adaptive pretraining on some tasks. Finally, they combined the two procedures, running domain-adaptive pretraining first and then task-adaptive pretraining, which performed best.

Gu et al. (2022) investigated pretraining in a specialized biomedical domain using biomedical abstracts from PubMed. They found that domain-specific pretraining from scratch worked best, along with the creation of a new domain-specific vocabulary for the pretraining process. In addition, masking whole words instead of masking sub-word tokens improved performance. They concluded that, as in the case of biomedicine, it may be more beneficial to pre-train from scratch if a sufficient amount of domain-specific data is available.

Chalkidis et al. (2020) compared different approaches to adapting BERT to legal corpora and tasks: using vanilla BERT, adapting BERT by further pre-training on a domain-specific legal corpus, and pre-training BERT from scratch on a domain-specific legal corpus analogously to the pre-training of the original BERT. They compared these approaches on classification and sequence labeling tasks and found that the best model can vary between further pre-training and pre-training from scratch depending on the task. They also found that the more challenging the final task, the more the model benefits from in-domain knowledge. Finally, they published their model trained from scratch as Legal-BERT.

Zheng et al. (2021) examined the use of pretraining in the legal domain. They created a new dataset of Case Holdings on Legal Decisions with over 53,000 multiple-choice questions to identify the relevant statement of a cited case, a task they found to be legally significant and difficult from an NLP perspective. They pre-trained two different models. First, they continued the pre-training of BERT on their case law corpus. Second, they trained a model from scratch with a legal domain-specific vocabulary. They compared these two models to the standard BERT model as well as to a BERT model trained for twice the number of steps. They compared these models on different legal tasks and concluded that pretraining may not be worthwhile if the tasks are either too easy or not domain-specific with respect to the pretraining corpus. The more difficult and domain-specific the task, the greater the benefit of pretraining.

The above observations support our motivation for extensive pretraining. Since we want to extract and classify arguments in a scheme mainly used by lawyers, which we consider a very domain-specific task, we also use pretraining to build our own language model.

3 LAM:ECHR Corpus

3.1 Annotation scheme

This section introduces our new annotation scheme. From the NLP perspective, this scheme is (1) a text span annotation; (2) flat, non-hierarchical, and non-overlapping; (3) multi-class single-label; (4) aligned to tokens but independent of sentence boundaries; (5) constrained so that spans cannot cross paragraphs as present in the court case; and finally (6) such that each span is annotated with exactly two orthogonal tagsets, one for the argument type and one for the argument actor.

From the legal perspective, the scheme is not based on logical categories of language (including legal language), as is usual in the previously reviewed literature. Instead, we have chosen to break down the legal part of the ECHR's decisions by resorting to the usual legal categorization of the arguments used by the ECHR. This allows for a deeper analysis of the Court's line of reasoning, in a way that is closer to how most jurists approach it. Such an annotation, although much more complex in NLP terms, has the advantage of enabling a fine-grained search that allows relevant arguments to be quickly found and filtered. For example, it is easy to identify in the Court's vast jurisprudence all the instances in which the Court uses a particular canon of argumentation, to discuss, on the basis of empirical data, the quantitative or qualitative relationship in the use of different canons, and also to observe their evolution over time.

Although the proposed annotation scheme has been tailored to ECHR decisions, the argument types we use are generally recognized in legal theory and can be found in many different courts. It should then be possible to apply the bulk of this annotation scheme to other courts, perhaps after some revision in the light of the particularities of the court in question. For example, the specifics of a country must be taken into account when interpreting constitutional documents. Further changes would be necessary for the Actors, where some of the entries would be deleted, such as Commission/Chamber, and others would certainly change their names.

The annotation scheme is divided into two main categories: the actors and their arguments. It should be noted that these two categories are completely orthogonal and independent, i.e., at least in theory, any actor can be linked to any kind of argument and vice versa.

3.1.1 Actors

The annotation scheme covers five different types of actors.

  1. ECHR: The ECHR is the most common agent and includes all arguments that the ECHR introduces.

  2. Applicant: The applicant is the person or Contracting Party that litigates an alleged violation of a fundamental right as enshrined in the Convention.

  3. State: The respondent is the party alleged by the applicant to be responsible for the alleged violation. Since the respondent is most of the time a Contracting Party, i.e. a state, we used this word for the category for the sake of clarity for the annotators.

  4. Third parties: Third parties stand for all other parties that take part in the procedure, e.g., other Contracting Parties or NGOs such as Amnesty International or Human Rights Watch.

  5. Commission/Chamber: Finally, Commission/Chamber concerns all arguments that originate from the Commission (until this organ was abolished with the entry into force of Protocol no. 11 to the Convention in 1998) or from a Chamber, in the event of a Grand Chamber decision, and are merely reproduced by the ECHR.

3.1.2 Argument types

The annotation scheme includes sixteen different categories of arguments. Each of these categories is the subject of an in-depth and complex analysis in legal theory. Our aim is to take them up in their canonical content and give a brief definition; this was done at the beginning of our project for the benefit of the annotators, who were also given concrete examples from ECHR judgments in addition to these brief definitions (see Appendix B).

  1. Procedural arguments—Non contestation by the parties: This category describes a situation of consensus on a fact or argument between the parties, which allows the Court not to discuss the matter further, so that judicial time and resources can be saved.

  2. Method of interpretation—Textual interpretation: The textual interpretation, as can be found in Art. 31 §1–3 and Art. 33 §1–3 of the Vienna Convention on the Law of Treaties (VCLT), is usually seen as the starting point for the interpretation of a norm. Textual interpretation can refer to the meaning of the norm's wording at the time of its origin or its application, as well as to its meaning in technical or (most subsidiarily) colloquial language (Rüthers et al. 2022, §731 et seq.; Ammann 2019, §197 et seq.). Also in the specific context of the ECHR, the textual interpretation and the corresponding norm of the VCLT have been seen as “the backbone for the interpretation of the Convention” (see, also more generally on the role of the VCLT in the case-law of the ECHR, Schabas 2015, 34 et seq.).

  3. Method of interpretation—Historical interpretation: The historical interpretation involves the analysis of the historical circumstances at the moment of the enactment of the norm in order to ascertain its objective (cf. further Rüthers et al. 2022, §778 et seq.; Ammann 2019, §778 et seq.). As foreseen by Art. 31 §4 and Art. 32 VCLT, it is only of subsidiary importance in international treaties (Grabenwarter and Pabel 2021, §5). This is especially true with reference to the ECHR, since its preparatory works “are rather sparse, especially with respect to the definitions of fundamental rights and issues related to their interpretation and application” (Schabas 2015, 45 et seq.).

  4. Method of interpretation—Systematic interpretation: The systematic interpretation, as provided for in Art. 31 §3 lit. c VCLT, is based on the ideal of a self-consistent legal system. Each norm is thus to be interpreted only from its position and function within the complete legal system (Rüthers et al. 2022, §744 et seq.; Ammann 2019, p. 202). The ECHR has clearly recognized a duty to interpret the Convention in harmony with other rules of international law of which it forms part (see further Schabas 2015, 37 et seq.). At the same time, the Court also insists that “the Convention must be read as a whole, and interpreted in such a way as to promote internal consistency and harmony between its various provisions” (see further Schabas 2015, 47).

  5. Method of interpretation—Teleological interpretation: We have encompassed in the same category a number of similar, if by no means identical, arguments to which the ECHR sometimes resorts. First of all, the teleological interpretation stricto sensu, as foreseen by Art. 33 §4 VCLT, i.e. the hermeneutic argument concerned with the objective (telos) that is to be achieved by the norm. The decisive factor here is not the historical intention of the legislator but the objective purpose expressed in the norm, which is characterized significantly by the textual, systematic and historical interpretation (Rüthers et al. 2022, §717 et seq.; Ammann 2019, p. 208). Together with the teleological interpretation, this category is also used for the dynamic or evolutive interpretation of the ECHR, which has been seen by the Court as a “living instrument” (Grabenwarter and Pabel 2021, §5 para. 14 et seq.; Schabas 2015, 47 et seq.). This dynamic or evolutive interpretation is often also associated with the idea that the ECHR should offer an effective protection of fundamental rights (on the principle of effectiveness in the context of the ECHR see further Schabas 2015, 49 et seq.).

  6. Method of interpretation—Comparative law: With this argument type, we annotated all instances in which the ECHR makes references to legal provisions or case law of the Contracting Parties or to other legal orders, such as, prominently, the EU and the case law of the Court of Justice of the European Union. Also very common are references to international law in a broader sense than the stricter one relevant for the systematic interpretation under Art. 31 §3 lit. c VCLT (see further Schabas 2015, 38 et seq.).

  7. Test of the principle of proportionality—Legal basis: In a constitutional democracy, a constitutional right cannot be limited unless such a limitation is authorized by law. This is the principle of legality. From here stems the requirement, which can be found in modern constitutions' limitation clauses as well as in other international documents, that any limitation on a right be “prescribed by law”. At the basis of this requirement stands the principle of the rule of law (Barak 2012, p. 107; with specific reference to the ECHR context Grabenwarter and Pabel 2021, §18 para. 7 et seq.).

  8. Test of the principle of proportionality—Legitimate purpose: This component of the proportionality test examines whether it is possible to ascertain a legitimate purpose in the law that limits a fundamental right (cf. further in general Barak 2012, p. 107; with specific reference to the ECHR context again Grabenwarter and Pabel 2021, §18 para. 7 et seq.).

  9. Test of the principle of proportionality—Suitability: This component of the proportionality test analyses whether the means chosen by the law fit, i.e. can effectively realize or advance, the legitimate purpose of the law itself (Barak 2012, p. 303 et seq.; Grabenwarter and Pabel 2021, §18 para. 15).

  10. Test of the principle of proportionality—Necessity/Proportionality: Since the ECHR does not strictly differentiate between the categories of necessity and proportionality in a strict sense (see Grabenwarter and Pabel 2021, §18 para. 15), considerations of necessity, if present, also fall into this category. The test of necessity dictates that the legislator chooses, of all suitable means, those that limit the human right in question the least (see generally Barak 2012, p. 317 et seq.). The component of proportionality stricto sensu, on the other hand, dictates that, “in order to justify a limitation on a constitutional right, a proper relation (‘proportional’ in the narrow sense of the term) should exist between benefits gained by the public and harm caused to the constitutional right from obtaining that purpose” (Barak 2012, p. 340 et seq.; Grabenwarter and Pabel 2021, §18 para. 14 et seq.).

  11. Institutional arguments—Overruling: This category refers to the amendment of a precedent on a horizontal level. It can only be done under the premise of fundamental deficits of the previous precedent. Since an overruling is extremely rare, it both fulfils the requirement of continuity and legal certainty and allows for viability and adaptability (cf. further Maultzsch 2017, para. 1342; for an in-depth analysis regarding the ECHR see Mowbray 2009).

  12. Institutional arguments—Distinguishing: This category is relevant when looking at a precedent and assessing an essential difference of facts, which leads to a non-transfer of the precedent to the new case (cf. further Maultzsch 2017, para. 1346).

  13. Institutional arguments—Margin of Appreciation: The margin of appreciation is a margin of discretion granted by the ECHR to the judiciary, legislature and executive of the Member States before a violation of the ECHR is assumed; it serves to modulate the strictness of the review by the ECHR in different areas and contexts (Grabenwarter and Pabel 2021, §18 para. 20 et seq.; Schabas 2015, 78 et seq.).

  14. Precedents of the ECHR: This category concerns the effect of the legal content of earlier judgments of the ECHR on later judgments (Maultzsch 2017, para. 1330 et seq.). In the context of the ECHR there is no rule of binding precedent or stare decisis, but the Court nevertheless normally follows its precedents (cf. further Schabas 2015, 46 et seq.).

  15. Decision of the ECHR: This category contains the decisions of the ECHR, which can be the final sentence on the result of the interpretation of a norm as well as the final sentence of the part of the judgment on the application to the concrete case.

  16. Application to the concrete case: Application to the concrete case is concerned with determining the relation between the concrete case and the abstract legal norm by subsuming the facts of the case under the legal norm, i.e. examining whether the offence is fulfilled and the legal consequence thereby triggered (Rüthers et al. 2022, para. 677 et seq.).

3.2 Data and annotation process

We scraped and extracted a large 'raw' data collection from the HUDOC web interface, which contains, among others, judgments and decisions following a relatively rigid structure. A case begins with the list of the judges of the court, the registrar, the indication of the applicant, the respondent, and possibly other parties that have been admitted to the case. Next, the procedure before the Court and the facts of the case are described. After the facts, in "The Law" section, the arguments of the parties and of the Court on each alleged violation of the Convention are presented. Finally, the Court renders its decision; see an example in Appendix A. All judgments that were later selected for annotation were processed by a bespoke HTML-to-XML extraction that retained the case and paragraph structure, and all paragraphs were tokenized using spaCy.
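The extraction pipeline is not reproduced here; as a minimal sketch of the tokenization step, assuming the paragraph texts have already been extracted (the blank pipeline and the example paragraphs are illustrative choices, not necessarily our exact configuration):

```python
import spacy

# A blank English pipeline suffices for tokenization; no tagger or parser is needed.
nlp = spacy.blank("en")

# Placeholder for the paragraph texts extracted from one case.
paragraphs = [
    "17. The applicant complained that his expulsion violated Article 8.",
    "18. The Government contested that argument.",
]

tokenized = [[token.text for token in nlp(p)] for p in paragraphs]
print(tokenized[0][:6])  # ['17', '.', 'The', 'applicant', 'complained', 'that']
```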

3.2.1 Annotation process

We hired six law students for a 12-month period as annotators, supervised by two postdoctoral researchers (experts in public and criminal law) and consulted by two law professors (also public and criminal law). We randomly selected ECHR cases concerning Article 8 (right to respect for private and family life, home and correspondence) and also to a lesser degree Article 7 (no punishment without law) to match our legal expertise. We also tried to balance cases in terms of their importance (from level one to four) as well as their recency.

The annotation process was conducted using the INCEpTION platform (Klie et al. 2018) and consisted of multiple rounds in which the annotators were given feedback about fully and partially missing annotations. In the end, we annotated 375 ECHR cases, from which we excluded two documents as they contained fewer than five arguments.

We measured pairwise annotation agreement using Krippendorff's unitized alpha \(\alpha _u\) (Krippendorff 2014) as implemented by Meyer et al. (2014). In the first annotation round, \(\alpha _u\) was in the 0.70s. After the first round, the causes of the disagreements were discussed and the annotation guidelines were updated to reflect them. In the subsequent rounds, \(\alpha _u\) was in the 0.80s, with a few rare outliers where the score degraded by around 0.2. Values of \(\alpha _u\) above 0.80 are considered very high for span annotations; the occasional outliers nevertheless indicate that argument annotation is a difficult task even for human experts. For the final single gold-standard dataset, the annotations were manually curated and merged by an independent expert annotator who resolved disagreements. Initially, the documents were annotated by all six annotators; then we ran several batches with three persons per document (those were used for calculating the inter-annotator agreement), and the last part of the study was run by each annotator independently.

Our final gold-standard dataset LAM:ECHR consists of 373 ECHR cases annotated in the UIMA XMI stand-off annotation format, which keeps the original text along with span information about tokens, paragraphs and arguments as well as their labels. This enables easy post-processing by frameworks such as DKPro (Eckart de Castilho and Gurevych 2014) or dkpro-cassis for arbitrary downstream experiments. Furthermore, we also release the individual raw annotations from each annotator to facilitate future work on evaluating human judgments under uncertainty, such as in Simpson and Gurevych (2019a).
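For illustration, a gold-standard case could be loaded with dkpro-cassis roughly as follows; the file names and the annotation type name are placeholders, as the exact type system is defined in the released data:

```python
from cassis import load_cas_from_xmi, load_typesystem

# "TypeSystem.xml" and "case.xmi" are placeholder file names.
with open("TypeSystem.xml", "rb") as f:
    typesystem = load_typesystem(f)
with open("case.xmi", "rb") as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

# The fully qualified type name below is hypothetical; the actual argument
# span type is defined by the released type system.
for span in cas.select("webanno.custom.ArgumentSpan"):
    print(span.begin, span.end, cas.get_covered_text(span))
```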

3.2.2 Representing arguments for argument mining

Since a single ECHR case is easily several thousand words long and the models we work with cannot handle such long sequences, we break down each case into smaller units. To this end, we treat each paragraph as a single short annotated text. This design choice is supported by the fact that legal arguments rarely span more than a single paragraph in the ECHR judgments, which was also taken into account in our annotation scheme (see Sect. 3.1).

Similarly to Poudyal et al. (2020), we trim the cases and only include paragraphs from The Law section onwards. This has the advantage that training on paragraphs without arguments is avoided, which also substantially shortens training time. We represent the argument spans using BIO encoding; see Fig. 1 for an example.

Fig. 1: An excerpt from a tokenized paragraph (first column) with the corresponding BIO labels for argument type and actor (second and third column, respectively)
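As a minimal sketch, token-level BIO tags such as those in Fig. 1 can be derived from character-level span annotations roughly as follows (the tuple-based span format is illustrative, not the exact format of our released data):

```python
def spans_to_bio(tokens, spans):
    """Convert character-level argument spans into token-level BIO tags.

    tokens: list of (text, char_start, char_end) tuples for one paragraph.
    spans:  list of (char_start, char_end, label) tuples; assumed
            non-overlapping, as in our annotation scheme.
    """
    tags = ["O"] * len(tokens)
    for s_start, s_end, label in spans:
        inside = False
        for i, (_, t_start, t_end) in enumerate(tokens):
            if t_start >= s_start and t_end <= s_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


tokens = [("The", 0, 3), ("Court", 4, 9), ("notes", 10, 15), (".", 15, 16)]
spans = [(0, 15, "Decision of the ECHR")]
print(spans_to_bio(tokens, spans))
# ['B-Decision of the ECHR', 'I-Decision of the ECHR', 'I-Decision of the ECHR', 'O']
```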

Finally, we split the 373 cases into training/dev/test sets, resulting in 299 cases for training, 37 for development and 37 for testing. The data were stratified such that the distribution of labels remains balanced across splits and to ensure that low-support labels are included in all splits.

3.2.3 Data analysis

Additionally, we collected statistics about our gold data in general and for the arguments both on the argument level and on the BIO-tag level. High-level argument statistics over the whole corpus are shown in Table 1. Since Historical interpretation is not present in our dataset, we removed this argument type from our further work. Furthermore, we can already see that our argument type distribution is highly imbalanced, with Application to the concrete case making up more than 50% of the dataset and around half of our labels having a support of less than 1%, going as low as two instances of Overruling in the whole dataset. This can hinder model learning. In fact, more detailed statistics on the BIO-tag level in Table 13 (Appendix C) reveal that only six out of 31 argument labels have a support higher than 1%.

Table 1 Number of text spans associated with a particular argument type in the entire annotated corpus
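The high-level counts in Table 1 amount to a simple frequency count over the gold argument spans; a minimal sketch, where the label list is a tiny placeholder for the type labels of all spans collected from the 373 cases:

```python
from collections import Counter

# Placeholder: argument-type label of every gold span in the corpus.
argument_labels = [
    "Application to the concrete case",
    "Precedents of the ECHR",
    "Application to the concrete case",
]

counts = Counter(argument_labels)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
```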

The high-level statistics of the agent dimension in Table 2 show a less severe skewness due to only five categories in total. Even though ECHR makes up nearly two thirds of the arguments, the two least common agent types still have a support greater than 1%. At the BIO-tag level (Table 14), both Third parties and Commission/Chamber drop under 1%. This can also be a challenge for model learning, although it might not be as severe as for the argument types because most agent labels are still seen at least a few thousand times.

Table 2 Number of text spans associated with a particular argument agent in the entire annotated corpus

4 Experiments

4.1 Datasets

Apart from the new annotated LAM:ECHR corpus which we use for training and evaluation, we also acquired a collection of large unlabeled corpora for self-supervised pre-training of large language models. Table 3 shows a comprehensive summary of all datasets we use for pre-training and for supervised experiments; details of the unlabeled corpora follow.

  • ECHR raw: We additionally scraped another 172,765 ECHR cases and converted them into plain text. After discarding non-English or corrupted cases, we ended up with 65,908 documents with a total file size of 1.1 GB.

  • JRC-Acquis-En Corpus: The JRC-Acquis corpus by Steinberger et al. (2006) from the European Commission Joint Research Centre (JRC) is a parallel corpus in over 20 languages containing European Union (EU) documents of a mostly legal nature. Acquis Communautaire is the French name for the body of common rights and obligations which bind together all Member States of the EU. The Acquis consists of the contents, principles and political objectives of the Treaties; EU legislation; declarations and resolutions; international agreements; and acts and common objectives (Steinberger et al. 2006). We included the English part of the JRC-Acquis corpus. Even though it is not focused on court cases, it is still one of the few corpora of EU-wide legal text, and we thus deemed it a good fit for our European legal pretraining corpus. Our JRC-Acquis-En sub-corpus consists of 23,545 documents converted to plain text.

  • CaseLaw Corpus: We crawled 4,938,129 US court cases from the Caselaw Access Project. It includes all official, book-published cases from all state courts, federal courts, and territorial courts for all US states as well as American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands from 1658 until 2019. We ignored any metadata and only extracted the plain-text opinions, which contain the body of each case. In contrast to the ECHR cases, the structure of an opinion is not as rigid and varies considerably, as the underlying database spans over 300 years of documents with different purposes. Nevertheless, most cases still include background about the facts and procedure, followed by a discussion and a decision at the end, which makes this data suitable for legal language model pre-training. We refer to this dataset simply as CaseLaw.

Table 3 Overview of all datasets used in the experiments

4.2 Models

4.2.1 Multitask fine-tuning with transformers

We experiment with two downstream tasks: (1) labeling text spans with the argument type; and (2) labeling text spans with the agent from whose perspective the argument is written. Since both tasks are sequence tagging tasks, are carried out over the same input data, i.e. in the same domain, and are related in the sense that an agent prediction (e.g. ECHR) can help predict the argument type (e.g. Decision of the ECHR) and vice versa, we employ a multitask model.

We extend the huggingface transformers library (Wolf et al. 2020) and implement support for multitask fine-tuning. In particular, we instantiate two models, one with the ArgType head, responsible for predicting the argument type, and one with the Agent head, responsible for predicting the agent label. The heads share the same underlying encoder. This adaptation allows us to easily instantiate our model while relying on existing model implementations from huggingface transformers, such as BERT. Furthermore, for each input batch we carry along the additional information of the corresponding task, so that we can dynamically map to the right head during training.
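Our actual implementation extends the library internals; the following is only a minimal sketch of the underlying idea of two token-classification heads over a shared encoder (the class, the variable names and the label counts are ours and merely illustrative of the BIO tag sets described in Sect. 3):

```python
from torch import nn
from transformers import AutoModel


class MultitaskTagger(nn.Module):
    """Shared encoder with one token-level classification head per task."""

    def __init__(self, model_name="roberta-large",
                 num_argtype_labels=31, num_agent_labels=11):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.argtype_head = nn.Linear(hidden, num_argtype_labels)
        self.agent_head = nn.Linear(hidden, num_agent_labels)

    def forward(self, input_ids, attention_mask, task):
        # A single encoder pass; the task flag carried with each batch
        # selects which head produces the logits.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        head = self.argtype_head if task == "argtype" else self.agent_head
        return head(hidden_states)  # (batch, seq_len, num_labels)
```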

4.2.2 Postprocessing

As transformer models operate on sub-word units, we map predictions back to the word level. This ensures a one-to-one correspondence in the granularity of the predicted labels and the gold-standard labels, as both are then aligned on tokens (see Table 4). We thus also evaluate the models on the word level.

Table 4 Example of sub-word level tag mapping to tokens to match the gold label BIO-tags granularity
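A minimal sketch of this mapping using the tokenizer's word alignment, keeping the prediction of the first sub-word of each word (one reasonable aggregation strategy; the variable names and placeholder predictions are ours):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large", add_prefix_space=True)

words = ["The", "Court", "considers", "that", "..."]
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Placeholder: one predicted label id per sub-word position (incl. special tokens).
subword_predictions = list(range(encoding["input_ids"].shape[1]))

word_level = []
previous_word = None
for idx, word_id in enumerate(encoding.word_ids()):
    if word_id is None or word_id == previous_word:
        continue  # skip special tokens and sub-word continuations
    word_level.append(subword_predictions[idx])
    previous_word = word_id

assert len(word_level) == len(words)
```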

4.2.3 Baseline models

RoBERTa


Liu et al. (2019) investigated hyperparameters in the pretraining of BERT and released an improved model RoBERTa (Robustly optimized BERT approach). RoBERTa modifies BERT in four ways: (1) the model is trained longer, on more data, and with bigger batch sizes; (2) it forgoes the next sentence prediction objective; (3) it is trained on longer sequences; and (4) the masking pattern is dynamically changed and applied during training. Additionally, RoBERTa uses byte-level Byte-Pair Encoding (Radford et al. 2019).

As our first baseline model we use RoBERTa-Large, which has 24 layers, a hidden size of 1024 and 16 attention heads. Our RoBERTa model is fine-tuned for 10 epochs with a learning rate of \(1e^{-5}\), batch size of 4, weight decay of 0.01 and 1000 warmup steps.
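With the standard Trainer API, this configuration corresponds roughly to the following (a sketch; the output directory is a placeholder, and our multitask training loop additionally routes each batch to the correct head):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-large-lam-echr",  # placeholder
    num_train_epochs=10,
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    weight_decay=0.01,
    warmup_steps=1000,
)
```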


Legal-BERT


As a second baseline, we use the Legal-BERT model by Chalkidis et al. (2020), which performed better on legal tasks than the corresponding BERT model. They pretrained Legal-BERT from scratch with the same configuration as BERT-Base, using their own newly created vocabulary. As the data for the pretraining, they collected nearly 12 GB of data from the legal sub-domains of legislation, cases and contracts.

Our Legal-BERT model, like our RoBERTa model, is fine-tuned for 10 epochs with a learning rate of \(1e^{-5}\), batch size of 8, weight decay of 0.01 and 500 warmup steps.

4.3 Pre-training for robust domain adaptation

We investigate to what extent further model adaptation helps to improve downstream performance on legal argument mining in LAM:ECHR. As the main method, we further pretrain a baseline model on European legal data. This is motivated by the previous successes in domain adaptation discussed in Sect. 2 (Gururangan et al. 2020; Gu et al. 2022; Chalkidis et al. 2020; Zheng et al. 2021). We also experimented with pretraining a language model from scratch on a large dataset of legal cases.

4.3.1 Further pretraining

As the base model for our further pretraining, we use the RoBERTa-Large model because it performs slightly better than the Legal-BERT model but is not tuned to legal data in any way. For this reason, we expect a significant improvement when continuing the pretraining on legal data.

As our pretraining data we use the English ECHR cases and the JRC-Acquis-En dataset. The ECHR corpus has a size of 1.1 GB and the JRC-Acquis-En corpus of 338 MB, totaling around 1.4 GB. The ECHR corpus contains 65,908 cases, the JRC-Acquis-En 23,545 legal documents.

We pre-train our models with masked language modeling. We further pretrain our model for 15,000 steps (around 51 epochs), which is a middle ground between the domain-adaptive and task-adaptive pretraining of Gururangan et al. (2020). Similarly, we use a batch size of 2048 and a learning rate of 5e-4 with a warmup ratio of 6%. The training loss generally decreased over time, from 0.9567 initially to 0.4435 at step 15,000. Similarly, the loss on our validation data started at 0.6106 and decreased to 0.4310.
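A minimal sketch of this masked-language-modeling setup with the transformers Trainer, where the tiny placeholder corpus stands in for the plain-text ECHR and JRC-Acquis-En documents, and the per-device batch size and accumulation steps are illustrative values that reach the effective batch size of 2048:

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# Placeholder corpus; in our setup these are the pretraining documents.
texts = ["The Court notes that the applicant complained under Article 8.",
         "The Commission declared the application admissible."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="leg-roberta-large",       # placeholder
    max_steps=15_000,
    learning_rate=5e-4,
    warmup_ratio=0.06,
    per_device_train_batch_size=32,       # illustrative; the effective batch
    gradient_accumulation_steps=64,       # size of 2048 = 32 * 64
)

trainer = Trainer(model=model, args=training_args,
                  data_collator=data_collator, train_dataset=dataset)
trainer.train()
```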

4.4 Downstream task results and analysis

For each of our models, we selected the fine-tuned model checkpoint with the best combined performance on the dev set. These are after the 6th epoch for Legal-BERT, after the 7th epoch for RoBERTa-Large, after the 6th epoch for our further pretrained RoBERTa-Large model for 13k steps on legal data (Leg-RoBERTaL-13k) and after the 9th epoch for our further pretrained RoBERTa-Large model for 15k steps on legal data (Leg-RoBERTaL-15k). The Macro \(F_1\) scores on the test and development data are in Table 5.

The performance of the two baseline models, Legal-BERT and RoBERTa-Large, is similar on the dev set. On the test set, RoBERTa-Large performs nearly two percentage points better, contrary to our expectation, as Legal-BERT was pre-trained on legal data. However, the model size may play a role, too. RoBERTa-Large is a larger model with 355 million parameters, compared to 110 million for Legal-BERT. Additionally, RoBERTa-Large was trained for 16 times more steps than Legal-BERT when accounting for the batch size. This comparable performance in the development process also led us to use RoBERTa-Large as the base for our further pretraining.

Our two models (Leg-RoBERTaL-13k and Leg-RoBERTaL-15k) show similar performance on the dev set, whereas on the test set the longer pretraining also led to higher performance. They also perform significantly better than the two baseline models, with the 2,000 extra steps of domain adaptation leading to an improvement of nearly two percentage points on the test set. These improvements are in line with the findings of Gururangan et al. (2020) that continued pretraining and domain adaptation generally lead to better results.

Table 5 Macro F\(_1\)-scores
Table 6 Test macro F\(_1\)-scores of each argument type label sorted by frequency

Table 6 breaks down the test set F\(_1\)-scores to the individual argument type labels. Generally, the more data exists for a class, the better the predictions for that class are. Exceptions to this pattern are two classes of the Test of the principle of proportionality, namely Necessity/Proportionality and Legal basis, as well as the institutional argument Distinguishing; we analyze these cases further below. Our model Leg-RoBERTaL-13k has the best F\(_1\)-scores on half of the labels. Additionally, it is the only model with at least some correct predictions of the type Teleological interpretation. Despite performing significantly worse on the Macro F\(_1\)-score, Legal-BERT surprisingly still outperforms every other model on five labels, even significantly on Legitimate purpose and Non contestation, implying that it is hard for a single model to cover all of the 15 categories and perform well on all of them.

Table 7 Test F\(_1\)-scores of each agent type label

Regarding the agent predictions, the models vary by a few percentage points on the development set and are within one percentage point of each other on the test set (see Table 5). All models perform strongly on the agent prediction, with F\(_1\)-scores over 90. Here, the Legal-BERT model also performs slightly better than the other models. Looking at the scores per label in Table 7, we can again see that, generally, performance is better the more data exists for a class.

4.4.1 Error analysis

We analyze prediction errors both quantitatively and qualitatively. Since the agent is only of subsidiary importance and the model reaches a reasonable performance (91.36 Macro F\(_1\)), we only briefly discuss the most common classification error, namely the outside label predicted as I-ECHR. These mistakes have in common that their text always includes the Court, which probably leads to the misclassification, e.g., they include "the court considers that..." or "the court notes that...".

For the argument types, we mainly examined labels with relatively high support but mediocre performance, namely Necessity/Proportionality, Legal basis and Distinguishing. For this task, the legal experts from our team investigated individual instances of common errors and assessed the differences. A relevant remark is that for labeling arguments, legal experts often rely on context beyond the single paragraph we used as model input.

Fig. 2: An excerpt from the probabilistic confusion matrix. Each row represents the gold label, each column the predicted label. Each cell value is the number of "row predicted as column" normalized per row. Rows are sorted by decreasing frequency of examples, i.e. "O" is the most common label and "B-Teleological interpretation" the least common; columns have the same ordering. Labels with zero predictions are omitted

Figure 2 depicts the probabilistic confusion matrix in which each row is normalized to sum up to 1.0. Looking at the errors for Necessity/Proportionality, by far the most common error is the misclassification as Application to the concrete case (6,992 instances). This category is also often misclassified as Precedents of the ECHR. Other classes are also often predicted as Precedents of the ECHR, especially Application to the concrete case, the second most common error (4,409 instances), as well as Legal basis (1,795 instances).
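The row-normalized matrix itself can be computed directly from the token-level gold and predicted labels; a minimal sketch with scikit-learn, using tiny placeholder label lists:

```python
from sklearn.metrics import confusion_matrix

# Placeholder token-level gold and predicted BIO labels.
y_true = ["B-Necessity/Proportionality", "I-Necessity/Proportionality",
          "B-Application to the concrete case", "O"]
y_pred = ["B-Application to the concrete case", "I-Necessity/Proportionality",
          "B-Application to the concrete case", "O"]

labels = sorted(set(y_true))
# normalize="true" normalizes each row (gold label) to sum to 1.0.
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(cm)
```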

Examination by legal experts

Detailed expert examination revealed that many of the confusions between Necessity/Proportionality and Application to the concrete case cannot be resolved without more context. We further observed that, without the context, the predictions of our model are often more convincing than the gold-standard annotations. For instance, in mistakes regarding Legal basis, we would often, without the context, regard the span as Application to the concrete case rather than Legal basis or Necessity/Proportionality.

Detailed analysis of Necessity/Proportionality misclassified as Precedents of the ECHR revealed that our model technically recognized the text correctly as a precedent, but the annotation guidelines specified that within Necessity/Proportionality and Application to the concrete case, occurrences of Precedents of the ECHR are not marked as such. Only small differences exist in the argument text itself, because in both cases the text contains references of the form "see X, §XX"; for example, the excerpt in Fig. 3, "(see Mamchur v. Ukraine, no. 10383/09, §100, 16 July 2015)", was predicted as Precedents of the ECHR despite being annotated as Necessity/Proportionality. We suspect that errors in the other direction also stem from these seeming inconsistencies that confuse our model: the excerpt in Fig. 4 is very close to the previous example, even uses the phrase "fair balance", which is often used in Necessity/Proportionality, and was predicted as such, but is annotated as Precedents of the ECHR.

Fig. 3: Argument misclassified as Precedents of the ECHR

Fig. 4: Argument misclassified as Necessity/Proportionality

We envision several directions of future work to overcome these shortcomings. First, argument mining models could take the entire court decision into account and jointly model all paragraphs at once; this is, however, not trivial with the majority of current transformer models. Second, some adjustments to the annotation guidelines could resolve the very ambiguous cases described above.

Finally, another line of future research could benefit from a Bayesian treatment, where we might directly train an ensemble model using all annotators' data instead of the curated gold standard, utilizing Bayesian sequence combination (Simpson and Gurevych 2019b). This approach has been successfully tested on sequence labeling datasets with only a few labeled instances (Simpson et al. 2020). To what extent this method generalizes to larger tag sets (the case of LAM:ECHR) remains an open research question.

4.5 Generalization to Article 3

Additionally, we are interested in the performance of our model on cases concerning other Articles of the European Convention on Human Rights. We tested our best model, Leg-RoBERTaL-15k, on an additional set of 20 annotated cases dealing with Article 3 (prohibition of torture). As our previous experiments mainly concerned Articles 7 and 8, we expect worse results, because each case starts from its normative text, which is much more straightforward in Article 3 ("No one shall be subjected to torture or to inhuman or degrading treatment or punishment") compared to Article 8 ("1. Everyone has the right to respect for his private and family life, his home and his correspondence..."), and concerns a completely different topic. The results can be seen in Table 8.

Table 8 Argument type F\(_1\)-scores on the Article 3 cases and comparison to the best scores on the original data (test dataset as shown in Table 6)

The Macro F\(_1\)-score is only 31.49, compared to 43.13 on the original test data. Six classes are missing in the Article 3 data, namely the B and I tags of Systematic interpretation, Suitability and Overruling. This also affects the Macro F\(_1\)-score but has no effect on the comparison, because in the original data the F\(_1\)-scores of all these classes are 0. On the Article 3 data, the model only performs better for Application to the concrete case and Decision of the ECHR, but fails completely at predicting Test of the principle of proportionality—Necessity/Proportionality and Institutional arguments—Margin of Appreciation.

Although the drop is not as large as for the argument types, the performance also degraded for the agents (Table 9), with a Macro F\(_1\)-score of 83.15 compared to the original score of 91.36. Here, our model performs worse on every label of the Article 3 data.

Table 9 Agent F\(_1\)-scores on the Article 3 cases and comparison to the best scores on the original data (test dataset as shown in Table 7)

Overall, the transfer of the model to another Article performs significantly worse. This might be because cases concerning different Articles differ significantly with regard to the topic and normative text they cover. For example, Necessity/Proportionality occurs only 93 times in the Article 3 data, whereas it makes up a much larger fraction of the original data with 20,864 occurrences; similarly, Legal basis occurs 151 times compared to 6,818 occurrences.

4.5.1 Limitations

First, given the amount of computational resources needed for running all the above experiments, we had to restrict ourselves to testing only a single random seed. We acknowledge that having at least three to five runs and reporting an average and its standard deviation would give more robust performance estimates. We leave this for future work.

Second, the choice of BIO encoding, and in particular of the subsequent token-level evaluation, is overly pessimistic. Our metric penalizes a mismatch in argument component boundaries even if there is a partial span match (the type of argument is recognized correctly). We refer here to existing literature on argument mining, namely Habernal and Gurevych (2017), who consider several alternative evaluation metrics; such a decision, however, remains inconclusive without a specific downstream task that might or might not cope with partially recognized arguments.

4.6 Case study: predicting case importance using arguments as features

One of the potential use-cases for a collection of ECHR judgments with manually or automatically analyzed arguments is to perform an empirical analysis with respect to the importance of the case. We asked whether there is a potential link between argumentation pattern and the importance of the case as determined by the ECHR.

Each ECHR case has an importance level ranging from one to four, where level 1 is of the most and level 4 of the least importance. They correspond to the following schema:


Level 1: Case Reports. “Judgments, decisions and advisory opinions delivered since the inception of the new Court in 1998 which have been published or selected for publication in the Court’s official Reports of Judgments and Decisions.” These are the key cases of the most importance.


Level 2: High Importance. All judgments, decisions and advisory opinions which make a significant contribution to the development, clarification or modification of its case-law, either generally or in relation to a particular State but are not included in the Case Reports.


Level 3: Medium Importance. Other judgments, decisions and advisory opinions that, even though they do not make a significant contribution, still go beyond merely applying existing case law.


Level 4: Low Importance. “Judgments, decisions and advisory opinions of little legal interest, namely judgments and decisions that simply apply existing case law, friendly settlements and strike outs (unless raising a particular point of interest).”

4.6.1 Model and features

We employ a linear machine learning model, Support Vector Machines (SVMs), which allows us to inspect the predictive power of individual features. The features are listed in Table 10. Since the performance of the SVM highly depends on the hyperparameters used, we employ a grid search.

Table 10 Features for supervised importance prediction

The data is split into 80% for training and 20% for testing. By far the most common are cases of low importance (level 4), followed by those of medium importance (level 3), with the key cases of most importance (level 1) close behind, and then the cases of high importance (level 2).
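A minimal sketch of this setup with scikit-learn; the feature matrix, the scaling step and the exact hyperparameter grid are illustrative assumptions, while the 5-fold cross-validation, the macro F\(_1\) objective and the linear kernel with \(C=10\) match the configuration reported in the next subsection:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrix (one row per case, features as in Table 10)
# and importance levels 1-4.
rng = np.random.default_rng(0)
X = rng.random((200, 20))
y = rng.integers(1, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1, 10, 100]}
grid = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                    param_grid, scoring="f1_macro", cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```

With a linear kernel, the fitted SVC exposes per-class-pair feature weights via its coef_ attribute (e.g., `grid.best_estimator_.named_steps["svc"].coef_`), which is one way to read off feature importances such as those contrasting level 1 and level 4 in Fig. 5.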

4.6.2 Analysis

The best performance in our grid search, a Macro F\(_1\)-score of 0.45 under 5-fold cross-validation, was achieved with the linear kernel and \(C=10\). The performance of this classifier on the test set is reported in Table 11. Its performance is rather poor, with a Macro F\(_1\)-score of 0.38. It has difficulties predicting cases of medium importance (level 3; \(F_1=0.2\)) and key cases (level 1; \(F_1=0.25\)), but has a good recall for cases of low importance (level 4; \(R=0.81\)) and a decent precision for cases of high importance (level 2; \(P=0.67\)). It performs better than the majority baseline (\(F_1=0.15\)) and a random baseline (\(F_1=0.32\)). We originally expected better results, so we further investigate these results, the data, and our initial hypotheses.

Table 11 Test-set results of predicting case importance using argument-based features

We originally hypothesized that important cases contain more arguments, are longer and more detailed, and also contain rarer argument types, whereas cases of lesser importance contain fewer arguments, are shorter, and contain more arguments of the types Precedents of the ECHR and Application to the concrete case.

Fig. 5: Feature importance of the classifier for importance level 1 versus level 4

By examining the importance of individual features of our classifier (see Fig. 5), we can see that especially the fraction of the Commission/Chamber agent is the most indicative feature when predicting key cases versus cases of low importance. Similarly important are the number of arguments and the average argument length. Furthermore, the fraction of Decision of the ECHR is always among the top five features indicative of the class of lower importance, meaning that in each comparison the lower importance level has a higher fraction of ECHR Decision arguments.

Finally, we look into the average value of each feature over all cases (Table 12) to analyze whether our hypotheses hold. Indeed, the more important a case is, the longer its document tends to be on average. But we also have to keep in mind that there are a few cases with a length of up to 80,000 words that skew the average document length of importance level 1. When accounting for those by dropping cases over 40,000 words long, the average length for importance level 1 is still 14,460. Surprisingly, importance level 4 is also affected and drops to 7,947. Key cases of level 1 also contain more arguments on average.

Furthermore, the more important the case, the longer the average argument length. Key cases of importance level 1 also contain more rare arguments like Overruling, Distinguishing, Margin of Appreciation and Systematic interpretation. But surprisingly, no clear trend for Precedents of the ECHR and Application to the concrete case can be seen, so our hypothesis that they should be more common in less important cases does not seem to hold. Additionally, the fraction of Commission/Chamber is generally higher at higher importance levels, except when comparing level 1 and level 2. However, even though we observed some general trends, it is not possible to reliably distinguish the importance of a case using the argument-based features alone.

Table 12 Average value over all cases of each feature and each importance level

5 Conclusion

We designed a new annotation scheme for legal arguments in court decisions of the ECHR. Using this scheme, we annotated a large dataset of cases which we make available to the research community under an open license. We built a robust argument mining model that can cope with the particular challenges of the legal text in ECHR decisions. We have shown that the larger, more powerful general-purpose model RoBERTa-Large can compete with the smaller Legal-BERT, which is adapted to the general legal domain.

This motivated us to examine domain adaptation approaches, namely further pretraining of an existing language model on European legal data. This led to significant performance improvements of up to 11% for the argument type prediction.

We evaluated our best model in detail and found that, in general, the model is able to predict classes quite well if sufficient support exists in the dataset, whereas it failed to predict the really rare classes with hardly any support. There were some exceptions to this general trend, for which we thoroughly analyzed the errors in cooperation with legal experts. In these cases, we found that there is often not enough context available for predicting an argument; yet even without more context, our model's predictions often seemed more convincing than the gold annotations.

Furthermore, we examined the model's ability to generalize to decisions concerning a different ECHR Article. We observed a performance decrease of up to 27%, indicating that decisions concerning different Articles can differ significantly.

Finally, we utilized the arguments of a decision for predicting its importance level. To this end, we built an SVM model which allowed for introspection of the feature importance. This model was unable to discriminate well, and by looking further into it, we saw that, despite slight tendencies for some features, most ECHR decisions have similar feature values. We conclude from this that the ECHR works diligently and holds itself to the same high standards regardless of the importance level of a case.