1 Introduction

Contests are common in real-life settings, such as labour markets, industrial economics, sports, and public choice. In the majority of cases, ability levels vary across contestants, and this heterogeneity has adverse effects on contestants’ willingness to invest resources, often referred to as the ‘discouragement effect’ (e.g. Konrad, 2009).Footnote 1 In a dynamic (tournament) setting with multiple stages and asymmetric abilities, a contestant’s investment in a current stage can also be affected by the strength of the opponents in future stages.

In US presidential primaries, for example, a candidate may ‘skip’ a state by reducing campaign expenditures there because it offers a low probability of victory, and instead save and reallocate those resources to subsequent efforts in states where the outcome of the election is less certain. More generally, disadvantaged subjects tend to drop out of competitions to save effort. In team sports, coaches are more likely to spare top players in very unbalanced matches to avoid injuries and reduce (mental) fatigue for the upcoming game, provided that the upcoming game is anticipated to be a close contest. Such occurrences have recently been discussed in the media as ‘load management’.

We thus hypothesise that contestants self-restrict in the current stage of a dynamic battle as long as the gap in abilities at this stage is sufficiently large. This hypothesis corresponds to the discouragement effect. We also predict a spillover effect, such that contestants tend to self-restrict in the current stage when heterogeneity in the subsequent contest (upcoming stage) is sufficiently small.Footnote 2

To establish empirical evidence, we use data from the German Bundesliga, a professional association soccer league with a double round-robin tournament structure.Footnote 3 Professional soccer leagues offer a formalised structure and full information about the number of rounds and opponents at the beginning of the tournament; they also entail contests with high incentives. To test our predictions, we leverage a unique rule stating that a player who accumulates a critical number of yellow cards (also called ‘bookings’) will be suspended for the next match. Teams thus may self-restrict by not fielding a player threatened by a ban or by strategically provoking a booking to obtain a suspension for the subsequent match.

In line with our predictions, we find that teams systematically self-restrict in round t if the difference in ability, or heterogeneity, in t is sufficiently large. Results regarding the effect of heterogeneity in t + 1 on the decision to hold back resources in t are somewhat mixed, but they provide strong evidence of anticipating behaviour.

This article thus contributes to three strands of literature. First, it relates to work on the impact of heterogeneity in contests in general, starting with Rosen (1986), who clearly suggests that unbalanced match-ups result in lower effort levels. Theoretical work strongly supports this discouragement effect, including contributions by Baik (1994), Stein (2002), Szymanski (2003), or Konrad (2009). Intuitively, underdogs lower their willingness to invest when winning becomes too costly, whereas favourites respond by lowering their investment when the outcome appears predefined. Empirical evidence obtained in laboratory settings (e.g., Dechenaux et al., 2015; Hart et al., 2015) and from the field (e.g., Ehrenberg & Bognanno, 1990; Iqbal and Krumer, 2019; Sunde, 2009) similarly affirms these discouragement effects. We therefore expect that the investment of resources depends critically on heterogeneity in the competition.

Second, this paper contributes to a growing body of research on anticipating behaviour and inter-temporal effort provision in multi-period contests with fixed prize structures, for which Szentes and Rosenthal (2003) and Konrad and Kovenock (2009) provide theoretical frameworks. Laboratory setups show that participants in multi-stage contests exert effort in sub-optimal ways (e.g., Altmann et al., 2012; Mago and Sheremeta, 2017).

Third, and most directly, our study also relates to ‘spillover’ or ‘carryover’ effects in tournaments with heterogeneous players. While there is much evidence suggesting strategic behaviour in sport contests in general (e.g., Genakos & Pagliero, 2012; Malueg & Yates, 2010; Taylor & Trogdon, 2002), three papers explicitly study the effect of past and future opponents on players’ current effort levels. In their pioneering work, Harbaugh and Klumpp (2005) consider budget constraints in a tournament model and show that both underdogs and favourites have distinct incentives to reserve resources for upcoming battles; the underdogs benefit more from investing more of their resources in initial stages. Most existing field evidence comes from sport settings, where actions and performance are observable for a schedule (or tournament tree) that is known in advance. For instance, Harbaugh and Klumpp (2005) show that the introduction of a ‘rest day’ improves the performance of the favourites in NCAA basketball, indicating a shift in incentives to conserve resources. However, direct evidence of such strategic behaviour is missing. Brown and Minor (2014), using data from top-level tennis tournaments, find that the probability that the favourite wins the current stage decreases with the strength of an expected future opponent. They argue that taking the competitor’s ability in the next round into account changes the favourite’s valuation of the tournament and hence optimal effort provision.Footnote 4 Lackner et al. (2015) also find that the intensity of play (measured by personal fouls) increases when the expected relative ability of the next-stage contestant decreases in NBA and NCAA playoffs. Similar to Brown and Minor (2014), their approach builds on the idea that future opponents affect the probability of winning the tournament, or rather the continuation value, and thus the incentives to exert effort in the current stage.

The main novelty of our empirical approach is that we provide direct evidence of strategic action for achieving an optimal allocation of resources across the tournament. Existing articles use observable characteristics like scores (Brown & Minor, 2014) or personal fouls (Lackner et al., 2015) as discrete proxies for unobservable effort provision, which is a continuous decision. In contrast, we consider a discrete decision that requires self-restriction (in the line-up or by deliberately picking up a fifth yellow card), which is directly observable.

This article proceeds as follows: In Sect. 2, we explain how self-restriction works in the case of soccer and derive the main hypotheses. We empirically test them with field data in Sect. 3. Finally, Sect. 4 concludes.

2 Self-restriction in soccer

European soccer leagues are commonly designed as double round-robin tournaments, where all teams are matched twice in pairwise contests. For the final table, teams are ranked according to the number of points won, and prizes and rewards are distributed according to this ranking. Teams draw on one specific resource to accomplish their seasonal goals: players under contract. These players are not of unlimited availability, though, due to specific rules that sanction illegitimate behaviour.Footnote 5 In particular, a decision to self-restrict in the German Bundesliga emerges from the ‘yellow card rule’, which mandates that a player who has received the fifth/tenth/fifteenthFootnote 6 yellow card in a match t is suspended for match t + 1.Footnote 7,Footnote 8

A team can anticipate that a player who has accumulated four (or nine) yellow cards up to match t (hereafter referred to as a yellowplayer) will not be available soonFootnote 9, so it has some leeway to decide which match t + n such a player will (not) miss. Since yellowplayers typically are those fielded very frequentlyFootnote 10 and hence are vital for their teams, we hypothesise that yellowplayers are not suspended ‘randomly’ but that teams strategically decide which match a yellowplayer will (not) miss, depending on current and future opponents. The league’s schedule is publicly known in advance, so a team’s decision to restrict its resources in match t or t + 1 crucially depends on the expected heterogeneity of these matches.
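The rule and the yellowplayer definition can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the threshold tuple are assumptions chosen to match the thresholds stated above (fifth, tenth, and fifteenth card), and the sketch ignores second yellow cards within the same match.

```python
# Hypothetical helper names; thresholds follow the rule described in the text.
SUSPENSION_THRESHOLDS = (5, 10, 15)


def is_yellowplayer(cards_before_match: int) -> bool:
    """True if one more yellow card would trigger a one-match suspension."""
    return (cards_before_match + 1) in SUSPENSION_THRESHOLDS


def suspended_next_match(cards_before_match: int, booked_in_match: bool) -> bool:
    """True if a booking in match t brings the season total to a threshold."""
    total = cards_before_match + (1 if booked_in_match else 0)
    return booked_in_match and total in SUSPENSION_THRESHOLDS
```

Under these assumptions, a player with four or nine cards before match t is flagged as a yellowplayer, and a booking in t implies a suspension in t + 1.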

The strategic use of the yellow card rule can be exemplified by the case of two players from the Werder Bremen team who, in a March 2016 hearing, were accused of intentionally picking up yellow cards. Their team, struggling against relegation at the end of the season, had just claimed an important home victory against its direct competitor, Hannover 96. In the final phase of this match, the two midfielders were booked for ‘tactical’ fouls. Since these were their fifth and tenth yellow cards of the season, it was assumed that the players, one of whom was the team captain, provoked the bookings in the hope of avoiding the subsequent game against the league’s dominant leader Bayern Munich (which Werder Bremen ultimately lost by a crushing 5–0 score). After serving the one-game suspension, the two players would be available for the following ‘more important’ and more winnable games. They later admitted that their plan had been arranged in advance.Footnote 11

This example offers three key takeaways. First, teams have incentives to let players at risk of a ban deliberately pick up a critical yellow card and take a break when their resources are needed less. Such a scenario likely arises if an upcoming match t + 1 is sufficiently heterogeneous, such that the outcome is sufficiently certain.Footnote 12 Second, players at risk of picking up a one-game suspension due to a critical number of yellow cards are fielded in match t if both teams have similar chances of winning the game, even though their probability of being unavailable in match t + 1 thus increases. Third, a ‘strategic’ yellow card also might be more likely near the end of the game, to minimise the risk that the player receives yet another yellow card within the same match. A second yellow card (or yellow–red card) results in an immediate sending-off and a suspension for match t + 1 without ‘resetting’ the number of yellow cards.Footnote 13

In conclusion, because soccer players are central resources for accomplishing a team’s goals, we expect teams to use their resources strategically. Players are not infinitely available—due to the yellow card rule—so teams must decide when to restrict themselves. According to our strategic considerations, a team’s decision for or against self-restriction crucially depends on the heterogeneity of competition at time t and t+1. Thus, we expect two hypotheses to hold:

  (1) Teams self-restrict in t if the heterogeneity in match t is sufficiently strong.

  (2) Teams self-restrict in t if the heterogeneity in match \(t+1\) is sufficiently low.

While these hypotheses are based on selective observations, they also result from a theoretical model of a stylised round-robin tournament, which we present in the Appendix.

3 Empirical analysis

The empirical analysis uses a rich data set from men’s German top-level (Bundesliga) soccer, which comprises detailed information for all players on a game-by-game level and covers five seasons from 2011/12 to 2015/16. Each season is organised as a double round-robin tournament among 18 teams, so there are 34 games per team and season. The order of games is publicly known in advance of the season.

Our empirical approach is twofold. First, we focus on the starting line-up and examine the impact of heterogeneity of match t and t + 1 on the decision to field a yellowplayer or not. Second, we focus on yellow cards and investigate whether the incidence of receiving a fifth/tenth yellow card is related to the heterogeneity of match t or t + 1.

3.1 Measuring heterogeneity

Our analysis focuses on the impact of heterogeneity (in match t and \(t+1\)) on a team’s decisions to conserve resources in t and \(t+1\). We measure the heterogeneity of match t using players’ market values. The data stem from the website www.transfermarkt.de. We calculate the average market value of each team at the start of each season, considering all players listed in the squad.Footnote 14

Previous research has shown that market values have a high predictive power for outcomes of soccer games (e.g., Peeters, 2018). As average market values increase over the course of time, we use relative market values, that is we divide a team’s average market value by the seasonal average of all teams. The heterogeneity of match t then is calculated as the (absolute) difference between the team’s and the opponent’s relative market values.
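The construction of the heterogeneity measure can be illustrated with a short Python sketch. The function names and the three-team example values are made up for illustration and are not taken from the data; only the two steps (normalising by the seasonal average, then taking the absolute difference) follow the description above.

```python
def relative_values(avg_market_values: dict[str, float]) -> dict[str, float]:
    """Divide each team's average market value by the seasonal mean across teams."""
    season_mean = sum(avg_market_values.values()) / len(avg_market_values)
    return {team: value / season_mean for team, value in avg_market_values.items()}


def heterogeneity(rel: dict[str, float], team: str, opponent: str) -> float:
    """Absolute difference between the two teams' relative market values."""
    return abs(rel[team] - rel[opponent])


# Made-up average market values (in million euros) for a three-team season.
values = {"A": 12.0, "B": 6.0, "C": 3.0}
rel = relative_values(values)
het_ab = heterogeneity(rel, "A", "B")  # |12/7 - 6/7| = 6/7
```

By construction the measure is symmetric, so both teams in a match are assigned the same heterogeneity value.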

Betting odds are another established measure of heterogeneityFootnote 15 in sport contests (e.g., Bartling et al., 2015; Deutscher et al., 2013; Sunde, 2009) and have proven to be an efficient forecasting instrument (see, e.g., Forrest et al., 2005; Groll et al., 2015; Spann & Skiera, 2009). However, in our setting, betting odds have the disadvantage of being available only a few days prior to a match and thus only one game in advance. Consequently, the betting odds for match t + 1 may incorporate information about match t that is not available prior to match t, such as injuries or expulsions of certain players in match t. Therefore, closing odds for match t + 1 may differ from the expected heterogeneity of match t + 1 prior to match t. For this reason, we prefer market values as a more constant measure of heterogeneity between teams, acknowledging that there might be some inaccuracies due to time lags (Massey & Thaler, 2013). The fact that both heterogeneity measures are closely correlated (r = 0.715) suggests that potential biases are rather small.Footnote 16 The results from regressions in which we calculate our heterogeneity measure from betting oddsFootnote 17 are qualitatively similar (see Appendix 3).

3.2 The starting line-up decision

In this section, we focus on a team’s starting line-up and its decision to field a yellowplayer or not. We hypothesise that this decision crucially depends on the expected heterogeneity of matches t and t + 1. We consider a team’s decision to not field such a player in match t as a case of self-restriction (see Fig. 1).

Fig. 1 Timing of events (starting line-up decision)

According to Hypothesis 1, we expect yellowplayers not to be fielded in t if the heterogeneity of game t is sufficiently large. In other words, teams may self-restrict and conserve restricted resources in match t if they are sufficiently sure they will win or lose this match. According to Hypothesis 2, players threatened by a suspension may be more (less) likely to be fielded in t if the subsequent match t + 1 is sufficiently unbalanced (balanced), because in this case, a potential suspension of such a player in match t + 1, caused by a fifth yellow card in t, would be less (more) harmful to the team.

We evaluate roster data for each team and match, which refer to information about a team’s entire squad (i.e., starting players plus substitutes) on a particular matchday. Prior to each match, teams can nominate 18 players: 11 players for the starting line-up and up to 7 substitutes.Footnote 18 A team’s squad is always announced shortly before a game starts. After excluding the first five roundsFootnote 19 and the final round of each seasonFootnote 20, goalkeepersFootnote 21, players who were never a starter in the respective time period, and observations with missing values, we are left with 37,673 player-game observations (769 players and 1260 matches).
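The sample restrictions can be sketched as a simple filter. This is a simplified illustration under stated assumptions: the record type `Obs`, its field names, and the position label "GK" are hypothetical, the sketch covers a single season, and it does not handle missing values.

```python
from dataclasses import dataclass


@dataclass
class Obs:
    player: str
    matchday: int   # 1..34 within one season
    position: str   # "GK" marks goalkeepers (label is an assumption)
    started: bool


def restrict_sample(rows: list[Obs]) -> list[Obs]:
    """Keep matchdays 6-33, drop goalkeepers and players who never started."""
    in_window = [r for r in rows if 6 <= r.matchday <= 33 and r.position != "GK"]
    ever_started = {r.player for r in in_window if r.started}
    return [r for r in in_window if r.player in ever_started]
```

Keeping matchdays 6 to 33 leaves 28 rounds with 9 matches each per season, which matches the 1260 matches reported for five seasons.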

Table 1 Yellowplayer-observations per matchday

Table 1 shows that the number of yellowplayers increases over the course of a season: The share of players at risk of a ban amounts to 3.5% in the first half of the season (matchdays 6–17) and 12% in the second half (matchdays 18–33).

We estimate a logit model to investigate the impact of heterogeneity in games t and \(t+1\) on a team’s decision to field a yellowplayer:

$$\begin{aligned} {\text {starting}}11_{i,t}&=\beta _0 + \beta _1 {\text {yellowplayer}}_{i,t} + \beta _2 {\text {Het}}_{i,t} + \beta _3 {\text {Het}}_{i,t+1} + \beta _4 {\text {yellowplayer}}_{i,t} \cdot {\text {Het}}_{i,t} \\&\quad + \beta _5 {\text {yellowplayer}}_{i,t}\cdot {\text {Het}}_{i,t+1} + \gamma '{} \mathbf{X} + \varepsilon _{i,t}. \end{aligned} \tag{1}$$

The dependent variable, \({\text {starting}}11_{i,t}\), is a binary outcome measure that takes a value of 1 if player i starts in match t and 0 otherwise, meaning this player is left on the bench.Footnote 22 Our main variable of interest is the binary variable yellowplayer, such that yellowplayer = 1 indicates a player with a critical number of yellow cards (4, 9, or 14) prior to match t, and yellowplayer = 0 otherwise.

To isolate the effects of heterogeneity on the decision to choose a yellowplayer for the starting eleven (captured by \(\beta _4\) and \(\beta _5\)), we need to control for other factors that could affect the dependent variable. Therefore, \(\mathbf{X}\) is a vector of player- and game-specific control variables, including a proxy for a player’s importance to a team (% of minutes played), a player’s age and position, the team size (roster), game attendance (attendance), a dummy variable signalling regional rivalry games (derby), a dummy indicating whether the match is played away (away), a variable indicating the overall quality of both teams represented by the sum of the teams’ rankings prior to a matchday (quality), referee dummies, coach dummies, and data about the matchday (matchday).Footnote 23 We measure a player’s importance by the percentage of minutes that player is on the field in matches prior to match t. The more minutes he plays prior to match t, the more important he is for a team, and, hence, the more likely it is that he is in the starting line-up in match t. Moreover, in the specifications with player fixed effects, \(\alpha _{i}\) controls for unobserved player-specific effects. Finally, \(\varepsilon _{i,t}\) is the error term that captures all other unobserved factors that affect starting11.
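To see how the interaction terms enter the model, the following sketch evaluates the logistic probability implied by the linear index of Eq. (1). The coefficient values are purely illustrative and are not estimates from this paper; the function name and the `controls_index` argument (standing in for \(\gamma '\mathbf{X}\)) are assumptions.

```python
import math


def start_probability(yellowplayer: int, het_t: float, het_t1: float,
                      beta: tuple[float, float, float, float, float, float],
                      controls_index: float = 0.0) -> float:
    """Logistic probability of starting, given the linear index of Eq. (1)."""
    b0, b1, b2, b3, b4, b5 = beta
    index = (b0 + b1 * yellowplayer + b2 * het_t + b3 * het_t1
             + b4 * yellowplayer * het_t + b5 * yellowplayer * het_t1
             + controls_index)
    return 1.0 / (1.0 + math.exp(-index))


# Purely illustrative coefficients, NOT estimates from the paper.
beta = (0.5, -0.1, 0.2, 0.0, -0.6, 0.0)
p_balanced = start_probability(1, het_t=0.1, het_t1=0.5, beta=beta)
p_unbalanced = start_probability(1, het_t=1.0, het_t1=0.5, beta=beta)
```

For a yellowplayer, the index changes with Het\(_{t}\) at rate \(\beta _2 + \beta _4\), so a negative \(\beta _4\) that outweighs \(\beta _2\) lowers the starting probability as the current match becomes more unbalanced.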

Table 2 contains the descriptive statistics for the main variables. Note that yellowplayer = 1 applies to a total of 3,277 observations for 377 different players.

Table 2 Descriptive statistics—starting11

Table 3 presents the estimation results from our preferred model defined in Eq. (1) (columns (1)–(3)). As a robustness check, we add player fixed effects to the regression model (columns (4)–(6)). Yet, this means that the coefficient estimates solely depend on the variation of the dependent and independent variables within players. So, given the rather small number of ‘yellowplayer situations’ per player along with variations of \({\text {Het}}_{t}\) and \({\text {Het}}_{t+1}\), this approach should be seen as a second-best option.

Our results indicate three main conclusions. First, the percentage of minutes played prior to t increases the probability of being fielded in t. This finding is not surprising; the variable % of minutes played was designed explicitly to proxy for the player’s importance to the team. Second, the estimated coefficient of the first interaction term (yellowplayer * Het\(_{t}\)), \(\hat{\beta }_4\), is negative and differs significantly from zero at the 5% (columns (1) and (2)) and 10% significance level (columns (3)–(6)). That is, the tendency to choose a yellowplayer as a starter decreases with the heterogeneity of the current competition. In other words, the player is protected from the risk of suspension when the match appears to be decided in advance. We take this result as initial, though not strong, evidence in favour of our Hypothesis 1. To ease interpretation, Table 7 in the Appendix presents OLS estimates with standardised heterogeneity measures (z-score). A one standard deviation increase of Het\(_{t}\) is associated with a decrease in the probability of being in the starting eleven of 1.6% (2.5% when evaluated at the sample mean) for yellowplayers. As a complementary effect, players not at risk of a yellow-card suspension are more likely to start in match t when this match is more heterogeneous (Het\(_{t}\), columns (1)–(3)). Taken together, these findings can easily be linked with the discouragement effect, as we discuss in Sect. 4.

Table 3 Self-restriction and the starting line-up decision–logit regressions

Third, we do not find similar effects for the next round, which means that the heterogeneity of the upcoming match \(t+1\) does not affect a team’s decision (rows five and six). We refrain from taking this result as refuting Hypothesis 2, though, because assuming that teams put a player in the starting line-up in t just to provoke a ban in t + 1 would probably undervalue the importance of match t. Consequently, we proceed with a more subtle approach to study anticipating behaviour and self-restriction in round \(t+1\) in the next section.

3.3 The decision to receive yellow cards

In the second part of our analysis, we seek to provide evidence of strategic self-restriction in a setting that demands anticipating behaviour. In detail, we examine whether heterogeneity in upcoming rounds influences the probability of receiving a critical yellow card followed by a suspension.Footnote 24 In this context, we have to separate the decision to self-restrict from being self-restricted, because they refer to different points in time: Being self-restricted in t + 1 results from the decision to receive a fifth yellow card in match t (see Fig. 2). Hence, Hypothesis 1 states that teams are more likely to be self-restricted in match t + 1 (due to a fifth/tenth yellow card in t) if the heterogeneity in match t + 1 is sufficiently strong. Hypothesis 2 implies that self-restriction in match t + 1 is less likely if the heterogeneity in match t + 2 is sufficiently strong. Intuitively, this means that a team tends to self-restrict in t + 1 when it is important to have full resources in t + 2 as this match is expected to be tight.

Fig. 2 Timing of events (decision to deliberately receive a fifth yellow card)

To test our hypotheses, it is necessary to further restrict our sample. First, we exclude players without any playing time and players who were never ‘booked’. Second, as we now rely on information from the match after the next, we also have to discard those observations where this information is not available. The final sample covers 29,555 observations.

Table 4 Descriptive statistics—yellow card

The dependent variable (yellow) is a dummy that indicates whether player i has received a yellow card in match t. We suggest the following empirical model:

$$\begin{aligned} {\text {yellow}}_{i,t}=&\beta _0 + \beta _1 {\text {yellowplayer}}_{i,t} + \beta _2 {\text {Het}}_{i,t} + \beta _3 {\text {Het}}_{i,t+1} + \beta _4 {\text {Het}}_{i,t+2} \\&+ \beta _5 {\text {yellowplayer}}_{i,t} \cdot {\text {Het}}_{i,t+1} + \beta _6 {\text {yellowplayer}}_{i,t}\cdot {\text {Het}}_{i,t+2} + \gamma '{} \mathbf{X} + \varepsilon _{i,t}, \end{aligned} \tag{2}$$

which closely resembles Eq. (1) in Sect. 3.2 but also includes round t + 2.Footnote 25 Table 4 presents the descriptive statistics.

Table 5 The effect of heterogeneity on yellow card suspensions—logit regression

The results from the logit regressions are presented in Table 5. Again, as a robustness check, player fixed effects were added to our preferred model (columns (5)–(8)). Our main finding is that the estimated \(\beta _5\) differs significantly from zero, meaning that the probability of reaching the limit of five yellow cards increases with the heterogeneity of the next round’s competition (yellowplayer * Het\(_{t+1}\)), in accordance with our Hypothesis 1. The result points to anticipating behaviour among competitors in a multi-stage tournament setting. Moreover, we find significant effects of the heterogeneity of the match after the next one on a yellowplayer’s probability of being booked (yellowplayer * Het\(_{t+2}\)). The likelihood of being suspended for the next match decreases with increasing heterogeneity of the match after next, which is in line with our Hypothesis 2.

Again, we demonstrate economic significance using OLS regressions and standardised values of Het\(_{t}\), Het\(_{t+1}\) and Het\(_{t+2}\) in Table 11 in Appendix 3. A one standard deviation increase of Het\(_{t+1}\) (Het\(_{t+2}\)) increases (decreases) the likelihood of getting a ‘booking’ for yellowplayers by around 1.5 percentage points (or 11% when evaluated at the sample mean).

Despite our set of controls, these findings do not constitute direct proof of strategic behaviour. Yet, if players provoke yellow card suspensions on purpose, we expect them to do so at the end of a game to minimise the risk of a second yellow card within the same game. We observe exactly this pattern: when we split the sample according to playing time remaining (columns (2) to (4) and (6) to (8)), the estimated coefficient of the interaction term (fifth row) increases toward the end of a 90-minute game. In addition, we estimate a simple model of substitution where substituted is a binary variable equal to one if a player is substituted, and zero otherwise. The sample is restricted to yellowplayers only. Results suggest that the likelihood of being substituted after getting a fifth yellow card increases significantly in the heterogeneity of the next match (see Table 13 in Appendix 3).

We take both findings as evidence in favour of deliberate and strategic behaviour.

4 Conclusion and implications

This article provides evidence of strategic investment decisions in anticipation of the future need for resources. This empirical analysis of German Bundesliga soccer indicates that anticipated heterogeneity in contests decreases the willingness to invest resources. Players therefore self-restrict in current and future competition when the heterogeneity between contestants is sufficiently large. Players at risk of a ban tend to be excluded from the starting eleven when the contest is unbalanced. Furthermore, players are more likely to receive a crucial yellow card that triggers a ban if the subsequent contest (which they will miss due to the ban) is sufficiently unbalanced. Finally, players also tend to receive a crucial yellow card if the second-to-next game (for which they return from their ban) is rather homogeneous.

These findings reflect a discouragement effect such that effort provision in contests falls short of what might be expected—considering the prize at stake—due to lopsided competition. We demonstrate that the decision to self-restrict is affected not only by the strength of the current opponent but also by the anticipated strength of the future competitor in the next stage of the tournament. Thus, our study not only expands the literature on ‘spillover’ or ‘carryover’ effects in tournaments with heterogeneous players, but also provides direct evidence for strategic behaviour in the field.

Our results have important policy implications for soccer itself but also for other contests like promotion tournaments or political campaigns. In multi-stage contests with multiple contestants, heterogeneity is detrimental to mediocre contestants. The best contestants can save resources against the weakest contestants without substantially lowering their chances of winning. Correspondingly, the weakest contestants save their resources in matches when playing the strongest opponents, because their ex ante chances of winning are very low. Abilities, and thus winning probabilities, are more balanced against mediocre teams, such that neither the strongest nor the weakest contestants conserve resources and instead meet them at full strength. Mediocre contestants cannot afford to self-restrict in any contest without risking a negative impact on the outcome. Thus, the structure of the contest and the possibility of self-restricting for the next competition create a disadvantage for mediocre teams and have a stabilising effect on the top-level hierarchy. The schedule itself could be disadvantageous to some competitors too, especially if they face an opponent that most recently played in a heterogeneous match-up (Krumer and Lechner, 2018). For the case of soccer (or any other sport played in a league system), the findings are also critically important. The prize distribution is typically top-heavy, with the very top performers receiving disproportionately large parts of the prize at stake. The findings presented in this paper clearly indicate a disadvantage to mediocre teams and result in an entry barrier to join the top level and receive the highest awards.

Tournament organisers could counteract these effects by making self-restriction advantages less pronounced. First, in the case of soccer, they could change the rules, such that a crucial fifth yellow card would lead to a randomly drawn ban some time in the next five matches. Then the strategic element of self-restriction would diminish, because contestants would not know in advance whether they would lose their valuable resources at exactly the moment they do not need them. In the same way, replacing yellow cards with time penalties as in ice hockey, where the offending player is sent off for a set number of minutes, would prevent self-restricting behaviour. Second, schedule imbalances can be lowered if the round-robin format repeats multiple times (as in the Bundesliga, with two rounds of round-robin). Rearranging the sequence of games after each round then would prevent teams from facing opponents that profit from self-restriction because their last match was unbalanced.
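The first proposal, a randomly drawn ban within the next five matches, could be implemented along the following lines. This is only a sketch: the function name, the uniform draw, and the handling of the end of the season are assumptions, since the text does not specify how the draw would work.

```python
import random
from typing import Optional


def draw_ban_matchday(booking_matchday: int, window: int = 5,
                      last_matchday: int = 34,
                      rng: Optional[random.Random] = None) -> Optional[int]:
    """Pick uniformly at random which of the next `window` matches is missed."""
    if rng is None:
        rng = random.Random()
    # Candidate matchdays after the booking, truncated at the end of the season.
    candidates = [m for m in range(booking_matchday + 1,
                                   booking_matchday + window + 1)
                  if m <= last_matchday]
    return rng.choice(candidates) if candidates else None
```

Because the missed matchday is unknown when the card is received, a team could no longer time a booking so that the ban falls on a specific unbalanced match.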