1 Introduction

Settings characterized by high uncertainty on outcomes are likely to impair fully rational decisions, which often renders learning from previous attempts unfruitful. As a consequence, this process may lead to an inefficient allocation of efforts in a trial-and-error search regime, which can easily result in losses (Dosi et al. 2001). A prominent example of such a setting is the pharmaceutical industry (Pammolli et al. 2011), where the races for drug discovery lead to enormous investments by companies that are not always fruitful and may induce companies to abstain from investing.Footnote 1 Similar industries, characterized by a high investment in research and development (R&D), may vary in the structure of earnings: if the competition is based on incremental innovation, then the successful agents manage to capture a larger market share (Breitmoser et al. 2010). In other cases, the successful competitor gains the whole market similar to a lottery (Sutton 1998), e.g., by patenting new first-in-class drugs. The aim of this study is to identify, by means of experiments, how the payoff structure and the level of competition affect the learning and participation dynamics of agents.

Our candidate setting is the Tullock rent-seeking contest in which subjects compete for a single prize, whose assignment probability depends on the relative share of subjects’ efforts (Tullock 1980). In rent-seeking contests, subjects persistently deviate from what standard game theoretical models predict. A survey of the experimental contest literature by Dechenaux et al. (2015) has highlighted that contestants spend on average considerably more than the theoretical equilibrium.Footnote 2 Most studies find an overall decrease in expenditures when subjects repeatedly play this game (e.g., Cason et al. 2012), which is usually attributed to learning, without further specifying the process behind it. Another empirical regularity that has not yet received much attention is that many participants choose not to spend any resources to win the contest. By looking across a subset of experimental studies on winner-take-all contests (Abbink et al. 2010; Cason et al. 2012; Mago et al. 2016; Sheremeta 2010; Sheremeta and Zhang 2010; Price and Sheremeta 2011; Sheremeta 2011), we find that zero expenditures are indeed a frequent, sometimes modal, choice of participants. The fraction of zeros is higher in larger competing groups and persistent even in later stages of the experiments. If both stylized facts are the consequence of learning, then this process deserves closer investigation, which forms the main contribution of this paper.

To explore how the uncertainty on outcomes and the number of opponents affect learning strategies and behavioral patterns, such as zero expenditures, in contest settings over a long time horizon, we set up a laboratory experiment in which we compare, over 60 periods, participants’ expenditures choices in the standard winner-take-all (WTA) Tullock contest versus a non-probabilistic equivalent proportional-prize (PP) contest.

The WTA contest allows only one winner of the prize, whose winning probability is proportional to the share of own investments over the total group investments. Applications range from the seminal rent-seeking hypothesis by Tullock (1980), to political polls (Snyder 1989), sport tournaments (Szymanski 2003), patent races (Fudenberg et al. 1983; Harris and Vickers 1985) and cryptocurrency mining (Dimitri 2017). In the deterministic PP contest, contestants receive a fraction of the prize proportional to their share of total group investments. The PP contest provides a ‘replication’ of standard oligopoly settings by varying the payoff structure. The early work by Friedman (1958) constitutes a first attempt to use the PP contest to model the allocation of advertisement budget across media. Proportional-prize assignments are also observed in electoral schemes (Schram and Sonnemans 1996), lobbying (Krueger 1974) and labor compensation (Kruse 1992). Under the assumption of risk neutrality, both contest settings are equivalent in terms of equilibrium predictions. Varying the contest type and the group size of three and five contestants, we create a 2 \(\times \) 2 experimental design, which allows us to test how styles of learning, and consequently contestant behavior, change in environments characterized by uncertainty over outcomes compared to those with a tighter link between effort and outcome.

We find that the average levels of effort in PP contests are well described by the standard game theoretical predictions. Conversely, in the WTA contests we are unable to distinguish total group expenditures between the two group sizes. The decline of average expenditures in the five-player WTA contests coincides with a significant increase in what we label ‘dropouts’, i.e., zero expenditures that are not justified either by the myopic best response or weighted fictitious play. Dropouts are instead significantly less frequent in PP contests and, if anything, decrease over time. The distinct expenditure patterns found between the two settings suggest that differences in the contests’ payoff structures affect subjects’ learning process. As our main contribution to the literature on learning in games, we test this hypothesis by estimating the experience-weighted attraction model (EWA, Camerer and Ho 1999). The results reveal that WTA contestants learn significantly more from their own past payoffs than players in the PP contests (experiential or reinforcement learning; Roth and Erev 1995). Moreover, our results support recent findings by Alós-Ferrer and Ritschel (2018) on subjects’ frequent use of the reinforcement heuristic ‘win–stay, lose–shift’ rather than a more reasoned approach based on myopic best response. Therefore, the strong reliance on experiential learning in WTA settings can explain both the decreasing expenditures over time and the increasing propensity of zero expenditure choices. The more often a WTA participant loses, the more she is discouraged from positive expenditures, up to the point of non-participation. Further analyses confirm that expenditures decline significantly with an increase in prior accumulated losses. Since experienced losses are more frequent in larger competing groups, expenditures are expected to decrease at a faster rate.
These results carry practical relevance for contest designers, who wish to maximize repeated participation, and are comparable to empirical regularities found in industrial dynamics, so-called ‘industry shakeouts’, in which many firms decide to drop out of an industry during its competitive expansion phase.

Previous experimental studies have explored subjects’ behavior in contests, whose design or methods partially overlap with ours by varying the group size (e.g., Lim et al. 2014), the payout function (Chowdhury et al. 2014; Ghosh and Hummel 2018), and the matching protocol (Baik et al. 2015), but no one has so far explored the interaction between treatments varying the group size as well as the outcome uncertainty. More importantly, differences in behavior across contest structures have been proposed to stem from differences in learning (Fallucchi et al. 2013), which has not been rigorously tested yet. Similarly, the high fraction of zero expenditures has often been attributed to myopic best reply without proper analysis. In this paper, we investigate experimentally how expenditures dynamics, such as frequently observed zero expenditure choices, can be explained by distinct learning mechanisms across contest structures.

The paper is organized as follows: Sect. 2 introduces the contest forms and offers a brief review of the related experimental literature on contests. We analyze subjects’ behavior in the two contest structures and their equivalence in expected payoff terms. Section 3 presents the experimental design and procedures. The experimental results are presented and discussed in Sect. 4. The first, descriptive, part of our result section highlights differences in group expenditures and in the fraction of zero expenditures. The second part presents the EWA model estimations and shows support for different learning modes across contest games. We conclude in Sect. 5 with a discussion of our findings and highlight how these results may well represent some empirical regularities in winner-take-all settings outside the experimental literature.

2 Theoretical background and experiments

The Tullock model of rent seeking (Tullock 1980) is extensively used to model a variety of contests (Konrad 2009). In the simplified model, often referred to as winner-take-all (WTA) or lottery contest, N agents compete for a prize of size V, where \(x_i\) is the amount of expenditure of agent i and X is the aggregate expenditure. The individual profits \(\pi _i\) depend on all agents’ expenditures, the prize assignment and a homogeneous initial endowment denoted by e:

$$\begin{aligned} \pi _i={\left\{ \begin{array}{ll} e-x_i+V & \text{ with } \text { probability} \quad x_i/X \\ e-x_i & \text{ otherwise }. \end{array}\right. } \end{aligned}$$
(1)

Therefore, the probability of one agent receiving the prize increases with own expenditures, but decreases with the expenditures of others.

In an alternative version of the contest, also known as proportional-prize (PP) or share contest, the prize is not assigned to one agent only, but divided across all agents N proportionally to their own expenditures \(x_i\) and the aggregate expenditures X. Thus, each agent with positive expenditure receives a share of the prize. The payoff function in this case is equal to:

$$\begin{aligned} \pi _i=e-x_i + V (x_i/X). \end{aligned}$$
(2)

The two contests share the same expected payoffs and, under the assumption of risk neutrality, the same equilibrium predictions, where \(x_i^*=V(N-1)/N^2\). However, the realized payoff in the WTA contest differs from the one in the PP contest due to the stochastic winner-take-all nature of the game.
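To make the equivalence in expected payoffs concrete, the following minimal Python sketch implements Eqs. 1 and 2 and compares the average of many simulated WTA draws with the deterministic PP payoff. The function names, the default values of 1000 points for prize and endowment (taken from the experimental design), and the Monte Carlo check itself are illustrative assumptions rather than part of the original analysis.

```python
import random

def pp_payoff(x_i, others, prize=1000.0, endowment=1000.0):
    """Proportional-prize payoff (Eq. 2): a deterministic share of the prize."""
    total = x_i + sum(others)
    # If nobody spends anything, the prize is not assigned.
    share = x_i / total if total > 0 else 0.0
    return endowment - x_i + prize * share

def wta_payoff(x_i, others, prize=1000.0, endowment=1000.0, rng=random):
    """Winner-take-all payoff (Eq. 1): the whole prize with probability x_i / X."""
    total = x_i + sum(others)
    win_prob = x_i / total if total > 0 else 0.0
    wins = rng.random() < win_prob
    return endowment - x_i + (prize if wins else 0.0)

def mean_wta_payoff(x_i, others, draws=20000, seed=0):
    """Average realized WTA payoff over many independent draws (illustrative)."""
    rng = random.Random(seed)
    return sum(wta_payoff(x_i, others, rng=rng) for _ in range(draws)) / draws

# Example: one contestant spends 200 against opponents spending 300 and 100.
print(pp_payoff(200, [300, 100]))        # deterministic PP payoff
print(mean_wta_payoff(200, [300, 100]))  # approaches the PP payoff as draws grow
```

A single WTA draw returns either the endowment net of expenditure or that amount plus the full prize, which is exactly the gap between realized and expected payoffs discussed in the text.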

As it is often difficult to capture the expenditures’ dynamics with field data, laboratory experiments have become increasingly popular in recent years to characterize behavior in different contest settings.Footnote 3 Many experiments support pervasive over-dissipation in WTA contests paired with high heterogeneity in effort levels across contestants (e.g., Millner and Pratt 1989; Sheremeta and Zhang 2010; Mago et al. 2016). Based on a sample of 30 studies, Sheremeta (2013) reports a median overbidding rate of 72% compared to the equilibrium predictions. Contrary to the WTA contests, PP contests display less variation in individual spending behavior and a quicker convergence over time toward the predicted equilibrium level (Fallucchi et al. 2013; Chowdhury et al. 2014; Cason et al. 2010, 2020).

Table 1 Meta-analysis of zero expenditures plays in WTA contests

Since it is common to focus on mean expenditures when analyzing overbidding in contests, the choice of zero expenditures has often been overlooked. We summarize the data of seven contest experiments, considering a total of ten independent standard repeated WTA treatments (as specified in Eq. 1). Table 1 shows that zero expenditures are the modal choice in four-player WTA contests, making up 12% of the total choices, and are increasing over time (see Fig. 1). In the two-player settings, the share of zero expenditures is lower (3.9%) and stable over time. Yet, most of them, especially in later periods, are not a myopic best response to previous opponent choices. We refer to these zero expenditures as ‘dropouts’. In four-player treatments, on average 50% of the zero expenditures are ‘dropouts’, a share that is increasing over time.

Fig. 1 Fraction of zero expenditures over time from meta-analysis

The literature offers multiple explanations for overbidding patterns in WTA contests, such as bounded rationality (Lim et al. 2014), heterogeneous preferences (Shupp et al. 2013), and utility from winning (Schmitt et al. 2004). Therefore, zero expenditures are usually associated with the best response to over-dissipation. This cannot be an exclusive explanation, given the collected evidence from prior studies. An alternative motivation that we explore in this paper is that WTA contestants choose zero investments because they have difficulty adapting their expenditures toward optimal levels due to the stochastic nature of the outcome.

We are aware of a handful of studies that analyze learning in repeated games with stochastic outcomes. They differ from our experimental setting and learning identification strategy in many aspects. Yet, they support the use of simple learning heuristics by decision makers. Gunnthorsdottir and Rapoport (2006) find that reinforcement learning (Roth and Erev 1995) explains aggregate efforts in a two-stage group game with an inter-group lottery in the first stage. Reinforcement learning combined with directional learning (Selten and Stoecker 1986) describes individuals’ behavior well in a Tullock contest with group size uncertainty (Boosey et al. 2017). In addition, learning spillovers between PP and WTA contests are found in a within-subjects experiment by Masiliūnas (2019).

Even though learning behavior in the PP contests has so far only received minor attention in the literature, the expected payoff structure provides a useful benchmark to observe behavior in the standard contest, and allows us to treat them as a special case of the more commonly studied Cournot oligopoly.Footnote 4 Evidence on learning from oligopoly experiments suggests that players employ a mix of sophisticated and imitative learning. For example, Bigoni and Fort (2013), with an application of a modified EWA model to a Cournot game under endogenous information disclosure, find that participants use a mixture of reinforcement, imitation and belief learning, with the latter accounting for the major share.Footnote 5

Lastly, learning models have been used to explain behavior in repeated auction experiments. From the bidders’ perspective, auctions look similarly stochastic to lotteries, since the value of the prize is usually drawn for each bidder from a random distribution, and thus bids submitted by rivals appear uncertain. In addition, overbidding is commonly observed in first-price auctions (Filiz-Ozbay and Ozbay 2007). Experiential and observational learning is found to reduce overbidding in first-price common value auctions (Garvin and Kagel 1994), while directional learning can explain repeated individual bids (Neugebauer and Selten 2006).

Through comparison of the behavior in both contest structures in a laboratory setting, we can identify how the randomness of the outcome inhibits learning, and therefore impacts the expenditures’ dynamics that we observe in stochastic environments. Our analysis extends to competition under different group size. The reason for this choice is twofold: firstly, previous evidence has shown that varying the number of active firms in an oligopoly industry has important implications on the level of competitiveness (e.g., Huck et al. 2004); secondly, we check if the increase in the level of competition may exacerbate zero expenditures, as suggested by our meta-analysis, linked to lower earnings in proportional-prize contests or more frequent expected losses in winner-take-all contests.

3 Experimental design and procedures

The experiment was conducted at the University of Nottingham using the software z-Tree (Fischbacher 2007). A total of 140 students from a wide range of disciplines were recruited through the online recruiting system ORSEE (Greiner 2015). No participant took part in more than one session or had taken part in any previous contest experiments.

At the beginning of each session, participants were randomly matched into groups that remained the same for the whole experiment. We opted for this fixed group matching, as it allows us to test whether participants myopically best respond to the choices of their opponents.Footnote 6 Moreover, this is the standard adopted in experiments that test for learning in other settings (e.g., Huck et al. 1999; Bigoni and Fort 2013) and therefore allows us to compare the results of our PP contests with previous evidence of learning in oligopoly settings.

Participants did not know the identities of the other subjects in the room with whom they were grouped. They were given instructions for the experiment (reproduced in “Appendix C”) which were read aloud by the experimenter. Any questions were answered by the experimenter in private, and no communication between participants was allowed. No information passed across groups during the entire session.

In all sessions, the decision-making part of the experiment consisted of 60 periods.Footnote 7 In each period, subjects were endowed with 1000 points and competed to win a prize of 1000 points. Subjects simultaneously chose how many contest tokens to purchase, at the price of one point per contest token, and any points not used to purchase tokens were added to their total balance. At the end of the period, each subject possibly received contest earnings which were added to their total balance. If none of the subjects bought any tokens, the prize was not assigned. We adopt a 2 \(\times \) 2 design where treatments differ in the group size, with three (3) or five (5) contestants,Footnote 8 and the contest structure, proportional prize (P) and winner-take-all (W).Footnote 9 We conducted two sessions for each treatment, either with 15 or 20 subjects, resulting in ten independent observations in treatments with three-player groups and eight independent observations in treatments with five-player groups. A summary of the treatments is reported in Table 2.

After each period, subjects were reminded of their own choice and informed of the total expenditures of the other members of the group to which they belonged and their own earnings. We opted for this “partial” feedback disclosure to rule out imitative behavior among contestants, although we could not rule out the other simple behavioral rule of imitating the average expenditures by opponents in the previous round.Footnote 10 Subjects accumulated points across the 60 periods and at the end of each session were paid privately in cash. Earnings averaged £9.30 for a session lasting about 60 min. At the end of the experiment, we conducted a socio-demographic questionnaire in which we also elicited risk attitude using a survey measure validated in a representative subject pool (see Dohmen et al. 2011).Footnote 11 The two contest structures share the same expected payoff and, under the assumption of risk neutrality, the same Nash equilibria. Introducing risk aversion could potentially alter theoretical predictions; however, the direction and extent of risk aversion on contest expenditures remain ambiguous under general conditions (Skaperdas and Gan 1995; Konrad and Schlesinger 1997). Also in the experimental literature, there seems to exist no consensus regarding the effect of risk attitude on contest expenditures (see for example Shupp et al. 2013; Mago et al. 2013). To be able to compare both contest structures, we thus need to maintain the assumption of risk neutral contestants. The theoretical prediction of symmetric group expenditures under risk neutrality, given by \(x^* = NV(N - 1)/N^2\), corresponds to \(666.{\bar{6}}\) points for three-player contests and to 800 points for five-player contests. Hence, predicted equilibrium expenditures at the individual level are approximately 222 and 160 points, respectively.
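The equilibrium figures quoted above can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the closed-form best response \(\sqrt{VY}-Y\) against opponents' total spending Y is the standard first-order condition for a risk-neutral contestant, which is not spelled out in the text.

```python
import math

def equilibrium_individual(n, prize=1000.0):
    """Symmetric risk-neutral equilibrium expenditure per contestant: V(N-1)/N^2."""
    return prize * (n - 1) / n ** 2

def equilibrium_group(n, prize=1000.0):
    """Total equilibrium expenditure of an N-player group: N * V(N-1)/N^2."""
    return n * equilibrium_individual(n, prize)

def best_response(others_total, prize=1000.0):
    """Risk-neutral best response to opponents' total spending Y > 0.

    Maximizing V*x/(x+Y) - x over x gives x = sqrt(V*Y) - Y, floored at zero.
    Y = 0 is excluded: any arbitrarily small positive bid would win the prize.
    """
    if others_total <= 0:
        raise ValueError("best_response assumes positive opponent spending")
    return max(0.0, math.sqrt(prize * others_total) - others_total)

for n in (3, 5):
    x_star = equilibrium_individual(n)
    # The equilibrium is a fixed point of the best response.
    print(n, round(x_star, 2), round(equilibrium_group(n), 2),
          round(best_response((n - 1) * x_star), 2))
```

Under these assumptions, spending nothing is a best response only when opponents together spend at least the value of the prize, which is the benchmark against which zero expenditures are assessed later on.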

Table 2 Experimental 2 \(\times \) 2 Composition

4 Results

We lay out the results in two subsections. In Sect. 4.1, we illustrate the spending and participation behavior of contestants of all treatments. In Sect. 4.2, we illustrate how EWA estimation results differ across treatments and analyze expenditures using Tobit mixed effect regressions. Reported p values (p) for within-treatment comparisons between the two halves of the experiment are based on Wilcoxon matched-pairs signed-rank tests, while for between-treatment comparisons we report p values of two-sided Wilcoxon rank-sum tests, treating each group as a single, independent observation.

4.1 Group expenditures and participation

Result 1 (a) Average total expenditures in proportional-prize contests increase significantly with an increase in the group size. (b) Average total expenditures in winner-take-all contests, contrary to predictions, do not significantly increase with an increase in the group size.

Figure 2 shows the average group expenditure patterns of all treatments relative to their predicted theoretical equilibria. In all cases, the initial expenditures lie substantially above the Nash equilibrium predictions, and are higher in larger groups. In the PP treatments, mean expenditures decline quickly to a level close to the equilibrium and exhibit no noticeable time trend thereafter. This result is in line with previous experimental evidence. Instead, we find the results of the WTA contest surprising: over-expenditures compared to the predicted equilibrium levels are persistent throughout the experiment, but average expenditures do not differ across different group sizes. Moreover, over the longer horizon, total expenditures are lower in larger groups, sometimes even below the level predicted by the Nash equilibrium. We report in Table 3 the average group expenditures for the first, second half and overall periods and p values of within- and between-treatment comparisons. In the PP contests, we find that average total expenditures are significantly higher in 5P than in 3P for all intervals considered (all \(p \leqslant 0.01\)). Contrary to the PP contests, between-treatment comparisons of WTA contests at group level confirm the pattern observed in Fig. 2, with an overall similar level of expenditures for all intervals considered (all \(p\geqslant 0.25\)).Footnote 12

Fig. 2 Average group expenditure for all treatments over time

Table 3 Average group expenditures, group comparisons for 1st half, 2nd half and all periods between and within treatments

This last finding contradicts theoretical results that predict higher group expenditures for larger groups. Previous studies that explored behavior in contests with different group size have supported the theoretical claim, yet considered only one-shot decisions (Anderson and Stafford 2003) or ten repetitions (Lim et al. 2014).Footnote 13 Over a longer time horizon, average group expenditures seem to show different dynamics.

Result 2 (a) The fraction of zero expenditures is significantly higher in the winner-take-all contests than in the proportional-prize contests and increases significantly over time for larger winner-take-all groups.

Our results question the hypothesis that expenditures in the WTA contests converge toward group size-dependent equilibria. We thus look for other justifications that explain the decrease in expenditures in another common finding from contest experiments: the zero expenditures. To get a first glimpse of the prevalence and dynamics of zero expenditures in contests, we compute the total fraction of zero expenditures across treatments. The total share of zeros increases with group size (3P vs. 5P: \(p=0.01\), 3W vs. 5W: \(p=0.01\)) and is more pronounced in the WTA contest (3P vs. 3W: \(p<0.01\), 5P vs. 5W: \(p<0.01\)), reaching up to 40% in the late game of 5W. Moreover, as shown by the black lines in Fig. 3, the fraction of zero expenditures is stable across time in 3W (periods 1–30 vs. 31–60 \(p=0.92\)) and increases in 5W (periods 1–30 vs. 31–60 \(p=0.09\)). Result 1 can thus be explained by the increasing fraction of zero expenditures in 5W, which leads to a faster decrease in average group efforts than in 3W.

The first justification given to the pronounced fraction of zero expenditures in the WTA contests could be that players expect their opponents to overbid.Footnote 14 Since we cannot observe players’ expectations on future opponent expenditures, we assume that expectations are formed based on past opponent behavior. Thus, we assess if zero expenditures are a best response (BR) given the history of opponents’ decisions using two forms of ‘weighted-fictitious play’. A choice j of player i in period \(t+1\) is justified under weighted fictitious play if j maximizes the following expression:

$$\begin{aligned} \mathop {{{\,\mathrm{argmax}\,}}}\limits _j \Bigg (\frac{\phi ^{t-1}\pi _i\big (s^j_i,s_{-i}(1)\big )+\cdots + \phi ^{1}\pi _i\big (s^j_i,s_{-i}(t-1)\big )+ \pi _i\big (s^j_i,s_{-i}(t)\big )}{\phi ^{t-1}+\cdots +\phi +1}\Bigg ). \end{aligned}$$
(3)

Fig. 3 Fraction of zero expenditures across treatments over time

The parameter \(\phi \) acts as a discount factor. If \(\phi =0\), then the expression reduces to the myopic best-response case of \(\pi _i\big (s^j_i,s_{-i}(t)\big )\), which denotes the hypothetical payoff of player i choosing an expenditure level j given the choices of its opponents \(s_{-i}\) at time t (reported as the dashed lines in Fig. 3). At the other extreme, when \(\phi =1\), all hypothetical past payoffs from strategy j are weighted equally for each period. In this case, the best choice is the one that would have resulted in the highest average payoff across all rounds played, also known as ‘fictitious play’ (reported as the gray lines in Fig. 3).

Figure 3a–d shows the fraction of zero expenditures over time for each of the four treatments. Most zero choices can neither be justified by myopic best responses nor by fictitious play (average fraction of zeros not justified by myopic best responses: 66% in 3W, 62% in 5W; by fictitious play: 87% in 3W, 81% in 5W).Footnote 15 Yet, myopic best responses account for more choices than fictitious play, consistent with the previous findings by Rockenbach and Waligora (2016) that WTA contestants hold myopic beliefs. We hence focus on the zero expenditures that are not explained by a myopic best response and define them as ‘dropouts’. In case of a ‘dropout’, a player decides to spend nothing even if it is payoff maximizing to bid a positive amount based on a myopic best response. The average fraction of dropouts per round can be assessed from Fig. 3 as the difference between the fraction of total zero expenditures and the fraction of zero expenditures under a myopic best response.
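The classification of zero expenditures can be illustrated with a short Python sketch of the weighted fictitious play criterion in Eq. 3. It is a simplified reading under stated assumptions: hypothetical payoffs are expected payoffs under risk neutrality, opponents enter only through their total expenditure, choices are integers from 0 to 1000, and ties are broken in favor of the lowest expenditure. All function names are hypothetical.

```python
def expected_payoff(x, others_total, prize=1000.0, endowment=1000.0):
    """Expected payoff of spending x against opponents' total (same in WTA and PP)."""
    total = x + others_total
    share = x / total if total > 0 else 0.0
    return endowment - x + prize * share

def weighted_fp_choice(history, phi, choices=range(0, 1001)):
    """Choice for period t+1 that maximizes Eq. 3.

    history: opponents' total expenditures in periods 1..t, oldest first.
    phi = 0 gives the myopic best response, phi = 1 gives fictitious play.
    """
    t = len(history)
    # The most recent period gets weight phi**0 = 1, the first period phi**(t-1).
    weights = [phi ** (t - 1 - k) for k in range(t)]
    norm = sum(weights)

    def score(x):
        return sum(w * expected_payoff(x, y) for w, y in zip(weights, history)) / norm

    return max(choices, key=score)  # first maximizer, i.e. lowest expenditure on ties

def is_dropout(choice, history):
    """A zero expenditure that is not a myopic best response to past opponent play."""
    return choice == 0 and weighted_fp_choice(history, phi=0.0) != 0

# Opponents spent 400 and then 1200 in total: zero is the myopic best response,
# but not the fictitious-play choice, which still recommends a positive bid.
print(weighted_fp_choice([400, 1200], phi=0.0))
print(weighted_fp_choice([400, 1200], phi=1.0))
print(is_dropout(0, [400]))
```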

Result 2 (b) The share of dropouts is higher in winner-take-all contests and increases in the five-player treatment.

Dropouts in PP contests are more frequent in larger groups (3P vs. 5P: \(p=0.04\)), yet their fraction is relatively low when compared to WTA treatments and stable over time (average fraction of dropouts over all choices in 3P: 1%, 5P: 8%, 3W: 11%, 5W: 21%). In the WTA contests, the dropout fractions are higher and differ not only in group size, but also with respect to the PP contests (3P vs. 3W: \(p<0.01\), 5P vs. 5W: \(p=0.01\), 3W vs. 5W: \(p=0.02\)). The share of dropouts is highest in 5W and increases significantly over time (periods 1–30 vs. 31–60 \(p=0.04\)), while we do not find such an increase in 3W (periods 1–30 vs. 31–60 \(p=0.22\)).

To characterize the strength of zero bids on the individual level, we show in Fig. 4 the percentage of periods in which contestants choose to bid zero after their initial zero effort choice. For each treatment, players that display at least one zero bid are ordered based on the frequency of their subsequent zero effort choices in the remaining periods. The frequency of zero efforts varies across contestants implying that we cannot equate period-specific zero expenditure choices with contestants abstaining from the participation throughout the remaining part of the experiment. Yet, for some contestants choosing not to bid becomes an important strategy, especially in the 5-player WTA treatment, where 14 players continue to bid zero in more than half of the remaining periods.

Fig. 4 Frequency of individual zero effort choices across treatments

From the previous analysis, we deduce that the common decreasing pattern in average expenditures, which we observe in all treatments, may be driven by different behavior, depending on the contest structure. Although in proportional-prize contests we are far from having all contestants reach the equilibrium level, the decrease in average expenditures hints toward a process of learning to play optimal strategies.Footnote 16 Conversely, the decrease of expenditures in the winner-take-all contests can largely be attributed to an overall ‘dropout’ effect that is not explained by forms of fictitious play. We hypothesize that the differences in the contests’ payoff structures have non-negligible effects on the learning process. The results from the EWA estimation in the next section further explore this thought.

4.2 EWA model estimation and interpretation

In the previous section, we have shown that the structure and group size of the contest affect total expenditures. Overall, the decrease in expenditures over time suggests that subjects display different behaviors across contests. We provide further insights to verify whether differences across treatments are driven by a different learning path. We estimate for each treatment the EWA model (Camerer and Ho 1999). In our estimations, we group the 1001 expenditure choices into \(K=11\) bins of equal distance and round all choices to the closest bin to facilitate comparability.Footnote 17
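As a small illustration of the binning step, the sketch below maps each expenditure between 0 and 1000 to the nearest of 11 equally spaced values. The grid values (0, 100, ..., 1000) and the rounding of midpoints upward are assumptions made for this example; the text only states that bins are equally spaced and that choices are rounded to the closest bin.

```python
def to_bin(expenditure, k=11, max_choice=1000):
    """Index (0..k-1) of the closest bin on an equally spaced grid (assumed layout)."""
    step = max_choice / (k - 1)  # 100 points between neighboring bin values
    # Adding 0.5 before truncating rounds to the nearest bin, midpoints upward.
    return min(k - 1, int(expenditure / step + 0.5))

def bin_value(index, k=11, max_choice=1000):
    """Expenditure level represented by a bin index."""
    return index * max_choice / (k - 1)

for x in (0, 49, 50, 222, 1000):
    print(x, to_bin(x), bin_value(to_bin(x)))
```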

Every contestant i forms a set of ‘attractions’ \(A_i^j(t)\), which get recursively reinforced or weakened after every round. Attractions are updated as follows:

$$\begin{aligned} A_i^j(t)=\frac{\phi \cdot N(t-1) \cdot A_i^j(t-1)}{N(t)}+\frac{\delta \cdot E \left[ \pi _i(s_i^{-j}(t),s_{-i}(t))\right] }{N(t)}+\frac{(1-\delta )\cdot \pi _i(s_i^{j}(t),s_{-i}(t))}{N(t)}, \end{aligned}$$
(4)

where \(s_i^j(t)\) refers to the actual strategy j chosen by player i in period t, while \(s_i^{-j}(t)\) denotes all the possible strategies in the same period. Defining \(s_{-i}(t)\) as the strategy vector chosen by all other players, the payoff of player i choosing j in t is given by \(\pi _i(s_i^j(t), s_{-i}(t))\). Similarly, \(E \left[ \pi _i(s_i^{-j}(t), s_{-i}(t))\right] \) denotes the hypothetical payoff that player i would have expected from any possible strategy given the strategies of all other players. Since the prize assignment in the PP contest is deterministic, this expression simplifies to \(\pi _i(s_i^{-j}(t), s_{-i}(t))\) and is equivalent to the expected hypothetical payoff of the WTA contest given risk neutrality. N(t) is a weight on past experience. The faster N(t) increases in t, the less players focus on immediate current payoffs at time t. The weights update with the following rule:

$$\begin{aligned} N(t)=(1-\kappa )\cdot \phi \cdot N(t-1)+1, \qquad t \ge 1. \end{aligned}$$
(5)

The parameter \(\kappa \) determines the growth rate of attractions, which reflects how quickly players lock into a strategy. The current attractions of the array of possible strategies J determine the probability of player i choosing strategy j in the next period \(t+1\). A logistic transformation links previous attractions to the choice probabilities (Eq. 6). Thus, the higher a contestant’s past attraction for a specific strategy, the higher is the probability that this strategy will be pursued.

$$\begin{aligned} P_i^j(t+1)=\frac{e^{\lambda \cdot A_i^j(t)}}{\sum ^{J}_{k=1} e^{\lambda \cdot A_i^k(t)}}. \end{aligned}$$
(6)

Table 4 Description of estimated EWA parameters

Attractions for each strategy are updated via weighting the previous experience, the current forgone payoffs, and the current received payoff (given by the summands in Eq. 4). Current forgone payoffs are the payoffs that could have been expected if the contestant had chosen differently by keeping the opponents’ strategies fixed. Formation of attractions via evaluating forgone payoffs is equivalent to belief learning (a version of weighted fictitious play, Brown 1951), whereas focusing exclusively on realized payoffs (\(\delta =0\)) reduces the model to reinforcement learning (Roth and Erev 1995). Thus, the EWA model incorporates two canonical learning models via the parameter delta (\(\delta \)).Footnote 18
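One round of the updating rule in Eqs. 4 to 6 can be sketched in Python as follows. This is one possible reading of Eq. 4, not the estimation code used for the results: every strategy is reinforced by \(\delta \) times its expected hypothetical payoff, and the strategy actually played additionally receives \((1-\delta )\) times the realized payoff. Function names and the parameter values in the example are hypothetical.

```python
import math

def ewa_update(attractions, n_prev, chosen, realized, forgone, phi, delta, kappa):
    """One EWA step (Eqs. 4 and 5) for a single player.

    attractions: A^k(t-1) for every strategy bin k
    n_prev:      experience weight N(t-1)
    chosen:      index of the strategy played in period t
    realized:    payoff actually received in period t
    forgone:     expected hypothetical payoff of every strategy in period t,
                 holding the opponents' choices fixed
    """
    n_new = (1 - kappa) * phi * n_prev + 1  # Eq. 5
    updated = []
    for k, a_prev in enumerate(attractions):
        own = (1 - delta) * realized if k == chosen else 0.0
        updated.append((phi * n_prev * a_prev + delta * forgone[k] + own) / n_new)
    return updated, n_new

def choice_probabilities(attractions, lam):
    """Logit choice rule (Eq. 6), computed in a numerically stable way."""
    top = max(lam * a for a in attractions)
    weights = [math.exp(lam * a - top) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# Three strategy bins, strategy 1 was played and paid 10 (illustrative numbers).
start, forgone = [0.0, 0.0, 0.0], [2.0, 4.0, 6.0]
reinforcement, _ = ewa_update(start, 1.0, 1, 10.0, forgone, phi=1.0, delta=0.0, kappa=0.0)
belief, _ = ewa_update(start, 1.0, 1, 4.0, forgone, phi=1.0, delta=1.0, kappa=0.0)
print(reinforcement, choice_probabilities(reinforcement, lam=0.5))
print(belief, choice_probabilities(belief, lam=0.5))
```

With \(\delta =0\) only the strategy that was played gains attraction, so a run of lost WTA rounds pulls choices away from the bids that were tried; with \(\delta =1\) all strategies are evaluated by what they would have been expected to earn.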

We show in Table 5 for each treatment the simulation results of the EWA expenditure distribution using varying deltas (0, 0.5, 1) and the true expenditure frequencies. As the delta increases toward belief learning, the simulated choices show less variation and roughly resemble the true expenditure frequencies in the PP contests. The true distribution of expenditures in the WTA contest is more dispersed, which is why we assume that these contestants rely less on belief learning than PP contestants.

Table 5 Simulated EWA expenditure distribution and observed expenditure distribution

Result 3 Proportional-prize contests allow for a mixture of reinforcement and adaptive learning. In winner-take-all contests, learning is mostly driven by previous own payoffs.

For each treatment, the EWA model is estimated over the first half and the complete sample via maximum likelihood.Footnote 19 Estimated parameters and their theoretical domains are summarized in Table 4. We refrain from freely estimating initial attractions by following the approach used in Ho et al. (2008) and choose the initial attractions to maximize the likelihood of observing first-period choice frequencies.Footnote 20

We report in Table 6 the estimated parameters with their clustered standard errors and confidence intervals in parentheses.Footnote 21 The main parameter of interest in our estimation analysis is delta, which indicates the degree of belief learning used in the game. In the PP contests, the deltas are significantly greater than zero, between 0.59 and 0.67, suggesting that players adopt a mixture of reinforcement learning (considering own realized payoffs) and belief learning (considering all own hypothetical payoffs) in the game. By contrast, in WTA contests players rely mostly on reinforcement (experiential) learning, as deltas are significantly closer to zero. This is especially true for the early stage of the game (0.02 for 3W and 0.13 for 5W). As players get more familiar with the game, they shift slightly toward belief learning in all treatments. Nevertheless, the difference between PP and WTA contests remains substantial. One can argue that belief learning is more complex, since it requires the player to evaluate for each possible strategy the expected payoffs given the opponents’ set of strategies. The evaluation of hypothetical scenarios could be more difficult in WTA contests due to the discrepancy between expected and realized payoffs. In fact, it has been shown theoretically that individuals lock in at inefficient levels of expenditures in low-delta scenarios (Pangallo et al. 2017).

Table 6 EWA estimation results across treatments

Other parameter estimates are similar across various estimations. We find N0 below one in all estimations, which indicates that pre-game attractions are offset completely by first period attractions.Footnote 22 Kappa (\(\kappa \)), which measures the growth property of attractions, is significantly different from zero in the first half of the game, indicating that the importance of past attractions grows over time. The decay rates of past attractions indicated by phi (\(\phi \)) are significantly different from zero, but similar across and within treatments, on average between \(75\%\) in the PP contest and \(81\%\) in the WTA contest. Although the difference is negligible, this indicates that players may rely more on past experience in WTA contests. This result is consistent with the idea that present payoffs in these treatments reveal less useful information. Finally, lambda (\(\lambda \)) is significantly different from zero for all specifications, indicating that contestants’ choices are influenced by attractions formed in past rounds.

The EWA parameter estimation uncovers a noticeable difference in learning between the PP and WTA contests. The strong reliance of WTA subjects on their own realized payoffs implies that current decisions depend on the success of previous ones. Victories strongly reinforce the probability of playing the corresponding level of expenditures, while losses make the corresponding expenditure level less attractive in the forthcoming periods. If a subject experiences frequent losses, positive expenditure levels will, over time, become less appealing to the advantage of zero expenditures. Following this thought, the fraction of zero expenditures should be higher in contests with a higher share of non-winners. This is consistent with our results in Sect. 4.1, where the fraction of dropouts is significantly higher in WTA treatments than in PP treatments and increases significantly over time in larger groups. If WTA contestants indeed rely on their own previous experiences, then we should observe a drop in expenditures after a series of losses.

Result 4 (a) Winner-take-all contestants significantly decrease expenditures after the accumulation of losses. (b) The effect is more pronounced for bigger groups.

We expect a negative relationship between the series of losses prior to time t and the expenditure level at time t. We assess this relationship using a set of Tobit mixed-effects models.Footnote 23 The model assumes that expenditure levels are left-censored at 0, with random effects at the individual and group level. In all models, we regress the expenditures at t on previous own expenditures, previous opponents’ expenditures (linear and squared) and a variable capturing the time trend.Footnote 24 We check the relationship between prior losses and current expenditures via the variable loss streak, defined as the accumulated (negative) payoff from consecutive losses prior to time t relative to contestants’ endowment. After every incurred loss, the variable decreases by the foregone profits that would have been received by choosing not to spend the endowment. Consequently, a loss streak remains unchanged for zero expenditures. If the contest has been won in the previous period, the contestant’s loss streak is reset to zero.
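
The construction of the loss streak variable can be sketched as follows in Python. The sketch assumes that the foregone profit from a lost period equals the amount spent in that period, and that the variable is expressed as a share of the endowment. The function name and the numbers in the usage note are hypothetical.

```python
def loss_streak(expenditures, won, endowment):
    """Return the loss streak entering each period, as a share of the endowment.

    expenditures: amount spent by one contestant in each period.
    won: booleans, True if the contestant won the prize in that period.
    endowment: per-period endowment (assumed constant).
    """
    streak = 0.0
    series = []
    for spent, victory in zip(expenditures, won):
        # Value accumulated from periods strictly before the current one.
        series.append(streak)
        if victory:
            # A win resets the streak to zero.
            streak = 0.0
        else:
            # A loss lowers the streak by the amount spent (assumption);
            # zero expenditures leave it unchanged.
            streak -= spent / endowment
    return series
```

For instance, with an endowment of 1000, expenditures of 200, 0, 300, 100 and 50, and a single win in the fourth period, the variable entering the five periods is approximately 0, -0.2, -0.2, -0.5 and 0.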

Models (1), for 3W, and (4), for 5W, in Table 7 contain only control variables. In both lotteries, the influence of prior expenditures on current choices is positive and significant, while the effect of period advancement is negative and significant. In models (2) and (5) we find a positive and significant effect of the loss streak, indicating that the accumulation of losses indeed leads to a decrease in expenditures. This effect is further analyzed in (3) and (6), where we additionally control for the contestants’ gender. As an exploratory result, we find that women spend on average more than men.Footnote 25 From (3), we find that splitting the loss streak variable with respect to gender does not lead to significant effects on expenditures. Yet when increasing the group size (6), the accumulation of prior losses significantly decreases the expenditures for both genders, for women more sharply than for men. This last result has been similarly observed in experimental tournaments (Buser 2016).

The regression results support the claim that decreasing expenditures in the WTA contest are driven by previous lottery outcomes. With the accumulation of losses, contestants tend to lower their expenditures and may, over time, drop out of the contest.Footnote 26 Since individual losses accumulate longer when facing more opponents, the decrease of expenditures is more pronounced in 5W. Although loss aversion may be thought of as the mechanism that leads to lower expenditures (as shown by Kong 2008; Shupp et al. 2013; Chowdhury et al. 2018), we should be careful in distinguishing the role of loss aversion from repeatedly experiencing losses. As pointed out in many studies summarized by Kermer et al. (2006), individuals overestimate the impact of losses in prospect compared to losses they realize, and learn from experience that losses have less emotional impact than estimated ex ante. Therefore, while loss aversion certainly has an impact on expenditures, it is not the only effect that drives the decline over time. The EWA estimations and the results in Table 7 additionally incorporate the effect of experiencing losses.

Table 7 Tobit mixed effects regression on WTA expenditures

5 Final discussion

Our contribution to the literature is two-pronged. First, we offer a clean comparison of how subjects behave across different contest structures. Similarly to previous studies of the PP contest and oligopolies, we find that the expenditures in this setting converge well to standard predictions. Unlike in the PP contests, group size changes from three to five players do not affect total investments in the WTA contests. Second, we assess the role of learning as one of the possible explanations for behavioral differences across different contest structures.

The behavioral discrepancy between contest types might be connected to the probabilistic prize assignment in the WTA treatment that influences how contestants form their choices. We hypothesize that PP and WTA contestants use distinct learning strategies that may also explain another expenditure peculiarity that tends to be overlooked: the modal choice in WTA contests is oftentimes zero.

We find that varying the group size from three to five players does not affect total investments in WTA contests, contrary to theoretical predictions. The decrease of investments in large-group WTA contests is influenced by an increasing fraction of zero expenditures. A substantial share of these zero expenditures is not justified by myopic best responses (or other forms of fictitious play); we define such choices as ‘dropout’. Even though the dropout rate is lower for PP contests, the average expenditures converge to theoretical predictions, which suggests that spending behavior across contests is formed by distinct learning processes. A parameter estimation of the EWA model in all treatments indicates that WTA contestants decide mostly based on the information gathered from their own realized payoffs. Since success in the WTA contest is stochastic, subjects who base their investment decision entirely on their previous decisions are less able to adapt their strategies in a payoff-optimizing fashion. By contrast, participants in the PP contests rely on a mixture of own realized payoffs and foregone payoffs. This may be facilitated by the deterministic nature of the payoffs in the PP contest. The distinct learning patterns estimated in the two contest environments do not significantly change over time and are robust to changes in the number of players.

The repeated losses that subjects face in the WTA contests weaken, over time, the reinforcement of positive expenditure levels; consequently, zero expenditures become more appealing, regardless of whether they are a myopic best response. Our regression results add to this thought by showing that the accumulation of prior losses leads to a significant decrease in expenditures that is more pronounced in bigger groups. The higher dropout rate in the five-player WTA treatment is presumably driven by the faster accumulation of individual losses. As a consequence, an increase in the group size does not necessarily increase total rent seeking.

In a society full of embedded winner-take-all contests, our results have practical relevance. First, it may be beneficial in rent-seeking situations, such as lobbying, that an increase in WTA contestants does not significantly change total rent-seeking effort. By contrast, where a high sum of efforts is desired, such as in a philanthropic fund-raising lottery, increasing the pool of participants might not lead to the desired effect if the individual loss probability increases simultaneously. Second, decisions not to invest can be aggregated on a macro-level to the so-called ‘industry shakeout’, i.e., a significant reduction in the number of active firms during the expansion of new industries (see Gort and Klepper 1982; Klepper 1996, 1997). One traditional explanation for firm exit dynamics postulates that market participants use Bayesian updating to learn their true ability of operating in the market (Jovanovic 1982; Jovanovic and Nyarko 1995). Thus, firms decide to exit by assessing their past performance. Our finding that participants invest less after losses and potentially drop out of the contest due to non-reinforcement of positive payoffs follows a similar line. At the same time, we stress the difficulty players face in forming rewarding learning strategies in highly uncertain domains, such as pharmaceutical R&D.

The presented work calls for a better understanding of how WTA contestants learn in highly uncertain environments with large group sizes, such as pharmaceutical R&D, and how their decision-making abilities can be improved. In addition, the observed ‘dropout’ effect and its relationship to group size and possibly other contest characteristics deserve increased attention to better bridge the gap between experimental findings and theoretical explanations of contestants’ behavior.