Abstract
Recent work on the evolution of social contracts and conventions has often used models of bargaining games with reinforcement learning. A recent innovation is the requirement that every strategy must be invented either through learning or reinforcement. However, agents frequently get stuck in highly reinforced “traps” that prevent them from arriving at outcomes that are efficient or fair to both players. Agents face a trade-off between exploration and exploitation, i.e., between continuing to invent new strategies and reinforcing strategies that have already yielded high rewards. In this paper I systematically study the relationship between rates of invention and the efficiency and fairness of outcomes in two-player, repeated bargaining games. I use a basic reinforcement learning model with invention, and five variations of this model designed to introduce various forms of forgetting, to prioritize more recent reinforcement, or to maintain a higher rate of invention. I use computer simulations to investigate the outcomes of each model. Each model shows qualitative similarities in the relationship between the efficiency and fairness of outcomes and the relative amount of exploration or exploitation that takes place. Surprisingly, there are often trade-offs between the efficiency and the fairness of the outcomes.