It is well established that toddlers can correctly select a novel referent from an ambiguous array in response to a novel label. There is also a growing consensus that robust word learning requires repeated label-object encounters. However, the effect of the context in which a novel object is encountered is less well understood. We present two embodied neural network replications of recent empirical tasks, which demonstrated that the context in which a target object is encountered is fundamental to referent selection and word learning. Our model offers an explicit account of the bottom-up associative and embodied mechanisms that could support children’s early word learning, and emphasises the importance of viewing behaviour as the interaction of learning at multiple timescales.