Abstract
In this paper, we describe a model that learns semantic representations from the distributional statistics of language. Unlike models in the common bag-of-words paradigm, however, it infers semantic representations by taking into account the inherent sequential nature of linguistic data. The model we describe, which we refer to as a Hidden Markov Topics model, is a natural extension of the current state of the art in Bayesian bag-of-words models, namely the Topics model of Griffiths, Steyvers, and Tenenbaum (2007); it preserves the strengths of that model while extending its scope to incorporate more fine-grained linguistic information.