Recent advances in both computing power and statistical thinking have made it possible to use huge amounts of data in ways that were not feasible in earlier times. In this course, we will learn how to use these new methods and data sources to investigate the structure of language. Language is, of course, a multimodal phenomenon: it exists not only as written text but also in spoken form, and these modes require different approaches. We will learn about the methods suited to each of these modalities. For these statistical applications, we will use the widely adopted R programming language. Previous knowledge of R would make things easier, but it is not a prerequisite for participation. Prospective students who are very apprehensive about programming may want to consider the parallel course of the same title (Corpus Linguistics), taught by my colleague Ghattas Eid, which will have less of a computational focus.