Sources for the history of English
Since the early 1990s a large number of corpora have become available which consist of texts covering periods in the history of English. The first major corpus in this area was the Helsinki Corpus of English Texts (1991, 1993), which includes extracts from various works ranging from Old English to the late modern period. At the University of Helsinki and at many other universities, such as Brigham Young University in Provo, Utah, many more corpora have since been compiled, each focussing on a selection of texts, either of a particular genre, e.g. personal correspondence or medical texts, or from a particular region, e.g. Scottish texts. As other universities quickly followed suit, quite an impressive range of corpora for historical purposes is now available.
A selection of corpora is listed below to convey an impression of the variety and coverage of those currently available (2026). This is an expanding field and with each passing year new corpora appear, some of which are put in the public domain by their compilers. There are also many derived corpora, often compiled for a specific research question, which may involve tagging a section of an existing corpus; see, for example, the Penn-Helsinki Parsed Corpus of Middle English.
| Name | Compiling institution / individuals |
| --- | --- |
| ARCHER, a corpus of British and American English from 1650-1990 | Douglas Biber and associates at Northern Arizona University in collaboration with colleagues at the University of Freiburg, Germany |
| Australian Corpus of English | Department of Linguistics, Macquarie University, NSW, Australia |
| Bank of English | University of Birmingham, sponsored by the publisher HarperCollins |
| British National Corpus | Consortium under the aegis of Oxford University Press |
| The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English | A parsed section of the original Helsinki corpus prepared by a number of linguists |
| Brown Corpus of Standard American English | W. Nelson Francis and Henry Kucera, Brown University, Providence, Rhode Island |
| Corpus of Nineteenth Century English | Merja Kytö and associates, Uppsala University, Sweden |
| Corpus of English Dialogues 1560-1760 | Merja Kytö, Uppsala University, Sweden and Jonathan Culpeper, Lancaster University, England |
| Corpus of Early English Correspondence | Terttu Nevalainen and Helena Raumolin-Brunberg, University of Helsinki, Finland |
| Corpus of Early English Medical Writing | Irma Taavitsainen, University of Helsinki, Finland |
| Corpus of Contemporary American English (COCA) | Mark Davies, Brigham Young University, Provo, Utah, USA |
| Corpus of Historical American English (COHA) | Mark Davies, Brigham Young University, Provo, Utah, USA |
| A Corpus of Irish English | Raymond Hickey, University of Limerick, Ireland (originally packaged with Corpus Presenter, Software for Language Analysis, Amsterdam: John Benjamins, 2003, now available in an extended form at http://www.raymondhickey.com/index_(RH).html, Version 2026, Build 2.0, March 2026). |
| Corpus of Late Modern English Texts, 1710-1920 | Hendrik De Smet, KU Leuven, Belgium |
| Corpus of London Teenage Language (COLT) | Anna-Britta Stenström and associates, Department of English, University of Bergen |
| Corpus of Middle English Prose and Verse | University of Michigan, Michigan |
| Corpus of Religious Prose | Thomas Kohnen, University of Cologne, Germany |
| Early English Books Online (EEBO, 1475-1700) | ProQuest (educational company) |
| Freiburg-Brown Corpus of American English (FROWN) | Christian Mair and associates, University of Freiburg, Germany |
| Freiburg-LOB Corpus of British English (FLOB) | Christian Mair and associates, University of Freiburg, Germany |
| The Hansard Corpus (British parliamentary records) | University of Glasgow and others, 2014-2016 |
| The Helsinki Corpus of Older Scots | Anneli Meurman-Solin, Department of English, University of Helsinki, Finland |
| Innsbruck Corpus Archive of Middle English Texts (ICAMET) | Manfred Markus, University of Innsbruck, Austria |
| International Corpus of English (ICE), collection of corpora from various anglophone countries, now (2005) partially completed | Co-ordinated by the Department of English, University College London, England |
| Kolhapur Corpus of Indian English | Shivaji University, Kolhapur |
| Lampeter Corpus of Early Modern English Tracts | Josef Schmied, Technical University Chemnitz, Germany |
| Lancaster-Oslo-Bergen Corpus of British English | Collaborative effort of the universities in the three cities named in title |
| London-Lund Corpus of Spoken English | Departments of English at University College London, England and Lund University, Sweden |
| Middle English Medical Texts | Irma Taavitsainen, Päivi Pahta and Martti Mäkinen, Department of English, University of Helsinki, Finland. Retrieval software by Raymond Hickey. Published by John Benjamins, 2005. |
| Northern Ireland Transcribed Corpus of Speech (NITCS) | John Kirk, Department of English, Queen’s University, Belfast, Northern Ireland |
| Old Bailey Court Depositions | Department of History, University of Sheffield |
| Penn-Helsinki Parsed Corpus of Middle English | University of Pennsylvania, Philadelphia, Pennsylvania |
| Santa Barbara Corpus of Spoken American English | University of California, Santa Barbara |
| Zurich English Newspaper Corpus | Udo Fries and associates, Department of English, Zurich University |
Download software
If you want to see how corpus software works, you can download a free version of my software package, Corpus Presenter. This was published as a book and CD entitled Corpus Presenter, Software for Language Analysis (Amsterdam: John Benjamins) in 2003. It has undergone considerable expansion and refinement in the more than 20 years since it was first published. The current version is Corpus Presenter 2026 (Build 2.0, March 2026). For more details, go to the Corpus Presenter website, where the software can be downloaded and where many tutorial presentations can be found.
Dedicated journals
ICAME Journal, 1996- University of Bergen, Norway.
International Journal of Corpus Linguistics, 1996- Amsterdam: John Benjamins.
Corpus Linguistics and Linguistic Theory, 2005- Berlin: Mouton de Gruyter.
Corpus Pragmatics. International Journal of Corpus Linguistics and Pragmatics, 2017- Berlin: Springer Nature.