Sources for the history of English


Since the early 1990s a large number of corpora have become available which consist of texts covering periods in the history of English. The first major corpus in this area was the Helsinki Corpus of English Texts (1991, 1993), which includes extracts from various works ranging from Old English to the late modern period. At the University of Helsinki and many other universities, such as Brigham Young University in Provo, Utah, many more corpora have been compiled, focussing on a selection of texts, either of a particular genre, e.g. personal correspondence or medical texts, or from a particular region, e.g. Scottish texts. Other universities quickly followed suit and at present quite an impressive range of corpora for historical purposes is available.

A selection of corpora is listed below to convey an impression of the variety and coverage of those currently available (2026). This is an expanding field and with each passing year new corpora appear, some of which are put in the public domain by their compilers. There are also many derived corpora, often compiled for a specific research question, which may involve tagging a section of a corpus; see, for example, the Penn-Helsinki Parsed Corpus of Middle English.

Name	Compiling institution / individuals
ARCHER, a corpus of British and American English from 1650-1990	Douglas Biber and associates at Northern Arizona University in collaboration with colleagues at the University of Freiburg, Germany
Australian Corpus of English	Department of Linguistics, Macquarie University, NSW, Australia
Bank of English	University of Birmingham, sponsored by the publisher HarperCollins
British National Corpus	Consortium under the aegis of Oxford University Press
The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English	A parsed section of the original Helsinki corpus prepared by a number of linguists
Brown Corpus of Standard American English	W. Nelson Francis and Henry Kucera, Brown University, Providence, Rhode Island
Corpus of Nineteenth Century English	Merja Kytö and associates, Uppsala University, Sweden
Corpus of English Dialogues 1560-1760	Merja Kytö, Uppsala University, Sweden and Jonathan Culpeper, Lancaster University, England
Corpus of Early English Correspondence	Terttu Nevalainen and Helena Raumolin-Brunberg, University of Helsinki, Finland
Corpus of Early English Medical Writing	Irma Taavitsainen, University of Helsinki, Finland
Corpus of Contemporary American English (COCA)	Mark Davies, Brigham Young University, Provo, Utah, USA
Corpus of Historical American English (COHA)	Mark Davies, Brigham Young University, Provo, Utah, USA
A Corpus of Irish English	Raymond Hickey, University of Limerick, Ireland (originally packaged with Corpus Presenter, Software for Language Analysis, Amsterdam: John Benjamins, 2003, now available in an extended form at http://www.raymondhickey.com/index_(RH).html, Version 2026, Build 2.0, March 2026)
Corpus of Late Modern English Texts, 1710-1920	Hendrik De Smet, KU Leuven, Belgium
Corpus of London Teenage Language (COLT)	Anna-Brita Stenström and associates, Department of English, University of Bergen
Corpus of Middle English Prose and Verse	University of Michigan, Michigan
Corpus of Religious Prose	Thomas Kohnen, University of Cologne, Germany
Early English Books Online (EEBO, 1475-1700)	ProQuest (educational company)
Freiburg-Brown Corpus of American English (FROWN)	Christian Mair and associates, University of Freiburg, Germany
Freiburg-LOB Corpus of British English (FLOB)	Christian Mair and associates, University of Freiburg, Germany
The Hansard Corpus (British parliament records)	University of Glasgow and others, 2014-2016
The Helsinki Corpus of Older Scots	Anneli Meurman-Solin, Department of English, University of Helsinki, Finland
Innsbruck Corpus Archive of Middle English Texts (ICAMET)	Manfred Markus, University of Innsbruck, Austria
International Corpus of English (ICE), collection of corpora from various anglophone countries, now (2005) partially completed	Co-ordinated by the Department of English, University College London, England
Kolhapur Corpus of Indian English	Shivaji University, Kolhapur
Lampeter Corpus of Early Modern English Tracts	Josef Schmied, Technical University Chemnitz, Germany
Lancaster-Oslo-Bergen Corpus of British English	Collaborative effort of the universities in the three cities named in title
London-Lund Corpus of Spoken English	Departments of English at University College London, England and Lund University, Sweden
Middle English Medical Texts	Irma Taavitsainen, Päivi Pahta and Martti Mäkinen, Department of English, University of Helsinki, Finland. Retrieval software by Raymond Hickey. Published by John Benjamins, 2005.
Northern Ireland Transcribed Corpus of Speech (NITCS)	John Kirk, Department of English, Queen’s University, Belfast, Northern Ireland
Old Bailey Court Depositions	Department of History, University of Sheffield
Penn-Helsinki Parsed Corpus of Middle English	University of Pennsylvania, Philadelphia, Pennsylvania
Santa Barbara Corpus of Spoken American English	University of California, Santa Barbara
Zurich English Newspaper Corpus	Udo Fries and associates, Department of English, Zurich University

Download software

If you want to see how corpus software works, you can download a free version of my software package, Corpus Presenter. This was published as a book and CD entitled Corpus Presenter, Software for Language Analysis (Amsterdam: John Benjamins) in 2003. It has undergone very considerable expansion and refinement in the 22 years since it was first published. The current version is Corpus Presenter 2026 (Build 2.0, March 2026). For more details, go to the Corpus Presenter website, where the software can be downloaded and many tutorial presentations can be found.

Dedicated journals

ICAME Journal, 1996- University of Bergen, Norway.
International Journal of Corpus Linguistics, 1996- Amsterdam: John Benjamins.
Corpus Linguistics and Linguistic Theory, 2005- Berlin: Mouton de Gruyter.
Corpus Pragmatics. International Journal of Corpus Linguistics and Pragmatics, 2017- Berlin: Springer Nature.

References

Aarts, Jan and Willem Meijs (eds.) 1990. Theory and practice in corpus linguistics. Amsterdam: Rodopi.
Aijmer, Karin and Bengt Altenberg (eds.) 1991. English corpus linguistics: Studies in honour of Jan Svartvik. London: Longman.
Aijmer, Karin and Bengt Altenberg (eds.) 2004. Advances in Corpus Linguistics. Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23). Amsterdam: Rodopi.
Altenberg, Bengt 1991. ‘A bibliography of publications relating to English computer corpora’, In: Stig Johansson and Anna-Brita Stenström (eds), English computer corpora. Selected papers and research guide. Berlin: Mouton de Gruyter, pp. 355-396.
Altenberg, Bengt and Sylviane Granger (eds) 2001. Lexis in contrast. Corpus-based approaches. Amsterdam: John Benjamins.
Biber, Douglas, Susan Conrad and Randi Reppen 1998. Corpus linguistics. Investigating language structure and use. Cambridge: Cambridge University Press.
Bridge, Derek and Stephen Harlow 1997. An introduction to computational linguistics. Oxford: Blackwell.
Briscoe, Ted and Brian Boguraev 1989. Computational lexicography for natural language processing. London: Longman.
Brookes, Gavin, Niall Curry and Robbie Love 2026. Applications of Corpus Linguistics. Cambridge: Cambridge University Press.
Butler, Christopher 1985. Computers in linguistics. Oxford: Blackwell.
Butler, Christopher 1985. Statistics in linguistics. Oxford: Blackwell.
Connor, Ulla and Thomas A. Upton (eds) 2004. Applied Corpus Linguistics. A Multidimensional Perspective. Amsterdam: Rodopi.
Conrad, Susan and Douglas Biber 2001. Variation in English - Multi-dimensional Studies. Harlow, England; New York: Longman.
Culpeper, Jonathan and Merja Kytö 1997. ‘Towards a Corpus of Dialogues, 1550–1750’, In: Heinrich Ramisch and Kenneth Wynne (eds), Language in Time and Space. Studies in Honour of Wolfgang Viereck on the Occasion of His 60th Birthday. Stuttgart: Franz Steiner Verlag, pp. 60–71.
Fries, Udo, Gunnel Tottie and Peter Schneider (eds) 1994. Creating and using English language corpora. Amsterdam: Rodopi.
Fries, Udo, Viviane Müller and Peter Schneider (eds) 1997. From Ælfric to the New York Times. Amsterdam: Rodopi.
Garside, Roger, Geoffrey Leech and Geoffrey Sampson (eds) 1987. The computational analysis of English. London: Longman.
Granger, Sylviane and Stephanie Petch-Tyson (eds) 2003. Extending the scope of corpus-based research. New applications, new challenges. Amsterdam: Rodopi.
Greenbaum, Sidney 1996. Comparing English world-wide. The international corpus of English. Oxford: Oxford University Press.
Häcker, Martina 1998. Syntax and semantics of adverbial clauses in present-day Scots. A corpus-based study. Berlin: Mouton de Gruyter.
Hampe, Beate 2001. Superlative verbs. A corpus-based study of semantic redundancy in English verb-particle constructions. Tübingen: Narr.
Hasselgård, Hilde and Signe Oksefjell (eds) 1999. Out of corpora. Studies in honour of Stig Johansson. Amsterdam: Rodopi.
Hauenschild, Christa and Susanne Heizmann (eds) 1997. Machine translation and translation theory. Berlin: Mouton de Gruyter.
Hickey, Raymond 1993a. ‘Applications of software in the compilation of corpora’, In: Merja Kytö, Matti Rissanen and Susan Wright (eds), Corpora across the centuries. Amsterdam: Rodopi, pp. 165-86.
Hickey, Raymond 1993b. ‘A corpus of Irish English’, In: Merja Kytö, Matti Rissanen and Susan Wright (eds), Corpora across the centuries. Amsterdam: Rodopi, pp. 23-31.
Hickey, Raymond 1997a. ‘The computer analysis of medieval Irish English’, In: Hickey, Kytö, Lancashire and Rissanen (eds), pp. 167-83.
Hickey, Raymond 2000. ‘Processing corpora with Corpus Presenter’, ICAME Journal 24, 65-84.
Hickey, Raymond 2003. Corpus Presenter. Processing software for language analysis. includes A Corpus of Irish English. Amsterdam: John Benjamins.
Hickey, Raymond, Merja Kytö, Ian Lancashire and Matti Rissanen (eds) 1997. Tracing the trail of time. Proceedings of the conference on diachronic corpora, Toronto, May 1995. Amsterdam: Rodopi.
Hockey, Susan M. 1980. A guide to computer applications in the humanities. London: Duckworth.
Hunston, Susan and Gill Francis 2000. Pattern grammar. A corpus-driven approach to the lexical grammar of English. Amsterdam: John Benjamins.
Johansson, Stig and Anna-Brita Stenström (eds) 1991. English computer corpora. Selected papers and research guide. Berlin: Mouton de Gruyter.
Jung, Udo O. H. (ed.) 1991. Computers in applied linguistics and language teaching. Frankfurt/Bern: Lang.
Kennedy, Graeme 1998. An introduction to corpus linguistics. London: Longman.
Kirk, John (ed.) 2000. Corpora galore. Analyses and techniques in describing English. Amsterdam: Rodopi.
Krug, Manfred 2000. Emerging English Modals: A Corpus-Based Study of Grammaticalization [Topics in English Linguistics]. Berlin/New York: Mouton de Gruyter.
Kytö, Merja 1993. Manual to the diachronic part of the Helsinki corpus of English texts. 2nd edition. Helsinki: Department of English.
Kytö, Merja 1999. ‘Collocational and Idiomatic Aspects of Verbs in Early Modern English: A Corpus-based Study of MAKE, HAVE, GIVE, TAKE, and DO’, In: Laurel J. Brinton and Minoji Akimoto (eds), Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: John Benjamins, pp. 167–206.
Kytö, Merja and Suzanne Romaine 1997. ‘Competing Forms of Adjective Comparison in Modern English: What Could Be More Quicker and Easier and More Effective?’, In: Terttu Nevalainen and Leena Kahlas-Tarkka (eds), To Explain the Present. Studies in the Changing English Language in Honour of Matti Rissanen (Mémoires de la Société Néophilologique 52). Helsinki: Société Néophilologique, pp. 329–52.
Kytö, Merja and Matti Rissanen 1988. ‘The Helsinki Corpus of English Texts: Classifying and coding the diachronic part’. In Kytö, Ihalainen and Rissanen (eds), pp. 169-80.
Kytö, Merja and Matti Rissanen 1992. ‘A language in transition: The Helsinki Corpus of English texts’, ICAME Journal 16: 7-27.
Kytö, Merja and Matti Rissanen 1997. ‘Language Analysis and Diachronic Corpora’, In: Raymond Hickey, Merja Kytö, Ian Lancashire and Matti Rissanen (eds), Tracing the Trail of Time. Proceedings from the Second Diachronic Corpora Workshop, New College, University of Toronto, Toronto, May 1995. Amsterdam: Rodopi, pp. 9–22.
Kytö, Merja, Ossi Ihalainen, and Matti Rissanen (eds.) 1988. Corpus linguistics hard and soft. Amsterdam: Rodopi.
Kytö, Merja, Juhani Rudanko and Erik Smitterberg 2000. ‘Building a Bridge between the Present and the Past: A Corpus of 19th-century English’, ICAME Journal 24: 85–97.
Kytö, Merja (ed.). forthcoming. New Vistas into Victorian English: Studies in 19th-century Morpho-syntax. Publisher??
Kytö, Merja, Matti Rissanen and Susan Wright (eds) 1994. Corpora across the centuries. Amsterdam: Rodopi.
Lawler, John and Helen Aristar Dry (eds) 1998. Using computers in linguistics. A practical guide. London: Routledge.
Leech, Geoffrey and Christopher N. Candlin 1986. Computers in English language teaching and research. London: Longman.
Leech, Geoffrey, Greg Myers and Jenny Thomas (eds) 1995. Spoken English on computers. Transcription, mark-up, application. London: Longman.
Leech, Geoffrey, Paul Rayson and Andrew Wilson 2001. Word Frequencies in Written and Spoken English: based on the British National Corpus. London, New York: Longman.
Leitner, Gerhard (ed.) 1992. New directions in English language corpora. Methodology, results, software developments. Berlin: Mouton de Gruyter.
Lindquist, Hans and Christian Mair (eds) 2004. Corpus Approaches to Grammaticalization in English. Amsterdam: John Benjamins.
Ljung, Magnus (ed.) 1997. Corpus-based studies in English. Amsterdam: Rodopi.
Mair, Christian and Marianne Hundt (eds) 2000. Corpus Linguistics and Linguistic Theory. (Proceedings of ICAME 20). Amsterdam, Atlanta, GA: Rodopi.
Mason, Oliver 2000. Programming for corpus linguistics. Edinburgh: Edinburgh University Press.
McEnery, Tom and Andrew Wilson 2001. Corpus linguistics. An introduction. 2nd edition. Edinburgh: Edinburgh University Press.
Meurman-Solin, Anneli 1997. ‘Text profiles in the study of language variation and change’, in Hickey et al., pp. 199-214.
Meyer, Charles F. 2023. English corpus linguistics. An introduction. Second edition. Cambridge: Cambridge University Press.
Miall, David S. (ed.) 1990. Humanities and the computer. New directions. Oxford: Clarendon Press.
Moon, Rosamund 1998. Fixed Expressions and Idioms in English. A Corpus-Based Approach. Oxford: Clarendon Press.
Nevalainen, Terttu 1997. ‘Ongoing work on the Corpus of Early English Correspondence’, in Hickey et al., pp. 81-90.
Nevalainen, Terttu and Helena Raumolin-Brunberg (eds) 1996. Sociolinguistics and Language History. Studies based on the Corpus of Early English Correspondence. Amsterdam: Rodopi.
Nelson, Gerald, Sean Wallis, Bas Aarts ????. Exploring natural language. Working with the British Component of the International Corpus of English. Amsterdam: John Benjamins.
Oakes, M.P. 1998. Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
Ooi, Vincent B. Y. 1998. Computer corpus lexicography. Edinburgh: Edinburgh University Press.
Partington, Alan 1998. Patterns and meanings. Using corpora for English language research and teaching. Amsterdam: John Benjamins.
Percy, Carol, Charles F. Meyer and Ian Lancashire (eds) 1996. Synchronic corpus linguistics. Amsterdam: Rodopi.
Pérez-Guerra, Javier 1999. Historical English syntax. A statistical corpus-based study on the organisation of Early Modern English sentences. München: Lincom.
Peters, Pam, Peter Collins and Adam Smith (eds) 2002. New Frontiers of Corpus Research. Papers from the Twenty First International Conference on English Language Research on Computerized Corpora, Sydney 2000. Amsterdam: Rodopi.
Raumolin-Brunberg, Helena 1997. ‘Incorporating sociolinguistic information into a diachronic corpus of English’, in Hickey et al., pp. 105-18.
Renouf, Antoinette (ed) 1998. Explorations in corpus linguistics. Amsterdam: Rodopi.
Renouf, Antoinette and Andrew Kehoe (eds) 2006. The Changing Face of Corpus Linguistics. Amsterdam: Rodopi.
Reppen, Randi, Susan M. Fitzmaurice and Douglas Biber (eds) 2002. Using Corpora to Explore Linguistic Variation. Amsterdam: John Benjamins.
Rissanen, Matti, Merja Kytö and Kirsi Heikkonen (eds) 1997. English in transition. Corpus-based studies in linguistic variation and genre styles. Berlin: Mouton de Gruyter.
Sampson, Geoffrey 1995. English for the computer. The SUSANNE corpus and analytic scheme. Oxford: Oxford University Press.
Schmid, Hans-Jörg 2000. English abstract nouns as conceptual shells. From corpus to cognition. Berlin: Mouton de Gruyter.
Scott, Mike and Geoff Thompson (eds) 2000. Patterns of text. In honour of Michael Hoey. Amsterdam: John Benjamins.
Smitterberg, Erik 2005. The progressive in 19th-century English. A process of integration. Amsterdam: Rodopi.
Stenström, Anna-Brita, Gisle Andersen and Ingrid Kristine Hasund (eds) 2002. Trends in Teenage Talk. Corpus compilation, analysis and findings. Amsterdam: John Benjamins.
Stubbs, Michael 1996. Text and corpus analysis. Computer assisted studies of language and culture. Oxford: Blackwell.
Stubbs, Michael 2000. Words and phrases. Corpus studies of lexical semantics. Oxford: Blackwell.
Svartvik, Jan and Randolph Quirk (eds) 1980. A corpus of English conversation. Lund: Gleerup.
Thomas, Jenny and Mick Short (eds) 1996. Using corpora for language research. London: Longman.
Tognini-Bonelli, Elena 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Trotta, Joe 2000. Wh-Clauses in English. Aspects of Theory and Description. Amsterdam: Rodopi.
Wichmann, Anne, Steven Fligelstone, Tony McEnery and Gerry Knowles (eds) 1997. Teaching and language corpora. London: Longman.
Zampolli, Antonio, Nicoletta Calzolari and Martha Palmer (eds) 1994. Current issues in computational linguistics. In Honour of Don Walker. Dordrecht: Kluwer.
