Sources for varieties of English
Text corpora
Historical records
Information about varieties in previous centuries can be gleaned from a number of sources. These can be classified by type. Each type has its own advantages and disadvantages. The more types of historical record available, the better. Frequently, one has to make do with a limited set of sources and reconstruct features of varieties on the basis of fragmentary material.
| Emigrant letters | People who emigrated in previous centuries wrote back home, usually to maintain contact with friends and relatives. Because of this, letters from emigrants are available in archives today. Such material is usually non-prescriptive, i.e. written in a colloquial style without undue consideration of normative grammar. Hence it is a good source of information on varieties and, when used judiciously, can be useful for linguistic analyses. |
| Personal accounts | Apart from letters, there are also documents of various kinds in which speakers offer personal accounts of their lives and experiences. Some of these have been recorded deliberately, e.g. the accounts of life under slavery or in other adverse conditions. Such texts are not normally written in the variety in question, unless verbatim transcripts of what was said by informants are used. In this context one could also mention court records in which the statements of accused persons and/or defendants were written down by court clerks. |
| Dialect glossaries | From the 17th century onwards, a certain antiquarian interest in dialect vocabulary can be observed. Collections of words from diverse regions of the British Isles are available and are often a good source of material on the varieties spoken there. Such material is almost entirely lexical, i.e. information about pronunciation and grammar is not normally included. |
| Literary satires | Already with Chaucer (in the 14th century) one finds dialect material used to characterise figures in literary works. Shakespeare and Ben Jonson are prominent Elizabethan writers who kept up this tradition. Many satires contain figures from the Celtic regions, i.e. Irish, Scottish or Welsh characters, especially in drama from the 17th century onwards. The accuracy of such portrayals is often doubtful because many of the authors were English and did not have first-hand knowledge of the speech they were satirising. In addition, there are limits on the linguistic features which can be represented using so-called ‘eye dialect’, i.e. changes in spelling to indicate dialect traits in writing. |
| Rhyming material | End rhyme, in poetry and sometimes in drama, can be a source of information on the pronunciation of vowels. For instance, one could check whether ‘eat’ and ‘great’ or ‘past’ and ‘waste’ rhyme for a particular author. This could indicate whether the first word in each pair still had the vowel /e:/ or /a:/ respectively. |
| Prescriptive comments | From the 18th century onwards, there are many works in which authors complain about regional pronunciation and grammar. This is connected with the rise of prescriptivism, i.e. strict notions of what was ‘correct’ in language and which variety was taken to be socially acceptable, and by implication which other forms were not. Authors often cite supposedly ‘incorrect’ usage and thus inadvertently supply present-day linguists with information about regional varieties of English in previous centuries. |
Selected references for different types of historical records
Emigrant letters
Hickey, Raymond (ed.) 2019. Keeping in Touch. Emigrant letters across the English-speaking world. Amsterdam: John Benjamins.
Montgomery, Michael 1995. ‘The linguistic value of Ulster emigrant letters’, Ulster Folklife 41: 1-15.
Personal accounts
Fitzpatrick, David 1994. Oceans of Consolation. Personal Accounts of Irish Migration to Australia. Cork: University Press.
Rickford, John R. and Jerome S. Handler 1994. ‘Textual evidence on the nature of early Barbadian speech, 1676-1835’, Journal of Pidgin and Creole Languages 9.2: 221-55.
Stanihurst, Richard 1965 [1577]. ‘The description of Ireland’, in Chronicles of England, Scotlande and Irelande, edited by R. Holinshed. London. Reprinted by AMS Press.
Dialect glossaries
Barnes, William (ed.) 1867. A Glossary, with Some Pieces of Verse, of the Old Dialect of the English Colony in the Baronies of Forth and Bargy, County of Wexford, Ireland Formerly Collected by Jacob Poole. London: J. R. Smith.
Ray, John 1674. A collection of English words not generally used. London.
Vallancey, Charles 1788. ‘Memoir of the language, manners, and customs of an Anglo-Saxon colony settled in the baronies of Forth and Bargie, in the County of Wexford, Ireland, in 1167, 1168, 1169’, Transactions of the Royal Irish Academy 2: 19-41.
Literary satires
Bartley, J. O. 1954. Teague, Shenkin and Sawney: Being an Historical Study of the Earliest Irish, Welsh and Scottish Characters in English Plays. Cork: University Press.
Bliss, Alan J. 1979. Spoken English in Ireland 1600-1740. Twenty-seven Representative Texts Assembled and Analysed. Dublin: Cadenus Press.
Duggan, G. C. 1969 [1937]. The Stage Irishman: A History of the Irish Play and Stage Characters from Earliest Times. Dublin and Cork/London: Talbot Press.
Jonson, Ben 1969. The Complete Masques. Edited by Stephen Orgel. New Haven, London: Yale University Press.
Sullivan, James 1980. ‘The validity of literary dialect: evidence from the theatrical portrayal of Hiberno-English’, Language in Society 9: 195-219.
Rhyming material
Kniezsa, Veronika 1985. ‘Jonathan Swift’s English’, in Siegmund-Schultze (ed.), pp. 116-24.
Siegmund-Schultze, Dorothea (ed.) 1985. Irland. Gesellschaft und Kultur. [Ireland. Society and culture] Vol. 4. Halle: University Press.
Prescriptive comments
Patterson, David 1860. The Provincialisms of Belfast and the Surrounding Districts Pointed Out and Corrected; to which is Added an Essay on Mutual Improvement Societies. Belfast: Alexander Mayne.
Sheridan, Thomas 1781. A Rhetorical Grammar of the English Language Calculated Solely for the Purpose of Teaching Propriety of Pronunciation and Justness of Delivery, in that Tongue. Dublin: Price.
Sheridan, Thomas 1967 [1780]. A general dictionary of the English language. 2 vols. Menston: The Scolar Press.
Sheridan, Thomas 1970 [1762]. A Course of Lectures on Elocution. Hildesheim: Georg Olms.
Text corpora
| Name | Compiling institution / individuals |
| ARCHER, a corpus of British and American English from 1650-1990 | Douglas Biber and associates at Northern Arizona University in collaboration with colleagues at the University of Freiburg, Germany |
| Australian Corpus of English | Department of Linguistics, Macquarie University, NSW, Australia |
| Bank of English | University of Birmingham, sponsored by the publisher HarperCollins |
| British National Corpus | Consortium under the aegis of Oxford University Press |
| The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English | A parsed section of the original Helsinki corpus prepared by a number of linguists |
| Brown Corpus of Standard American English | W. Nelson Francis and Henry Kučera, Brown University, Providence, Rhode Island |
| Corpus of Nineteenth Century English | Merja Kytö and associates, Uppsala University, Sweden |
| Corpus of English Dialogues 1560-1760 | Merja Kytö, Uppsala University, Sweden and Jonathan Culpeper, Lancaster University, England |
| Corpus of Early English Correspondence | Terttu Nevalainen and Helena Raumolin-Brunberg, University of Helsinki, Finland |
| Corpus of Early English Medical Writing | Irma Taavitsainen, University of Helsinki, Finland |
| Corpus of Contemporary American English (COCA) | Mark Davies, Brigham Young University, Provo, Utah, USA |
| Corpus of Historical American English (COHA) | Mark Davies, Brigham Young University, Provo, Utah, USA |
| A Corpus of Irish English | Raymond Hickey, University of Limerick, Ireland (originally packaged with Corpus Presenter, Software for Language Analysis, Amsterdam: John Benjamins, 2003, now available in an extended form at http://www.raymondhickey.com/index_(RH).html, Version 2026, Build 2.0, March 2026). |
| Corpus of Late Modern English Texts, 1710-1920 | Hendrik De Smet, KU Leuven, Belgium |
| Corpus of London Teenage Language (COLT) | Anna-Britta Stenström and associates, Department of English, University of Bergen |
| Corpus of Middle English Prose and Verse | University of Michigan, Michigan |
| Corpus of Religious Prose | Thomas Kohnen, University of Cologne, Germany |
| Early English Books Online (EEBO, 1475-1700) | ProQuest (educational company) |
| Freiburg-Brown Corpus of American English (FROWN) | Christian Mair and associates, University of Freiburg, Germany |
| Freiburg-LOB Corpus of British English (FLOB) | Christian Mair and associates, University of Freiburg, Germany |
| The Hansard Corpus (British parliament records) | University of Glasgow and others, 2014-2016 |
| The Helsinki Corpus of Older Scots | Anneli Meurman-Solin, Department of English, University of Helsinki, Finland |
| Innsbruck Corpus Archive of Middle English Texts (ICAMET) | Manfred Markus, University of Innsbruck, Austria |
| International Corpus of English (ICE), collection of corpora from various anglophone countries, now (2005) partially completed | Co-ordinated by the Department of English, University College London, England |
| Kolhapur Corpus of Indian English | Shivaji University, Kolhapur |
| Lampeter Corpus of Early Modern English Tracts | Josef Schmied, Technical University Chemnitz, Germany |
| Lancaster-Oslo-Bergen Corpus of British English | Collaborative effort of the universities in the three cities named in title |
| London-Lund Corpus of Spoken English | Departments of English at University College London, England and Lund University, Sweden |
| Middle English Medical Texts | Irma Taavitsainen, Päivi Pahta and Martti Mäkinen, Department of English, University of Helsinki, Finland. Retrieval software by Raymond Hickey. Published by John Benjamins, 2005. |
| Northern Ireland Transcribed Corpus of Speech (NITCS) | John Kirk, Department of English, Queen’s University, Belfast, Northern Ireland |
| Old Bailey Court Depositions | Department of History, University of Sheffield |
| Penn-Helsinki Parsed Corpus of Middle English | University of Pennsylvania, Philadelphia, Pennsylvania |
| Santa Barbara Corpus of Spoken American English | University of California, Santa Barbara |
| Zurich English Newspaper Corpus | Udo Fries and associates, Department of English, Zurich University |