Data sources

Corpora and data sources

A number of corpora are available for the study of Irish English. These vary in range, size and stage of completeness. The following list is intended to give an impression of the current situation. Only those corpora which are in the public domain, or to which the academic community has some kind of access, are mentioned here. Many authors who work on Irish English have collections of data which they have used, and continue to use, for their linguistic analyses. The nature of these data is normally discussed in their studies, but access to them is not provided through printed material or online sources.

For general information (with links) on various corpora available for varieties of English, go to this website: www.english-corpora.org

A Corpus of Irish English Correspondence (CORIECOR)

This corpus, compiled by Carolina P. Amador Moreno (University of Extremadura, Cáceres, Spain) and Kevin McCafferty (University of Bergen, Norway, now retired), consists of emigrant letters from Ireland written over the past two to three centuries. The data are unique in providing a window on vernacular Irish English usage (mostly of Northern Irish origin) and have already served as the basis for a number of PhD theses at the University of Bergen. For more details, see the following:

McCafferty, Kevin and Carolina P. Amador Moreno 2012. ‘A Corpus of Irish English Correspondence (CORIECOR): A tool for studying the history and evolution of Irish English’, in: Bettina Migge and Máire Ní Chiosáin (eds), New Perspectives in Irish English. Amsterdam: John Benjamins, pp. 265-288.

There is also an online resource offering a visual presentation of the corpus: click here

The Tape-Recorded Survey of Hiberno-English Speech

This survey was initiated in the 1970s and carried out for about a decade at the Department of English, Queen’s University, Belfast, under the direction of Dr. Michael Barry. It was discontinued in the early 1980s. Some of the material that was publicly available at the time can be found on the DVD accompanying the book A Sound Atlas of Irish English by the present author (see the relevant node in the tree on the left).

National Museums Northern Ireland are the current curators of all the existing sound recordings derived from some 539 interviews. For more information on the sound archive, click here

The Northern Ireland Transcribed Corpus of Speech

This corpus was compiled during the 1990s under the supervision of Dr. John Kirk, Department of English, Queen’s University, Belfast. It consists of transcriptions of a selection of the tape recordings made for the Tape-Recorded Survey of Hiberno-English Speech. For information, please contact Dr. Kirk at info@johnmkirk.co.uk. The corpus has been used in a number of investigations, such as that by Simone Zwickl (see Zwickl 2002 in the references section).

For further information on this corpus, see:

Kirk, John M. 1992. ‘The Northern Ireland Transcribed Corpus of Speech’, in Leitner, Gerhard (ed.) New Directions in English Language Corpora. Berlin: Mouton de Gruyter, pp. 65-73.

ICE-Ireland

This corpus was compiled over a period of more than ten years as part of the International Corpus of English (ICE) project. It is a collection of texts which represent fairly standard forms of written Irish English. In recent years it has been used for studies by Jeffrey Kallen (Trinity College Dublin) and John Kirk. The material is not yet available to the general public, though making it available is the intention, as set out in the guidelines for the ICE project as a whole. For further information, please contact either of the researchers just mentioned.

There is a general website for the entire project at the University of Zurich: International Corpus of English

Studies arising from ICE-Ireland:

Kirk, John M., Jeffrey L. Kallen, Orla Lowry and A. Rooney 2004. ‘Issues arising from the compilation of ICE-Ireland’, in Belfast Papers in Language and Linguistics 16: 23-41.

Kirk, John M., Jeffrey L. Kallen, Orla Lowry and A. Rooney in press. ‘The compilation of ICE-Ireland: unity versus diversity’, in Antoinette Renouf and A. Kehoe (eds) The changing face of corpus linguistics. Amsterdam: Rodopi.

Limerick Corpus of Irish English

Compiled by a team at the University of Limerick, this corpus has been used for a number of linguistic investigations. It is synchronically oriented, and one of its primary aims is to document pragmatic features of present-day Irish English. More information can now be found on Sketch Engine: https://www.sketchengine.eu/limerick-corpus-of-irish-english/. Some results of analyses of these data have been published in the following volume:

Barron, Anne and Klaus Schneider (eds) 2005. The pragmatics of Irish English. Berlin: Mouton de Gruyter.

CELT Corpus of Electronic Texts

Based at University College Cork (National University of Ireland, Cork), this corpus consists mainly of historical texts in Irish, with some in English as well. The amount of English material is slight and does not appear to have been gathered with a view to linguistic analysis.

Website: https://celt.ucc.ie/

Online sources for other varieties of English

Website: The Newcastle Electronic Corpus of Tyneside English

Website: American Dialect Society

Website: Linguistic Atlas Projects at the University of Georgia

Website: Information on language in Newfoundland

Website: Dictionary of Newfoundland English

Website: The Origins of New Zealand English project

Other corpus projects of relevance to English studies

Website: The British National Corpus

Website: ICAME (International Computer Archive of Modern English)

Website: Research Unit for Variation, Contacts and Change in English (University of Helsinki)