The Corpus Presenter suite (Version 2022) consists of programs which are dedicated to various related functions. They can interact with each other in several ways, e.g. by using the same data stored on disk or on the clipboard. An example of this is the supplied utility Corpus Presenter Make Tree, which facilitates the linking of one's own corpus with Corpus Presenter by constructing the data set file needed to control the display and manipulation of a corpus internally in the latter program. There follows a list of the items of the suite.
1) Corpus Presenter (main program)
2) Corpus Presenter Make Tree
3) Corpus Presenter Make Database
4) Corpus Presenter Report Database
5) Corpus Presenter List Processor
6) Corpus Presenter File Manager
7) Corpus Presenter Find Files
8) Corpus Presenter Find Text
9) Corpus Presenter Text Tool
10) Corpus Presenter Word Processor
11) Corpus Presenter Internet File Editor
12) Corpus Presenter Database Editor
13) Corpus Presenter Table Editor
1) Corpus Presenter
The main program of the current suite is called Corpus Presenter. With it one can carry out all the processing tasks with a corpus of one's own or one to which one has access. If one does not have a corpus one can still load a text directly and carry out retrieval operations. To create the file necessary to process a corpus with Corpus Presenter one uses the program Corpus Presenter Make Tree (see the next program description).
Within the main program the structure of a corpus is visible from the tree on the left-hand side of the screen. By moving in this tree one can view the various files which are associated with the nodes of the tree (each node contains a descriptive reference to a particular file). For a corpus consisting of text files, these texts are displayed in a window on the right-hand side of the screen.
An essential feature of Corpus Presenter is its ability to cope with files of different medium types. It can present text files, images (maps, pictures, etc.), databases (e.g. bibliographies) and sound files (e.g. language samples). These additional types have perhaps not been envisaged by linguists so far, but the option of including images (say, facsimile pictures) in a corpus might certainly be appealing in future. Equally, for contemporary corpora, the option of including sound files would be enriching in a respect which is central to language studies. For instance, one could imagine offering a version of the London-Lund corpus with the sound files from which the printed transcriptions were derived. The test corpus shipped with Corpus Presenter has a number of sound files to illustrate how this option works.
The program recognizes multi-media files automatically and presents them appropriately. Image files are normally in the Windows Bitmap (.BMP) or the JPEG Image File (.JPG) formats (though other common formats such as GIF, TIF, WMF or PCX are also accepted). Databases should be in dBASE (.DBF) format and audio files in the Windows Wave (.WAV) format (technical note: these can be compressed into the MP3 format to save disk space and still be accepted by Corpus Presenter).
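The recognition of medium types described above can be pictured as a lookup on the file extension. The following is only a minimal sketch in Python, assuming that the extension alone decides the type; the category names and the function medium_type are hypothetical and do not reflect how Corpus Presenter works internally.

```python
from pathlib import Path

# Extension-to-medium table built from the formats named in the paragraph above.
# The category names are illustrative assumptions, not Corpus Presenter's own.
MEDIUM_BY_EXTENSION = {
    ".bmp": "image", ".jpg": "image", ".gif": "image",
    ".tif": "image", ".wmf": "image", ".pcx": "image",
    ".dbf": "database",
    ".wav": "audio", ".mp3": "audio",
}


def medium_type(filename: str) -> str:
    """Return the assumed medium type of a file, treating anything unlisted as text."""
    extension = Path(filename).suffix.lower()  # ".BMP" and ".bmp" are treated alike
    return MEDIUM_BY_EXTENSION.get(extension, "text")
```

Here a call such as medium_type("MAP.BMP") yields "image", while a file with an unlisted extension is treated as text.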
For text files, two special types are automatically recognized: RTF and HTM(L) files. An HTM(L) file is in the Hypertext Markup Language format and can be read and edited by most advanced word processors and by internet software. An RTF file is in the Rich Text Format and can equally be read without difficulty by the majority of commercially available word processors. In addition, a corpus may contain plain text files. Indeed, this is frequently the default case: very often no formatting specific to any word processor is included in a corpus, to ensure that the texts can be read on any computer system. Corpus Presenter can of course handle plain texts equally well. Such texts can, if necessary, be edited using the supplied text editor Corpus Presenter Edit, which can process ASCII and RTF files; the supplied word processor, Corpus Presenter Word Processor, can additionally deal with HTM(L) files. Databases can be edited by several programs, including two dedicated database managers (see program descriptions below).
Apart from presentation, the main operation which users will probably be interested in is searching texts. There are particularly flexible search algorithms built into Corpus Presenter. Please consult other sections of this website.
2) Corpus Presenter Make Tree
In order for Corpus Presenter to process any set of texts it must have access to a small file called a data set file (with the extension .CPD). This contains a list of the files of a corpus, labels for the nodes in a tree with which the texts are associated and information on the level in a tree structure at which a label is to appear. In addition, a data set file contains information pertaining to the general appearance of the corpus when it is displayed on screen. Such a data set file can be designed interactively with the current utility.
There is another way to make a tree for a corpus. If the files of your corpus are distributed in folders contained in a single branch of your hard disk (this is very often the case), then you can have Corpus Presenter make a tree with the same structure as the branch of your disk. In the resulting tree, the files associated with the nodes at the end of each branch are those which are contained in the folders of the branch of your hard disk. The option you need to activate for this is Make data set on the directory listing level (that from which you load any files within Corpus Presenter). In the window which then opens choose Make data set from branch of tree (hierarchical). You can also choose to make a data set file from any selected files in one directory (option: Make data set from branch of tree (flat)). By flat is meant here that the display of files with Corpus Presenter does not show a hierarchical tree structure.
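The difference between the hierarchical and the flat option described above can be sketched in Python as follows. This is an illustration of the idea only: the functions tree_from_branch and flat_listing are hypothetical, and the sketch produces plain Python lists rather than a .CPD data set file, whose internal layout is not described here.

```python
import os


def tree_from_branch(root: str) -> list[tuple[int, str, list[str]]]:
    """Walk a folder branch and return (level, label, files) for every folder in it."""
    root = os.path.abspath(root)
    nodes = []
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit subfolders in a stable, alphabetical order
        relative = os.path.relpath(folder, root)
        # The root folder is level 0; each further folder level adds 1.
        level = 0 if relative == "." else relative.count(os.sep) + 1
        nodes.append((level, os.path.basename(folder), sorted(files)))
    return nodes


def flat_listing(folder: str) -> list[str]:
    """Return the files of one folder only, with no hierarchy."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
```

In the hierarchical case each folder becomes a node whose level mirrors its depth on disk; in the flat case only a single list of files results.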
Notes
(1) With any set of files, loosely put together or organised into a corpus, you can have more than one tree to display them in different ways. For instance, you might have a group of 100 files and you might want to view them ordered by date (for instance, if they were historical texts stretching back through time). But on another occasion you might also wish to have them displayed by author, by type of text, by their contents, etc. For each of these aspects you could design a tree with Corpus Presenter Make Tree. In fact you can just edit an initial data set file (which determines the tree which is displayed). Remember that with different trees you also have different search possibilities. To see what this means in a particular instance, see the supplied Corpus of Irish English for which a general data set file is supplied, CIE.cpd, and also additional data set files which display the files of the corpus in different ways. None of these different arrangements affects the text of your corpus: the files themselves are not altered in any way.
(2) Corpus Presenter Make Tree, which is a new program supplied as of Version 10, replaces the older utility Corpus Presenter Create Data Set. If you have been using the latter, then you can switch to the new program without difficulty: the files it loads and saves to disk have the same structure as with the earlier program. This means that you can use Corpus Presenter Make Tree to process existing trees.
CP Make Tree [screen shots and further information]
3) Corpus Presenter Make Database
To collect data with a database manager one must create a database or use an existing one. Even in the latter case one may well find that one's conception of how data should be arranged alters with time and so the need arises to change the structure of a database or just create a new one. In either case, the current utility will help to fulfil this task swiftly in an interactive, user-friendly environment.
4) Corpus Presenter Report Database
When one wishes to export data from a database it is necessary to specify how this is to be arranged in the output file generated. A small file called a report form determines how data from fields is arranged in the output text. One can have different report forms for one and the same database, which greatly increases flexibility. For instance, when outputting bibliographical data, one could use different report forms corresponding to different style sheets, which would obviate the necessity of hardwiring style-sheet preferences into the structure of the database. With the present utility one can design report forms interactively.
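The principle of keeping the arrangement of output separate from the database can be sketched in Python as follows. The field names, the two templates and the example record are all invented for illustration; the actual report form format used by the suite is not shown here and is not assumed to resemble this syntax.

```python
# Two hypothetical report forms for one and the same bibliographical record.
# The template syntax is Python's own str.format, used here only as a stand-in.
AUTHOR_DATE = "{author} ({year}). {title}. {place}: {publisher}."
TITLE_FIRST = "{title}, by {author}. {publisher}, {place}, {year}."


def render(report_form: str, record: dict[str, str]) -> str:
    """Arrange the fields of a record in the order a report form prescribes."""
    return report_form.format(**record)


# An invented placeholder record, not a real reference.
record = {
    "author": "Doe, Jane",
    "year": "1999",
    "title": "A Sample Title",
    "place": "Dublin",
    "publisher": "Example Press",
}

print(render(AUTHOR_DATE, record))
print(render(TITLE_FIRST, record))
```

The record itself stays the same in both cases; only the report form decides the order and punctuation of the fields.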
5) Corpus Presenter List Processor
This is a small program which is called from either the text editor or the word processor and which enables you to sort or merge any lists. It can also create unique output lists from a mixed input and generate a delimited text file for further processing by a database manager.
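The operations named above (merging, sorting, reducing to unique entries and writing a delimited text file) can be sketched in Python as follows. The function names are hypothetical, and the decisions to compare entries without regard to case and to use a semicolon as delimiter are assumptions of this sketch, not documented behaviour of the List Processor.

```python
import csv
import io


def merge_sorted_unique(*lists: list[str]) -> list[str]:
    """Merge several lists, drop duplicate entries and return the result sorted."""
    seen: dict[str, str] = {}
    for items in lists:
        for item in items:
            entry = item.strip()
            if entry:
                # Assumption: "Apple" and "apple" count as the same entry;
                # the first spelling encountered is the one kept.
                seen.setdefault(entry.lower(), entry)
    return sorted(seen.values(), key=str.lower)


def to_delimited(rows: list[list[str]], delimiter: str = ";") -> str:
    """Return rows as delimited text of the kind a database manager can import."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```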
6) Corpus Presenter File Manager
A file manager is necessary for all the house-keeping tasks which one has to carry out on a computer. This utility has many special features such as incremental backup which is useful when dealing with large amounts of text, such as in a corpus, which may be variously modified during work sessions and hence in need of backup to a permanent separate medium such as high-capacity disks.
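The idea behind incremental backup, namely copying only what has changed since the last backup, can be sketched in Python as follows. This is a minimal sketch which assumes that a newer modification time is the sole criterion; the function incremental_backup is hypothetical and the File Manager may well decide differently. The target folder is assumed to lie outside the source folder.

```python
import os
import shutil


def incremental_backup(source: str, target: str) -> list[str]:
    """Copy files that are new or modified since the last backup; return their paths."""
    copied = []
    for folder, _subfolders, files in os.walk(source):
        relative = os.path.relpath(folder, source)
        destination = os.path.normpath(os.path.join(target, relative))
        os.makedirs(destination, exist_ok=True)
        for name in sorted(files):
            original = os.path.join(folder, name)
            backup = os.path.join(destination, name)
            # Copy when no backup exists yet or the original is newer than it.
            if (not os.path.exists(backup)
                    or os.path.getmtime(original) > os.path.getmtime(backup)):
                shutil.copy2(original, backup)  # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(relative, name)))
    return copied
```

On a second run with no intervening changes nothing is copied, which is what makes the procedure economical with large amounts of text.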
7) Corpus Presenter Find Files
With Corpus Presenter Find Files you can search for any files on your computer. All you need do is enter a part of the name of the file(s) you are looking for. The program returns all files in all directories which contain the string you entered in their names. The returns are displayed in a grid from which you can view, edit, copy, move or delete any or all of the returned files.
8) Corpus Presenter Find Text
Normally when compiling a corpus one is dealing with several texts and it may often be necessary to search for strings across the entire group or even through a complete drive. The present program will perform this task. A range of options makes it a flexible tool for text retrieval.
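Searching for a string across a whole group of texts, as described above, can be sketched in Python as follows. The function find_text is hypothetical; the sketch assumes plain text files in a known encoding and a simple case-sensitive match, and it offers none of the options of the actual program.

```python
import os


def find_text(root: str, search_string: str,
              encoding: str = "utf-8") -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line containing the search string."""
    hits = []
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # keep the order of the returns stable
        for name in sorted(files):
            path = os.path.join(folder, name)
            try:
                # Undecodable bytes are replaced rather than aborting the search.
                with open(path, encoding=encoding, errors="replace") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        if search_string in line:
                            hits.append((path, line_number, line.rstrip("\n")))
            except OSError:
                continue  # skip files which cannot be opened
    return hits
```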
9) Corpus Presenter Text Tool
There are many functions which users might wish to perform on texts outside of the main program Corpus Presenter. For instance, one might want to normalise a set of historical texts before carrying out retrieval tasks. Or one might be interested in lexical clustering analysis, i.e. in determining what recurring word patterns are to be found in a set of texts. These and other functions are possible with the present program. Note that texts are tagged with this editor (see section 3 Tagging texts below).
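One simple way of determining recurring word patterns in a set of texts, as mentioned above, is to count sequences of adjacent words. The following Python sketch does just that; it is an assumption about what such an analysis might involve, not a description of the algorithm used by Corpus Presenter Text Tool, and the function recurring_patterns is hypothetical.

```python
import re
from collections import Counter


def recurring_patterns(texts: list[str], size: int = 2,
                       min_count: int = 2) -> list[tuple[str, int]]:
    """Count word sequences of a given size and return those which recur."""
    counts: Counter = Counter()
    for text in texts:
        words = re.findall(r"\w+", text.lower())
        # zip over shifted copies of the word list yields every run of `size` words.
        counts.update(zip(*(words[i:] for i in range(size))))
    return [
        (" ".join(sequence), number)
        for sequence, number in counts.most_common()
        if number >= min_count
    ]
```

With the default settings only pairs of words occurring at least twice are returned, most frequent first.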
10) Corpus Presenter Word Processor
The aim of a word processor is to allow the processing of formatted output, e.g. when preparing a text for printing. Hence the options it contains differ somewhat from those of a text editor. The supplied word processor has many formatting options concerning the appearance of a document which go beyond those of the text editor. The trade-off is a slight reduction in the speed of text processing.
11) Corpus Presenter Internet File Editor
The current program enables you to create your own homepages in an interactive environment. You can edit HTML files directly as normal text files, preview the output with an internet browser and edit text in a WYSIWYG environment as well. There is a whole range of options - including flexible text macros and a database of code - to make homepage generation easy.
Internet File Editor [screen shot]
12) Corpus Presenter Database Editor
For speedy processing of databases the present utility is useful. It is intended for quick and efficient editing of databases which might be used to store the lexical data of a corpus. The program contains many flexible copying facilities and a large number of text macro options which save on keying in repetitive text. It also interfaces directly with texts via an internal Rich Text Format editor.
13) Corpus Presenter Table Editor
Tables are structures where data is presented in the form of rows and columns. This is the primary form in which to save retrieval returns within Corpus Presenter on the Advanced Search level. Such finds can be loaded from disk into the present program and further processed. One can create new tables, copy data to and fro and interface with databases if one wishes as well as exporting table data to an RTF file for further editing with a word processor.