It is common nowadays for programmers to think in advance about what questions users might have. In this vein I have put together a list of probable questions and their answers, in the hope of anticipating the kinds of queries which users approaching this software are likely to have.
Corpus processing
Corpus compilation
Using databases
Miscellaneous
Q: How is this software package organised?
A: It comes as a set of programs, the main one of which is called Corpus Presenter and which gives the name to the entire suite. You install it directly from a CD-ROM to the hard disk of your computer. At the end of the installation procedure you have a folder with shortcuts on the Windows desktop. All programs are accessible via this folder. The supplied test corpus can be viewed with Corpus Presenter straight away as can A Corpus of Irish English (also on the CD-ROM). You can use the Program Launcher to start any of the programs.
Corpus processing
Q: What is the quickest way to start?
A: The best way to start is by trying out the supplied test corpus. This contains a variety of data types: texts, databases, images and sound files. Select the data set file TEST_CP.CPD when prompted to load a file at the beginning. If the initial help text is displayed when you first load Corpus Presenter then you can click on the button Load supplied corpus. The data set file TEST_CP.CPD contains references to all the supplied data files which are then displayed in a structured tree form. By clicking on the node of the tree you can view the file which is associated with this node.
Q: Can I use my own files directly with Corpus Presenter?
A: Yes. When the dialogue window Open a data set appears at the beginning or when you choose to work with a new corpus from within Corpus Presenter (press Ctrl-O for this) you click on the button Load text file(s) directly. Then all that is required is that you select the files you wish to use and press Shift-F12. The files are then displayed as a simple tree with just one level on which the name of each file is to be found. You can also press Ctrl-D from the main level of Corpus Presenter to load files directly. Any ASCII or RTF files can be loaded as can HTML (Internet) files. You can also use XML files. Files can be converted before being used for searches if you wish: click on the button Convert files on the directory listing level.
Q: Can I make a corpus with Corpus Presenter?
A: Yes. There are basically three ways to do this: (i) generate a corpus from a selection of files on the directory listing level, (ii) make a corpus from a branch of the hard disk, again on the directory listing level (both these options are reached by clicking on the button Make data set), (iii) design a corpus using the supplied utility Corpus Presenter Create. If you choose the last method, then it is sensible to make a copy of the supplied test corpus TEST_CP.CPD and then alter this to suit your needs.
Q: How can I view a corpus?
A: The default display mode uses a tree which may contain nodes on several embedded levels. Each node on the tree has a label and a file which is associated with it. Clicking on the label leads to the associated file being displayed. A second mode is also available. This is the list mode in which all files are listed in the order in which they occur in the tree. The advantage of this mode is that you can select any file by simply clicking on the label. You can then demand that a retrieval operation apply to the group of checked files in the list. If you wish you can also derive a sub-corpus from the checked files. Should you keep to the tree display then retrieval can apply to all files of a corpus, just those in a branch of the tree, to the current file and all others from there to the end of the tree or simply to the currently selected file.
Q: How do I search for strings with Corpus Presenter?
A: The first thing is to load a corpus, say the test corpus referred to above, or any set of files which you select from a directory listing. Then you can either choose the option Search, Quick search (Alt-L) or the option Search, Advanced search Level (Ctrl-L) or click on the tool button with either the magnifying glass or the binoculars. With the first option a window opens and you can type anything (string, word or phrase) which you wish to look for. The returns can be stored in a list and then copied from there to any of a number of destinations. In the second case you shift to the advanced search level. You see the current text and on the top of the screen various options pertaining to retrieval operations are available. The most important one to begin with is that labelled Parameters which opens a large window in which you can specify the various parameters for a search such as the strings to be located, the range of texts, the nature and expected distribution of strings in a text, etc.
There are a large number of options available on the retrieval level, e.g. you can specify exactly how the retrieval information returned by Corpus Presenter is arranged. It is important to take time to explore the options put at your disposal here in order to grasp the real potential of the program.
Note that there is a Simple Search function (in the Search menu, shortcut: Ctrl-F) which allows you to find strings swiftly without moving to the retrieval level. You can also search through databases, assuming that there is at least one in the corpus you are currently processing.
Q: Can I use wild cards during searches?
A: Yes. When generating a word list or when locating strings, the wildcards ? (question mark, stands for one character) and * (asterisk, stands for any number of characters) are legal. For instance, you could search for do* which would return do, does, don't, doing, done in a typical modern English text. An entry like he?d would probably return head and heed, again in a modern English text, whereas he*d could also return heaved, heard because the asterisk can stand for more than one character.
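The behaviour of the two wildcards can be sketched in a few lines of Python. This is only an illustration of the matching rule, not the code Corpus Presenter itself uses; the function names are hypothetical, and the sketch assumes that * may also stand for no character at all, since do* returns do in the example above.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a ?/* wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")        # exactly one word character
        elif ch == "*":
            parts.append(r"[\w']*")    # assumption: zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def match_words(pattern, words):
    """Return the words which match the wildcard pattern as a whole."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

if __name__ == "__main__":
    sample = ["do", "does", "don't", "doing", "done", "head", "heed", "heard", "heaved"]
    print(match_words("do*", sample))
    print(match_words("he?d", sample))
    print(match_words("he*d", sample))
```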
Q: Can I search for collocations in a corpus?
A: Yes. On either the Quick search or the Advanced search level you can choose to rearrange returns from a search in such a way that up to six words before and after the search string are arranged in a grid which can be sorted on any field by just clicking on its column. In addition the number of times a certain word occurs before or after a search string is shown and percentages are given. In both search modules there is a command Determine collocation which will initiate this process.
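The kind of table which lies behind such a collocation grid can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text is already split into words, the search string is a single word, and the function names are hypothetical rather than anything in Corpus Presenter.

```python
from collections import Counter

def collocates(tokens, target, span=6):
    """Count the words found at each position up to `span` words before/after `target`."""
    table = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for off in table:
            j = i + off
            if 0 <= j < len(tokens):   # ignore positions outside the text
                table[off][tokens[j]] += 1
    return table

def percentages(counter):
    """Convert the counts for one position into percentages."""
    total = sum(counter.values())
    return {w: 100.0 * n / total for w, n in counter.items()} if total else {}

if __name__ == "__main__":
    words = "the cat sat on the mat and the cat ran".split()
    table = collocates(words, "cat", span=2)
    print(table[-1])               # words directly before the search word
    print(percentages(table[1]))   # share of each word directly after it
```

Each key of the returned table corresponds to one column of the grid (negative offsets before the search string, positive ones after it).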
Q: Can I do complex searches with more than one string?
A: Yes. The advanced search level provides the most sophisticated options in this respect. It allows you to search for syntactic frames, i.e. String1 followed by String2 with a specifiable amount of material in between. Furthermore, you can say whether String1 or String2 are entire words, the beginning or end of a word or contained anywhere in a word.
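A search for a syntactic frame of this kind can be sketched in Python. The sketch measures the intervening material in words and uses four hypothetical mode names (whole, start, end, anywhere) for the ways a string may match a word; both choices are assumptions made for illustration, not a description of the program's internals.

```python
def word_matches(word, s, mode):
    """Test a word against a string in one of four ways."""
    if mode == "whole":
        return word == s
    if mode == "start":
        return word.startswith(s)
    if mode == "end":
        return word.endswith(s)
    if mode == "anywhere":
        return s in word
    raise ValueError("unknown mode: " + mode)

def find_frames(tokens, s1, s2, max_gap=3, mode1="whole", mode2="whole"):
    """Find String1 followed by String2 with at most `max_gap` words in between."""
    hits = []
    for i, word in enumerate(tokens):
        if not word_matches(word, s1, mode1):
            continue
        last = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, last):
            if word_matches(tokens[j], s2, mode2):
                hits.append((i, j))   # positions of the two matching words
                break
    return hits

if __name__ == "__main__":
    words = "she was quickly going home and is now running".split()
    # "was" as an entire word, followed by a word ending in "ing"
    print(find_frames(words, "was", "ing", max_gap=1, mode2="end"))
```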
Q: Can I search through only part of a corpus?
A: Yes. All retrieval functions allow you to specify whether the search is to apply to 1) all files in the corpus, 2) only the current branch, 3) from the current file to the end of the corpus, 4) only the current file and 5) checked files. The last option is the most flexible as it allows you to mark files in a corpus (irrespective of their position in the tree) and only carries out the search on these checked files.
Q: Can I access Cocoa header parameters when searching through a text?
A: Yes. For instance, if you are using the Helsinki corpus then on the advanced search level you can specify certain values for certain Cocoa header parameters which must apply for texts to be included in a search. You might wish to search through only those texts which are prose translations or verse by female writers. In such cases you would demand that the appropriate values for the relevant Cocoa parameters be matched in a text before it is searched.
Q: How do I deal with spelling variants in a corpus?
A: The answer to this problem is quite simple. When on the basic or advanced search level you specify that the search is to use an input list and not a single string. An input list can consist of any strings or words on the lines of a text. The search is carried out by examining the text for each of the items in the input file. This file need not contain just spelling variants, it can be used for any number of items which you wish to treat as a group, say a set of pronouns which you are interested in.
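The principle of searching with an input list can be sketched in Python. The sketch assumes one item per line in the input list and counts whole-word matches regardless of case; the function names are hypothetical and the sample variants are merely illustrative.

```python
import re
from collections import Counter

def parse_input_list(list_text):
    """One search item per line; blank lines are skipped."""
    return [line.strip() for line in list_text.splitlines() if line.strip()]

def search_with_input_list(text, items):
    """Count whole-word occurrences of every item in the input list."""
    counts = Counter()
    for item in items:
        rx = re.compile(r"\b" + re.escape(item) + r"\b", re.IGNORECASE)
        counts[item] = len(rx.findall(text))
    return counts

if __name__ == "__main__":
    variants = parse_input_list("through\nthurgh\nthorow\n")
    sample = "Thorow the wode he rode, thurgh the wood, through the night."
    hits = search_with_input_list(sample, variants)
    print(hits, "total:", sum(hits.values()))
```

Summing the counts treats the items as a single group, which is the point of using an input list for spelling variants.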
Q: Can I specify how returns are to be arranged?
A: Yes. A number of arrangements are possible, particularly on the advanced search level. These range from plain text to a multi-line grid and include the option of storing the information in database form. The multi-line grid can be saved to disk in re-loadable format so that you can view the work from a previous session at a later point in time. The amount of text which is to flank the retrieval returns can also be specified. You can also say that the entire sentence in which a return is embedded is to be delivered.
Q: Can I view texts with retrieval returns?
A: Yes. All retrieval functions have an option Goto text which will cause Corpus Presenter to jump to the position in the text where the current text return was found. By these means you can check up on the context from which a return is derived.
Q: Can I generate a reverse dictionary from input texts?
A: Yes. The text statistics window includes a list in which the unique words of a text can be deposited in reverse order. You can decide how this list is to be stored (as a whole or in part), to disk or the Windows clipboard.
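What a reverse dictionary contains can be shown with a short Python sketch. It assumes the usual sense of the term, i.e. the unique words of a text sorted by their endings rather than their beginnings; whether the list in Corpus Presenter is ordered in exactly this way is an assumption, and the function name is hypothetical.

```python
import re

def reverse_dictionary(text):
    """Return the unique words of a text, sorted from the end of each word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words, key=lambda w: w[::-1])

if __name__ == "__main__":
    for word in reverse_dictionary("walking talked walked talking sing"):
        print(word.rjust(12))   # right-aligned so that the endings line up
```

Such a list groups words with the same suffix together, which is useful when studying inflectional endings.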
Q: What can I do with output lists from retrieval tasks?
A: Frequently the output from a particular task within Corpus Presenter is a list which can be copied to the Windows clipboard or saved directly to disk. Such a list can further be processed with the program List Processor. This will allow you to sort lists, create a unique list (i.e. a list of types from a list of tokens), combine two input lists to a single output one, etc. An output list, either directly from Corpus Presenter or filtered through List Processor can be imported into a database. This would be useful when doing lexical work with corpus files as a database is a kind of dictionary.
Q: Is lexical cluster analysis possible with the Corpus Presenter suite?
A: Yes. The program Corpus Presenter Text Tool will allow you to do this. The principle is quite simple. You load a text or texts and then specify the number of words per cluster (from 1 to 8). The program then combs through the texts, gathers every cluster of that length and orders the clusters alphabetically or by frequency. This procedure can be useful when trying to determine a writer’s style as typical combinations of words become obvious in the analysis.
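The principle can be sketched in Python as follows. This is an illustration only: the way words are delimited (lower-case letters and apostrophes) is an assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

def clusters(text, n):
    """Count every sequence of n consecutive words (n from 1 to 8)."""
    if not 1 <= n <= 8:
        raise ValueError("cluster size must be between 1 and 8")
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def by_frequency(counts):
    """Order clusters by descending frequency, then alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def alphabetically(counts):
    """Order clusters alphabetically."""
    return sorted(counts.items())

if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times"
    for cluster, freq in by_frequency(clusters(sample, 2)):
        print(freq, cluster)
```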
Q: Can I generate a concordance with the Corpus Presenter suite?
A: Yes. The main program Corpus Presenter allows you to arrange and export returns from a search formatted as a concordance. With both the Quick search and the Advanced search modules you can choose the export option Make HTML file and the returns are arranged accordingly and written to a file of your choice. Similar output options are available with the Word List module.
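The layout of a concordance (keyword in context) can be sketched in Python. The sketch produces plain text rather than the HTML file which Corpus Presenter writes, and the function names and the default context width are assumptions made for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def format_kwic(lines):
    """Right-align the left contexts so that the keywords form a column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return "\n".join(f"{left:>{pad}}  {key}  {right}" for left, key, right in lines)

if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    print(format_kwic(kwic(words, "the", width=2)))
```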
Corpus compilation
Q: How do I collect and prepare texts?
A: The best way is to use the supplied text editor. This comes in two forms. The first, Corpus Presenter Text Editor, is a powerful editor with a whole range of useful functions including many shortcuts which save you from entering repetitive text. The program can handle plain ASCII texts, i.e. those without formatting, and rich text format files in which attributes like bold or italic are retained. The second program is called Corpus Presenter Text Tool and is similar to the first one. It has a different set of functions, some of which are useful when preparing corpus texts. It allows you to tag texts, manually or automatically (see below), and provides many analytical tools for extracting information from texts. Both programs can handle large files easily and so are useful when compiling comprehensive text corpora.
Q: Can I convert files from one format to another?
A: Yes. On the directory lister level of Corpus Presenter you can convert plain ASCII files to RTF (rich text format) files, HTML (Internet) files or to Microsoft Word files with a single command for as many files as you have selected in a particular folder. The conversion works both ways. With Microsoft Word and RTF files you must ensure that they do not contain any graphics, footnotes or headers/footers, otherwise these are lost and the conversion may fail. The conversion to Word files does not present any difficulties.
Q: Can I globally change the attributes for files?
A: Yes. This can be done with the Corpus Presenter File Manager. There are many situations in which this might be necessary. One would be where you copy files from a CD-ROM onto hard disk. The copies may well still be read-only and this attribute needs to be altered before you can process the files in question. In the file manager the right-most column in a file listing shows whether the files are read-only or not.
Q: Does Corpus Presenter provide macro functions to cut down on repetitive tasks?
A: Yes. The two text editors, the word processor and the database editor all have an option group Macro in the menu system which offers you a number of functions which will help you avoid unnecessary typing of text which is required repeatedly. There is a text macro function which allows you to have up to 256 pre-defined strings at your disposal. Then there is the Alt-macro list which will associate user-specified strings to the key combinations Alt-0 through Alt-9. There is a further text array option and a small strings function along with an array of 4 text buffers at your disposal. All these options can be exploited gainfully to cut down on the typing of text. Try them out and see what they do.
Q: Can I tag texts with Corpus Presenter?
A: Yes. There is a special function for this in the Corpus Presenter Text Tool. Choose the option Tools, then Tagging. A window appears and you enter the information necessary for tagging. This can be done automatically or manually, can involve words or strings, be case-sensitive and avail of any user-specified list of tags and input forms to be tagged. If you choose to tag a text manually, then you may also edit the context of a tag on-line and store it back to its original position.
Q: If I have several texts, can I link them into a single one?
A: Yes. The easiest way to do this is to load the files, one after the other in the order in which you want them with the Corpus Presenter Text Editor or Corpus Presenter Text Tool (Key: Alt-Z for Insert, File). You then save the new composite file to disk - under a different name from any of the individual files - and use this with Corpus Presenter, for instance by loading the file directly (Key: Ctrl-D).
Q: Can I make my own data set file with Corpus Presenter?
A: Yes. There is a supplied program Corpus Presenter Make Tree (upgraded for Version 12) with which you can either create a new data set file or edit an existing one. A data set file contains the information necessary to display the contents of a corpus in tree form within Corpus Presenter. Try altering the supplied file TEST_CP.CPD which controls all the test data files packaged with Corpus Presenter.
Q: Can I normalise texts with Corpus Presenter?
A: Yes. The program Corpus Presenter Text Tool allows you to normalise any set of texts quickly and easily. You specify the set of variants which are to be replaced by a single form and repeat the process for as many replacements as you require. This information is stored on disk and can be retrieved later. Two texts, say an original and a normalised one, can be collated to a single text if you wish. You can also carry out lexical clustering analysis with the Text Tool, something which you might want to experiment with to see what recurrent word patterns are to be found in a set of texts, for instance when studying the style of an author.
Q: Can I collate texts with Corpus Presenter?
A: Yes. Again Corpus Presenter Text Tool provides a collation function with which you can combine two texts on a line by line basis, thus checking on differences between two versions of an original, for example. Collation can be useful when combining a normalised version with an unaltered version, for instance when processing historical texts with much spelling variation.
Q: Can I carry out global find and replace operations?
A: Yes. The program Corpus Presenter Find Text will allow you to do this. Choose the option Global find and replace in the Edit menu of this program. For this to work you must create a text file to start with. On each line of this file you enter the string to be found, then a single tab character and then the string to use as replacement. You can have as many lines in such a file as you wish. This is a very fast way of carrying out text substitutions and can be done automatically which obviates the necessity of loading each file and specifying find and replace strings manually.
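The format of such a replacement file, and what happens when it is applied, can be sketched in Python. The sketch assumes that the replacements are literal, case-sensitive and applied in the order in which they appear in the file; Corpus Presenter Find Text may behave differently on these points, and the function names are hypothetical.

```python
def parse_replacements(table_text):
    """Read lines of the form: string to find, one tab, replacement string."""
    pairs = []
    for number, line in enumerate(table_text.splitlines(), start=1):
        if not line.strip():
            continue   # skip blank lines
        find, tab, replacement = line.partition("\t")
        if not tab or not find:
            raise ValueError(f"line {number}: expected 'find<TAB>replacement'")
        pairs.append((find, replacement))
    return pairs

def apply_replacements(text, pairs):
    """Apply every find/replace pair in turn (assumption: in file order)."""
    for find, replacement in pairs:
        text = text.replace(find, replacement)
    return text

if __name__ == "__main__":
    table = "vp\tup\nloue\tlove\n"
    print(apply_replacements("He took vp his loue", parse_replacements(table)))
```

To process a whole set of files, the same pairs would simply be applied to the contents of each file in turn.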
Q: Can I use keywords in a corpus and then collect them?
A: Yes. Corpus Presenter Text Tool includes a function which will collect any strings which are delimited by specifiable characters. Say you insert keywords (or comments or text markers of any kind) delimited by < and > then the program can collect these and deposit them in a list from which you can copy them to the Windows clipboard or store them directly to disk.
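Collecting delimited strings can be sketched in Python with a single regular expression. The function name is hypothetical and the sketch assumes that the delimited strings are not nested inside one another.

```python
import re

def collect_delimited(text, open_ch="<", close_ch=">"):
    """Return every string enclosed by the opening and closing delimiter."""
    rx = re.compile(re.escape(open_ch) + r"(.*?)" + re.escape(close_ch), re.DOTALL)
    return rx.findall(text)

if __name__ == "__main__":
    sample = "A <hunting> scene <note: verse> follows."
    for keyword in collect_delimited(sample):
        print(keyword)
```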
Q: Can I check corpus texts for the integrity of their coding?
A: Yes. Corpus Presenter Text Tool again has a function which will examine any text and check whether embedded codes, such as comment markers, are opened and closed correctly. It will also check whether a text contains only a user-specifiable set of legal characters, e.g. the lower ASCII area and a set of special characters for Old and Middle English.
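Both kinds of check can be sketched in Python. The sketch assumes that codes are delimited by < and > and are never nested, and the set of legal characters (printable ASCII plus a few Old and Middle English letters) is only an example; the function names are hypothetical.

```python
import string

def check_markers(text, open_ch="<", close_ch=">"):
    """Report codes which are opened but not closed, or closed but not opened."""
    problems = []
    open_pos = None
    for pos, ch in enumerate(text):
        if ch == open_ch:
            if open_pos is not None:   # a second opener before any closer
                problems.append((open_pos, "opened but not closed"))
            open_pos = pos
        elif ch == close_ch:
            if open_pos is None:
                problems.append((pos, "closed but not opened"))
            open_pos = None
    if open_pos is not None:
        problems.append((open_pos, "opened but not closed"))
    return problems

# Example set only: printable ASCII plus some Old and Middle English letters
LEGAL = set(string.printable) | set("þðæȝÞÐÆȜ")

def illegal_characters(text, legal=LEGAL):
    """Return the characters of a text which are not in the legal set."""
    return sorted({ch for ch in text if ch not in legal})

if __name__ == "__main__":
    print(check_markers("a <bad b <ok>"))
    print(illegal_characters("þe kyng € said"))
```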
Q: How can I keep track of alterations in corpus texts?
A: There are three basic ways of doing this. You can use the Track changes function (present in all the text editors). This will display textual additions as underlined and with a specific colour, say blue. Deleted text is shown as strikethrough and again in a specific colour, say red. You can also mark corrections / additions manually using the Red marking function in the Format option group of the text editors. Thirdly, you can mark stretches of text as Protected which means that they cannot be changed until released again. The latter function can be useful when excluding certain parts of a text not only from alteration by a later user but also from certain functions like ‘find and replace’. Note that to avail of these options, texts must be encoded in rich text format and be processed by either Corpus Presenter Text Editor or Corpus Presenter Text Tool (or for that matter by Microsoft Word).
Q: How can I view the structure of corpus texts?
A: When preparing texts you can enter any symbol you like which is to serve as a text marker followed by a number which represents the level in a tree hierarchy which this marker is to have (from 1 to 6). The Outline and the Table of Contents functions will collect these markers and display them in an Explorer-type tree. You can click on a tree node to jump to the text marker in question. You can also save the tree to disk or store it in the Windows clipboard.
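How such markers translate into a tree can be sketched in Python. The sketch assumes that a marker stands at the beginning of a line, that @ is the chosen symbol and that the heading text follows the level number; all of these details, like the function names, are assumptions made for illustration.

```python
import re

def collect_outline(text, marker="@"):
    """Collect (level, title, line number) for every marker line in a text."""
    rx = re.compile(r"^" + re.escape(marker) + r"([1-6])\s*(.*)$")
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = rx.match(line)
        if m:
            entries.append((int(m.group(1)), m.group(2).strip(), lineno))
    return entries

def render_outline(entries):
    """Show the outline as an indented tree, two spaces per level."""
    return "\n".join("  " * (level - 1) + title for level, title, _ in entries)

if __name__ == "__main__":
    sample = "@1 Prologue\nsome text\n@2 The Knight\nmore text\n@1 Tale"
    print(render_outline(collect_outline(sample)))
```

The stored line number is what would allow a jump from a tree node to the position of the marker in the text.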
Q: Can I arrange a corpus for the internet?
A: Yes. There is a special internet file editor, Corpus Presenter Internet Editor, which contains a whole range of powerful options for quick and easy design of webpages. There are many ready-to-use functions (in HTML and Java) available in a code database which you can access from within the program and insert into your web page. With Corpus Presenter Internet Editor webpages can be tested without uploading them to a server; this is done as a final step before disseminating information via the internet.
Using databases
Q: How can I process a database?
A: The program dedicated to the processing of databases in the Corpus Presenter suite is called Corpus Presenter Quick Database Editor and contains a full interface to texts, allowing free text to be associated with a database. The files processed are dBASE databases which can also be loaded with Microsoft Access or Excel if you wish. The program is accessible through an icon in the desktop folder created during the installation procedure for Corpus Presenter.
You can obtain databases in a number of ways. You can create one yourself with Make Database (a supplied utility of the Corpus Presenter suite). Or you could store data from Microsoft Access or Excel in a dBASE database format. A third source could be the output from a word list or after a retrieval operation within Corpus Presenter.
Q: What can I do with a database?
A: Databases are flexible structures which allow you to select, filter, copy, delete, move information contained in the fields of records using specific criteria. One typical use of databases would be to hold lexical data. Say you have generated a word file from a corpus text (with the option Search, then Make a Word List) you might like to maintain this in the form of a database for more flexible data manipulation afterwards. In this way you could maintain a dictionary from a corpus, either from the entire set or a selection of texts.
Another common use of databases is to store bibliographical information. There are a few such databases supplied with Corpus Presenter (as part of the test corpus). You can alter their structure with the Make Database utility. Furthermore, you can determine exactly how data from a database is to be exported to text by constructing a report form to your own specifications with the supplied program Report Database.
Q: Can I use texts with a database?
A: Yes. The database editor Corpus Presenter Quick Database Editor allows you to associate a rich text format file with any database. Via a field called ABSOLUTE_NUMBER you can access numbered sections of this text file and so add annotations to the records of any database. The annotations can be viewed and edited during processing and can be exported with the records they are linked to when the latter are exported to a text file or to an extract database.
Generating charts
Q: Can I generate charts from the statistics for retrieval returns?
A: Yes. From Version 10 of Corpus Presenter upwards you can store any returns, from different levels of the program, to a database in which the statistics for returns are deposited. Assume, for instance, that you search for a certain structure across a set of 100 texts and that a percentage of these show the structure, but to varying extents. The database which Corpus Presenter stores to disk contains the names of those texts in which the structure was found and the number of times it was found in each text. Such a database can then be loaded into Microsoft Excel and via the command Insert, Chart (after selecting all the rows and columns of the database with Ctrl-A) you can generate a chart in which the occurrences of the structure you searched for are shown in chart form.
Corpus Presenter can furthermore gather up to six sets of returns for a group of texts and transfer these to a database (for later chart generation). The advantage of this is you can see whether the occurrences of more than one structure run parallel across a set of texts, either ascending or descending. You can also check whether two or more structures run in opposite directions, e.g. whether the increase of one structure correlates with the decrease of another.
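The kind of table which is stored for chart generation can be sketched in Python. The sketch writes comma-separated text, which Excel can also open, in place of the database which Corpus Presenter produces, and it counts plain substring occurrences; the function names and sample texts are hypothetical.

```python
import csv
import io

def returns_table(texts, structures):
    """One row per text, one column per search string (six at most)."""
    if len(structures) > 6:
        raise ValueError("at most six sets of returns")
    rows = [["text"] + list(structures)]
    for name, body in texts.items():
        # str.count gives the number of non-overlapping substring matches
        rows.append([name] + [body.count(s) for s in structures])
    return rows

def to_csv(rows):
    """Serialise the table as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    corpus = {
        "a.txt": "thou art, thou hast, you are",
        "b.txt": "you are, you have",
    }
    print(to_csv(returns_table(corpus, ["thou", "you"])))
```

With the texts as rows and the structures as columns, a chart of the table shows at a glance whether one structure increases across the texts while another decreases.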
Miscellaneous
Q: How can I interface with Windows from within the Corpus Presenter suite?
A: Within nearly all programs you can open a Windows Explorer window by choosing the relevant option in the Miscellaneous menu or just pressing Ctrl-Shift-F9. When you exit the explorer window you are automatically returned to the program from which you started. You can also shell down to DOS by pressing Ctrl-F1 or choosing the relevant option, again from the Miscellaneous menu.
Q: How can I interface with the internet from within the Corpus Presenter suite?
A: All the major programs have an option group called Internet in the menu system which will allow you to browse on the internet using Corpus Presenter Browser. You can also load your default email program from this program group.
The major programs also maintain internal lists of email and web addresses which you can access without exiting the program in question.
Q: Can I manage various files easily with the Corpus Presenter suite?
A: Yes. The program Corpus Presenter Catalogue Manager will help you with this task. It allows you to arrange files into groups and then have them displayed in a list from which you can load them with a program of your choice. By these means you can keep track of diverse files in different folders on your hard disk without the risk of mislaying them.
Q: What happens if I have lost a file on my hard disk and don’t remember the name?
A: You simply load the program Corpus Presenter Find Text and specify some piece of text, a string or word, which you know occurs in the text and let the program do the searching for you. It can operate in different ways, scanning entire drives or only a certain branch, and using exclusion lists if required to make sure that it does not examine certain file types such as programs. The moment it finds the string you entered, the file containing it is loaded with an internal viewer and you can decide if this is what you are looking for; you continue until the right find is made.
Bear in mind that you can also look for files by file name (or part of a name using a wild card) by clicking on the binoculars icon when on the directory listing level of all the major programs of the current suite.
Q: Can I use the Corpus Presenter suite to make backups of my files?
A: Yes. There are two ways of doing this. The first is to load the program Corpus Presenter File Manager and then choose the option Backup disk. This will initiate a dialogue in which you specify a date filter (if required), the drive and directory from which the backup is to start and enter a possible exclusion list, if required. This option is especially useful when making backups of corpus texts to another hard disk or to a large removable disk like a Zip diskette. The file manager has many options which are useful in the area of file management and security so you are strongly advised to try it out and see what it can do.