The Setup of the Lampeter Corpus
The century covered by the Lampeter Corpus, 1640 to 1740,
marks a crucial period in English history as well as in the elaboration
of English as a multi-purpose language. The texts selected for the corpus
reflect both the standardisation process of English and historical developments
between the outbreak of the Civil War and the beginning of the Industrial
Revolution. In order to meet the needs of linguists and historians alike,
the Lampeter project has made an effort to create a balanced corpus rather
than a randomly chosen archive or collection. A balanced corpus, then,
is characterised by a number of transparent sampling criteria. The ones
that have served as guidelines for the compilation of the Lampeter Corpus
are listed below:
Tracts and pamphlets
published in the century between 1640 and 1740, all of them available in
Library at the University
of Wales, Lampeter.
Division of the century into ten decades.
Each decade consists of six domains, i.e.
Economy & Trade
Two texts per domain within each decade, leading to
120 different texts, amounting to c. 1.1 million words.
Complete texts only, including dedications, prefaces, postscripts, etc.
Texts are of varying length, ranging from c. 3,000 to c.
Each author appears only once, in order to avoid idiosyncratic language
Major literary figures of the time were excluded since their style
of writing can be studied in other, already existing, collections. See
Generally only first editions of the texts; later editions only if changes
were made by the original authors, thus ensuring the authenticity of the
No intermediary modern editions of the texts were used.
Markup according to the guidelines of the Text Encoding Initiative
and use of the Standard Generalized Markup Language (SGML),
in collaboration with Lou
Burnard and the Oxford
TEI text headers for background information on authors (name, age, sex,
place of residence, education, social status, political affiliation), printers/publishers,
place and date of print, publication format, text characteristics,
bibliographical references - providing the framework for historical, sociolinguistic
and stylistic investigations.
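
The sampling arithmetic in the criteria above (ten decades, six domains per decade, two texts per domain) can be checked with a short sketch. This is only an illustration, not part of the project's own tooling. It assumes that decades start in 1640, 1650 and so on up to 1730. Only "Economy & Trade" is a domain name taken from the list above; the other five labels are hypothetical placeholders.

```python
from itertools import product

# Assumed decade start years: 1640, 1650, ..., 1730 (ten decades in total).
DECADES = list(range(1640, 1740, 10))

# Only "Economy & Trade" is named in the text; the remaining five labels
# are hypothetical placeholders, not the corpus's actual domain names.
DOMAINS = ["Economy & Trade"] + [f"domain_{i}" for i in range(2, 7)]

# Two texts per domain within each decade, as stated in the criteria.
TEXTS_PER_CELL = 2


def sampling_grid():
    """Return one (decade, domain, slot) triple per text in the design."""
    return [
        (decade, domain, slot)
        for decade, domain in product(DECADES, DOMAINS)
        for slot in range(1, TEXTS_PER_CELL + 1)
    ]


grid = sampling_grid()
print(len(DECADES), len(DOMAINS), len(grid))  # prints: 10 6 120
```

Each triple stands for one text slot in the design, so the length of the grid reproduces the figure of 120 texts given above.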