BERGEN Versions of the BROWN Corpus.

Produced by: Norwegian Computing Centre for the Humanities, Allegt. 27, N-5007 Bergen, Norway

The original corpus has been transformed to upper and lower case letters and a minimum of special codes.

There are two versions of the corpus.

Text format I: Typographical information is preserved. The line division is the same as in the original version, except that words are never divided at the end of a line.

Text format II: Typographical information is removed and the line division is new. Each line has a reference consisting of three items. The first two give the Brown Corpus line identification of the first word on the line. The last item is the word number of the first word on the line within the original Brown Corpus line.

Changes from the original Brown Corpus codes:

Original   Text format I       Text format II

**A        ' (apostrophe)      '
**B        @                   @
**C        :                   :
**D        ^ (ascii 94)        ^
**F        **f                 **f
**H        **h                 **h
**I        ?                   ?
**J        ~ (ascii 126)       ~
**K        %                   %
**N        #                   new line + #
**P        #                   # + new line
**Q        "                   "
**R        _ (underscore)      new line + _
**R**T     3 spaces            new line + 3 spaces
**S        ;                   ;
**T        _ (underscore)      _ + new line
**U        "                   "
**X        !                   !
*(         [ (ascii 91)        [
*)         ] (ascii 93)        ]
*=         <                   ignored
*$         >                   ignored
*+0        ` (ascii 96)        `
**-        - (hyphen)          -
**=        { (ascii 123)       ignored
**$        } (ascii 125)       ignored
**.        &                   &
**Z        \ (ascii 92)        \
**Y        | (ascii 124)       |
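
As an illustration only, the text format I column of the table above can be read as a lookup from original Brown code to replacement symbol. The Python sketch below is not the program used to produce the Bergen versions. The names FORMAT_I and to_format_i and the sample input string are hypothetical. It covers only the code substitution; it does not perform the conversion to upper and lower case or the line handling of text format II.

```python
import re

# Original Brown code -> text format I symbol, copied from the table above.
FORMAT_I = {
    "**A": "'",
    "**B": "@",
    "**C": ":",
    "**D": "^",
    "**F": "**f",
    "**H": "**h",
    "**I": "?",
    "**J": "~",
    "**K": "%",
    "**N": "#",
    "**P": "#",
    "**Q": '"',
    "**R**T": "   ",  # 3 spaces
    "**R": "_",
    "**S": ";",
    "**T": "_",
    "**U": '"',
    "**X": "!",
    "*(": "[",
    "*)": "]",
    "*=": "<",
    "*$": ">",
    "*+0": "`",
    "**-": "-",
    "**=": "{",
    "**$": "}",
    "**.": "&",
    "**Z": "\\",
    "**Y": "|",
}

# Try longer codes first so that "**R**T" wins over "**R" and "**=" over "*=".
_CODE_RE = re.compile(
    "|".join(re.escape(code) for code in sorted(FORMAT_I, key=len, reverse=True))
)


def to_format_i(text):
    """Replace each original Brown code with its text format I symbol."""
    # re.sub scans the input once, so a replacement such as "**f" is not rescanned.
    return _CODE_RE.sub(lambda m: FORMAT_I[m.group(0)], text)


# Made-up sample input, not a line from the corpus.
print(to_format_i("DON**AT *(SIC*)"))
```

Sorting the codes by length before building the pattern is one simple way to keep the combined code **R**T from being read as **R followed by **T.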