BERGEN Versions of the BROWN Corpus.

Produced by:
    Norwegian Computing Centre for the Humanities,
    Allegt. 27,
    N-5007 Bergen, Norway

The original corpus has been converted to upper- and lower-case letters
and a minimum of special codes. There are two versions of the corpus.

Text format I:   Typographical information is preserved; the same line
                 division is used as in the original version, except
                 that words at the end of lines are never divided.

Text format II:  Typographical information is removed; the line
                 division is new. Each line has a reference consisting
                 of three items. The first two give the Brown Corpus
                 line identification of the first word on the line.
                 The last item is the word number of the first word on
                 the line in the original Brown Corpus line.

Changes from the original Brown Corpus codes:

    Original    Text format I       Text format II

    **A         ' (apostrophe)      '
    **B         @                   @
    **C         :                   :
    **D         ^ (ASCII 94)        ^
    **F         **f                 **f
    **H         **h                 **h
    **I         ?                   ?
    **J         ~ (ASCII 126)       ~
    **K         %                   %
    **N         #                   new line + #
    **P         #                   # + new line
    **Q         "                   "
    **R         _ (underscore)      new line + _
    **R**T      3 spaces            new line + 3 spaces
    **S         ;                   ;
    **T         _ (underscore)      _ + new line
    **U         "                   "
    **X         !                   !
    *(          [ (ASCII 91)        [
    *)          ] (ASCII 93)        ]
    *=          <                   ignored
    *$          >                   ignored
    *+0         ` (ASCII 96)        `
    **-         - (hyphen)          -
    **=         { (ASCII 123)       ignored
    **$         } (ASCII 125)       ignored
    **.         &                   &
    **Z         \ (ASCII 92)        \
    **Y         | (ASCII 124)       |
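
The text format I column of the table above can be read as a plain substitution table. The following Python sketch illustrates that reading only; it is not the program used to produce the Bergen versions. The names FORMAT_I and to_format_1 are hypothetical, and the sketch assumes that the codes appear literally in the input string. It does not attempt the conversion to upper and lower case, the line division, or the new-line handling of text format II.

```python
import re

# Original Brown Corpus code -> text format I replacement,
# copied from the table above (hypothetical name: FORMAT_I).
FORMAT_I = {
    "**A": "'",
    "**B": "@",
    "**C": ":",
    "**D": "^",
    "**F": "**f",
    "**H": "**h",
    "**I": "?",
    "**J": "~",
    "**K": "%",
    "**N": "#",
    "**P": "#",
    "**Q": '"',
    "**R": "_",
    "**R**T": "   ",   # three spaces
    "**S": ";",
    "**T": "_",
    "**U": '"',
    "**X": "!",
    "*(": "[",
    "*)": "]",
    "*=": "<",
    "*$": ">",
    "*+0": "`",
    "**-": "-",
    "**=": "{",
    "**$": "}",
    "**.": "&",
    "**Z": "\\",
    "**Y": "|",
}

# Longer codes are listed first in the alternation, so that "**R**T"
# is preferred over "**R" and "**=" over "*=" at the same position.
_CODE_RE = re.compile(
    "|".join(re.escape(code)
             for code in sorted(FORMAT_I, key=len, reverse=True))
)


def to_format_1(text):
    """Replace each original code in text with its format I character."""
    return _CODE_RE.sub(lambda m: FORMAT_I[m.group(0)], text)


if __name__ == "__main__":
    # Made-up input strings, for illustration only.
    print(to_format_1("DON**AT"))   # DON'T
    print(to_format_1("*(SIC*)"))   # [SIC]
```

The substitution is done in a single left-to-right pass, so replacement text such as **f is not scanned again for codes.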