Project Runeberg,
Another focused proofreader is Steen.Roennow, who has been working on http://runeberg.org/dbl/ or "Dansk biografisk Lexikon", first with indexing and now with proofreading. This work in 19 volumes is not yet completely proofread, but the most frequent changes so far are:
4320 {+</b>+}
2987 {+<i>+}
2957 {+</i> <b>+}
2371 [---] {+--+}
1555 [-f-] {+d.+}
1316 {+<b>+}
1308 [-i-] {+1+}
984 [-(f-] {+(d.+}
875 {+</i>+}
The single dash "-" that becomes a double "--" and the frequent insertion of italics are familiar from "akrell". DBL has a special pattern of its own: each article starts with a person's name in boldface and ends with the article author's name in italics, which explains the frequent insertion of </i> just before <b>.
The OCRed "f" that is changed to "d." is actually the cross or dagger (†) that marks a person's year or date of death. We transcribe it as "d." for "dead" (død), since "f." is used for the birth date (født).
488 [---] {+<b> <tab> -- <tab>+}
421 [-f-] {+</b> d.+}
285 [-Kjøben- havn-] {+Kjøbenhavn+}
282 [-n-] {+11+}
167 [-Kjø- benhavn-] {+Kjøbenhavn+}
125 [-Chri- stian-] {+Christian+}
120 [-°g-] {+og+}
108 [-Dan- mark-] {+Danmark+}
102 [-ud- nævntes-] {+udnævntes+}
99 [-Virk- somhed-] {+Virksomhed+}
87 [-å-] {+à+}
Here we can see the effects of keeping hyphenated words in the OCR text. A lot of proofreading effort must be invested in joining those hyphenated words, which is unfortunate. In the more recently scanned works, the OCR software joins the hyphenated words without losing track of line breaks.
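To illustrate what joining hyphenated words while keeping line breaks can look like, here is a minimal Python sketch. It is not the method used by the OCR software, which is not described here. The function name join_hyphenated and the rule "a letter before the hyphen and a lowercase letter starting the next line" are assumptions made for this example, and the rule will wrongly join some genuine hyphenated compounds.

```python
def join_hyphenated(lines):
    """Join words split by an end-of-line hyphen, keeping the line count.

    Hypothetical helper: the tail of the split word is moved up to the
    line that holds its head, so every line break stays where it was.
    """
    out = list(lines)
    for i in range(len(out) - 1):
        line = out[i].rstrip()
        nxt = out[i + 1].lstrip()
        # Assumed heuristic: "Kjøben-" + "havn" is a split word, while a
        # dash after a non-letter or before a capital is left alone.
        if len(line) >= 2 and line.endswith("-") and line[-2].isalpha() \
                and nxt[:1].islower():
            head, _, rest = nxt.partition(" ")
            out[i] = line[:-1] + head   # drop the hyphen, append the tail
            out[i + 1] = rest           # remainder stays on its own line
    return out


if __name__ == "__main__":
    sample = ["født i Kjøben-", "havn 1820, udnævn-", "tes til Professor"]
    for row in join_hyphenated(sample):
        print(row)
```

Because the number of lines is unchanged, the joined text can still be compared line by line with the scanned page.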
As with akrell, there are just a few cases that occur very many times. The 20 change patterns listed above range from 4320 down to 87 occurrences. The rest of the list consists of a large number of different cases that each occur only a few (tens of) times.
83 [-saa- ledes-] {+saaledes+}
83 [-For- hold-] {+Forhold+}
82 [-for- skjellige-] {+forskjellige+}
80 [-ble ven-] {+bleven+}
80 [-Med- lem-] {+Medlem+}
78 [-Sogne- præst-] {+Sognepræst+}
76 {+<tab>+}
76 [-til- bage-] {+tilbage+}
76 [-Oehlenschlager-] {+Oehlenschläger+}
74 [-Frede- rik-] {+Frederik+}
73 [-Gluckstadt-] {+Glückstadt+}
70 [-ii-] {+11+}
67 [-Muller-] {+Müller+}
63 [-Kjøben- havns-] {+Kjøbenhavns+}
60 [-der- efter-] {+derefter+}
58 [-Pro- fessor-] {+Professor+}
58 [-Oehlenschlagers-] {+Oehlenschlägers+}
57 [-oven- nævnte-] {+ovennævnte+}
55 [-alle- rede-] {+allerede+}
50 [-ff-] {+<i> H+}
50 [-For- bindelse-] {+Forbindelse+}
Specific to DBL are the many corrections printed at the back of the volumes. The proofreader has introduced these into the text, where they show up as insertions that seem to appear out of nowhere, e.g.:
1 {+G. døde 26. Maj 1895.+}
1 {+G. døde 24. Juni 1902.+}
1 {+G. døde 23. Juni 1895. <i>+}
1 {+G. døde 22. Okt. 1891. <i>+}
1 {+G. døde 21. Marts 1903. <i>+}
1 {+G. døde 18. Juni 1894. <i>+}
1 {+G. døde 15. Okt. 1902. <i>+}
1 {+G. debugging placeholder
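The tallies in this section use the [-deleted-] {+inserted+} notation, which matches the default markers of GNU wdiff. As a rough sketch of how such a frequency list could be produced, the Python code below counts change patterns in a word-level diff. It assumes the diff text is already available as a string; the names CHANGE and count_changes are made up for this example and are not taken from the actual tooling behind these statistics.

```python
import re
from collections import Counter

# A deletion "[-old-]" optionally followed by its replacement "{+new+}",
# or an insertion standing alone. Non-greedy matching keeps neighbouring
# changes apart; DOTALL lets a change span a line break.
CHANGE = re.compile(
    r"\[-(?P<old>.*?)-\](?:\s*\{\+(?P<new>.*?)\+\})?"
    r"|\{\+(?P<ins>.*?)\+\}",
    re.DOTALL,
)


def _norm(text):
    """Collapse runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def count_changes(wdiff_text):
    """Return a Counter mapping each change pattern to its frequency."""
    counts = Counter()
    for m in CHANGE.finditer(wdiff_text):
        if m.group("ins") is not None:
            key = "{+" + _norm(m.group("ins")) + "+}"
        else:
            key = "[-" + _norm(m.group("old")) + "-]"
            if m.group("new") is not None:
                key += " {+" + _norm(m.group("new")) + "+}"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = ("Han [-f-] {+d.+} 1895 i [-Kjøben-\nhavn-] {+Kjøbenhavn+} "
              "{+<i>+} og [-f-] {+d.+} der")
    for pattern, n in count_changes(sample).most_common():
        print(n, pattern)
```

Sorting by frequency with most_common() puts the few very frequent patterns at the top and the long tail of rare ones at the bottom, as in the lists above.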