New subject: Verk som OCR inte har körts för (Works for which OCR has not been run)

16 Jan 2005


      On 15-Jan-05, Hans Persson unicorn@lysator.liu.se wrote:
...
At URL:http://runeberg.org/upload.pl?mode=ocrlist there is a list of
the works we have that currently lack OCR text. The page has links for
downloading all the images of a work, and other links for uploading
OCR files for a work. Note that you need broadband to download the
image files, since the files involved are quite large. A "(download)"
link has also been added to the footer of every page within a work,
and through it you can find the corresponding links as well.
Now we on the editorial team can hopefully let the rest of you handle
part of the OCR work, while we ourselves scan even more new works or
write new features.
One possible problem which this procedure does not address, I think, is 
what in database work is sometimes called an update anomaly. Suppose 
Jörg Vetenskaper and Frederik Pedant both happen to download the same 
text for OCR conversion, and therefore duplicate the work. It may seem 
unlikely, but if unnecessary duplication of effort can be avoided, it 
would be best to do so.
Is it possible to add a mechanism to the 
http://runeberg.org/upload.pl?mode=ocrlist page that records which 
files have been "checked out" for OCR conversion, so that no one else 
will download the same work unnecessarily?
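A minimal sketch of what such a "check out" record might look like. It is written in Python purely for illustration; upload.pl itself is not shown here, and the class, the method names, and the work identifier "examplework" are all hypothetical. A real version would need persistent storage rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Checkout:
    work_id: str     # hypothetical identifier for a work on the OCR list
    volunteer: str   # name or email address of the person who took the work
    taken_on: date   # day the images were checked out


class CheckoutRegistry:
    """In-memory record of which works are currently checked out for OCR."""

    def __init__(self) -> None:
        self._active: Dict[str, Checkout] = {}

    def check_out(self, work_id: str, volunteer: str, today: date) -> bool:
        """Claim a work. Returns False if someone already holds it."""
        if work_id in self._active:
            return False
        self._active[work_id] = Checkout(work_id, volunteer, today)
        return True

    def holder(self, work_id: str) -> Optional[str]:
        """Return who holds the work, or None if it is free."""
        entry = self._active.get(work_id)
        return entry.volunteer if entry else None

    def release(self, work_id: str) -> None:
        """Free a work, e.g. once its OCR files have been uploaded."""
        self._active.pop(work_id, None)


registry = CheckoutRegistry()
first = registry.check_out("examplework", "jorg@example.org", date(2005, 1, 16))
second = registry.check_out("examplework", "frederik@example.org", date(2005, 1, 17))
```

Here the first request succeeds and the second is refused, so the list page could show the work as taken instead of offering the download again.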
Another point might be to somehow record the name and email address of 
the person who downloads a ZIP file of images, and then have a method 
to automatically send that person a friendly email periodically, say 
every fortnight, requesting a progress report, until such time as the 
corresponding OCR files are eventually uploaded.
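The fortnightly reminder could be sketched along these lines, again in Python and again under assumptions: the Download record, its field names, and the wording of the message are invented for illustration, and actually sending the mail is left out. The sketch only decides who is due a reminder on a given day.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

REMINDER_INTERVAL = timedelta(days=14)  # "every fortnight"


@dataclass
class Download:
    work_id: str                         # hypothetical work identifier
    email: str                           # address given at download time
    downloaded_on: date
    last_reminded: Optional[date] = None
    uploaded: bool = False               # True once OCR files have arrived


def due_for_reminder(downloads: List[Download], today: date) -> List[Download]:
    """Return downloads with no upload and no contact for a full interval."""
    due = []
    for d in downloads:
        if d.uploaded:
            continue  # work is finished, stop reminding
        last_contact = d.last_reminded or d.downloaded_on
        if today - last_contact >= REMINDER_INTERVAL:
            due.append(d)
    return due


def reminder_text(d: Download) -> str:
    """Build a friendly progress request (wording is only an example)."""
    return (
        f"To: {d.email}\n"
        f"Subject: OCR progress for {d.work_id}\n\n"
        f"You downloaded the images for {d.work_id} on "
        f"{d.downloaded_on.isoformat()}. How is the OCR conversion going?\n"
    )
```

A daily job could call due_for_reminder, send reminder_text for each result, and set last_reminded to that day, so the next message goes out a fortnight later unless the upload arrives first.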
Maybe it's a bit of trouble, but I think that, if I had the appropriate 
OCR software to do this sort of work, I would be reluctant to undertake 
it, knowing that another person was doing the same scan conversion at 
the same time.
Best regards to the directors and to all the wonderful volunteers with 
Project Runeberg.
Erik Bjørn Pedersen
Victoria, BC, Canada

Re: Verk som OCR inte har körts för (Works for which OCR has not been run)