A Critique of the British Library's Newspaper Archive, 1600-1900

In 2004 the British Library (BL) received a grant of two million pounds from the Joint Information Systems Committee to set up a database of newspapers from 1600-1900.[1] The database holds 2.2 million pages, about one percent of the BL archive, and can only be added to in the future if funding permits. The British Library produced it with a secondary partner, Gale, which is part of an American group called Cengage Learning. This group manages over six hundred sites for a variety of applications, such as eBooks and large print media, as well as databases of this kind.

The layout is fairly standard and perhaps already a little dated when compared to more commercial sites (please see below). A specific article can be called up, or the whole page viewed at once, with the usual controls to magnify each item. There are also options to mark items and bookmark them. However, marked items appear not to be saved from one visit to the next, and bookmarks can only be viewed by recipients who also have access to the database. The basic search can be by date, publication, location or a specific collection. The advanced search adds Boolean operators and a 'fuzzy' search that can be set to low, medium or high (please see below).

Image quality is reasonable, with options to download as a PDF or to print directly from the site (as opposed to copying an internet image and printing it yourself). Quality could still be a limitation, as not all the scanned images are of the same standard and they depend on the quality of the original document. This leads on to the OCR, which can be erratic at times; as is to be expected, the higher the fuzzy search setting, the more miscellaneous hits you receive. As far as I can ascertain, there is no option to look at the underlying metadata or XML, although each document page has an individual number which you could cite if you wished to report a mistake to the management company.
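To illustrate why a higher fuzzy setting returns more miscellaneous hits, here is a minimal sketch in Python. It is not the archive's actual search method, which is not documented on the pages I could see. The similarity cut-offs for low, medium and high, the function name `fuzzy_hits` and the sample OCR words are all assumptions made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs: a "higher" fuzzy level accepts less similar words.
# These numbers are assumptions, not the archive's real settings.
FUZZY_CUTOFFS = {"low": 0.9, "medium": 0.8, "high": 0.7}


def fuzzy_hits(term, words, level="medium"):
    """Return the words whose similarity to `term` meets the cut-off for `level`."""
    cutoff = FUZZY_CUTOFFS[level]
    term = term.lower()
    return [
        word
        for word in words
        if SequenceMatcher(None, term, word.lower()).ratio() >= cutoff
    ]


# Invented sample of OCR output: one correct reading, two misreadings,
# one unrelated word that merely looks similar, and one unrelated word.
ocr_words = ["Ranter", "Rantcr", "Rantcrs", "Banter", "Norfolk"]

for level in ("low", "medium", "high"):
    print(level, fuzzy_hits("Ranter", ocr_words, level))
```

In this sketch the low setting finds only the correctly read word, while the medium and high settings also recover the misread "Rantcr" and "Rantcrs" at the price of letting in the unrelated "Banter".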

This site looks quite dated when compared to the newer site, The British Newspaper Archive, run by the company Brightsolid, a subscription site that the University does not seem to have access to (no Athens login).[2] Although I could not look at the underlying text for the site being critiqued, if it is similar to the Burney Collection (which this site also uses), then there is a chance that every other word is incorrect, which does not seem to be best practice.[3] If fifty percent of search terms are missed because the OCR failed to recognise them, this could at best be thought of as negligent and at worst as possible fraud. It would be even more disturbing if the same applied to the newer subscription site. However, is it better to have fifty percent of some primary sources rather than a hundred percent of none?

Word count 504

[1] Information from their official 'About this Site' pages. Please note that you must be logged into the University site and into Athens to use this link. Consulted 10/3/12
[3] Information from Week One Worksheet - week commencing 23/1/12

Basic Search

Page from Basic search Norfolk Incendiary

Advanced Search

Advanced Search Erroneous Result
No Results? Ranter Norfolk

Fuzzy Search (Med) on Primitive Methodist Norfolk

1 comment:

  1. Looks good to me - on the images, have you tried resizing them before you upload them?