June 09, 2009
By Robert B. Townsend
The staff at Google have now posted information about the status of the newspapers obtained from Paper of Record. As we reported last month, a number of members were deeply distressed after these materials were taken offline and they could not find out about their status.
According to a member of the Google staff, 4.91 million articles from 522 titles obtained from Paper of Record are now live on Google News Archive search (though he adds the caveat that “all articles from these titles may not be comprehensively available, but will otherwise be made available in browse-only mode within 3 months”).
Another half million pages from 381 titles are projected to be available in “browse-only mode within 3 months.” These materials “were of low quality, and we were therefore unable to get quality text after following the OCR process. We are working to put up content from these titles so that they can be browsed.”
They report that materials from an additional ten titles will not be available for the foreseeable future, because they could not obtain the rights from the original rights holders.
The materials are currently searchable by a number of criteria, though searching by particular titles and particular ranges of years can be rather difficult. Staff at Google say they are working to improve the interface and allow improved browsing, which will be critical for newspapers with poor-quality OCR.