Building the Digital Collection
Scanning the Printed Material
The printed documents were scanned under contract at the Library of Congress. Each item was reproduced as facsimile page images. To protect the originals, bound works were scanned face-up in their bindings, one page at a time. The master (archival) version of each text page (dark letters on a light background) is a 300-dots-per-inch (dpi) bitonal image in the TIFF format, with ITU Group IV compression. Pages with printed halftone illustrations or finely detailed line drawings were captured as 8-bit grayscale images and stored in the JFIF image format (with JPEG compression). The manuscript slave code for the District of Columbia, 1860, was also captured as 8-bit grayscale.
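To give a sense of scale for the two capture modes described above, the following minimal sketch estimates the raw (uncompressed) size of a single page image. The 6 x 9 inch page size is an assumption chosen for illustration, as is the use of 300 dpi for the grayscale capture; the text states a resolution only for the bitonal masters. The function name is hypothetical, and the compressed sizes actually produced by Group IV or JPEG compression depend on page content and are not estimated here.

```python
import math


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return (width_px, height_px, raw_bytes) for one scanned page.

    Each pixel row is padded up to a whole number of bytes, as is
    usual for uncompressed raster data.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return width_px, height_px, bytes_per_row * height_px


# Assumption: a 6 x 9 inch page (not a figure from the collection).
bitonal = uncompressed_bytes(6, 9, 300, 1)    # 1 bit per pixel
grayscale = uncompressed_bytes(6, 9, 300, 8)  # 8 bits per pixel

print("bitonal  :", bitonal)
print("grayscale:", grayscale)
```

Under these assumptions the grayscale page holds eight times as much raw data as the bitonal page, which is one plausible reason to reserve grayscale capture for pages where tonal detail matters.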
Creating the Searchable Text
After National Digital Library Program staff approved the images, searchable texts were prepared by keying the documents from the source images and encoding them in Standard Generalized Markup Language (SGML) according to the American Memory Document Type Definition (DTD). This DTD is a markup scheme that conforms to the guidelines of the Text Encoding Initiative (TEI), the work of a consortium of scholarly institutions. The online presentation of the texts also includes a version in HTML (HyperText Markup Language), produced by the Library in an automated process. Because it requires no special software, the HTML version is easier for most users to access.
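The Library's automated SGML-to-HTML process is not described here, so the following is only an illustrative sketch of the general idea: walking an encoded text and mapping structural elements to HTML elements. The sample markup uses TEI-style element names (text, div, head, p, hi) as an assumption; it is not taken from the American Memory DTD. The sample is also written as well-formed XML so that Python's standard library can parse it, whereas real SGML would need an SGML-aware parser. The names TAG_MAP, to_html, and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from TEI-style elements to HTML elements.
TAG_MAP = {"text": "div", "div": "div", "head": "h2", "p": "p", "hi": "em"}


def to_html(elem):
    """Recursively convert one element and its children to an HTML string."""
    tag = TAG_MAP.get(elem.tag, "span")  # unknown elements become <span>
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text that follows the child inside the current element.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


# Assumed sample document, kept well-formed so ElementTree can parse it.
SAMPLE = (
    "<text><div><head>Chapter I</head>"
    "<p>Text with <hi>emphasis</hi> &amp; more.</p></div></text>"
)

html_out = to_html(ET.fromstring(SAMPLE))
print(html_out)
```

Keeping the element mapping in a single table means the encoded text remains the authoritative version, and the HTML can be regenerated whenever the mapping changes.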