The staff of the National Digital Library Program at the Library of Congress have identified ten challenges that must be met if large and effective digital libraries are to be created during the 21st century. In some cases, there may be no technology solution to the challenge, but through sharing of ideas, new thinking may emerge to help institutions such as the Library of Congress formulate policy on these important issues. The challenges may be grouped under the following broad categories: building the resource, interoperability, intellectual property, providing effective access, and sustaining the resource. The Library hopes that creative and innovative minds can devise solutions to these challenges.
Building the Resource
Challenge One: Develop improved technology for digitizing analog materials.
In order to build a comprehensive resource, historical materials now in analog form (e.g., books, journals, laboratory records, sound recordings, manuscripts, photographs) must be converted. Today, the technology for digital conversion is, at best, emergent and often forces a library to choose between protecting precious originals from damage and producing the highest-quality reproductions. There are few established standards or best practices, and there is a shortage of tools for the objective measurement of reproduction quality. There is also a need for more automated support for capturing, in explicit data structures, the navigational and organizational clues implicit in printed works through page numbers, tables of contents, and indices.
Challenge Two: Design search and retrieval tools that compensate for abbreviated or incomplete cataloging or descriptive information.
Providing access to library collections is labor-intensive. In order to apply scarce resources to the digitization of significant quantities of content, it is often necessary to reduce the level of detail offered in accompanying catalogs or indexes. Can automated tools permit the incorporation of factual knowledge (e.g., France is in Europe; Leontyne Price is an opera singer.) into descriptive information, indexing, or search and retrieval systems? Could such bodies of factual knowledge be shared or assembled cooperatively and distributed?
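As an illustration only, the idea of folding factual knowledge into retrieval can be sketched as a small query-expansion step. The fact triples, record list, and function names below are hypothetical and are not drawn from any Library system; the sketch assumes simple substring matching against brief descriptions.

```python
from collections import defaultdict

# Hypothetical shared fact base: (subject, relation, object) triples.
FACTS = [
    ("France", "is_in", "Europe"),
    ("Leontyne Price", "is_a", "opera singer"),
]

# Hypothetical brief catalog records: (title, description).
RECORDS = [
    ("Photo 1", "Street scene, Paris, France"),
    ("Recording 7", "Leontyne Price sings Aida"),
    ("Map 3", "Railroads of Ohio"),
]


def build_expansions(facts):
    """Map each broader term to the narrower terms that the facts link to it."""
    expansions = defaultdict(set)
    for subject, _relation, obj in facts:
        expansions[obj.lower()].add(subject.lower())
    return expansions


def search(query, records, expansions):
    """Return titles whose description mentions the query or a term linked to it."""
    terms = {query.lower()} | expansions.get(query.lower(), set())
    return [
        title
        for title, description in records
        if any(term in description.lower() for term in terms)
    ]


EXPANSIONS = build_expansions(FACTS)
```

Under these assumptions, a search for "Europe" finds the Paris photograph even though its description never uses that word. Because the fact base is a separate data set, it could in principle be assembled cooperatively and distributed independently of any one catalog.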
Challenge Three: Design tools that facilitate the enhancement of cataloging or descriptive information by incorporating the contributions of users.
Can the digital library take advantage of distributed expertise? Among millions of users, there will be those who can enhance the description or cataloging of an item, thus improving the next researcher's chance of finding it. Collaborative tools could allow far-flung professional colleagues, e.g., faculty or graduate students in the nation's universities, to provide excellent enhancements to materials they employ for their own advanced research. Less expert users (schoolchildren familiar with buildings in photographs, or individuals who recognize a family member in a group picture) can also add value to the resource. What filters, methods for attributing enhancements without violating privacy, or other protections against misuse could support this enhancement of the resource?
Interoperability
Challenge Four: Establish protocols and standards to facilitate the assembly of distributed digital libraries.
How can a distributed resource like the National Digital Library be assembled to create a virtual unity? What types of protocols and what degree of standardization on types of digital objects will achieve a balance between feasibility of widespread implementation and coherence of access? Should unified searching use an approach like that found in the Z39.50 standard (distributed search) or the approach used by World Wide Web search engines (distributed indexing)? How can distributed digital libraries best safeguard the rights associated with content (including rights of privacy and conditions imposed by donors as well as copyright) while still providing the broadest possible access?
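The contrast between distributed search and distributed indexing can be made concrete with a toy sketch. It does not implement Z39.50 or any real search engine; the Repository class, its data, and both search functions are hypothetical, and matching is deliberately simplistic.

```python
class Repository:
    """A hypothetical content provider holding item descriptions."""

    def __init__(self, name, items):
        self.name = name
        self.items = items  # item id -> descriptive text

    def search(self, term):
        """Answer a query locally (substring match on the description)."""
        term = term.lower()
        return [
            (self.name, item_id)
            for item_id, text in self.items.items()
            if term in text.lower()
        ]


def federated_search(repositories, term):
    """Distributed search: forward the query to every repository at query time."""
    hits = []
    for repo in repositories:
        hits.extend(repo.search(term))
    return sorted(hits)


def build_central_index(repositories):
    """Distributed indexing: harvest descriptions ahead of time into one index."""
    index = {}
    for repo in repositories:
        for item_id, text in repo.items.items():
            for word in set(text.lower().split()):
                index.setdefault(word, set()).add((repo.name, item_id))
    return index


def indexed_search(index, term):
    """Answer a query from the central index without contacting repositories."""
    return sorted(index.get(term.lower(), set()))
```

In the first approach each provider keeps control of how its content is searched, but every query depends on every provider being reachable. In the second, queries are answered from one place, but the index is only as current as the last harvest. In this sketch the central index also matches whole words only, whereas the repositories match substrings.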
Intellectual Property
Challenge Five: Address legal concerns associated with access, copying, and dissemination of physical and digital materials.
A key element for digital libraries is appropriate recognition and protection of legal rights and interests, such as copyright, publicity, privacy, obscenity, defamation, and intellectual property protection, as well as less legalistic but serious concerns associated with the ethics of sharing or providing access to folk or ethnographic materials. The vision for digital libraries includes fluid, easy access to a wide variety of materials. This is often in conflict with the duties of libraries and archives entrusted with care and management of materials that may be subject to privacy rights or other needs for security.
Efforts to formulate digital libraries will be delayed or frustrated in the absence of a common, responsible framework of rights, permissions, and restrictions that acknowledges the mutual needs of rights-holders and users of materials in digital libraries. The challenge here is, in part, to develop mechanisms, perhaps social expectations, independently or in combination with technical means, regarding acceptable levels of access (for example, where privacy rights are at issue) and use (such as the extent of permissible copying and dissemination). Could responsible practices include acceptable use policies, codes of practice, and standard contracts that begin to establish norms of behavior by people creating and using digital library resources? How can authors, creators, researchers, publishers (who may require some control over how the information is made available or used) and digital libraries develop reasonably administrable means to maintain appropriate stewardship without focusing only on work in the public domain (or items not otherwise subject to legal protection)?
Materials currently available on the Internet from the American Memory collections range from items for which the Library is unaware of any copyright or other legal concerns to items where permission was sought from copyright or other rights holders for inclusion in the Library's website. American Memory materials also encompass a wide range of media such as printed text, photographs, prints, sound recordings, and film. Applicants are encouraged to explore the more technical challenges in the context of the legal concerns and should carefully consider the "access statements" as well as the Copyright and Restriction Statements on the American Memory homepage and on most collection homepages. Note that the Library cannot provide legal advice to applicants regarding the contents of the collections beyond these statements.
Providing Effective Access
Challenge Six: Integrate access to both digital and physical materials.
A user looking for an item in a library catalog should be able to identify it without regard to whether it is available in its original physical form or as a digital or microfilm reproduction. Intellectual descriptions of originals and reproductions should be presented in a fully integrated way. During the current experimental period, however, many digitization efforts are disconnected from traditional library services. Even when appropriate catalog records exist, digital content may fail to connect to potential users because individual items in digital collections cannot be retrieved directly or are not identified appropriately to support links from traditional catalogs or bibliographic indexes.
Challenge Seven: Develop approaches that can present heterogeneous resources in a coherent way.
A digital library that provides diverse content will be characterized by heterogeneity in original format, in digital format and resolution, and in the level of detail and format of descriptive information that is available to support access. This heterogeneity may be seen in the historical collections on-line at the Library of Congress, which typify the larger class of materials that are likely to form part of any digital library, however defined. The National Digital Library Program offering includes books, articles, pamphlets, personal papers, legislative documents, prints, architectural drawings, photographs, maps, sheet music, sound recordings, and movies. Some text materials have been re-keyed and marked up in Standard Generalized Markup Language (SGML), some have been captured as bit-mapped images, and some are available in both forms. Pictorial images have been captured at various spatial and tonal resolutions. Some collections have detailed catalogs or indexes, while others are described in brief and superficial ways. The Library of Congress is building a generic repository that can store objects in any format, and represent relationships between objects, such as a sequence of page-images forming a book, which might also have been transcribed and marked up in SGML.
In the face of great diversity of content and description, special problems attend the development of a coherent approach to indexing and presenting retrieval results. It is important that any approach allow all the information available to be used to aid retrieval rather than force the user who wants to search across the entire resource to rely on some lowest common denominator of descriptive information.
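The generic repository described above, which stores objects in any format and records relationships between them, might be modeled along the following lines. This is a minimal sketch under stated assumptions, not the Library's design: the class names, the relation labels ("has_page", "has_transcription"), and the storage locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    object_id: str
    media_type: str      # free-form label, e.g. "image/tiff" or "text/sgml"
    location: str = ""   # hypothetical storage path; empty for pure aggregates


@dataclass
class Repository:
    objects: dict = field(default_factory=dict)
    # Each relation is (source_id, relation, target_id, order).
    relations: list = field(default_factory=list)

    def add(self, obj):
        self.objects[obj.object_id] = obj

    def relate(self, source_id, relation, target_id, order=0):
        """Record a typed, ordered link between two stored objects."""
        for object_id in (source_id, target_id):
            if object_id not in self.objects:
                raise KeyError(object_id)
        self.relations.append((source_id, relation, target_id, order))

    def related(self, source_id, relation):
        """Return the targets of one relation type, in sequence order."""
        matches = [
            (order, target_id)
            for src, rel, target_id, order in self.relations
            if src == source_id and rel == relation
        ]
        return [self.objects[target_id] for _, target_id in sorted(matches)]
```

With this structure, a book can be an aggregate object linked to a sequence of page-images by one relation and to an SGML transcription by another, so both representations remain available to retrieval and presentation tools.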
Challenge Eight: Make the National Digital Library useful to different communities of users and for different purposes.
How many different ways can users explore and discover content? What capabilities will permit users to customize the interface and specify preferences that affect retrieval results? Will teachers benefit from tools that support group projects or collaboration with colleagues? By whom might these tools be developed? How can differences in vocabulary be resolved? For example, how might an interface translate the search terms selected by today's users into the language of older historical documents? How might the vocabulary of, say, teachers looking for material to illustrate broad topics in a prescribed curriculum be mapped to the vocabulary of the catalogers who describe individual items?
Challenge Nine: Provide more efficient and more flexible tools for transforming digital content to suit the needs of end-users.
Today, each content item in most digital libraries is represented in multiple forms or versions. The multiple forms exist to serve varieties of users, function as archival masters, and reduce download time and transmission loads on networks. A content provider may produce large and small versions of images; compressed and uncompressed versions of images, texts, audio, and video; texts formatted for browser software and also formatted for preservation or publication; and materials both in proprietary formats and in public or "open" formats. This burden of plural production and maintenance results from the fact that today many digital objects are hard to transform on the fly. What technologies can be developed to make digital objects malleable, migratable, and transformable? Similar capabilities are also needed to ensure the preservation of digital content for posterity.
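One way to picture transformation on the fly is a service that keeps only an archival master and derives smaller versions when they are requested. The sketch below is a toy under stated assumptions: images are plain grids of grayscale values, reduction is crude nearest-neighbor sampling, and the DerivativeService class and its names are hypothetical.

```python
def downsample(pixels, factor):
    """Keep every `factor`-th pixel in each direction (nearest-neighbor reduction)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [row[::factor] for row in pixels[::factor]]


class DerivativeService:
    """Serve reduced versions derived on request from a single stored master."""

    def __init__(self, masters):
        self.masters = masters  # item id -> grid of grayscale values
        self._cache = {}        # (item id, factor) -> derived grid

    def get(self, item_id, factor=1):
        key = (item_id, factor)
        if key not in self._cache:
            # Derive once on first request, then reuse the result.
            self._cache[key] = downsample(self.masters[item_id], factor)
        return self._cache[key]
```

Only the master needs to be produced and maintained; the derived versions are disposable and can be regenerated if formats or user needs change. Real content would need format-aware conversion in place of the simple sampling assumed here.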
Sustaining the Resource
Challenge Ten: Develop economic models for the support of the National Digital Library.
The creation and maintenance of digital libraries are very expensive. Costs are incurred for production, for ongoing provision of access, and for preservation of the digital information. The cost to develop and operate a distributed architecture for long-term archiving, migration, and backup of digital materials will be high. Since the resource is distributed among providers, the net cost tends to be disguised. Libraries would benefit from better estimates of costs and trends in cost for production and maintenance of a corpus of digital information.
How can the continuing costs of assembling content and providing access to the American public best be met? Is technology available that could offer better measurement of benefits and savings? To whom do the greatest benefits and savings accrue? Are there value-added services the payment for which will subsidize broad public access?