Preserving the Digital Preservation Conversation
Digital Preservation Resource Repository
A proposed project created for the Digital Information Management Program
at the School of Information Resources and Library Science
University of Arizona
Jason Kucsma
The rapidly changing technological landscape has quickened the speed at which information is made accessible, and it has also provided information professionals with a tremendous number of tools with which we may improve our ability to preserve history and provide research documentation to meet our organizational and constituent needs. Institutional repositories are one such tool, affording large and small institutions the capability of aggregating, organizing, and sharing research produced by individuals and committees within the organization. In creating digital repositories for faculty and students, academic institutions are not limited to simply establishing a digital vault for intellectual capital. Rather, institutional repositories serve users’ need for “transportable content that can be utilized within various digital environments and reused in multiple formats, and … forums for the rapid exchange of ideas with both on-campus and external communities” (Walters, 2006).
The rapidly changing landscape and relative “newness” of the digital preservation field also afford scholars, practitioners, and students the opportunity to swap roles more fluidly than in virtually any other profession. Such role-swapping, however, demands a thorough, centralized repository of published literature on digital preservation theory and best practices. Much of the published literature on the subject of digital preservation is fairly recent and, thus, constantly evolving. Adopting a documentation strategy introduced by Helen Samuels (1986), this repository would serve as a historical record of where the field has been, a current record of where the field stands, and a projection of where the digital preservation movement is heading. This strategy, according to the Society of American Archivists Glossary (2005), requires “the analysis of the subject to be documented; how that subject is documented in existing records, and information about the subject that is lacking in those records; and the development of a plan to capture adequate documentation of that subject, including the creation of records, if necessary.” It is our intention to address these key concerns within the context of Samuels’s four primary documentation strategy activities. In doing so, our documentation strategy will attempt to identify and define what we intend to collect as published works on the subject of digital preservation; establish the governing personnel and physical site of our repository; structure and fine-tune the selection process; and select and place the documentation resources.
Published Work on Digital Preservation
Documenting an entire field is not without its challenges. Indeed, it is all but impossible to consider a preservation strategy that attempts to capture all the work in a given discipline, and digital preservation is no exception. However, now is an appropriate time to cast a wide net over published work on digital preservation, while the scope of that work is relatively manageable. Directing preservation energies in this arena now will make it that much easier to consistently document critical shifts in the field in the near and distant future.
No global effort currently exists to aggregate research on digital preservation, and focusing on the documentation of this work now, while the corpus is manageable, will fill a gap in the documentation of this movement. The lack of a central repository for the majority of research on this subject is due, in part, to the geographic dispersion of institutions doing major work in this field, with the bulk of this work being done by institutions in Australia, the Netherlands, the United Kingdom, and the United States. A centralized open access repository would aid in eliminating geographic and institutional barriers that may be seen as impeding the progress of the digital preservation movement as a whole.
In an attempt to rein in such a seemingly large body of knowledge, the Digital Preservation Resource Repository (DiPRR, pronounced “dipper”) will focus entirely on scholarly works published in online and print journals and on additional works published independently by organizations that self-identify digital preservation as their primary concern. For example, articles from D-Lib Magazine explicitly focused on digital preservation, digital preservation reports published by the Council on Library and Information Resources, and articles published by the Digital Curation Centre would all be likely additions to DiPRR. The repository may also include conference proceedings as they are created and posted to organizational sites, as these resources often provide the most current snapshot of industry trends. By contrast, corporate white papers from digital asset management companies would not be included in this repository. We will examine a sample of potential resources in the “Selection” portion of this report.
Administering Personnel and Governing Site for DiPRR
The Digital Information Management Certificate Program (nicknamed DigIn) was launched in Summer 2007 and provides an ideal site for the creation and administration of DiPRR. The program focuses on exposing students to a functional and interdisciplinary approach to digital curation. The program “acquaint(s) students with the basic theoretical concepts underpinning libraries, archives, records management and information technology, while at the same time immersing students in the hands-on work these communities are doing with digital collections” (Fulton et al., 2007). Because DigIn is one of the only programs of its kind being offered in North America, there is an intrinsic relationship between the program and a repository of this kind that could be exploited for the mutual benefit of the program and the profession as a whole.
As a program that mandates that students familiarize themselves with the literature detailing the theory and practice of digital preservation, DigIn presents a natural site for selecting, appraising, and preserving digital preservation literature for use by students in the program. Program professors and members of the DigIn national advisory board already do a substantial amount of work reviewing the literature and presenting it for student consumption. Undoubtedly, hundreds of resources that may be relevant for student intellectual exploration never make it to a formal course reading list. A centralized, searchable repository at students’ disposal would encourage students to delve deeper into issues relevant to their own intellectual pursuits that cannot be covered in greater depth in this short certificate program.
While DigIn professors, with the cooperation of scholars and practitioners (see below), will serve as curators of the initial corpus of work, students in the Advanced Digital Collections course and the program’s Capstone course could use DiPRR as a testbed. Students would be afforded a rare opportunity for hands-on experience with the key issues introduced in the Applied Technology and Introduction to Digital Collections courses: namely, resource appraisal strategies, metadata schemas, and the technology serving collections.
In a field that is developing as rapidly as digital preservation is, there is a natural built-in community of scholars and practitioners who would benefit from a centralized repository of digital preservation literature. In addition to DiPRR becoming a dynamic collection where scholars and practitioners may drop their work for inclusion, student participation will add value to the collection through applying METS standard metadata to resources.
To this point, the administrative responsibility of DiPRR may seem a bit unclear. While DiPRR will be owned and administered by the DigIn program, it will only be useful as a dynamic resource through the active participation of leading scholars, practitioners, and organizations tackling digital preservation challenges in libraries, archives, and museums. The diverse leadership of the DigIn national advisory board includes leading actors in records management, archives, libraries, museums, and information technology. These members provide a useful starting point for assisting DigIn in announcing the presence of DiPRR and aiding in publicity of the project in their professional circles.
While open archives (like the one we are proposing here) present some of the greatest opportunities for scholarly communication, they are most effective when they reach a critical mass of participation from scholars and practitioners. We anticipate that the nature of the field (i.e., a tendency toward Creative Commons-type licensing, an atmosphere of collaboration and cooperation, and recognition of the value of open archives) would aid DiPRR in attracting a strong pool of submissions from individuals and organizations worldwide.
Centralizing Independent Repositories for Preservation and Open Access
Samuels’s (1986) documentation strategy challenges us to rethink how we approach collections like DiPRR. A documentation strategy does not begin with a list of items to be collected. Rather, Samuels suggests it must “begin with detailed investigations of the topic to be documented and the information required. The concern is less what does exist than what should exist … [d]ocumentation strategies are designed to respond to abundance an abundance of institutions and information.” With that in mind, we envision that DiPRR will mirror the DigIn program’s pedagogical model (Fulton et al., 2007), combining theory (knowledge of the discipline), conceptual framework (strategic knowledge guiding action), and practical skills (tools and methods) as they relate to digital collections and the digital curation of cultural and historic heritage resources. Such a collection will inevitably include metadata schemas, technology concerns (legacy systems and new advancements), interoperability, user access and interface, and areas which have yet to be determined as critical to the digital preservation field.
With this generalist approach to digital preservation research and best practices as a guiding principle, students in the DigIn program will be charged with administering submissions from researchers interested in self-archiving their work. Work included in DiPRR will have to fall within the general scope described above. Taking a cue from the successful repository at Queensland University of Technology, DiPRR will not host work that is commercial in nature, contains confidential material, or would infringe on a legal commitment by the author or her/his sponsoring organization or institution.
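The three exclusion criteria above could be checked mechanically when a submission is received. The following is a minimal sketch of such a check, not a description of any existing DiPRR software; the `Submission` class and its field names are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical intake record; the field names are illustrative assumptions."""
    title: str
    is_commercial: bool = False
    has_confidential_material: bool = False
    infringes_legal_commitment: bool = False


def exclusion_reasons(sub: Submission) -> list[str]:
    """Return every exclusion criterion that the submission triggers."""
    reasons = []
    if sub.is_commercial:
        reasons.append("commercial in nature")
    if sub.has_confidential_material:
        reasons.append("contains confidential material")
    if sub.infringes_legal_commitment:
        reasons.append("would infringe on a legal commitment")
    return reasons


def is_eligible(sub: Submission) -> bool:
    """A submission is eligible only when no exclusion criterion applies."""
    return not exclusion_reasons(sub)
```

Returning the list of reasons, rather than a bare yes or no, would let student administrators explain a rejection to the submitting author.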
It is equally useful to examine the existing state of documentation to identify additional areas of need. Scholars and practitioners working to address the needs, challenges, and promises of the future of digital preservation have done a substantial amount of independent work to make their work accessible via online journals and professional organization websites. And while E-LIS (the international open archive for library and information science literature) has amassed a substantial collection of articles on digital preservation, a good portion of these are written in languages other than English. What remains elusive, then, is a centralized international repository that culls resources after they have been originally published online in journals and on organization websites.
Similar work has been underway at the National Library of Australia with their Preserving Access to Digital Information Initiative (PADI). As a “gateway to international digital preservation resources,” PADI provides a pathfinder to glossaries, articles, and related media on digital preservation concerns. PADI does substantial work to add value to the collection of resources by offering “PADI trails” that help users drill down through topics of interest to see a collection of resources on a given topic. For instance, a user interested in retrieving resources about types of storage media may use the “Removable Storage Media Trail” to learn basic facts about storage media and read research from a variety of sources. PADI also offers “safekeeping” tags for items that are deemed to be of long-term interest to users.
Ironically, most of the work PADI organizes and provides access to is still hosted on disparate institutional servers. DiPRR will build on the successful PADI model by not only adding value to resources through student-generated pathfinders, but also providing long-term storage of resources on local servers administered by DigIn faculty and students. The importance of this element of DiPRR cannot be overstated. While it is useful to provide links to resources, linking alone leaves the collection open to broken or changed URLs, server failures, and related problems that threaten long-term access to digital resources. As the host of DiPRR, DigIn will assume responsibility for providing long-term access to these resources in ways that are simply not possible on the internet alone.
And while it is outside the scope of the initial launch of this repository, DiPRR may prove a useful testbed for providing translations of eprints from E-LIS in languages other than English. Additionally, podcasts, blogs, and other more ephemeral contributions would be important to capture as part of this documentation strategy, as they, like conference proceedings, provide timely snapshots of how professionals are thinking about tackling contemporary preservation issues.
We recognize that a major obstacle to building a repository like this is the licensing rights attached to work by individuals and organizations. Publications and organizations may retain the rights to works they publish, or they may allow the authors to retain those rights. In either case, DiPRR’s most valuable work will be done in developing relationships with publishers, organizations, and individual scholars, relationships that reinforce the mutual benefit between scholarship and a repository serving the development of that scholarship. DiPRR will be seeking non-exclusive rights to preserve resources after they have been first published elsewhere. In cases where authors wish to deposit work directly into DiPRR, bypassing the traditional publication venues, the author will be free to determine the appropriate Creative Commons license for the work and will be asked to provide requisite documentation indicating that he/she is the legal copyright holder of the material and able to make licensing decisions.
Selection: A Cross-Section of Potential Resources for Inclusion in DiPRR
At this point, it is useful to consider a cross-section of resources that DiPRR might host. In the following discussion, we will examine six potential resources and discuss how they contribute to DiPRR’s documentation strategy.
Preservation Management of Digital Materials: A Handbook
Produced by the Digital Preservation Coalition (DPC), this online handbook provides “an internationally authoritative and practical guide to the subject of managing digital resources over time and the issues in sustaining access to them. It will be of interest to all those involved in the creation and management of digital materials.” We chose this as our first example because it is clearly the sort of authoritative resource that would be useful to professionals working on and studying digital preservation initiatives. It was initially created by Neil Beagrie and Maggie Jones, but is currently maintained and updated by the DPC. As a resource that is not static and is copyrighted by the DPC, however, it presents challenges to our repository. In this instance, DiPRR would work with DPC to secure milestone versions of the handbook for inclusion in the repository. If that were not possible, DiPRR would house a record for this resource with metadata generated by students in the DigIn program, along with a link to the original source hosted on DPC servers.
View metadata created for this resource.
Copyright Issues Relevant to the Creation of a Digital Archive: A Preliminary Assessment
This essay by June M. Besek presents an analysis of critical copyright issues that are paramount to the work of long-term digital preservation. Unlike the handbook from DPC, this is a static document published by the Council on Library and Information Resources. As such, it could be easily deposited into DiPRR with little to no concern about version control. The work was commissioned by the Library of Congress for the National Digital Information Infrastructure and Preservation Program. It is not clear if the author retains the rights to this work, or if those rights are reserved by CLIR or the Library of Congress.
View metadata created for this resource.
Digital Curation Centre Manual: Curating E-Mails
This essay, written by Maureen Pennock working for the Digital Curation Centre, “reports on the several issues involved in managing and curating e-mail messages for both current and future use.” Like the above essay by Besek, this is a static document that would provide a useful reference point for the curation of one of the most popular born-digital resources that an information professional might be charged with preserving. It is licensed under a Creative Commons “Attribution-Non-Commercial-Share Alike 2.5” license, which means that it may be included in DiPRR with proper attribution and that derivative works may be made from this resource.
View metadata created for this resource.
Data Dictionary for Preservation Metadata
The Data Dictionary is a product of the Online Computer Library Center (OCLC) and Research Libraries Group (RLG) working group titled Preservation Metadata: Implementation Strategies (PREMIS). It “defines and describes an implementable set of core preservation metadata with broad applicability to digital preservation repositories.” The Data Dictionary and the related contextual documents contained within it are invaluable in their ability to illustrate a milestone in the preservation metadata conversation. Reproduction and reuse of this document are permitted as long as the original OCLC and RLG copyright is included. We would not anticipate any problems cataloging and including this resource in DiPRR.
View metadata created for this resource.
Building a Collaborative Digital Preservation Network
This is a presentation by co-principal investigators of the MetaArchive of Southern Digital Culture and the Library of Congress. It discusses the congressionally mandated National Digital Information Infrastructure and Preservation Program and the creation of MetaArchive. As both a conference proceeding and a critical element of the digital preservation conversation, this resource is essential to the DiPRR documentation strategy. The resource is “published” via the EDUCAUSE site as both an mp3 and a PowerPoint presentation, which further aids the documentation strategy by diversifying the resource formats available in DiPRR.
View metadata created for this resource.
Digital Preservation and Blogs
Another conference proceeding (this time from the 2006 South by Southwest festival in Austin, TX), this resource is an mp3 recording of a conference session discussing the “technical, social, and legal problems” encountered by digital preservationists working to preserve online blogs. Much like the Pennock essay discussing e-mails, this resource addresses the critical issues that arise when attempting to preserve born-digital resources, particularly when those resources are not fixed. Participants on this panel included: Josh Greenberg, Associate Director of Research Projects, Center for History and New Media/George Mason University; Alison Headley, bluishorange; Mike Linksvayer, CTO, Creative Commons; Colin Wells, Pratt Institute; Carrie Bickner, Web Developer, The New York Public Library.
View metadata created for this resource.
Hosting and Preservation of DiPRR Repository
As stated above, the DigIn program, located in Tucson, Arizona, as a project of the School of Information Resources and Library Science, will be the principal administering body of DiPRR. Pending allocation of available financial and technology resources, SIRLS/DigIn will collaborate with peer institutions that also offer instruction in digital preservation issues. Long- and short-term preservation of the repository will be ensured through participation in the LOCKSS initiative. The breadth and depth of this project will rest on the cooperation and collaboration between researchers and their organizational/institutional sponsors. To assess the viability of a project like this, we will issue a preliminary survey to be distributed to information professionals working on or studying digital preservation initiatives.
While the repository will be hosted at the University of Arizona, efforts will be made to ensure interoperability with other institutional repositories. We recognize, as Henty (2007) mentions, that DiPRR will not serve all digital preservation research needs, and for that reason we must take important steps toward ensuring that DiPRR will be useful across divergent platforms. In addition, interoperability will be a critical selling point as we approach potential collaborating institutions for permission to include resources in DiPRR. Providing access to this repository regardless of environment will require quality (common) metadata, scalability of the repository, and reasonable security measures.
Problems and Solutions
In preparing this proposal for the creation of the Digital Preservation Resource Repository, we have encountered a number of obstacles. Some of these obstacles have workable solutions, and others will require the team to revisit the strategies for building DiPRR.
Copyright concerns for resources present the first major obstacle. In creating a repository intended primarily to serve the needs of academic professionals and students working on and studying digital initiatives, we are optimistic that we will be able to develop relationships with partnering institutions interested in supporting this initiative. These partnerships will be couched in the need for cooperation, not competition, between institutions working to address the important work of curating digital information for long-term preservation. We anticipate that institutions will be open to the idea of allowing their resources to be deposited in DiPRR under the Fair Use copyright principle. It bears reiterating also that institutions will be encouraged to determine the appropriate licensing for each of the resources deposited in DiPRR and provide the requisite documentation for each licensing option.
No plan for a digital repository of this sort can stand on optimism and best-case scenarios. We realize that the unfortunate proprietary nature of academic scholarship will prevent some institutions and individuals from allowing their work to be included in DiPRR. When these situations arise with resources that are of obvious relevance to the documentation strategy, DiPRR administrators (students and faculty in the DigIn program) will create records and generate metadata for these resources and link to them on their host servers.
Technology-intensive projects like DiPRR require a substantial amount of funding upfront to acquire the server technology and create the repository. Primary launch funding would be established through grant funding (grant yet to be determined). Maintenance and long-term administration of the repository will be managed by students and faculty in the DigIn program. DiPRR will utilize either DSpace or Fedora, both of which are open-source repository software packages. Final determination of the repository software will be made after a period of research on available options. Information gleaned from that research will be matched against the demands of the repository.
As mentioned above, we encountered some difficulty determining how to handle critical resources that are not static. For example, a number of resources mentioned here (and reviewed but not included in this discussion) are online documents that are under regular review and revision over time. The Preservation Management of Digital Materials Handbook is a perfect example of this. While it was created initially as a static document, it is still subject to review and revision as the current administrators see fit.
It may seem appropriate to simply revise our selection strategy to include only fixed resources. At the same time, many of these “fluid” resources are essential to our documentation strategy. For example, leaving the DSpace Digital Preservation Tools and Strategies Wiki (and resources like it) out of this repository would leave a tremendous gap in our documentation strategy. We have established two possible solutions for this problem. The first, and most thorough, option would be to work with authors or administering institutions to procure milestone versions of particular fluid resources. The frequency of these milestones would depend entirely on how regularly the document undergoes review and revision (yearly, biannually, etc.). The second option mirrors what DiPRR would do when acquisition of copyright/licensing rights is not possible for a given resource. In these instances, DiPRR would include records for these resources with student-generated metadata.
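Taken together with the copyright fallback described earlier, these options amount to a small decision rule for how each resource enters the repository. The sketch below illustrates that rule only; the names `DepositMode` and `choose_deposit_mode` and both parameters are hypothetical, and a real workflow would involve negotiation with rights holders rather than two boolean flags.

```python
from enum import Enum


class DepositMode(Enum):
    """Hypothetical labels for the ways a resource could be represented."""
    FULL_COPY = "store a local copy of the static resource"
    MILESTONE_VERSIONS = "store periodic milestone versions"
    METADATA_ONLY = "store a metadata record linking to the host server"


def choose_deposit_mode(rights_granted: bool, is_fluid: bool) -> DepositMode:
    """Pick a deposit mode from two assumed facts about a resource.

    rights_granted: the rights holder has agreed to local preservation.
    is_fluid: the resource is under regular review and revision.
    """
    if not rights_granted:
        # Without permission, only a record with student-generated metadata is kept.
        return DepositMode.METADATA_ONLY
    if is_fluid:
        # With permission, a changing resource is captured at milestones.
        return DepositMode.MILESTONE_VERSIONS
    return DepositMode.FULL_COPY
```

Under this sketch, a static and openly licensed essay would be stored as a full copy, while a regularly revised handbook would be captured as milestone versions only if its maintainers agree, and otherwise as a metadata record.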
References
Arms, C., McDonald, R. H., Nicol, L. B., & Walters, T. (2005). Building a collaborative digital preservation network. Paper presented at the Educause 2005 Annual Conference, Orlando, FL. Retrieved November 20, 2007, from http://www.educause.edu/LibraryDetailPage/666?ID=EDU05199
Beagrie, N., & Jones, M. (2007). Preservation management of digital materials: The handbook. Retrieved November 20, 2007, from http://www.dpconline.org/graphics/handbook/
Besek, J. M. (2003). Copyright issues relevant to the creation of a digital archive: A preliminary assessment (No. 112). Washington, DC: Council on Library and Information Resources. Retrieved November 20, 2007, from http://www.clir.org/pubs/abstract/pub112abst.html
Bickner, C., Greenberg, J., Headley, A., Linksvayer, M., & Wells, C. (2006). Digital preservation and blogs. Austin, TX. Retrieved November 20, 2007, from http://player.sxsw.com/2006/podcasts/SXSW06.INT.20060313.DigitalPreservationAndBlogs.mp3
Cornell University Library. (2005). Digital preservation management: Implementing short-term strategies for long-term problems. Retrieved November 20, 2007, from http://www.library.cornell.edu/iris/tutorial/dpm/eng_index.html
DSpace Federation. (2006). Digital preservation tools and strategies. Retrieved November 20, 2007, from http://wiki.dspace.org/index.php/DigitalPreservationToolsAndStrategies
Fulton, B., et al. “Teaching Digital Curation: A Functional Approach,” in International Cultural Heritage Informatics Meeting (ICHIM07): Proceedings, J. Trant and D. Bearman (eds). Toronto: Archives & Museum Informatics. 2007. Published September 30, 2007 at http://www.archimuse.com/ichim07/papers/fulton/fulton.html
Henry, Geneva. On-Line Publishing in the 21st Century: Challenges and Opportunities. D-Lib Magazine, October 2003. Consulted October 10, 2007. http://www.dlib.org//dlib/october03/henry/10henry.html
Henty, Margaret. “Ten Major Issues in Providing a Repository Service in Australian Universities.” D-Lib Magazine 13(5/6) May/June 2007. http://www.dlib.org/dlib/may07/henty/05henty.html
Lots Of Copies Keeps Stuff Safe. Consulted October 10, 2007. http://www.lockss.org/lockss/Home
Metadata Encoding & Transmission Standard. Consulted November 30, 2007. http://www.loc.gov/standards/mets/
National Library of Australia. Preserving access to digital information: Glossaries. Retrieved November 20, 2007, from http://www.nla.gov.au/padi/format/gloss.html
Online Computer Library Center, & RLG. (2005). Data dictionary for preservation metadata: Final report from the PREMIS working group. Dublin, OH; Mountain View, CA: Online Computer Library Center and RLG.
Pearce-Moses, et al. The Society of American Archivists Glossary. Consulted October 10, 2007. http://www.archivists.org/glossary/index.asp
Pennock, M. (2006). DCC curation manual: Curating E-mails. Edinburgh, UK: Digital Curation Centre. Retrieved February 19, 2007, from http://www.dcc.ac.uk/resource/curation-manual/chapters/curating-e-mails
Prud’homme, P. P., Zhong, Y. & Urban, R. (2005). Digital preservation pathfinder. Retrieved November 20, 2007, from http://www.ndiipp.uiuc.edu/index.php?option=com_content&task=view&id=25&Itemid=52
QUT ePrints. “Open-access archive of QUT research literature.” Retrieved November 20, 2007 from http://eprints.qut.edu.au/
Samuels, Helen. “Who Controls the Past.” American Archivist 49(Spring 1986), pp. 109-124. Reprinted in Randall Jimerson, ed., American Archival Studies: Readings in Theory and Practice (Chicago: Society of American Archivists, 2000), pp. 193-210.
Walters, Tyler. “Strategies and Frameworks for Institutional Repositories and the New Support Infrastructure for Scholarly Communications.” D-Lib Magazine 12(10) October 2006. Retrieved November 20, 2007 from http://www.dlib.org/dlib/october06/walters/10walters.html

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.