Digital Preservation
Resources Repository (DiPRR, "dipper"):
Preserving the Digital Preservation Conversation
A proposed project created for the Digital
Information Management Program
at the School of Information
Resources and Library Science
University of Arizona
Jason Kucsma
The rapidly changing technological landscape has quickened the speed at
which information is made accessible, and it has also provided information
professionals with a tremendous number of tools with which we may
improve our ability to preserve history and provide research
documentation to meet our organizational and constituent needs.
Institutional repositories are one such tool, affording large and
small institutions the capability of aggregating, organizing, and
sharing research produced by individuals and committees within the
organization. In creating digital repositories for faculty and
students, academic institutions are not limited to simply establishing
a digital vault for intellectual capital. Rather, institutional
repositories facilitate users' need for "transportable content that can
be utilized within various digital environments and reused in multiple
formats, and ... forums for the rapid exchange of ideas with both
on-campus and external communities" (Walters, 2006).
The rapidly changing landscape and
relative "newness" of the digital preservation field also afford
scholars, practitioners, and students the opportunity to swap roles
more fluidly than virtually any other profession. Such role-swapping,
however, demands a thorough, centralized repository of published
literature on digital preservation theory and best practices. Much of
the published literature on the subject of digital preservation is
fairly recent and, thus, constantly evolving. Adopting a documentation
strategy introduced by Helen Samuels (1986), this repository would
serve as a historical record of where the field has been, a current
record of where the field stands, and a projection of where the digital
preservation movement is heading. This strategy, according to the
Society of American Archivists Glossary (2005), requires "the analysis
of the subject to be documented; how that subject is documented in
existing records, and information about the subject that is lacking in
those records; and the development of a plan to capture adequate
documentation of that subject, including the creation of records, if
necessary." It is our intention to address these key concerns within
the context of Samuels's four primary documentation strategy
activities. In doing so, our documentation strategy will attempt to
identify and define what we intend to collect as published works on the
subject of digital preservation; establish the governing personnel and
physical site of our repository; structure and fine-tune the
selection process; and select and place the documentation
resources.
Published Work on
Digital Preservation
Documenting an entire field is not
without its challenges. Indeed, it is all but impossible to consider a
preservation strategy that attempts to capture all the work in a given
discipline, and digital preservation is no exception. However, now is
an appropriate time to cast a wide net over the published literature on
digital preservation, while the scope of work is relatively
manageable. Directing preservation energies in this arena now will make
it that much easier to consistently document critical shifts in the
field in the near and distant future.
No global effort currently exists to
aggregate research on digital preservation,
and focusing efforts on the documentation of this work now, while the
corpus is manageable, will fill a gap in the documentation of this
movement. The lack of a central repository for the majority of research
on this subject is due, in part, to the geographic dispersion of
institutions doing major work in this field -- with the bulk of this
work being done by institutions in Australia, the Netherlands, the United
Kingdom, and the United States. A centralized open access repository
would aid in eliminating geographic and institutional barriers
that may be seen as impeding the progress of the digital preservation
movement as a whole.
In an attempt to rein in such a
seemingly large body of knowledge, the Digital Preservation Resource
Repository (DiPRR, pronounced "dipper") will focus entirely on
scholarly works published in online and print journals and those
additional works published independently by organizations that
self-identify digital preservation as their primary concern. For
example, articles from D-Lib Magazine explicitly focused on digital
preservation, digital preservation reports published by the Council on
Library and Information Resources, and articles published by the
Digital Curation Centre would all be likely additions to DiPRR. The
repository may also include conference proceedings as they are created
and posted to organizational sites, as these resources often provide
the most current snapshot of industry trends. By contrast, corporate
white papers from digital asset management companies, for example,
would not be included in this repository. We will examine a sample of potential
resources in the "Selection" portion of this report.
Administering
Personnel and Governing Site for DiPRR
The Digital Information Management
Certificate Program (nicknamed DigIn) was launched in Summer 2007 and
provides an ideal site for the creation and administration of DiPRR.
The program focuses on exposing students to a functional and
interdisciplinary approach to digital curation. The program
"acquaint(s) students with the basic theoretical concepts underpinning
libraries, archives, records management and information technology,
while at the same time immersing students in the hands-on work these
communities are doing with digital collections" (Fulton et al., 2007).
Because DigIn is one of the only programs of its kind offered in North
America, there is an intrinsic relationship between the program and a
repository like DiPRR that could be exploited for the mutual benefit of
the program and the profession as a whole.
As a program that mandates students
familiarize themselves with the literature detailing the theory and
practice of digital preservation, DigIn presents a natural site for
selecting, appraising, and preserving digital preservation literature
for use by students in the program. Program professors and members of the DigIn national
advisory board already do a substantial amount of work reviewing the
literature and presenting it for student consumption. Undoubtedly,
hundreds of resources may be relevant for student intellectual
exploration that never make it to a formal course reading list. A
centralized, searchable repository at students' disposal would
encourage students to delve deeper into issues relevant to their own
intellectual pursuits that cannot be covered in greater depth
in this short certificate program.
While DigIn professors, with the
cooperation of scholars and practitioners (see below), will serve as
curators of the initial corpus of work, students in the Advanced
Digital Collections course and the program's Capstone course could use
DiPRR as a testbed. Students would be afforded a rare opportunity for
hands-on experience with the key issues introduced in the Applied
Technology and Introduction to Digital Collections courses -- namely
resource appraisal strategies, metadata schemas, and the technology
serving collections.
In a field that is developing as
rapidly as digital preservation is, there is a natural built-in
community of scholars and practitioners who would benefit from a
centralized repository of digital preservation literature. In addition
to DiPRR becoming a dynamic collection where scholars and practitioners
may drop their work for inclusion, student participation will add value
to the collection through applying METS standard metadata to resources.
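As a rough illustration of the kind of record students might produce,
the following is a minimal sketch, in Python, of a METS-style wrapper
around a few Dublin Core fields. The function name, the record
identifier, and the choice of Dublin Core as the descriptive schema
are assumptions made for this example; the proposal itself specifies
only that METS will be used, and the output has not been validated
against the METS schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS, Dublin Core, and XLink.
METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets_record(record_id, title, creator, date, url):
    """Return a minimal METS-style XML string for one resource.

    Hypothetical helper: the field choices are illustrative only.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": record_id})

    # Descriptive metadata section wrapping simple Dublin Core elements.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    for tag, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(xml_data, f"{{{DC_NS}}}{tag}").text = value

    # File section pointing at the location of the resource.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": "FILE1"})
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
    )

    # Structural map tying the description to the file.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", {"DMDID": "DMD1"})
    ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": "FILE1"})

    return ET.tostring(mets, encoding="unicode")


if __name__ == "__main__":
    print(
        build_mets_record(
            "diprr-0001",  # hypothetical identifier
            "DCC curation manual: Curating E-mails",
            "Pennock, M.",
            "2006",
            "http://www.dcc.ac.uk/resource/curation-manual/chapters/curating-e-mails",
        )
    )
```

A real deployment would more likely rely on the metadata tools built
into the chosen repository software than on hand-built XML.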
To this point, the administrative
responsibility of DiPRR may seem a bit unclear. While DiPRR will be
owned and administered by the DigIn program, it will only be useful as
a dynamic resource through the active participation by leading
scholars, practitioners, and organizations tackling digital
preservation challenges in libraries, archives, and museums. The
diverse leadership of the DigIn national advisory board includes
leading actors in records management, archives, libraries, museums,
and information technology. These members provide a useful starting
point for assisting DigIn in announcing the presence of DiPRR and
aiding in publicity of the project in their professional circles.
While open archives (like the one we
are proposing here) present some of the greatest opportunities for
scholarly communication, they are most effective when they reach a
critical mass of participation from scholars and practitioners. We
anticipate that the nature of the field (i.e., a tendency toward
Creative Commons-type licensing, an atmosphere of collaboration and
cooperation, and recognition of the value of open archives) would aid DiPRR in
attracting a strong pool of submissions from individuals and
organizations worldwide.
Centralizing
Independent Repositories for Preservation and Open Access
Samuels's (1986) documentation
strategy challenges us to rethink how we approach collections like
DiPRR. A documentation strategy does not begin with a list of items to
be collected. Rather, Samuels suggests it must "begin with detailed
investigations of the topic to be documented and the information
required. The concern is less what does exist than what should exist
... [d]ocumentation strategies are designed to respond to abundance --
an abundance of institutions and information." With that in mind, we
envision DiPRR will mirror the DigIn program's pedagogical model
(Fulton et al., 2007) combining theory (knowledge of the discipline),
conceptual framework (strategic knowledge guiding action), and
practical skills (tools and methods) as they relate to digital
collections and the digital curation of cultural and historic heritage
resources. Such a collection will inevitably include metadata schema,
technology concerns (legacy systems and new advancements),
interoperability, user access and interface, and areas which have yet
to be determined as critical to the digital preservation field.
With this generalist approach to
digital preservation research and best practices as a guiding
principle, students in the DigIn program will be charged with
administering submissions from researchers interested in self-archiving
their work. Work included in DiPRR will need to meet the general
criteria stated above. Taking a
cue from the successful repository at Queensland University of
Technology, DiPRR will not host work that is commercial in nature,
contains confidential material, or would infringe on a legal commitment
by the author or her/his sponsoring organization or institution.
It is equally useful to examine the
existing state of documentation to identify additional areas of need.
Scholars and practitioners working to address the needs, challenges,
and promises of the future of digital preservation have done a
substantial amount of independent work to make their work accessible
via online journals and professional organization websites. And while
E-LIS (the international open archive for library and information
science literature) has amassed a substantial collection of articles on
digital preservation, a good portion of these are written in languages
other than English. What remains elusive, then, is a centralized
international repository that culls resources after they have been
originally published online in journals and on organization websites.
Similar work has been underway at the
National Library of Australia with their Preserving Access to Digital
Information Initiative (PADI). As a "gateway to international digital
preservation resources," PADI provides a pathfinder to glossaries,
articles, and related media on digital preservation concerns. PADI does
substantial work to add value to the collection of resources by
offering "PADI trails" that help users drill down through topics of
interest to see a collection of resources on a given topic. For
instance, a user interested in retrieving resources about types of
storage media may use the "Removable Storage Media Trail" to learn
basic facts about storage media and read research from a variety of
sources. PADI also offers "safekeeping" tags for items that are deemed
to be of long-term interest to users.
Ironically, most of the work PADI
organizes and provides access to is still hosted on disparate
institutional servers. DiPRR will build on the successful PADI model by
not only adding value to resources through student-generated
pathfinders, but also providing long-term storage of resources on local
servers administered by DigIn faculty and students. The importance of
this element of DiPRR cannot be overstated. While it is useful to
provide links to resources, linking alone leaves the collection open to
problems with broken/changed URLs, server failures, and related
potential problems that threaten the long-term access to digital
resources. As the host of the DiPRR, DigIn will assume responsibility
for providing long-term access to these resources in ways that are
simply not possible on the internet alone.
And while it is outside the scope of
the initial launch of this repository, DiPRR may prove a useful testbed
for providing translations of eprints from E-LIS in languages other
than English. Additionally, podcasts, blogs, and other more ephemeral
contributions would be important to capture as part of this
documentation strategy, as they, like conference proceedings, provide
timely snapshots of how professionals are thinking about tackling
contemporary preservation issues.
We recognize that a major obstacle to
building a repository like this is the licensing rights of work by
individuals and authors. Publications and organizations may retain the
rights to works they publish, or they may allow the authors to retain
those rights. In either case, DiPRR's most valuable work will be done in
developing relationships with publishers, organizations, and individual
scholars -- relationships that reinforce the mutual benefit
between scholarship and a repository serving the
development of that scholarship. DiPRR will seek non-exclusive
rights to preserve resources after they have been first published
elsewhere. In cases where authors may wish to deposit work directly
into DiPRR -- bypassing the traditional publication venues -- the author
will be free to determine the appropriate Creative Commons license for
the work and will be asked to provide requisite documentation
indicating that he/she is the legal copyright holder of the material
and able to make licensing decisions.
Selection: A
Cross-Section of Potential Resources for Inclusion in DiPRR
At this point, it is useful to
consider a cross-section of resources that DiPRR might host. In the
following discussion, we will examine six potential resources and
discuss how they contribute to DiPRR's documentation strategy.
Preservation
Management of Digital
Materials: A Handbook
Produced by the Digital Preservation
Coalition (DPC), this online handbook provides "an internationally
authoritative and practical guide to the subject of managing digital
resources over time and the issues in sustaining access to them. It
will be of interest to all those involved in the creation and
management of digital materials." We chose this as our first example
because it clearly serves as an example of the sort of authoritative
resource that would be useful to professionals working on and studying
digital preservation initiatives. It was initially created by Neil
Beagrie and Maggie Jones, but is currently maintained and updated by
the DPC. As a resource that is not static and is copyrighted by the
DPC, however, it presents challenges to our repository. In this
instance, DiPRR would work with DPC to secure milestone versions of the
handbook for inclusion in the repository. If that were not possible,
DiPRR would house a record for this resource with metadata generated by
students in the DigIn program -- along with a link to the original
source hosted on DPC servers.
View metadata created
for this resource.
Copyright
Issues Relevant to the
Creation of a Digital Archive: A Preliminary Assessment
This essay by Jane M. Besek presents
an analysis of critical copyright issues that are paramount to the work
of long-term digital preservation. Unlike the handbook from DPC, this
is a static document published by the Council on Library and Information
Resources. As such, it could be easily deposited into DiPRR with little
to no concern about version control. The work was commissioned by the
Library of Congress for the National Digital Information Infrastructure
and Preservation Program. It is not clear if the author retains the
rights to this work, or if those rights are reserved by CLIR or the
Library of Congress.
View metadata created
for this resource.
Digital
Curation Centre Manual: Curating
E-Mails
This essay, written by Maureen
Pennock working for the Digital Curation Centre, "reports on the
several issues involved in managing and curating e-mail messages for
both current and future use." Like the above essay by Besek, this is a
static document that would provide a useful reference point for the
curation of one of the most popular born-digital resources that an
information professional might be charged with preserving. It is
licensed under a Creative Commons "Attribution-Non-Commercial-Share
Alike 2.5" license, which means it may be included in DiPRR with proper
attribution and that derivative works may be made from this resource.
View metadata created
for this resource.
Data
Dictionary for Preservation Metadata
The Data Dictionary is a product of
the Online Computer Library Center (OCLC) and Research Libraries Group
(RLG) working group titled Preservation Metadata: Implementation
Strategies (PREMIS). It "defines and describes an implementable set of
core preservation metadata with broad applicability to digital
preservation repositories." The Data Dictionary and related contextual
documents contained within the dictionary are invaluable in their
ability to illustrate a milestone in the preservation metadata
conversation. Reproduction and reuse of this document is permitted as
long as the original OCLC and RLG copyright is included. We would not
anticipate any problems cataloging and including this resource in
DiPRR.
View metadata created
for this resource.
Building
a Collaborative Digital
Preservation Network
This is a presentation by
co-principal investigators of the MetaArchive of Southern Digital
Culture and the Library of Congress. It discusses the Congressionally
mandated National Digital Information Infrastructure and Preservation
Program and the creation of MetaArchive. As both a conference
proceeding and a critical element of the digital preservation
conversation, this resource is essential to the DiPRR documentation
strategy. The resource is "published" via the EduCause site as both an
mp3 and a PowerPoint presentation -- further aiding in the documentation
strategy by diversifying resource formats available in DiPRR.
View metadata created
for this resource.
Digital
Preservation and Blogs
Another conference proceeding (this
time from the South by Southwest festival in Austin, TX), this resource is an mp3
recording of a conference session discussing the "technical, social,
and legal problems" encountered by digital preservationists working to
preserve online blogs. Much like the Pennock essay discussing emails,
this resource addresses the critical issues that arise when attempting
to preserve born-digital resources -- particularly when those resources
are not fixed resources. Participants on this panel included: Josh
Greenberg, Associate Director of Research Projects, Center
for History and New Media/George Mason University; Alison Headley,
bluishorange; Mike Linksvayer, CTO, Creative Commons;
Colin Wells, Pratt Institute; Carrie Bickner, Web Developer, The New
York Public Library.
View metadata created
for this resource.
Hosting and
Preservation of DiPRR Repository
As stated above, the DigIn program,
located in Tucson, Arizona, as a project of the School of Information
Resources and Library Science, will be the principal administering body
of DiPRR. Pending allocation of available financial and technology
resources, SIRLS/DigIn will collaborate with peer institutions that
also offer instruction in digital preservation issues. Long- and
short-term preservation of the repository will be ensured through
participation in the LOCKSS initiative. The breadth and depth of this
project will rest on the cooperation and collaboration between
researchers and their organizational/institutional sponsors. To assess
the viability of a project like this, we will issue a preliminary
survey to be distributed to information professionals working on or
studying digital preservation initiatives.
While the repository will be hosted
at the University of Arizona, efforts will be made to ensure
interoperability with other institutional repositories. We
recognize, as Henty (2007) mentions, that DiPRR will not serve all
digital preservation research needs, and for that reason we must take
important steps toward ensuring that DiPRR will be useful across
divergent platforms. In addition, interoperability will be a critical
selling point as we approach potential collaborating institutions
for permission to include resources in DiPRR. Providing access to this
repository regardless of environments will require quality (common)
metadata, scalability of the repository, and reasonable security
measures.
Problems and
Solutions
In preparing this proposal for the
creation of the Digital Preservation Resource Repository, we have
encountered a number of obstacles. Some of these obstacles have
workable solutions, and others will require the team to revisit the
strategies for building DiPRR.
Copyright concerns for resources
present the first major obstacle. In creating a repository intended
primarily to serve the needs of academic professionals and students
working on and studying digital initiatives, we are optimistic that we
will be able to develop relationships with partnering institutions
interested in supporting this initiative. These partnerships will be
couched in the need for cooperation, not competition, between
institutions working to address the important work of curating digital
information for long-term preservation. We anticipate that institutions
will be open to the idea of allowing their resources to be deposited in
DiPRR under the Fair Use copyright principle. It bears reiterating also
that institutions will be encouraged to determine the appropriate
licensing for each of the resources deposited in DiPRR and provide the
requisite documentation for each licensing option.
No plan for a digital repository of
this sort can stand on optimism and best case scenarios. We realize
that the unfortunate proprietary nature of academic scholarship will
prevent some institutions and individuals from allowing their work to
be included in DiPRR. When these situations arise with resources that
are of obvious relevance to the documentation strategy, DiPRR
administrators (students and faculty in the DigIn program) will create
records and generate metadata for these resources and link to them on
their host servers.
Technology-intensive projects like
DiPRR require a substantial amount of funding upfront to acquire the
server technology and create the repository. Primary launch
funding would be established through grant funding (grant yet to be
determined). Maintenance and long-term administration of the repository
will be managed by students and faculty in the DigIn program. DiPRR
will utilize either DSpace or Fedora -- both open-source repository
software packages. Final determination of the repository software will
be made after a period of research on available options. Information
gleaned from that research will be matched against the demands of the
repository.
As mentioned above, we encountered
some difficulty determining how to handle critical resources that are
not static. For example, a number of resources mentioned here (and
reviewed but not included in this discussion) are online documents that
are under regular review and revision over time. The Preservation
Management of Digital Materials Handbook is a perfect example of this.
While it was created initially as a static document, it is still
subject to review and revision as the current administrators see fit.
It may seem appropriate to simply
revise our selection strategy to include only fixed resources. At the
same time, many of these "fluid" resources are essential to our
documentation strategy. For example, leaving the DSpace Digital
Preservation Tools and Strategies Wiki (and resources like it) out of
this repository would leave a tremendous gap in our documentation
strategy. We have established two possible solutions for this problem.
The first, and most thorough, option would be to work with authors or
administering institutions to procure milestone versions of particular
fluid resources. The frequency of these milestones would depend
entirely on how regularly the document undergoes review and revision
(yearly, biannually, etc.). The second option mirrors what DiPRR
would do when acquisition of copyright/licensing rights may not be
possible for a given resource. In these instances, DiPRR would include
records for these resources with student-generated metadata.
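The two options above can be sketched as a small data model. This is
an illustrative sketch in Python under stated assumptions, not a
design decision: the class names, field names, and storage paths are
hypothetical, and whichever repository software is chosen would
supply its own versioning and record structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    """One milestone version held on local servers (hypothetical model)."""
    retrieved: str   # ISO date of deposit, e.g. "2007-11-20"
    local_path: str  # where the local copy is stored


@dataclass
class ResourceRecord:
    """A catalog record that may or may not have locally stored copies."""
    title: str
    source_url: str
    fluid: bool = False           # True for wikis and handbooks under revision
    rights_cleared: bool = False  # True once non-exclusive rights are secured
    snapshots: List[Snapshot] = field(default_factory=list)

    def add_milestone(self, retrieved: str, local_path: str) -> None:
        # Option one: store a milestone version, but only with cleared rights.
        if not self.rights_cleared:
            raise PermissionError("cannot store a copy without cleared rights")
        self.snapshots.append(Snapshot(retrieved, local_path))

    def access_point(self) -> str:
        # Serve the latest local milestone if one exists; otherwise fall
        # back to option two, a metadata-only record linking to the host.
        if self.snapshots:
            latest = max(self.snapshots, key=lambda s: s.retrieved)
            return latest.local_path
        return self.source_url


if __name__ == "__main__":
    handbook = ResourceRecord(
        title="Preservation management of digital materials: The handbook",
        source_url="http://www.dpconline.org/graphics/handbook/",
        fluid=True,
    )
    print(handbook.access_point())  # no local copy yet, so the original URL
```

Keeping the original URL on every record means a resource without
cleared rights is still discoverable, while a resource with
milestones always resolves to the most recent local copy.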
References
Arms, C., McDonald, R. H., Nicol, L.
B., & Walters, T. (2005). Building a collaborative digital
preservation network. Paper presented at the Educause 2005 Annual
Conference, Orlando, FL. Retrieved November 20, 2007, from
http://www.educause.edu/LibraryDetailPage/666?ID=EDU05199
Beagrie, N., & Jones, M. (2007).
Preservation management of digital materials: The handbook. Retrieved
November 20, 2007, from http://www.dpconline.org/graphics/handbook/
Besek, J. M. (2003). Copyright issues
relevant to the creation of a digital archive: A preliminary assessment
(No. 112). Washington, DC: Council on Library and Information Resources.
Retrieved November 20, 2007, from
http://www.clir.org/pubs/abstract/pub112abst.html
Bickner, C., Greenberg, J., Headley,
A., Linksvayer, M., & Wells, C. (2006). Digital preservation and
blogs. Austin, TX. Retrieved November 20, 2007, from
http://player.sxsw.com/2006/podcasts/SXSW06.INT.20060313.DigitalPreservationAndBlogs.mp3
Cornell University Library. (2005).
Digital preservation management: Implementing short-term strategies for
long-term problems. Retrieved November 20, 2007, from
http://www.library.cornell.edu/iris/tutorial/dpm/eng_index.html
DSpace Federation. (2006). Digital
preservation tools and strategies. Retrieved November 20, from
http://wiki.dspace.org/index.php/DigitalPreservationToolsAndStrategies
Fulton, B., et al., "Teaching Digital
Curation: A Functional Approach," in International Cultural Heritage
Informatics Meeting (ICHIM07): Proceedings, J. Trant and D. Bearman
(eds). Toronto: Archives & Museum Informatics. 2007. Published
September 30, 2007 at
http://www.archimuse.com/ichim07/papers/fulton/fulton.html
Henry, Geneva. On-Line Publishing in
the 21st Century: Challenges and Opportunities. D-Lib Magazine, October
2003. Consulted October 10, 2007.
http://www.dlib.org//dlib/october03/henry/10henry.html
Henty, Margaret. "Ten Major Issues in
Providing a Repository Service in Australian Universities." D-Lib
Magazine 13(5/6) May/June 2007.
http://www.dlib.org/dlib/may07/henty/05henty.html
Lots Of Copies Keeps Stuff Safe.
Consulted October 10, 2007. http://www.lockss.org/lockss/Home
Metadata Encoding & Transmission
Standard. Consulted November 30, 2007.
http://www.loc.gov/standards/mets/
National Library of Australia.
Preserving access to digital information: Glossaries. Retrieved
November 20, 2007, from http://www.nla.gov.au/padi/format/gloss.html
Online Computer Library Center, &
RLG. (2005). Data dictionary for preservation metadata: Final report
from the PREMIS working group. Dublin, OH; Mountain View, CA: Online
Computer Library Center and RLG.
Pearce-Moses, et al. The Society of
American Archivists Glossary. Consulted October 10, 2007.
http://www.archivists.org/glossary/index.asp
Pennock, M. (2006). DCC curation
manual: Curating E-mails. Edinburgh, UK: Digital Curation Centre.
Retrieved February 19, 2007, from
http://www.dcc.ac.uk/resource/curation-manual/chapters/curating-e-mails
Prud'homme, P. P., Zhong, Y. &
Urban, R. (2005). Digital preservation pathfinder. Retrieved November
20, 2007, from
http://www.ndiipp.uiuc.edu/index.php?option=com_content&task=view&id=25&Itemid=52
QUT ePrints. "Open-access archive of
QUT research literature." Retrieved November 20, 2007 from
http://eprints.qut.edu.au/
Samuels, Helen. "Who Controls the
Past." American Archivist 49(Spring 1986), pp. 109-124. Reprinted in
Randall Jimerson, ed., American Archival Studies: Readings in Theory
and Practice ( Chicago : Society of American Archivists, 2000), pp.
193-210.
Walters, Tyler. "Strategies and
Frameworks for Institutional Repositories and the New Support
Infrastructure for Scholarly Communications." D-Lib Magazine 12(10)
October 2006. Retrieved November 20, 2007 from
http://www.dlib.org/dlib/october06/walters/10walters.html

This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.