Monday, November 30, 2015

Sharing the load: Jisc RDM Shared Services events

Sharing the load: Jisc RDM Shared Services events. Chris Awre. Digital Archiving blog. 25 November 2015.
     This post by Chris Awre summarizes the Jisc event he attended on shared services for research data management. Most academic institutions are struggling to manage research data, and some form of shared service provision will be of benefit. The presentation "Digital Preservation Requirements for Research Data Management" that he and Jenny Mitcham gave "highlighted the importance of digital preservation as part of a full RDM service, stressing of how a lack of digital preservation planning has led to data loss over time, and how consideration of requirements has been based on long established principles from the OAIS Reference Model". Any RDM shared service should include digital preservation capabilities. There is a need to provide a suite of shared services, including a shared service platform for digital preservation and independent digital preservation tools.

Friday, November 27, 2015

High-speed digitization of historic artifacts

CultLab3D’s 3D scanning conveyor belt allows high-speed digitization of historic artifacts. Benedict. 3D printer and 3D printing news website. Nov 19, 2015.
     Researchers at the Fraunhofer Institute for Computer Graphics Research IGD have developed CultLab3D: a 3D scanning system that can create digital images of 3D objects. The project aim is to provide mass digitization, annotation and storage of historical artifacts for museums and other places of preservation. Quotes and notes from the article:
  • "Digital preservation is one of the most important methods of sustaining our cultural history."
  • digital preservation makes it possible to create and maintain scans of written texts
  • "Digital preservation of texts is one thing, but the preservation of physical artifacts is quite another."
  • while there is no real substitute for an authentic historical artifact, something should be done to preserve historical artifacts
This organization believes that the digital preservation of historical artifacts via 3D scanning is undoubtedly a worthwhile endeavor.

Wednesday, November 25, 2015

Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update

Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update. John Caldwell; Erin Engle. The Signal. November 17, 2015
     "There are lots of tools out there, from checksum validators to digital forensics suites and wholesale preservation solutions." Instead of wanting the latest tool, ask whether this tool is the right one for you in this situation. The NDSR project is looking at:

  •     studying current workflows;
  •     benchmarking current policies against best practices;
  •     reviewing and testing potential digital curation applications;
  •     proposing sustainable workflows that align with current digital curation standards; and
  •     producing a white paper to sum up current processes and propose next steps.

To determine the right tool, there are some things you need to know:
  1. know your records: how electronic records are being managed, how archivists are processing them, and what happens to the materials afterward.
  2. what you want the end result to be.
  3. which tool to use for the task:
    1. Placement: Where does the tool fit into your process?
    2. Purpose: What does the tool actually do?
    3. Utility: How easy is the tool to use and does its output make sense?
"The seemingly straightforward question of utility is fundamentally tied to the question of purpose, and also the viability question: is the tool a long-term solution or a quick fix for today?" They are finding that they need to add preservation metadata to the records and establish the record integrity as early in the lifecycle as possible.
 An interesting comment on the blog post: "Digital preservation systems are precisely that: systems. Systems are a complex set of elements (people, technologies) and the connections between them (policies, procedures). Without all of these pieces, there really isn’t a system. There is just a tool. A hammer isn’t a house, just as a tool isn’t a digital preservation system."

Tuesday, November 24, 2015

Five Takeaways from AOIR 2015

Five Takeaways from AOIR 2015. Rosalie Lack. Netpreserve blog. 18 November 2015. 
     A blog post on the annual Association of Internet Researchers (AOIR) conference in Phoenix, AZ. The key takeaways in the article:
  1. Digital Methods Are Where It’s At.  Researchers are recognizing that digital research skills are essential. And, if you have some basic coding knowledge, all the better. The Digital Methods Initiative from Europe has tons of great information, including an amazing list of tools.
  2. Twitter API Is also Very Popular
  3. Social Media over Web Archives. Researchers used social media more than web archived materials.  
  4. Fair Use Needs a PR Movement. There is a lot of misunderstanding or limited understanding of fair use, even for those scholars who had previously attended a fair use workshop. Many admitted that they did not conduct particular studies because of a fear of violating copyright. 
  5. Opportunities for Collaboration. Many researchers were unaware of the tools or services they could use, or that their librarians/archivists have solutions.
There is a need for librarians/archivists to conduct more outreach to researchers and to talk with them about preservation solutions, good data management practices and copyright.

Monday, November 23, 2015

Introduction to Metadata Power Tools for the Curious Beginner

Introduction to Metadata Power Tools for the Curious Beginner. Maureen Callahan, Regine Heberlein, Dallas Pillen. SAA Archives 2015. August 20, 2015.   PowerPoint  Google Doc 
      "At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:

Basic Principles of Working with Power Tools
  • Create a Sandbox Environment: have backups. It is ok to break things
  • Think Algorithmically: Break a big problem down into smaller steps
  • Choosing a Tool: The best tool is one that works for your problem and skill set
  • Document: Successes, failures, procedures
Dare to Make Mistakes
  • as long as you know how to recognize and undo them!
  • view mistakes as an opportunity
  • mistakes can teach you as much about your data as about your tool
  • share your mistakes so others may benefit
  • realize that everybody makes them
General Principles
  • Know the applicable standards
  • Know your data
  • Know what you want
  • Normalize your data before you start a big project
  • The problem is intellectual, not technical
  • Use the tools available to you
  • Don’t do what a machine can do for you
  • Think about one-off operations vs. tools you might re-use or re-purpose
  • Think about learning tools in terms of raising the level of staff skill
Tools
  • XPath
  • Regex
  • XQuery
  • XQuery Update
  • XSLT
  • batch
  • Linux command line
  • Python
  • AutoIt
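A minimal sketch of several of these principles together (work on a copy, break the job into small steps, let the machine do the repetitive part, and review changes before applying them), using a regex plus the limited XPath subset in Python's ElementTree. The sample record and the `normalize_titles` function are hypothetical illustrations, not material from the presentation; real EAD files usually declare a namespace, which this sketch omits.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample record (no EAD namespace, for simplicity).
SAMPLE = """<ead><archdesc><did>
  <unittitle>  Correspondence,
     1920-1925 </unittitle>
  <unittitle>Photographs  </unittitle>
</did></archdesc></ead>"""

WHITESPACE = re.compile(r"\s+")  # any run of spaces, tabs or newlines

def normalize_titles(xml_text, dry_run=True):
    """Collapse whitespace in every <unittitle>.

    Returns (changes, xml): a list of (old, new) pairs for review and the
    serialized XML. With dry_run=True the XML is left untouched, so the
    change list can be checked (and documented) before anything is edited.
    """
    root = ET.fromstring(xml_text)
    changes = []
    for el in root.findall(".//unittitle"):  # XPath subset ElementTree supports
        old = el.text or ""
        new = WHITESPACE.sub(" ", old).strip()
        if new != old:
            changes.append((old, new))
            if not dry_run:
                el.text = new
    return changes, ET.tostring(root, encoding="unicode")
```

Running it first with `dry_run=True` mirrors the sandbox advice: the mistake, if there is one, shows up in the change list rather than in the data.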

The Provenance of Web Archives

The Provenance of Web Archives. Andy Jackson; Jason Webber. UK Web Archive blog. 20 November 2015.
     More researchers are taking an interest in web archives.  The post author says their archive has "tried to our best to capture as much of our own crawl context as we can." In addition to the WARC request and response records, they store other information that can answer how and why a particular resource has been archived:
  • links that the crawler found when it analysed each resource 
  • the full crawl log, which records DNS results and other situations
  • the crawler configuration, including seed lists, scope rules, exclusions etc.
  • the versions of the software we used  
  • rendered versions of original seeds and home pages and associated metadata.
The archive doesn't "document every aspect of our curatorial decisions, e.g. precisely why we choose to pursue permissions to crawl specific sites that are not in the UK domain. Capturing every mistake, decision or rationale simply isn’t possible, and realistically we’re only going to record information when the process of doing so can be largely or completely automated". In the future, there "will be practical ways of summarizing provenance information in order to describe the systematic biases within web archive collections, but it’s going to take a while to work out how to do this, particularly if we want this to be something we can compare across different web archives."

No archive is perfect. They "can only come to be understood through use, and we must open up to and engage with researchers in order to discover what provenance we need and how our crawls and curation can be improved." There are problems that need to be documented, but researchers "can’t expect the archives to already know what they need to know, or to know exactly how these factors will influence your research questions."

Saturday, November 21, 2015

How Much Of The Internet Does The Wayback Machine Really Archive?

How Much Of The Internet Does The Wayback Machine Really Archive? Kalev Leetaru. Forbes. November 16, 2015.
     "The Internet Archive turns 20 years old next year, having archived nearly two decades and 23 petabytes of the evolution of the World Wide Web. Yet, surprisingly little is known about what exactly is in the Archive’s vaunted Wayback Machine." The article looks at how the Internet Archive archives sites and suggests "that far greater understanding of the Internet Archive’s Wayback Machine is required before it can be used for robust reliable scholarly research on the evolution of the web." It requires a more "systematic assessment of the collection’s holdings." Archiving the open web requires enormous technical resources.

Perhaps the important lesson is that we have little understanding of what is actually in the data we use, and few researchers really explore questions about that data. The archival landscape of the Wayback Machine was far more complex than originally realized, and it is unclear how the Wayback Machine has been constructed. This insight is critical. "When archiving an infinite web with finite resources, countless decisions must be made as to which narrow slices of the web to preserve." The selection can be either random or prioritized by some criterion. Each approach has distinct benefits and risks.
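To make the trade-off concrete, here is a hypothetical Python sketch of the two selection strategies under a fixed crawl budget. The function names and the inbound-link score are illustrative assumptions; this is not how the Internet Archive actually selects content.

```python
import random

def select_random(urls, budget, seed=0):
    """Uniform random sample: no built-in bias, but important sites can be missed."""
    rng = random.Random(seed)  # fixed seed so a selection can be reproduced
    return rng.sample(urls, min(budget, len(urls)))

def select_prioritized(urls, budget, score):
    """Keep the highest-scoring URLs: reliably captures what the score favours,
    but writes that score's bias into the collection."""
    return sorted(urls, key=score, reverse=True)[:budget]
```

With a hypothetical score such as inbound link counts, `select_prioritized` always keeps the most-linked sites, which is exactly the kind of systematic bias researchers would need documented before treating the archive as representative.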

Libraries have formalized over time how they make collection decisions. Web archives must adopt similar processes. The web is "disappearing before our very eyes", as shown by the fact that up to 14% of all online news monitored by the GDELT Project is no longer accessible after two months. We must "do a better job of archiving the online world and do it before this material is lost forever."

Friday, November 20, 2015

Hydra: Get a head on your repository

Hydra: Get a head on your repository.  Hydra Project website. November 2015.
  • Hydra is a Repository Solution:  Hydra is an open source software repository solution used by institutions worldwide to provide access to their digital content.  Hydra software provides a versatile and feature rich environment for end-users and repository administrators.
  • Hydra is a Community: Hydra is a large, multi-institutional collaboration that gives institutions the ability to combine their repository development efforts into a collective solution beyond the capacity of any individual institution to create, maintain or enhance on its own. The project motto is “if you want to go fast, go alone.  If you want to go far, go together.”
  • Hydra is a Technical Framework: Hydra is an ecosystem of components that lets institutions build and deploy robust and durable digital repositories supporting multiple “heads”, which are fully-featured digital asset management applications and tailored workflows.  Its principal platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight.  Hydra does not yet support “out-of-the-box” deployments but the Community is working towards such “solution bundles”, particularly “Hydra in a Box” and Avalon.

Developing Best Practices in Digital Library Assessment: Year One Update

Developing Best Practices in Digital Library Assessment: Year One Update. Joyce Chapman, Jody DeRidder, Santi Thompson. D-Lib Magazine. November 2015.
     While research and cultural institutions have increased focus on online access to special collections in the past decade, methods for assessing digital libraries have yet to be standardized. Because of limited resources and increasing demands for online access, assessment has become increasingly important. Library staff do not know how to begin to assess the costs, impact, use, and usability of digital libraries. The Digital Library Federation Assessment Interest Group is working to develop best practices and guidelines in digital library assessment. The definition of a digital library used is "the collections of digitized or digitally born items that are stored, managed, serviced, and preserved by libraries or cultural heritage institutions, excluding the digital content purchased from publishers."

They are considering two basic questions:
  1.     What strategic information do we need to collect to make intelligent decisions?
  2.     How can we best collect, analyze, and share that information effectively?
There are no "standardized criteria for digital library evaluation. Several efforts that are devoted to developing digital library metrics have not produced, as yet, generalizable and accepted metrics, some of which may be used for evaluation. Thus, evaluators have chosen their own evaluation criteria as they went along. As a result, criteria for digital library evaluation fluctuate widely from effort to effort." Not much has changed in this area in the last 10 years with regard to digitized primary source materials and institutional repositories. "Development of best practices and guidelines requires a concerted engagement of the community to whom the outcome matters most: those who develop and support digital libraries". The article shares "what progress we have made to date, as well as to increase awareness of this issue and solicit participation in an evolving effort to develop viable solutions."

Thursday, November 19, 2015

Old formats, new challenges: preservation in the digital world

Old formats, new challenges: preservation in the digital world. Kevin Bunch. C & G News. November 13, 2015.
     Without proper preservation, digital materials are going to degrade and become useless. Digital preservation is "basically coming up with policies and procedures to address mostly the obsolescence that happens with digital content. We know file formats die, we know operating systems and platforms die at some point, so how do we sustain this digital content through time?” In addition to hardware and media failing, there are also difficulties in reading old formats. Archivists generally try to convert files from an old format to an “open” format that will hopefully remain in use for some time into the future. Some people work at converting analog media, like audio and video recordings, to open digital formats. This can be challenging because older equipment is outdated and fails. Analog magnetic media formats like VHS and audio cassettes are also "at an ever-increasing risk of deterioration, especially those from the 1980s or 1990s, and should be digitized as soon as possible".

Wednesday, November 18, 2015

iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements

iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements.  Jenny Mitcham. Digital Archiving at the University of York. 12 November 2015.
     The ‘Using Open-Source Tools to Fulfill Digital Preservation Requirements’ workshop provided a place to talk about open-source software and share experiences about implementing open-source solutions. Archivematica, ArchivesSpace, Islandora and BitCurator (and BitCurator Access) were also discussed.

Sam Meister of the Educopia Institute talked about a project proposal called OSSArcFlow. "This project will attempt to help institutions combine open source tools in order to meet their institutional needs. It will look at issues such as how systems can be combined and how integration and hand-offs (such as transfer of metadata) can be successfully established". The lessons learned (including workflow models, guidance and training) will be available to others besides the 11 partners. 

Digital Preservation Videos for the Classroom

Back to School: Digital Preservation Videos for the Classroom. Erin Engle. The Signal, Library of Congress. August 30, 2013.
     Some educational programs have been created for students, including the K-12 Web Archiving Program. There is also a Digital Preservation Video Series, and the post lists the videos that educators may find most relevant.

Tuesday, November 17, 2015

Born Digital: Guidance for Donors, Dealers, and Archival Repositories

Born Digital: Guidance for Donors, Dealers, and Archival Repositories. Gabriela Redwine, et al. Council on Library and Information Resources. October 2013. [PDF]
     "Until recently, digital media and files have been included in archival acquisitions largely as an afterthought." People may not have understood how to deal with digital materials, or staff may not have been prepared to manage digital acquisitions. The objective is to offer guidance to rare book and manuscript dealers, donors, repository staff, and other custodians to help ensure that digital materials are handled and documented appropriately and arrive at repositories in good condition. Each section provides recommendations for donors, dealers, and repository staff.

The sections of the report cover:
  • Initial Collection Review
  • Privacy and Intellectual Property
  • Key Stages in Acquiring Digital Materials
  • Post-Acquisition Review by the Repository
  • Appendices, which include: 
    • Potential Staffing Activities for the Repository
    • Preparing for the Unexpected: Recommendations
    • Checklist of Recommendations for Donors and Dealers, and Repositories
Some thoughts and quotes from the report:
  • it is vital to convince all parties to be mindful of how they handle, document, ship, and receive digital media and files.
  • Early communication also helps repository staff take preliminary steps to ensure the archival and file integrity, as well as the usability of digital materials over time.
  • A repository’s assessment criteria may include technical characteristics, nature of the relationship between born-digital and paper materials within a collection, information about context and content, possible transfer options, and particular preservation challenges.
  • Understand if there is a possibility that the digital records include the intellectual property of people besides the creator or donor of the materials.
  • Clarify in writing what digital materials will be transferred by a donor to a repository (e.g., hard drives, disks, e-mail archives, websites).
  • It is strongly recommended that donors and dealers seek the guidance of archival repositories before any transfer takes place.
  • To avoid changing the content, formatting, and metadata associated with the files, repositories must establish clear protocols for the staff’s handling of these materials.
The good practices in this report can help reduce archival problems with digital materials. "Early archival intervention in records and information management will help shape the impact on archives of user and donor idiosyncrasies around file management and data backup."
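One common way to act on the advice about documenting files and confirming they arrive unchanged is a checksum manifest, generated before shipping and regenerated on receipt. The Python sketch below illustrates that idea under stated assumptions; it is not a procedure from the report, `build_manifest` and `compare_manifests` are hypothetical names, and established packaging formats such as BagIt do this more thoroughly. It records only file size and SHA-256 digest, so timestamps and other file-system metadata would need to be captured separately.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large disk images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to (size in bytes, SHA-256)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def compare_manifests(before, after):
    """List files that went missing, appeared, or changed between two manifests."""
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(p for p in set(before) & set(after) if before[p] != after[p]),
    }
```

The donor or dealer would send the first manifest with the media, and the repository would compare it against its own on arrival.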

Monday, November 16, 2015

Fixity Architecting for Integrity

Fixity Architecting for Integrity. Scott Rife, Library of Congress, presentation. Designing Storage Architectures for Digital Collections 2015. September 2015. [PDF]
     The Problem: “This is an Archive. We can’t afford to lose anything!” They are custodians of the history of the United States and do not want to consider that data loss is likely to happen. The current solutions:
  • At least 2 copies of everything digital
  • Test and monitor for failures or errors
  • Refresh the damaged copy from the good copy
  • This process must be as automated as possible
  • Recognize that someday data loss will occur
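The copy, test and refresh cycle in this list can be sketched as a small Python routine. It assumes each object's SHA-256 checksum was recorded at ingest and is stored separately from the copies; `audit_and_repair` is a hypothetical name, and a production system would also log every check and run audits on a schedule.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Compute a file's SHA-256 digest, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def audit_and_repair(copies, expected):
    """Verify every copy (a list of Paths) against the recorded checksum and
    refresh damaged or missing copies from a verified good one."""
    good = [c for c in copies if c.exists() and sha256_of(c) == expected]
    bad = [c for c in copies if c not in good]
    if not good:
        # Every copy failed: the loss the presentation says will someday occur
        return "LOSS: no copy matches the recorded checksum"
    for c in bad:
        shutil.copy2(good[0], c)  # refresh the damaged copy from the good copy
        if sha256_of(c) != expected:  # re-verify rather than assume success
            return f"ERROR: refresh of {c} failed verification"
    return f"repaired {len(bad)} of {len(copies)} copies" if bad else "ok"
```

With only two copies, the stored checksum is what decides which copy is the good one; without it, a mismatch between the copies could not be resolved automatically.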
Fixity is the process of verifying that a digital object has not been altered or corrupted. It is a function of the whole architecture of the Archive/Long Term Storage (hardware, software, network, processes, people, budget).
What costs are reasonable to reduce the loss of data?
We need to understand the possible solutions. How much more secure will our customers’ content be if:
  • There is a third, fourth or fifth copy?
  • All content is verified once a year versus every 5 years?
  • More money is spent on higher quality storage?
  • More staff are hired?
RAID (erasure encoding) is at risk due to larger disk sizes. With storage, there is a wide variation in price, performance and reliability, and performance and reliability are not always correlated with price. Choose hardware combinations to limit likely failures based on your duty cycle.

Background reading list for Designing Storage Architectures for Digital Collections

Background reading list. Designing Storage Architectures for Digital Collections. Library of Congress. September 9, 2015.
     A list of items that may be representative of materials and projects related to the meeting topics, and that may be useful in providing context for them.

Friday, November 13, 2015

Alternatives for Long-Term Storage Of Digital Information

Alternatives for Long-Term Storage Of Digital Information. Chris Erickson, Barry Lunt. iPres 2015. November 2015.   Poster  Abstract
     This is the poster and abstract that Dr. Lunt and I created, which were presented at iPres 2015. The most fundamental component of digital preservation is storing the digital objects in archival repositories. Preservation repositories must archive digital objects and associated metadata on an affordable and reliable type of digital storage. There are many storage options available; each institution should evaluate them to determine which are best for its particular needs. This poster examines three criteria to help preservationists determine the best storage option for their institution:
  1. Cost
  2. Longevity
  3. Migration Time frame
Each institution may have different storage policies and environments. Not every situation will be the same. By considering the criteria above (the storage costs, the average lifespan of the media and the migration time frame), institutions can make a more informed choice about their archival digital storage environment. The poster has more recent cost information than what is in the abstract.
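The three criteria interact: data has to move to new media at whichever comes first, the end of the media's expected life or the planned migration. A rough Python sketch of that trade-off follows. The option names and prices are invented placeholders, not the figures from the poster, and the model counts only media purchases, ignoring labour, drives, and the time a migration takes.

```python
import math
from dataclasses import dataclass

@dataclass
class StorageOption:
    name: str
    cost_per_tb: float      # media cost per terabyte (hypothetical figure)
    longevity_years: float  # expected media lifespan
    migration_years: float  # planned migration cycle for this option

def media_cost(option, terabytes, horizon_years):
    """Media cost to keep `terabytes` for `horizon_years`.

    Media is replaced at whichever comes first: end of life or the planned
    migration, so that interval sets how many purchases are needed.
    """
    interval = min(option.longevity_years, option.migration_years)
    purchases = math.ceil(horizon_years / interval)
    return purchases * option.cost_per_tb * terabytes
```

For example, a hypothetical Option A at $30/TB with a 5-year life and a 4-year migration cycle needs 5 purchases to hold 10 TB for 20 years ($1,500), while an Option B at $100/TB with a long media life but a 10-year migration cycle needs 2 ($2,000). Cheaper media is not automatically cheaper over time, and longer media life helps only up to the migration cycle.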

Thursday, November 12, 2015

Digital Curation Decision Form

Digital Curation Decision Form. Chris Erickson. Harold B. Lee Library. November 13, 2015. Updated.
     This is the latest version of our Digital Curation Decision Form. The form is used by subject specialists (curators, subject librarians, or faculty responsible for collections) to determine
  • what materials should be included in our Rosetta Digital Archive; 
  • whether additional copies are needed, including copies on M-Discs; and 
  • whether or not the digital collection is a preservation priority. 
Additional questions ask about access to the preservation copies; the preservation actions needed; and directions on content options if format migration is needed. The form was created to help subject specialists determine what should be preserved, even if they are unaware of digital preservation topics. In practice, we complete the form during an interview with new subject specialists. Documentation will be added when the final version is approved.

Monday, November 09, 2015

Web Archiving Questions for the Smithsonian Institution Archives

Five Questions for the Smithsonian Institution Archives’ Lynda Schmitz Fuhrig. Erin Engle. The Signal. October 6, 2015.   
     An article about the Smithsonian Institution Archives and its work, looking at how the Archives captures the Institution's own websites and the process involved. Many of the sites contain significant content of historical and research value that is now not found elsewhere. These are considered records of the Institution that evolve over time, and the Archives considers that it would be irresponsible, as an archives, to rely only upon other organizations to archive the websites. They use Archive-It to capture most of these sites and they retain copies of the files in their collections. Other tools are used to capture specific tweets or hashtags, or sites that are more challenging because of the site construction and the dynamic nature of social media content.

Public-facing websites are usually captured every 12 to 18 months, though it may happen more frequently if a redesign is happening, in which case the archiving will happen before and after the update. An archivist appraises the content on the social media sites to determine if it has been replicated and captured elsewhere.

The network servers at the Smithsonian are backed up, but that is not the same as archiving. Web crawls provide a snapshot in time of the look and feel of a website. "Backups serve the purpose of having duplicate files to rely upon due to disaster or failure" and are only saved for a certain time period. The website archives, by contrast, are kept permanently. Website captures may not include everything because of excluded content, blocked content, or dynamic content such as Flash elements or calendars that are generated by databases. Capturing the web is not perfect.

Monday, November 02, 2015

Emulation as a Tool. What Can Emulation Do for You?

Emulation as a Tool. What Can Emulation Do for You? Dr. Klaus Rechert. CurateGear 2015. January 7, 2015.
     Emulation can be used as a tool for:
  • Contextualization. To identify, describe and preserve object environments
  • Generalization. To allow the environment to be run everywhere
  • Preservation Planning. Prepare environments to run long term
  • Publication & Access. Provide citation of objects in context; allow reuse
Emulation as a Service (EaaS)
  • Encapsulation of different emulators and technologies into a common component
  • Centralize technical services
  • Hide technical complexity of emulation through web interfaces
  • Browser-based access
Preservation of and access to inherited personal digital assets
  • Provides citation support
  • Available with simple browser-based access 
  • Make emulated content embeddable and shareable like YouTube videos

The Shanghai Library Selects Ex Libris Rosetta

The Shanghai Library Selects Ex Libris Rosetta. Press release. Ex Libris. November 2, 2015.
     The Shanghai Library, the second largest library in China and one of the world’s largest public libraries, chose Rosetta to manage and preserve its vast collection of digitized records such as ancient books, sound recordings, manuscripts, genealogy resources, archives (such as the Sheng Xuanhuai Archives, books and journals published in the Republic period, and the North China Daily News). Rosetta’s support for multiple languages and its customized Chinese interface will enable library staff to deposit diverse content into the system and expose a wide range of rich Chinese heritage to the world. "Rosetta was the only solution on the market that supports the whole spectrum of digital asset management and preservation, from ingest and export, to collection management and publishing.”

Research data management: A case study

Research data management:  A case study. Gary Brewerton. Ariadne, 74. October 12, 2015.
Loughborough University faced a number of challenges in meeting the expectations of its research funders, especially in three areas:
  • publishing the metadata describing the research data that it holds
  • where appropriate providing access to the research data
  • preserving the research data for at least ten years since last accessed
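The third requirement is a rolling rule: every access restarts the ten-year clock. A minimal Python sketch of that calculation, assuming a log of access dates is available; the function name and the leap-day handling are illustrative choices, not Loughborough's implementation.

```python
from datetime import date

RETENTION_YEARS = 10  # "at least ten years since last accessed"

def earliest_review_date(last_accessed: date) -> date:
    """Earliest date a dataset could be reviewed for disposal: ten years after
    its most recent access. Pass the latest access, since each one resets the clock."""
    try:
        return last_accessed.replace(year=last_accessed.year + RETENTION_YEARS)
    except ValueError:  # 29 February with no matching leap day ten years later
        return date(last_accessed.year + RETENTION_YEARS, 3, 1)
```

Applied as `earliest_review_date(max(access_dates))`, a dataset that keeps being used never reaches its review date, which is why the storage needed is hard to predict in advance.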
They surveyed their research groups to determine existing data management practices and storage requirements. The data could take a variety of formats and vary dramatically in size. Also, not all the data collected by the researchers would need to be preserved. This made it hard to predict the amount of storage needed. Instead of using the existing institutional repository, they looked at possible archiving and discovery solutions and decided on two:
  • Arkivum: a digital archiving service guaranteeing long-term preservation of data deposited
  • figshare: a cloud-based research sharing repository
Each of these answered a different need: "Arkivum could provide the storage and preservation required, whilst figshare addressed the light-touch deposit process and discoverability of the research data." Both suppliers were asked to work together to develop a platform to meet all the University’s needs. A two-tier implementation followed, and faculty reaction to the platform's interface and deposit workflow has been very positive. It "remains to be seen how researchers will engage with the platform in the mid- to long- term, but it is clear that advocacy will need to remain an ongoing process if the platform is going to achieve continued success."

Saturday, October 31, 2015

This is how we wash ... discs!

Så gør vi sådan, når vi vasker… plader! [This is how we do it when we wash… records!] October 2015.
     The Danish State Library has started to wash a large part of its old records so they can be digitized. Some can be cleaned just by wiping off the dust with a dry brush, but others need a turn in the discwasher. This is a machine that looks like a turntable but, instead of a needle, has a small vacuum system that sucks the water off the record. Another article on this, Disc washing at the library, has more information and images of the process of digitizing the 78 rpm discs. There are about 37,000 Danish shellac discs (78 rpm records), and the audio engineer digitizes about 10 to 12 discs a day. The process is to register the items to be digitized and check the condition of each disc, after which the discs are washed and cleaned. About three or four discs can be washed per hour. The digitization room holds the recording machine and an audio engineer who listens to the record and, based on what he hears, chooses one of four different pickup needles for the final digitization.

Friday, October 30, 2015

Vital information could be lost in 'digital dark age' warns professor

Vital information could be lost in 'digital dark age' warns professor. Sarah Knapton. The Telegraph. 11 Oct 2015.
     Professor David Garner, former president of the Royal Society of Chemistry, said that the world faces an information 'dark age' because so much information is stored digitally, and that "wherever possible, scientific data should be printed out and kept in paper archives to avoid crucial research being lost to future generations." Other quotes from the article are:
  • “Digital storage is great and has put knowledge in an instantly accessible form, but things really need to be backed up in paper formats as well. In my own lifetime I have experienced not being able to access information any longer because the formats are now out of date. I am not a luddite, and I think the internet is fantastic. But while it’s great to have a Plan A, we really need to have a Plan B. It’s really important that we have accessible paper archives. We risk a lot of information being lost without adequate paper copies."
  • Digital materials are especially vulnerable to loss and destruction because they are stored on fragile magnetic and optical media which can deteriorate and can easily be damaged by exposure to heat, humidity, and short circuits.
  • While a book can be left on a shelf for hundreds of years with little damage, information can suffer ‘bit rot’ where it can no longer be accessed. And opening each file manually to save it in a readable form would never be possible.
  • "Long term accessibility of data was not really taken into account in the 1980s and 1990s in the way it is now and I am delighted that there are a number of initiatives underway for the long term preservation of digital data," added Prof Garner.

Wednesday, October 28, 2015

DPOE Plants Seed for Statewide Digital Preservation Effort in California

DPOE Plants Seed for Statewide Digital Preservation Effort in California. Barrie Howard. The Signal. October 9, 2015.
     The Library of Congress partnered with the State Library of California to host a three-and-a-half day workshop to increase the knowledge and skills of those providing long-term access to digital content. The California Preservation Program provides consultation, information resources, and preservation services to archives, historical societies, libraries, and museums across the state. They want to help librarians, archivists, and museum curators educate others and advocate for statewide digital preservation services. The state's smaller memory institutions need help with digital preservation. The workshop helped participants think about how to work across jurisdictional and organizational boundaries to meet the needs of all state cultural heritage institutions, especially small organizations with very few staff.

Saturday, October 24, 2015

The National Film Board’s CTO offers a close-up look at its digital archiving project

The National Film Board’s CTO offers a close-up look at its digital archiving project. Shane Schick. IT World Canada. October 16, 2015.
     The National Film Board of Canada (NFB) has been putting together the technology, processes and policies to change the way films are produced, collected and stored. The NFB collection needs a particular set of metadata because of the various versions of each film that are produced.

Archiving digital content is an ongoing challenge for many organizations because of the volume of content and also "the fact that formats change, and ensuring the long-term accessibility and quality can be uncertain". The organization tries to stay ahead of these difficulties by adhering to four ‘golden rules’ of archiving:
  1. There must be a process to continually check the integrity of the data which has been stored.
  2. Open file formats should be used whenever possible, in order to avoid frequent data migrations.
  3. Obsolescence of the storage hardware should be assumed as inevitable.
  4. Two copies of all content or media assets should be maintained on different technologies, in different locations, which is the "most critical" part.
While LTO tapes are often used in the industry, the organization uses ASG’s Digital Archive (based on Sony’s Optical Disk Array). These discs have a 50-year life expectancy. The NFB still uses LTO for backup, but now it has the optical element that it can go back to. “The archiving system allowed us to think beyond the film.” The new way of thinking is very open: "We can ingest content as we produce it."
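The first and fourth golden rules work together: integrity can only be checked against a known reference, and a second copy only helps if you can tell which copy is still good. A minimal Python sketch, assuming SHA-256 checksums and two copies reachable as ordinary directories, records a manifest at ingest and later audits each copy against it. The function names `sha256_of`, `build_manifest` and `audit_copy` are hypothetical and are not part of the NFB's or ASG's systems.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit_copy(reference: dict[str, str], copy_root: Path) -> dict[str, list[str]]:
    """Compare one stored copy against the reference manifest made at ingest.

    Reports files missing from the copy, files whose checksum no longer
    matches, and files present in the copy but absent from the manifest.
    """
    current = build_manifest(copy_root)
    return {
        "missing": sorted(set(reference) - set(current)),
        "changed": sorted(
            name for name in reference.keys() & current.keys()
            if reference[name] != current[name]
        ),
        "unexpected": sorted(set(current) - set(reference)),
    }
```

The audit would be run on a schedule against each copy (for example the optical copy and the LTO backup); a file reported as `changed` on one copy can be restored from the other if that copy still matches the manifest. Archive systems usually do this internally, so this only illustrates the principle, not how the NFB's system works.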

Friday, October 23, 2015

Metadata for your Digital Collections

Metadata for your Digital Collections. Jenn Riley. Indiana Cooperative Library Services Authority. March 6, 2007.
    A slideshow about metadata that I came across while preparing a presentation. A summary:
There are many definitions of metadata; generally it can be defined as structured information about an information resource. The presentation looks at the uses, structure and types of metadata:
  • Descriptive metadata
  • Technical metadata
  • Preservation metadata
  • Rights metadata
  • Structural metadata
Each of the various metadata standards has its own structure, values, benefits, and limitations. Examples include:
  • Dublin Core, which is unable to "provide robust record relationships".
  • Qualified Dublin Core
  • MARC
  • MARCXML, "the exact structure of MARC21 in an XML syntax"
  • MODS, "'MARC-like' but intended to be simpler"
  • Others include Visual Resources Association Core, TEI, EAD, and FRBR.
The standards are important now because they will help in migrating to other systems later and will make the collections more interoperable. Good digital collections are:
  • Interoperable, shareable and searchable
  • Persistent
  • Re-usable for multiple purposes
It notes that "good metadata promotes good digital collections". To be shared, metadata needs to be prepared so that it can be mapped to other formats and systems; a map, or 'crosswalk', can be created to do this. It is "good practice to create and store [the] most robust metadata format possible." You need to find the right balance for your metadata. Good shareable metadata should involve:
  • content
  • consistency
  • coherence
  • context
  • communication
  • conformance
That is what the standards help to do.
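A crosswalk is essentially a field-to-field mapping. The Python sketch below maps a hypothetical local photo record onto unqualified Dublin Core element names. The local field names, `LOCAL_TO_DC` and the `crosswalk` function are invented for illustration; a real crosswalk (for example MODS to Dublin Core) has many more rules. It also hints at why storing the richest format is good practice: anything the target format cannot hold is lost unless it is at least reported.

```python
# Hypothetical local field names mapped to unqualified Dublin Core elements.
LOCAL_TO_DC: dict[str, str] = {
    "title": "title",
    "photographer": "creator",
    "date_taken": "date",
    "keywords": "subject",
    "caption": "description",
    "file_format": "format",
    "call_number": "identifier",
    "usage_rights": "rights",
}


def crosswalk(
    record: dict[str, object], mapping: dict[str, str]
) -> tuple[dict[str, list[str]], list[str]]:
    """Map a local record to Dublin Core; return (dc_record, unmapped_fields).

    Dublin Core elements are repeatable, so every value becomes a list.
    Fields with no target element are reported rather than silently dropped.
    """
    dc: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        element = mapping.get(field)
        if element is None:
            unmapped.append(field)
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(str(v) for v in values)
    return dc, unmapped


# Usage with a made-up record: "negative_size" has no Dublin Core target.
dc, leftover = crosswalk(
    {"title": "Main Street, 1925", "photographer": "Unknown",
     "keywords": ["streets", "automobiles"], "negative_size": "4x5 in"},
    LOCAL_TO_DC,
)
# dc       -> {"title": ["Main Street, 1925"], "creator": ["Unknown"], "subject": ["streets", "automobiles"]}
# leftover -> ["negative_size"]
```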

Thursday, October 22, 2015

Preparing for format migration

Preparing for format migration. Chris Erickson. Presentation to the Utah State Archives fall conference. October 22, 2015. [PDF presentation]
     The presentation begins with terms and definitions of digital preservation, obsolescence, fixity, migration, refreshing, and formats. Formats include hardware, software, media, and systems. The purposes of migration are to:
  1. Avoid media failure
  2. Avoid obsolescence
  3. Benefit from new technologies
The goal of migration is to change the object to keep up with software and hardware developments without affecting the original representation. There are some cautions (cited):
  • “Data migration success rates are never 100%”
  • Successive storage/migration cycles accumulate failures, data corruption and loss.
  • Even if data migration is flawless, repeated migrations will take their toll on the data: “the nearly universal experience has been that migration is labor-intensive, time-consuming, expensive, error-prone, and fraught with the danger of losing or corrupting information.”
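The point about accumulating failures can be made concrete with a deliberately simple model. Assuming, purely for illustration, that each migration cycle independently carries a file through intact with probability $p$, the chance that a file survives $n$ cycles unchanged, and the expected number of intact files in a collection of $N$ files, are:

$$
P_{\text{intact}}(n) = p^{n}, \qquad E[\text{intact files}] = N\,p^{n}
$$

With a hypothetical 99% per-cycle success rate over 10 cycles, $0.99^{10} \approx 0.904$, so about 9.6% of files would be affected; in a collection of 100,000 files that is roughly 9,560 files. The figures are invented, but they show how even high per-cycle success rates compound, which is why each migration needs to be verified.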
The presentation provides an overview of creating a migration plan, advance preparations and follow up actions. Some of the issues are from my personal data migrations, as well as corporate examples. In the end, it is important to clearly understand what you have and what you need to do, then to start, even if it is a small step.

Wednesday, October 21, 2015

DPC invites members to review the OAIS Standard

DPC invites members to review the OAIS Standard. William Kilbride, et al. Digital Preservation Coalition, Open Preservation Foundation. October 21, 2015.
"DPC is delighted to welcome members to participate in the review of OAIS, work that will hold our interest for a couple of years and which we aim to build into a platform for collaboration among our diverse members in the future.

The OAIS standard published by both the Consultative Committee for Space Data Systems (CCSDS) and as ISO 14721 has been highly influential in the development of digital preservation. As a reference model it provides a common basis for aligning disparate practice in diverse institutional settings. A range of standards have emerged around and related to OAIS including PREMIS (for preservation metadata), ISO 16363 (for certification) and PAIMAS (for exchange between Producers and Archives).

Since OAIS was initially proposed the digital preservation community has grown tremendously in absolute numbers and in diversity. OAIS adoption has expanded far beyond the space data community to include cultural heritage, research data centers, commerce, industry and government.

The digital preservation community has – we have! – a responsibility to keep our standards relevant. The upcoming ISO review of the OAIS standard in 2017 offers a chance for a cooperative, transparent review process. It also creates an opportunity for further community building around OAIS and related initiatives.

"The outcome from this activity is not simply a wiki nor is it a set of recommendations. By providing a shared open platform for the community that gathers around the OAIS we aim to ensure on-going dialogue about our standards and their implementation in the future.
In this sense the 2017 review is a milestone on the way to an engaged and empowered community rather than a destination.
  • OAIS Community forum via a wiki: Your feedback and the discussions on this wiki will provide raw material for an editorial committee of the most active participants to formulate recommendations which will result in a formal submission to the 2017 review. So sign in and add your views!
  • Exploring official mechanisms: Official mechanisms for the review of ISO standards are well established via National Standard Bodies and these will be explored and used to give input for the review.
  • Active Interaction: Ensuring inclusion for this large, diverse community will mean collaborative virtual meetings are necessary but we all recognize the value of meeting face to face and will seek to enable this.
Join the community and contribute your views on the wiki here:

University of Alabama at Birmingham Selects Ex Libris Rosetta for Digital Preservation

The University of Alabama at Birmingham Opts for Ex Libris Alma, Primo, and Rosetta Solutions. News Release. Ex Libris. October 21, 2015.
     The University of Alabama at Birmingham (UAB) has selected three Ex Libris products: Alma, Primo, and the Rosetta digital asset management and preservation solution. "Rosetta’s end-to-end digital asset infrastructure will preserve digital resources at both libraries and keep such resources accessible for future generations.... We are acquiring Rosetta to support preservation for UAB’s digital assets, ranging from institutional memory to research data."

Tuesday, October 20, 2015

Faculty receive digital preservation grant for statewide project

Faculty receive digital preservation grant for statewide project. Press Release. Indiana State University. October 15, 2015.
     Faculty members in the College of Education received an IMLS Library Services and Technology Act grant to partner with the Indiana State Library to establish the Indiana Memory Digital Preservation Collaborative. The collaborative is a statewide initiative to provide an affordable and sustainable digital preservation solution for small to mid-sized cultural heritage organizations that lack the necessary resources to manage the digital files in their collections. The collaborative will join the MetaArchive Cooperative. The grant funding will be used for education, hardware and data preparation.

Monday, October 19, 2015

Published Preservation Policies

Published Preservation Policies. Carl Wilson, Barbara Sierman. Scape. Aug 11, 2015.
    The SCAPE project gathered a number of policies "concerning the creation of the Policy Framework". Other sources, such as a report in the Signal, Analysis of Current Digital Preservation Policies: Archives, Libraries and Museums, by Madeline Sheldon, were helpful in creating this overview of published preservation policies. The policies are divided into four categories:
  • Libraries
  • Archives
  • Data Centers
  • Miscellaneous
These policies are not all "preservation policies" and may be published under different headings.

Google Drivageddon and Docsapocalypse are here: Why I’m typing this in Microsoft Word

Google Drivageddon and Docsapocalypse are here: Why I’m typing this in Microsoft Word. John Brandon. Computerworld.
     An article about a writer who was unable to access his documents online during an outage of Google Drive and Google Docs. These outages may seem minor, but they are not if you cannot access the content you need. The author also had "some of my documents in a backup saved to a hard drive, which is a good thing. What’s not a good thing? Having a total lack of control over the situation. None." This outage is a wake-up call to save every document, not just a few, to another place. "It’s time to not put every egg in the Google basket."
[A good example of why we need multiple managed copies. -cle]

Saturday, October 17, 2015

Flooding Threatens The Times’s Picture Archive

Flooding Threatens The Times’s Picture Archive. David W. Dunlap. New York Times. October 12, 2015.
     A broken pipe sent water cascading into the storage area where The Times keeps its collection of historical photos, newspaper clippings, microfilm records, books and other archival material. About 90 percent of the affected photos are expected to be salvageable, but how many were lost remains unknown. The card catalog was not damaged; had it been, it would be impossible to locate materials in the archive. "What makes the card catalog irreplaceable is that it has never been digitized. Hundreds of thousands of people and subjects are keyed by index numbers to the photo files, which contain an estimated six million prints and contact sheets." This "raised the question of how in the digital age... can some of the company’s most precious physical assets and intellectual property be safely and reasonably stored?"

Tuesday, October 13, 2015

Presentations from Library of Congress Storage Architectures Symposium 2015

Presentations from Library of Congress Storage Architectures Symposium 2015. Clifford Lynch. CNI. October 12, 2015. [PDF files]
     The presentations from the Library of Congress 2015 Symposium on Storage Architectures for Digital Collections are now available. The presentations during the symposium include:
  • Technology Overview of Library of Congress Storage Architectures and also Industry
  • Technical Presentations: Tape Futures, Object Storage, Fixity and Integrity
  • Community Presentations
  • Alternative Media Presentations: Digital Optical, DNA
  • Look Back/Future Predictions of Storage

Monday, October 12, 2015

Digital Curation as Journalism

Digital Curation as Journalism. Online Journalism 2. September 28, 2015.
     An interesting perspective:  
Curate – To gather, source, verify and redistribute information or social media elements to track an event. If done well, it can make sense of chaos and create a narrative of an event.  “I think curation has always been part of journalism; we just didn’t call it that.” – Andy Carvin.

Social Media Usage: 2005-2015

Social Media Usage: 2005-2015. Andrew Perrin. Pew Research Center. October 8, 2015
     Results of a report on social networking usage statistics. "Nearly two-thirds of American adults (65%) use social networking sites, up from 7% when Pew Research Center began systematically tracking social media usage in 2005". The figures reported here are for social media usage among all adults, not just among those Americans who are internet users.
  • Age differences: 
    • 90% of young adults use social media
    • 35% of all those 65 and older report using social media 
  • Gender differences: 
    • 68% of all women use social media
    • 62% of all men use social media
  • Socio-economic differences: 
    • Those in higher-income households were more likely to use social media. 
    • Over 56% of the lowest-income households now use social media. 
    • Those with college experience are more likely to use social media than those with a high school diploma or less.
  • Racial and ethnic similarities: There are no notable differences by racial or ethnic group: 
    • 65% of whites, 65% of Hispanics and 56% of African-Americans use social media today.
  • Community differences: 
    • Today, 58% of rural residents, 68% of suburban residents, and 64% of urban residents use social media.

Saturday, October 10, 2015

Software benchmark initiatives

Software benchmarks in digital preservation: Do we need them? Can we have them? How do we get them? Kresimir Duretec. Open Preservation Foundation Blog. 9th Oct 2015.
     A blog post that addresses the need for improving software evaluations in digital preservation. "A significant part of the work in digital preservation field is dependent on various software tools." Progress has been made in various areas of digital preservation, but it is quite hard to quantify how successful it has been. The lack of demonstrated evidence is an important research challenge to be addressed. The BenchmarkDP project explores improving software evaluations in the digital preservation field with software benchmarks, as discussed in their paper. Two initiatives have been started:
  1. A Benchmarking forum at this year’s IPRES conference to discuss possible scenarios which are in need of proper benchmarks. 
  2. A short consultation to gather more information around current practices in software evaluation.
These initiatives should be a good starting point for a wider community involvement and better understanding of software evaluation needs in the digital preservation field.
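In its simplest form, a software benchmark pairs a fixed test corpus that has known correct answers (ground truth) with a performance measure, so that different tools can be scored on exactly the same task. The Python sketch below uses file format identification as the task and accuracy as the measure. The corpus and both "tools" are hypothetical toy stand-ins, not real identification tools and not the model proposed by the BenchmarkDP project.

```python
from typing import Callable

Tool = Callable[[str], str]

# Hypothetical ground truth: file name -> correct format. "renamed.txt" is a
# PDF with a misleading extension, a classic trap for naive identification.
GROUND_TRUTH: dict[str, str] = {
    "report.pdf": "PDF",
    "scan.tif": "TIFF",
    "renamed.txt": "PDF",
    "notes.txt": "Plain text",
}


def by_extension(name: str) -> str:
    """Toy tool: guess the format from the file extension only."""
    guesses = {"pdf": "PDF", "tif": "TIFF", "txt": "Plain text"}
    return guesses.get(name.rsplit(".", 1)[-1].lower(), "Unknown")


def always_pdf(name: str) -> str:
    """Toy baseline: answers 'PDF' for every file."""
    return "PDF"


def accuracy(tool: Tool, truth: dict[str, str]) -> float:
    """Fraction of corpus items the tool labels correctly."""
    correct = sum(1 for name, expected in truth.items() if tool(name) == expected)
    return correct / len(truth)


def rank_tools(tools: dict[str, Tool], truth: dict[str, str]) -> list[tuple[str, float]]:
    """Score every tool on the same corpus and sort best first (ties by name)."""
    scores = [(name, accuracy(tool, truth)) for name, tool in tools.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


print(rank_tools({"by_extension": by_extension, "always_pdf": always_pdf}, GROUND_TRUTH))
# [('by_extension', 0.75), ('always_pdf', 0.5)]
```

Because every tool sees the same corpus and the same measure, the scores are directly comparable. A real benchmark would also need a representative corpus, agreed measures, and a way to share results across the community.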

Friday, October 09, 2015

Questions to ask when you learn of digitization projects

Questions to ask when you learn of digitization projects. Sarah Werner. Wynken de Worde. 6 October 2015.
    When we hear about new digitization projects, it may be helpful to ask some questions:
  1. Who financially benefits from such agreements? Sometimes researchers forget that the primary purpose of commercial digitization projects "isn’t to enable access to cultural heritage materials but to make money. And cultural heritage institutions have not always prioritized open access to their collections over monetizing them, either."
  2. Who is going to have access to the resulting images? In commercial projects the results "are typically limited to institutions who can pay to subscribe to the commercial database". 
  3. Who is not going to have access to the images? It is important to realize who will be excluded from such projects.  
  4. What will you be able to do with the resulting images? "Most commercial databases retain copyright over their digitized products and do not license them beyond personal use".
  5. How will this impact the ability of researchers to access the original documents? "If you are a holding institution that will be restricting access to your newly digitized collection, will you help fund scholars to come use your database if their institution doesn’t subscribe to it?"
Without knowing these kinds of details we won't know if these enterprises are "good or bad things". The projects can be expensive, and balancing the access and the cost can be complicated. "But researchers and librarians should ask themselves this list of questions before cheerleading announcements." How will we support institutions in order to "create high quality digitizations without selling our cultural heritage to the highest bidder?"

Benchmarks for Digital Preservation tools

Benchmarks for Digital Preservation tools. Kresimir Duretec, et al. Vienna University of Technology and University of Toronto. October 2015.
     "Creation and improvement of tools for digital preservation is a difficult task without an established way to assess any progress in their quality." Software benchmarking is used to "provide objective evidence about the quality of software tools", but the digital preservation field is "still missing a proper adoption of that method." This paper looks at benchmarks and proposes a benchmark model for digital preservation and a standardized way to "objectively compare various software tools relevant to the digital preservation community."

Thursday, October 08, 2015

Why we should let our digital data decay

Why we should let our digital data decay. Jamie Carter. South China Morning Post. Oct 8, 2015.
     An article that states we have all become "digital hoarders" and that letting "data expire and self-delete might be the best way to clear the clutter". Some quotes from the article or quotes of quotes to consider:
  • storage is easy to come by. So cheap has it become, in fact, that none of us are deleting anything any more. The cloud has become a commodity that's often given away free 
  • Online storage has become "dumping grounds for files to sort later."
  • "Digital minimalism has only increased the rate at which we remove physical, analogue items in favour of their digital counterparts - why have an entire library of books when you can have more books than you will probably ever read in your life on a Kindle?"
  • what's the point in having more than a few dozen ebooks? Probably the most liberating thing a Kindle owner can do is to delete any book unread for more than a year.
  • "Currently, 'forgetting' data by deliberately deleting it routinely requires more effort than having it preserved"  
  • "This increases the 'cost' of digital forgetting, and thus tilts the default towards preservation. As a consequence, digital minimalists need to spend significant time and effort to get rid of data."
  • Living as a digital minimalist is almost impossible; the constant decision making and pruning of files is time-consuming. With the cost of storage so low and falling all the time, routinely deleting data doesn't save you money.
  • Allowing data to expire and self-delete might be the most effective way to prevent our digital detritus from owning us. It might seem an esoteric debate, but there is a clear demand for apps and services with short memories.
  • "as we are requested to set expiration dates we are reminded that most data is not relevant and valuable forever." In short, we'll take fewer photos, upload less, and cherry-pick only the most precious to preserve "forever".
  • Of course, there are downsides to replacing digital durability with digital decay in the way the internet works. 
  • While some compliance rules require data retention - something that encourages companies to retain everything they do, digitally, forever - there are also data protection laws in many parts of the world to ensure that data that is no longer needed, relevant or accurate is deleted.
  • "Permanence is something we bestow on digital data, it is not a genuine quality of digital data"
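The expiration-date idea can be expressed as a simple retention rule. This Python sketch assumes each item carries a creation date and an optional number of days to keep it, with `None` meaning the owner explicitly chose to keep it indefinitely, in the spirit of permanence being something we bestow. The `Item` class and `split_expired` function are hypothetical; the sketch only reports what has expired, deletes nothing, and ignores legal retention requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    name: str
    created: date
    keep_days: Optional[int]  # None = explicitly kept "forever"


def split_expired(items: list[Item], today: date) -> tuple[list[Item], list[Item]]:
    """Partition items into (kept, expired) according to their keep_days."""
    kept: list[Item] = []
    expired: list[Item] = []
    for item in items:
        age_days = (today - item.created).days
        if item.keep_days is not None and age_days >= item.keep_days:
            expired.append(item)
        else:
            kept.append(item)
    return kept, expired


# Made-up example items.
items = [
    Item("party_photo.jpg", date(2015, 9, 1), keep_days=30),
    Item("thesis_draft.docx", date(2015, 10, 1), keep_days=365),
    Item("wedding_album.zip", date(2010, 6, 12), keep_days=None),
]
kept, expired = split_expired(items, today=date(2015, 10, 8))
print([i.name for i in expired])  # ['party_photo.jpg']
```

Making "keep forever" an explicit choice (`None`) rather than the default is the design point: it tilts the default away from preservation, which is what the article argues for.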