Reporting live from a Digital Preservation Institute hosted by POWRR (Preserving Digital Objects With Restricted Resources) to share what I learned on day two. I'm here in Honolulu at the University of Hawai'i at Mānoa as a representative for T.A.P.E., generously funded by the National Institute for the Humanities.
Writing a Workflow
A digital preservation workflow is an iterative cycle made up of key stages: ingest, processing, access, storage, and maintenance. The goal is to make those stages part of your ecosystem so they work for you! Since T.A.P.E. primarily works with digitized items, rather than "born-digital" items that have no analog counterpart, we will adapt these workflows to suit our needs. That said, we are so excited to begin expanding support for digital home movies, not only by preserving them but by recognizing their value in the home movie landscape.
One of the most valuable takeaways was the idea that tools quickly become orphans: they get abandoned by their creators, go unsupported on new operating systems, or lose funding. The workflow and goals should come first and the suite of tools second, with an eye toward interoperability. Because this model puts our goals first, it makes us more resilient and more confident when individual tools change or disappear.
We also discussed the value of different kinds of tools. Free is never quite free: free, open-source tools still require time, attention, training, and knowledge to operate and maintain. The analogy used was that a kitten may be free, but it's not the same kind of free as a beer from a friend that you can drink quickly.
The kitten model extended to types of tools, too. A "barn cat" tool requires little maintenance but usually completes one task (like hunting mice). For example, a tool that only identifies duplicate files might be a barn cat. A "high-maintenance" cat requires more attention but can take on more complex tasks. A tool like BitCurator, which runs on Ubuntu Linux, handles more complex tasks in a disk imaging environment but requires more infrastructure. Knowing how much you need to invest in a tool helps you assemble a suite of tools that fits your capacity.
Digital Preservation Tool Grid
One of the most exciting parts of my training was getting to play around with digital preservation tools. Digital POWRR has created a tool grid that outlines the value of different tools and the access points for using them. This is critical both for building a tool kit that supports a workflow and for getting started with the tools themselves.
Data Accessioner
One of the most exciting tools we used was Data Accessioner, a tool maintained by Digital POWRR. In the archive world, to "accession" something is to formally bring it into your archive, placing it in relation to the rest of your collections and initiating processing. In digital preservation terms, this can also be understood as ingest.
You launch this tool by running the ./start.sh command in a terminal, and then a graphical user interface (GUI) opens! The tool lets you ingest digital media (photographs, text files, videos) stored on digital storage devices (optical discs, hard drives, etc.), running fixity checks and adding Dublin Core metadata along the way. In other words, you're moving files off the storage device into a new digital environment and running checks to make sure the transfer was faithful.
Okay, yes, I changed my terminal color scheme to pretend I'm an early-2000s hacker, but I promise it's so easy to use!
Side Note: Optical media is a category of physical storage for digital information that uses laser light to read (and sometimes write) data. This includes CDs, DVDs, and Blu-ray discs. (Spinning hard drives, by contrast, are magnetic media, not optical.) The data is stored as a rapid pattern of pits and lands (flat spaces) on the disc: on mass-produced discs the pits are pressed in during manufacturing, while on writable discs a laser "burns" marks into a recording layer. That pattern is what encodes the 0s and 1s. Writable media is especially vulnerable: -R discs use organic dyes that can degrade quickly, and -RW discs are generally considered less stable than pressed discs too. As the recording layer degrades, the data becomes corrupted because it can no longer be read properly. While manufacturers promised long lifespans, many of these discs are degrading in much shorter periods (sometimes lasting only a few years).
A microscopic view of three forms of optical media. Note how the more data a format holds, the more densely its pits are packed onto the disc. These patterns are all read by a finely focused laser.
A LaserDisc is an analog optical disc, so it works a little differently: the lengths and spacing of its pits represent the frequency of the video and audio signal directly, more like the grooves on a vinyl record (that's the analog part), rather than 0s and 1s. LaserDiscs are subject to something called laser rot, in which the disc's reflective layer begins to degrade or separate from the rest of the disc, making them an unstable format for the long run.
Data Accessioner uses Dublin Core for creating ingest metadata. Dublin Core is a family of metadata standards used to describe objects with a small set of common elements (such as title, creator, date, and format) and controlled vocabularies. Because it is highly flexible, interoperable, and easy to read, it's well suited to capturing vital information at the moment of ingest, before that information is lost.
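To make that a little more concrete, here's a tiny Python sketch of what a Dublin Core record can look like, built with Python's built-in XML library. The `<record>` wrapper, the function name `dublin_core_record`, and the sample values are my own hypothetical examples, not Data Accessioner's actual export format. The `dc:` namespace URL is the official one for the Dublin Core element set (version 1.1).

```python
import xml.etree.ElementTree as ET

# Official namespace for the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the core set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dublin_core_record(fields: dict) -> str:
    """Build a small Dublin Core XML record from {element: value} pairs.

    The <record> wrapper is hypothetical; this is not Data Accessioner's schema.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        # Reject anything that isn't one of the 15 core elements.
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")

# Hypothetical home movie, described at the moment of ingest.
print(dublin_core_record({
    "title": "Family birthday party",
    "date": "1994-06-12",
    "format": "video/mp4",
}))
```

Sticking to the shared element names is what makes the record easy for other systems (and humans) to read later.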
During ingest, Data Accessioner assigns each file an MD5 checksum. As I discussed in my earlier blog post, a checksum is a string of numbers and letters calculated from a file's contents that lets us confirm our digital file has not changed, whether over time or through actions like moving it.
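Here's a minimal Python sketch of that fixity idea, assuming you just want to copy one file and confirm nothing changed along the way. The helper names (`md5_checksum`, `copy_with_fixity`) are hypothetical, and this is not how Data Accessioner itself is written.

```python
import hashlib
import shutil
from pathlib import Path

def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's MD5 checksum as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large video files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_with_fixity(src: Path, dest: Path) -> str:
    """Copy src to dest, then confirm the two checksums match."""
    before = md5_checksum(src)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    after = md5_checksum(dest)
    if before != after:
        raise OSError(f"Fixity check failed for {src}: {before} != {after}")
    return after
```

If even a single byte differs between the original and the copy, the two checksums won't match and the copy gets flagged.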
(I also learned an exciting Python script to force a program to recognize a command instead of reading something as a directory. Nerd shit.)
Side Note: The command line (what you type into your Terminal) is an interface for talking directly to your computer and telling it what to do. It can do a lot of the same things as graphical software, because underneath it's doing the same processing, but it's sometimes faster and can perform functions that haven't yet been built into user-friendly software.
DA-MT
A great add-on tool for Data Accessioner is DA-MT (pronounced "damn-it"). All the metadata and inventory information you create in Data Accessioner is exported as an XML file, which is machine-readable and durable but not very human-readable (mostly because of its spacing and markup tags).
XML file with technical and descriptive metadata created during ingest
DA-MT transforms raw XML into human-readable CSV and HTML files, which can easily be opened and used in Google Sheets or Excel. It's useful for transforming any XML information and certainly makes the ingest and inventory process easier.
DA-MT in action! So easy!
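For a feel of what that kind of transformation involves, here's a rough Python sketch that flattens XML into a CSV. The `<collection>`/`<file>` structure, its attributes, and the sample values are invented for illustration (Data Accessioner's real XML uses its own, richer schema), and this is not DA-MT's code.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical inventory XML with made-up values; not Data Accessioner's schema.
SAMPLE_XML = """
<collection>
  <file name="tape01.mov" size="1048576" md5="9e107d9d372bb6826bd81d3542a419d6"/>
  <file name="tape02.mov" size="2097152" md5="e4d909c290d0fb1ca068ffaddf22cbd0"/>
</collection>
"""

def xml_to_csv(xml_text: str) -> str:
    """Turn each <file> element's attributes into one CSV row."""
    root = ET.fromstring(xml_text.strip())
    rows = [f.attrib for f in root.iter("file")]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "size", "md5"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(xml_to_csv(SAMPLE_XML))
```

The resulting text can be saved as a .csv file and opened in Google Sheets or Excel, one file per row.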
dupeGuru
A new essential for me is dupeGuru, a tool that identifies duplicate files in a directory and helps you decide what to delete. Not only does it compare file names and file sizes, it also performs a fuzzy search. A fuzzy search looks for approximate matches rather than exact replicas, and dupeGuru shows a column with the percentage match so you can manually decide whether you really have a duplicate.
A game changer for storage management and inventory!
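Here's a barn-cat-sized Python sketch of the two ideas: exact duplicates found by hashing file contents, and fuzzy name matches scored with the standard library's `difflib.SequenceMatcher`. It's a stand-in to show the concept, not dupeGuru's actual matching algorithm, and the function names and 80% threshold are my own assumptions.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path

def exact_duplicates(folder: Path) -> list:
    """Group files whose contents are byte-for-byte identical (same SHA-256)."""
    groups = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            # Reads each file whole; fine for a sketch, but big videos would be hashed in chunks.
            groups[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]

def fuzzy_name_matches(folder: Path, threshold: float = 0.8):
    """Yield (percent_match, a, b) for pairs of files with similar names."""
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    for a, b in combinations(files, 2):
        # Compare names without extensions, ignoring case.
        ratio = SequenceMatcher(None, a.stem.lower(), b.stem.lower()).ratio()
        if ratio >= threshold:
            yield round(ratio * 100), a, b
```

Like dupeGuru's percentage column, the fuzzy scores are a prompt for a human decision, not an automatic delete list.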
DANNNG!
DANNNG! is a working group that evaluates tools and builds resources for the ingest, packaging, transfer, and imaging of digital files, designed for the cultural heritage sector.
One great resource is a decision tree for deciding whether to image a disc or simply transfer the files on it. I've always struggled with this decision. Disc imaging is a process in which you copy every bit on a storage device. A disc image is only machine-readable, not human-readable, so it requires software to decode and present the raw data. Transferring (or migrating) is less intensive and just involves copying the files stored on the device. An older school of thought, which emerged particularly from policing, favored disc imaging because of a desire to establish provenance and recover deleted files. But disc imaging is often overkill for optical media. For optical discs (CDs and DVDs) in particular, capturing the raw data may be excessive: you wouldn't be looking for deleted files, and it takes up valuable storage space, labor time, and computing power. Additionally, the leading free disc imaging software, FTK Imager, emerged from policing, and for T.A.P.E., collusion with policing technology does not align with our values.
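To show what "copy every bit" means, here's a minimal Python sketch that reads a device byte-for-byte into a single image file and records its MD5. It assumes a Linux machine where the optical drive shows up as something like `/dev/sr0` (the path varies by system) and that you have permission to read it. It only works for data discs, not audio CDs, and unlike dedicated imaging tools it does nothing to handle read errors. The function name is hypothetical.

```python
import hashlib
from pathlib import Path

def make_disc_image(device: Path, image: Path, chunk_size: int = 1 << 20) -> str:
    """Copy every byte of a device (or any file) into one image file.

    Returns the image's MD5 so it can be fixity-checked later. Unlike a
    file transfer, this captures the whole filesystem as one blob rather
    than just the individual files you can see.
    """
    digest = hashlib.md5()
    with open(device, "rb") as src, open(image, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like `make_disc_image(Path("/dev/sr0"), Path("disc01.iso"))`, where `disc01.iso` is just an example name. A file transfer, by comparison, would copy only the visible files, which is often all you need for an optical disc.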
Although it may be more complicated to run, I'm interested in exploring BitCurator as an alternative for imaging hard drives as a future T.A.P.E. service to recover data!
I am particularly excited to explore more of the tool decision factor documentation. In building our digital preservation workflow, we also aim to support other individuals, communities, and organizations in developing tools that help preserve their digital heritage! Understanding many tools will help us adapt to many environments and needs!
Packaging tools such as DART would be great for wrapping up files for storage. Packaging relies on BagIt, a specification that provides an envelope for digital files and is useful for long-term storage. Like a ZIP file, a BagIt package (a "bag") keeps related files together (such as a preservation copy, mezzanine file, access copy, XML metadata, and checksums) in one folder so it's ready to be shipped off and stored. DART (Digital Archivist's Resource Tool) is an open-source GUI (pronounced "gooey," short for graphical user interface) and command-line tool for doing exactly that! Designed by and for archivists!
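Under the hood, a bag is just a folder with a predictable layout. Here's a bare-bones Python sketch, based on my reading of the BagIt specification (RFC 8493), that copies files into a `data/` folder, writes the required `bagit.txt` declaration, and records a SHA-256 checksum manifest. It leaves out optional pieces like `bag-info.txt`, and the function name is hypothetical; for real work, DART or the Library of Congress's bagit-python library will do this more completely.

```python
import hashlib
import shutil
from pathlib import Path

def make_minimal_bag(source: Path, bag_dir: Path) -> Path:
    """Package the files in `source` as a bare-bones BagIt bag.

    Layout produced:
        bag_dir/bagit.txt             bag declaration
        bag_dir/data/...              copies of the payload files
        bag_dir/manifest-sha256.txt   one "checksum  data/path" line per file
    """
    data_dir = bag_dir / "data"
    shutil.copytree(source, data_dir)  # fails if bag_dir/data already exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            # Manifest paths are relative to the bag root and use forward slashes.
            lines.append(f"{checksum}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

Because the manifest travels inside the bag, whoever receives it can recompute every checksum and confirm nothing changed in transit.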
Overall, day 2 of the POWRR Institute was an exciting and nerdy time for me. I love learning about new tools and honing my command-line skills. Tomorrow is the final day! :(
T.A.P.E. is a 501(c)(3) nonprofit dedicated to facilitating access to analog media making, preservation, and exhibition. To support our work and access great benefits, join our Patreon for just $5/month. You'll get exclusive rates for our rental equipment library, access to our digital and physical videotape library, and other member benefits like free workshops.
We've launched a GoFundMe with a $6,000 goal to buy essential digitization equipment so we can provide archival transfer services for more tape formats. A donation will advance the work of people-oriented digitization services!
info@tapeanalog.org
This blog is written by Jackie Forsyte, T.A.P.E.'s Technical Director and an audio-visual archivist.