T.A.P.E.'s Technical Director Jackie Forsyte got the opportunity to travel to Dublin on scholarship to attend No Time to Wait - a free conference on open standards, open source tools, and digital audiovisual preservation. I was funded by Pratt Institute's DPOE-N Professional Development Stipend, which generously granted me funds to attend a digital preservation conference of my choice. Special thanks to Kirk Mudle of that organization!
No Time to Wait is in its 9th year, focusing on showcasing and developing open source digital preservation standards and tools.
What's open source? Open source is a philosophy for making, using, and documenting tools and workflows that are public-facing, available for the public to at minimum inspect, but also often reuse, deconstruct, and remix. This philosophy is in stark contrast to many computing universes which "black-box" their software, making it impossible to understand how it is made and thus impossible to use outside of a licensed, supported, and often paid-for environment controlled by a corporation. Open source goes hand-in-hand with digital preservation, as we are often fighting for autonomy over the stability of our files and the ability to emulate digital environments, and as preservationists we need to get under the hood of the code making up our environment. You probably have a story about locked, lost, corrupted, or proprietary files, which can be destabilizing and painful.
No Time to Wait intervenes in these increasingly proprietary and closed box digital environments, advocating for policy action, tools, and workflows rooted in such a philosophy.
I've been highly interested in open source tools for personal digital preservation and the courses I teach with TAPE. Seeing the enthusiastic response to courses like Get Off Of iCloud, Preserving Texts & Voicemails, Intro to Digital Preservation, Hard Drive Office Hours .... affirms that there is a desperate need for autonomy over our personal media. A real frustration is bubbling, one that demands a collective channeling of methods, tools, and learning to facilitate greater control over our personal digital universes. Questions like:
- How do I get my files out of closed sandboxes like Apple's iCloud?
- What file formats am I even dealing with on my hard drives and mobile devices? Will they last?
- How do I care for the hard drives and storage devices I've accumulated?
- And most frequently, how do I manage the mountain of personal files that feels like it's teetering over?
TAPE has taught Digital Preservation workshops for the past year at Whammy! Analog Media and increasingly at Los Angeles Public Library's Central Branch, teaching the public how to use their command line to do things like mass download text messages, batch rename files, or generate checksums to fingerprint files for long term storage. A lot of the first workshops came out of a conference I attended in October 2024 called Digital POWRR (Preserving digital Objects With Restricted Resources). You can read those blogs here!
Sharing knowledge is at the core of my personal ethos and the Conference, so I've outlined my notes and some exciting takeaways that you should expect to see in upcoming workshops!
FIRE STATION ARTISTS STUDIOS
A 33-year old live-in artist residence and teaching space inside a 19th century Fire Station, the space is a hub for arts-making outside of the university system, serving as a workshop, studio, and exhibition space. Helena was kind enough to show me around the space and talk about efforts to build a motion picture processing darkroom, which will become the only space in Ireland for artists to process their own films. The lack of national labs - professional or amateur - has created a real gap in what artists can do at home, often requiring travel to other EU countries. It was so exciting to talk with Helena, who has a really infectious joy for filmmaking and the fostering of an experimental culture locally. We got really excited talking about the preservation of the Fire Station's own videotape archive, and instantly clicked over sharing resources to set up an in-house digitization station. Ireland has a rich history of alternative filmmaking and videomaking, and I was excited to learn about a number of collectives who were making work like the Derry Film & Video Workshop.
Experimental filmmakers have one of the most incredible networks of open source knowledge. They are a real model for preservationists to think about knowledge sharing and resource sharing. And most importantly, about creating work and resources that are outward facing, assisting the broader public in the creation of their own work!
I also took the train out to Howth and visited the Ye Olde Hurdy Gurdy Museum of Vintage Radio which houses unique radio equipment and ephemera inside of a Napoleonic Wars defense tower on a cliff! My favorite item was a radio hidden inside a portrait of Rita Hayworth created by the French underground during Nazi occupation.
Conference Day 1: Python, RawCooked, and floppys!
EXTRACTING FILE METADATA WITH PYTHON
The first workshop I attended was Extracting file metadata using Python, MediaInfo and other open-source tools, led by Joanna White, an in-house developer at the British Film Institute (BFI). You can read her blog about it here!
White is building and building upon open source tools for managing the BFI's massive collection of moving image materials - I mean how cool that a moving image archive has an in-house developer - in stark contrast to how most archives rely on IT departments that lock us into proprietary and ready-made software.
The tool Joanna built allows archivists to search across the entire database by very specific metadata fields. What's metadata? It's data about the data - a tautology I know - it's the technical information that surrounds the audio visual information. Technical information like how long is the video, what file format is it, what frame rate... is now searchable as fields across the BFI's internal catalog. This is a technical feat!
Joanna walked us through a snippet of how the code works. Python is the language of the code and it calls upon the MediaInfo software to do some of the heavy lifting of the file analysis.
MediaInfo is a project by the people behind the No Time to Wait Conference. It is a free, open-source tool that analyzes and reads out metadata in a human readable way. At TAPE I use MediaInfo allllll the time, from simple tasks like checking how many gigabytes a file is to investigating more complicated problems like why won't my file open in QuickTime (hint, it's always Apple's fault). MediaInfo can point me to embedded information about video files that is obscured from the user because your computer assumes you'll never need it (hint, you do!)
What Joanna's script does is point the computer to a universe of instructions - giving it access to a library of how MediaInfo analyzes, sorts, and presents technical media, then telling the computer to grab specific fields of information from the file (like file format, duration, number of audio channels), and then presents that metadata in a human-readable CSV (comma separated value) file that can be imported into a spreadsheet or even a database with ease!
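To make that flow concrete, here is a minimal sketch in Python. It is not Joanna's actual script. It assumes the MediaInfo command-line tool is installed and that its JSON report has a "media" object holding a list of "track" entries with an "@type" key. The function names and the choice of fields are mine, purely for illustration.

```python
import csv
import json
import subprocess

# Fields to pull from each MediaInfo track type (an illustrative choice).
FIELDS = [("General", "Format"), ("General", "Duration"), ("Audio", "Channels")]


def run_mediainfo(path):
    """Ask the MediaInfo command-line tool for a JSON report on one file."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def pick_fields(report, fields=FIELDS):
    """Flatten a MediaInfo-style report into one row of selected fields."""
    # Assumed report layout: {"media": {"track": [{"@type": ..., ...}, ...]}}
    tracks = report.get("media", {}).get("track", [])
    row = {}
    for track_type, field in fields:
        # Use the first track of the requested type, if there is one.
        match = next((t for t in tracks if t.get("@type") == track_type), {})
        row[f"{track_type}.{field}"] = match.get(field, "")
    return row


def write_csv(rows, csv_path):
    """Write the rows to a CSV that opens in any spreadsheet."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

With MediaInfo installed, something like write_csv([pick_fields(run_mediainfo("home_movie.mkv"))], "metadata.csv") would produce a one-row spreadsheet. The file name is hypothetical.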
This script would be especially helpful for someone who is cataloging and "ingesting" digital files into an archive. You have no idea how the file was made and so that technical information is critical to figuring out how to sustainably care for and potentially migrate the file onto newer hardware or even more sustainable file formats. For TAPE, we aren't quite dealing with digital files in this way, we are creating digital files to give to people.
RAWCOOKED AND OPTIMIZING STORAGE FOR FILM SCANS
Jérôme Martinez is the founder of MediaArea, one of the backers of NTTW and an open source software company with a suite of tools for analyzing digital media. He led the conversation about Optimizing Storage for DPX film scans using an open source program he worked to develop called RawCooked.
DPX is an image sequence format used for film scans. In the same way a gif is a string of still images in sequence, a DPX sequence is a string of uncompressed images with metadata that relates them together so they can be transformed into a moving image. Particular to film scanners, DPX sequences are extremely large - in my experience, a ~2 hour feature film scanned at 4K resolution (4,000 pixels running horizontally across the active image area) is usually about 10 TB of data. As fetishists increasingly become obsessed with 6K, 8K, 12K and other absurd resolutions, storage becomes a definite crisis. For context, the UCLA Film & Television Archive's Digital Lab outputs about 50TB a week of digitized film & video, with holdings of about 9 Petabytes of unique data.
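A rough back-of-the-envelope check shows where a figure like that comes from. The frame geometry, bit packing, and frame rate below are my assumptions (a 4096 x 3112 full-aperture frame, 10-bit RGB packed into 32-bit words, 24 frames per second), not numbers from the talk, and real scans vary.

```python
# Assumed scan geometry: a "4K" full-aperture frame of 4096 x 3112 pixels.
WIDTH, HEIGHT = 4096, 3112
# Assumed packing: 10-bit RGB, three samples padded into one 32-bit word.
BYTES_PER_PIXEL = 4
# Assumed frame rate.
FPS = 24


def dpx_size_tb(minutes, width=WIDTH, height=HEIGHT,
                bytes_per_pixel=BYTES_PER_PIXEL, fps=FPS):
    """Rough image payload of a DPX sequence in decimal terabytes (headers ignored)."""
    frames = minutes * 60 * fps
    return frames * width * height * bytes_per_pixel / 1e12


print(f"{dpx_size_tb(120):.1f} TB")  # a 2 hour feature
```

Under those assumptions a 2 hour feature comes out to roughly 8.8 TB of picture data, in the same ballpark as the ~10 TB I see in practice once overscan, audio, and extra material are included.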
In moving image preservation, we often rely on LTO (Linear Tape Open) storage to address this high demand for storage. LTO is a more "shelf stable" data tape, i.e. 0s and 1s encoded on magnetic tape - yes! Most long term storage still relies on tape! There are lots of wonderful things about LTO and a lot of annoying things, but the format itself is reliable and LTO-8 can hold 12TB of uncompressed data per tape. LTO will come up again in this blog.
RawCooked is an open source program to transform DPX sequences into a preservation stable, open source file format - Matroska container & FFV1 codec. Patrons of TAPE will know that we digitize all home movies to Matroska FFV1, as it is a preferred format of a number of leading archives such as the British Film Institute and Library of Congress. Matroska is an open source container, meaning it isn't owned by a particular company but rather maintained by a large international community. FFV1 is similarly open source, and is a codec that relies on lossless compression: like a zip file, it reduces file size without reducing quality by re-encoding data with high-efficiency standards.
To explain container and codec I use the analogy of a book. The content of the book is like the video and audio streams. A container is like the front cover, back cover, and table of contents. It contains the content and tells you what the book is and where to find the content within it. The codec is like the language of the book, it tells you information about how to read the book. You must be able to read the language to understand the content, in the same way the computer needs to be able to interpret the codec to decode the content.
Rawcooked is a remarkable tool for preservation. It transforms DPX sequences into Matroska FFV1 files, reducing file size by 2/3, i.e. reducing the number of LTO tapes you need to buy. And also reducing the number of tapes you need to migrate each time an older generation becomes obsolete. Later in the conference Joanna White from the BFI revealed that Rawcooked saved the BFI ---------- in just 6 months from storage costs.
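The tape math is easy to sketch. The collection size and per-tape capacity below are hypothetical placeholders (swap in the figures for your own holdings and LTO generation), and the two-thirds reduction is the ballpark quoted in the talk, not a guarantee for every scan.

```python
import math


def after_rawcooked(total_tb, reduction=2 / 3):
    """Size left after the roughly two-thirds reduction quoted for RawCooked."""
    return total_tb * (1 - reduction)


def tapes_needed(total_tb, tape_capacity_tb):
    """Whole tapes required to hold total_tb of data."""
    return math.ceil(total_tb / tape_capacity_tb)


collection_tb = 500   # hypothetical collection of DPX scans, in TB
tape_tb = 12          # assumed capacity per tape, in TB

print("Tapes for raw DPX:", tapes_needed(collection_tb, tape_tb))
print("Tapes after RawCooked:", tapes_needed(after_rawcooked(collection_tb), tape_tb))
```

Every tape you don't buy is also a tape you don't have to migrate the next time an LTO generation ages out.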
And critically, the Rawcooked function is fully reversible. You can reverse the process of transformation from DPX to MKV FFV1 back to DPX and your file will mathematically be the exact same. This is the same as a ZIP file, when you unzip your file is identical to pre-zipping. How do we know it is identical? Using checksums!
Checksums are a mathematical fingerprint applied to a file. You run an algorithm over a file and it generates a string of numbers to represent the file as is. When you run the algorithm over the file again, you check to see if there is fixity --- If the string of numbers changes, the file has changed. If it is the same, the file is the same.
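Here is a minimal sketch of a fixity check using Python's built-in hashlib module. SHA-256 is my choice of algorithm for the example (MD5 is also common in archives), and the function names are made up for illustration. Note the "string of numbers" is usually written in hexadecimal, so it includes the letters a-f too.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return a file's hex digest, reading in chunks so huge videos don't fill memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_fixity(path, recorded_checksum, algorithm="sha256"):
    """True when the file still matches the checksum recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum
```

You would record file_checksum(...) when the file is created, store it alongside the file, and call has_fixity(...) whenever you want to confirm nothing has changed.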
FFV1 and Matroska are quite good at handling checksums and managing corruption. FFV1 embeds a checksum per slice (more on that later), which effectively means that many portions of a single frame each carry their own checksum, so damage is detectable at that level. Therefore, a corrupt slice can be replaced using a complete frame and does not corrupt the entire frame or the entire file. FFV1 may thus be safer from corruption than even DPX, which corrupts at the frame level, meaning that frame is gone. Just like if you scratched up a frame on a physical film, that frame is gone.
Rawcooked has two problems that are in active development. The first:
Rawcooked currently supports DPX "flavors" for scanners that conform to particular standards. The scanners that I personally use - the Lasergraphics machine - and the ones that organizations like Origins Archival use (the Film Fabriek) output two additional "flavors" of DPX file not currently supported by RawCooked: the RGGB color space and Bayer pattern. This is a particular philosophy and method of film scanner sensor design that uses red, green, and blue filter layers to filter incoming light onto active elements that are sensitive to each of those colors. It mimics the human eye and was developed by Kodak, thus closely matching the color technology of their color film, which has red, green, and blue separated dye layers. Getting RawCooked to support these DPX flavors is simply a matter of money for development so that the program can understand and encode them.
The other problem is speed. What Rawcooked saves in money, it consumes in time. We are in the midst of a liminal period in computing power history, in between CPU and GPU support for various video codecs.
Codecs as I talked about above are like the "language" of a video file, requiring translation to encode and decode by a computer. Like anything else we have to tell the computer how to do that. Right now most computers have been given instructions to do video functions on their GPU only for certain codecs. Whereas CPUs have instructions for more codecs. But CPUs are slower and so for open source codecs like FFV1, the computer takes much longer than if it were a codec like ProRes which can use the GPU.
I wonder why! It's almost like Apple owns ProRes and thus supports faster encoding for its proprietary codecs ???
Therefore, right now programs like FFMPEG and codecs like FFV1 have only been given permission to run on most CPUs, and don't have access to GPUs. Even though they are generally pretty lightweight in the grand scheme of things, the CPU thing slows them down significantly. Even when running Rawcooked on 6K files, the CPU will not exceed 3 mb of usage, so it can take 8 hours to process 1 hour of DPX. But that's changing - see more later in this blog.
One really exciting thing is I feel like I finally got a straight answer on slice count for FFV1. If you've been trained by me at the desk, I frankly sort of shrug my shoulders and say no one smarter than me has explained this in a way I understand so IDK ! In FFV1, specifically with vrecord too, you can select the slice count. But someone smarter than me finally explained it!
Slices are the number of times you divide an image during compression. “Multi-thread” or “parallel processing” is a way to understand it. Your CPU can use multiple cores to process more than one part of a single frame at a time, increasing processing speed. DV (Digital Video) tape works similarly, creating blocks of the image that are processed in parallel per frame rather than trying to process the entire frame as one big image. Speeds it up and reduces file size.
Fewer slices = fewer segments inside the frame. Easier for compression. Faster but a little bit clunkier.
More slices = more segments inside the frame. Harder for compression. Slower but more precise.
For 6K overscans, RawCooked automatically indexes at 1024 slices. Probably overkill, but it does preserve the rich nuances between slices.
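If you want to set the slice count yourself, here is a sketch of how an FFmpeg command for FFV1 could be assembled in Python. This is not what RawCooked or vrecord run internally. The helper name and file names are hypothetical, and which slice counts are accepted depends on your FFmpeg version.

```python
def ffv1_command(source, destination, slices=16):
    """Build an FFmpeg command that encodes video to FFV1 version 3."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "ffv1",           # use the FFV1 video codec
        "-level", "3",            # FFV1 version 3, the version with slice support
        "-slices", str(slices),   # how many pieces each frame is divided into
        "-slicecrc", "1",         # store a checksum for every slice
        "-c:a", "copy",           # pass the audio through untouched
        destination,
    ]


print(" ".join(ffv1_command("transfer.mov", "transfer.mkv", slices=24)))
```

To actually run it you would hand the list to subprocess.run(..., check=True) on a machine with FFmpeg installed.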
FLOPPY DISKS
This was probably one of my favorite parts of the conference. I’m literally so ready to set up a floppy disk transfer station for TAPE.
We first walked through identifying the major types of floppy disk, starting with the earliest models called “Stiffies” (not floppy at all), and the different sizes: 3, 3.5, 5.25, and 8”.
The interior construction of a floppy disk is made up of a disk covered in iron oxide, housed inside of a paper or plastic shell, sometimes with a spring and thin metal shutter.
Information is encoded onto the disk in two ways. First, there are sectors or pie slice areas that are separated from other sectors. Within each sector there are tracks that contain the written information.
There are also sides to disks, where sometimes the disk is formatted to be single or double sided. In addition to size there are three density modes for disks, which represent how frequently each track is encoded - 20 track (double density) 40 track, 80 track (high density). Much like videotape formats, these density types are specific to the players, but are backwards compatible, meaning high density drives will play 80, 40, and 20 track disks, but 20 track drives will only play 20 track disks. You can typically identify the type of track configuration by the number and location of notches, with double holes indicating high density.
Of course, there are lots of weird formats and proprietary formats that require specific drives, like Mac formatted disks. And there are even things called flippy disks, where people would DIY cut second notches into floppy disks to trick the drive into writing to the disk in high density. This can be particularly tricky to read back because it is formatted to write information in previously blank areas. Plus if the disk is actually single sided, the other side will be encoded backwards.
If you’re counting there are three main compatibility issues with floppy disks - the drive must be able to support
a) the size of the disk
b) density / number of tracks
c) single or double sided disks
In addition to any number of proprietary, early, DIY, or non-standard formats. Floppys, flippys, and stiffies! But newer drives, high density, are more likely to support the greatest number of disks.
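Those three checks can be written down as a tiny data model. This is only a sketch of the standard cases: the class and field names are mine, and it assumes (per the backwards-compatibility point above) that a drive can read disks with fewer tracks or fewer sides than it supports. It ignores proprietary and DIY formats entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    size_inches: float
    tracks: int
    sides: int


@dataclass(frozen=True)
class Drive:
    size_inches: float
    max_tracks: int
    sides: int


def can_read(drive, disk):
    """Apply the three compatibility checks: size, density (tracks), and sides."""
    return (
        drive.size_inches == disk.size_inches   # a) the disk must physically fit
        and drive.max_tracks >= disk.tracks     # b) assumed: denser drives read sparser disks
        and drive.sides >= disk.sides           # c) assumed: two-sided drives read one-sided disks
    )


high_density_drive = Drive(size_inches=5.25, max_tracks=80, sides=2)
print(can_read(high_density_drive, Disk(size_inches=5.25, tracks=40, sides=1)))
```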
Once you find a drive that can support the floppy disks of interest it’s time to clean it. It likely has never been cleaned so be sure to open the drive and give it a deep dusting with compressed air. You can clean the read heads with isopropyl alcohol, as any dust on the heads will scratch the disk.
The drive usually goes bad with the rubber belt that powers the motor - if the drive won't move that belt needs replacement. If the drive is squealing, it needs lubrication.
You may also need to clean the disk itself. Dirt or mold on a floppy disk causes the drive to squeal, because the head is tracking the contaminant across the disk and scratching it up. You can clean it with alcohol and a non-abrasive wipe like PEC pad.
Then you’ll need a ribbon cable that connects to your drive. Each drive has a specific number of pins, usually 34 or 26 pins, and the cables are easy to find on eBay. You’ll need power for the drive, which can be a portable unit, usually 5 or 12V, but 8” floppy drives have 240V.
The most important tool is a floppy controller, which allows the older drive to communicate with a modern computer. There are a bunch of different models, usually built by the retro gaming community, such as the SuperCard Pro or KryoFlux, but Leontien recommended the Greaseweazle because it has a lot of support for various formats, an active development team, and an active community supporting and providing documentation. The Greaseweazle connects to the computer via USB.
There are two ways to read a floppy disk - flux stream and logical image. A flux stream is a capture of the fluctuations in the magnetic field on the disk. The logical image is a transformation of those fluctuations into human readable files. The flux stream is necessary to figure out how the disk is encoded and to capture an accurate full image of the disk. The logical image is based on the correct assessment of the disk's formatting from the flux stream so we can get the files off of it.
Leontien then demo'd the Greaseweazle and the terminal commands needed to image the disks. Opening the terminal, she ran the command to read the flux stream in the .scp format. The command is:
gw read Workshop/test.scp --drive=B
which translates to:
Greaseweazle, read drive B and write it to the Workshop folder with the filename test.scp
One challenge she mentioned is figuring out how the drive sees itself. In the command you have to specify the location of the drive you want to access. Each drive sees itself as a different input and she recommended trying them in this order
--drive=A / --drive=B / --drive=0 / --drive=1
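That trial-and-error can be scripted. The sketch below is mine, not something shown in the workshop. It assumes the gw tool is on your PATH and that it exits with a non-zero status when it can't read from the drive you named, which is worth confirming on your own setup before trusting the result.

```python
import subprocess

# The order suggested in the workshop.
DRIVE_CANDIDATES = ["A", "B", "0", "1"]


def read_command(output_path, drive):
    """Build a Greaseweazle flux-read command for one drive identifier."""
    return ["gw", "read", f"--drive={drive}", output_path]


def read_first_working_drive(output_path, candidates=DRIVE_CANDIDATES,
                             run=subprocess.run):
    """Try each drive identifier in turn; return the first whose read succeeds."""
    for drive in candidates:
        result = run(read_command(output_path, drive))
        # Assumption: a failed read reports a non-zero exit status.
        if result.returncode == 0:
            return drive
    return None
```

Calling read_first_working_drive("Workshop/test.scp") would walk through A, B, 0, and 1 and tell you which identifier your drive answers to. It only ever issues read commands, so it never writes to the disk.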
Then opening HxC Floppy Emulator allows us to examine the flux streams. From there, you can determine the number of tracks and even tell the logical format read to skip over empty tracks. This is particularly useful when reading a disk that has fewer tracks than the drive can read (i.e. a 20 track in an 80 track drive). You can tell it to skip empty tracks, otherwise your image turns up a bunch of red for empty (unreadable) tracks.
All that red is empty information from not specifying the number of tracks
You can also tell it the formatting of the drive - in this example it was an IBM 360 format. How do we know? We have to match the size, track number, and other technical information to the logical format using this handy wikipedia page!
Because the Greaseweazle was designed for retro gaming, it can also write over disks, which you wouldn't want for preservation. It doesn't respect write blockers (i.e. stickers covering the read write notch). Plus stickers tend to get stuck in the drive so they aren't particularly useful. Instead, you have to explicitly tell the GW to write in the command line, which you would never do because you're always going to give it the gw read command.
I am so excited to get started building a lab for TAPE! Leontien, as part of her grant funded project, is writing a guide for the Digital Preservation Coalition, so there will be a comprehensive guide on floppy disks that we can all reference. This is particularly exciting because much of the information is scattered across forums and there is no centralized place to hold it.
IRISH FILM ARCHIVE
After the talks, I snuck onto the booked up tour of the Irish Film Institute (IFI). I wasn’t going to miss it, sorry. I first learned about the IFI through their incredible python scripts - thanks to AV Preservation at UCLA Library’s Special Collections, I use the copyit.py script all the time! It moves files from one location to another, generating a file level checksum and then checking the checksum upon delivery to ensure that the files match in both locations. But there are dozens of tasks and full workflows the IFI uses, that I highly recommend.
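To show the idea of copy-then-verify, here is a minimal sketch in Python. It is not the IFI's copyit.py, which does far more. The function names are mine and SHA-256 is an arbitrary choice of checksum for the example.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large video files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then confirm the source and the copy share a checksum."""
    before = sha256_of(source)
    shutil.copy2(source, destination)   # copy2 also carries over timestamps
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source} to {destination}")
    return after
```

The returned checksum can be saved next to the copy so the file can be re-checked later.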
The Archive is housed in a 17th century building in the heart of Dublin, right next to their cinema for the public. They have a full bookstore and DVD collection, a bar, a cafe, and a 3-screen cinema showing independent and repertory cinema. Their programming is closest to the Laemmle in Los Angeles. We walked through their workspaces, including their digitization lab for film, which uses a lot of the same equipment I am familiar with (like Blackmagic Cintels), and through their collections workroom, which has 4 Steenbecks used for print traffic inspection. We also got to learn from their Special Collections department (anything but moving image) about their print and 3-d object collection.
We talked a bit about the challenges with repairing Steenbecks, especially in Ireland. Steenbeck was bought 2 years ago and now there are no official technicians or parts associated with the company, only an aging group of engineers and repair people who want and need to pass on their knowledge. It’s a similar issue to VCRs and other video electronic equipment.
I missed the afternoon talks and workshops unfortunately and I did skip the social dinner (sorry!). Instead I returned to the IFI after some dinner for a screening organized by the Archive of Nightshift (1981) dir. Robina Rose preceded by Meshes of the Afternoon and a conversation with the legendary Irish experimental filmmaker Pat Murphy. I met up with Ruairí who came down from Belfast for the screening and some other filmmakers and archivists at the IFI. It was a real gathering of the local scene of artists, filmmakers, programmers, and archivists!
Nightshift played in LA earlier this year but I missed it, so was excited to see it and with the additional context provided by Murphy who was a peer of Rose.
After the screening, I hung out with a group of people who I met and we had a Guinness outside the Archive. We talked a lot about the emergent microcinema scene in Dublin as many of them were the organizers behind Fanvid, a DIY film club showcasing artist-made work and experimental film in roving locations. I met Alexa who works at the IFI as an archivist. I also met Frank who makes some really cool work with Ruairí around media censorship and building a Pirate Television archive particularly in Northern Ireland. Frank is part of the Repeater Collective which is a DIY collective for artists and musicians in Ireland. I am super excited to check out his film Few Can See and potentially bring some work to the Whammy! microcinema!
Gifs made using FFMPEG - Instructions sourced from
- How to convert string of JPGs into an MP4
- Convert MP4 to Gif (I use the second one!)
- Cool text fonts generated here
wonderful post Jackie! Very informative and fun. Must have a Guinness while cleaning floppy disks at Whammy!