
Rebecca Sutton Koeser

Lead Developer, The Center for Digital Humanities at Princeton University

As part of my Fedora 3 migration work, I’ve done some initial investigation into datastreams that are missing checksums, since it’s not clear whether there’s an easy way to calculate them programmatically (although there should be).
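The digest calculation itself can be done with standard tools. Here is a minimal sketch, assuming the datastream content can be read as a byte stream (for instance, downloaded from the repository). The mapping from Fedora checksum type names to `hashlib` algorithm names is an assumption to verify against your own Fedora setup, and `compute_checksum` is a hypothetical helper, not part of any Fedora API.

```python
import hashlib
import io
from typing import BinaryIO

# Assumed mapping from Fedora checksum type names to hashlib names;
# verify against your Fedora configuration.
FEDORA_TO_HASHLIB = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def compute_checksum(stream: BinaryIO, checksum_type: str = "MD5",
                     chunk_size: int = 64 * 1024) -> str:
    """Return the hex digest of a datastream's content (hypothetical helper)."""
    try:
        algorithm = FEDORA_TO_HASHLIB[checksum_type]
    except KeyError:
        raise ValueError(f"unsupported checksum type: {checksum_type}") from None
    digest = hashlib.new(algorithm)
    # Read in fixed-size chunks so large binary datastreams (like FILE)
    # don't have to be held in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(compute_checksum(io.BytesIO(b"hello"), "MD5"))
    # 5d41402abc4b2a76b9719d911017c592
```

Reading in chunks keeps memory use flat regardless of file size, which matters most for the binary datastreams where checksums are most important.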

For the 1,878 ETD-related objects in my test Fedora 3 instance, here are the numbers of datastreams missing checksums, by datastream ID:

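| Datastream | Missing checksums |
|------------|------------------:|
| PREMIS     | 327               |
| RELS-EXT   | 2                 |
| POLICY     | 1513              |
| DC         | 1442              |
| MODS       | 327               |
| XHTML      | 265               |
| FILE       | 223               |
| MADS       | 141               |
| SKOS       | 1                 |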
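One way to gather totals like these is to inspect each datastream's profile. This is a minimal sketch, assuming the profile XML has already been fetched (for example, from the Fedora 3 REST API) and that a missing checksum shows up as checksum type `DISABLED` and/or a checksum value of `none`. The function names and sample profiles below are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, Tuple


def _local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def missing_checksum(profile_xml: str) -> Tuple[str, bool]:
    """Return (dsID, is_missing) for one datastream profile document.

    Assumption: a missing checksum is reported as type DISABLED,
    or as an empty / "none" checksum value.
    """
    root = ET.fromstring(profile_xml)
    # Profile fields are direct children; ignore any XML namespace.
    fields = {_local(el.tag): (el.text or "").strip() for el in root}
    ds_id = root.get("dsID") or fields.get("dsID", "")
    checksum_type = fields.get("dsChecksumType", "")
    checksum = fields.get("dsChecksum", "")
    missing = checksum_type in ("", "DISABLED") or checksum.lower() in ("", "none")
    return ds_id, missing


def tally_missing(profiles: Iterable[str]) -> Counter:
    """Count datastreams missing a checksum, grouped by datastream ID."""
    counts: Counter = Counter()
    for xml in profiles:
        ds_id, missing = missing_checksum(xml)
        if missing:
            counts[ds_id] += 1
    return counts


# Hypothetical sample profiles, for illustration only.
SAMPLE_PROFILES = [
    '<datastreamProfile dsID="FILE">'
    "<dsChecksumType>DISABLED</dsChecksumType><dsChecksum>none</dsChecksum>"
    "</datastreamProfile>",
    '<datastreamProfile dsID="DC">'
    "<dsChecksumType>MD5</dsChecksumType>"
    "<dsChecksum>5d41402abc4b2a76b9719d911017c592</dsChecksum>"
    "</datastreamProfile>",
]

if __name__ == "__main__":
    print(tally_missing(SAMPLE_PROFILES))  # Counter({'FILE': 1})
```

Stripping namespaces keeps the check working whether or not the profile XML declares one.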

The FILE datastream is the only binary datastream here, and that’s where we care about checksums the most.

A little backstory for those who don’t know: Fedora 2 had a bug that made it impossible to set the checksum when updating a datastream. (A related problem discovered more recently: PHP SOAP doesn’t handle nulls properly; had I known this, I might have been able to work around the Fedora bug.) So, essentially any datastream that was updated after its initial ingest lost its checksum.

I’ve already done some updates on this test data to clean up the RELS-EXT datastreams; I suspect this is why so few RELS-EXT datastreams are missing checksums.

My next step is to see if I can find a way to set the checksum without creating a new datastream version.