
Totals for ETD Fedora datastreams missing checksums

fedora etd

As part of my Fedora 3 migration work, I’ve done some initial investigation into datastreams that are missing checksums. It’s not clear whether there’s an easy way to calculate them programmatically, although there should be.

For the 1,878 ETD-related objects in my test Fedora 3 instance, here are the totals of datastreams missing checksums, by datastream ID:

Datastream   Missing checksums
PREMIS       327
RELS-EXT     2
POLICY       1,513
DC           1,442
MODS         327
XHTML        265
FILE         223
MADS         141
SKOS         1

The FILE datastream is the only binary datastream here, and that’s where we care about checksums the most.
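
A tally like the one above can be sketched in a few lines of Python. This is only an illustration, not the script I actually ran: the function name and the sample records are hypothetical, and it assumes that a datastream without a stored checksum reports its checksum type as empty or "DISABLED".

```python
from collections import Counter

# Assumption: a datastream with no stored checksum reports one of these
# values as its checksum type; adjust to match what your repository returns.
MISSING_TYPES = {None, "", "DISABLED"}


def missing_checksum_totals(datastreams):
    """Count datastreams without a checksum, grouped by datastream ID.

    `datastreams` is an iterable of (pid, dsid, checksum_type) tuples,
    gathered however is convenient (API calls, database export, etc.).
    """
    totals = Counter()
    for pid, dsid, checksum_type in datastreams:
        if checksum_type in MISSING_TYPES:
            totals[dsid] += 1
    return totals


# Hypothetical sample records, not real data from the test instance.
sample = [
    ("etd:1", "FILE", "DISABLED"),
    ("etd:1", "DC", "MD5"),
    ("etd:2", "FILE", "DISABLED"),
    ("etd:2", "DC", "DISABLED"),
]

totals = missing_checksum_totals(sample)
for dsid, count in totals.most_common():
    print(dsid, count)
```

Keeping the tally separate from however the records are collected means the same function works whether the checksum types come from the repository API or from a one-off export.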

A little backstory for those who don’t know: Fedora 2 had a bug that made it impossible to set the checksum on datastream updates. (A related problem, discovered more recently: PHP SOAP doesn’t set nulls properly; if I’d known this, I might have been able to work around the Fedora bug.) As a result, basically any datastream that was updated after initial ingest lost its checksum.

I’ve already run some updates on this test data to clean up RELS-EXT, and I suspect that is why so few RELS-EXT datastreams are missing checksums.

My next step is to see if I can find a way to set the checksum without creating a new datastream version.
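
Whatever mechanism ends up storing the value, the checksum itself is easy to compute once the datastream content is in hand. Here is a minimal sketch using Python’s standard hashlib; the function name is my own, MD5 is just an example algorithm, and this does not address how to set the result in Fedora without creating a new datastream version.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="md5", chunk_size=8192):
    """Return the hex digest of a binary file-like object.

    Reads in chunks so that large FILE datastreams (PDFs, for example)
    do not have to be loaded into memory all at once.
    """
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for datastream content fetched from the repository.
content = io.BytesIO(b"hello world")
print(stream_checksum(content))
```

The same function accepts an open file or an HTTP response body, as long as it supports read() and returns bytes.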

Author
Rebecca Sutton Koeser
humanities research software engineer, thinker, writer