Rebecca Sutton Koeser

Lead Developer at Princeton University Center for Digital Humanities, PhD in English Literature.

As part of my Fedora 3 migration work, I’ve done some initial investigation into datastreams that are missing checksums, since it’s not clear whether there’s an easy way to calculate them programmatically (although there should be).

For the 1,878 ETD-related objects in my test Fedora 3 instance, here are the counts of datastreams missing checksums, by datastream ID:

PREMIS 327
RELS-EXT 2
POLICY 1513
DC 1442
MODS 327
XHTML 265
FILE 223
MADS 141
SKOS 1
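Counts like these can be gathered by inspecting each datastream’s profile. The sketch below is not necessarily how these totals were produced. It assumes Fedora 3 REST `datastreamProfile` XML in which a missing checksum is reported as checksum type `DISABLED` with the value `none`. The helper names (`missing_checksum`, `tally_missing`, `make_profile`) and the sample profiles are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of Fedora 3 REST datastreamProfile responses (assumed).
NS = {"m": "http://www.fedora.info/definitions/1/0/management/"}


def missing_checksum(profile: ET.Element) -> bool:
    """True if a parsed datastream profile reports no stored checksum."""
    ctype = profile.findtext("m:dsChecksumType", default="", namespaces=NS)
    value = profile.findtext("m:dsChecksum", default="", namespaces=NS)
    # Assumption: a missing checksum appears as type DISABLED and value "none".
    return ctype in ("", "DISABLED") or value in ("", "none")


def tally_missing(profile_xmls):
    """Count datastreams lacking checksums, keyed by datastream ID."""
    counts = Counter()
    for xml_text in profile_xmls:
        profile = ET.fromstring(xml_text)
        if missing_checksum(profile):
            counts[profile.get("dsID")] += 1
    return counts


def make_profile(dsid, ctype, value, pid="test:1"):
    """Hypothetical helper that builds a minimal profile for trying this out."""
    return (
        '<datastreamProfile xmlns="{ns}" pid="{pid}" dsID="{dsid}">'
        "<dsChecksumType>{ctype}</dsChecksumType>"
        "<dsChecksum>{value}</dsChecksum>"
        "</datastreamProfile>"
    ).format(ns=NS["m"], pid=pid, dsid=dsid, ctype=ctype, value=value)


# Made-up sample profiles, not data from the test instance.
samples = [
    make_profile("FILE", "DISABLED", "none"),
    make_profile("FILE", "MD5", "5d41402abc4b2a76b9719d911017c592"),
    make_profile("DC", "DISABLED", "none"),
]
counts = tally_missing(samples)
```

Fetching the real profiles from Fedora (for example via the REST `getDatastream` call with XML output) is left out here; a `Counter` keeps the per-datastream tally compact, and unseen datastream IDs simply count as zero.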

The FILE datastream is the only binary datastream here, and that’s where we care about checksums the most.
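Because FILE holds binary content, one option is to compute its checksum locally from the downloaded content and compare it with whatever Fedora reports. A minimal sketch, assuming the content is available as a binary stream. The mapping from Fedora checksum type names to `hashlib` algorithm names is an assumption, and `compute_checksum` is a hypothetical helper.

```python
import hashlib
import io
from typing import BinaryIO

# Assumed mapping from Fedora checksum type names to hashlib algorithm names.
FEDORA_TO_HASHLIB = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def compute_checksum(stream: BinaryIO, checksum_type: str = "MD5",
                     chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.new(FEDORA_TO_HASHLIB[checksum_type])
    # iter() with a b"" sentinel stops reading at end of stream.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In-memory stand-in for FILE content downloaded from Fedora.
example = compute_checksum(io.BytesIO(b"hello"), "SHA-1")
```

Reading in chunks matters for large binary files, since the whole file never has to be held in memory at once.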

A little backstory for those who don’t know: Fedora 2 has a bug that makes it impossible to set the checksum on datastream updates. (A related problem I discovered more recently: PHP SOAP doesn’t set nulls properly; if I’d known this, I might have been able to work around the Fedora bug.) So essentially any datastream that was updated after its initial ingest lost its checksum.

I’ve already done some updates on this test data to clean up the RELS-EXT datastreams; I suspect this is why so few of them are missing checksums.

My next step is to see if I can find a way to set the checksum without creating a new datastream version.
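One avenue to explore is the Fedora 3 REST `modifyDatastream` call (`PUT /objects/{pid}/datastreams/{dsID}`), assumed here to accept `checksumType`, `checksum` and `versionable` query parameters. Whether any combination of these sets a checksum without adding a new datastream version is exactly the open question, so this sketch only builds the request and does not send it. `checksum_update_request` and the base URL are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def checksum_update_request(base_url, pid, dsid, checksum_type,
                            checksum=None, versionable=None):
    """Build (but do not send) a modifyDatastream PUT request.

    Parameter names follow the Fedora 3.x REST API as assumed here;
    verify them against the Fedora version being migrated.
    """
    params = {"checksumType": checksum_type}
    if checksum is not None:
        # Optional: an externally computed checksum value.
        params["checksum"] = checksum
    if versionable is not None:
        # Hypothetical experiment: does versionable=false avoid a new version?
        params["versionable"] = "true" if versionable else "false"
    path = "/objects/{}/datastreams/{}".format(
        quote(pid, safe=":"), quote(dsid, safe=""))
    url = base_url.rstrip("/") + path + "?" + urlencode(params)
    return Request(url, method="PUT")


req = checksum_update_request("http://localhost:8080/fedora", "test:1",
                              "FILE", "MD5", checksum="abc123")
```

Sending the request would still need authentication, for example through `urllib.request.urlopen` or another HTTP client, and the datastream history should be checked afterwards to see whether a new version appeared.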