As part of my Fedora 3 migration work, I’ve done some initial investigation into datastreams that are missing checksums. It’s not clear whether there’s an easy way to calculate them programmatically, although there should be.
For the 1,878 ETD-related objects in my test Fedora 3 instance, here are the counts of missing checksums by datastream:
Datastream | Missing checksums |
PREMIS | 327 |
RELS-EXT | 2 |
POLICY | 1513 |
DC | 1442 |
MODS | 327 |
XHTML | 265 |
FILE | 223 |
MADS | 141 |
SKOS | 1 |
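A tally like the one above could be produced from exported FOXML along the following lines. This is only a sketch, not the method actually used for these numbers. It assumes FOXML 1.1 exports, that datastream versions are listed oldest first, and that a missing checksum shows up as either no `contentDigest` element or a `DIGEST` value of `none`. The function names and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed FOXML 1.1 namespace; adjust if your exports differ.
FOXML = "{info:fedora/fedora-system:def/foxml#}"


def missing_checksum_ids(foxml_text):
    """Return IDs of datastreams whose latest version has no usable digest."""
    root = ET.fromstring(foxml_text)
    missing = []
    for ds in root.findall(FOXML + "datastream"):
        versions = ds.findall(FOXML + "datastreamVersion")
        if not versions:
            continue
        latest = versions[-1]  # assumption: versions are listed oldest first
        digest = latest.find(FOXML + "contentDigest")
        # Assumption: an unset checksum is an absent element or DIGEST="none".
        if digest is None or digest.get("DIGEST", "none") in ("", "none"):
            missing.append(ds.get("ID"))
    return missing


def tally(foxml_docs):
    """Count missing checksums per datastream ID across many objects."""
    counts = Counter()
    for doc in foxml_docs:
        counts.update(missing_checksum_ids(doc))
    return counts


# Hypothetical object: DC lost its digest on update, FILE still has one.
SAMPLE = """<foxml:digitalObject
    xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="demo:1">
  <foxml:datastream ID="DC">
    <foxml:datastreamVersion ID="DC.0">
      <foxml:contentDigest TYPE="MD5" DIGEST="abc"/>
    </foxml:datastreamVersion>
    <foxml:datastreamVersion ID="DC.1"/>
  </foxml:datastream>
  <foxml:datastream ID="FILE">
    <foxml:datastreamVersion ID="FILE.0">
      <foxml:contentDigest TYPE="MD5" DIGEST="d41d8cd98f00b204e9800998ecf8427e"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>"""

if __name__ == "__main__":
    for ds_id, count in tally([SAMPLE]).most_common():
        print(ds_id, "|", count, "|")
```

Only the latest version of each datastream is checked, since that is the version a migration would carry forward.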
The FILE datastream is the only binary datastream here, and that’s where we care about checksums the most.
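For the binary content, calculating a checksum outside of Fedora is simple once you have the bytes. Here is a minimal sketch, assuming MD5 is the algorithm wanted (swap in another `hashlib` algorithm if not) and that the FILE content is available as a file-like object. `md5_of_stream` is a hypothetical helper name.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object in chunks so large files fit in memory."""
    digest = hashlib.md5()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # In-memory stand-in for a downloaded FILE datastream.
    print(md5_of_stream(io.BytesIO(b"example PDF bytes")))
```

Reading in chunks matters here because ETD files can be large; the resulting hex string is what would be compared against (or supplied as) the stored checksum.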
A little backstory on this for those who don’t know: Fedora 2 had a bug that made it impossible to set the checksum on datastream updates. (A related problem I discovered more recently: PHP SOAP doesn’t set nulls properly; if I’d known this, I might have been able to work around the Fedora bug.) So basically any datastream that was updated after initial ingest lost its checksum.
I’ve already done some updates on this test data to clean up the RELS-EXT; I suspect this is why so few RELS-EXT datastreams are missing checksums.
My next step is to see if I can find a way to set the checksum without creating a new datastream version.