Rebecca Sutton Koeser

Lead Developer at Princeton University Center for Digital Humanities, PhD in English Literature.


Saxifraga cochlearis

In his poem A Sort of a Song, William Carlos Williams wrote “no ideas but in things” and “saxifrage is my flower that splits the rocks.” What I’m doing here almost certainly isn’t what he meant. In fact, I may be doing the reverse: I am taking a poem and its words and, in a sense, converting them back to, or at least representing them as, their component “things.” Even though it isn’t quite what Williams intended, these lines kept coming to mind as I worked on this post, and they seem related to the things in poetry I’m discussing here.

Near the beginning of this project, when we were experimenting with some of the tools and technologies we thought we might use to improve the process of identifying and tagging names in XML text, I noticed some strange output when I ran some of the poetry from the Belfast Group sheets against the DBpedia Spotlight annotation service. Because I wasn’t restricting the identified resources to persons, places, or organizations (which is what our tools usually do when we’re trying to identify names to be tagged, e.g. in the NameDropper OxygenXML plugin we’re developing), it was identifying things like “potato”, “rock”, “eye”, “mouth”, “hand”, and “root” in the text. We’re now at the point in the project where we’re shifting towards using the tools we’ve been developing to enhance the EAD and TEI XML associated with the Belfast Group. As I began tagging some of the poetry, I was reminded of this output and thought it might be worth a little more investigation.
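For readers who haven’t used Spotlight, here is a minimal sketch of the difference between a restricted and an unrestricted request. It is not the NameDropper code: the endpoint URL, the `types` values, and the function names are assumptions based on the public Spotlight web service, and the sample text is made up.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public demo endpoint; a local Spotlight install would differ.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

# Assumption: type names as they appear in the DBpedia ontology.
NAME_TYPES = ["DBpedia:Person", "DBpedia:Place", "DBpedia:Organisation"]


def build_annotate_url(text, confidence=0.1, types=None):
    """Build an annotate URL; leave `types` empty to get every kind of resource."""
    params = {"text": text, "confidence": confidence}
    if types:
        params["types"] = ",".join(types)
    return SPOTLIGHT_URL + "?" + urllib.parse.urlencode(params)


def annotate(text, confidence=0.1, types=None):
    """Call the service and return the list of recognized resources."""
    request = urllib.request.Request(
        build_annotate_url(text, confidence, types),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # The "Resources" key may be absent when nothing is recognized.
    return data.get("Resources", [])


# Restricted to names (what a name-tagging tool would want) ...
restricted = build_annotate_url("a spade and a potato", types=NAME_TYPES)
# ... versus unrestricted, which is what lets "things" through.
unrestricted = build_annotate_url("a spade and a potato")
```

Only the URL-building step runs here; `annotate()` needs network access and is included just to show where the request would go.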

For this experiment, I restricted myself to Seamus Heaney’s poem Digging, as it appears in the draft on one of the Belfast Group sheets (there are some slight wording differences from the published version).

Below are the things that DBpedia Spotlight identifies in the poem. I’m using the DBpedia thumbnails (or Wikipedia thumbnails, in the few cases where the DBpedia thumbnail image link was broken) to emphasize the “thingness” of the entities that Spotlight recognizes. Each image links to the corresponding DBpedia resource, and if you hover your mouse over the image you should see a snippet of the poem where the entity was recognized. I’ve sorted them into three groups semi-manually, since I’m still having difficulty filtering on support and similarity scores without losing useful data. In this case, very few of the identified resources seemed to have high certainty, I suspect because of the poetic language.
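To make the filtering problem a little more concrete, here is a hedged sketch of threshold filtering that keeps track of what gets thrown away. The key names `@support` and `@similarityScore` are assumed to match Spotlight’s JSON output, the function name is hypothetical, and the scores are invented for illustration rather than taken from the actual results for this poem.

```python
def split_by_scores(resources, min_support=0, min_similarity=0.0):
    """Split resources into (kept, dropped) using two score thresholds."""
    kept, dropped = [], []
    for resource in resources:
        # Scores are assumed to arrive as strings, so convert before comparing.
        support = int(resource["@support"])
        similarity = float(resource["@similarityScore"])
        if support >= min_support and similarity >= min_similarity:
            kept.append(resource)
        else:
            dropped.append(resource)
    return kept, dropped


# Invented scores, for illustration only.
sample = [
    {"@URI": "http://dbpedia.org/resource/Spade",
     "@support": "150", "@similarityScore": "0.12"},
    {"@URI": "http://dbpedia.org/resource/Dig_Dug",
     "@support": "900", "@similarityScore": "0.05"},
]

kept, dropped = split_by_scores(sample, min_similarity=0.1)
```

Returning the dropped items alongside the kept ones makes it easy to check whether a given threshold is discarding anything worth keeping, which is the difficulty described above.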

First, the things that DBpedia Spotlight recognized accurately, in the order that they occur in the poem.


Finger
Thumb
Pen
Gun
Window
Wealth
Sound
Spade
Sink
Rhythm
Potato
Drill
Boot
Knee
Root
Brightness
Scattering
Potato
Hardness
Hand
God
Handle (grip)
Spade
Grandparent
Cutting
Peat
Bog
Milk
Bottle
Paper
Drink
Sod
Shoulder
Good and evil
Potato
Slapping
Peat
Cutting
Spade
Man
Finger
Thumb
Pen


It’s sort of an odd way to read a poem, but it’s also kind of intriguing. Among other things, I think this highlights how full of actual physical items, especially body parts, the text is.

Second, a few of the resources that aren’t quite correctly matched up to the text, but are still interesting and semi-relevant.


Squatting
Coarse fishing
Lugger
Shaft mining
Tell
Fell


I actually found these mis-identifications somewhat thought-provoking. To some degree, they betray the extent to which DBpedia is thing-centric, so that verbs and adjectives are mis-identified as nouns (again, with low confidence or support scores). But I find the notion of the poet’s pen “squatting” between thumb and finger, in the sense of taking up residence in an abandoned space without permission, rather appealing. In the case of some of the other mis-identifications, it seems that Spotlight is picking up the context of digging and working outdoors, hence the mountains (Fell) and archeological entities (Tell). And in the case of the lugger ship, the mis-identification actually drove me back to the text: when I looked at “lug” in context I realized that I didn’t know what it was, and had to go looking to figure out that the lug and shaft are parts of a shovel or spade.

Third, some of the mis-identified things that are humorously, obviously wrong. In this case we have actors and musicians or bands, conceptually unrelated items, and even a video game. I’m including these here partly because they make me laugh, but also to demonstrate that the technology still has limitations and we need to be careful how we apply it.


Doris Day
Toner
Vomiting
Turf war
Common cold
Molding (process)
The Edge
The Roots
Dig Dug


For those who are interested, here are some technical notes on how I generated this post.

    lookup-names heaney1.xml -c 0.1 \
        --tei-xpath '//t:body[@xml:id="heaney1_1045"]'  \
        --scores --csv /tmp/heaney-digging.csv
    
  • Wrote a simple Python script to iterate through the CSV file and generate the HTML I wanted for each item, pulling the label and thumbnail from DBpedia, and using the context pulled from the poem.

  • Manually sorted out the entities I wanted into the three groups, preserving order, and fixed missing thumbnails where I could (some of the DBpedia thumbnail references are invalid; I’m guessing this is because they have been updated on Wikipedia since the last time the current DBpedia data was regenerated).
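
The script itself isn’t reproduced here, but a minimal sketch of the CSV-to-HTML step might look something like the following. The CSV column names (`uri`, `context`), the `lookup` callback, and the placeholder thumbnail URL are all hypothetical; the real CSV layout produced by `lookup-names` may differ, and a real version would query DBpedia for the label and thumbnail instead of faking them.

```python
import csv
import html
import io


def render_item(uri, label, thumbnail, context):
    """Return a linked thumbnail whose hover text is the poem snippet."""
    return '<a href="{}" title="{}"><img src="{}" alt="{}"/></a>'.format(
        html.escape(uri, quote=True),
        html.escape(context, quote=True),
        html.escape(thumbnail, quote=True),
        html.escape(label, quote=True),
    )


def render_csv(csv_file, lookup):
    """Render every CSV row; `lookup(uri)` must return (label, thumbnail)."""
    items = []
    for row in csv.DictReader(csv_file):
        label, thumbnail = lookup(row["uri"])
        items.append(render_item(row["uri"], label, thumbnail, row["context"]))
    return "\n".join(items)


def fake_lookup(uri):
    # Stand-in for a DBpedia query: derives a label from the URI and
    # points at a placeholder image rather than a real thumbnail.
    label = uri.rsplit("/", 1)[-1].replace("_", " ")
    return label, "https://example.org/thumb.png"


# Made-up CSV content standing in for the file written by lookup-names.
sample_csv = io.StringIO(
    "uri,context\n"
    "http://dbpedia.org/resource/Spade,sample snippet & more\n"
)

page = render_csv(sample_csv, fake_lookup)
print(page)
```

Passing the lookup in as a function keeps the HTML generation separate from the DBpedia query, so a broken thumbnail link can be patched in one place.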