A Speculative Experiment, Useful Here and Now?: Tagging the Digital Edition

How do you conduct a speculative experiment around a digital humanities tool, while also creating something that's useful to readers and scholars right now? I've been using a site policy on tagging to test the differences between my speculative design and the probable reality of my site's use.

My Infinite Ulysses participatory digital edition will let readers and scholars add contextual annotations to James Joyce's Ulysses. Each annotation can be tagged with a word or phrase that helps other readers find their particular signal in the potential noise of a (hypothetically) heavily annotated text. Possibilities that tagging opens up include:

filtering the annotations to only show those on particular areas of interest (e.g. annotations that translate foreign words, or annotations aimed at first-time readers)
filtering out annotations the reader doesn't want to see (e.g. spoilers about something that only becomes clear later in the book)
easily creating a curated "edition" of Ulysses aimed at a specific topic (e.g. an edition meant for use by a class focusing on the place of books, letters, newspapers, and other written material in the Modernist novel)
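To make the three uses above concrete, here is a minimal sketch of tag-based filtering. It assumes a hypothetical data model (an `Annotation` with a set of lowercase tag strings) and a hypothetical `filter_annotations` helper; it is not the site's actual code. Filtering to areas of interest is an "include" set, filtering out spoilers is an "exclude" set, and a curated "edition" is just a saved pair of the two.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Annotation:
    """Hypothetical model: an annotation on a passage, with lowercase tag strings."""
    passage_id: str
    text: str
    tags: Set[str] = field(default_factory=set)


def filter_annotations(
    annotations: Iterable[Annotation],
    include: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
) -> List[Annotation]:
    """Return annotations sharing at least one tag with `include` (when given)
    and no tags with `exclude` (when given)."""
    include = include or set()
    exclude = exclude or set()
    kept = []
    for note in annotations:
        if exclude & note.tags:
            continue  # filtering out, e.g. anything tagged "spoilers"
        if include and not (include & note.tags):
            continue  # filtering to areas of interest, e.g. "vocabulary"
        kept.append(note)
    return kept


# A curated "edition" stored as nothing more than a saved tag selection.
# The name and tags are illustrative assumptions, not part of the site.
READING_MEDIA_EDITION = {"include": {"reading media"}, "exclude": {"spoilers"}}

notes = [
    Annotation("telemachus-1", "Church Latin from the Mass.", {"latin", "vocabulary"}),
    Annotation("calypso-1", "A hint that pays off much later.", {"clues", "spoilers"}),
    Annotation("aeolus-1", "The newspaper office setting.", {"reading media"}),
]
edition = filter_annotations(notes, **READING_MEDIA_EDITION)
```

Here `edition` keeps only the "aeolus-1" note. Treating an edition as data rather than a separate copy of the text means a class could share one by sharing a short tag list.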

Ulysses' complexity and hypertextuality make it a great candidate for extension through intense annotation, and I've written previously about scholarly concerns over what heavy contextual annotation and interpretation would do to the text. Together, these two circumstances make Ulysses a perfect subject for a speculative experiment (an approach that learns about an object such as a digital edition by imagining and designing for a hypothetical or future circumstance): designing a digital edition with thousands of users actively creating content (annotations, questions and answers) on the site and generally pushing its features to their limits.

So: tagged annotations, and a site designed to sift a large amount of noise so that each reader receives the signal he or she requires. How can I make this tagging actually work, both in my speculative hypothetical of thousands of active site users and in the more realistic scenario of far less activity?

Building for Two Extremes of Site Usage

I see two possible routes for tag creation: restricting users to a limited set of tags I provide myself (perhaps supplemented with user-created tags by specific request), or allowing users to create and apply their own tags that become part of the public pool of available tags (a folksonomy).

Trust the users to find how tags serve them best: I start from the premise that giving users more control over their reading provides them with a more customized reading experience, a major goal of the site. I'd prefer to model what I see as appropriate use of tags, and then assume users will employ the feature in a way that benefits their own reading and, by extension, that of other users (even if not in the way I expected). Given my experience dealing with spam and observing internet trolling, though, I'll have a plan in place for moderating tags before they join the public pool, should early use of the site suggest a need for it. I assume I'll need some sort of anti-spam tactics regardless, although requiring user accounts and IP banning will help keep spammers out while allowing real users more freedom.

So Much Annotation Tagging!

With my speculative experiment hat on, allowing users to build an unbridled folksonomy of thousands of tags sounds like trouble: too many tags for one dissertating graduate student to moderate, redundant tags with slightly different spellings or phrasings, and an overwhelming list of possible tags for the user to select from. For a site dealing with extreme amounts of user-generated content, good approaches to moderation draw on the user community (e.g. asking users to click a "flag as inappropriate/spam" button when they encounter spam).

I admire StackExchange's approach to moderation, which requires site users to build up "reputation" points that unlock successively broader site privileges. These privileges range from actions such as flagging posts, available to users who have contributed very little to the site, to moderator-like abilities such as casting deletion votes on negatively voted answers, available to those who have performed enough "reputable" actions on the site. (I'm also thinking about incorporating a StackExchange-style feed of unanswered questions into the site: a quick way to see parts of the text where you might help another reader, and a way to earn reputation points toward less moderation and more automatically public content.)

Because I don't want users to wait before jumping into the site, I'd need to modify StackExchange's approach so that new users have the full range of user abilities from the very start, while whether the annotations and answers they create are visible to the public could depend on their site activity (time spent on the site, indicating reading) and/or their contributions (one decent annotation = probably not a spammer). I'd also let users create opt-in groups of users who can see their annotations regardless of whether those annotations are visible to the general public, so that literature classes or book clubs can work together with no delay.
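The visibility rule described above can be sketched roughly as follows. Everything here is an assumption for illustration: the `User` model, the threshold values (the post names no numbers), and the choice to treat *either* signal, reading time or one decent annotation, as sufficient. The key design point it tries to capture is that posting is never blocked; only who can see a new user's work is gated.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical thresholds; the post does not commit to specific numbers.
MIN_READING_MINUTES = 30
MIN_GOOD_ANNOTATIONS = 1  # "one decent annotation = probably not a spammer"


@dataclass
class User:
    name: str
    reading_minutes: int = 0     # time spent on the site, as a proxy for reading
    good_annotations: int = 0    # annotations not flagged as spam (assumed signal)
    groups: Set[str] = field(default_factory=set)  # opt-in classes, book clubs


def is_public_contributor(user: User) -> bool:
    """Treats either signal (activity or a decent contribution) as sufficient."""
    return (user.reading_minutes >= MIN_READING_MINUTES
            or user.good_annotations >= MIN_GOOD_ANNOTATIONS)


def can_see_annotation(viewer: Optional[User], author: User) -> bool:
    """Everyone can annotate from the start; only visibility is gated."""
    if is_public_contributor(author):
        return True                 # visible to the general public
    if viewer is None:
        return False                # anonymous visitor, author not yet public
    if viewer.name == author.name:
        return True                 # authors always see their own work
    return bool(viewer.groups & author.groups)  # shared opt-in group


newcomer = User("newcomer", groups={"eng-101"})
classmate = User("classmate", groups={"eng-101"})
stranger = User("stranger")
```

With these definitions, `classmate` can see `newcomer`'s annotations right away, while `stranger` and anonymous visitors cannot until `newcomer` crosses one of the thresholds.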

A Reasonable Amount of Annotation Tagging?

In reality, the site won't generate the extreme level of content my speculative experiment describes. Anecdotal evidence from other Speaking in Code attendees suggests that users creating too many tags won't be an issue; the more likely reality (as with any DH project) is that not enough of your project's visitors turn into site users, or not enough site users actively add content (such as new tags) to your site.

Less site use than my speculative extreme is actually helpful here, because I can design a site that draws on StackExchange's and Reddit's communal moderation while putting in place a policy that relies on content generation being low enough for me to moderate it if needed (with the help of a "flag for moderation" option for users). The site is built around the idea of communal discussion of Ulysses, and I'd much rather moderate public content after it's been posted than place any delay on users sharing their thoughts with the community. The StackExchange approach, while great for professionals who have a reason to return to the site and build up reputation, doesn't work as well for a DH project where any barrier to entry might turn away potential readers.

Over the coming month, I'll be gathering statistics (e.g. "Community Transcription — Thirty-One Months", http://wardepartmentpapers.org/blog/?p=1442) and noting design decisions from other crowdsourcing projects, with an emphasis on crowdsourcing that requires active decision-making or interpretation from users. Crowdsourcing that deliberately asks for little active attention from the user is still a great place to learn about intuitive design, though. Peter Organisciak's recent blog post "CrowdCamp Report: Waitsourcing, approaches to low-effort crowdsourcing" prototypes several crowdsourcing instruments that lower the effort participation requires, which suggests some novel ways of asking users to help with annotation tags. For example, I could collect untagged annotations into a feed, as with the unanswered questions, or place a small widget in the site's footer that pulls up an untagged annotation and asks for a tag. It's unclear whether these would be useful or used, but they're food for thought.

At the end of the Infinite Ulysses project, I expect to address each site feature (e.g. annotation tagging) across three contexts:

precedents from other DH projects (e.g. how tagging moderation was addressed, the amount of tagging users actually supplied)
the statistical realities of my site, observed via user studies (how often was moderation required? how many tags did users actually apply to annotations?)
what designing the site with a speculative experiment in mind teaches me about the limits of digital editions

A Blended Approach to Tags

My current thinking is to take a blended approach to tags: seed the Infinite Ulysses site with contextual annotations before opening it for public viewing, and also seed those annotations with tags from a small set of broadly useful ideas (each tag will link to a page with a paragraph of further information about it; for user-provided tags, I may need to write those pages myself):

Each foreign language that appears in the text (church Latin, French...)
Literary references (e.g. Milton, Shakespeare's specific plays)
Clues (e.g. to the identities of Bloom's correspondent and the man in the macintosh)
Spoilers (annotations drawing on information you'd only know if you'd already read the book)
Vocabulary (help with difficult words, the history and contemporary uses of particular phrases or slang)
Music (making it easy to pull up references to songs, lyrics, and poetry in the text)
Simple and advanced plot explanations (can be toggled off for those not interested in this help)
Reading media (references to books, newsprint, letters)
Colonialism and usurpation
Some sort of tagging that indicates the granularity of an annotation, so that annotations appropriate for scholars versed in interpretations of the book reach them, and annotations that help first-time readers understand the book reach those users (suggested by Ronan Crowley)

Users will be free to add their own tags, but I'll keep a close eye on site use during user testing and decide whether user-created tags will be automatically public, visible only to their creator until I moderate them, or private at first but automatically public once the user has demonstrated an understanding of the site through some small level of activity (the StackExchange approach). When I begin user testing, I'll have a chance to ask users what they prefer and see how tagging might actually play out; I'm excited for my users to show me which option works best for them!
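Since the choice among those three options is deferred until user testing, one way to keep it cheap to change is to make it a single setting. The sketch below is hypothetical throughout (`TagPolicy`, `tag_is_visible`, and the `SITE_TAG_POLICY` setting are illustrative names, not the site's code), and it assumes a tag's creator can always see their own tag under every policy.

```python
from enum import Enum


class TagPolicy(Enum):
    """The three options under consideration for user-created tags."""
    PUBLIC = "automatically public"
    MODERATED = "visible only to creator until moderated"
    EARNED = "private until the creator shows some site activity"


def tag_is_visible(policy: TagPolicy, *, viewer_is_creator: bool,
                   approved: bool, creator_active: bool) -> bool:
    """Decide whether a user-created tag appears for a given viewer."""
    if viewer_is_creator or policy is TagPolicy.PUBLIC:
        return True                  # creators always see their own tags
    if policy is TagPolicy.MODERATED:
        return approved              # waits on a moderator's review
    return creator_active            # EARNED: the StackExchange-style option


# The decision made after user testing could then be one configuration line.
SITE_TAG_POLICY = TagPolicy.EARNED
```

Keeping the policy separate from the rest of the tagging code also makes it easy to compare options during testing, for instance by running different policies for different test groups.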

I successfully defended my digital humanities doctoral dissertation in Spring 2015. The Infinite Ulysses social+digital reading platform that was part of that project has been retired into an archival form: a static site with a slideshow tour of past interactive features.