For this year's Bloomsday, Rhonda Armstrong, Regina Higgins, Steven Hoelscher, Pamela Andrews and I collaborated digitally to extend the Ulysses dataset and visualization work begun at THATCamp Prime 2012 (aka Bloomsday 2012). Rhonda, Regina, Steven, and Pamela each thoroughly scoured ten pages of the book to add to our knowledge about the network of character relationships in the novel, and I extended last year's "Wandering Rocks" visualization (off of the data created by Chad Rutkowski and me in 2012), adding in weights showing the "depth" of each character interaction. A huge thank-you to Rhonda, Regina, Steven, and Pamela for their time and effort expanding the public dataset of Ulysses character interactions!
So! You can start immediately below with reading my meta-analysis of the project (basically, how a non-weekend viz project might attack the same problem), jump down to inspect the visualizations, or jump even farther to read the instructions and suggestions I shared with this year's scholars (including an explanation of the "weight" system we used to record a subjective depth to each character interaction).
Related: You can check out last year's Bloomsday visualization work, or read the tutorials (1, 2) for making basic Gephi (infoviz software) visualizations that I created as part of my ACH Microgrant work.
1. The big caveats with the dataset were caused by my decision to assign volunteers ten-page sections before the actual event: people needed to use the edition of Ulysses they had on hand (which meant assigning increments of ten pages to different people meant different, possibly overlapping, sections of the book were covered), and because some people needed to drop out of participating before Bloomsday, there are gaps in coverage (e.g. pages 1-10 aren't covered). The current dataset covers the following pieces of the book: pages 11-40 and 101-110 (1990 Vintage edition), pages 71-80 (the Project Gutenberg e-text), and pages 81-90 (1961 Modern Library edition)...plus all of "Wandering Rocks".
In retrospect, I probably should have divided up the pages differently. A better approach: wait until the day of the event to assign pages to those who are still participating, and either require use of the same edition (maybe by scanning and sending out the needed pages to volunteers, so no one faces the barrier of needing to buy a different edition of Ulysses) or fall back to the non-authoritative Project Gutenberg e-text (it's easy to copy and paste different sections of the text into emails to the different participants, and the flaws of this version of the text don't majorly impact our character relationship recording activities). On the other hand, much of the interest in helping out with this project is tied to a desire to (re)read Ulysses, and letting scholars use their preferred edition is important to that reading experience.
Would I do this project again? Absolutely! The participants were amazing collaborators, volunteering their time and effort to expand a public dataset, and having a chance to learn just a bit more about information visualization is always a treat. Next time, though, I'll need to take more time planning out more thorough coverage of a single edition of the work. I'd also spend more time packaging our finished data for use by others (e.g. those with more Gephi chops than me). I did open viewing of our data recording spreadsheet to the public (and am happy to give editing capability to anyone who wishes to augment the dataset). You can also check out the CSVs I imported to Gephi (Wandering Rocks spreadsheet, Start of Book spreadsheet), which put the data we gathered in the Google Spreadsheet into Gephi-ingestable form (and fix a few redundant variant spellings and the like). Given its issues (gaps in page coverage, data from different editions), I didn't actively court anyone to use our data. Next time, with better coverage of the book, I'd reach out to DH tweeps who work more regularly in infoviz to help augment and use the data. But! If you want to work with any of the data or visualization images, please go ahead! This post, the dataset on GoogleDocs, the two CSVs linked above, and the three visualization images in this post are all CC BY.
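For anyone who wants to rebuild the CSVs from the spreadsheet, here is a minimal sketch of that cleanup step in Python. It is not the process I actually used: the alias table, the function names, and the sample rows are all made up for illustration. The one thing it leans on is that Gephi's edge-table import recognizes columns named Source, Target, and Weight.

```python
import csv
import io

# Hypothetical alias table: variant spellings mapped to one canonical node name.
ALIASES = {
    "Bloom": "Leopold Bloom",
    "Mr Bloom": "Leopold Bloom",
    "Stephen": "Stephen Dedalus",
}


def canonical(name):
    """Trim whitespace and collapse known variant spellings into one name."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


def to_gephi_edges(rows):
    """Turn (source, target, code) rows into CSV text for a Gephi edge table."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, code in rows:
        # The interaction code doubles as the edge weight.
        writer.writerow([canonical(source), canonical(target), int(code)])
    return out.getvalue()


# Made-up sample rows, not taken from the real dataset.
sample_rows = [
    ("Mr Bloom", "Stephen", 2),
    (" Bloom ", "Molly", 1),
]
gephi_csv = to_gephi_edges(sample_rows)
print(gephi_csv)
```

Names missing from the alias table pass through unchanged, so the table only needs to list the spellings that actually collide.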
2. Visualizations without explanation? (Sacrilege!) Given the holiday and learning nature of this project, we didn't approach it with quite the same end goal as most "proper" visualization projects (by which I mean, scholarship with project-length time and effort, producing an end result of narrated new knowledge). The data creation by the team's scholars was rigorous under the given parameters of a defined but sometimes ambiguous weight spectrum and non-continuous selections from the novel, but it was also a weekend project: looking at the visualizations below, I can confirm some expectations and enjoy getting an unusually broad view of the network of the novel's socializing, but there is much more someone with the time and interest can do!
A full visualization project would benefit from continuous data, or at least data with interesting ways of separating non-continuous sets of pages (e.g. all sections when Stephen is physically present as a way of looking at his effect on social interaction). Some things we'd do with more time to devote to this project:
That's by no means an exhaustive list, and again, I feel that this kind of dipping one's toes into the waters of a new technique is important, as long as you are aware of what, if any, arguments you're trying to make with your scholarly (or weekend scholar) production.
3. Gephi problems! The current version of Gephi (0.8.2beta) wasn't working on OSX 10.8.4; it loads fine but freezes on the data laboratory screen. I reverted to an older version (0.8.1beta) that works fine, after losing some time trying to get things working.
4. The subjectivity of placing types of relationship interactions along a spectrum (or even just of placing character interactions into "types") could definitely benefit from more debate and detail (see below for a table of the interaction coding we used; I think we came up with it during THATCamp Prime 2012?). The spectrum as it stands has several issues. The sometimes-directionality of the weights made it difficult to know when to record more than one interaction; the codes include both one-way interaction types (X thinks about Y) and two-way interaction types (X and Y have a conversation). The interaction code 3, "Omniscient third person narrator (for when two people interact, but it isn't from just one of their viewpoints)", was meant to control for directionality and make "source" and "target" designations meaningful, but didn't end up being useful because interaction codes 4-7 were ambiguous as to direction.
A relatedly subjective issue arose from the choice of node names; for example, the woman's arm that flings a coin to the one-legged sailor clearly belongs to Molly (though maybe not clearly at that point in one's reading)—should that interaction accrue to the "Molly" node, or should a second "her" node be created? Other portions of the book would provide similar problems (e.g. Martha Clifford/Nurse Callan).
Visualization A, Wandering Rocks Redux. (Click image to view larger.) A visualization of characters interacting with other characters in the "Wandering Rocks" chapter (all pages of chapter covered), with lines weighted by "depth" of the interaction: more intimate encounters have a heavier weight, while "lighter" encounters (e.g. a character thinks of another character or salutes them but does not converse) have lighter connecting lines. See the "Instructions" section below for more on this weighting process. (This visualization augments last year's infoviz by adding weights for each encounter.)
Visualization B, All Characters in Selections from the Beginning of Ulysses. (Click image to view larger.) Below, character interactions drawn from selections near the beginning of the novel: pages 11-40 and 101-110 (1990 Vintage edition), pages 71-80 (the Project Gutenberg e-text), and pages 81-90 (1961 Modern Library edition). Again, lines are weighted by "depth" of the interaction: more intimate encounters have a heavier weight.
Visualization C, the major socializers in the beginning selections from Ulysses. (Click image to view larger.) Below, a different look at the same dataset (selections from the beginning of the novel). The visualization above shows a node for each character recorded, which makes it less readable. Below, I filtered the dataset to only show those characters who interacted with more than one other person (in Gephi, go to filters > topology > degree range and set how many edges a node must have to appear), giving an easier view of the major socializers in this part of the book (again, lines are weighted with thicker lines showing more intimate interaction).
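If you'd rather do this filtering outside Gephi, the same idea can be sketched in a few lines of Python. This only approximates the degree range filter: it counts distinct neighbors rather than raw edges, and the function name and sample edges are invented for illustration.

```python
from collections import defaultdict


def filter_by_degree(edges, min_degree=2):
    """Keep edges whose endpoints each have at least min_degree distinct neighbors."""
    neighbors = defaultdict(set)
    for source, target, _weight in edges:
        if source != target:  # ignore self-loops when counting neighbors
            neighbors[source].add(target)
            neighbors[target].add(source)
    keep = {name for name, others in neighbors.items() if len(others) >= min_degree}
    return [edge for edge in edges if edge[0] in keep and edge[1] in keep]


# Made-up (source, target, weight) edges using placeholder character names.
sample_edges = [
    ("A", "B", 6),
    ("A", "C", 2),
    ("B", "C", 1),
    ("C", "D", 5),  # D only interacts with C, so D is filtered out
]
kept = filter_by_degree(sample_edges)
print(kept)
```

Degrees are measured once, on the full graph, so a character who survives the filter may still end up with only one visible line if their other contacts were minor characters.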
The instructions I sent to the other members of the team:
1. Each of you should have access to the Bloomsday InfoViz spreadsheet, where you can see who's been assigned which pages/sections of the book and record your data (check your inbox for a link sent by Google). Please feel very free to take a different or additional set of pages, or do a smaller or larger number of pages than you volunteered for—just make a note on the spreadsheet so we don't do redundant work. Note that the spreadsheet has three different sheets (tabs at the bottom of the page): one for claiming pages, one for recording your data, and one with info on how to code the type of character interaction.
2. There's a bit of difficulty in working together with potentially different editions/page numbers of the book. If you could each let me know the publisher/year of the copy of Ulysses you're using (on the spreadsheet's first sheet), I'll make sure we're not accidentally overlapping [well, that turned out to be too difficult...]. Alternatively or in addition to reading from a print book, you might copy and paste the chunk you want to work on from the Project Gutenberg e-text into a word document; I find this makes skimming for character interactions easier, since you can go through and highlight things. The e-text doesn't have page numbers as far as I recall, so if you work this way please just let me know the sentence you start and end on.
3. Read or skim for character interactions! (See below for details*.)
4. Record the interactions you read on the spreadsheet. The columns are used as follows:
Coded interactions by number along a highly subjective spectrum of relationship depth (see "Lessons" section above for some of the issues that rose from this coding):
|Coding for Type of Character Interaction|Code to Use in Type of Interaction Column|
|Character thinking of another character|1|
|Character observing another character|2|
|Omniscient third person narrator (for when two people interact, but it isn't from just one of their viewpoints)|3|
|Character acknowledging another character without speaking (tip of the hat, nod, etc.)|4|
|Character voicing salutation to another character or other extremely brief interchange|5|
|Character entering into conversation with another character|6|
|Character having intimate contact with another character|7|
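For anyone processing the spreadsheet programmatically, the coding table above could be written down as a small Python enum. This is only a sketch: the member names and the `parse_code` helper are my own shorthand for the table rows, not anything from the spreadsheet itself.

```python
from enum import IntEnum


class Interaction(IntEnum):
    """Interaction codes from the table above; higher values mean deeper contact."""
    THINKING_OF = 1
    OBSERVING = 2
    NARRATED = 3  # omniscient third person narrator
    SILENT_ACKNOWLEDGEMENT = 4
    SALUTATION = 5
    CONVERSATION = 6
    INTIMATE_CONTACT = 7


def parse_code(cell):
    """Convert a spreadsheet cell to an Interaction; raises ValueError outside 1-7."""
    return Interaction(int(str(cell).strip()))


# The code itself serves as the edge weight in the visualizations.
weight = int(parse_code(" 6 "))
print(parse_code(" 6 ").name, weight)
```

Rejecting anything outside 1-7 catches typos in the "Type of Interaction" column before the data reaches Gephi.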
Happy (belated) Bloomsday!