In his upcoming book, Macroanalysis: Digital Methods and Literary History, Matthew Jockers offers a playful vignette aimed at illustrating topic modeling for the initiate. Jockers provides us with a preview of the story on his blog, in which he conjures up the “LDA buffet,” a restaurant that offers a menu of themes from which authors can choose to create new works.

The assumption is that authors pick from a finite number of themes for their fiction, and in the process mimic the experience of a diner deciding on various dishes that might be available at a buffet. In this case the diners happen to be Jane Austen and Herman Melville. The contents of their plates, be they representations of Persuasion or Moby Dick, are a collection of words mixed together in much the same way that their novels incorporate various “motifs, themes, topics and tropes (seasonal).”

After their meal, the pair come upon Ernest Hemingway, who was recently banned from the LDA Buffet and hopes to uncover what was on the menu by dissecting and categorizing each word in their work. Jockers does a great job of relating Hemingway’s process to the work of digital humanists as they engage in topic modeling and macroanalysis. After reading Jockers’ piece, I’ve been wondering about the utensils that writers might use to choose items from this imaginary buffet. At the risk of overextending the metaphor, I began to think of notebooks as plates and search tools as serving spoons.

In considering this process, I was reminded of a recent New Yorker essay by John McPhee titled “Structure.”1 In the essay, McPhee explains the difficulty of organizing his writing into a coherent narrative. In so doing, he gives what I found to be a fascinating account of his interactions with technology as his writing process evolved over the years. McPhee relates his use of various methods to structure his creative nonfiction, from cutting and filing strips of paper to running macros on Kedit, a now quasi-defunct text editor. McPhee’s interview with an information technologist named Howard Strauss is particularly illustrative of this transformation:

He listened to the whole process from pocket notebooks to coded slices of paper, then mentioned a text editor called Kedit, citing its exceptional capabilities in sorting. Kedit (pronounced “kay-edit”), a product of the Mansfield Software Group, is the only text editor I have ever used. I have never used a word processor. Kedit did not paginate, italicize, approve of spelling, or screw around with headers, WYSIWYGs, thesauruses, dictionaries, footnotes, or Sanskrit fonts. Instead, Howard wrote programs to run with Kedit in imitation of the way I had gone about things for two and a half decades.

He wrote Structur. He wrote Alpha. He wrote mini-macros galore. Structur lacked an “e” because, in those days, in the Kedit directory eight letters was the maximum he could use in naming a file …

Structur exploded my notes. It read the codes by which each note was given a destination or destinations (including the dustbin). It created and named as many new Kedit files as there were codes, and, of course, it preserved intact the original set. In my first I.B.M. computer, Structur took about four minutes to sift and separate fifty thousand words. My first computer cost five thousand dollars. I called it a five-thousand-dollar pair of scissors.
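To make the idea concrete, here is a minimal Python sketch of the kind of sorting McPhee describes: read the code attached to each note and copy the note into one pile per code. This is only an illustration, not a reconstruction of Strauss's program. The `@@` marker, the code names, and the function `explode_notes` are all my own assumptions, since the essay does not say how the codes were actually written.

```python
from collections import defaultdict

# Hypothetical marker for a line of destination codes; the essay does not
# describe the real format McPhee used.
CODE_PREFIX = "@@"


def explode_notes(text):
    """Group notes by destination code.

    A note carrying several codes is copied into each destination.
    Text that appears before the first code line has no destination
    and is ignored. The input text itself is left untouched.
    """
    buckets = defaultdict(list)
    codes, lines = [], []

    def flush():
        # File the note collected so far under each of its codes.
        note = "\n".join(lines).strip()
        if note:
            for code in codes:
                buckets[code].append(note)

    for line in text.splitlines():
        if line.startswith(CODE_PREFIX):
            flush()
            codes = line[len(CODE_PREFIX):].split()
            lines = []
        else:
            lines.append(line)
    flush()
    return dict(buckets)


if __name__ == "__main__":
    sample = "@@ GEOLOGY DUSTBIN\nA note on basins.\n@@ GEOLOGY\nA second note.\n"
    for code, notes in explode_notes(sample).items():
        print(code, notes)
```

Writing each pile out to its own file, as Structur did, would be a small extra step. I have left it out so the sketch stays focused on the sorting itself.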

Perhaps writing factual prose lends itself to this kind of editing process. When one is writing fiction, there isn’t necessarily a need to record and structure interviews and events. However, there is still (at least conventionally) a need for organization in terms of narrative, and for the structuring of one’s own ideas.

Humanists are often known for their serendipitous creative processes, and of course each writer has their own unique approach to their work. But in the case of “Structure,” I was struck by how similar McPhee’s pre-writing techniques were to some of the analysis done by digital humanists. For example, McPhee explains that “Kedit’s All command shows me all the times I use any word or phrase in a given piece, and tells me how many lines separate each use from the next.” Of course, he was likely more concerned with varying his word choice than with topic modeling, but the similarities are there.
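For readers curious what such a search looks like in practice, here is a rough Python sketch that approximates the behavior McPhee attributes to the All command. It is not Kedit's implementation. The function name `all_occurrences` and the choices to match case-insensitively and to measure the gap as a difference in line numbers are my own assumptions.

```python
def all_occurrences(text, phrase):
    """List every line that contains `phrase`.

    Returns (line_number, gap) pairs, where gap is the difference in line
    numbers from the previous matching line, or None for the first match.
    Matching is case-insensitive and counts a line once even if the
    phrase appears on it several times.
    """
    hits = []
    previous = None
    needle = phrase.lower()
    for number, line in enumerate(text.splitlines(), start=1):
        if needle in line.lower():
            gap = None if previous is None else number - previous
            hits.append((number, gap))
            previous = number
    return hits


if __name__ == "__main__":
    draft = "The sea was calm.\nNothing moved.\nThen the Sea rose.\n"
    for number, gap in all_occurrences(draft, "sea"):
        print(number, gap)
```

A small gap between two uses flags a word that may be overworked in one passage, which seems to be what McPhee was watching for.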

In any event, I began to wonder whether the powerful new tools currently being used for analysis will also be incorporated into the writing process itself. Perhaps this is already happening and I’m simply not aware. It would be amazing to read another essay similar to McPhee’s, but from the perspective of someone writing 50 years from now. I can’t wait to find out what new utensils will be available and how they will be used.

1. ^ Reading the article seems to require a subscription, but Indiana University folks should be able to read it here or here. Unfortunately, the database versions lack the useful figures in the original, which include various maps and charts.

  1. John, check out this essay by Aaron Hamburger in the New York Times: http://opinionator.blogs.nytimes.com/2013/01/21/outlining-in-reverse/ about reverse outlining for another example of using DH-like analysis in the writing process itself. The connection isn’t mine – this blog by Fred Gibbs: http://fredgibbs.net/blog/history-theory/learning-to-read-again/ points out how Hamburger’s technique reveals the true analytical power of DH – and Gibbs’ blog is how I found the NYT piece in the first place.

    I’m intrigued by the possibility of there being different macro-structures for the fiction and non-fiction genres, with there being less of a ‘standard’ structure for fiction than non-fiction. It seems logical to guess that the corpus of non-fiction would share more similar structures amongst itself than fiction, but I wonder if that’s a common-sense conclusion that would still hold up after some DH analysis. Maybe the structural frameworks behind the corpus of fiction are just as pronounced as in non-fiction, even if you allow for the idiosyncrasies of individual writers. As more writers like McPhee and Hamburger turn to DH methods to analyze the structure of their own writing, it will be interesting to see if their individual conclusions coalesce and begin to hint at being able to say anything meaningful about a macro-structure existing for fiction and/or non-fiction.

  2. You described Jockers’ imagery really well! I really enjoyed his piece, and I feel it resonates with the feeling some readers and analysts may have of digesting the essence of a work through its various pieces! I think that Digital Humanists are fascinated by discovering connections hidden below the surface of literature. You bring up an interesting idea too. I wonder, if a writer used the same LDA Buffet concept to create a work, would the need to include a certain number of words that match a theme and to ensconce connections inspire or limit the creative process? Would these works be more or less valid as art in any way, since these decisions must be so intentional? Or would the work be a totally new form, a type of literary hide-and-seek or cut-and-paste? It might even be possible to create narratives that “could have been written” by silenced voices during historical periods. Or in a more dystopian vision, would the Digital Humanities come to mean that all our art and literature is written by computers, through methods statistically proven to provoke emotion?

