In his upcoming book, Macroanalysis: Digital Methods and Literary History, Matthew Jockers offers a playful vignette aimed at illustrating topic modeling for the initiate. Jockers provides us with a preview of the story on his blog, in which he conjures up the “LDA buffet,” a restaurant that offers a menu of themes from which authors can choose to create new works.
The assumption is that authors pick from a finite number of themes for their fiction, and in the process mimic the experience of a diner deciding on various dishes that might be available at a buffet. In this case the diners happen to be Jane Austen and Herman Melville. The contents of their plates, be they representations of Persuasion or Moby Dick, are a collection of words mixed together in much the same way that their novels incorporate various “motifs, themes, topics and tropes (seasonal).”
After their meal, the pair come upon Ernest Hemingway, who was recently banned from the LDA Buffet and hopes to uncover what was on the menu by dissecting and categorizing each word in their work. Jockers does a great job of relating Hemingway’s process to the work of digital humanists as they engage in topic modeling and macroanalysis. After reading Jockers’ piece, I’ve been wondering about the utensils that writers might use to choose items from this imaginary buffet. At the risk of overextending the metaphor, I began to think of notebooks as plates and search tools as serving spoons.
In considering this process, I was reminded of a recent New Yorker essay by John McPhee titled “Structure.”1 In the essay, McPhee explains the difficulty of organizing his writing into a coherent narrative. In so doing, he gives what I found to be a fascinating account of his interactions with technology as his writing process evolved over the years. McPhee relates his use of various methods to structure his creative nonfiction, from cutting and filing strips of paper to running macros on Kedit, a now quasi-defunct text editor. McPhee’s interview with an information technologist named Howard Strauss is particularly illustrative of this transformation:
He listened to the whole process from pocket notebooks to coded slices of paper, then mentioned a text editor called Kedit, citing its exceptional capabilities in sorting. Kedit (pronounced “kay-edit”), a product of the Mansfield Software Group, is the only text editor I have ever used. I have never used a word processor. Kedit did not paginate, italicize, approve of spelling, or screw around with headers, WYSIWYGs, thesauruses, dictionaries, footnotes, or Sanskrit fonts. Instead, Howard wrote programs to run with Kedit in imitation of the way I had gone about things for two and a half decades.
He wrote Structur. He wrote Alpha. He wrote mini-macros galore. Structur lacked an “e” because, in those days, in the Kedit directory eight letters was the maximum he could use in naming a file …
Structur exploded my notes. It read the codes by which each note was given a destination or destinations (including the dustbin). It created and named as many new Kedit files as there were codes, and, of course, it preserved intact the original set. In my first I.B.M. computer, Structur took about four minutes to sift and separate fifty thousand words. My first computer cost five thousand dollars. I called it a five-thousand-dollar pair of scissors.
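To make the idea concrete, here is a minimal Python sketch of the kind of sorting McPhee describes. It is not Strauss's program, whose internals the essay does not give. The note format (blank-line-separated notes, each headed by a `#` line of comma-separated codes) and the function name `explode_notes` are assumptions made purely for illustration.

```python
from collections import defaultdict


def explode_notes(raw: str) -> dict[str, list[str]]:
    """Group notes by code.

    Assumed format (hypothetical, not Kedit's): notes are separated by
    blank lines, and the first line of each note is '# CODE, CODE, ...'.
    A note with several codes is copied into every matching pile.
    """
    piles: dict[str, list[str]] = defaultdict(list)
    for block in raw.strip().split("\n\n"):
        header, _, body = block.partition("\n")
        codes = [c.strip() for c in header.lstrip("#").split(",") if c.strip()]
        for code in codes:
            piles[code].append(body.strip())
    return dict(piles)


# A tiny invented set of notes; the original string is left untouched.
NOTES = """\
# ALASKA, RIVERS
Canoe trip notes.

# DUSTBIN
A digression I will never use.

# RIVERS
Interview with the guide.
"""

piles = explode_notes(NOTES)
for code in sorted(piles):
    print(code, len(piles[code]))
```

Each key in `piles` stands in for one of the new files Structur would have created, and a real version would write each pile out to its own file.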
Perhaps writing factual prose lends itself to this kind of editing process. When one is writing fiction, there isn’t necessarily a need to record and structure interviews and events. However, there is still (at least conventionally) a need for organization in terms of narrative, and for the structuring of one’s own ideas.
Humanists are often known for their serendipitous creative processes, and of course each writer has their own unique approach to their work. But in the case of “Structure,” I was struck by how similar McPhee’s pre-writing techniques were to some of the analysis done by digital humanists. For example, McPhee explains that “Kedit’s All command shows me all the times I use any word or phrase in a given piece, and tells me how many lines separate each use from the next.” Of course, he was likely more concerned with varying his word choice than with topic modeling, but the similarities are there.
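A search like the one McPhee describes is easy to approximate. The Python sketch below is an imitation, not Kedit's actual All command: the function name `find_all`, the case-insensitive matching, and the choice to measure separation as the difference between line numbers are all assumptions.

```python
import re


def find_all(text: str, phrase: str) -> tuple[list[int], list[int]]:
    """Return the 1-based line numbers that contain `phrase`, plus the
    gap in lines between each hit and the next.

    A line is counted once even if the phrase appears on it twice.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
    gaps = [later - earlier for earlier, later in zip(hits, hits[1:])]
    return hits, gaps


# An invented five-line draft for illustration.
DRAFT = (
    "The river was wide.\n"
    "We paddled on.\n"
    "The river narrowed.\n"
    "Night fell.\n"
    "No sound but the river.\n"
)

hits, gaps = find_all(DRAFT, "river")
print("lines:", hits)
print("gaps:", gaps)
```

A writer checking for repetition would look for small gaps, while someone counting words for analysis would care more about the hits themselves.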
In any event, I began to wonder whether the powerful new tools currently being used for analysis will also be incorporated into the writing process itself. Perhaps this is already happening and I’m simply not aware. It would be amazing to read another essay similar to McPhee’s, but from the perspective of someone writing 50 years from now. I can’t wait to find out what new utensils will be available and how they will be used.
1. Reading the article seems to require a subscription, but Indiana University folks should be able to read it here or here. Unfortunately, the database versions lack the useful figures in the original, which include various maps and charts.