Just a short followup to our in-class discussion of Matt Jockers’s Macroanalysis. Not surprisingly, much of Jockers’s general message was preaching to the converted, given that we’re all here as volunteers interested in the digital humanities. Likewise, much of what we had to say in terms of general appreciation, statistical methodology, presentation, and so forth has been covered in the blogosphere in one place or another or another or another. But still, there were some intriguing challenges and questions, both in terms of the DH methodology and in terms of the literary analysis. One area in particular bridged the two: the consideration of literature as part of the culture industry.

First, the disclaimer. Overall, we were all very impressed by the sustained case Jockers made both for quantitative tools in general and for the specific uses to which he put those tools, though there’s still some convincing to be done. One of his central claims is that we see more by “looking” at a larger data set than any one person could read. No arguments there.

Jockers demonstrated this by examining Charles Fanning’s claim that there was a sudden dearth of Irish-American writing (Irish in America, writing about Irish-American experiences) in the 1910s and 1920s. By looking at a large dataset, rather than relying on his own reading, as Fanning had done, Jockers showed that in fact there was not a drop-off overall, but only among those Irish writing in the eastern United States. That’s an important finding. But I wonder: did Fanning fail to see the same thing because he could only read so many books, that is, because he relied only on his human faculties rather than computer-aided number-crunching? Or was it just a blind spot, or an accident of what archives he had access to? In other words, was it because of the difference between close reading and distant reading, or could that blind spot be a symptom of other factors not necessarily related to those different ways of analyzing literature? I’m inclined to believe Jockers’s overall conclusion that there was not a dearth of Irish-American writing in the 1910s and 1920s, but I’m not sure that this conclusion necessarily makes an iron-clad case for his methodology.

On another note, as several students remarked, while Jockers’s data-intensive methodological approach is fairly new to literary studies, his view of how literature is produced, by whom, and for whom is very traditional. In Macroanalysis, authors are treated as individual auteurs, rather than, for the most part, creative writers who also had to make a living. This is not to say that Jockers implies that each author is sui generis; far from it. In fact, most of his work involves considering the relationship between individual authors and their cohort (in time and in space). But as Alex Koch said, “I kept on thinking about Adorno as I read this piece.” Especially as we considered Jockers’s fascinating development of a statistical model to analyze literary influence, we noticed that two influences are missing: that of publishers and that of audiences. I’d be willing to bet that if Moby-Dick had sold thousands of copies rather than hundreds, then Jockers’s topic modeling engine would have detected whaling as a much more prevalent theme. Why? Because publishers would have been more likely to print more of those kinds of books on the assumption that they’d at least break even on the deal. Who knows how many whaling novels were rejected because of lack of interest, or how many slavery-oriented novels were more or less commissioned by publishers after the blockbuster success of Uncle Tom’s Cabin? One suggestion for how to consider this: given that Jockers has author and date metadata, does he also have access to the name and location of the publishers? And how about how many editions each book went through? To find and code all this for over 700 novels wouldn’t be easy. But it might be a way to extend the analysis, using the same methodology but adding another dimension.

Another industry-related thought: Jockers’s tools and methodology are pretty close to state-of-the-art for the humanities. But, as I learned doing Scholar’s Dashboard (an NEH start-up grant to consider what tools/interface features humanities scholars would like for online collections), humanities state-of-the-art looks primitive compared to the kinds of analysis big commercial firms are doing. We’re looking for influence with a few hundred datapoints? Think of what Google, Netflix, Bing, and Amazon are doing to figure out what we’re searching for or might want to buy. Becky Jenkins had worked for Amazon, and noted that what we’re doing is child’s play.

And on that culture industry front, consider what’s being done in music and movies these days. Matt Younglove had visited the Muzak plant recently, and they’ve made a science of figuring out which songs to play in individual stores, and when, based upon target demographics and various metrics concerning the songs (meter, pitch, artist, onset and decay, you name it). Netflix’s recommendation engine for figuring out what movies we’d like is worlds beyond anything we’re doing in DH. But then again, one way they improved it was by a crowdsourcing contest in which they gave away $1 million. Are we in DH reinventing the wheel, because of our lack of resources? Or maybe just reinventing the tricycle, while industrial giants are designing electric cars?

All in all, with the possible exception of Moretti’s Graphs, Maps, Trees, Macroanalysis might be the best demonstration of how DH methods can actually address scholarly questions, and provoke new ones.

Whatcha thinkin'?