Writing as Programming as Writing

Back in 2003, at the Digital Humanities conference in Athens, Georgia, Geoffrey Rockwell and I threw all caution to the wind and performed a live Brechto-Socratic dialogue on the relationship between programming and writing. People still ask us about it, but we’ve never been sure exactly what to do with it.

The intellectual glow of THATCamp must have put me in the mood, because this afternoon it came to me — the ideal forum for what Geoffrey and I privately refer to as “Untitled #4.”

It was impossible to give Geoffrey’s character his signature beard, but other than that, I would say it perfectly recreates the original performance (including our extraordinary range as “voice talents”). Enjoy!

You can also see it (and download it) here. Or here and here. Or, um, here.

Day in the Life

Today, I’m blogging on a different site as part of the Day in the Life of the Digital Humanities. There’s an RSS feed, if you’d like to drink from the fire hose.

I’m already doing it wrong, of course — my first post is way too long, and it won’t help with any kind of auto-ethnographic anything. But then, I’m skeptical toward this whole thing. And I’m on Spring Break.

The No-Reading Seminar

In my digital humanities classes, I always try to combine the technical with the philosophical (which, I believe, is one of the things that characterizes DH as a discipline). So, we’ll often study control structures on Monday and Wednesday, and then spend Friday talking about new media theory and digital humanities more generally. In the first semester, we read mostly excerpts and articles (McLuhan, Bush, Licklider, Turing, Hayles, Bolter, McCarty, Manovich, Kirschenbaum, and Rockwell show up pretty regularly). In the second semester, however, I usually suggest that we focus on one or two texts — preferably, some very difficult texts. Last semester we read a good bit of A Thousand Plateaus (we planned to read Badiou’s Being and Event, but didn’t get to it).

This semester, I had a bit of a brainstorm and suggested to the students that we might read Heidegger’s “The Question Concerning Technology,” but read it only in class with each other. In other words, no one is allowed to read the text outside of class. We all bring a copy of the essay, but then we put a version up on the screen for everyone to read, and we each take turns reading paragraphs. They liked this idea.

We’ve now done it twice and have made it all the way to the eighth paragraph of the essay. I’m not at all bothered by the slow pace, because I truly think that this is one of the most enlightening class discussions I’ve ever been a part of (either as a student or a teacher).

What do we talk about? Mostly, we try to make sure that we understand Heidegger (this is a very difficult essay even relative to Heidegger’s already demanding corpus). But the real thrill is that we end up thinking deeply about whether we agree or disagree with him, about our own definitions of technology, about causality, definition, ontology, and the tradition in which we’re reading. I walk out of the room thinking, “Now that’s a discussion,” while firmly believing that the professor is only a very small part of what’s going on.

As far as I can tell, the students are also finding it enlightening. We may burn out as winter turns to spring, but for now, I am being reminded every Friday of what the classroom is all about.

The Paradox of the University

This past February, the College of Education and Human Sciences at the University of Nebraska-Lincoln extended an invitation to William Ayers (Distinguished Professor of Education at the University of Illinois at Chicago) to speak at its annual student research conference — an event which, this year, corresponds with the 100th anniversary of the College. Ayers was thought a good choice by the faculty committee charged with selecting a speaker. He is an internationally known scholar — the author of 17 books and more than 100 articles in his field — and an authority on urban educational reform. His talk was to be about qualitative research methods in education.

A week or so ago, that offer was rescinded. The official explanation is that Ayers’s visit represented an unacceptable security risk for the University. The risk, however, was not posed by Ayers himself, but by the thousands of Nebraskans (I infer that number from the number of emails the University received) who were incensed by the decision of the College. Over the course of the last week, I have read impassioned denouncements of the University written by citizens of Nebraska, alumni, and prominent donors. The governor of the state has called the faculty’s decision an embarrassment. Even those entrusted with the administration of the University — the President and the Chair of the Board of Regents — have criticized the choice of Bill Ayers as a speaker. One Regent has condemned my colleagues in Education for their arrogance. The University’s threat assessment committee was able to cite evidence that some were moved to such anger over this invitation that they appeared to be contemplating violent acts against people attending the symposium and Ayers himself.

I find all of this deeply troubling. Like many, I am inclined to use the term “academic freedom” as an alias for my frustration and outrage over what has just occurred. But in reality, I feel that something deeper and more vital has been attacked and denigrated by these events. This deeper thing is the idea of the university itself — an idea to which I have literally devoted my life. As a citizen, I can easily withstand the will of the majority being contrary to my own. I might even be able to carry on as a scholar and an intellectual without exposure to Bill Ayers’s ideas. But as a professor at the University, I cannot do my work — which, I would like to argue, is also the people’s work — without the social contract that allows universities to exist. If the people of Nebraska, their Governor, and the University’s own administrators do not believe in that contract, then I believe we run the risk of having a university only in name.

Bill Ayers provides an apt occasion for talking about this contract and about the consequent notion of a university. So let me stipulate a few things about Bill Ayers for the sake of argument. Let me first assume that Bill Ayers purposefully advocated and participated in direct, violent action against the United States in order to protest the Vietnam War. Let me further suppose him to be — as many have charged — wholly unrepentant toward these acts. We will, for the sake of this discussion, assume only that he is not now a violent criminal or a fugitive from the law.

The question is this: Should the faculty of the College of Education — or, for that matter, any faculty at any reputable institution of higher learning — be permitted to invite such a person to speak?

I believe that the answer to this question must be “yes.” I further believe that this answer is basic to the definition of a university, explanatory of the university’s role in society, and essential for the health of a civilized society. I believe that answering “no” to this question introduces intolerable restrictions on the intellectual life of a nation (this one, or any other), and that it has dangerous consequences for democracy and freedom.

These are bold claims. I hope they will also be understood as being, at least in intention, patriotic claims. But in order to make any claim at all, am I not obligated to defend Bill Ayers?

In fact, I am not. Neither is the College of Education, the Deans, the President, the Chancellor, or the Regents. And this is because the proper discernment of Bill Ayers’s ideas is the very reason we bring him into a university environment with the request that he share and elaborate his viewpoints to a wider community of scholars.

One possible objection seems obvious: Are not Bill Ayers’s ideas manifest? And are they not manifestly evil? And if they are, what possible choice do we have but to accuse the UNL faculty of endorsing those ideas? This has been the substance of most of the attacks leveled against the University. We are accused of “having an agenda” and of forcing that agenda on others. And not just others! We are accused of forcing our (liberal, socialist, anarchist, anti-American) views on students, who, being innocent and impressionable children, are left virtually defenseless and without recourse toward more balanced and even-handed forms of instruction.

Yet this approach to the question of Bill Ayers is quite obviously an example of the very thing that we are being told we must not do. The opposite of the scholar who “has an agenda,” after all, is the scholar who is presumably neutral, dispassionate, and willing to hear both sides. A truly dispassionate scholar would have to invite Bill Ayers to speak — even if the result of that engagement was condemnation. So on its face, the notion that the veracity or usefulness of certain ideas is already a settled matter in advance of any investigation betrays an “agenda” of its own — a belief that universities actually should not be neutral and dispassionate, but should instead support the predetermined beliefs of the society that supports them.

But then what of Bill Ayers himself? He is surely not at all dispassionate. He has an agenda, makes no apologies for it, and is happy not only to argue his positions in open debate, but to use his position as a teacher to convince his students that he is right. If we condemn, in the foregoing bit of sophistry, the partisans of neutrality for their own secret agenda, are we not compelled to condemn Bill Ayers himself for the same thing? He might be a professor, but he’s surely as prejudicial in his views as any one of those who condemn his agenda.

I believe I have just set forth — in three paragraphs — the paradox of the university. On the one hand, it proclaims itself, as Thomas Jefferson once said, “not afraid to follow truth wherever it may lead, nor to tolerate any error so long as reason is left free to combat it.” On the other hand, it harbors people who, having followed truth “wherever it may lead,” will proclaim quite loudly that they have found it. The university is therefore at once both open-minded and doctrinaire, neutral and biased, relativistic and dogmatic.

I wouldn’t have it any other way, and I think the people’s money is well spent by supporting such institutions. Because this is how we, as a society, honor what Jefferson called “the illimitable freedom of the human mind.” It is also how we produce informed, responsible citizens and advance human knowledge.

To understand the educational role of the university, we need to dispense with the notion that universities are sites of unified belief and opinion. I suppose I could rest that claim on my own experience, but one’s own intuitions about the behavior of human beings will probably suffice to make it plain. A student making their way through the University will have to deal with Bill Ayers’s attempt to convince them of his views, but they will also have to deal with those who disagree completely with Bill Ayers (because in academia, there is nothing to be gained, professionally speaking, from thinking like someone else). They will have to contend with those who think that persuasion is always inappropriate in a university environment, and with those who think it is the only coherent rationale for teaching. Such dizzying oppositions are, moreover, not confined to courses on political matters; they are, rather, constitutive of higher education in any subject. In one class, Shakespeare is portrayed as an Elizabethan radical. In another, it is demonstrated that he was an obsequious toady of the Queen. A third eschews all politics so that Shakespeare’s language can be discussed and illuminated. In some classes (in my classes), all three ideas get aired. Or rather, one is put forth until I get the sense that my students might be starting to agree. Then I forcefully argue the opposite. This is sometimes called the “Socratic method.” Most teachers (including Bill Ayers) understand it to be the oldest trick in the book.

The goal of this trick is not to convince students that one idea is right and the other wrong, but to get them to distrust — with all their being — knee-jerk opinions, empty bromides, hasty conclusions, and unreflected assumptions. Or rather, it is to convince students that one idea is right and the other wrong — because that is the only real and genuine way to bring about intellectual maturity. As individual scholars, we have our own agendas. As members of a corporate institution, we distrust agendas with all our might. We are literally both. We are trying to create students — and by extension, citizens — who are literally both. We want them to listen to both sides, but have strong, heart-felt (and informed) opinions. We want them to be open to the truth wherever it may lead, but we also want them to speak the truth (especially to power). In this project, the ability of the individual student to accept or reject an idea is presupposed. We do not regard them as children, but as adults capable of mature judgment.

As research institutions, universities try to produce ideas that are of benefit to society. If they are successful in doing that (and here, I am thinking of everything from educational policy to nanotechnology), it is because they are utterly ruthless in the way they vet ideas. I think there is a perception that Bill Ayers’s visit would be a kind of love-in in which the choir is subjected to preaching and preconceived notions are affirmed. If so, I believe it would be an unusual — if not a unique — moment in the history of the modern academy. In fact, Bill Ayers’s ideas on “qualitative methodologies” (to say nothing of his ideas on armed political action) would be subjected to what would be regarded in most circumstances (in the public square or on television, for example) as withering critique. Ayers himself would be surprised if that didn’t occur, and those in attendance would consider the symposium a great success if it did. It is another instance in which the paradox of the university manifests itself. We want people like Bill Ayers to have strong opinions. We also want to criticize those strong opinions. The truth that emerges from such collisions is the only kind of truth universities know how to make, and it has led to advances in every area of human inquiry and need. If there is a solution to the problems that confront us as a society — in matters ranging from bone cancer, to Middle East policy, to the nature of human love — that solution will in all likelihood first emerge in a laboratory or a seminar room where academics are doing what they do best: fighting and arguing over who’s right.

We now turn to a question that naturally emerges from consideration of the nature of universities: Who decides who gets a hearing in this forum I have described? Who gets to speak?

It won’t do to say, “trained academics” or “those with Ph.Ds.” Academia is not composed entirely of such people, does not confine its invitations exclusively to itself, and would be considerably impoverished if it were to do so. What it does demand, however, is that the people inviting and the people being invited both agree to the principle that truth and neutrality are not contradictory concepts. They must agree to be at once humble and audacious. They must be as quick to admit error as they are zealous of their own opinions. This is the distinguishing feature of a faculty.

They got that way not by taking certain courses or acquiring certain degrees, but by having been mentored into a community that is utterly intolerant toward people who believe in some lesser version of truth and neutrality. There are those at the university who believe that Ayers’s actions as a member of the Weather Underground were gravely immoral. There are also those who believe that it is not only permissible to take up arms against an oppressive regime, but an obligation of a free people (they cite the founders of this country as an example). Such people very often occupy the same department. Their disagreement might be deep and even personal. But all faculty members are resolutely committed to the idea that a question like this deserves careful examination and scrutiny. They want a forum in which to examine words like “immoral” and “oppressive.” They would renounce their own positions in the debate before they would renounce their belief that such forums are necessary and vital for the continuation of civilized society. These people decide.

Such a system places great demands on a society. For while its members may benefit in obvious ways from the fruits of a university (rendered in the form of an educated populace and through the donation of useful ideas), they have to tolerate what might at first seem offensive to freedom. They have to allow these professors to make decisions about whom they need to listen to, whom they accept into their fold, what they talk about, and what they say to their students. They need to do this without interference from government and the public square (where the paradox of the academy cannot usefully exist in a permanent state). Even the interference of administrators damages the integrity of the system. Because without freedom from interference, both the education of students and academic research suffer.

One might suppose that the case of donors is different, and it is. But we must be clear about what a donor does when they withhold funds. They are not refusing to support the actions of the faculty, or its politics, or its decisions. They are refusing to fund the idea of the university. We do not take the generosity of those who contribute their own wealth to the maintenance of this idea for granted; we are humbled by it and grateful for it. But we do insist that people giving money to universities know what it is they’re supporting. In a sense, we ask them to support the paradox. If you are a donor, you won’t like everything we do. We don’t like everything we do. We believe in something greater. We hope that you do as well.

Reasonable people can disagree about Bill Ayers. People can also disagree about Bill Ayers’s having a place in our university forums. But people cannot condemn the right of the faculty to make such decisions and still be supporting the idea of a university — this, or any other. I call upon the citizens of Nebraska to support what Governor Heineman recently called (in his rejection of the decision to bring Ayers to campus) “the people’s university” by supporting the ideals upon which the modern university was founded. As an employee of this University — one honored and privileged to be counted among those who decide on matters of education and debate at this institution — I call upon all administrators loudly and forcefully to support the idea of a university and the academic freedom without which it literally cannot exist. Finally, I call upon the Board of Regents to recognize their role as those, first among citizens, who commit themselves to supporting the project of university research and education against all challenge from without, even when — especially when — the will of the people moves against the ideals that make us a university.

Stephen Ramsay
Lincoln, Nebraska
October 26th, 2008

The Large Hadron Collider Explained

Hack-a-day is reporting that CERN has released the manual for the Large Hadron Collider. Just in time, really, because I’ve been thinking of buying one of these.

[Image: Large Hadron Collider]

Of course, the manual contains the usual stuff:

  1. Make sure the Large Hadron Collider is plugged in.
  2. Note that investigation of supersymmetric particles, strangelets, and micro black holes can lead to injury or death. It is a violation of Federal law to use this product in a manner inconsistent with its labeling.
  3. Please be sure to complete the enclosed Product Registration Card in order to receive important updates for your Large Hadron Collider, and to receive notice of new products from CERN.
  4. Your Large Hadron Collider comes with an extra Compact Muon Solenoid. 96 tons of liquid helium sold separately.

Have You Hugged Your Sysadmin Today?

I had no idea that today was System Administrator Appreciation Day — that is, until I got a very sweet note from a colleague thanking me for maintaining the dev server at CDRH.

Sysadmins are sometimes thought of as occupying one of the lower rungs of the technical ladder (and most are fond of exploiting that fact with lots of self-deprecating humor). But really, most of us secretly enjoy the engine room. For me, system administration is sort of like fixing your own car. I really don’t have to do it, but I like doing it. I love a good, well-oiled (up-to-date, secure, tuned) machine, and when there’s a crisis, I sometimes feel like the Master Chief on the flight deck of an aircraft carrier. Pilots get all the glory, but it’s the guy with the wrench who keeps the birds in the air. Thanks, y’all.

MLA Stylin’

Like most of you, I’ve been perched on my front porch every day for two weeks waiting for the new (3rd) edition of the MLA Style Manual and Guide to Scholarly Publishing to arrive. At long last, it came. Naturally, I sat down and read it cover to cover.

I don’t want to spoil it for anyone. I will note, however, that my lifelong dream of having one of my articles cited as an example of the use of italicized titles was not realized.

However, that insufferable slight was almost ameliorated by a new section on Fair Use that, in comparison to the cold informational tone of previous editions, almost rises to the level of protest. I particularly welcome the addition of such detailed explanations, which occur amidst frequent mention of the case law governing the Fair Use provision:

Congress intended the statutory provision . . . to restate the fair use doctrine that existed before the passage of the act, not to change, narrow, or enlarge it in any way, as the reports of the House and Senate committees make clear. Accordingly, all decisions of the courts before and after the 1976 Copyright Act are relevant to the determination of copyright law. [. . .] Furthermore, the Copyright Act makes no statement about the relative importance of the [four] factors, and the Supreme Court clarified in Campbell v. Acuff-Rose Music, Inc. (1994) that no one factor is more important than the others, nor must the use be supported by all four factors to be fair. (51)

And my favorite . . .

Although one occasionally hears that it is acceptable to use some percentage of the work or some specified number of words, neither the statute nor any regulation nor case law sanctions such guidelines on the quantity of material protected by copyright that may be taken without permission, and authors should not rely on them.

Actually, one doesn’t “occasionally hear” that it is acceptable. I have yet to encounter a library or department that doesn’t hand out a sheet describing exactly how many pages (or lines, or words) one can copy from a text before it violates what is widely understood to be the most important of the four factors (“The effect of the use on the potential market for or value of the copyrighted work”). “Very few,” is the message communicated by these guidelines, and yet most people I know understand it as an articulation of the Fair Use provision. Few seem to be aware that these guidelines were written by the Association of American Publishers — a trade association primarily concerned with protecting the industry — and have no basis in law.

It’s refreshing to see an articulation of Fair Use (put forth by a major scholarly society) that does not attempt to frighten authors into complying with the industry’s reading of the statute, but instead subtly urges American authors to assert their Fair Use rights as citizens engaged in “criticism, comment, news reporting, teaching . . ., scholarship, or research” (51). Perhaps we could excerpt this fine section (2.2.13) of the MLA Style Manual and hand it out in department copy centers as a replacement for the AAP’s manifesto?

Digital Campus

I was delighted to be a guest (along with Bill Turkel) on Digital Campus for their 25th episode. I haven’t listened to it yet — and so I’m not sure to what degree I made a fool of myself — but it was great fun to hang out with Bill, Dan, and Tom.

Digital Campus, of course, is the fantastic podcast put out by The Center for History and New Media at George Mason University.

The Race Car Bed

Some time ago, I posted an essay on craftsmanship that featured a piece of furniture that my father built. That essay turned out to be the most popular blog post I’ve written, and it also led to a number of emails expressing admiration for my father’s skills as a woodworker. It is with great pleasure, then, that I post this most recent example of my father’s work — a “race car” bed for my three-year-old nephew Angus:

[Images: race_car1, race_car2, race_car3]

Now, I’m going to guess that most of my readers are over the age of thirty. But I know what you’re thinking: Can I have a race car bed?

High Performance Computing for English Majors

[HPC has been coming up a lot lately in conversations I've been having with other DH specialists. Or it was, before I went in for sinus surgery a week ago. I'm still recovering from that, and I'm not really sure about my ability to blog coherently. So please accept this essay from the archives. It's from a talk I gave at MLA in 2006.]

There are people in this world who spend untold amounts of time tweaking and tuning their cars for some perceived need for high performance that seldom materializes on roadways intended for passenger automobiles. They spend hours “modding” their rides: changing the gas-to-air ratio, boring out the cylinders, fiddling with the feathers and springs on the shock absorbers, and injecting nitrous oxide into the fuel line in order to get “Dude, like 450 horsepower” out of a sedan principally designed to ferry children to and from school.

I have precisely this relationship with computers. The latest chip, the fastest disks, the most efficient bus architectures all fill me with a kind of atavistic frisson. And once I lay my hands on the geek equivalent of NOS, I start rebuilding the kernel, changing the shared memory footprint, altering the thread model, reconfiguring the drive geometry, and adding optimization flags to my C compiler. It is true that my machines are often on the verge of melting, but that’s the price of perfection. There’s even a special version of the Linux kernel for bleeding-edge speed freaks called the “Love Kernel.” It’s essentially the standard Linux kernel with hundreds of high-speed performance patches applied indiscriminately. Here’s a quote from the README for the Love kernel:

IMPORTANT: steel300 and OneOfOne remind you that the patches here are sometimes experimental and could explode upon impact, make your [soda|pop] really bland, or other badness. We aren’t responsible for that, but we will mention that these patches will also make your kernel ROCK LIKE NINJA.

And that’s what I want to do. I want my computers to rock like ninja.

In a sense, ordinary training in software design is responsible for creating this insane desire for speed. The entire study of algorithms and data structures is framed by a concern with the trade-offs between time and space. If you undertake formal study of these matters, you find that much of what you’re doing is calculating the best and worst case scenarios for storage and retrieval within a particular data structure or under the strictures of a certain algorithm. After a while, you can’t help but equate faster, smaller, and more scalable with better.

But if you study software engineering and design methodology at any level of detail — or better yet, start writing production code — you quickly discover that this equation is downright dangerous. Code optimization is fine when you’re talking about a fake implementation of a sorting algorithm. In a large, complex system intended for actual users, however, premature optimization is more than likely to result in brittle, unreadable code. And this assumes that you understand where the bottlenecks are in the first place. This is why even a brief foray as a computational test pilot will cause one to develop certain rational instincts about code efficiency. You begin to lower the bar to something like “fast enough” in order to create code that is more easily maintained and understood. You begin to distrust any optimization that isn’t completely verifiable using profilers and benchmarking tools. You begin to realize that it might be safer and more efficient to drive the kids to school in a minivan. Or rather, you realize that this is the rational position, even as you irrationally try to break the sound barrier.
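To make the point about verifiable optimization concrete, here is a minimal sketch in Python using the standard library's `timeit` module. The two word-counting functions and the `benchmark` helper are hypothetical illustrations invented for this example, not code from any project mentioned here. The idea is simply to confirm that both versions agree before comparing how long they take.

```python
import timeit


def count_words_naive(words):
    """Count word frequencies by scanning a list for every word (quadratic time)."""
    counts = []
    for word in words:
        for entry in counts:
            if entry[0] == word:
                entry[1] += 1
                break
        else:
            # No existing entry matched, so start a new one.
            counts.append([word, 1])
    return {word: n for word, n in counts}


def count_words_dict(words):
    """Count word frequencies with a dictionary (roughly linear time)."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def benchmark(func, words, repeat=3, number=5):
    """Return the best time in seconds for `number` calls of func(words)."""
    timer = timeit.Timer(lambda: func(words))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    sample = ("the quick brown fox jumps over the lazy dog " * 200).split()
    # Check correctness first: an optimization that changes the answer is a bug.
    assert count_words_naive(sample) == count_words_dict(sample)
    for func in (count_words_naive, count_words_dict):
        print(f"{func.__name__}: {benchmark(func, sample):.4f} s")
```

The timings will vary from machine to machine, which is rather the point: the measurement, not the hunch, decides whether the faster-looking version is worth keeping.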

I have been writing software for use in the context of digital humanities for about ten years. During that time, I have written thousands of lines of code, but all of it has fallen neatly into one of two categories. Either it was intended to deliver data to the Web, or it was intended to perform some kind of data analysis operation offline. That covers a lot of different types of systems, of course. Sometimes the data being delivered to the Web consisted of reams of GIS data that had to be paired with text, styled, and delivered to a client framework that would render a real-time animated map. Sometimes the offline data analysis consisted of computing complex graph theoretical algorithms for the purpose of studying relationships within a corpus. But in the former case, network latency had the effect of making most of my shrewd optimizations seem futile. Why work for hours on some little speed hack when the processing that occurs prior to network delivery and rendering is only a small fraction of the total end-to-end userspace time? In the latter case, it really didn’t matter how long the analysis took. I was the only one who needed the data, and there really wasn’t any particular rush. Who cares if it takes fifteen minutes — or even fifteen hours — to crunch the numbers?
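The futility of those speed hacks is easy to put in numbers. The sketch below is a small Python calculation in the spirit of Amdahl's law; the function name and the millisecond figures are made up for illustration and are not measurements from any real system.

```python
def end_to_end_speedup(processing_ms, other_ms, processing_speedup):
    """Overall speedup when only the processing step gets faster.

    processing_ms: time spent in the code being optimized
    other_ms: everything else (network latency, rendering), left unchanged
    processing_speedup: factor by which the processing step is accelerated
    """
    if processing_speedup <= 0:
        raise ValueError("processing_speedup must be positive")
    before = processing_ms + other_ms
    after = processing_ms / processing_speedup + other_ms
    return before / after


if __name__ == "__main__":
    # Hypothetical request: 50 ms of server-side processing and
    # 950 ms of network delivery and rendering.
    for factor in (2, 10, 1000):
        overall = end_to_end_speedup(50, 950, factor)
        print(f"processing {factor}x faster -> request {overall:.3f}x faster")
```

Under these assumed numbers, even an infinitely fast processing step could never make the whole request more than 1000/950 times faster, because the 950 ms spent elsewhere is untouched.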

For the last few years, I have been giving talks in which I proclaim an “age of tools” in digital humanities, and the evangelium goes something like this: Over the last twenty years, we have spent millions digitizing texts and putting them online. The resulting digital full-text archives are among the greatest achievements in digital humanities. Yet for all their wonder, they remain committed to a vision of digital textuality firmly ensconced within the metaphor of the physical library. You can browse the text, read the text, search the text, and even download the text, but you can’t really do much beyond that. It is time to start thinking of ways to exploit this data with analytical tools and visualizations. Ideally, such tools should be an integral part of the experience of working with Web-based text collections.

Several of my colleagues in the field are working on something like this, including my fellow panelists [Greg Crane and Geoff Rockwell - ed.]. My own contribution is as a member of the Nora Project, which endeavors to implement the credo outlined above with an emphasis on particular varieties of text analysis — including, most significantly, data mining and machine learning algorithms. I won’t speak for Geoff and Greg, but I think I know why I’m here today talking about high-performance computing. It’s because for the first time in my career, caffeine-addled speed optimizations seem not only warranted, but necessary.

They’re necessary because when we talk about large, full-text archives empowered by text analytical tools and visualizations, we’re really talking about taking procedures traditionally thought of as batch-processing jobs and importing them into a world in which, as Jakob Nielsen famously noted, you have eight seconds to do something interesting.

Our data mining operations rely on massive matrices of data drawn from text corpora. For example, we might have a giant table (consisting of millions of cells) where one column is filled with word frequency counts, another one is filled with markers indicating the presence or absence of a certain feature, another is filled with ratios between nouns and verbs, and so on. We start out not knowing what any of this data really means, but we do know that texts (or parts of texts) in the corpus cluster in certain ways. There are genre distinctions, years of composition, different authors, different countries of origin. So we add one more column of data indicating the “label” for the particular text or text section. Text classification is the process of using statistics to figure out what patterns of low-level features conspire to make a text fit a particular label. So the usual method involves having a domain expert label some of the texts, and then setting the data mining algorithms loose on the rest of the matrix, so it can generate a set of predictive rules. If the rules are robust (and this is the exciting part) you should have a system that can correctly assign labels for texts it has never seen before. And, of course, the labels can be anything at all.
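To make the shape of that matrix concrete, here is a minimal sketch in Python. Every number and label in it is invented for illustration, and the nearest-centroid method is a deliberately simple stand-in rather than the rule-generating algorithms described above; the names `train` and `classify` are hypothetical. It shows only the general pattern: labeled rows go in, and a label comes out for a row the system has never seen.

```python
from collections import defaultdict
from math import dist

# Toy feature matrix: all values and labels are invented for illustration.
# Columns: (frequency of some word per 1,000 tokens,
#           feature present? 0 or 1,
#           noun-to-verb ratio)
labeled_rows = [
    ((12.0, 1, 1.8), "comedy"),
    ((11.0, 1, 2.0), "comedy"),
    ((3.0, 0, 0.9), "tragedy"),
    ((4.0, 0, 1.1), "tragedy"),
]

def train(rows):
    """Average the feature vectors for each label (one centroid per label)."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }

def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train(labeled_rows)
unseen_text = (10.0, 1, 1.7)  # a row the "expert" never labeled
predicted = classify(centroids, unseen_text)
print(predicted)
```

A real system would work with thousands of columns and would expose the learned rules, since the rules (not just the predicted label) are what a critic wants to read.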

We’ve used data mining to create things like systems that can detect eroticism and sentimentality in English poetry and prose. And as soon as we say that, two objections emerge immediately. First, “Do we really need a system that can tell us that a particular Shakespeare play is a history? Don’t we already know that?” And second, “Who decides what passages are erotic or sentimental in the first place?” The first objection is an entirely sensible one, but what really intrigues us is the fact that the system often gets it “wrong” in some thoroughly thrilling way. The first time we ran a data mining operation on Shakespeare, it calmly informed us that both Romeo and Juliet and Othello are comedies. The computer scientists on the team were ready to go back to the drawing board, but the literary critics were more excited than ever, because, of course, a number of influential critics have noted that these two plays follow the basic dramatic structure of comedy, and all we wanted to do was look at the generated rules to see what low-level features are complicit in this subtle moment of generic ambiguity. The second objection (“who decides what the labels are”) is also a sensible objection, but we have an easy answer to that one. The user should decide. The user should be able to choose what vectors go into the matrix, and choose the labels.

And that brings me, at long last, to the main topic of this panel. Because until recently, no one has thought of data mining as a live, interactive process. To undertake meaningful data mining on full-text archives of literary texts, you need to parse the XML documents, tokenize them, run a series of natural language processing algorithms (to determine things like parts-of-speech), check them against a gazetteer (for named-entity resolution), and then crunch all the numbers. Then you need to assemble all of that data into a matrix. Then you need to run the actual data mining algorithm. Then you need to deliver it to the client and render it. This always takes hours, and it occasionally takes days. If you’re offline, it doesn’t matter (though even offline, you want to come to this problem fully armed with high-performance equipment). Online, it violates Nielsen’s eight-second rule in a way that borders on the grotesque.

It’s possible to approach the optimization of this process in a thoroughly rational manner. First, you look at the whole end-to-end system and try to divide the operation into things that bind early and things that bind late. There’s no reason to parse the XML data and do the feature extraction live. All of that can be done at the pre-processing stage and loaded into a datastore of some kind. It might take days to do that, but if you’re clever, you can get a ton of “canned” data ready to be loaded into a matrix for analysis. After you’ve done that, you can think about ways to minimize the amount of data the system has to analyze, perhaps by segmenting the data in such a way that the system has less material to sort through as it loads the matrix. You might then look for obvious inefficiencies in the analysis layer itself, and try to optimize those as much as you can (without creating brittle, difficult-to-understand code). Finally, you can figure out ways to distribute the analytical process across multiple processors.
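The split between early and late binding can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual system: the word-count "feature extraction", the JSON files standing in for a datastore, and the function names `extract_features`, `preprocess`, and `load_matrix` are all hypothetical. The slow work happens once, offline; the live step only reads canned results for the texts the user has selected.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path

def extract_features(text):
    """Slow step (stand-in): tokenize and count words. Real NLP would go here."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(tokens))

def preprocess(corpus, store_dir):
    """Bind early: run extraction once, offline, and can the results on disk."""
    for doc_id, text in corpus.items():
        path = store_dir / f"{doc_id}.json"
        path.write_text(json.dumps(extract_features(text)))

def load_matrix(store_dir, doc_ids, vocabulary):
    """Bind late: at request time, read canned counts for the chosen texts only."""
    matrix = []
    for doc_id in doc_ids:
        counts = json.loads((store_dir / f"{doc_id}.json").read_text())
        matrix.append([counts.get(word, 0) for word in vocabulary])
    return matrix

# Invented two-document corpus, purely for illustration.
corpus = {
    "doc_a": "Love is love, and love is all.",
    "doc_b": "War and more war.",
}
store = Path(tempfile.mkdtemp())
preprocess(corpus, store)  # could take days on a real archive
matrix = load_matrix(store, ["doc_a"], ["love", "war", "and"])  # must be fast
print(matrix)
```

Passing only the selected document ids to `load_matrix` is the segmenting idea in miniature: the less material the live step has to touch, the sooner the matrix is ready.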

We’ve done all of that. We’ve canned it, chunked it, speed-hacked it, and even figured out a way to multithread the process across any arbitrary number of processors. The resulting system is dazzlingly fast. It’s just not fast enough for the Web. And so it is time, we think, to turn to some serious hardware.
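For readers who want to see the chunk-and-distribute pattern, here is a minimal Python sketch. It is not our code: the word tally stands in for the real analysis, and `chunk`, `count_words`, and `parallel_counts` are hypothetical names. It uses a thread pool from the standard library because that is the simplest thing to run anywhere; in CPython, CPU-bound work would likely need a process pool (or separate machines) to occupy several processors at once.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def chunk(items, n_chunks):
    """Split a list into at most n_chunks roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_words(texts):
    """Per-chunk work (stand-in): tally word frequencies for one slice."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def parallel_counts(texts, workers=4):
    """Fan the chunks out to a pool of workers, then merge the partial tallies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunk(texts, workers)))
    return sum(partials, Counter())

# Invented sample texts, purely for illustration.
texts = ["the rose is red", "the violet is blue", "the end"]
totals = parallel_counts(texts, workers=2)
print(totals.most_common(2))
```

The merge step is what makes the approach scale to an arbitrary number of workers: each partial result is independent, so adding workers changes only how many tallies get summed at the end.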

And when we say serious, we’re not talking about expensive servers (we’ve got those). We’re talking about seriously expensive servers — distributed clusters of the sort that are used for things like particle physics, weather simulation, and the video rendering for Attack of the Clones. And that’s a problem.

It’s a problem, because in the context of a university, “high-performance computing” isn’t a technical term at all. It’s a financial act of faith made by very senior members of the administration, and a site of intense territorial protection by the “hard” scientists who help to make that act of faith seem less fraught with religious peril. A bunch of English professors who want to get into high-performance computing need to convince administrators that they should get a piece of the pie, and they need to convince the physicists that literary critics have just as much of a right to these resources as anyone else. Which should be an easy matter. All we need to do is talk to the people who are exploring the origins of the universe, and ask them to step aside for a moment while we look for dirty words in Dickinson.

And, of course, we won’t be asking them to step aside “for a moment.” Nearly everything done on these systems represents a batch job. The experiment (or the video rendering task) might take a long time, but it usually has a beginning and an end. We’re talking about ongoing processes running on a kind of supercollider Web server. Perhaps we need our own high-performance cluster? But then, who pays for such a thing? Digital humanities can bring in grant dollars, but most of the funding agencies we deal with are loath to fund even moderate amounts of overhead. Perhaps we are in over our heads.

Now, I’ve already confessed to being a semi-delusional, speed-obsessed maniac. Perhaps all of this represents nothing more than the idle fantasy of someone who wants “Dude, like 450 million words per second.” Surely, there’s much that we can do to bring about the age of tools without pouring millions of dollars into hardware. Why be so ambitious at this early stage? Do we really need to be thinking about high-performance computing for English majors?

I think we do need to be thinking about it — not because it’s a thing we need to have today, but because it’s a battle we’re going to need to fight tomorrow. To get where we are now in terms of text collections, we had to fight for resources that were unheard of among humanists. We were successful in that effort, not because we came up with outstanding technical arguments, but because we succeeded in effecting a cultural change at our institutions. We were able to convince Vice Presidents for Research that we could attract students and grant dollars. We were able to convince University Presidents that digital humanities was something of wide interest to the public (not to mention donors). We were able to convince library Deans that research efforts in this area could pay dividends in terms of prestige. And finally, we were able to convince our own professional societies (including the MLA) that scholarship in this area was essential to the future of the academy (witness, for example, that most astonishing of documents, the “Guidelines for Evaluating Work with Digital Media” put out by the MLA this year).

Of course, one need not act like a Ninja in order to rock like one. The best way to get into the high-stakes game of high-performance computing is to create compelling reasons to participate. I continue to believe that bringing analytical procedures (particularly those that are as easy to use as search engines) to existing digital archives is a worthy, if ambitious, goal. Shadetree mechanics might have little hope of building their own highways, but clever digital humanists, by remaining committed to broad visions of the power of full-text archives, might well create the conditions in which high-performance becomes an ordinary part of our work as a discipline.
