Bridging the Gap

I am so excited that Celeste Sharpe and I have been awarded an ACH microgrant for “Bridging the Gap: Women, Code, and the Digital Humanities”. This grant will help us create a curriculum for workshops aimed at bridging the gender gap in the Digital Humanities between those who code and those who do not.

While our first “Rails Girls” event was quite successful, one of the most repeated pieces of feedback we received was that participants were unsure how to connect what they had learned about building a Rails application to their own scholarly work. This is not surprising - the Rails Girls curriculum is aimed at helping women develop the skills necessary to land technical jobs and so focuses on instrumental understandings of code. As our participants reported, this is not the most obviously applicable type of knowledge in the context of humanities research.

What is necessary instead, and what we have been given the grant to develop, is a curriculum that focuses both on technical skills and on computational thinking, the intellectual skill of breaking complex problems into discrete, algorithmically solvable parts.[1] It is computational thinking that is necessary to answer the “but what can I do with it?” question. And it is computational thinking, even more than technical know-how, that is necessary for developing interesting research questions that can be answered with computers.
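To make the idea of decomposition concrete, here is a minimal sketch, in Python, of how a broad humanities question - say, how attention to “health” changes over time in a corpus - might be broken into discrete, solvable steps. The corpus layout and file-naming scheme here are hypothetical, invented for illustration:

```python
# A hypothetical decomposition: each function is one discrete, solvable piece
# of the larger question "how does attention to 'health' change over time?"
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Step 1: reduce raw text to comparable units (lowercase word tokens)."""
    return [w.strip(".,;:!?\"'()") for w in text.lower().split()]

def count_term(tokens, term):
    """Step 2: turn 'attention to a theme' into something countable."""
    return Counter(tokens)[term]

def term_frequency_by_year(corpus_dir, term):
    """Step 3: aggregate counts along the dimension the question cares about."""
    freq = {}
    # Assumes files named like 1866_tract.txt; the naming scheme is invented.
    for path in Path(corpus_dir).glob("*.txt"):
        year = int(path.name[:4])
        freq[year] = freq.get(year, 0) + count_term(tokenize(path.read_text()), term)
    return freq

# term_frequency_by_year("corpus/", "health") -> {1866: 12, 1867: 30, ...}
```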

This emphasis on computational thinking, as opposed to instrumental knowledge, is also a response to a growing concern regarding the emphasis on tools within DH. This concern has already been articulated very well by Matthew Lincoln in his post “Tool Trouble” and by Ted Underwood in his talk “Beyond Tools”. To echo some of Matthew’s points, the focus on tool-use in DH runs the risk of hindering rather than developing computational thinking, hiding the computational work behind a (hopefully) pretty interface and deceptively clear and visually interesting results. The emphasis on “tool-use” reinforces a sense of digital humanities methods as ways to get answers to particular questions along the way to constructing traditional humanities arguments. As a result, digital work undertaken in this manner often fails to grapple with the complex, and often problematic, theoretically inflected models that digital tools produce.

(As a grand sweeping side note, I think it is this tool-centric approach to digital humanities that is most likely to be adopted by the various disciplines. When computational methods are merely tools, they pose little challenge to established modes of thinking, making it reasonable to say that we will all be digital humanists eventually.)

Tools are useful - they provide a means of addressing particular problems efficiently and consistently. But tools are packaged ways of solving particular problems in particular, computational ways, designed according to particular theoretical and philosophical assumptions. They must be engaged with as such, not as “black boxes” or answer generators that tell us something about our stuff.

Moving beyond tool-use requires computational thinking, and it is the intentional combining of computational and humanities modes of thinking that, I think, produces the most innovative work in the field. In learning to think computationally, one learns to conceptualize problems differently, to view the object of study in multiple ways, and to develop research questions driven both by humanities concerns and by computational modes of problem solving. One also becomes able to reflect back on those computational modes and evaluate the ways their epistemological assumptions shape the models being created.

It is increasingly clear to us that, rather than teaching coding as one tool among many for answering humanistic questions, it is necessary to teach computational thinking through learning to code - to learn to think through problems in computational ways and to ask new questions as a result. It is this pattern of thinking computationally that we are interested in fostering in our workshops.


  [1] This is my reformulation of Cuny, Snyder, and Wing’s definition of computational thinking as “… the thought processes involved in formulating problems and their solutions so that the solutions are represented in a form that can be effectively carried out by an information-processing agent.” http://www.cs.cmu.edu/~CompThink/

Summer of Research, Part I

The Lean Dissertation

This summer I am beginning to work in earnest on my dissertation. The years of course work and exams are done, completed, passed and past. The experience is an odd combination of freeing and overwhelming, as many who have hit this stage before me have commented.

In an effort to get started on this dissertation thing, I am working first on a “chapter” analyzing Kellogg and the ways he and those associated with him applied the Seventh-day Adventist vision of health and salvation at the Battle Creek Sanitarium. While “chapter” is the standard way to refer to sections of dissertations, and so is useful, it is a bit misleading. For many dissertations, each chapter is a smaller argument that contributes to the argument of the whole, an essential building block in the construction of the historical narrative.

But this dissertation is a little bit different. In my dissertation prospectus, I proposed a digital project, one that uses computational methods and that investigates alternative, digital, modes of presenting historical interpretations. While all academic work is iterative, in that as one researches and writes, the ideas become clearer and the argument is refined, this sort of project requires an even more intentionally iterative approach. Rather than waiting to release a finished – and most likely large and complex – system, I am adopting the pattern of Lean development – building, measuring, and learning – and plan to move through smaller iterations of the project, testing my hypotheses against the data and against reader feedback as I go.

This means organizing my work in terms of questions and hypotheses that get to the core of my project. Rather than developing a segment of the argument or processing a set of sources, each “chapter” is instead a microcosm of the dissertation as a whole: a chance to test my hypotheses and to prove myself wrong. By setting up multiple experiments or “chapters”, I can refine the thesis while also building and testing aspects of the argument. And for all this testing to be useful, it also means that I have to be willing to pivot, to change approaches and let go of ideas that are less successful.

If this sounds like generally good academic practice, wonderful! However, too often the perceived expectations of the university make it difficult to take the risk of releasing as one goes, of investing in strains of thought that might not pan out. The impulse instead is to keep one’s ideas and work close until they are “perfect,” out of fear of being “scooped” or being wrong. If, however, one begins with the assumptions that “ALL models are wrong” and that ideas are best developed when exposed to multiple inputs and lines of criticism, then it becomes valuable to test one’s assumptions and approaches frequently.

And so, as part of this process, I will be using the blog for a couple of aspects of the testing. First, I will be blogging about the technical work I am doing as a way of opening up and sharing my digital methodologies, both as a check on my work and as a resource for others. Second, I will be using the blog to share some of the interface components as I develop them, as a way to test their usefulness before integrating them into a larger argument. And, hopefully, by the beginning of the fall semester, there will be another tab in the top nav linking to a first iteration of my dissertation project. I invite you to follow along and to offer comments, criticism, and suggestions as the project develops!

All Models are Wrong

This piece is cross-posted on the ACH Blog

“Essentially, all models are wrong but some are useful.” This phrase, attributed to George Box, was the mantra for the Data Mining for Digital Humanists course at the Digital Humanities Summer Institute last week. Throughout the course, we worked to understand the mathematical concepts behind some of the most common algorithms used to solve classification and clustering problems. In data mining, the goal is to create a heuristic that allows one to predict with a high degree of accuracy how to classify new data based on the data one has seen in the past. From Amazon and Netflix recommendations to Pandora and even self-driving cars, the results of data mining work are present all across our digital lives.

And yet all models are wrong. In thinking about how to use data mining in my own work, it was helpful to learn that the goal of data mining is not to perfectly fit the model to the data at hand. Data mining is predictive: based on the results of the past, you create a model that predicts the label or classification to be assigned to future data. If you match your model too closely to the original data, if you overfit, the predictive power of the model decreases. Models are created to generalize. There are always outliers and items that are misclassified, but the payoff is the ability to abstract and to get a reliable prediction when faced with new data.
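The trade-off is easy to see in code. Below is a minimal sketch of overfitting using scikit-learn and synthetic data - my choice of library and example, not one the course prescribed. A decision tree allowed to grow without limit nearly memorizes the training data, yet a shallower, more general tree often predicts held-out data as well or better:

```python
# Overfitting in miniature: the same algorithm, constrained and unconstrained.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic labeled data standing in for "the data one has seen in the past".
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (2, 5, None):  # None lets the tree grow until it fits the training set
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth,
          round(tree.score(X_train, y_train), 2),  # accuracy on data already seen
          round(tree.score(X_test, y_test), 2))    # accuracy on new, unseen data
```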

All models are wrong but some are useful. This should not be surprising to scholars in the humanities. In history, we frequently work with models to help us elucidate processes and explain events. We use economic models, political models, and models of race, class, and gender to explain actions and causes. While useful for drawing attention to the ways particular factors shaped events in the past, these models are necessarily incomplete. We know that there is more than economics at play, more than politics, more than religion, more than individual genius.

One area of concern surrounding computational approaches to humanities scholarship is the question of truth claims. Will algorithmic approaches fool us into claiming that we have uncovered the “truth” of the text, the “true” cause of a historical event, the “truth” of the human experience? Are computational approaches at odds with humanistic thinking that stresses contingency, subjectivity, and irreducibility? These are open questions that we need to continue to explore.

But all models are wrong. Data mining that seeks the “truth” is data mining done poorly. I think computational approaches can be very useful in the context of the humanities when we are looking for useful patterns to shed light on particular questions. Different models of the same data can answer different questions or suggest different conclusions. This makes data mining not an answer-generating machine but a way of modeling data that is itself the result of theoretical and interpretive choices.
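As a small illustration of that last point (my own sketch, not an example from the course), clustering the very same data with different parameters yields different partitions, each foregrounding different structure - and neither is the “true” grouping:

```python
# The same points, modeled two different ways: k-means with k=2 and k=5.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))  # synthetic stand-in for document features

for k in (2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    print(k, np.bincount(labels))  # same data, two different partitions
```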

Remembering that all models are wrong is very useful for my own work. I am working to model and discuss the ways health reform ideology, social constructions that delineate power relationships, and religious and theological commitments were in conversation in the development of Seventh-day Adventism. There are many ways to model language similarity and use, some of which will be more useful than others for exploring the interrelationship between multiple cultural systems. My goal, however, is not to create the perfect model - that is impossible. Instead, my goal is to create useful models that help elucidate particular aspects or patterns, models that will necessarily fall short and make no claims to expressing the “truth” of the historical processes I study.
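To give a sense of what one such model might look like, here is a minimal sketch of a single approach to language similarity - TF-IDF vectors compared by cosine similarity - using invented placeholder texts rather than my actual sources:

```python
# One possible model of language similarity: TF-IDF weighting plus cosine
# similarity. The sample "documents" are invented placeholders, not quotations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "health reform and the care of the body",
    "the body as the temple of the spirit",
    "railroad schedules and freight rates",
]
vectors = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(vectors).round(2))  # pairwise similarity matrix
```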

What I am taking away from the data mining course is a better understanding of how those in computer science, statistics, and applied mathematics view the intellectual space where they work. Rather than black boxes that give “truth,” the algorithms developed and used for data mining problems constitute a variety of models that work with different ranges of accuracy on different problems. There is no one right model, but there are a variety of algorithms that one can use to isolate aspects of a given dataset. This allows one to create useful models of the data in order to describe, make predictions, and better understand the material at hand. And this is an understanding that I think has many possibilities for productive overlap with research in the humanities.