Selecting a Digital Workflow

Disclaimer: workflows are almost infinitely customizable to fit project goals, intellectual patterns, and individual quirks. I am writing mine up because it has been useful for me to see how others are solving such problems. I invite you to take what is useful, discard what is not, and share what you have found.

Writing a dissertation is, among many things, an exercise in data management. Primary materials and their accompanying notes must be organized. Secondary materials and their notes must be organized. One's own writing must be organized. And, when doing computational analysis, the code, processes, and results must be organized. This, I can attest, can be a daunting mountain of things to organize (and I like to organize).

There are many available solutions for wrangling this data, some of which cost money, some of which are free, and all of which cater to different workflows. For myself, I have found that my reliance on computational methods and my goal of HTML as the final output make many of the solutions my colleagues use with great success (such as Scrivener and Evernote) just not work well for me. In addition, my own restlessness with interfaces has made interoperability a high priority - I need to be able to use many tools on the same document without having to worry about exporting to different formats in between. This has led to my first constraint: my work needs to be done in a plain text format that, eventually, will easily convert to HTML.

The second problem is version control. On a number of projects in the past, written in Word or Pages, I created many different named versions along the way. When I received feedback, new versions were created and the changes had to be reincorporated back into the authoritative version. This is all, well, very inefficient. And, while many of these sins can be overcome when writing text for humans, the universe is less forgiving when working with code. And so, constraint number two: my work needs to be under version control and duplication should be limited.

The third problem is one of reproducibility and computational methods. The experiments that I run on my data need to be clearly documented and reproducible. While this is generally good practice, it is especially important for computational research. Reproducible and clearly documented work is important for the consistency of my own work over time and for enabling my scholarship to be interrogated and used by others. And so, the third constraint: my work needs to be done in such a way that my methods are transparent, consistent, and reproducible.

My solution to the first constraint is Markdown. All of my writing is being done in Markdown — including this blogpost! Markdown enables me to designate information hierarchies and easily convert to HTML, but also to work on files in multiple interfaces. Currently, I am writing in Ulysses, but I have also used Notebooks, iA Writer, and Sublime Text. What I love about Ulysses is that I can have one writing interface, but draw from multiple folders, most of which are tied to git repositories and/or Dropbox. Ulysses has a handy file browser window, so it is easy to move between files and see them all as part of the larger project. But regardless of the application being used, it is the same file that I am working with across multiple platforms, applications, and devices.
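
To make the conversion step concrete, here is a minimal sketch of turning a Markdown file into HTML. It assumes the Python `markdown` package and hypothetical file names; any converter, such as pandoc, would serve the same role.

```python
# Minimal Markdown-to-HTML conversion sketch.
# Assumes the Python `markdown` package (pip install markdown);
# the file names are hypothetical.
import markdown

with open("notes.md", encoding="utf-8") as f:
    text = f.read()

# Headings, lists, and emphasis in the plain-text source
# become the corresponding HTML elements.
html = markdown.markdown(text)

with open("notes.html", "w", encoding="utf-8") as f:
    f.write(html)
```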

My solution to the second constraint is GitHub and the GitHub Student Developer Pack. Git and GitHub enable version control, multiple branches, and all sorts of powerful ways to keep my computational work organized. In addition, GitHub’s student pack comes with 5 private repos (private while you are a student), along with a number of other cool perks. While I am a fan of working in the open and plan to open up my work over time, I have also come to value being able to make mistakes in the quiet of my own digital world. Dissertations are hard, and intimidating, and there is something nice about not having to get it right from the very beginning while still being able to share with advisors and select colleagues.

But, GitHub hasn’t proved the best for the writing side of things. The default wiki is limited in its functionality, especially in terms of enabling commenting, and repos are not really made for reading. My current version control solution for the writing side of the project is Penflip, billed as “Github for Writing”. Writing with Penflip can be done through the web interface, or by cloning down the files (which are in markdown) and working locally. As such, the platform conforms to the Markdown and the “one version” requirements. The platform is free if your writing is public, and $10/month for private repos, which is what I am trying out. I am using Penflip a bit like a wiki, with general pages for notes on different primary documents and overarching pages that describe lines of inquiry and link the associated note and code files together.[1] This is the central core of my workflow - the place that ties together notes, code, and analysis and lays the groundwork for the final product. So far it is working well for the writing, though using it for distribution and commenting has been a little bumpy, as the interface needs a little more refining.

And the code. The code was actually the first of these problems I solved. Of the programming languages available to learn, I have found Python to be the most comfortable to work in. It has great libraries for text analysis, and comes with the benefit of Jupyter Notebooks (previously IPython Notebooks). Working in Jupyter Notebooks allows me to integrate descriptive text in markdown with code and the resulting visualizations. The resulting document is plain text, can be versioned, and displays in HTML for sharing. The platform conforms to my first two requirements — markdown and version controlled — while also making my methods transparent and reproducible. I run the Jupyter server locally, and am sharing pieces that have been successfully executed via nbviewer - for example, I recently released the code I used to create my pilot sample of one of my primary corpora. I am also abstracting code that is run more than once into a local library of functions. Having these functions enables me to reliably follow the same processes over time, thus creating results that are reproducible and comparable.
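
To illustrate the "local library of functions" idea, here is a hedged sketch; the module name, function names, and tokenization rules are my own illustration, not the project's actual code.

```python
# corpus_utils.py -- an illustrative sketch of a local library of
# reusable functions, so every notebook runs the same preprocessing.
# The module and function names here are hypothetical.
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each token to its frequency."""
    return Counter(tokenize(text))
```

A notebook can then `from corpus_utils import word_frequencies`, and every experiment counts words the same way, which is what makes results comparable over time.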

And that is my current tooling for organizing and writing both my digital experiments and my surrounding explanations and narrative.

  [1] I somehow missed the “research wiki” bus a while back, but thanks to a recent comment made by Abby Mullen, I am a happy convert to the whole concept.

Bridging the Gap

I am so excited that Celeste Sharpe and I have been awarded an ACH microgrant for “Bridging the Gap: Women, Code, and the Digital Humanities”. This grant will help us create a curriculum for workshops aimed at bridging the gender gap between those who code and those who do not in the Digital Humanities.

While our first “Rails Girls” event was quite successful, one of the most repeated pieces of feedback we received was that participants were unsure how to connect what they had learned about building a Rails application to their own scholarly work. This is not surprising - the Rails Girls curriculum is aimed at helping women develop the skills necessary to land technical jobs and so focuses on instrumental understandings of code. As our participants reported, this is not the most obviously applicable type of knowledge in the context of humanities research.

What is necessary instead, and what we have been given the grant to develop, is a curriculum that focuses both on technical skills and on computational thinking, the intellectual skill of breaking complex problems into discrete, algorithmically solvable parts.[1] It is computational thinking that is necessary to answer the “but what can I do with it?” question. And it is computational thinking, even more than technical know-how, that is necessary for developing interesting research questions that can be solved with computers.
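
As a purely illustrative example of that decomposition (my own, not from the curriculum): a question like “how does a newspaper’s language about health change over time?” can be broken into discrete steps: gather the texts by year, tokenize them, count the terms of interest, and normalize by document length. Each step is small enough for a computer to solve.

```python
# An illustrative decomposition of a humanities question into
# algorithmically solvable steps; names and rules are hypothetical.
def term_rate(text, terms):
    """Steps 2-4 for one document: tokenize, count, normalize."""
    tokens = text.lower().split()                # step 2: tokenize
    hits = sum(tokens.count(t) for t in terms)   # step 3: count
    return hits / max(len(tokens), 1)            # step 4: normalize

def term_rate_by_year(texts_by_year, terms):
    """Step 1 plus the rest: apply the same measure to each year."""
    return {year: term_rate(text, terms)
            for year, text in texts_by_year.items()}
```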

This emphasis on computational thinking, as opposed to instrumental knowledge, is also a response to a growing concern regarding the emphasis on tools within DH. This concern has already been articulated very well by Matthew Lincoln in his post “Tool Trouble” and by Ted Underwood in his talk “Beyond Tools”. To echo some of Matthew’s points, the focus on tool-use in DH runs the risk of hindering rather than developing computational thinking, hiding the computational work behind a (hopefully) pretty interface and deceptively clear and visually interesting results. The emphasis on “tool-use” reinforces a sense of digital humanities methods as ways to get answers to particular questions along the way to constructing traditional humanities arguments. As a result, digital work undertaken in this manner often fails to grapple with the complex, and often problematic, theoretically-inflected models that digital tools produce.

(As a grand sweeping side note, I think it is this tool-centric approach to digital humanities that is most likely to be adopted by the various disciplines. When computational methods are merely tools, they pose little challenge to established modes of thinking, making it reasonable to say that we will all be digital humanists eventually.)

Tools are useful - they provide a means of addressing particular problems efficiently and consistently. But tools are packaged ways to solve particular problems in particular, computational ways, and are designed according to particular theoretical and philosophical assumptions. And they must be interacted with as such, not as “black boxes” or answer generators that tell us something about our stuff.

Moving beyond tool-use requires computational thinking, and it is the intentional combining of computational and humanities modes of thinking that, I think, produces the most innovative work in the field. In learning to think computationally, one learns to conceptualize problems differently, to view the object of study in multiple ways, and to develop research questions driven both by humanities concerns and computational modes of problem solving. One also becomes able to reflect back on those computational modes and evaluate the ways the epistemological assumptions at work shape the models being created.

Rather than teaching coding as one tool among many for answering humanistic questions, it is increasingly clear to us that it is necessary to teach computational thinking through learning to code - to learn to think through problems in computational ways and to ask new questions as a result. It is this pattern of thinking computationally that we are interested in fostering in our workshops.

  [1] This is my reformulation of Cuny, Snyder, and Wing’s definition of computational thinking as “… the thought processes involved in formulating problems and their solutions so that the solutions are represented in a form that can be effectively carried out by an information-processing agent.”

Summer of Research, Part I

The Lean Dissertation

This summer I am beginning to work in earnest on my dissertation. The years of course work and exams are done, completed, passed and past. The experience is an odd combination of freeing and overwhelming, as many who have hit this stage before me have commented.

In an effort to get started on this dissertation thing, I am working first on a “chapter” analyzing Kellogg and the ways he and those associated with him applied the Seventh-day Adventist vision of health and salvation at the Battle Creek Sanitarium. While “chapter” is the standard way to refer to sections of dissertations, and so is useful, it is a bit misleading. For many dissertations, each chapter is a smaller argument that contributes to the argument of the whole, an essential building block in the construction of the historical narrative.

But this dissertation is a little bit different. In my dissertation prospectus, I proposed a digital project, one that uses computational methods and that investigates alternative, digital, modes of presenting historical interpretations. While all academic work is iterative, in that as one researches and writes, the ideas become clearer and the argument is refined, this sort of project requires an even more intentionally iterative approach. Rather than waiting to release a finished – and most likely large and complex – system, I am adopting the pattern of Lean development – building, measuring, and learning – and plan to move through smaller iterations of the project, testing my hypotheses against the data and against reader feedback as I go.

This means organizing my work in terms of questions and hypotheses that get to the core of my project. Rather than developing a segment of the argument or processing a set of sources, each “chapter” is instead a microcosm of the dissertation as a whole: a chance to test my hypotheses and to prove myself wrong. By setting up multiple experiments or “chapters”, I can refine the thesis while also building and testing aspects of the argument. And for all this testing to be useful, it also means that I have to be willing to pivot, to change approaches and let go of ideas that are less successful.

If this sounds like generally good academic practice, wonderful! However, too often the perceived expectations of the university make it difficult to take the risk of releasing as one goes, of investing in strains of thought that might not pan out. The impulse instead is to keep one’s ideas and work close until they are “perfect,” out of fear of being “scooped” or being wrong. If, however, one begins with the assumptions that “ALL models are wrong” and that ideas are best developed when exposed to multiple inputs and lines of criticism, then it becomes valuable to test one’s assumptions and approaches frequently.

And so, as part of this process, I will be using the blog for a couple of aspects of the testing. First, I will be blogging about the technical work I am doing as a way of opening up and sharing my digital methodologies, both as a check on my work and as a resource for others. Second, I will be using the blog to share some of the interface components as I develop them, as a way to test their usefulness before integrating them into a larger argument. And, hopefully, by the beginning of the fall semester, there will be another tab in the top nav linking to a first iteration of my dissertation project. I invite you to follow along and to offer comments, criticism, and suggestions as the project develops!