Using pyLDAvis with Mallet

Update: I think I have finally figured out the trick to smoothing the document-topic matrix. Really, I was making things more difficult than I needed to. Because I optimized the topic distribution, the alpha value IS the list of per-topic values, which can be added to the matrix of document-topic counts without any additional trouble or manipulation. I’ve updated the code and the visualization below to reflect this change. As a result, there is more variation in topic size (which makes sense) and the numbers match the doc-topics output from MALLET (which I should have been using to verify my results all along).
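To make the smoothing step concrete, here is a minimal sketch of adding a per-topic alpha list to a matrix of document-topic counts. It is not the code from the post: the function name `smooth_doc_topics`, the toy counts, and the alpha values are all hypothetical, and the final row normalization is my assumption about how the smoothed counts become the per-document proportions that a tool like pyLDAvis expects.

```python
def smooth_doc_topics(doc_topic_counts, alpha):
    """Add the per-topic alpha values to every row of counts,
    then normalize each row so it sums to 1.

    The normalization step is an assumption, not something the
    original post spells out.
    """
    smoothed = []
    for row in doc_topic_counts:
        if len(row) != len(alpha):
            raise ValueError("each row needs one count per alpha value")
        # Element-wise addition: one alpha per topic column.
        with_alpha = [count + a for count, a in zip(row, alpha)]
        total = sum(with_alpha)
        smoothed.append([value / total for value in with_alpha])
    return smoothed


# Hypothetical toy data: 2 documents, 3 topics.
counts = [[8, 2, 0], [1, 1, 8]]
# Hypothetical optimized alpha: one value per topic.
alpha = [0.5, 0.3, 0.2]

theta = smooth_doc_topics(counts, alpha)
```

Because each topic has its own alpha, a topic with zero assigned tokens in a document still receives a small nonzero proportion, and topics with larger alpha values receive more of that smoothing mass.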


Ways to Compute Topics over Time, Part 4

This is part of a series of technical essays documenting the computational analysis that undergirds my dissertation, A Gospel of Health and Salvation. For an overview of the dissertation project, you can read the current project description at jeriwieringa.com. You can access the Jupyter notebooks on Github.


This is the last in a series of posts which constitute a “lit review” of sorts, documenting the range of methods scholars are using to compute the distribution of topics over time. The strategies I am considering are:

To explore a range of strategies for computing and visualizing topics over time from a standard LDA model, I am using a model I created from my dissertation materials. You can download the files needed to follow along from https://www.dropbox.com/s/9uf6kzkm1t12v6x/2017-06-21.zip?dl=0.


Ways to Compute Topics over Time, Part 3

This is part of a series of technical essays documenting the computational analysis that undergirds my dissertation, A Gospel of Health and Salvation. For an overview of the dissertation project, you can read the current project description at jeriwieringa.com. You can access the Jupyter notebooks on Github.
