Update: I think I have finally figured out the trick to smoothing the document-topic matrix; really, I was making things more difficult than I needed to. Because I optimized the topic distribution, the alpha value IS the list of per-topic values, which can be added to the matrix of document-topic counts without any additional trouble or manipulation. I’ve updated the code and the visualization below to reflect this change. As a result, there is more variation in topic size (which makes sense), and the numbers match the doc-topics output from MALLET (which I should have been using to verify my results all along).
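As a rough illustration of that smoothing step, here is a minimal sketch in plain Python. The function name `smooth_doc_topics` and the toy counts and alpha values are made up for the example; they are not taken from my model or from MALLET. The assumption is that each document's topic counts get the per-topic alpha added and are then normalized so the row sums to 1.

```python
def smooth_doc_topics(counts, alpha):
    """Add the per-topic alpha to every row of counts, then normalize each row.

    counts: list of rows, one per document, one count per topic.
    alpha:  list with one smoothing value per topic (hypothetical values here).
    """
    smoothed = []
    for row in counts:
        if len(row) != len(alpha):
            raise ValueError("each row needs one count per alpha value")
        shifted = [c + a for c, a in zip(row, alpha)]
        total = sum(shifted)
        smoothed.append([v / total for v in shifted])
    return smoothed


# Toy data: 2 documents, 3 topics (illustrative only).
doc_topic_counts = [[8, 2, 0], [0, 5, 5]]
alpha = [0.5, 0.3, 0.2]

doc_topic_dists = smooth_doc_topics(doc_topic_counts, alpha)
```

Note that a topic with a zero count in a document still ends up with a small nonzero proportion, which is the point of the smoothing.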
One useful library for viewing a topic model is LDAvis, an R package for creating interactive web visualizations of topic models, and its Python port, pyLDAvis. This library is focused on visualizing a topic model, using PCA to chart the relationships among topics and between topics and words in the topic model. It is also agnostic about which library you use to create the topic model, so long as you extract the necessary data in the correct formats.
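To make "the necessary data in the correct formats" a little more concrete: as far as I understand it, `pyLDAvis.prepare` takes five pieces of data, namely the topic-term distributions, the document-topic distributions, the document lengths, the vocabulary, and the overall term frequencies. The sketch below checks that toy versions of those five pieces line up. The helper `check_ldavis_inputs` and all of the values are hypothetical, written only for this illustration, and the actual pyLDAvis call is left as a comment so the snippet runs without the library installed.

```python
import math


def check_ldavis_inputs(topic_term_dists, doc_topic_dists, doc_lengths,
                        vocab, term_frequency):
    """Hypothetical helper: verify that the five inputs have matching shapes."""
    n_topics = len(topic_term_dists)
    n_terms = len(vocab)
    if any(len(row) != n_terms for row in topic_term_dists):
        raise ValueError("topic_term_dists needs one column per vocab term")
    if any(len(row) != n_topics for row in doc_topic_dists):
        raise ValueError("doc_topic_dists needs one column per topic")
    if len(doc_lengths) != len(doc_topic_dists):
        raise ValueError("doc_lengths needs one entry per document")
    if len(term_frequency) != n_terms:
        raise ValueError("term_frequency needs one entry per vocab term")
    # Every row of both distribution matrices should sum to 1.
    for row in list(topic_term_dists) + list(doc_topic_dists):
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError("distribution rows must sum to 1")
    return {
        "topic_term_dists": topic_term_dists,
        "doc_topic_dists": doc_topic_dists,
        "doc_lengths": doc_lengths,
        "vocab": vocab,
        "term_frequency": term_frequency,
    }


# Toy model: 2 topics, 3 terms, 2 documents (illustrative values only).
data = check_ldavis_inputs(
    topic_term_dists=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    doc_topic_dists=[[0.9, 0.1], [0.25, 0.75]],
    doc_lengths=[10, 12],
    vocab=["whale", "ship", "garden"],
    term_frequency=[9, 6, 7],
)

# With pyLDAvis installed, the same dictionary could then be handed over:
# import pyLDAvis
# vis = pyLDAvis.prepare(**data)
# pyLDAvis.save_html(vis, "ldavis.html")
```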
While the Python version of the library works very smoothly with Gensim, which I have discussed before, there is little documentation for how to move from a topic model created using MALLET to data that can be processed by the LDAvis library. For reasons that require their own blog post, I have shifted from using Gensim for my topic model to using MALLET (spoilers: better documentation of output formats, and more widespread use in the humanities, so better documentation and code examples generally). But I still wanted to use this library to visualize the full model as a way of generating an overall view of the relationships among the 250 topics it contains.