Short Talk: A Blocks-Based Introduction to Text Analysis

View on Snap!Con

Presented By: Clifford B Anderson, Sarah Burriss, Mark L Schoenfield, Brian Broll, Corey Brady


In this talk, we will present our ongoing work introducing computational thinking to humanists as part of the Computational Thinking and Learning Initiative (CTLI) at Vanderbilt University. Our approach was tailored specifically to text analysis and to exploring how quantitative approaches can complement existing qualitative techniques in literary scholarship. We found that blocks-based programming supported a powerful paradigm of interaction and facilitated a deep understanding of the content. We also explored using a blocks-based environment to integrate a diverse set of related tools, including tools for data storage and exploration.

Liveness and tinkerability are common characteristics of blocks-based programming environments. These characteristics not only make programming more accessible to novices but also lend themselves well to introducing computational thinking to humanists. They can facilitate “low-threshold” probing of programs and programming concepts in a way that supports deeper understanding of the concepts and abstractions used in the program. Such probing can be particularly valuable when more complex concepts, such as black-box machine learning models, are introduced into the environment. In these cases, probing the program can lead to a better understanding of the models themselves and stimulate meaningful discussions about their strengths and weaknesses.

During our sessions, students were introduced to and explored text analysis techniques primarily by creating NetsBlox projects. These projects first introduced basic programming concepts and then incorporated text analysis concepts, enabling students to interact with machine learning models and a dataset of historical texts.

The first project was an intelligent typewriter that color-coded the student’s text according to its detected sentiment. Sentiment detection was performed using the ParallelDots API (available as a NetsBlox service). This simple project offered multiple avenues for extension, including the use of other intelligent features powered by the ParallelDots API, such as censoring abusive text. Furthermore, using the program itself enabled students to probe the ParallelDots model to better understand its sensitivities and limitations. One limitation a student discovered was the model’s poor performance on historical text, which prompted a discussion about the influence of training data on statistical machine learning models.
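The typewriter's core logic (score the text, then choose a color for the dominant sentiment) can be sketched in Python. This is a minimal sketch under stated assumptions, not the NetsBlox project itself: it assumes a sentiment service that returns one score per label (`positive`, `neutral`, `negative`), and it substitutes a hypothetical toy word-list scorer, `toy_sentiment`, for the ParallelDots API so that it runs offline. The color scheme and the names `color_for` and `COLORS` are likewise illustrative assumptions.

```python
# Hypothetical stand-in for a sentiment service such as ParallelDots.
# The tiny word lists are illustrative only, not a real sentiment model.
POSITIVE_WORDS = {"happy", "joy", "love", "delight", "good"}
NEGATIVE_WORDS = {"sad", "grief", "hate", "misery", "bad"}

# Assumed color scheme for the typewriter display.
COLORS = {"positive": "green", "neutral": "gray", "negative": "red"}


def toy_sentiment(text: str) -> dict[str, float]:
    """Return a score per sentiment label by counting listed words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos == neg:
        # No evidence either way (or a tie): treat the text as neutral.
        return {"positive": 0.0, "neutral": 1.0, "negative": 0.0}
    total = pos + neg
    return {"positive": pos / total, "neutral": 0.0, "negative": neg / total}


def color_for(text: str, scorer=toy_sentiment) -> str:
    """Pick the display color of the highest-scoring sentiment label."""
    scores = scorer(text)
    label = max(scores, key=scores.get)
    return COLORS[label]


if __name__ == "__main__":
    for line in ["What a joy to see you!", "Such grief and misery.",
                 "The carriage arrived at noon."]:
        print(color_for(line), "|", line)
```

The `scorer` parameter marks where a call to a real sentiment service would go. Feeding the same sentence to it in modern and in historical spelling is one simple way to probe the kind of limitation described above.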

The second project built on the first by connecting to historical texts of interest stored in a BaseX database from within NetsBlox. Students were also introduced to CODAP, an educational tool for data analysis, and used it from within the NetsBlox environment (via a custom library). Together, these capabilities enabled both access to more relevant texts and rich data exploration of quantified features of historical texts.
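To illustrate what “quantified features” of a text might look like, the following Python sketch turns a small collection of texts into one row of simple measures per text (word count, average word length, and type-token ratio), the kind of flat table a data-exploration tool can work with. The feature set, the function names `text_features` and `to_csv`, and the sample passages are hypothetical; retrieval from BaseX and the NetsBlox-to-CODAP library are not modeled here, and CSV is used only as a generic tabular format.

```python
import csv
import io
import re


def text_features(title: str, text: str) -> dict:
    """Compute a few simple, hypothetical features for one text."""
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[A-Za-z']+", text.lower())
    n = len(words)
    return {
        "title": title,
        "word_count": n,
        # Mean number of characters per word.
        "avg_word_length": round(sum(map(len, words)) / n, 2) if n else 0.0,
        # Distinct words / total words: a crude measure of lexical variety.
        "type_token_ratio": round(len(set(words)) / n, 2) if n else 0.0,
    }


def to_csv(rows: list[dict]) -> str:
    """Serialize feature rows as CSV text, one row per text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative sample passages, not drawn from the project's dataset.
    corpus = {
        "Sample A": "The rain fell and fell upon the quiet town.",
        "Sample B": "It was a bright cold day.",
    }
    rows = [text_features(t, body) for t, body in corpus.items()]
    print(to_csv(rows))
```

Keeping each text as a single row of numeric columns is a deliberate choice: it lets learners sort, plot, and compare texts by feature without further reshaping.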

We believe that blocks-based programming environments share many qualities that make them well suited for introducing computational thinking across a range of academic disciplines, a core objective of the CTLI. In this talk, we will present this work in more detail, as well as ongoing efforts to introduce more advanced text analysis topics, such as word embeddings and dimensionality reduction, to humanists.