Which terms are frequent in the inaugurals of US Presidents from Texas, and which are significant?
The research presented here was discussed in the Texan Translation segment on KUT Public Radio (aired on 22 January 2021).
For the first analysis, we’ll remove stopwords, that is, very common words such as “the” and “of” that carry little content of their own. Then we’ll count the occurrences of each distinct word and see what’s most frequent!
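To make the counting step concrete, here is a minimal sketch in Python. It is not the code behind this post: the stopword list is a deliberately tiny, hypothetical one, the sample sentence is invented, and the helper name `top_words` is made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "we", "our", "is", "that", "for"}

def top_words(text, n=3):
    """Return the n most frequent non-stopword tokens in text as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count only the tokens that are not in the stopword list.
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Invented sample sentence, not a quotation from any inaugural address.
sample = "We the people seek freedom, and the freedom of the people is our cause."
print(top_words(sample, 2))
```

With the stopwords gone, only the content words compete for the top spots, which is the point of the filtering step.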
A more advanced measure of the importance of words in documents is tf-idf (term frequency-inverse document frequency). It is calculated using a formula that takes into account both each word’s raw frequency in a document and the number of documents in the corpus in which the word is used. This method is a way of finding the most distinctive words in each text.
For example, in the context at hand, you might expect that each of the four speeches contains the words freedom or people. And while it is certainly interesting which president used them the most, these words aren’t really distinctive, simply because each president uses them on this occasion. But it’s instructive to look at the words that are both frequent and unique to each speech.
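The following sketch shows one common way to compute tf-idf, assuming term frequency is a word's count divided by the length of its document and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. Other variants exist, and this is not necessarily the exact formula used for the figures in this post. The function name `tf_idf` and the two toy "speeches" are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs maps a document name to its list of tokens.

    Returns a dict mapping each document name to {term: tf-idf score},
    using tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores

# Invented toy corpus (assumption): two tiny "speeches", already tokenized.
toy = {
    "speech_a": "freedom people frontier frontier".split(),
    "speech_b": "freedom people society".split(),
}
scores = tf_idf(toy)
for name, terms in scores.items():
    print(name, sorted(terms.items(), key=lambda kv: kv[1], reverse=True))
```

In this toy corpus, freedom and people occur in both documents, so their idf is ln(2/2) = 0 and their tf-idf score is zero no matter how often they are used. The words that occur in only one document receive positive scores, which is the sense in which tf-idf picks out words that are both frequent and distinctive.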
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hinrichs (2021, Jan. 20). Texan Inaugural Addresses: 3 Keywords. Retrieved from https://texan-inaugurals.netlify.app/posts/3-keywords/
BibTeX citation
@misc{keywords-tf-idf,
  author = {Hinrichs, Lars},
  title = {Texan Inaugural Addresses: 3 Keywords},
  url = {https://texan-inaugurals.netlify.app/posts/3-keywords/},
  year = {2021}
}