5 Parts of Speech

The frequency of various parts of speech.

Lars Hinrichs http://larshinrichs.site (The University of Texas at Austin)
01-20-2021
If nouns are bricks, are verbs mortar? (Image from M. Haupt/unsplash.com.)

Data tagging

The four speeches will be part-of-speech tagged (POS-tagged) for this analysis, so that each word will be marked for its grammatical class.

Table 1: Top of data frame, POS-tagged.
doc_id        sid  tid  token       token_with_ws  lemma       upos  xpos  tid_source  relation  year  president
1965-Johnson  1    1    My          My             -PRON-      DET   PRP$  3           poss      1965  Johnson
1965-Johnson  1    2    fellow      fellow         fellow      ADJ   JJ    3           amod      1965  Johnson
1965-Johnson  1    3    countrymen  countrymen     countryman  NOUN  NNS   19          nsubj     1965  Johnson
1965-Johnson  1    5    on          on             on          ADP   IN    19          prep      1965  Johnson
1965-Johnson  1    6    this        this           this        DET   DT    7           det       1965  Johnson
1965-Johnson  1    7    occasion    occasion       occasion    NOUN  NN    5           pobj      1965  Johnson
1965-Johnson  1    9    the         the            the         DET   DT    10          det       1965  Johnson
1965-Johnson  1    10   oath        oath           oath        NOUN  NN    19          nsubj     1965  Johnson
1965-Johnson  1    11   I           I              -PRON-      PRON  PRP   13          nsubj     1965  Johnson
1965-Johnson  1    12   have        have           have        AUX   VBP   13          aux       1965  Johnson

Comparing POS frequencies

We will work with the upos column, which gives a sufficiently fine-grained classification of POS into 14 types:

 [1] "DET"   "ADJ"   "NOUN"  "ADP"   "PRON"  "AUX"   "VERB"  "CCONJ"
 [9] "PROPN" "PART"  "ADV"   "NUM"   "SCONJ" "INTJ" 

The unit of analysis will be frequency per 1,000 words, to make the numbers comparable.
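As a minimal sketch of this normalization, the base R snippet below turns raw tag counts into rates per 1,000 words for each speech. The data frame `tagged` and its values are invented for illustration; only the column names `doc_id` and `upos` come from the table above. It also assumes that the denominator is the number of rows per speech in the tagged data frame, which may differ from how the post counts words.

```r
# Toy stand-in for the POS-tagged data frame; the values are invented.
tagged <- data.frame(
  doc_id = c(rep("1965-Johnson", 5), rep("2005-Bush", 4)),
  upos   = c("DET", "ADJ", "NOUN", "ADP", "NOUN",
             "NOUN", "VERB", "NOUN", "PRON"),
  stringsAsFactors = FALSE
)

# Cross-tabulate speeches (rows) by POS tag (columns).
counts <- table(tagged$doc_id, tagged$upos)

# Divide each row by that speech's token total, then scale to a rate per 1,000.
per_1000 <- prop.table(counts, margin = 1) * 1000

round(per_1000, 1)
```

Because each row is divided by its own total, the rates for a speech sum to 1,000 regardless of its length, which is what makes speeches of different lengths comparable.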

Global frequencies

The following plot shows these values ordered by POS-tag.

Figure 1: Comparison of POS frequencies.

Nouns vs. Verbs

A big point of interest is the degree of overall nominal vs. verbal nature of a text. There is a group of parts of speech that pattern together with nouns. They are:

The other group of tags that co-occur and that characterize a more verbal style are

Let us treat the two groups in aggregate and compare the speeches for their frequencies.
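The aggregation step could look roughly like the sketch below. The membership of `nominal_tags` and `verbal_tags` is a placeholder chosen for illustration and should not be read as the grouping used in this post; the toy data frame and the speech labels are likewise invented.

```r
# Hypothetical group membership, for illustration only (not the post's own lists).
nominal_tags <- c("NOUN", "ADJ", "ADP", "DET")
verbal_tags  <- c("VERB", "AUX", "ADV", "PRON")

# Toy tagged tokens for two invented speeches.
tagged <- data.frame(
  doc_id = c(rep("speech-A", 6), rep("speech-B", 6)),
  upos   = c("NOUN", "ADJ", "DET", "NOUN", "VERB", "CCONJ",
             "VERB", "AUX", "PRON", "ADV", "NOUN", "CCONJ"),
  stringsAsFactors = FALSE
)

# Label each token with its group; tags in neither list fall into "other".
tagged$pos_group <- ifelse(tagged$upos %in% nominal_tags, "nominal",
                    ifelse(tagged$upos %in% verbal_tags, "verbal", "other"))

# Count groups per speech and convert to rates per 1,000 tokens.
group_counts   <- table(tagged$doc_id, tagged$pos_group)
group_per_1000 <- prop.table(group_counts, margin = 1) * 1000

round(group_per_1000, 1)
```

Keeping an explicit "other" group means the denominator stays the full token count, so the nominal and verbal rates remain comparable with the per-tag rates.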

Figure 2: Aggregate counts for nominal and verbal POS-groups in the speeches.

The speech that truly stands out (again) is 2005-Bush. It shows a clear shift toward a more nominal style, offset by a drop in verbal tags, relative to the earlier Texan inaugurals.

Interpretation: What is a “more nominal” style?

A more nominal style corresponds to more “conceptual” thinking, whereas a more verbal style corresponds to more “dynamic” thinking. This conceptual-dynamic index, which is based on POS frequencies, has been shown to correlate with academic success: college students whose admissions essays show a more nominal/conceptual style go on to have significantly greater academic success over four years in college (Pennebaker et al. 2014). Written academic texts are the most nominal on a continuum from verbal to nominal style; informal spoken conversations are the most verbal (Biber 1991).

So that is what we are looking at: Bush 43’s second inaugural reveals a significantly more conceptual, bookish style of thinking and speaking than what we see in the other Texan speeches. The post-9/11 Bush is the most academic figure among Texan presidents.

What might be helpful is a qualitative follow-up analysis that looks at such questions as:

Also, some background research on who the speechwriters were would be of interest. Was there a significant change from Bush-2001 to Bush-2005?

Pronouns

It will be of interest to see which pronouns prevail in the four speeches. We will look at the different person and number categories (1st person singular, 2nd person, and so on). Here is our definition of classes:

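# 1st person singular
s1 <- c("i", "my", "me", "myself")

# 2nd person (singular and plural share the same forms)
sp2 <- c("you", "your")

# 3rd person singular, including a few indefinite pronouns
s3 <- c("he", "she", "it", "his", "her", "its", "him",
        "himself", "herself", "itself", "something",
        "everyone", "anything")

# 1st person plural
p1 <- c("we", "our", "us", "ourselves")

# 3rd person plural
p3 <- c("they", "their", "them", "themselves")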
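One way these class vectors might be applied to tokens is through a named lookup vector, sketched below. The helper name `classify_pronoun` is hypothetical, and lowercasing tokens before the lookup is an assumption about how matching should work; the post may classify pronouns differently.

```r
# Pronoun classes as defined in the post.
s1  <- c("i", "my", "me", "myself")
sp2 <- c("you", "your")
s3  <- c("he", "she", "it", "his", "her", "its", "him",
         "himself", "herself", "itself", "something",
         "everyone", "anything")
p1  <- c("we", "our", "us", "ourselves")
p3  <- c("they", "their", "them", "themselves")

# Build a lookup vector: names are pronoun forms, values are class labels.
classes <- list(s1 = s1, sp2 = sp2, s3 = s3, p1 = p1, p3 = p3)
lookup  <- setNames(rep(names(classes), lengths(classes)),
                    unlist(classes, use.names = FALSE))

# Hypothetical helper: returns the class label for each token, or NA
# when the token is not one of the listed pronoun forms.
classify_pronoun <- function(token) {
  unname(lookup[tolower(token)])
}

classify_pronoun(c("My", "We", "their", "oath"))
```

The resulting labels can then be tabulated per speech and scaled to rates per 1,000 words in the same way as the POS tags.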

A few clear trends are obvious:

Interpretation: What pronoun frequencies mean

As Pennebaker and colleagues have shown repeatedly (Kacewicz et al. 2013; Pennebaker 2011), higher frequencies of first-person singular pronouns correspond to a weak sense of self, depression, and lower positions in social hierarchies. By contrast, speakers and writers with a strong sense of self use first-person singular pronouns at low frequencies; they instead address others and speak about others.

By these metrics, the second inaugural by George W. Bush is a reflection of a strong sense of self in the speaker. The stylistic elements in this speech project strength.

What might be helpful is a qualitative follow-up analysis that looks at such questions as

Conclusion

The POS frequencies provide multiple hints that Bush-2005 stands out among Texan inaugurals. His position at the beginning of his second term was, of course, unique: he and the country were recovering from a major national trauma. While Johnson’s time before his first inaugural was certainly not trauma-free (and, arguably, neither was Bush 41’s), the traumas that Bush 43 dealt with and the leadership demanded of him at this point were different in nature.

References

Arnold, Taylor. 2017. “A Tidy Data Model for Natural Language Processing Using cleanNLP.” The R Journal 9 (2): 120. https://journal.r-project.org/archive/2017/RJ-2017-035/index.html.
Biber, Douglas. 1991. Variation Across Speech and Writing. Cambridge University Press.
Kacewicz, Ewa, James W. Pennebaker, Matthew Davis, Moongee Jeon, and Arthur C. Graesser. 2013. “Pronoun Use Reflects Standings in Social Hierarchies.” Journal of Language and Social Psychology 33 (2): 125–43. https://doi.org/10.1177/0261927X13502654.
Pennebaker, James W. 2011. The Secret Life of Pronouns: What Our Words Say about Us. 1st edition. Bloomsbury Press.
Pennebaker, James W., Cindy K. Chung, Joey Frazee, Gary M. Lavergne, and David I. Beaver. 2014. “When Small Words Foretell Academic Success: The Case of College Admissions Essays.” PloS One 9 (12): e115844. https://doi.org/10.1371/journal.pone.0115844.
Ushey, Kevin, J. J. Allaire, and Yuan Tang. 2020. Reticulate: Interface to ’python’. https://CRAN.R-project.org/package=reticulate.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hinrichs (2021, Jan. 20). Texan Inaugural Addresses: 5 Parts of Speech. Retrieved from https://texan-inaugurals.netlify.app/posts/5-pos/

BibTeX citation

@misc{hinrichs20215,
  author = {Hinrichs, Lars},
  title = {Texan Inaugural Addresses: 5 Parts of Speech},
  url = {https://texan-inaugurals.netlify.app/posts/5-pos/},
  year = {2021}
}