Gender and verbs across 100,000 stories: a tidy analysis – Variance Explained
I was fascinated by my colleague Julia Silge’s recent blog post on what verbs tend to occur after “he” or “she” in several novels, and what they might imply about gender roles within fictional work. This made me wonder what trends could be found across a larger dataset of stories.
Mark Riedl’s Wikipedia plots dataset that I examined in yesterday’s post offers a great opportunity to analyze this question. The dataset contains over 100,000 descriptions of plots from films, novels, TV shows, and video games. The stories span centuries and come from tens of thousands of authors, but the descriptions are written by a modern audience, which means we can quantify gender roles across a wide variety of genres. Since the dataset contains plot descriptions rather than primary sources, it’s also more about what happens at than how an author describes the work: we’re less likely to see “thinks” or “says”, but more likely to see “shoots” or “escapes”.