As an early and longtime fan of The Simpsons, I found this interesting analysis of the show in Data Science Weekly – Issue 150.
With the show now 27 years on TV, the analysis reveals some surprising details about its most significant characters and its side characters, and shows that the show's declining ratings correlate with the overall decline in TV ratings. There is a lot more to check out.
One interesting item was the application of term frequency–inverse document frequency (tf-idf) to attempt to generate episode summaries. Tf-idf is a popular technique for determining which words are most significant to a document that is itself part of a larger corpus. In this case, the documents are the individual episode scripts, and the corpus is the collection of all scripts.
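To make the idea concrete, here is a minimal sketch of tf-idf in plain Python. It is not the code from the analysis: the three "scripts" are invented one-line placeholders, the helper names `tf_idf` and `top_terms` are hypothetical, and the weighting (term count divided by document length, times log(N / document frequency)) is just one common variant among several.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumes whitespace tokenization and the weighting
    tf = count / document length, idf = log(N / df).
    Other tf-idf variants smooth or normalize differently.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=3):
    """Return the k highest-scoring terms of one document."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]


# Invented placeholder "scripts", not real episode text.
scripts = [
    "homer eats a donut at the plant",
    "bart rides a skateboard to the school",
    "lisa plays a saxophone at the school",
]
scores = tf_idf(scripts)
for script_scores in scores:
    print(top_terms(script_scores))
```

Words that appear in every script, such as "a" and "the" here, get a score of zero because log(N / N) = 0, while words unique to one script score highest. That is what makes the top-scoring terms a rough stand-in for an episode summary.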
All code used in this post is available on GitHub, so you can take this further.