Data Science Thursday: The Simpsons

As an early and longtime fan of The Simpsons, I found this interesting analysis of the show in Data Science Weekly – Issue 150.

The show has been on TV for 27 years, and the analysis reveals some surprising details about its most significant characters and its side characters. It also shows that the show's declining ratings correlate with the overall decline in TV ratings. There is a lot more to check out.

One interesting item was the application of term frequency–inverse document frequency (tf-idf) to attempt to generate episode summaries. Tf-idf is a popular technique for determining which words are most significant to a document that is itself part of a larger corpus. In this case, the documents are the individual episode scripts, and the corpus is the collection of all scripts.
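To make the idea concrete, here is a minimal sketch of tf-idf in plain Python. It uses the textbook weighting (term frequency times the log of inverse document frequency), which may differ from whatever variant the original analysis used. The `tf_idf` and `top_words` helpers and the three tiny "episodes" are made up for illustration and are not taken from the actual scripts.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document (textbook tf-idf)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of this document's words; idf = log(N / df)
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_words(doc_scores, k=3):
    """Return the k highest-scoring words, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical stand-ins for episode scripts (not real data).
episodes = [
    "homer eats a donut at the plant",
    "bart rides a skateboard to the school",
    "lisa plays the saxophone at the school",
]

scores = tf_idf(episodes)
for doc_scores in scores:
    print(top_words(doc_scores))
```

A word like "the" appears in every document, so its inverse document frequency is log(1) = 0 and it drops out, while words unique to one episode rise to the top. That is the property that makes tf-idf a plausible starting point for rough episode summaries.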

All code used in this post is available on GitHub so you can take this further.

