Machine Learning’s First Cheating Scandal

At first glance, the article in Data Science Weekly Issue 108 on Machine Learning’s First Cheating Scandal is great click bait.  It raises the question: how do you cheat at learning?

Then you see that it was for an algorithm competition.  The core idea in these competitions is that different algorithms are tested on the same data set.  From this, we can gain an objective sense of their relative performance.

The difficulty they were trying to overcome is that deep neural networks take a long time to train.  ML algorithms are generally compared on their performance on tasks that have specified training and testing data.

Within the confines of a competition, overfitting is a problem. Competitors frequently tune their engines, unknowingly or knowingly (let’s assume they’re working in good faith), to the few thousand instances in the test data set. The result is overfitting.  It’s very hard to determine whether the authors have just fitted their algorithm well to that particular test set or whether the speedup is inherent to the base pattern.
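
To get a feel for how repeated tuning against a small test set inflates scores, here is a rough simulation. It is only a sketch under made-up assumptions: the function name, the 2,000-instance test set, the 70% true accuracy and the 200 submissions are all hypothetical numbers chosen for illustration, not figures from the article. Every "submission" has exactly the same true accuracy, so any difference in the best score comes purely from test-set noise.

```python
import random


def best_of_n_test_score(n_submissions, test_size, true_accuracy, rng):
    """Return the best test accuracy seen across repeated submissions.

    Hypothetical setup: every submission has the same true accuracy, and
    only the noise from the finite test set differs between submissions.
    """
    best = 0.0
    for _ in range(n_submissions):
        # Each test instance is classified correctly with probability
        # true_accuracy, independently of the others.
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        best = max(best, correct / test_size)
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
single = best_of_n_test_score(1, 2000, 0.70, rng)
many = best_of_n_test_score(200, 2000, 0.70, rng)

print(f"one submission:  {single:.3f}")
print(f"best of 200:     {many:.3f}")
```

The best of many submissions should come out noticeably above the true 70%, even though no submission is actually any better than another. That gap is what makes it so hard to tell a real improvement from a lucky fit to the test set.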



