Overfitting in ML

When Charles Darwin was trying to decide whether to propose to his cousin Emma, he drew up a list of pros and cons. In favor of marriage, he noted children and companionship; against it, he listed the loss of time and freedom, and having less money to spend on books.

Others, including Benjamin Franklin, approved of this approach; Franklin called it Moral or Prudential Algebra.

Fast forward to the age of ML, of teaching computers to make decisions from experience, and the question of how many factors should be considered sits at the center of a problem statisticians call overfitting. It turns out there is wisdom in deliberately thinking less and being aware of overfitting.

“It’s one of the deepest truths of machine learning that, in fact, it’s not always better to use a more complex model, one that takes a greater number of factors into account. And the issue is not just that the extra factors might offer diminishing returns- performing better than a simpler model, but not enough to justify the added complexity. Rather, they might make our predictions dramatically worse.”

-Algorithms to Live By

Once we start observing, overfitting is everywhere.

How are we guilty of overthinking, or overfitting, piling on factors to build complex models that merely fit our data? (Think highly processed, complex foods versus simple ingredients to satisfy the palate.)

P.S.: Soon after this exercise, Darwin went on to overthink the timing of the wedding, creating another pros/cons list before finally deciding, “Never mind, trust to chance.” He eventually proposed to Emma, and they led a happy family life.
