The Signal and the Noise (Review) – Becoming A Data Scientist

This is a review of The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver.

So if you were following me on Twitter, you may have noticed that it took me months to read this book, which is very unlike me. I normally devour books that interest me in a weekend, or a few weeks at most. There are a few reasons for that:

I was taking grad classes at the time and had little time for reading.
At times, the book gets quite tedious (I’ll expand on this later), but mostly,
My Kindle tricked me into thinking the book was extraordinarily long.

If you have the Kindle version of the book, you may notice that when you click on the asterisks (and I recommend doing so, since there are some funny notes and valuable insights hidden there), it takes you to a page that has only the short “footnote” and nothing else. It turns out that the Kindle version places all of these on separate pages at the end of the book. At a point where I figured I’d be nearly halfway through, I looked and saw I was still not even at 30%. I thought the book must be ridiculously long, so I put it off and didn’t read very enthusiastically when I did pick it up. Then, when I finally took the time to finish it in the past few weeks since the semester ended, I finished much earlier than expected, because the main text ends at the 66% mark! The last full third of the Kindle edition is all endnotes, footnotes, and references. So, it’s not as long as it first appears!

Even taking that into account, some sections still felt tedious to me. There was a long section about baseball statistics and Nate Silver’s early analysis work that was quite self-indulgent, and I recommend skimming it unless you’re actually interested in sports statistics. I didn’t feel it added much to the overall message of the book.

Despite it taking me a long time to read, I did like the overall message, which seems to be “most people are terrible at making predictions, so don’t get overconfident in a single model or forecaster, and learn how to tell if your model is good.” I think it’s important to remember that models we build based on data can be quite good, but they are still affected by human error or bias, and unless you have perfect data (which one could argue doesn’t actually exist because you can’t ever capture the entire context), your model won’t be perfect, and often will be much worse.

Another recurring point in the book is that we should constantly update our ideas about the world based on new information and probabilities (a Bayesian approach).
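To make the idea of "updating" concrete, here is a minimal sketch of Bayes' rule in Python. The function names and all of the numbers are my own hypothetical illustration, not figures or code from the book: it starts from a low prior belief and revises it each time a new piece of supporting evidence arrives.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior P(hypothesis | evidence) using Bayes' rule."""
    numerator = prior * p_evidence_if_true
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


def update_repeatedly(prior, p_evidence_if_true, p_evidence_if_false, n_observations):
    """Apply the same update once per observation; yesterday's posterior is today's prior."""
    belief = prior
    history = [belief]
    for _ in range(n_observations):
        belief = bayes_update(belief, p_evidence_if_true, p_evidence_if_false)
        history.append(belief)
    return history


if __name__ == "__main__":
    # Hypothetical inputs: a 5% prior, and evidence that is four times more
    # likely to show up if the hypothesis is true (0.8) than if it is false (0.2).
    for step, belief in enumerate(update_repeatedly(0.05, 0.8, 0.2, 3)):
        print(f"after {step} observations: {belief:.3f}")
```

The point of the sketch is only that no single observation settles the question; each one nudges the estimate, and the estimate stays a probability rather than a certainty.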

The book doesn’t give technical or mathematical details, except during an explanation of Bayesian statistics (which included some memorable examples about terrorism forecasts before and after 9/11), but it gives several examples of analytical approaches that generally do or don’t work. It is very anecdotal and, judging by the topics and the interviewees, appears pretty male-focused (I only remember one woman being interviewed or cited), so I would guess that people who have interests similar to Silver’s (heavy on sports and the financial industry) would enjoy it more than I did. I did find it to be a worthwhile read, though.

Here are some topics I noted that came up a few times:

Data doesn’t speak for itself; we give it meaning. Related: Our predictions are never completely objective.
You’d be surprised how many studies out there can’t be replicated, and how many models are overfitted and not really good predictors.
Weather forecasting is one area where models have improved significantly over time and are quite accurate now.
Don’t be a “hedgehog” and get stuck on rigid “truths” about the world as if there are immutable underlying laws that you understand and can apply in all situations. Instead be a “fox”, seeing the uncertainty in every situation, and looking at multiple ways to approach every problem. (It has been found that “foxes” are better forecasters.)

And here are some quotes from sections I bookmarked:

“We forget – or we willfully ignore – that our models are simplifications of the world. We figure that if we make a mistake, it will be at the margin. In complex systems, however, mistakes are not measured in degrees but in whole orders of magnitude.”

“…experts either aren’t very good at providing an honest description of the uncertainty in their forecasts, or they aren’t very interested in doing so. This property of overconfident predictions has been identified in many other fields… It seems to apply both when we use our judgment to make a forecast… and when we use a statistical model to do so.”

“If you have strong analytical skills that might be applicable in a number of disciplines, it is very much worth considering the strength of the competition. It is often possible to make a profit by being pretty good at prediction in fields where the competition succumbs to poor incentives, bad habits, or blind adherence to tradition… It is much harder to be very good in fields where everyone else is getting basics right…”

I think you get the idea about the type of advice he gives along with the examples he details. I wish the book were slightly more technical, and a little less verbose in sections, but overall it is a good read for anyone considering a career in building data models for forecasting, who needs to gain some insight into picking out “signal” from “noise” and not falling into the many common traps analysts apparently fall into more often than we should.

Overall, I give it 4 out of 5 stars. (5 for overall content, -1 for dragging on in some spots since I’m an impatient reader)

See more books I’m reading, have read, or plan to read here: Becoming A Data Scientist “Learning” Page.

2 comments

Ralph Winters says:

July 2, 2014 at 8:27 am

Both this book and “The Black Swan” by Taleb are very humbling experiences for any Data Scientist. Noise is much more prevalent than Signal nowadays in many potential Data Science deep dives, and recognizing it as such is something that just can’t be taught. This is why I would give a lot of emphasis to Domain Expertise and communication skills as a necessary part of the Data Scientist skill set.

Renee says:

July 2, 2014 at 11:58 am

Thanks for the feedback, Ralph! I’ll have to put The Black Swan on my reading list.
