Comments on: The Imitation Game, and the Human Element in Data Science

By: Renee

Renee — Sun, 09 Aug 2015 00:57:24 +0000

Here’s a video about the Enigma machine, and the flaw that was discovered that helped break the code:
https://www.youtube.com/watch?v=V4V2bpZlqx8

By: Renee

Renee — Sat, 08 Aug 2015 23:44:50 +0000

In reply to Joerg.

Huh, interesting that you say communication is as high as 50%. I would tend to agree (it's similar when working as a data analyst), and I wonder if other people have found the same in their data science roles.

By: Renee

Renee — Sat, 08 Aug 2015 23:43:20 +0000

In reply to Nicole.

Yep, I agree with you. And I’ve read interviews with several data scientists where they emphasize “make sure you know the question you are trying to answer, and how the answer to that question will be used before you start developing an approach”.

Also, good point about autopilot vs emergency manual mode.

By: Joerg

Joerg — Sat, 08 Aug 2015 21:19:43 +0000

Oh, I think that the human aspect is the most important aspect of Data Science. You need to communicate your findings, meet business needs, get DevOps to help you with your stack, etc. I think Data Science is something like 5% Machine Learning, 45% programming, and 50% communication (numbers sampled from a rear end distribution).

By: Nicole

Nicole — Sat, 08 Aug 2015 21:16:16 +0000

Amen. And it’s brute-force-data-science that’s unfortunately going to be what “democratizes” it over the next few years… then, after a few high-profile bad decisions (which hopefully don’t involve extensive litigation or threats to human health or safety), the “craft” aspect should creep back in. “Autopilot Mode” is coming (e.g. with BigML and Amazon ML services), but you still need a pilot in case of emergency. Which, in data science, could potentially be *every single time*.

This reminds me of the discussions we were having years ago in astronomy when storage was getting cheaper, and data volumes per unit time were getting bigger and bigger. Easy solution? Just archive all of it, of course! But without being able to effectively describe the original observer’s intent, and store and search that, the value of the data was pretty low. So why spend a few tens of thousands of dollars a year on storing data that really didn’t have much archival value?

I think data science is similar. We really have to cautiously examine what value using a particular model will add… and really examine it in terms of current context and envisioned context. Brute force cloud ML services can’t do that. Nor would we want them to. But guaranteed, a lot of people will be doing just that.