What is a Data Scientist?

It’s pretty clear from the variety of (sometimes conflicting) information out there that the definition of “data scientist” is still being developed. “Data science” is basically a trendy new term for a collection of things that have existed for a while: data modeling, data mining, statistical analysis, predictive analytics, machine learning, etc.

However, this “new” career path called Data Scientist seems to require at least three things: an understanding of advanced statistics, the ability to program and use a variety of analytical tools, and the ability to grasp business/domain concepts so you can ask the right questions and interpret the results in context.

One popular definition is: “Person who is better at statistics than any software engineer and better at software engineering than any statistician.” – Josh Wills, Cloudera [reference]

I like this Venn diagram by Drew Conway, which gives a good overview of the areas involved in data science:


I also like the view that it’s hard to find one person who is really an expert in all of the areas required for data science. It would be best for a company to find people with some familiarity with all of these topics, but really aim to build a team with diverse talent that covers all of these bases and can work together to serve the organization’s data science needs.

Another question, since I’m writing a blog about “Becoming a Data Scientist,” is: when will I consider myself to have “made it”? When will I be a data scientist? That is closely tied to my goals, which I’m still thinking about and will outline in another post.

Read More about Data Scientists…

Data Scientist on Wikipedia:

WSJ CIO Journal:
It Takes Teams to Solve the Data Scientist Shortage

Harvard Business Review:
The Sexiest Job of the 21st Century