Is Data Scientist the Right Career Path for Me?

This is a post in response to the interview article “Is Data Scientist the right career path for you? Candid Advice” posted on KDnuggets here: http://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html

In that post, Paco Nathan, “a data science expert with 25+ years of industry experience”, is interviewed about Data Science as a career path, and gives his opinions on whether it is a “sexy” career, as well as sharing advice for those people (like me) considering a career in the field.

Though I think Paco Nathan makes some very good points throughout the interview, such as “be careful of where you go to work”, “learn to leverage the evolving Py data stack”, “learn to lead an interdisciplinary team”, and “find mentors”, my overall reaction to the interview was somewhat negative and I almost stopped reading before I got to the good advice.

I think my reaction mostly stems from the excerpt below, at the beginning of the interview:

Anmol Rajpurohit: Data Scientist has been termed as the sexiest job of 21st century. Do you agree? What advice would you give to people thinking of a long career in Data Science?

Paco Nathan: I don’t agree. Not many people have the breadth of skills to perform the role, nor the patience that is absolutely needed to acquire those skills, nor the desire to get there.

As a self test:

  • prepare an analysis and visualization of an unknown data set, while impatient stakeholders watch over your shoulder and ask pointed questions; be prepared to make quantitative arguments about the confidence of the results
  • describe “loss function” and “regularization term” each in 25 words or less, with a compare/contrast of several examples, and show how to structure a range of trade-offs for model transparency, predictive power, and resource requirements
  • pitch a reorg proposal to an executive staff session which implies firing some ranking people
  • interview 3-4 different departments that are hostile to your project, to tease out the metadata for datasets that they’ve been reluctant to release
  • build, test, and deploy a mission-critical app with real-time SLAs, efficiently across a 1000+ node cluster
  • troubleshoot intermittent bugs in somebody else’s code which is at least 2000 lines long, without their assistance
  • leverage ensemble approaches to enhance a predictive model that you’re working on
  • work on a deadline in paired programming with people from 3-4 different fields completely disjoint from the work that you’ve done

If one doesn’t feel absolutely comfortable performing each of those listed above, right now, then my advice is to avoid “Data Science” as a career.

It’s that last sentence that is really bothering me. It’s unnecessarily discouraging.

It seems to me that this list, and especially the admonition that if you don’t feel *absolutely* comfortable performing all of these tasks *right now* you shouldn’t even try to become a data scientist, is a scare tactic and carries something of an “I survived it, but I doubt you can” attitude.

When I mentioned my negative reaction to the article on Twitter, Paco thanked me for my feedback, and asked whether I would consider it discouraging to give advice that if you’re in HR, you’d be firing people, or that if you plan to become an entrepreneur, there’s a high chance of failure.

No, I don’t think it’s overly negative to give people “reality checks” that may counter the popular culture version of what a particular job entails. I don’t doubt that someone pursuing a data science career will come across one or more of the situations he mentioned in the interview in their lifetime. However, how often do these occur? Is it a sure thing you’ll have a negative experience? Do these situations appear in every variation of a “data scientist” job? Are they more likely to happen if you join a company of a particular size or culture? Do you really have to be comfortable doing those things before even deciding to pursue data science? (Do you have to be comfortable running a marathon before you start training for one?)

Giving the advice quoted above is like saying “I’ve been running marathons for 25 years. Before you start training, give yourself this self test: Are you comfortable with recovering from a hamstring injury right now? Because hamstring injuries are common among marathon runners. I injured mine and it was excruciating” in response to someone that mentions to you that it’s their dream to one day run a marathon. Advising that it’s going to be a difficult path forward, with the possibility of an injury or negative experience along the way, and some tips for avoiding the pitfalls is enough. You don’t have to advise them not to try because training for a marathon is not as glamorous as they might first believe.

I am a database designer, SQL developer, and data analyst. I have run my own business, worked as a data analyst at an internationally known educational/retail business, developed analytical SQL reports at a major university, and completed half of a systems engineering master’s degree at a top university. Should I “avoid data science as a career” because I wouldn’t feel comfortable “building, testing, and deploying a mission-critical app with real-time SLAs, efficiently across a 1000+ node cluster” “right now”? I don’t think so.

Now, maybe, as Daniel Tunkelang suggested, Paco Nathan is trying to “counter the proliferation of be-a-data-scientist-quick” programs.

I can understand that point of view, and it seems like good advice to tell someone that if you’re going to become a data scientist, you shouldn’t expect it to come easy or be a quick process, and once you get hired, you are going to be up against a lot of misconceptions while having to deal with a lot of different types of people and power structures and challenging projects that aren’t as “sexy” as you might have first expected.

But is that a reason not to try? Another thought: if you fall short of becoming the “unicorn” data scientist that is a true expert in computer science, statistics, business, and has domain expertise to boot, have you failed? Or could you maybe be a great contributor to a data science team, or find out that you’re really awesome at one of the related sub-areas you encounter as you learn, and can have a great high-paying career doing something you love in a company that isn’t necessarily going to be “hostile to your project”?

Getting a graduate engineering degree has been harder than I anticipated (and not just because of the content), but would I discourage someone who wants to get a similar degree? No. Would I advise them that it’s going to be harder than they might think? Yes. And that could be stated much as Deborah Siegel put it in her positive response to the interview,

“What I got from the excellent advice is that data science is not a comfortable career if one want to be accepted and well-defined within a business, or if one expects clear requirements.”

But would I tell someone that if they aren’t comfortable solving convex optimization problems “right now”, they might as well avoid systems engineering graduate programs altogether? No. That would be unnecessarily discouraging, when they might later find out they excel in the degree program despite not even knowing what they’re getting into when they apply.

Though I don’t doubt that Paco Nathan has a lot of experience in data science, and some of his advice is definitely valuable, his experience cannot cover every possible data science career (some of which don’t even exist yet!), so it does not justify such broad warnings about the career path as a whole. In any case, the last sentence in the section of the interview I highlighted above is way too extreme. If I took that advice to heart, I might quit now and never achieve the career I believe is a great fit for me. “If you don’t feel completely comfortable that you can handle [extreme situation] given your experience and skill set RIGHT NOW, avoid this career path!” is terrible advice for anyone starting out in any field, and it left a bad taste in my mouth that overshadowed the otherwise good information.

I’m not a data scientist yet, so I could be wrong, but I have trouble listening to advice from someone that starts with a “self test” meant to scare and discourage, rather than educate and encourage, those that might have the wrong impression about Data Science, but might really end up becoming great data scientists despite any overly-dreamy first impressions.

UPDATE: Here are some of the replies/discussion from Twitter on this topic

2 Comments

  1. Paco Nathan
    Apr 2, 2014

    Loved this article! Your advice is taken to heart.

    Leading with a self-test was not my best moment. I’d edit that, in retrospect, to rearrange the arc narrative. Scaring people away from a really truly interesting career path was not my intent…

    Or was it? I do see much media coverage (without naming names) overly-hyping the field. I see many people who should know better saying inane things like “DS, oh that’s just statistics”… Those, yes, those let’s scare away from the field. Without delay.

    By the characteristics of your drive to learn and articulate and excel, you are clearly *not* among the intended discouragees.

    And, frankly, a few ancestors from the northwest corner of a rather green isle on the North Atlantic passed down to me a distinct passion for writings that are a wee bit over the top — to help illustrate subtle points which might otherwise be missed. Guilty as charged.

    But there’s another thing… something clearly disturbing, dark, menacing, as well as other diverse adverbial forms of treachery… For this point, I’ll defer to much better business minds, but clearly there is a process of Disruption afoot. Not the disruption of which VCs speak and dream — they’re not thinking large enough IMHO. No, more like the disruption that Amazon visited upon the entire Retail sector, whilst said incumbents had the best BI tools that money could buy monitoring their own failures. Likewise, what Google did to Advertising… not the “smash and grab” practiced by oh so many venture capitalists, but a “beat them up and take their lunch money” strategy executed at the scale of many billionaires. And perhaps, what Tesla is now doing to GM… there are clear tumults ahead for business at enormous scale. Enormous tumults driven by data. There will be winners and losers. People working specifically in these DS roles will play key roles in that process, and (statistically) will be delivering the bad news, so to speak, more often than not. I see that daily in my consulting practice, the process of self-recognition among those winners and losers, more of the latter of course, and it’s brutal. If one really wishes a career in DS, brace oneself for that experience, because I promise it will become the rule and not the exception.

    But it will likely be quite fun, so I truly hope that your bold mind is along that same path contributing and extending the practice!

  2. Renee
    Apr 2, 2014

    I appreciate your responses on Twitter and this thoughtful reply, and your statement that you might edit that self test in retrospect. I do think there was interesting and helpful advice in your article once I got past the part I had an issue with, and your points about the over-hype and about having to sometimes (and maybe increasingly often) be “the bearer of bad news” are well taken.

    Also, I agree with the sentiment that some people aren’t taking the time to fully understand what is involved in becoming a data scientist, and that the over-hype could lead to people rushing into the arena that maybe shouldn’t. (Though who are we to judge? With the right attitude, someone that may seem “wrong” for the role at first could become a valuable contributor.)

    With the apparently high number of data science positions that need to be filled, and even more that will be created in the coming years, encouraging anyone with an interest to pursue the field might be a good thing, as long as that encouragement is tempered with a bit of reality check, along with some good advice as to what to expect and how to achieve in the field of data science.

    I’m currently reading the O’Reilly book Doing Data Science: Straight Talk from the Frontline, and in it, the authors outline a list of “prerequisites” they assume you have before you even start reading, and they don’t hold back in explaining the types of knowledge you will need to acquire, and tasks you will need to be able to do, to call yourself a data scientist. Though accessible, it’s not simply a “casual business read”, and all through the book, they have references for additional recommended reading so you can dive in deeper as needed. I am enjoying the book and plan to do all of the exercises this summer and post a review here afterward.

    Anyway, I appreciate the reply, and understand what you’re trying to say about the field not necessarily being as glamorous as advertised, and I appreciate you taking the time to understand my point of there being a difference between “reality check” and discouragement.

    Thanks!