A Couple Articles of Interest

March 29, 2015

Hi, there! I’ve been so busy with projects lately (for work, class, and personal) that I haven’t been able to do as many data science learning projects as I’ve wanted to, but I have been reading up, and I wanted to share some articles I recently found interesting with you!

The first is “Crushed It! Landing a data science job” by Erin Shellman, which I received via the Get a Data Science Job newsletter. Erin talks about her recent experience with various types of data science interviews, which ended in her being hired at Amazon Web Services. She gives great advice about how to prepare for these interviews, as well as a ton of great resources for learning. I’ve definitely bookmarked several of her book and course recommendations.

Advice from Erin Shellman: “Take the time to be sure that you can explain core concepts in your own words. Screening questions are commonly phrased like this: ‘how would you explain to an engineer how to interpret a p-value?’ Explain it to an engineer, someone who, presumably, isn’t a statistician and might not be used to that language. You don’t want it to be the first time you’ve had to rephrase basic definitions like that. Also, don’t underestimate what nerves can do to your ability to recall information, even stuff you really thought you understood.”

Check out her full post here: http://www.erinshellman.com/crushed-it-landing-a-data-science-job/

The second article interested me because, about a year ago when I heard about UrtheCast, I was brainstorming data-sciencey projects that could be done with satellite image data, and thought it would be so cool to learn how to identify and track large mammals like elephants and whales from space. Well, guess what? It’s already being done!
Parker Higgins (@xor), March 28, 2015: “If you think you like elephants and whales now, just wait until you see them FROM SPACE http://t.co/bz6ND5wKkC”

The article Parker Higgins is referring to in the tweet above is from The Atlantic, and details how analysts at DigitalGlobe were able to identify elephants in the Democratic Republic of Congo and help protect them from poachers. A different team at DigitalGlobe used an algorithm to spot whales surfacing near the coast of Argentina from ocean imagery – how cool!

At the Girl Develop It! meetup where I talked about data science, some of the participants and I had questions about using geospatial/GIS data for various types of analysis, and we decided to schedule a new meetup. My husband knows some professors who do research in this area, and there were a couple of people at the meetup who also had some experience with satellite imagery, so I’m going to plan a follow-up meetup to discuss that topic in more depth. Keep an eye on this blog for any announcements about that. I’m thinking maybe late May?

I always tweet interesting articles I read, so for more, follow me on Twitter!


What I’m up to

October 26, 2014

I haven’t written here in a while because I haven’t “finished” anything I have been wanting to write about, but why wait until I’m completely done, right? So, here’s a bit about what I’ve been up to data-science-wise:

I’m in a grad class called Stochastic Models and we’re learning about Markov Chains right now. Fascinating stuff! Here’s a cool site that visually shows some Markov Chain concepts.

My other grad class is Intro to Systems Engineering. (Yeah, because of the courses offered online, I’m taking the intro class in my second-to-last semester!) We just did a neat project in that class that involved coming up with a strategy and participating in a baseball draft, so I’ll come back later and write about that in more detail.

I’m almost at the end of Udacity’s Cloudera Hadoop course. I am really enjoying learning about MapReduce, and will definitely write up a review of the class when I’m done. My biggest frustrations so far haven’t involved the Hadoop concepts, but rather using the VM they provide! All I have left on that is the final project, so soon I’ll be able to cross that one off my goals list.

Soon, I need to come up with a final project for my Systems Engineering Master’s degree. I’m definitely doing something data science related, and will update when that plan is finalized.

I’ve been telling more people about my data science plans, and have had more people asking me about data science and the learning process. I may be giving a talk to my alma mater’s IEEE Computer Society club meeting soon about “What is data science?”, so that will be fun!

Are you “becoming a data scientist”, too? What projects are you in the middle of right...


Codecademy Python Course: Completed

September 21, 2014

I can cross off another item on my Goals list since I finally jumped back into the Codecademy “Python Fundamentals” course and completed the final topics this afternoon.

I think the course would be good for people who have had at least an introductory programming course in the past. I didn’t have much trouble with the tasks (though a few were pretty tricky), but I have programming experience (and taught myself some advanced Python outside of the course for my Machine Learning class) and can imagine that someone who had never programmed before and was unfamiliar with basic concepts might get totally stuck at points in the course. I think they need two levels of “hints” per topic, so that if you just need hints on the most common difficult things that trip people up, you click once and get the hints they show now. But if you’re truly stuck and need to be walked through it, they should have more in-depth hints for true beginners.

The site estimates it will take you 13 hours to complete the course. I don’t know how much time I spent on it in total, since it was broken up over months. It took me about an hour to finish the final 10% of the course, covering classes, inheritance, overrides, file input/output and reviews, and then to go back and figure out where the final 1% was that it said I hadn’t completed (apparently I accidentally skipped some topic mid-course) so I could get the 100% complete status.

The topics covered are:

- Python Syntax
- Strings and Console Output
- Conditionals and Control Flow
- Functions
- Lists & Dictionaries
- Loops
- Iteration over Data Structures
- Bitwise Operators
- Classes
- File Input & Output

I thought this was a good set of topics for an intro course. If they dropped anything, I think Bitwise Operators was a “bit” unnecessary for beginners. I liked the projects they included to test out the skills you learned, like writing a program as if you are a teacher and need to calculate statistics on your class’s test scores.
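To give a flavor of that kind of exercise, here is a minimal sketch in plain Python. It is not Codecademy’s actual project code; the scores and the function names are made up for illustration, and the variance is the population variance (dividing by the number of scores).

```python
import math

# Hypothetical test scores for a small class (made-up data).
scores = [70, 80, 90, 100]


def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Return the population variance (average squared distance from the mean)."""
    avg = mean(values)
    return sum((v - avg) ** 2 for v in values) / len(values)


def std_dev(values):
    """Return the population standard deviation."""
    return math.sqrt(variance(values))


print("Mean:", mean(scores))
print("Variance:", variance(scores))
print("Standard deviation:", round(std_dev(scores), 2))
```

Splitting the work into small functions mirrors how the course builds up from functions and lists to a complete program.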
Overall, I think Codecademy did a good job with this course, and I would recommend it to other programmers who want to quickly get up to speed on Python. I would also point beginners to the course, but with a warning that there are tricky spots they may need outside resources to get...


Goal #1 Reached!

May 13, 2014

My first “Becoming A Data Scientist” goal was to get an “A” in my Machine Learning class this semester, and I did! Now I can cross that one off the list: Updated Goals


ML Project 4 Results

May 11, 2014

I am happy to report that I got 100% on the final project I did in the last 2 weeks for my Machine Learning grad class (which is especially great because that was 30% of my grade for the semester!) and I got some good feedback from the professor:

Very good analysis and you showed great potential to become a good researcher! Comments:

1. when you code your categories features, 1 of k coding is a good choice. Did you apply this method to all categories features?
2. Some time, normorlize features will make a huge difference. One way to do this is to comput the z-score for features before you train a model on the data.
3. In terms of machine learning application, your analysis is good. If you try to find a social study expert to collobrate with you, I believe your findings can be published on high impacting journals.
4. In order to publish your work, you will need to do some research to found what have been done in this field.

This is especially encouraging since I want to become a data scientist, so hearing positive feedback like this, even encouraging me to publish after having only taken one semester of Machine Learning, feels great!

So, I will take time this summer to do more research and learning and expand on this project (since it was a rush to complete enough to turn in on time in this class but there’s a lot more I want to do with it), and I will collaborate with some people at the university where I work to further distill the results and see if we can apply them to segment out some potential first-time donors for next fiscal year. This is...
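As a rough illustration of the professor’s first two comments, here is a minimal sketch of 1-of-k coding and z-score normalization in plain Python. The column values and function names are hypothetical and not taken from the actual project; the z-score here uses the population standard deviation, which is one common choice.

```python
import statistics


def one_of_k(values):
    """Encode a categorical column as 1-of-k (one-hot) vectors.

    Returns the sorted list of categories and one 0/1 vector per input value.
    """
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def z_scores(values):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical columns, made up for illustration only.
regions = ["north", "south", "north", "east"]
ages = [25, 35, 45, 55]

categories, encoded_regions = one_of_k(regions)
print(categories)
print(encoded_regions)
print([round(z, 3) for z in z_scores(ages)])
```

Each category becomes its own 0/1 column, and every numeric feature ends up on a comparable scale before a model is trained on it.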


Is Data Scientist the Right Career Path for Me?

April 1, 2014

This is a post in response to the interview article “Is Data Scientist the right career path for you? Candid Advice” posted on KDnuggets here: http://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html

In that post, Paco Nathan, “a data science expert with 25+ years of industry experience”, is interviewed about Data Science as a career path, and gives his opinions on whether it is a “sexy” career, as well as sharing advice for those people (like me) considering a career in the field.

Though I think Paco Nathan makes some very good points throughout the interview, such as “be careful of where you go to work”, “learn to leverage the evolving Py data stack”, “learn to lead an interdisciplinary team”, and “find mentors”, my overall reaction to the interview was somewhat negative and I almost stopped reading before I got to the good advice. I think my reaction mostly stems from the excerpt below, at the beginning of the interview:

Anmol Rajpurohit: Data Scientist has been termed as the sexiest job of 21st century. Do you agree? What advice would you give to people thinking of a long career in Data Science?

Paco Nathan: I don’t agree. Not many people have the breadth of skills to perform the role, nor the patience that is absolutely needed to acquire those skills, nor the desire to get there.
As a self test:

- prepare an analysis and visualization of an unknown data set, while impatient stakeholders watch over your shoulder and ask pointed questions; be prepared to make quantitative arguments about the confidence of the results
- describe “loss function” and “regularization term” each in 25 words or less, with a compare/contrast of several examples, and show how to structure a range of trade-offs for model transparency, predictive power, and resource requirements
- pitch a reorg proposal to an executive staff session which implies firing some ranking people
- interview 3-4 different departments that are hostile to your project, to tease out the metadata for datasets that they’ve been reluctant to release
- build, test, and deploy a mission-critical app with real-time SLAs, efficiently across a 1000+ node cluster
- troubleshoot intermittent bugs in somebody else’s code which is at least 2000 lines long, without their assistance
- leverage ensemble approaches to enhance a predictive model that you’re working on
- work on a deadline in paired programming with people from 3-4 different fields completely disjoint from the work that you’ve done

If one doesn’t feel absolutely comfortable performing each of those listed above, right now, then my advice is to avoid “Data Science” as a career.

It’s that last sentence that is really bothering me. It’s unnecessarily discouraging. It seems to me that this list, and especially the admonition that if you don’t feel *absolutely* comfortable performing all of these tasks *right now*, you shouldn’t even try to become a data scientist, is a scare tactic and carries some of an “I survived it, but I doubt you can” attitude.

When I mentioned my negative reaction to the article on Twitter, Paco thanked me for my feedback, and asked whether I would consider it discouraging to give advice that if you’re in HR, you’d be firing people, or that if you plan to become an entrepreneur, there’s a high chance of failure.
paco nathan (@pacoid), April 1, 2014: “curious: if one advised about HR ("you'll fire people") or about founding firms ("50% fail"), are those discouraging? @paix120 @dtunkelang”

No, I don’t think it’s overly negative to give people “reality checks” that may counter the popular culture version of what a particular job entails. I don’t doubt that someone pursuing a data science career will come across one or more of the situations he mentioned in the interview in their lifetime. However, how often do these occur? Is it a sure thing you’ll have a negative experience? Do these situations appear in every variation of a “data scientist” job? Are they more likely to happen if you join a company of a particular size or culture? Do you really have to be comfortable doing those things before even deciding to pursue data science? (Do you have to be comfortable running a marathon before you start training for one?)

Giving the advice quoted above is like saying “I’ve been running marathons for 25 years. Before you start training, give yourself this self test: Are you comfortable with recovering from a hamstring injury right now? Because hamstring injuries are common among marathon runners. I injured mine and it was excruciating” in response to someone who mentions to you that it’s their dream to one day run a marathon. Advising that it’s going to be a difficult path forward, with the possibility of...
