DataSciGuide Contest

October 2, 2015

Want a way to help people who are learning data science, and also get a chance to win a $40 Amazon Gift Card? Review a data science blog, podcast, course, or other content at DataSciGuide! Here’s more info:


The Data Science Central “Incident”

July 8, 2015

I’m writing this post to respond both to what many of you saw Vincent Granville say about me on Facebook a couple days ago, which was brought to my attention yesterday: (in context) and to his apology this evening:

I didn’t want to write a second post about Data Science Central, but after the huge response on twitter today, I want to document everything in one place so anyone looking back at this has all of the info to evaluate what has been said.

I have thought a lot about Vincent Granville’s apology this evening, and honestly when I heard he had apologized, I hoped (but doubted) it would be sincere. I would have loved to be able to accept his apology and move on from all this. However, I can’t bring myself to accept the apology because it’s not really an apology, it’s an accusation. After writing a truly vile post about me, his “apology” accuses me of harassing *him*. He says that I have “attacked” him for 14 months, and is casting himself as a victim. He’s basically saying “I acted a fool in a heated moment because she’s been attacking me non-stop for over a year” (the heated moment apparently being a Facebook post about Ellen Pao that reminded him of me, and the “attacking” being me pointing out his questionable practices in a blog post and on twitter, I guess).

Because of that, and because there are a lot of people who are *actually* harassed online who I think would be offended by his characterization, I want to document everything I’ve said about him, and challenge his definition of harassment. What I have done is document what I saw as some very questionable (if not unethical) behaviors, and occasionally initiated or participated in conversation on twitter about that. I have never “attacked” him in any way, but I want to leave it up to you readers to decide.
Here is the history of my comments about Data Science Central and Vincent Granville:

April 21-22, 2014: Initial twitter conversation with @tesherista about Data Science Central’s contest to find fake accounts created to attract women and minorities to Data Science Central, where Vincent Granville deleted Cory’s comments questioning the practice. (screenshots by @AltonDataSci)

@tesherista went back to refer to @DataScienceCtrl. were your comments deleted?? - Data Science Renee (@BecomingDataSci) April 22, 2014

July 1, 2014: Original blog post “Something has been bothering me about Data Science Central” here on Becoming a Data Scientist, where I wrote about the above experience, as well as exposing one of the fake Data Science Central profiles “Amy Cordan” as having a fake LinkedIn profile (still there as “Amy Sangrene”) with a fake Stanford Computer Science PhD, violating LinkedIn terms & conditions. (He mentioned me questioning his advanced degrees in his “apology”, and this is the only academic credential I have brought under scrutiny, that of “Amy”.) In response to this post, I received the following comments from readers (among others, you can see them at the end of the post linked above):

Alton discussing his negative experience with the Data Science Central contest
Ellie talking about losing trust in DSC when she tried to contact “Amy” and realized she wasn’t real
Hubart and David questioning his academic background (maybe this is why he thought I did? because commenters on my post did?)
“System Administrator” recalling another use of the name Amy Cordan by Vincent Granville online in the past
Eric mentioning he found that Vincent Granville was accused of paying people to write positive Amazon reviews of his Developing Analytic Talent book
A comment by someone who claims to be the “real” Amy Cordan (Henriques) and used to be close friends with Vincent Granville’s wife

July 1-5, 2014: Twitter conversation with @altondatasci following the blog post above, as well as an explanation for why I wrote the post:

My hope in writing this post is that @DataScienceCtrl will reach out and hire some real female and minority data scientists/writers. - Data Science Renee (@BecomingDataSci) July 2, 2014

October 26, 2014: Tweet to @kissmetrics (and conversation following) alerting them that Amy was a fake profile.

June 8-10, 2015: Tweets after the real Amy Cordan commented on my blog, conversation between @ellieaskswhy, @metabrown312, and @tesherista on Twitter about fake Amy and deleted comments. Follow-up tweets warning people again about what we had found, and talking about the suspended accounts.

June 24, 2015: Tweets about finding out my Data Science Central account was suspended.

Here is every tweet I've ever mentioned @DataScienceCtrl. I have never contacted him otherwise. - Data Science Renee (@BecomingDataSci) July 7, 2015

Throughout all...


Codecademy Python Course: Completed

September 21, 2014

I can cross off another item on my Goals list since I finally jumped back into the Codecademy “Python Fundamentals” course and completed the final topics this afternoon.

I think the course would be good for people who have had at least an introductory programming course in the past. I didn’t have much trouble with the tasks (though a few were pretty tricky), but I have programming experience (and taught myself some advanced Python outside of the course for my Machine Learning class) and can imagine that someone who had never programmed before and was unfamiliar with basic concepts might get totally stuck at points in the course. I think they need 2 levels of “hints” per topic so that if you just need hints on the most common difficult things that trip people up, you click it once and get the hints they show now. But if you’re truly stuck and need to be walked through it, they should have more in-depth hints for true beginners.

The site estimates it will take you 13 hours to complete the course. I don’t know how much time I spent on it total, since it was broken up over months. It took me about an hour to finish the final 10% of the course, covering classes, inheritance, overrides, file input/output and reviews, then also going back and figuring out where the final 1% was that it said I hadn’t completed (apparently I skipped some topic mid-course accidentally) so I could get the 100% topic complete status.

The topics covered are:

Python Syntax
Strings and Console Output
Conditionals and Control Flow
Functions
Lists & Dictionaries
Loops
Iteration over Data Structures
Bitwise Operators
Classes
File Input & Output

I thought this was a good set of topics for an intro course. If they dropped anything, I think Bitwise Operators was a “bit” unnecessary for beginners. I liked the projects they included to test out the skills you learned, like writing a program as if you are a teacher and need to calculate statistics on your class’ test scores.
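To give a feel for the kind of project described above, here is a minimal sketch of a teacher's test-score summary. It is not the course's actual exercise: the scores, the grade cutoffs, and the function names `letter_grade` and `summarize` are all made up for illustration.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical test scores for a small class (illustrative only).
scores = [90, 80, 70, 100, 60]


def letter_grade(score):
    """Map a numeric score to a letter grade using assumed cutoffs."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


def summarize(class_scores):
    """Return the class average, population variance, standard deviation,
    and the letter grade that corresponds to the average."""
    average = mean(class_scores)
    return {
        "average": average,
        "variance": pvariance(class_scores),
        "std_dev": pstdev(class_scores),
        "grade": letter_grade(average),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(name, value)
```

The standard library's `statistics` module does the arithmetic here. A beginner exercise would more likely have you write the average and variance by hand with loops, which is good practice for the Functions and Loops topics.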
Overall, I think Codecademy did a good job with this course, and I would point other programmers who want to quickly get up to speed on Python to this course. I would also point beginners to the course, but with a warning that there are tricky spots they may need outside resources to get...


Something has been bothering me about Data Science Central

July 1, 2014

So, what I’m about to write about actually occurred a few months ago, but I am reminded of it every day when I receive an email from Data Science Central or see someone tweet an article from the blog network (which includes Analytics Bridge, Big Data News, etc.), so I figured if it’s still bothering me, it’s worth writing about.


Doing Data Science (Review)

June 13, 2014

I just finished reading Doing Data Science: Straight Talk from the Frontline, an O’Reilly book by Cathy O’Neil (@mathbabedotorg) and Rachel Schutt (Columbia Data Science blog). First let me say, I really enjoyed this book! I thought it gave a great overview of Data Science, which is very valuable at this early stage in my data science journey. The authors attempt to define Data Science, but also explain that the definition is evolving, and show throughout the book all of the different types of things that can be categorized as data science activities.

I also enjoyed that they emphasize data science teams, and each presenter in the book (each chapter is based on a lecture in the course which had a guest speaker from the field) was introduced with their level of expertise in the various aspects of data science (see image below). For instance, some were more focused on machine learning, while others focused more on visualization, and they were from a variety of different industries. This was nice because it meant the authors didn’t use the same example problems repeatedly when discussing different techniques.

Data Scientist Profile (via semanticcommunity, more here)

Speaking of visualization, I would say that the one negative of the book is that the images were not designed to be printed in black and white, and many are hard to read. There is an image with the caption “red means cancer, green means not”, but the dots all appear to be similar colors of grey. There is an image the students in the class designed to show the various aspects of data science which is basically unreadable because it is tiny and has some text that comes out as grey-on-grey (I happened to find a color version of that image here).

Now, don’t expect to read the book and immediately be able to go out and do all of the activities in the book. First of all, there is a list of prerequisites the authors assume you have. You don’t have to have a deep understanding of all of these fields in order to gain something from the book, but they use terminology at times from linear algebra, statistics, machine learning, and other technical areas, and you would definitely need some of these skills in order to do some of the suggested activities. However, throughout the book are constant definitions and clarifications, references to other texts, websites, and people. I found this to be incredibly useful – any time you want to learn more about a topic, the authors point out how to find more information, and recommend books on the subject.

To make a metaphor, Rachel Schutt and Cathy O’Neil tell you about a great dish someone cooked, and give some general info about the process of making the dish, and what to watch out for when you attempt it yourself. They even include some quotes from the chef about the art of making this particular dish, and tips on preparing and presenting it. But you still have to go out and get the ingredients and tools and learn some cooking techniques and look in some other cookbooks in order to figure out the detailed steps. Then, you have to do a lot of chopping and sautéing and probably burn a few things before you successfully create a similar dish you can serve to your customers. They don’t just hand you a simple recipe, and you are probably a casual at-home cook, not a professional chef yet.

You could describe the book as kind of a “roadmap” to data science. There is some math and some code, but it is much more breadth than depth. The book is not pretentious, and actually warns data scientists against hubris, since overconfidence in a certain tool or method can have a negative impact on your work. There are a lot of “tips”, “things to think about”, and “lessons learned” that I feel give the reader a great sense of what pitfalls you might come across when doing real-world analysis, and how to avoid the common ones, but only a few step-by-step how-to’s and code examples (in R or Python).

Some topics I bookmarked to learn more about that I hadn’t read about before “Doing Data Science” introduced them to me: F-score (a combination of precision and recall – terms defined in the book), Log Returns, Simpson’s Paradox, Exponential Random Graph Models. Some topics I already knew a little about, but “Doing Data Science” helped me better understand: various similarity/distance metrics, exploratory data analysis, data leakage, recommendation engines, confounding variables.

I can imagine that some readers wouldn’t like that the book is “all...
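As a rough illustration of the F-score mentioned among the bookmarked topics, here is a small sketch using the standard definitions of precision, recall, and the F-beta score. It is not taken from the book: the function name `precision_recall_f` and the example labels are hypothetical, and labels are assumed to be 1 for the positive class and 0 for the negative class.

```python
def precision_recall_f(true_labels, predicted_labels, beta=1.0):
    """Compute precision, recall, and the F-beta score for binary labels.

    Assumes 1 marks the positive class and 0 the negative class.
    With beta=1.0 this is the F1 score, the harmonic mean of
    precision and recall.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when there are no predicted
    # positives or no actual positives.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    beta_sq = beta ** 2
    f_score = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_score


# Hypothetical labels: 4 actual positives, 4 predicted positives, 3 in common.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
precision, recall, f1 = precision_recall_f(actual, predicted)
print(precision, recall, f1)
```

Setting `beta` above 1 weights recall more heavily, and setting it below 1 weights precision more heavily, which is why the score is usually described as a combination of the two rather than a simple average.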


The Signal and the Noise (Review)

May 27, 2014

This is a review of The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver.
