The Summer of Data Science is a commitment to learn something this summer to enhance your data science skills, and to share what you learned. (Those of you in the Southern Hemisphere will have to pick up the excitement when we’re winding down during our fall/your spring and keep it going! Or, join us during your Winter of Data Science!)
For those of you who haven’t been following me for years, a hashtag I started back in 2015, #SoDS, is actually one of the things that started growing my twitter following. Here’s the history:
1st month of tweets, May 2015 Storified
Unfortunately, I didn’t keep up the ‘Storification’ after that, but you get the idea. It brought a bunch of us together to share our learning progress. We learned from each other, encouraged each other, and most of all geeked out about data science together!
I didn’t launch one last year because I was starting a new job and taking a break from recording the podcast, and I didn’t want to take on too much. But I missed it, and I didn’t want to let another year pass without a Summer of Data Science, so we’re going to do it together again this year!
So, here are the only “rules”:
How to participate in the Summer of Data Science:
- Pick a thing or a short list of things related to data science that you want to learn more about this summer.
- Make a plan to learn it (like an online course, a practice project, etc.).
- Share that plan on social media, then post updates as you make progress, with the hashtag #SoDS17.
That’s it! (And yes, there’s a chef competition that used the same hashtag. No worries! Enjoy the food pics.)
If you’re looking for ideas for learning projects or topics, check out the Data Science Learning Club! Please write about your learning experiences and share in the Data Science Learning Club #SoDS forum, and/or on your own blog, and share on social media. I’ll check out the hashtag on twitter regularly and RT others. I’ll be participating myself, too!
Here’s a link to the hashtag on twitter: #SoDS17. See you there!
P.S. Did you know that there is a “Summer of Data Sci” song? :D
P.P.S. There are now Summer of Data Science 2017 t-shirts and tanks in the Becoming a Data Scientist teespring shop!
UPDATE: Here is a twitter Moment with a selection of tweets from the #SoDS17 participants this year! (It starts out with a bunch of intro tweets from me, but click through to twitter and keep scrolling!)
Summer of Data Science 2017
Podcast Video Playlist:
Youtube playlist of interview videos
More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Activity 14: Hidden Markov Models
Activity 15: Neural Nets for Text
Data Science Learning Club Meet & Greet
Mentioned in the episode:
Hadley Wickham’s Advanced R book
shinyGEO: a web-based application for analyzing gene expression omnibus datasets
Please fill out the survey and share it with your friends and followers on social media! The survey is a little long/detailed, but most of it is optional. I value your opinions! Thank you so much for participating!!
Podcast Audio Links:
Link to podcast Episode 12 audio
Podcast’s RSS feed for podcast subscription apps
Podcast on Stitcher
Podcast on iTunes
Podcast Video Playlist:
Youtube playlist of interview videos
More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Data Science Learning Club Meet & Greet
1) Verena Haunschmid
Data Science Learning Club Activity 07: Linear Regression
Verena’s Results for Linear Regression on Salary Dataset
Verena’s website
@ExpectAPatronum on Twitter
2) David Asboth
City University London MSc Data Science
Data Science Learning Club Activity 02: Creating Visuals for Exploratory Data Analysis
David’s results exploring London Underground data
Data Science Learning Club Activity 07: K-Means Clustering
David’s results using k-means to draw puppies in 3 colors
FlyLady (the house cleaning system I mentioned)
David’s website
@davidasboth on Twitter
3) Kerry Benjamin
Data Science Learning Club Activity 01: Find, Import, and Explore a Dataset
Kerry’s results for Activity 1 IGN Game Review Data exploration
Data Science Learning Club Activity 02: Creating Visuals for Exploratory Data Analysis
Kerry’s Blog Post about Activity 02 – “My First Data Set Part 2: The Fun Stuff”
Blog post about Data Camp – “The Data Science Journey Begins”
Kerry’s blog post “Getting Started in Data Science: A Beginner’s Perspective”
Kerry’s Blog “The Data Logs”
@kerry_benjamin1 on Twitter
4) Anthony Peña
molecular biology
biotechnology
Data Science Learning Club Activity 07: K-Means Clustering
Anthony’s results for Activity 07
The first activity involved setting up a development environment. Some people are using R, some using python, and there are several different development tools represented. In this thread, several people posted what setup they were using. I posted a “hello world” program and the code to output the package versions.
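A "hello world" plus version report along those lines can be sketched as follows. This is a minimal illustration, not the code posted in the thread; the helper name `package_versions` and the list of package names are assumptions chosen for the example.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Hello, world!")
print("Python", platform.python_version())

# Hypothetical package list; swap in whatever your own setup uses.
for name, version in package_versions(["numpy", "pandas", "matplotlib"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Posting output like this alongside your results makes it easier for others to reproduce what you ran.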
Activities 1-3 built upon one another to explore a dataset and generate descriptive statistics and visuals, culminating with a business Q&A:
I analyzed a subset of data from the eBird bird observation dataset from Cornell Ornithology for these activities. Some highlights included:
– Learning how to use the pandas python package to explore a dataset (code)
– Learning how to create cool exploratory visuals in Seaborn and Tableau. Here is an example scatterplot matrix made in Seaborn:

– I was most excited to learn how to build interactive Jupyter Notebook inputs, which I used to control Bokeh data visualizations to display Ruby-Throated Hummingbird migration into North America (notebook). Unfortunately, until I host them on a server where you can run the “live” version, you won’t be able to see the interactive widgets (a slider and dynamic dropdowns), but you can see a video of the slider working here:
Here’s my final output for Activity 3, a Jupyter Notebook (with code hidden, and unfortunately interactive widgets disabled) with the Q&A about the hummingbird migration:
Ruby-Throated Hummingbird Migration into North America
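The kind of pandas exploration mentioned in the highlights above can be sketched roughly like this. The rows are made-up toy values, not real eBird records, and the column names and the `summarize` helper are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy observations; these are NOT real eBird records.
observations = pd.DataFrame(
    {
        "species": [
            "Ruby-throated Hummingbird",
            "Ruby-throated Hummingbird",
            "American Robin",
            "American Robin",
        ],
        "state": ["TX", "VA", "VA", "NY"],
        "month": [3, 4, 4, 5],
        "count": [2, 5, 12, 7],
    }
)


def summarize(df):
    """Return total birds counted and earliest month observed for each species."""
    return (
        df.groupby("species")
        .agg(total_count=("count", "sum"), first_month=("month", "min"))
        .sort_index()
    )


summary = summarize(observations)

# Descriptive statistics for the numeric columns, then the per-species summary.
print(observations.describe())
print(summary)
```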

Activity 4 was built as a catch-up week for those of us who were behind, but it also included some ideas for math concepts to learn for those who had time.
We’re currently working on Activity 5, our first machine learning activity where we’re implementing Naive Bayes Classification.
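For anyone who wants a feel for the technique before starting, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not the activity's reference solution; the toy messages, labels, and function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: fraction of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data.
docs = [
    (["free", "prize", "now"], "spam"),
    (["win", "free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train(docs)
print(predict(model, ["free", "money"]))
print(predict(model, ["lunch", "at", "noon"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.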
All of my work is available in this github repository: https://github.com/paix120/DataScienceLearningClubActivities
I strongly encourage you to click through the forums and look at some of the other data explorations the members have been doing, including analysis of NFL data, personal music listening habits, transportation in London, German Soccer League data, top-grossing movies, and more!
It’s never too late to join the Data Science Learning Club! If you aren’t sure where to start, check out the welcome message for some clarification.
I’ll post again when I complete some of the machine learning activities!
Podcast Video Playlist:
Youtube playlist of interview videos
More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Learning Club Activity 5: Naive Bayes Classification
Data Science Learning Club Meet & Greet
Resources/topics mentioned by Clare in the interview:
Management Science and Engineering
Markov Chains
Science, Technology, and Society at Stanford
A Challenge to Data Scientists (blog post Renee mentioned)
Mattermark
Product Management
Machine Learning
Open Source Data Science Masters
Nate Silver’s book The Signal and the Noise
Linear Algebra (on Khan Academy)
Bill Howe’s Introduction to Data Science Coursera Course
Recurrent Neural Nets
Bayesian Networks
Open Source Data Science Masters on GitHub (pull requests welcome!)
summer.ai (Update 2/15 – Clare’s company is now Luminant Data, Inc.)
@ClareCorthell on twitter
Other links:
SlideShare Slides about Open Source Data Science Masters
Talk Clare gave at Wrangle Conference about AI Design for Humans
In Episode 2 of the Becoming a Data Scientist Podcast, we meet Safia Abdalla, who started programming and even exploring machine learning and natural language processing as a teenager, and is now a student at Northwestern University, a conference speaker and trainer, co-organizer of PyLadies Chicago, and a contributor to Project Jupyter.
Podcast Audio Links:
Link to podcast Episode 2 audio
Podcast’s RSS feed for podcast subscription apps
(I will distribute the feed out to iTunes and Pocket Cast ASAP. It’s available on Stitcher now!)
Podcast Video Playlist:
Youtube playlist where I’ll publish future videos
More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Learning Club Activity 2: Creating visuals for exploratory data analysis
Data Science Learning Club Meet & Greet
Here are the links to things Safia references in the video:
information retrieval
Introduction to Information Retrieval by C. D. Manning, P. Raghavan, H. Schütze
natural language processing
NLTK
machine learning
Northwestern Neuroscience and Robotics Lab
pyladies
Chicago PyLadies Meetups
mathematicalmonk’s YouTube series on machine learning
@captainsafia on twitter
Safia’s website
Safia’s blog
JupyterDay Chicago 2016 (post by Safia on jupyter.org)
Jupyter documentation
In this episode we meet Will Kurt, who talks about his path from English & Literature and Library & Information Science degrees to becoming the Lead Data Scientist at KISSmetrics. He also tells us about his probability blog, Count Bayesie, and I introduce Data Science Learning Club Activity 1. Will has some great advice for people learning data science!
Podcast Audio Links:
Link to podcast Episode 1 audio
Podcast’s RSS feed for podcast subscription apps
(I will distribute the feed out to sites like iTunes and Stitcher this week)
Podcast Video Playlist:
Youtube playlist where I’ll publish future videos
More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Learning Club Activity 1: Find and explore a dataset
Data Science Learning Club Meet & Greet
Here are the links to things Will references in the video:
Library and Information Science
Andrew Ng’s Machine Learning course on Coursera
probabilistic graphical models
Count Bayesie blog
Count Bayesie – Parameter Estimation and Hypothesis Testing
Donald Knuth
Literate programming
Claude Shannon’s Mathematical Theory of Communication
Count Bayesie – Measure Theory
Bayes’ Theorem with Lego
Voight-Kampff and Bayes Factor
Black Friday Puzzle – Markov Chains
Zen Buddhism concept of “beginner’s mind”
Count Bayesie Recommended Books on Probability and Statistics
In this episode, I talk a little about the podcast, I talk about my own background, and I introduce the Data Science Learning Club. Enjoy!
(Note: Episode 1, the first interview episode, comes out Monday 12/21!)
Podcast Audio Links:
Link to podcast Episode 0 audio
Podcast’s RSS feed for podcast subscription apps
(I will distribute this out to sites like iTunes and Stitcher soon)
Podcast Video Playlist:
Youtube playlist where I’ll publish future videos
More about the Data Science Learning Club:
Blog post about Data Science Learning Club
Learning Club Activity 0: Set up your development environment
Data Science Learning Club Meet & Greet
Here are links with more info on the things I reference in the video:
turtle logo programming language
carmen sandiego
lemmings
SimCity
JMU Integrated Science and Technology (ISAT)
Visual Basic/VB.NET/ASP.NET
MS Access
PL/SQL
Oracle Data Warehouse
IBM Cognos
CGEP UVA Systems Engineering
Systems Engineering
Linear Algebra at Khan Academy
Stochastic Simulation
Optimization
Cognitive Systems Engineering
Principles of Data Visualization for Exploratory Data Analysis
Machine Learning
Naive Bayes
K-Means
Pattern Recognition and Machine Learning (class textbook)
Summer of Data Science
API and Market Basket Analysis
Jupyter
Docker and Jupyter
Doing Data Science by Cathy O’Neil and Rachel Schutt
O’Reilly Data Science Books
(I’ll post more specific books later)
At the end of each podcast episode, I’ll be “assigning” a “Learning Activity” for the Data Science Learning Club. So that is starting tomorrow, too! There won’t be anyone teaching the content, but we’ll be exploring it together for 1-2 weeks between podcast episodes (usually 2 weeks). I’ll post some resources to get everyone started and help out data science beginners, then we’ll each explore the activity on our own with whatever tools and techniques we choose, and we can post our results so we can all learn from one another. If anyone gets stuck, you can post a question to the forum and hopefully someone will be able to help you through it.
I just got the Data Science Learning Club forum set up today, and it’s at this URL: https://www.becomingadatascientist.com/learningclub
Go check it out, register so you can participate, read the Welcome thread, and introduce yourself in the Meet & Greet section! Then tomorrow, the first learning activity will launch and you can get started.
I’m so excited about launching this podcast and data science learning club, and hope this turns out to be a valuable experience for all of us! Keep an eye out on the blog for the podcast post, which should go up tomorrow!
Renee