learning data science – Becoming A Data Scientist
https://www.becomingadatascientist.com
Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist"
Sat, 05 Oct 2019 04:22:55 +0000

Summer of Data Science Goal-Setting
https://www.becomingadatascientist.com/2018/06/06/summer-of-data-science-goal-setting/
Wed, 06 Jun 2018 06:14:49 +0000

The purpose of the Summer of Data Science is to learn a specific topic or complete a project or read a book or finish a course so you can check something off of your long data science “to learn” list (get used to it being long, data scientists always have more to learn, so it never gets shorter!), and have fun achieving goals along with other data science learners during a fixed period of time. The deadline should be motivating, to get you to start and finish something before the summer is over.

Week 1 was all about brainstorming ideas and gathering resources – dreaming up what you’d love to learn, and finding content that will help you learn it.

Week 2 (which started yesterday, but don’t worry, jump in any time even if you see this blog post a month from now) is all about goal-setting.

You should set a #SoDS18 goal that’s lofty enough to excite and motivate you, but not so out of reach that you’ll never complete it and only get disheartened when halfway through the summer you realize you are only 10% of the way there.

I also want to make sure you know what makes a good goal. I like the definition used by the SMART approach:

Your goal should be

  • Specific
  • Measurable
  • Achievable
  • Relevant
  • Time-Bound

Instead of explaining each of these in detail (you can read more about it elsewhere on the internet), I’m going to give an example of things you can jot down for yourself for each of these, then an example summary tweet for 2 different #SoDS18 goals.

Let’s say the idea you had for what to learn this summer is “Start learning Python”, and the resource you found is DataQuest. Let’s turn that into a SMART goal:

Specific – Learn how to import, clean, and visualize data using python and pandas

Measurable – Complete all courses in the DataQuest Data Scientist Path

Achievable – I can spend at least 6 hours on this project every weekend, plus occasional weekday evenings, so I have enough time available to do the work [Note from Renee: I have not actually researched how long this course series would take to complete]. I have joined the #py4ds Slack community and will ask for help there and on DataQuest if I get stuck so I don’t get set far behind.

Relevant – I want to add python and pandas to my resume, and it’s my first step on my new path to becoming a data scientist, so it’s relevant to my career goals and I’m motivated to accomplish it.

Time-Bound – the Summer of Data Science ends on September 3, so I will finish this first goal by August 3 in order to have time to complete a small project during the last month of #SoDS18.

Example tweet to share this goal with the world:

My 1st #SoDS18 goal: I will learn to import, clean, and visualize data with python & pandas by spending 6-8 hours per week on the Data Scientist Path on DataQuest, and will complete it by August 3. I’ll ask in #py4ds Slack if I need help.

Or, say your idea is to “do a machine learning project using at least 2 different algorithms on some kind of dataset that could help people”. That can be converted to a SMART goal like:

Specific – Learn how to use random forest and logistic regression in R by experimenting with data from the Kaggle DonorsChoose.org Dataset to develop a list of donors to email about a particular type of project request

Measurable – I will complete exploratory data analysis on the available DonorsChoose data files and write a blog post about my findings that includes at least 3 visualizations. Then I will find out what it means to submit a Kaggle Kernel, build 2 machine learning models using random forest and logistic regression algorithms and compare their model evaluation metrics to each other, submit the Kernel (even if the contest period is over), and find and study at least 2 other people’s submissions to understand different approaches to the problem. Then I will write another blog post summarizing my results and findings.

Achievable – I have read about random forest and logistic regression online, and my friend gave me the Introduction to Statistical Learning book so I can better understand these machine learning algorithms. I have a bunch of resources bookmarked online in case I need extra references to understand the book. I will tweet using the #rstats hashtag or talk to my friend if I need help. If I find out the dataset I found isn’t great for learning these 2 algorithms, I will search for another dataset as needed. I can dedicate 2 hours a day 4 days per week to working on the project and researching these topics.

Relevant – I started learning R over the last year and have used it to complete labs at school, but want to expand my machine learning capabilities and apply my skills to a real-world dataset before I start applying for jobs in the fall.

Time-Bound – I have 12 weeks to complete the project this summer.
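
If you want to sanity-check the Achievable and Time-Bound parts together, a quick back-of-the-envelope calculation can help. The sketch below only multiplies out the numbers from this example (2 hours a day, 4 days per week, 12 weeks). The function name and the 25% buffer for getting stuck or switching datasets are assumptions for illustration, not part of the SMART approach.

```python
# Rough time budget for the example goal above (a sketch, not a planning tool).
# Inputs come from the Achievable and Time-Bound notes; the function name and
# the 25% buffer are illustrative assumptions.

def total_hours(hours_per_session, sessions_per_week, weeks):
    """Return the total hours available over the whole period."""
    return hours_per_session * sessions_per_week * weeks


available = total_hours(hours_per_session=2, sessions_per_week=4, weeks=12)

# Assumed slack for getting stuck, asking for help, or switching datasets.
buffer = 0.25 * available
plannable = available - buffer

print(f"Available: {available:.0f} hours")
print(f"Plan for:  {plannable:.0f} hours")
```

If the work you listed under Measurable looks like it needs far more hours than the "plan for" number, that's a hint to trim the goal or find more time before the summer starts slipping away.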

Example tweet

My #SoDS18 goals are to:
-explore the DonorsChoose Kaggle dataset
-use ISL book & online resources to learn to build random forest and logistic regression models
-create and submit a Kaggle Kernel to help DonorsChoose
-write at least 2 blog posts about it over the next 12 weeks

I think you get the idea!

I should also mention that you don’t want to over-plan. Notice the note about switching datasets if one doesn’t work out – plan to be flexible! You don’t yet know what you’re getting into, and you might need to spend more time finding good resources to learn, getting help, or pivoting if your original plan doesn’t work out. That’s OK! Just go with the flow and try to achieve something comparable to your initial goal. But, you need an initial goal in order to figure out where you are relative to it!

So, finish brainstorming your learning ideas and finding resources this week, then narrow it down to a SMART goal, and tweet about it with the #SoDS18 hashtag so we know what you plan to learn during the Summer of Data Science 2018!

And if you’re still looking for project ideas, check out Mara Averick’s post, browse the #SoDS18 hashtag, or join a data science learning community! (More about this in another blog post later this week!)

 

T-Shirts!!
https://www.becomingadatascientist.com/2017/02/18/t-shirts/
Sat, 18 Feb 2017 20:55:46 +0000

The Becoming a Data Scientist tees are ready to sell! I ordered a couple myself before posting them for sale, to make sure the quality was good. They came out great!! And if you order from Teespring before April 1, 2017 using this link: Becoming a Data Scientist Store – Free Shipping, you’ll get free shipping on your order!

(Readers told me that the link above doesn’t discount at all for International shipping, so if you are outside the US, use this link for $3.99 off – equivalent to US Shipping cost)

[Image: combined shirt design]

The design is a combination of those submitted to our contest by Amarendranath “Amar” Reddy and Ryne & Alexis. You can see their design submissions and read more about them on the finalists post! They are each receiving prizes for being selected. Thanks Amar, Ryne, and Alexis for the awesome design!

There are a variety of styles and colors available. The Premium Tee is 100% cotton. The Women’s Premium is a 50/50 cotton/poly blend, and is cut to fit more snugly. They are available in navy blue, gray, purple, and black. There’s even a long-sleeve version!

I make anywhere from $2-$7 on each order (it’s print-on-demand, so not cheap enough for me to make a significant profit yet, and my proceeds will be lower with the free shipping offer, but I want to reward those of you who are excited to flaunt your Becoming a Data Scientist status!) and every dollar earned from these will be going to the fund that helps support my new small team of assistants, who you’ll meet soon! Also, the more of them I sell, the lower the cost to print is per shirt, so please share with all of your friends!

[Two photos of the shirt being worn]
Here are photos of me wearing the shirt, but these were taken before I made the front design slightly smaller (so it doesn’t wrap into the armpit), moved the back design slightly higher, and made the gray dots (data points?) transparent so the color of the shirt shows through there (see store images above for current design). You can see that the teal came out as a lighter blue in printing. This is the “Premium Tee” style in “New Navy”.

Here’s a model wearing a simulated version of the shirt.
[Image: model wearing a simulated version of the shirt]

Order yours here, with Free Shipping Until March 1!

Update: Kids sizes now available, too!
(the design is on the front for kids’ shirts)
[Image: kids’ shirt with the design on the front]

Becoming a Data Scientist Podcast Episode 13: Debbie Berebichez
https://www.becomingadatascientist.com/2016/07/14/becoming-a-data-scientist-podcast-episode-13-debbie-berebichez/
Fri, 15 Jul 2016 02:52:29 +0000

In this interview, we meet physicist Debbie Berebichez, who you might recognize from her TEDx talks, her appearances in Discovery Channel’s Outrageous Acts of Science and other TV shows! Debbie grew up in Mexico City and was discouraged by her family and teachers from studying science, but later went on to become the first Mexican woman to get a PhD in physics from Stanford, and is now Chief Data Scientist at Metis Data Science Bootcamp in New York.

Podcast Audio Links:
Link to podcast Episode 13 audio
Podcast’s RSS feed for podcast subscription apps
Podcast on Stitcher
Podcast on iTunes

Podcast Video Playlist:
YouTube playlist of interview videos

More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Learning Club Activity 13: Show & Tell
Data Science Learning Club Meet & Greet

Links to topics mentioned by Debbie in the interview:
Metis Data Science Training
[more coming soon]

Becoming a Data Scientist Podcast Episode 05: Clare Corthell
https://www.becomingadatascientist.com/2016/02/14/becoming-a-data-scientist-podcast-episode-05-clare-corthell/
Mon, 15 Feb 2016 04:13:03 +0000

Renee Teate interviews Clare Corthell, founding partner of summer.ai (now Luminant Data) and creator of the Open Source Data Science Masters curriculum, about becoming a data scientist.

Podcast Audio Links:
Link to podcast Episode 5 audio
Podcast’s RSS feed for podcast subscription apps
Podcast on Stitcher
Podcast on iTunes

Podcast Video Playlist:
YouTube playlist of interview videos

More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Learning Club Activity 5: Naive Bayes Classification
Data Science Learning Club Meet & Greet

Resources/topics mentioned by Clare in the interview:

Management Science and Engineering
Markov Chains
Science, Technology, and Society at Stanford

A Challenge to Data Scientists (blog post Renee mentioned)

Mattermark
Product Management
Machine Learning

Open Source Data Science Masters
Nate Silver’s book The Signal and the Noise

Linear Algebra (on Khan Academy)

Bill Howe’s Introduction to Data Science Coursera Course

Recurrent Neural Nets
Bayesian Networks

python

Google Prediction API

data cleaning

Open Source Data Science Masters on GitHub (pull requests welcome!)

summer.ai (Update 2/15 – Clare’s company is now Luminant Data, Inc.)
@ClareCorthell on twitter

Other links:

SlideShare Slides about Open Source Data Science Masters

Talk Clare gave at Wrangle Conference about AI Design for Humans

Becoming a Data Scientist Podcast Episode 03: Shlomo Argamon
https://www.becomingadatascientist.com/2016/01/18/becoming-a-data-scientist-podcast-episode-03-shlomo-argamon/
Mon, 18 Jan 2016 06:08:17 +0000

Note: The video is the interview only. The audio podcast has the intro, interview, and data science learning club activity explanation.

In Episode 3 of the Becoming a Data Scientist Podcast, we meet Shlomo Argamon, who is the founding director of the Master of Data Science program at Illinois Institute of Technology. He talks to us about his path to data science, including research in robotic vision and natural language processing, we discuss the traits of a good data science student, and he gives some advice for those of us learning data science.

Podcast Audio Links:
Link to podcast Episode 3 audio
Podcast’s RSS feed for podcast subscription apps
Podcast on Stitcher
Update 1/19: You should be able to find it on iTunes now!

Podcast Video Playlist:
YouTube playlist of interview videos

More about the Data Science Learning Club:
Data Science Learning Club Welcome Message
Learning Club Activity 3: Business Questions and Communicating Data Answers [to be updated Monday]
Data Science Learning Club Meet & Greet

Here are the links to things Shlomo references in the video:

Illinois Institute of Technology – Professional Master of Data Science Degree

punchcards

machine vision
robotic mapping
Google Scholar Search for Shlomo Argamon’s publications related to robotics
“Passive map learning and visual place recognition” Doctoral Dissertation [ps.gz from yale]

probability theory
probability distributions
statistical inference
bayesian statistics

Kaggle competitions

Natural Language Processing (NLP)
Google Scholar Search for Shlomo Argamon’s publications related to language
“Automatically Categorizing Written Texts by Author Gender” [Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni]

Weka
scikit-learn
Natural Language Toolkit (nltk)

sentiment analysis

Ethics in Data Science at IIT
Becoming a Data Scientist – A Challenge to Data Scientists (re: bias)

@ShlomoArgamon on Twitter
