Recorded at Tom Tom Fest Applied Machine Learning Conference in Charlottesville, VA on April 11, 2019.
Episode 17 Audio
@therriaultphd on twitter
Data and Democracy
This O’Reilly ebook by Andrew Therriault explores how political data science helps to drive everything from overall strategy and messaging to individual voter contacts and advertising.
Data Security for Data Scientists by Andrew Therriault on Medium
Ten practical tips for protecting your data (and more importantly, everyone else’s!)
World premiere
#becomingadatascientist podcast by the famous and fabulous, with guest, Andrew Therriault of @facebook @BecomingDataSci #AMLCville #datascience #machinelearning @UVADSI @TomTomFest pic.twitter.com/2EwjsUwULL
— Data Science Connect (@DataScienceATL) April 11, 2019
Our first ever live audience for Becoming a Data Scientist podcast at #AMLCville! With @therriaultphd! pic.twitter.com/JA1RpiKq0u
— Data Science Renee (@BecomingDataSci) April 11, 2019
Here is the link to the previous post, which has a pdf version of the slides that’s almost identical, and a video from when I gave this talk at Tom Tom Fest Applied Machine Learning Conference in 2018.
Here’s the blog post that was the start of researching this topic, for me.
Here’s the twitter list of people who talk about Ethics & Law in AI/ML.
And here’s the flipboard magazine where I collect articles on this topic.
Enjoy!
I plan to come back and add more info here in the future, but for now, here is a PDF version of my slides:
My Journey from Advancement Data Analyst to Data Scientist
My interest in this topic started long ago, and I partially based this talk on my blog post “A Challenge to Data Scientists” from 2015. There are a ton of links throughout, and I included the slide notes so you have those along with the presentation. (I’m not sure why all of the URLs aren’t automatically linked, so you have to copy and paste some, sorry.) I’m prepping for another presentation right now and don’t have time to write a whole lot about it – so without further ado, here are the PDF files with the slides and the notes:
Just noticed the link above still doesn’t contain all of the notes and links… I need to figure out how to save that to PDF in the right format from PowerPoint. For now, here’s the full slide + notes view with all links. Just don’t print it – it’s almost 100 pages long!
SLIDES WITH NOTES & LINKS
Update: There’s a video on YouTube of this talk and the panel that followed!
Here’s a video of me explaining the analysis:
A few notes as I skim through:
Here are all of the episodes, so you can go back and listen to any you missed!
You can download the HTML versions of my Jupyter notebooks, and also play with the Tableau dashboards at these links:
“Clean” version of the Jupyter notebook
Full messy analysis Jupyter notebook
Listen monitoring Tableau dashboard
Interactive episodes by week Tableau dashboard
If you have suggestions for writing the code in a more sensible way than my rushed version, or if you have any questions, feel free to add them in the comments below!
Episode Audio (mp3) – also available on iTunes, Stitcher, etc.
(note, there is no video for this episode)
On the panel:
The talk was recorded and video should be out within a few weeks!
Here are the slides: Becoming a Data Scientist – Advice from my Podcast Guests
and the slide notes.
Update 10/26: Here is the recording of my talk, with a playlist of other talks from PyData DC!
In this episode, I talk a little about the podcast, I talk about my own background, and I introduce the Data Science Learning Club. Enjoy!
(Note: Episode 1, the first interview episode, comes out Monday 12/21!)
Podcast Audio Links:
Link to podcast Episode 0 audio
Podcast’s RSS feed for podcast subscription apps
(I will distribute this out to sites like iTunes and Stitcher soon)
Podcast Video Playlist:
Youtube playlist where I’ll publish future videos
More about the Data Science Learning Club:
Blog post about Data Science Learning Club
Learning Club Activity 0: Set up your development environment
Data Science Learning Club Meet & Greet
Here are links with more info on the things I reference in the video:
Turtle Logo programming language
Carmen Sandiego
Lemmings
SimCity
JMU Integrated Science and Technology (ISAT)
Visual Basic/VB.NET/ASP.NET
MS Access
PL/SQL
Oracle Data Warehouse
IBM Cognos
CGEP UVA Systems Engineering
Systems Engineering
Linear Algebra at Khan Academy
Stochastic Simulation
Optimization
Cognitive Systems Engineering
Principles of Data Visualization for Exploratory Data Analysis
Machine Learning
Naive Bayes
K-Means
Pattern Recognition and Machine Learning (class textbook)
Summer of Data Science
API and Market Basket Analysis
Jupyter
Docker and Jupyter
Doing Data Science by Cathy O’Neil and Rachel Schutt
O’Reilly Data Science Books
(I’ll post more specific books later)
Thanks to Orlando and Herbierto for having me on!
(P.S. I did put up the post about Data Sources on DataSciGuide)
Check it out! https://flipboard.com/@becomingdatasci/becoming-a-data-scientist-5ktft1lky
Principles of Data Visualization for Exploratory Data Analysis [presentation – pdf]
Principles of Data Visualization for Exploratory Data Analysis [paper – pdf]
Check out the references in both documents for some good resources. I’ll include some links in the post below, too. I had a lot more material from my research that I wanted to include and just didn’t have time to in a 15-minute presentation! The professor was happy about the topic I picked because she’s teaching a class on Data Visualization next semester, so I think that worked out in my favor :)
These two books by Stephen Few covered the very basics of visualization for human perception:
Show Me the Numbers: Designing Tables and Graphs to Enlighten
Now You See It: Simple Visualization Techniques for Quantitative Analysis
Blog posts about related topics:
Six Revisions: Gestalt Laws
eagereyes: Illustration vs Visualization
Detailed visualization of NBA shot selection
Publications and articles:
IEEE Transactions on Visualization and Computer Graphics
Toward a Perceptual Science of Multidimensional Data Visualization: Bertin and Beyond by Marc Green, Ph. D.
Scagnostics by Dang and Wilkinson
Generalized Plot Matrix (GPLOM) by Im, McGuffin, Leung
UpSet: Visualization of Intersecting Sets by Lex, Gehlenborg, Strobelt, et al.
…and there are more resources in the paper and presentation files! (and if you’re REALLY interested in this topic, post a comment and I will add even more links I have bookmarked)
I also did a project using some data from my day job related to university fundraising and major gift prospects, but unfortunately I can’t share that study here because I don’t have permission to do so. It included some cool visuals like bubble charts, and also an interesting analysis of movement through the prospect pipeline using Markov Chains. I learned a lot doing that one!
It was nice to end my final semester of grad school with two data-related projects! (Yes, I’m finally graduating! Masters of Systems Engineering! woo hoo!)
This podcast about machine learning is educational and, though academic, pretty accessible to anyone interested in learning more about the field, even newcomers. It is executive produced by Katherine Gorman, who co-hosts it with Ryan Adams, an Assistant Professor of Computer Science at Harvard.
They start out by interviewing attendees and presenters from the NIPS (Neural Information Processing Systems) conference, including Hannah Wallach and Max Welling, among others.
Today, I listened to this episode with Charles Sutton, who covered some interesting topics such as using machine learning and natural language processing on computer code for tasks such as understanding how different programmers involved in open source projects name variables, and suggesting naming conventions to new project participants. The larger goals of the research were really interesting to me, and Charles Sutton was really clear and easy to understand, even though he was touching on some heavy concepts.
I also like how the show answers questions submitted by listeners each episode.
This show has two director/developer/data scientists from Ushahidi, Chris Albon and Jonathan Morgan, who talk about recent data science items in the news, and chat about the implications and add their opinions. I like that they link to the news articles on the podcast site so you can read up on what they’re referring to.
Honestly, I didn’t enjoy this one as much as I did the others. The episode I listened to started out with each of them explaining what beer and wine they were drinking, and how much they had had, and some inside joking and laughing, which already made me cringe a bit (their latest episode is called “morning drinking edition”), but I wanted to hear them out. They talk about plenty of interesting topics, but I had already read most of the articles they referred to via twitter. They had some good insights, including a discussion about Uber and data collection (and data selling) by companies in general, and some interesting food for thought about what all of that means for us in the future, but overall, I found their “bro-y” banter a bit annoying.
However, if you don’t get a chance to keep up with data science in the news, or you enjoy feeling like you’re hanging out with some guys from school drinking and chatting about data science topics, then definitely give it a listen. I imagine a lot of people would like their style more than I did – just not my thing.
TED Radio Hour is an NPR production where they take TED talks and group them by topic, then Guy Raz interviews the speakers and basically refactors the talks so they tell an overarching story as a group and sound good on radio. The episode I wanted to point you to is “Solve for X” because it made me think about math in a fun way, and they do incorporate some talks about machine learning algorithms into this one as well. This podcast is one of my go-tos when I want to learn something interesting that is presented in a fun and curious way.
Invisibilia isn’t a tech podcast, but does sometimes talk about technology. The episode “Our Computers, Ourselves” tells the stories of some interesting people who “let technology go to their heads”, I guess you can say. The only thing I don’t like about the show is that sometimes I think the intelligent co-hosts Alix and Lulu purposely make themselves come across ditzier than necessary, but overall I highly recommend checking this one out. Also take a listen to the episode “How to Become Batman”, which might change your mind about how expectations impact outcomes, even in scientific research.
Last but not least, I happened upon NPR’s Snap Judgment podcast with Glynn Washington because it had an episode called “Artificial Intelligence” that came up in a search. It turned out not to be about AI in the sense that developers think about AI, but it has some great storytelling involving human-computer interactions, and really made me think about the human side of all of this technology we are creating. This is a fun one that also gets deep. Check it out!
Here are some that I have not yet listened to:
Tell me if you listen to any of these and what you think, or if you have any additional recommendations – I still have a few weeks of commuting to school remaining, and would love to learn what you love to listen to!
Update 4/24/15:
I have now listened to an episode of each of the podcasts I linked at the end of the post above, and here’s what I thought:
Data Stories: I listened to the episode where hosts Enrico Bertini (who commented on this post!) and Moritz Stefaner interviewed Jen Christiansen from Scientific American. She shared information about her history designing information visualizations for different publications and leading design teams, and detailed the process of creating visuals for Scientific American, which means balancing between making the graphics accessible to a general audience, while also satisfying the scientific readers. Really interesting! I will be listening to more Data Stories in the future.
Linear Digressions: This is the one from Udacity with Katie Malone and Ben Jaffe, and I listened to the 2 episodes about Hidden Markov Models, where they invited a guest (who was a listener!) to explain his work with HMMs. It was really interesting to learn about, with the right balance of technical info and accessibility for beginners, and I will listen to more episodes! The only issue I had with this one was the sound quality. I was surprised it was really hard to hear the guest at times, since they were supposedly in the studio of a company that produces online videos. So, my only suggestion to them would be to work on the volume levels and audio in general to make it more professional sounding.
The Data Skeptic: I listened to the episode about Computer-Based Personality Judgments, where Kyle Polich interviewed Youyou Wu about her research into predicting personality traits using Facebook likes, then he and his wife Linhda responded to the “magic sauce” results based on their own profiles. It was an interesting topic, and I really like how they ask the guest to suggest further reading of their own and others’ work, and then link to everything on their blog. I had a hard time finding info like Kyle’s last name, so I hope they put more info on their website’s home page and make the site a little more inviting, because the podcast itself seems very accessible and fun!
Additionally, I found two more podcasts on Matt Fogel’s post about data science and machine learning podcasts:
Learning Machines 101: I started listening to the episode about “How to Learn Statistical Regularities using MAP and ML Estimation” and honestly I didn’t make it more than 5 minutes into the recording. First, the intro is very long and you don’t get to the content until almost 2 1/2 minutes in. Then it sounds like the host is just lecturing by reading from a piece of paper and over-enunciating, so you feel like you are being “talked at”, and that speaking style was not engaging to me. So, I’m sure it has some interesting content, but I just couldn’t focus on it while driving. Sorry!
I haven’t had a chance yet to listen to the O’Reilly Data Show Podcast with Ben Lorica, but the little I heard sounded promising and I’m going to get back to it when I drive to class (for the last week, yay!) next week, so I’ll update here again later.
So hello, I’m Renee Marie Parilak Teate, and I’m becoming a data scientist :)
I’m about to give a talk/Q&A for our local Girl Develop It! Central Virginia chapter via Google Hangout on March 18 at 7pm EST as part of their “Day in the Life of” series featuring women in technical roles. Here’s the link:
Day in the Life of a Data Analyst PLUS “What is Data Science?” with Renee Teate
I would love for any ladies following this blog to attend and say hi!
Update
I think the meetup went well! (Other than issues with Google Hangouts On Air not allowing as many people as I read it would allow… I need to figure out how to include “viewers” that aren’t in the limited set of “video participants” for the future.) I was nervous and ad-libbing a lot of it, trying to strike a balance between making it understandable for beginners and including some info for the more advanced viewers. It’s tough giving a talk like this for a general audience. I had fun, though! I’ve gotten good feedback from a few attendees, and we also discussed a future meetup focusing on Geospatial/GIS data, so that was exciting!
Here is the PowerPoint from the data science part of my talk: What is Data Science? (PDF)
The books and some of the courses we talked about are listed here, and there are links to the ones I have reviewed: Learning page
For those of you that weren’t able to attend, there is a recording of the meetup here: YouTube
I’ll list more links in the comments as I come across them this week!
I guess they liked the talk, which was 30-40 minutes, because we ran right up to the time another group needed to use the classroom, and several students stayed afterward to ask more questions in the hallway. Their feedback ranged from “I had never really considered data science, but your talk had me interested, and now I want to look into it!” to another student who already won a machine learning competition in his computer science class. Students shared their data project ideas with me, and asked about whether they were on the right track to possibly have a future in data science. One student took a photo of the books I suggested reading, and I hope several more of them come here afterwards to download the slides with all of the references and links. (I wish I had them done far enough in advance to have this post ready!)
If you are here and you attended my talk, please leave any feedback or questions in the comments below! I may be giving it again, so suggestions are encouraged!
Here is a link to the slides (PDF, so no animations, but the links should work): What is Data Science?
And the companion notes with extra info: Companion Notes
I was excited to get to share my love of data, and get the students thinking about “big data”, how much data they generate, privacy issues, the vast possibilities of data analysis and data science, and a possible future in this field!
Students: I have removed my full name and personal email from the slides for publication here, so just leave a message below in the comments if you want to get in touch, and I’ll pass my info on to you individually. (Don’t type your email in the comments, I should be able to see it behind the scenes.)