Activity: Web Scraping & APIs - Activity Description & Resources
#1
I'll update this with more info later, but in the meantime, here are some learning resources!

Web Scraping:
Web Scraping (Wikipedia)
What is Web Scraping? video
Web Scraping Tutorial video (Python)
Web Scraping with R video
Chapter on Web Scraping in the book "Automate the Boring Stuff with Python"
Beautiful Soup Python library
rvest R library for web scraping by Hadley Wickham on GitHub
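
To give a feel for what the tutorials above cover, here is a minimal sketch of the core scraping step: pulling the links out of a page's HTML. It uses only Python's standard library so it runs without installing anything; Beautiful Soup would likely make the same task shorter. The HTML snippet, the example.com URLs, and the names LinkCollector and extract_links are made up for illustration, and a real scraper would download the page first.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Learning resources</h1>
  <ul>
    <li><a href="https://example.com/scraping">Scraping tutorial</a></li>
    <li><a href="https://example.com/apis">API tutorial</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(f"{text}: {href}")
```

The same pattern (find the tags you care about, then read their attributes and text) carries over to Beautiful Soup and rvest, just with less boilerplate.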

APIs:
API (Wikipedia)
REST API concepts and examples video
Becoming a Data Scientist posts related to APIs
An Introduction to APIs on Zapier
Importing Data into R DataCamp course (final chapter includes HTTP requests)
requests HTTP Python library
Working with APIs on R-bloggers
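
As a rough illustration of what working with a REST API usually involves, the sketch below shows the two steps most of these resources walk through: building a request URL with query parameters, and parsing the JSON that comes back. The endpoint, the parameter names, and the response body are all hypothetical, and nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/search"  # hypothetical endpoint


def build_request_url(base_url, params):
    """Return the full URL for a GET request with query parameters."""
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_body):
    """Pull the 'title' field out of each item in a JSON response."""
    payload = json.loads(response_body)
    return [item["title"] for item in payload["results"]]


url = build_request_url(BASE_URL, {"q": "web scraping", "limit": 2})

# Made-up response body standing in for what a server would send back.
SAMPLE_RESPONSE = '{"results": [{"title": "Intro to APIs"}, {"title": "REST basics"}]}'
titles = parse_titles(SAMPLE_RESPONSE)

print(url)
print(titles)
```

With the requests library listed above, `requests.get(BASE_URL, params=...)` followed by `.json()` on the response covers the same two steps against a live API.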
The Becoming a Data Scientist Podcast Data Science Learning Club is now sponsored by DataCamp. See this thread for more info and a coupon (must be logged in to view).
#2
Really great activity, I like web scraping! :D
I have a few links to share too:
Mashape - API Marketplace
Programmable Web API Directory
The blog thinktostart.com has several tutorials on analyzing different platforms via their APIs. I think the Twitter API would also be very interesting.

Since you mentioned the package rvest, I hope it's OK to link to my blog post "Use rvest to scrape NFL weather data".
Also my blog post "Finding data sets Part 1: General data sources", because I mention APIs and web scraping there.

For AJAX-driven websites (which load their data with JavaScript), it might be necessary to use something like http://phantomjs.org/ to scrape the data.
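
To show why a plain download often isn't enough for such sites, here is a small sketch with made-up markup and data. The HTML as downloaded contains only an empty container, while the actual rows arrive separately as JSON. Note that this illustrates a different technique from the headless-browser approach: when a site exposes such a JSON request (visible in the browser's network tab), it can sometimes be read directly instead of rendering the page. Whether that works depends entirely on the site, so treat it as an assumption to check, not a general recipe.

```python
import json
import re

# What a plain HTTP download of an AJAX page often looks like: an empty
# container that JavaScript fills in later. (Made-up markup.)
STATIC_HTML = '<div id="scores"></div><script src="/static/app.js"></script>'

# What the page's JavaScript might fetch in the background. (Made-up data.)
AJAX_RESPONSE = '{"scores": [{"team": "A", "points": 21}, {"team": "B", "points": 17}]}'


def rows_in_static_html(html):
    """Count the table rows present in the HTML exactly as downloaded."""
    return len(re.findall(r"<tr\b", html))


def rows_from_json(body):
    """Read (team, points) pairs from the JSON the page would load."""
    payload = json.loads(body)
    return [(row["team"], row["points"]) for row in payload["scores"]]


static_row_count = rows_in_static_html(STATIC_HTML)
json_rows = rows_from_json(AJAX_RESPONSE)

print("rows in static HTML:", static_row_count)
print("rows in JSON response:", json_rows)
```

If no such JSON request can be found, a tool that actually runs the page's JavaScript is the fallback.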

If anyone has questions regarding web scraping I might be able to help.
You can follow my learning club progress and get R tips here.
#3
Awesome, thanks for sharing these additional links, Verena!
#4
Just an FYI to anyone interested in the LinkedIn with R activity: the Rlinkedin R package associated with the posting has since lost most of its capabilities (i.e. most functions no longer work) due to changes in LinkedIn's API as of May 2015. See the author's GitHub page for Rlinkedin here.

I initially tried this activity, and the most functionality I could get from the package was accessing my basic profile info (current position, jobs held, education, etc.).

Thanks for all the additional ideas, however, Verena!
#5
Also, I just read about this today on Kaggle, and it may be of interest to others:
2015 NFL Play-by-Play Dataset (nflscrapR R package)

Description:
The dataset made available on Kaggle contains all the regular season plays from the 2015-2016 NFL season. The dataset contains 46,129 rows and 63 columns. Each play is broken down in great detail, with information on game situation, players involved, and results. Detailed information about the dataset can be found in the nflscrapR documentation.
Detailed NFL Play-by-Play Data 2015
#6
Thanks for the hint, that's definitely interesting for me!
#7
Here's some code with pandas for the Twitter API:

https://www.opendatascience.com/blog/usi...-245843805

About Becoming A Data Scientist

BecomingADataScientist.com is a blog created by Renee Teate to track her path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist". She created this club so participants can work together and help one another learn data science. See her other site DataSciGuide for more learning resources.

Sponsored by DataCamp!