Scraping extracts data trapped in documents such as PDFs and web pages and turns it into a format you can use for further analysis. Scraping does not have to be complicated, especially if you use the Google Chrome browser, which has extensions that make simple scraping tasks easy. The School of Data has a course that takes you through the ‘5 minute’ process of transforming trapped data into a usable format.
- School of Data: Scraping - the School of Data’s course on scraping data: http://schoolofdata.org/handbook/courses/scraping/. Part of the course explains step by step how to scrape data using the Scraper extension for Google Chrome: http://schoolofdata.org/handbook/recipes/scraper-extension-for-chrome/
- Data Journalism Handbook - the Data Journalism Handbook walks through the process of scraping data, with practical tips, recommended tools and screenshots that illustrate the more complicated steps: http://datajournalismhandbook.org/1.0/en/getting_data_3.html
- ScraperWiki - lets you write scrapers in a variety of programming languages without setting up a programming environment on your own computer, which can save a lot of time: https://scraperwiki.com/
- DownThemAll! - a browser extension that lets you download many files at once, which can speed up gathering the documents you want to scrape: http://www.downthemall.net/
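The tools listed above do this work for you without any programming. To show what "transforming trapped data into a usable format" looks like underneath, here is a minimal sketch in Python, using only the standard library, that pulls the cells out of HTML tables and writes them out as CSV. It assumes the page has already been saved or downloaded and that the table is present in the page's HTML (rather than built later by JavaScript). The names `TableScraper`, `table_to_csv` and the sample table are hypothetical, made up for illustration; this is not how any of the listed tools work internally.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical helper: collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None outside a row
        self._cell = None   # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def close(self):
        super().close()
        self._finish_row()  # flush a row left open at the end of the input

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row:  # skip empty rows
            self.rows.append(self._row)
        self._row = None


def table_to_csv(html):
    """Return (rows, csv_text) for every table row found in `html`."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out).writerows(scraper.rows)
    return scraper.rows, out.getvalue()


# Made-up sample page content, standing in for a saved web page.
SAMPLE_HTML = """
<table>
  <tr><th>Fruit</th><th>Count</th></tr>
  <tr><td>apples</td><td>3</td></tr>
  <tr><td>pears</td><td>5</td></tr>
</table>
"""

if __name__ == "__main__":
    rows, text = table_to_csv(SAMPLE_HTML)
    print(text)
```

This sketch merges the rows of every table on the page; a real page usually needs you to pick out one specific table first, which is the kind of selection the Chrome Scraper extension lets you make by pointing and clicking.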