Introduction to Data Visualization, Web Scraping, and Text Analysis in R Summer School


Aims

Many researchers rely on data that are obtained from a wide variety of online sources, including web sites, social media, and external data providers. This course introduces you to procedures for collecting, preparing, analysing, and visualising such data. Participants will learn about core ideas in data visualization, web scraping, and text analysis while gaining practice writing, debugging, and tracking changes to code in R.

The main objectives of this course are the following:

  • To be able to write code in R in order to obtain, prepare, analyse, and visualise data obtained from online sources
  • To be able to monitor and manage the various steps of data collection and analysis for both integrity and replication purposes
  • To help you become a more productive (taking less time to analyse your data) and careful (making fewer mistakes) scientist

Information

There are four sessions of 4 hours each, held over two days. Sessions will include a mix of brief lectures, coding demonstrations, and in-class exercises. You will need to bring a laptop on which you have the rights needed to install software. Students will work with data sets supplied for the course and will also obtain their own data from the Internet by applying what they have learned in the course.
Day 1 (Monday, 3 July)
1. Introduction
•    Create, edit, and compile an R Markdown file that contains a free-text discussion of your data analysis, your code, and any output from that code (including plots)
•    Build an R-markdown file that collects data from an online source, performs a few basic manipulations, and plots the results
•    Use git (version control software) to track changes to this markdown file over time
2. Acquiring, preparing, and visualizing data
•    Write code to acquire data from files located on the web or stored on your local computer, load them into R, and clean the data in preparation for further analysis
•    Learn the powerful yet simple "grammar" for visualizing data implemented in the ggplot2 R package
•    Learn many of the psychological principles behind effective data visualization
Day 2 (Friday, 7 July)
3. Obtaining data from web sites and social media
•    Acquire data from online sources, including web pages and the Twitter API, and automate its collection
•    Further practice preparing, analyzing, and visualizing these data in the context of your own research interests
4. Text and sentiment analysis
•    Process large amounts of unstructured data (e.g., text documents), extract important features (e.g., the occurrence of special words), and summarize results
•    Perform basic types of text analysis, including sentiment tagging and topic identification

Assessment

Sessions are both iterative and cumulative, so attendance at all four sessions is mandatory. During sessions, you will work on exercises that allow you to practice new skills. These exercises will not be graded, but their completion is mandatory. Between sessions, you will complete additional exercises based on your own research interests. Students will also review and replicate each other's code.

Additional info

Students are expected to satisfy the following entry requirements:

•    Prior experience writing code in the R programming language (please visit www.jasonmtroos.com/learning-r for a list of resources for learning R)
•    Use of a laptop computer with current versions of R, RTools (Windows only), and RStudio already installed
------------------------------

For the timetable of this course, please click here.

------------------------------

To register, ERIM participants can take the following steps:
1. Go to SIN Online and log in with your ERNA credentials if required.
2. Click in the checkbox next to the course title and click Save Changes.
3. Your registration is complete. You will receive an automatic confirmation e-mail.

External (non-ERIM) participants are welcome to attend this course. To register, please fill in the registration form and e-mail it to summerschool@erim.eur.nl at least 4 weeks before the start of the course. Please note that the number of places for this course is limited.

This course is free of charge for ERIM members (faculty members, PhD candidates and Research Master students). For external participants, the course fee is 250 euro per ECTS credit.