Web Data Scraping Using R


Speaker


Abstract

The biggest data source is the Internet. Web scraping is the technique of extracting information from websites. The R environment for statistical computing offers various tools for reading data from the Web and organizing these data (both text and numeric) for subsequent statistical analysis. A comprehensive platform for the corresponding R packages is the Omega Project (http://www.omegahat.org/). This talk presents some applications we are currently working on at our Institute, including Web data scraping from Google Trends, Wikipedia, political platforms, Google Maps, and others. It shows how these data can be analyzed with statistical techniques such as text mining, time series analysis, and multidimensional scaling.
 
The Seminars in Econometrics Series is supported by the Tinbergen Institute, ERIM and the Journal of Applied Econometrics.
http://www.econometric-institute.org/seminars
 
Contact information:

Michel van de Velden

Kees Bouwman
