Web Data Scraping Using R


Patrick Mair
  • Speaker
Department of Statistics and Mathematics, Vienna University of Economics and Business Administration

Event Information

Type: Research Seminar
Programme: Finance
Date: Thu. 18 Nov. 2010
Time: 16:00-17:00 hours
Location: Tinbergen Building H10-31


Abstract

The Internet is the largest source of data. Web scraping is the technique of extracting information from websites. The R environment for statistical computing offers various tools for reading data from the Web and organizing them (text and numeric data) for subsequent statistical analysis. A comprehensive platform for the corresponding R packages is the Omega Project (http://www.omegahat.org/). This talk presents some applications we are currently working on at our institute, including Web data scraping from Google Trends, Wikipedia, political platforms, Google Maps, and other sources. It shows how these data can be analyzed with statistical techniques such as text mining, time series analysis, and multidimensional scaling.
 
The Seminars in Econometrics Series is supported by the Tinbergen Institute, ERIM and the Journal of Applied Econometrics.
http://www.econometric-institute.org/seminars
 
Contact information:
 

 
Michel van de Velden
Associate Professor of Statistics
  • Coordinator
Kees Bouwman
  • Coordinator