Content Analysis


Aims

This course is designed to help participants learn modern text analysis techniques, with relevant content ranging from planning a study to publication. The topics will include a streamlined set of data science prerequisites, gathering and working with data, understanding how text becomes data, hands-on work with common techniques, and familiarization with some advanced techniques. The skill requirements assume essentially no prior training, though reasonable spreadsheet skills and some familiarity with one of the commonly used commercial statistical systems are helpful.

After completing the course, participants should have a sound understanding of fundamental skills underlying text analysis and machine learning workflows, knowledge of particular sets of skills and their return on investment for researchers, hands-on experience with a number of common techniques, and a road map for selecting and learning further skills to carry out their research.

Information

1.       Prerequisites. To get started, participants will learn the basic skills for using modern statistical ecosystems (i.e., Python, with some examples in R and Stata), particularly those for data handling and for using the many external packages that make sophisticated techniques very accessible. These topics include basic variable types, working with dates, importing and exporting data, and using external packages.

2.       Getting and handling data. Participants will work through examples of reading in, preparing, and saving structured data (e.g., from typical archival datasets). In addition, they will work through a step-by-step example that uses web scraping to recover semi-structured data and generate structured data from it.

3.       Hands-on techniques. Using sample data, participants will transform text into quantitative data (i.e., feature engineering) and run some common forms of text analysis. Because feature engineering is a common foundation of many kinds of techniques, we will look at how these methods work and at their relative strengths and weaknesses. The text analyses include human-coded variables, dictionary methods, pre-trained machine learning models, and machine learning models for categorization and topic modeling.

4.       Familiarization. For a number of advanced techniques, I want participants to know that they exist, know what problems they can solve, have a brief example to walk through, and know where to look for more information outside of the course. These topics include automated data collection, APIs, SQL, and advanced machine learning.
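
To give a flavor of item 3, the following is a minimal Python sketch of how text can become quantitative data and then feed a simple dictionary method. It is not taken from the course materials: the sample documents, the word lists, and the function names (tokenize, word_counts, dictionary_score) are all hypothetical and chosen only for illustration, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical sample documents and word lists, invented for illustration only.
documents = [
    "The new product launch was a great success.",
    "Customers reported a poor and disappointing experience.",
]
positive_words = {"great", "success", "good"}
negative_words = {"poor", "disappointing", "bad"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text):
    """Bag-of-words features: how often each token occurs in the text."""
    return Counter(tokenize(text))


def dictionary_score(text):
    """Dictionary method: positive-word count minus negative-word count."""
    counts = word_counts(text)
    positive = sum(counts[word] for word in positive_words)
    negative = sum(counts[word] for word in negative_words)
    return positive - negative


# One score per document; higher means more positive dictionary words.
scores = [dictionary_score(doc) for doc in documents]
print(scores)
```

The word counts are the "features" here; the dictionary score is one simple analysis built on top of them. Real studies would typically rely on validated dictionaries and more careful tokenization than this sketch assumes.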

Assessment

Each segment includes hands-on assignments that use provided code to accomplish data, text analysis, and machine learning tasks, with the flexibility to customize or experiment.

Additional info

The timetable for this course can be found in the EUR course guide.

ERIM PhD candidates and Research Master students can register for this course via Osiris Student.

External (non-ERIM) participants are welcome to join this course. To register, please fill in the registration form and e-mail it to the ERIM Doctoral Office no later than four weeks before the start of the course. For external participants, the course fee is 260 euro per ECTS credit.