
Natural Language Processing & Text-Based Machine Learning in the Social Sciences (SOC 281, SYMSYS 195T)


Digital communications (including social media) are the largest data sets of our time, and most of this data is text. Social scientists need to be able to digest small and big data sets alike, process them, and extract psychological insight. This applied, project-focused course introduces students to a Python codebase developed to facilitate text analysis in the social sciences (see -- knowledge of Python is helpful but not required). The goal is to practice these methods in guided tutorials and project-based work so that students can apply them to their own research contexts and be prepared to write up the results for publication. The course will provide best practices, as well as access to and familiarity with a Linux-based server environment for processing text, including the extraction of words and phrases, topics, and psychological dictionaries. We will also practice using machine learning on text data for psychological assessment, and the further statistical analysis of language variables in R. Basic familiarity with R, the ability to wrangle data into a spreadsheet-like format, and an understanding of regression are expected. Familiarity with SSH and basic Linux is helpful but not required. A basic introduction to SQL will be given in the course.

Course ID: 222 637
Letter or Credit/No Credit