One core topic in quantitative social science is the measurement of the polarization or segregation of groups. The vast quantity of digital data now available, such as Internet browsing histories, item-level purchase data, and text, gives us a picture of people's interests, opinions, and related behavior. The challenge is that parsing such high-dimensional data requires methods different from standard measurement practices.

In this talk, Matthew Taddy discussed how ideas from machine learning can be used to build a new set of metrics for measuring segregation in high dimensions. He focused on how these methods were applied to measure the partisanship of speech in the United States Congress from 1872 to the present, and compared the results with conclusions drawn from simpler, bias-prone measures.