Imputation of User Generated Data

Political Methodology

Voting

Internet

Public Opinion

Presenter(s)

Garret Binding

University of Zurich

Author(s)

Garret Binding

University of Zurich

Thomas Willi

University of Zurich

Panel

Voting Advice Applications and Social Computing: VAAs as Driving Force and Mirror of Social Change

Date, Time and Location

Friday 14:00 - 15:40 CEST (08/09/2017) Building: BL11 Harriet Holters hus, Floor: 3, Room: HH 301

Abstract

The digitalisation of society has created new avenues for research in political science. One of these is the availability of ever larger datasets containing political data on individuals. Voting Advice Applications (VAAs) are one example of a setting in which user generated datasets are created. Their sample sizes are much larger than those of comparable conventional representative surveys, and they may cover a broader spectrum of the population. These attributes are very attractive to social scientists. However, questions of representativeness loom: the datasets are user generated, and access is limited to those with adequate technical means. Assessing non-representativeness is often hindered by gaps within the large datasets themselves: users choose not to answer many questions, which leads to a large share of missing data. We address this problem within the missing data framework proposed by Little and Rubin (2002). We first compare the results of established imputation techniques with those of machine learning techniques in a simulation study. We believe that establishing the validity of machine learning techniques is important for future research as the number of large datasets available to social scientists increases. We then apply both imputation approaches to VAA data collected prior to the 2013 German election. Our contribution is twofold. First, we evaluate whether machine learning techniques can be used for missing data imputation in large user generated datasets. Second, by imputing missing data we can reassess the problem of non-representativeness in VAA data.
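
The kind of simulation study described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, values are deleted completely at random, mean imputation stands in for a simple conventional baseline, and k-nearest-neighbour imputation stands in for a machine learning technique. The function name simulate and all of its parameters are hypothetical.

```python
import math
import random


def simulate(n=400, missing_rate=0.3, k=5, seed=0):
    """Return (rmse_mean, rmse_knn) for one synthetic dataset.

    Illustrative only: the abstract does not specify the data-generating
    process, the missingness mechanism, or the imputation methods compared.
    """
    rng = random.Random(seed)

    # Hypothetical data: y depends on a fully observed covariate x.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]

    # Delete y completely at random (MCAR), the simplest mechanism
    # in the Little and Rubin framework.
    missing = [rng.random() < missing_rate for _ in range(n)]
    observed = [i for i in range(n) if not missing[i]]
    hidden = [i for i in range(n) if missing[i]]

    # Baseline: fill every missing y with the mean of the observed y.
    mean_y = sum(y[i] for i in observed) / len(observed)

    # Stand-in for a machine learning imputer: average the y values of
    # the k observed cases whose x is closest to the incomplete case.
    def knn_impute(i):
        nearest = sorted(observed, key=lambda j: abs(x[j] - x[i]))[:k]
        return sum(y[j] for j in nearest) / len(nearest)

    # Because the data are simulated, the true y is known for the
    # deleted cases, so imputation error can be measured directly.
    def rmse(imputed):
        squared = sum((imputed[i] - y[i]) ** 2 for i in hidden)
        return math.sqrt(squared / len(hidden))

    rmse_mean = rmse({i: mean_y for i in hidden})
    rmse_knn = rmse({i: knn_impute(i) for i in hidden})
    return rmse_mean, rmse_knn


if __name__ == "__main__":
    rmse_mean, rmse_knn = simulate()
    print(f"mean imputation RMSE: {rmse_mean:.3f}")
    print(f"k-NN imputation RMSE: {rmse_knn:.3f}")
```

In this toy setting the neighbour-based imputer recovers the deleted values more accurately because it uses the covariate, while mean imputation ignores it. A study of the kind the abstract describes would also vary the missingness mechanism and evaluate downstream estimates rather than only pointwise error.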
