ECPR

Introduction to Statistics for Political and Social Scientists Workshop

Florian Weiler
florian.weiler@rug.nl

Rijksuniversiteit Groningen

Florian Weiler is a senior researcher at the University of Basel, where he teaches statistics and content courses. He earned his doctoral degree at ETH Zurich.

Before joining the University of Basel, he worked as a lecturer in Quantitative Politics at the University of Kent, and as a postdoctoral researcher at the University of Bamberg. 

Florian's main research interests are in the fields of environmental politics and interest group research.


Course Dates and Times

Monday 25 February – Friday 1 March, 09:00–12:30 and 14:00–16:30 (finishing slightly earlier on the Friday)
25 hours in total

Prerequisite Knowledge

No statistical knowledge is required, though we will rely strongly on R during lab sessions. If you are not familiar with this software, take Thorsten Schnapp's short Introduction to R course beforehand.


Short Outline

This course introduces you to the basic ideas of descriptive and inferential statistics. The first two days cover the basics: variables, randomisation, centrality and variability, probability distributions, point and interval estimates. On the third day we will cover hypothesis testing, and in the last two days we discuss correlation as well as simple and multiple linear regression models. Finally, we will look at model assumptions and violations, and how regression models can be improved.

The course and the lab session will be taught in R, so you must be prepared to invest quite a bit of effort into learning the programme at the same time as the basic statistical concepts.

Tasks for ECTS Credits

2 credits (pass/fail grade): Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded): As above, plus complete a final assignment (similar to the lab sessions, but covering some material from every day of the course).

4 credits (to be graded): As above, plus complete a 5000-word paper applying the techniques learned during the course to a dataset of your choice.


Long Course Outline

How can we detect voting irregularities?
What are the conditions for the onset (or cessation) of civil war?
How do democracies choose electoral systems?
In what sense (if any) does democracy (or trade) facilitate international cooperation?

Quantitative political methodology addresses these questions, and many others, by developing statistical methods that combine data analysis with political science theory.

This course is an introduction to the tools used in basic quantitative political methodology. The first half covers introductory (univariate) statistics; the second half focuses on regression models.

Our days are split into a three-hour teaching block in the morning, and a two-hour lab session in the afternoon. During the lab sessions, we will cover the same topics as in the morning lectures, but you will get hands-on experience in working with the methods learned in the morning.

Each day, we work on one problem set, using R. First, you'll get some time to solve the questions alone or in teams. Then, to keep everyone on the same page, we will go over the problems together and solve them, based on participants’ answers.

Day 1
After clarifying basic terminology, we define what variables are, and then discuss sampling techniques (simple random sample, cluster sampling, stratified sampling) and randomisation. Then we discuss descriptive statistics, for example how to use tables and graphs to summarise (and better understand) the data, and how to describe the centre and the variability of variables. I will also present bivariate descriptive statistics (in table form and as graphs).
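
As a rough sketch of what such descriptive summaries can look like in R (assuming the built-in mtcars dataset, which is used here purely for illustration and is not necessarily the data used in class):

```r
# Illustrative only: mtcars ships with base R and is an assumption,
# not necessarily the course data.
data(mtcars)

# Centre and variability of one numeric variable (miles per gallon)
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd     <- sd(mtcars$mpg)

# Frequency table for a categorical variable (number of cylinders)
cyl_table <- table(mtcars$cyl)

# Bivariate description in table form: cylinders by transmission type
# (am is coded 0 = automatic, 1 = manual in this dataset)
cyl_by_am <- table(cylinders = mtcars$cyl, transmission = mtcars$am)

print(c(mean = mpg_mean, median = mpg_median, sd = mpg_sd))
print(cyl_table)
print(cyl_by_am)
```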

Day 2
We start by discussing probability distributions for discrete and continuous variables, with particular focus on the normal distribution. Then we talk about sampling distributions, and how to use sample data to estimate population parameters (using point and interval estimates). We also cover the choice of the sample size.
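
A minimal sketch of a point estimate and a 95% confidence interval for a mean, assuming (for illustration only) that the built-in mtcars data can be treated as a random sample from some population:

```r
# Illustrative only: treating mtcars as a random sample is an assumption.
data(mtcars)
x <- mtcars$mpg
n <- length(x)

# Point estimate of the population mean and its standard error
point_estimate <- mean(x)
standard_error <- sd(x) / sqrt(n)

# 95% interval estimate based on the t distribution with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)
ci_manual <- c(lower = point_estimate - t_crit * standard_error,
               upper = point_estimate + t_crit * standard_error)

# The same interval as computed by base R's t.test()
ci_builtin <- t.test(x, conf.level = 0.95)$conf.int

print(point_estimate)
print(ci_manual)
print(ci_builtin)
```

Computing the interval by hand and comparing it with t.test() is one way to check that the formula has been understood.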

Day 3
We focus mostly on hypothesis testing. I describe the logic of significance tests, distinguish type I and type II errors, and teach you how to employ statistical tests to compare two groups.
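
To give a flavour of a two-group comparison in R, here is a hedged sketch that again assumes the built-in mtcars data (an illustrative choice, not necessarily what we use in the lab) and compares mean fuel consumption between automatic and manual cars:

```r
# Illustrative only: dataset and variables are assumptions for this sketch.
data(mtcars)

# Null hypothesis: mean mpg is the same for both transmission types
# (am: 0 = automatic, 1 = manual). t.test() uses Welch's test by default,
# which does not assume equal variances in the two groups.
test_result <- t.test(mpg ~ am, data = mtcars)

print(test_result$estimate)  # the two group means
print(test_result$p.value)   # compare against the chosen significance level

# Decision at the 5% level
reject_null <- test_result$p.value < 0.05
print(reject_null)
```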

Day 4
We start by covering associations between categorical variables, how to detect patterns of association, and how to measure association in contingency tables. Then I introduce simple linear regression. We cover ordinary least squares, interpreting linear models, model assumptions and violations, and graphical representation.
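
A short sketch of both parts of this day, assuming the built-in mtcars data for illustration (the variables chosen here are assumptions, not the course's own examples):

```r
# Illustrative only: dataset and variables are assumptions for this sketch.
data(mtcars)

# Association between two categorical variables in a contingency table
cyl_by_am <- table(cylinders = mtcars$cyl, transmission = mtcars$am)

# Chi-squared test of independence. R may warn that the approximation
# is questionable here because some expected cell counts are small.
chi_result <- chisq.test(cyl_by_am)

# Simple linear regression estimated by ordinary least squares:
# fuel consumption (mpg) as a function of weight (wt)
simple_model <- lm(mpg ~ wt, data = mtcars)

print(cyl_by_am)
print(chi_result)
print(summary(simple_model))
```

The slope on wt is read as the expected change in mpg for a one-unit increase in weight.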

Day 5
We discuss multiple regression analysis, the concept of control variables, and the difference between correlation and causation. Then we will try to improve the model by revisiting the model assumptions, and establish how to detect (and correct) potential model violations. We will also cover interaction effects.
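
As an illustration of adding a control variable and an interaction effect, the following sketch assumes the built-in mtcars data (chosen only for convenience; the models are hypothetical examples, not the ones from class):

```r
# Illustrative only: dataset and model specifications are assumptions.
data(mtcars)

# Multiple regression: effect of weight on mpg, controlling for horsepower
additive_model <- lm(mpg ~ wt + hp, data = mtcars)

# Same model with an interaction: wt * hp expands to wt + hp + wt:hp,
# so the effect of weight is allowed to depend on horsepower
interaction_model <- lm(mpg ~ wt * hp, data = mtcars)

print(summary(additive_model))
print(summary(interaction_model))

# F-test comparing the two nested models
print(anova(additive_model, interaction_model))
```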

The course is taught in R, a powerful and versatile computing environment. R has the huge advantage that it is free, open-source software, so you can work on your own computer during the classroom sessions if you want to. Computers are, of course, provided in the lab sessions.

In the first lab session, I will introduce some basics of R, but if you are not yet familiar with R, please take Thorsten Schnapp's short Introduction to R course before signing up for this one.

The course is for beginners and no prior statistical (or computing) knowledge is required. However, we cover many topics in a relatively short period of time. Based on past years' experience, participants find this course very challenging. You should, therefore, be prepared to work intensively throughout each day.

Students on this course usually make great progress, and come to understand the most important principles of statistics. They learn to work with data and implement basic analyses, like running linear regression models, and acquire the knowledge to sign up for courses covering more complex statistical techniques.

By the end of the course, you should be familiar enough with R to use it independently for your own projects.

Day Topic Details
Day 1

Sampling; Descriptive Statistics

Morning (3 hours)

  • Variables
  • Randomisation
  • Sampling
  • Tables and graphs to describe data
  • Centre and variability of data
  • Bivariate descriptive statistics
  • Sample statistics and population parameters

Afternoon (2 hours)

  • Lab: Data management, descriptive statistics, basic plots

Day 2

Probability Distributions; Statistical Inference

Morning (3 hours)

  • Probability distributions
  • The normal distribution
  • Point and interval estimation
  • Confidence intervals

Afternoon (2 hours)

  • Lab: Probability distributions, sampling distributions, statistical inference

Day 3

Hypothesis Testing; Comparison of Two Groups

Morning (3 hours)

  • Significance and hypothesis tests
  • Types of errors
  • Comparing proportions and means

Afternoon (2 hours)

  • Lab: Hypothesis testing, group comparison

Day 4

Association between Categorical Variables; Linear Regression

Morning (3 hours)

  • Contingency tables
  • Chi-squared test of independence
  • Detecting and measuring association
  • Correlation
  • Least squares
  • The linear regression model
  • Assumptions and violations

Afternoon (2 hours)

  • Lab: Contingency tables, simple linear regression

Day 5

Multivariate Relationships; Multiple Regression

Morning (3 hours)

  • Association and causality
  • Control variables
  • The multiple regression model
  • Interaction effects
  • Improving the model

Afternoon (2 hours)

  • Lab: Multivariate regression, interaction effects

Day Readings
Day 1

Agresti & Finlay, Ch. 1, 2, 3; OpenIntro, Ch. 1

Day 2

Agresti & Finlay, Ch. 4, 5; OpenIntro, Ch. 2, 3

Day 3

Agresti & Finlay, Ch. 6, 7; OpenIntro, Ch. 4, 5, 6

Day 4

Agresti & Finlay, Ch. 8, 9; OpenIntro, Ch. 7

Day 5

Agresti & Finlay, Ch. 10, 11; OpenIntro, Ch. 8

Software Requirements

If you want to use your own computer, please download R Version 3.5.0 or higher if you don't already have it. 

I also recommend downloading RStudio.

Both programmes are installed in the computer labs in Bamberg.

Hardware Requirements

Any fairly modern computer able to run R should be good enough for this course.

If you use your own computer it will need an internet connection because we will be downloading R packages.

Literature

For the course

Agresti, Alan and Barbara Finlay (2008): Statistical Methods for the Social Sciences (4th Edition), Upper Saddle River: Prentice Hall.

Diez, David M., Christopher D. Barr, and Mine Çetinkaya-Rundel (2015): OpenIntro Statistics (3rd Edition), free copy available online.

Further reading

  • Dalgaard, Peter (2002): Introductory Statistics with R, New York: Springer.
  • Fox, John (2008): Applied Regression Analysis and Generalized Linear Models, London: Sage.
  • Fox, John and Sanford Weisberg (2012): An R Companion to Applied Regression (2nd Edition), London: Sage.
  • Gujarati, Damodar and Dawn C. Porter (2009): Basic Econometrics (5th Edition), New York: McGraw Hill.
  • Wooldridge, Jeffrey (2013): Introductory Econometrics: A Modern Approach (5th Edition), Mason: South-Western.

Recommended Courses to Cover Before this One

Winter School

Introduction to R


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Conveners

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.