ECPR

SA102 - Introduction to the Use of R

Instructor Details

Martin Mölder

Institution:
University of Tartu

Instructor Bio

Martin Mölder (PhD in comparative politics) is a researcher at the Johan Skytte Institute of Political Studies at the University of Tartu, Estonia.

His main research focus is political parties, their ideological and political positions, and the functioning of party systems. He also teaches, among other things, quantitative methods.

Martin has an extensive background in the use of R for data management and statistical analysis in the social sciences.

He has taught the following courses at the ECPR Summer School in Methods & Techniques:

  • R Basics 2016 & 2017
  • Intermediate R: Capacities for Analysis and Visualisation 2017, 2018 & 2019
  • Advanced Topics in Applied Regression 2019

  @martinmolder


Course Dates and Times

Thursday 28 - Saturday 30 July

10:00-12:00 and 14:00-17:00

15 hours over 3 days

Prerequisite Knowledge

No particular prerequisite knowledge required. The course is intended for people who have had little to no contact with R before, so the only requirement is the willingness to learn R.

Short Outline

R is a leading choice for quantitative analysis thanks to its flexibility and near-limitless capabilities; there is virtually no analysis that cannot be done in R. This class gives participants with no prior experience of the software a broad introduction to the basic topics one encounters while using R. We will cover everything from the basics of R as a programming language (if we know how to speak its language, it will give us exactly what we need) to data processing, simple plotting and basic analysis. We will not just talk about R but use it throughout the sessions, as this is the only way to learn. In doing so, the course aims to turn the unjustifiably steep-looking learning curve of R into a quick slide down a gentler slope, by the end of which participants will be able to continue using and learning R on their own.

Long Course Outline

A basic knowledge of R is an asset both within and outside academia, as it is a widely used and highly valued tool for data processing, analysis and visualisation. With just a little effort to learn, it can make your life as an empirical social scientist much easier. It can be used for everything from classical regressions through multilevel and structural equation modelling to network analysis and text analysis. It can also turn your analyses into professional-looking web pages or PDF documents with fully customisable data visualisations.

The power of R comes from the fact that it is not a pre-packaged piece of software, but a programming language specifically tuned for quantitative analysis. This also makes R at first a bit harder to learn and use than point-and-click software. One has to learn to speak to R in its own language, which takes a little effort, but the effort is rewarded: once you can, you can tell R to do whatever you need and it will do it.

The purpose of this class is to provide the essential knowledge of R as a programming language and an overview of the basic operations one might encounter while using R. We will not focus much on methods of data analysis, as these are the content of most of the other classes in this summer school. Instead, this class aims to give a solid foundation in the most important topics that come before using R as a tool for data analysis proper. This foundational knowledge will allow you to subsequently use R for whatever purpose you have in mind.

We will divide the three rather condensed days into 15 topics, which together cover the basics of using R.

The first day will be devoted to getting started with R and doing basic operations with data. We will cover installing R and the different user interfaces that are available. In this course we will be using RStudio, as this is one of the most user-friendly environments for working with R. We will look at how to install additional packages to expand R's functionality and how to organise your work in a script file. Basic mathematical operations in R, as well as its nature as an object-oriented programming language, will be introduced. We will look at what the different kinds of “objects” are and how one can work with them. From there, we move on to reading different kinds of data files into R and saving them afterwards. Once our data is loaded, we will pause to get a basic overview of what it looks like and how to summarise it. We will end the first day by looking into some basic manipulations of data objects and cover the distinction between wide and long data formats – this will be very useful when you later get to plotting or certain kinds of analysis.
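
To give a flavour of the first-day topics, here is a minimal sketch in base R. It is only an illustration, not the course material: the data frame `scores`, its column names and the temporary file are all made up for this example.

```r
# Basic arithmetic and assignment: results are stored in named objects
x <- c(2, 4, 6)            # a numeric vector
mean_x <- mean(x)          # the mean of 2, 4 and 6

# A small, invented data frame in "wide" format: one row per unit,
# one column per year (hypothetical example data)
scores <- data.frame(
  id    = 1:3,
  t2019 = c(10, 20, 30),
  t2020 = c(11, 21, 31)
)

# Saving data to a file and reading it back in
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)
scores_from_file <- read.csv(path)

# Getting a first overview of the data
str(scores_from_file)
summary(scores_from_file)

# From wide to long: one row per unit and year
long <- reshape(
  scores,
  direction = "long",
  varying   = c("t2019", "t2020"),  # the columns to stack
  v.names   = "score",              # name of the stacked value column
  timevar   = "year",               # name of the new time column
  times     = c(2019, 2020),        # values of the time column
  idvar     = "id"
)
long
```

The long version has one row for each combination of `id` and `year`, which is the layout that many plotting and modelling functions expect.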

 

On the second day we will continue with getting to know our data in R. We will begin the day with an overview of the basic plotting functionalities of R. From there we will move on to dealing with missing data, sorting your dataset, recoding variables and selecting subsets of data. After this we are ready to have a look at some of the basic mechanisms of data manipulation – loops (which repeat an operation over a set of values or while a condition holds) and conditional statements (if/else, combined with the familiar logical operators AND, OR and NOT), which are essential in structuring the workflow of loops. Thereafter, we will look at how to write our own functions and how these can make one’s life with R a lot easier. We will end the day with an introduction to the apply family of functions, which are some of the essential tools for working with data in R.
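
The second-day topics fit together roughly as in the following sketch. It uses base R only; the vector `ages`, the age cut-offs and the function name `classify_age` are invented for illustration.

```r
# An invented vector with one missing value
ages <- c(17, 25, NA, 42, 68)

# Missing data: many functions need to be told to ignore NA
mean_age <- mean(ages, na.rm = TRUE)

# Sorting and selecting a subset with logical operators (AND, NOT)
sorted_ages <- sort(ages)                    # sort() drops NA by default
over_30     <- ages[!is.na(ages) & ages > 30]

# A loop with a conditional statement: recode each value in turn
group <- character(length(ages))
for (i in seq_along(ages)) {
  if (is.na(ages[i])) {
    group[i] <- NA
  } else if (ages[i] >= 18 && ages[i] < 65) {
    group[i] <- "working age"
  } else {
    group[i] <- "other"
  }
}

# The same logic wrapped in a function of our own ...
classify_age <- function(age) {
  if (is.na(age)) {
    return(NA_character_)
  }
  if (age >= 18 && age < 65) "working age" else "other"
}

# ... and applied to every element with a member of the apply family
group_apply <- sapply(ages, classify_age)
```

Both approaches produce the same recoded vector; the function-plus-`sapply` version is shorter and can be reused on other data.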

 

On the third day we will finish our introduction to preparing data and move on to a first look at some basic analyses. R provides a lot of functionality for manipulating data objects, and we will first look at two packages that extend it – plyr and dplyr – which make transforming and summarising your data much faster. From there we will move to the basic functions of data analysis and have a look at how to do t-tests, ANOVA, correlations and basic regression in R. The results of analyses in R are stored in model objects, so we will devote a bit of time to familiarising ourselves with them and with how to get the information that we need out of them. We will end the day with basic topics that relate to many analyses – data distributions, data simulation (i.e. generating data that matches certain characteristics) and re-sampling (generating “new” sets of data from the data that we have for the purposes of evaluating the uncertainty of our results).
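
As a rough illustration of how simulation, model objects and re-sampling connect, consider the sketch below. It uses base R only (plyr and dplyr are left out to keep it self-contained), and the simulated data, the variable names and the number of bootstrap replications are arbitrary choices made for this example.

```r
set.seed(42)  # make the simulated data reproducible

# Data simulation: y depends linearly on x, plus random noise
n     <- 100
group <- rep(c("a", "b"), each = n / 2)
x     <- rnorm(n)
y     <- 1 + 2 * x + rnorm(n)
dat   <- data.frame(group, x, y)

# A t-test comparing y between the two groups; the result is an object
tt <- t.test(y ~ group, data = dat)
tt$p.value

# A basic regression; the fitted model is stored in a model object
fit <- lm(y ~ x, data = dat)
coef(fit)                  # estimated intercept and slope
summary(fit)$r.squared     # one of many pieces stored in the summary

# Re-sampling (bootstrap): refit the model on rows drawn with replacement
boot_slopes <- replicate(500, {
  idx <- sample(nrow(dat), replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))["x"]
})

# A simple percentile interval for the slope
quantile(boot_slopes, c(0.025, 0.975))
```

Because the data were simulated with a slope of 2, the estimated slope and the bootstrap interval should land close to that value, which is a handy way to check that the code does what we think it does.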

 

The main purpose of this class is to convey practical skills and knowledge in R, and the only way to learn this is by doing. Therefore, we will spend most of the time in the sessions working with R on these topics – writing code and going over examples that have been prepared to illustrate them. You will save this code and will be able to reuse parts of it for similar problems you tackle with R in the future. For each topic, I will also refer to a textbook, which will give you the necessary context and reference material. Reading the indicated chapters beforehand will make the work in class much easier.


At the end of this class you will be ready to start using R on your own – you will know the basics, but you will also have a better understanding of what you do not yet know, so you will be able to ask the right questions as you continue learning.

Software Requirements

R and RStudio.

Hardware Requirements

Participants need to bring their own laptops with the software installed.

Literature

There is a wealth of material about R: reference guides, official documentation, textbooks, and online blogs and forums. In most cases, if you have a problem with R, googling the right question will quickly give you the right answer (assuming that you know what the right question is). Nevertheless, here is a list of sources and materials that you can consult.

R Home Page: https://www.r-project.org/

Quick-R: http://www.statmethods.net/

R Bloggers: http://www.r-bloggers.com/

 

R reference card: https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf

Burns, Patrick 2011, “The R Inferno”, http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

 

Cotton, Richard 2013, “Learning R”, O’Reilly.

Lander, Jared 2013, “R for Everyone: Advanced Analytics and Graphics”, Addison-Wesley.

Teetor, Paul 2011, “R Cookbook”, O’Reilly.

Abedin, Jaynal 2014, “Data Manipulation With R”, Packt Publishing.  

Conway, Drew and White, John Myles 2012, “Machine Learning For Hackers”, O’Reilly. A good overview of using R with practical and interesting examples.

Black, Kelly 2014, “R Object-oriented Programming”, Packt Publishing. The book starts easy, but soon gets into more complex topics related to R and programming. A good overview of what programming in R can entail.

Grolemund, Garrett 2014, “Hands-On Programming with R”, O’Reilly. Introduction to R through dice, cards and slot machines. For those who like gambling.  

Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Convenors

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.

