ECPR

R Basics

Course Dates and Times

Thursday 26 July

13:30-15:00 / 15:30-17:00

Friday 27 July and Saturday 28 July

09:00-10:30 / 11:00-12:30 and 13:30-15:00 / 15:30-17:00

Akos Mate

aakos.mate@gmail.com

Centre for Social Sciences

R is an extremely versatile programming language that is rapidly becoming the top choice for data analysis tasks in both academia and industry. This short course focuses on giving participants practical knowledge by showing how R can be a powerful tool at every step of the data analysis process. We will cover importing data (in various formats), cleaning and manipulating data, visualizations and, finally, statistical analysis.

Since the popularity of R is due to its ever-expanding package ecosystem, we will place special emphasis on how to get information on packages and how to find relevant R help. In addition, we will cover how to create reports, export results and, more generally, how to do reproducible research in R. We will use RStudio to carry out these tasks.

While R is a language developed for statistical analysis, we will not cover the statistical concepts in depth beyond how to implement them in R.


Instructor Bio

Akos Mate is a research fellow at the Centre for Social Sciences in Hungary. His key research area is the political economy of the European Union and its members’ fiscal governance.

He uses a wide variety of methods in his research, particularly automated text analysis (and various related machine learning approaches), network analysis and more traditional econometric techniques.

@aakos_m

It is not a stretch to say that R has become one of the main data analysis tools used both in and outside of academia. R is an open source programming language developed for statistical computing, and it has built up an extremely active user base with an expanding universe of packages.

The guiding logic of the course is to give practical knowledge for the whole data analysis workflow:

  1. Importing data
  2. Data manipulation/cleaning
  3. Data visualisation
  4. Analysis
  5. Reporting the results

It might feel strange to switch from SPSS or Stata to R, but the benefits outweigh the effort of climbing the learning curve. Base R allows us to read different data files into R, manipulate them, create various visualizations and run statistical analyses of any sort (from basic descriptives to time series analysis or multilevel regressions). The real value in learning R is that it integrates the research workflow into one environment. It can also be adapted to a broad range of research, from party politics data to ecological modelling.

Day 1 will start with a general introduction to R and RStudio. We will learn how to start coding and how to set up RStudio to make our workflow as seamless as possible. RStudio is an Integrated Development Environment (IDE) that puts together the R console, a text editor where we write the code and an object viewer where we can view the data objects that we created.

The general introduction to R will cover how to use R for basic mathematical calculations and how to create different objects. This part is key, as we will cover the base R syntax, how to create, access and remove objects, and how to merge vectors into data frames. These are essential operations for the following sessions. On the first day we will also look at how to load data into R from various commonly encountered sources, such as .txt, .csv, Excel sheets, and Stata, SPSS and SAS save files. After getting data into R, we will perform some basic operations to take a first look at the data. This includes the usual descriptive statistics and creating histograms and scatterplots.
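To give a flavour of these Day 1 operations, here is a minimal base R sketch. The vectors, their values and the object names are made up for illustration and are not course material; the .csv round trip uses a temporary file so that the snippet runs anywhere.

```r
# Basic arithmetic, stored in an object
x <- 2 + 3 * 4

# Two made-up vectors (hypothetical illustration data)
unit  <- c("A", "B", "C")
score <- c(4.1, 2.2, 1.8)

# Merge the vectors into a data frame
dat <- data.frame(unit = unit, score = score, stringsAsFactors = FALSE)

# Access a column and compute descriptive statistics
mean_score <- mean(dat$score)
summary(dat$score)

# Remove an object we no longer need
rm(x)

# Write the data to a temporary .csv file and load it back
tmp <- tempfile(fileext = ".csv")
write.csv(dat, tmp, row.names = FALSE)
dat_loaded <- read.csv(tmp, stringsAsFactors = FALSE)

# A quick histogram of the loaded data
hist(dat_loaded$score, main = "Made-up scores", xlab = "score")
```

Other formats (Excel, Stata, SPSS, SAS) typically need an add-on package rather than base R alone.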

Day 2 will be dedicated to data manipulation and data cleaning. This is an essential part of every analysis (and usually takes up the majority of the time). The materials will cover how to set up data in R, the difference between the wide and long data formats and the more recent push for 'tidy data' in the R community. Since R is a programming language, we can exploit it to the fullest extent by using loops for menial, time-consuming tasks. During the second day the course will introduce writing loops and functions in R and the 'apply' function family.
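As a rough illustration of how a loop, a user-written function and the 'apply' family relate to each other, consider the sketch below. The function name rescale01 and the small data frame are hypothetical and chosen only for this example.

```r
# A small user-written function: rescale a numeric vector to the 0-1 range
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  (v - rng[1]) / (rng[2] - rng[1])
}

# Made-up data (hypothetical illustration)
scores <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40))

# Version 1: a for loop over the columns
scaled_loop <- scores
for (col in names(scores)) {
  scaled_loop[[col]] <- rescale01(scores[[col]])
}

# Version 2: the same task with lapply(), which applies the
# function to each column and returns a list
scaled_apply <- as.data.frame(lapply(scores, rescale01))

# sapply() simplifies the result to a named vector where possible
col_means <- sapply(scores, mean)
```

Both versions give the same result; the apply version is shorter and avoids managing the loop variable by hand.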

As on Day 1, all the activities are accompanied by some degree of data visualisation, since it is often better to show a figure than a disorienting half-page table. At this point we will have enough results to think about getting them out of R. The course uses R Markdown to show how to create PDF or HTML output of our work. In addition, there are several packages developed for getting results out of the R console.

Day 3 is focused on three things: analyzing the data, creating informative plots and some brief demonstrations of areas that might be interesting to the participants. The data analysis part will include difference in means tests (t-test), ANOVA and OLS regression. Where applicable, we will also support the analysis by plotting the results. As this is not a statistics course, we will not go into details about what these tests and analyses do. Instead, we will see how to extract the key information that is relevant to interpreting the results.
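The sketch below shows what these three analyses and the extraction of key results can look like in base R. It uses the built-in mtcars dataset purely as a convenient stand-in; the choice of dataset and variables is an assumption for illustration, not the course data.

```r
# mtcars ships with R in the datasets package
data(mtcars)

# Difference in means (t-test): fuel economy by transmission type
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value      # extract the p-value from the result object
tt$estimate     # group means

# One-way ANOVA: fuel economy across cylinder groups
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)

# OLS regression: fuel economy on weight and horsepower
ols <- lm(mpg ~ wt + hp, data = mtcars)
coef(ols)                    # coefficient estimates
summary(ols)$r.squared       # R-squared
summary(ols)$coefficients    # estimates, standard errors, t and p values

# Support the analysis with a plot of the data and a simple fitted line
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
abline(lm(mpg ~ wt, data = mtcars))
```

Each fitted object is a list-like structure, so str(tt) or names(ols) is a quick way to see which parts can be accessed.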

Finally, if time permits, there will be a brief introduction to other areas of use: web scraping, text analysis and dealing with network data. If any particular interests come up during the course, I will try to cover those on the last day as well.

No prior experience with R (or any other programming language) is required. The goal of the course is to introduce R in an accessible fashion, with a heavy emphasis on practical, applicable knowledge.

Day 1
Topic: Getting to know R and the basics
Details: Setting up RStudio, the basic R syntax, loading various data into R and having a quick look at the data. Some first steps on visualising data in R.

Day 2
Topic: Data manipulation and visualisation, getting the results out of R
Details: Data manipulation with base R and with the ‘tidyverse’ package family. How to subset, merge and clean data, and how to deal with missing data. Introduction to loops and writing functions in R to make life easier. Using R Markdown to generate a report of our analysis.

Day 3
Topic: Analysing data in R (t-test, ANOVA, regressions), more visualisation and a quick sneak peek at web scraping, text analysis and network data
Details: Analysing the data with different approaches. Examining the resulting R objects and how to access relevant parts of them.

Readings by day

Throughout course: The basic reference book that we will be using in this class is Adler, Joseph 2010, “R In A Nutshell”, O’Reilly. The chapters refer to the chapters in this book.

Saturday: Chapter 16 “Analysing Data”, Chapter 17 “Probability Distributions”, Chapter 18 “Statistical Tests”, Chapter 20 “Regression Models” (up to p. 386)

Friday: Chapter 9 “Functions”, Chapter 13 “Preparing Data”, Chapter 14 “Graphics”

Thursday: Chapter 3 “A Short R Tutorial”, Chapter 4 “R Packages”, Chapter 5 “Overview Of The R Language”, Chapter 6 “R Syntax”, Chapter 7 “R Objects”, Chapter 12 “Saving, Loading And Editing Data” (up to p. 161)

Note

Main books to be used:

  • Adler, Joseph 2012 (2nd ed): R in a nutshell, O’Reilly (1st edition is fine too)
  • Wickham, Hadley and Grolemund, Garrett 2017: R for data science, O’Reilly (available online: http://r4ds.had.co.nz/)
  • Healy, Kieran forthcoming: Data visualization – A practical introduction (draft version), Princeton University Press (available online: http://socviz.co/)

Day 1

Adler: Chapter 5 ‘Overview of the R language’, Chapter 6 ‘R syntax’, Chapter 7 ‘R Objects’ // Wickham: Chapter 3 ‘Data visualization’

Day 2

Adler: Chapter 9 ‘Functions’, Chapter 12 ‘Preparing data’ // Wickham: Chapter 12 ‘Tidy data’, Chapter 27 ‘R Markdown’ // Healy: Chapter 4 ‘Show the right numbers’

Day 3

Adler: Chapter 16 ‘Analyzing data’, Chapter 17 ‘Probability distributions’, Chapter 18 ‘Statistical tests’, Chapter 20 ‘Regression Models’ (up to page 412) // Healy: Chapter 6 ‘Work with models’

Software Requirements

Both R and RStudio are free to use. Please make sure that both R (https://cloud.r-project.org/) and RStudio (https://www.rstudio.com/products/rstudio/download/#download) are installed and working on your laptop. Both run on Windows, macOS and Linux.

R version 3.4.3 (or newer)

RStudio version 1.1.383 (or newer)

Hardware Requirements

Participants need to bring their own laptops with software installed.

Literature

R is one of the fastest growing languages for statistical analysis and it has a great online community that generates a huge amount of content. Below are some useful online and print resources for continuing to learn R.

Useful online resources:

https://www.statmethods.net/ - R tutorials of all sorts

https://rweekly.org/ - R-related blog post and news aggregator site

https://www.r-bloggers.com/ - R blog aggregator site

The #rstats hashtag on Twitter (package developers, R industry veterans post regularly, and Twitter being a social network, it is easy to engage with others)

https://stackoverflow.com/ - Forum for problem solving with R (Googling your error message will usually land you here)

https://www.rstudio.com/resources/ and https://www.rstudio.com/resources/cheatsheets/ - Webinars and cheat sheets curated by RStudio devs.

Print (in addition to the books used during the course):

Grolemund, Garrett 2014, “Hands-On Programming with R”, O’Reilly

Wickham, Hadley 2014, “Advanced R”, Chapman and Hall/CRC (online companion: http://adv-r.had.co.nz/)

Matloff, Norman 2011, “The Art of R Programming: A Tour of Statistical Software Design”, No Starch Press

Teetor, Paul 2011, “R Cookbook”, O’Reilly