ECPR


Intermediate R: Capacities for Analysis and Visualisation

Course Dates and Times

Monday 31 July - Friday 4 August

09:00-12:30

Please see Timetable for full details.

Martin Mölder

martin.molder@ut.ee

University of Tartu

This is a class for people who know the basics of R but want to expand their knowledge of what can be done with this remarkable software, and how. Taking the first steps in R might be quick and sometimes enjoyably painful, but unlocking the capacities that make R the most valued programming language for statistical analysis takes a bit more help and effort. The purpose of this class is to show how some of the most common statistical techniques that could be of interest to a social scientist (like regressions with non-continuous outcomes, structural equation or multilevel models) can be implemented in R. We will not talk about the statistical properties of these methods themselves, just about how they can be done in R. This will be the core of the class. We will begin with a few more general topics, including how to work effectively with R, and the class will end with a longer look at more advanced data visualisation with R and ggplot2.

 


Instructor Bio

Martin Mölder (PhD in comparative politics) is a researcher at the Johan Skytte Institute of Political Studies at the University of Tartu, Estonia.

His main research focus is political parties, their ideological and political positions, and the functioning of party systems. He also teaches, among other things, quantitative methods.

Martin has an extensive background in the use of R for data management and statistical analysis in the social sciences.

He has taught the following courses at the ECPR Summer School in Methods & Techniques:

  • R Basics 2016 & 2017
  • Intermediate R: Capacities for Analysis and Visualisation 2017, 2018 & 2019
  • Advanced Topics in Applied Regression 2019

  @martinmolder

Learning R inevitably takes time. Often the best next step after learning the very basics of this programming language and its most important characteristics (which were covered in the Introduction to R class) is to get a good overview of what analyses are possible in R and how to run them. The details and intricacies of each specific topic and method are almost limitless – and in the end, this is part of what makes R so great – but for getting to know R at the intermediate level they are of secondary importance. If you have a basic knowledge of R, then the most important next step is to acquaint yourself with the range of options that are available. You do not need to know the details – they will come later – you just need to know what is possible, what to look for and expect, and how to ask the right questions. This will make the steps that follow, the steps that you are most likely to take on your own while working on your analyses, much simpler and smoother. That is the purpose of this class.

The class will cover many topics, which in their substantive breadth could fill several week-long classes. But what we will focus on is simply their technical implementation in R. For this class you do not need to know in any meaningful depth what e.g. multilevel models, panel data analysis or any of the other covered methods are, although it would be very good to have at least a basic idea. We will just look at their implementation in R – what packages and functions to use and what data structure is implied – with only a few very basic comments about the nature of the method and the type of data it calls for. This is not a class on statistical methods as such, but a class on how they can be used in R. We will cover much technical ground and very little substance.
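To make the point about implied data structure concrete, here is a minimal sketch using only base R. The data set, its column names and the values are invented for illustration. It reshapes a small "wide" table into the "long" format (one row per unit and time point) that panel and multilevel functions typically expect, and then fits a simple fixed-effects model with unit dummies. Dedicated panel packages offer more convenient interfaces; this is only one way to do it.

```r
# Hypothetical wide data: one row per country, one column per year.
wide <- data.frame(
  country  = c("A", "B", "C"),
  gdp_2019 = c(1.0, 2.0, 3.0),
  gdp_2020 = c(1.5, 2.1, 2.8),
  gdp_2021 = c(1.7, 2.5, 3.1)
)

# Reshape to long format: one row per country-year.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("gdp_2019", "gdp_2020", "gdp_2021"),
  v.names   = "gdp",
  timevar   = "year",
  times     = 2019:2021,
  idvar     = "country"
)
print(long)

# Fixed-effects ("least squares dummy variable") model:
# a common time trend plus a separate intercept for each country.
fe_fit <- lm(gdp ~ year + factor(country), data = long)
print(coef(fe_fit))
```

The long format is what makes the formula interface work: the unit identifier and the time variable are ordinary columns that can be used directly in the model formula.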

The class will begin with the more general issue of how to work with R most effectively. We will look at some principles on how to write good, comprehensible and efficient R code and how to structure your files and folders, so that your work runs smoothly. Thereafter we will look at two related topics – simulations and bootstrapping – which are applicable across specific methods as means to learn and understand them (simulations) or to get a grasp of the amount of uncertainty in your data and the results you can get from the latter (bootstrapping).
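As a rough illustration of both ideas, the sketch below simulates a data generating process with known coefficients, checks that a linear model recovers them, and then bootstraps the slope to gauge its uncertainty. It uses base R only. The coefficients, sample size, seed and number of resamples are arbitrary assumptions and are not taken from the class materials.

```r
# Simulate a known data generating process: y = 1 + 2 * x + noise.
set.seed(2023)                      # fixed seed so the run is reproducible
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 1.5)
sim_data <- data.frame(x = x, y = y)

# Because we chose the true coefficients, we can check whether lm() recovers them.
fit <- lm(y ~ x, data = sim_data)
print(coef(fit))

# Nonparametric bootstrap: resample rows with replacement and refit the model.
boot_slope <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)
  coef(lm(y ~ x, data = data[rows, ]))[["x"]]
}
boot_slopes <- replicate(1000, boot_slope(sim_data))

# Percentile interval for the slope, next to the bootstrap standard error.
boot_ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
boot_se <- sd(boot_slopes)
print(boot_ci)
print(boot_se)
```

Changing the noise level or the sample size and rerunning the script is a quick way to see how the estimates and their uncertainty respond.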

In the next two and a half days we will go over the implementations of some of the most common methods of statistical analysis in R. We will look at how to do factor analysis and structural equation models, how to work with panel data and multilevel data and many related analyses (for a more detailed list, see the day to day schedule). The structure of all of these micro topics will be the same:

  • We will devote the very minimum amount of time for refreshing the basic logic of the method – what kinds of questions and what kind of data is it meant for?
  • We will look at the main packages in R that provide the functionality to implement this method.
  • We will look at how to use them – what functions they provide, what arguments the latter require and what is the structure of the output that they give us.
  • When the analysis is done, it is important to get an overview of the results. We will look at how to get a good summary of your analysis and how to create the basic visualisations to evaluate your models.
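The four steps above can be sketched for one method, a logistic regression for a binary outcome, using glm() from the stats package that ships with R. The survey data are simulated and the variable names (voted, age, educ) are hypothetical, chosen only for illustration.

```r
# Simulated binary outcome (hypothetical data, used only for illustration).
set.seed(42)
n <- 1000
age <- round(runif(n, 18, 80))
educ <- sample(c("low", "mid", "high"), n, replace = TRUE)
# True model on the logit scale: turnout becomes more likely with age.
p <- plogis(-2 + 0.04 * age + 0.5 * (educ == "high"))
voted <- rbinom(n, size = 1, prob = p)
survey <- data.frame(voted, age, educ = factor(educ, levels = c("low", "mid", "high")))

# 1. Fit: formula, family and data are the main arguments.
logit_fit <- glm(voted ~ age + educ, family = binomial(link = "logit"), data = survey)

# 2. Inspect the structure of the output: a fitted model is a list.
print(class(logit_fit))
print(names(logit_fit)[1:5])

# 3. Summarise: coefficient table with estimates, standard errors, z and p values.
coef_table <- coef(summary(logit_fit))
print(round(coef_table, 3))

# 4. Evaluate: predicted probabilities for two hypothetical respondents.
new_cases <- data.frame(age = c(25, 65), educ = factor("mid", levels = levels(survey$educ)))
pred <- predict(logit_fit, newdata = new_cases, type = "response")
print(pred)
```

Most modelling functions in R follow the same pattern: a fitting function that returns a list-like object, plus summary(), coef() and predict() methods for that object.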

More or less each model comes with its basic options for plotting, but sometimes you want more. Therefore, towards the end of the class we will take a longer look at how to use ggplot2, an R package that allows you to make almost any kind of data visualisation you might want. ggplot2 is like a language within R, and to learn it fully would take more than one week-long class. So we will only be getting our foot in the door: we will see how to make the main kinds of plots one would want (bar charts, line charts, scatterplots and the like), how to customise them to your liking, and how to save them as PDF or in some other format of your choosing.
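A minimal sketch of this workflow, assuming the ggplot2 package is installed: it builds a scatterplot and a bar chart from the mtcars data set that ships with R, customises labels and theme, and saves one plot as a PDF. The labels, theme and file name are arbitrary choices for illustration.

```r
library(ggplot2)

# mtcars ships with R; cyl is converted to a factor so it maps to discrete colours.
cars <- transform(mtcars, cyl = factor(cyl))

# Scatterplot with one fitted line per cylinder group.
scatter <- ggplot(cars, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders",
       title = "Fuel efficiency by weight") +
  theme_minimal()

# Bar chart counting cars in each cylinder group.
bars <- ggplot(cars, aes(x = cyl)) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars")

# Save to PDF; the file name and size here are arbitrary choices.
out_file <- file.path(tempdir(), "scatter.pdf")
ggsave(out_file, plot = scatter, width = 6, height = 4)
```

Plots are built by adding layers with +, so the same data and aesthetics can be reused while swapping geoms, labels or themes.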

Creating visualisations is one way – and perhaps the best way – to get your results out of R, but sometimes you also need good-looking tables of model output. In that case, an R package called stargazer can save the day. Copy-pasting numbers from R into a table that you have manually created in your document takes time (sometimes countless hours) that could be better spent on something else. Stargazer gives you the possibility to export your model results from R directly into LaTeX or into html format, and the latter can easily be transferred to Word or any other software you might be using.
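A short sketch of such an export, assuming the stargazer package is installed. The two models, the table title and the covariate labels are invented for illustration, and the output path is a temporary file.

```r
library(stargazer)

# Two nested linear models on the mtcars data that ships with R.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Plain-text preview in the console.
stargazer(m1, m2, type = "text")

# HTML file that can be opened in a browser or in Word; the path is arbitrary.
html_file <- file.path(tempdir(), "models.html")
stargazer(m1, m2, type = "html", out = html_file,
          title = "Fuel efficiency models",
          covariate.labels = c("Weight", "Horsepower"))
```

Setting type = "latex" instead produces LaTeX code for the same table.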

When you have covered the topics of this class – when you know how to work in R more effectively, what kinds of analyses are possible and how to run them, and how to get your results out of R quickly and effortlessly – then you are already well on your way to becoming a proficient user of R.

The class assumes a solid basic knowledge of R at a level comparable to the Introduction to R class offered in the ECPR summer school and is best taken as a continuation of that course. Additionally, a basic knowledge of the most common statistical methods used in the social sciences is expected (what they are, what they do and when they should be used).

Day Topic Details
Monday Structuring your workflow in R; simulations as a tool of analysis and learning.

How to set up R and the structure of your work in R for increased efficiency; how to simulate a data generating process in R and how this can help us learn R as well as a particular method; setting up bootstrapping in R for evaluating uncertainty.

Tuesday Beyond simple OLS: R’s capacities for multilevel and panel data, survival/event history analysis, generalised linear models.

What can R give us if we want to go beyond a simple linear model; how to implement basic multilevel analyses; how to implement analyses where your variable of interest is not continuous; what are the possibilities for doing analyses when you observe the same units (e.g. people/countries) over time.

Wednesday Seeing the unseen: factor and principal components analysis, structural equation models, and multidimensional scaling in R.

How is it possible to analyse unobserved structures in your data with R; how to empirically aggregate your variables and create lower dimensional representations; how to perform basic analyses that involve unobserved variables.

Thursday Visualisation with R and ggplot2

How to use R and ggplot2 to create high-quality, professional visualisations of your data and your analyses.

Friday Exporting your results with stargazer; participants’ choice

Often the last step in working with R is getting your results from R to your document. Thus, in the end, we will focus on how to do it quickly and effectively (with minimum manual effort) with a package called stargazer.

The content of the very last session we can agree on during the class so that we can cover topics that are of most interest to you.
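As an illustration of the Wednesday topics in the schedule above, the following sketch runs a principal components analysis, an exploratory factor analysis and a classical multidimensional scaling with functions from the stats package that ships with R. The six items and the two latent factors are simulated, so all names and numbers are hypothetical. Structural equation models need an add-on package and are not shown here.

```r
# Simulate six observed items driven by two unobserved factors (hypothetical data).
set.seed(7)
n <- 400
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)
items <- data.frame(
  a1 = f1 + noise(), a2 = f1 + noise(), a3 = f1 + noise(),
  b1 = f2 + noise(), b2 = f2 + noise(), b3 = f2 + noise()
)

# Principal components: how much variance do the first components capture?
pca <- prcomp(items, scale. = TRUE)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share, 2))

# Exploratory factor analysis with two factors (maximum likelihood, varimax rotation).
fa <- factanal(items, factors = 2, rotation = "varimax")
print(fa$loadings, cutoff = 0.3)

# Classical multidimensional scaling of the items, using 1 - correlation as a distance.
item_dist <- as.dist(1 - cor(items))
mds <- cmdscale(item_dist, k = 2)
print(round(mds, 2))
```

Because the data were simulated with two factors, the loadings should separate the a items from the b items, which makes this a convenient check that the functions are being used as intended.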

Day Readings

This will be a hands-on class, which we will spend going over R code and examples that I have prepared specifically for this class. This will be the main material for the class and will be made available when the summer school starts. Most of the reading materials that I have specified here are reference manuals for the R packages that we will be using. They are not the most fun to read in terms of style, but as they describe all the functionality of a package, they are essential sources of information about them. When expanding your knowledge of R on your own, you will not be able to get around reading them. It will thus be very important that you familiarise yourself with them, especially for the packages that we will be using. Some of them are quite long and extensive, and it is not necessary to read through all of their content. But it would be good to have a look, find the most important functions and go over their descriptions. All of these reference manuals have the same structure, so once you have looked at a few of them, you will know what to look for elsewhere.

 

Most of the topics that we cover are also included in many of the overview books of R that are available. A comprehensive list of them is given further below.


Software Requirements

All of the classes require the use of a computer. The participants are required to bring their own laptops to class.

R and RStudio should be installed.

 

Hardware Requirements

No special hardware requirements.

Literature

There is a wealth of materials about R in the form of reference materials, official documentation, textbooks and online blogs and forums. In most cases, if you have a problem with R, then googling the right question will easily give you the right answer (assuming that you know what the right question is). Nevertheless, here is a list of sources and materials that you can consult.

 

Burns, Patrick 2012, Tao Te Programming (More about programming in general, but is useful for R as well.)

R Home Page: https://www.r-project.org/

Quick-R: http://www.statmethods.net/

R Bloggers: http://www.r-bloggers.com/

R reference card: https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf

Burns, Patrick 2011, The R Inferno, http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Adler, Joseph 2012, R in a Nutshell, O’Reilly.

Fox, John and Weisberg, Sanford 2011, An R Companion to Applied Regression, Second Edition, Sage Publications.

Verzani, John 2014, Using R for Introductory Statistics, Chapman and Hall.

Crawley, Michael 2013, The R Book, Wiley.

Teetor, Paul 2011, R Cookbook, O’Reilly.

Lander, Jared 2014, R for Everyone: Advanced Analytics and Graphics, Addison-Wesley.

Cotton, Richard 2013, Learning R, O’Reilly.

Abedin, Jaynal 2014, Data Manipulation With R, Packt Publishing. 

Conway, Drew and White, John Myles 2012, Machine Learning For Hackers, O’Reilly. A good overview of using R with practical and interesting examples.

Black, Kelly 2014, R Object-oriented Programming, Packt Publishing. The book starts easy, but soon gets into more complex topics related to R and programming. A good overview of what programming in R can entail.

Grolemund, Garrett 2014, Hands-On Programming with R, O’Reilly. Introduction to R through dice, cards and slot machines. For those who like gambling. 

Data visualisation is a world of its own in terms of materials and literature. If one wants to go further into the basic principles of visualisation or some of the nuances that we cover in class, then the following would be a good start:

The Joy of Stats http://www.gapminder.org/videos/the-joy-of-stats/ Hans Rosling has some presentations that are worth watching, as well as a documentary, which show the importance of effective data visualisation.

The books by Edward Tufte are classics in data visualisation in the broadest sense of the term and form the context for any kind of display of information.

Tufte, Edward R. 1992, The Visual Display of Quantitative Information, Graphics Press.

Tufte, Edward R. 2003, Envisioning Information, Graphics Press.

Tufte, Edward R. 2006, Beautiful Evidence, Graphics Press.

Cleveland, William S. 1985, The Elements of Graphing Data, Wadsworth Advanced Books and Software.

Wilkinson, Leland 2006, The Grammar of Graphics, Springer Science & Business Media.

Wickham, Hadley 2009, ggplot2. Elegant Graphics for Data Analysis, Springer.

Chang, Winston 2013, R Graphics Cookbook, O’Reilly.

Abedin, Jaynal and Mittal, Hrishi V. 2014, R Graphs Cookbook, Packt Publishing.

Lillis, Alexander David 2014, R Graph Essentials, Packt Publishing.

Unwin, Antony 2015, Graphical Data Analysis with R, Chapman and Hall.

Zeileis, A., Hornik, K., and Murrell, P. 2009, “Escaping RGBland: selecting colors for statistical graphics”, Computational Statistics & Data Analysis, 53(9), pp. 3259-3270.

 

Recommended Courses to Cover Before this One

Summer School

  • Introduction to R
  • Effective Data Management with R

Winter School

  • Introduction to R