Monday 29 July – Friday 2 August
14:00–15:30 and 16:00–17:30 (ending slightly earlier on Friday)
This is a class for people who know the basics of R but want to expand their knowledge of what can be done with this versatile software, and how.
Taking the first steps in R might be quick and sometimes enjoyably painful, but unlocking the capacities that make R the most valued programming language for statistical analysis takes a bit more help and effort.
The purpose of this class is to show how some of the most common statistical techniques that could be of interest to a social scientist (like regressions with non-continuous outcomes, structural equation or multilevel models) can be implemented in R. We will not talk about the statistical properties of these methods, just about how they can be done in R. This is the core of the class. We begin with a few more general topics, including how to work effectively with R, and the class ends with a longer look at more advanced data visualisation with R and ggplot2.
2 credits: Take an active role in class and go through the required materials for each day.
3 credits: As above, plus complete and submit short daily exercises that reflect essential practical knowledge for the topic.
4 credits: As above, plus submit one complete data analysis task written in R. This should include data import, cleaning and management, the main analytical tasks (including data visualisation), and the export of your results out of R. Ideally this is something that you are working on or have worked on, but have not yet done (wholly or partly) in R. The deadline for submitting this task will be several weeks after the end of the class.
Martin Mölder (PhD in comparative politics) is a researcher at the Johan Skytte Institute of Political Studies at the University of Tartu, Estonia.
His main research focus is political parties, their ideological and political positions, and the functioning of party systems. He also teaches, among other things, quantitative methods.
Martin has an extensive background in the use of R for data management and statistical analysis in the social sciences.
He has taught the following courses at the ECPR Summer School in Methods & Techniques:
Important! The content below might describe more than we can cover in a week. Bear in mind that it can, to some extent, be adapted to your interests. Once you have registered, I will contact you to ask which topics interest you most and least, and I hope we'll be able to shape part of the class according to your needs.
Learning R takes time, and often the best next step after learning the very basics of this programming language and its most important characteristics (covered in R Basics) is to get a good overview of what analyses can be done in R, and how to do them.
The details of each topic and method are almost limitless – and in the end, this is part of what makes R so great – but for getting to know R at an intermediate level, these might be of secondary importance. You do not need to know the details – they will come later – you just need to know what is possible, what to look for and expect, and how to ask the right questions. These will make the steps that follow, the steps that you are most likely to do on your own while working on your analyses, much simpler and smoother. And this is the purpose of this course.
The course covers many topics, which in their substantive breadth could fill several week-long courses. But what we focus on in class is simply their technical implementation in R. For this course you do not need to know in any meaningful depth what, for example, multilevel models, panel data analysis or any of the other covered methods are, although it would be good to have at least a basic idea. We will simply look at their implementation in R – what packages and functions to use and what data structure is implied – with only a few very basic comments about the nature of the method and the type of data. This is not a course in statistical methods as such, but in how they can be used in R. We cover much technical ground and very little substance.
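As a hedged illustration of what "implementation" means here, the sketch below fits a logistic regression (a model with a binary, non-continuous outcome) with base R's `glm()`. The built-in `mtcars` data is used purely as an assumed example. The course's own materials may use different data and packages.

```r
# Logistic regression: binary outcome am (0 = automatic, 1 = manual)
# modelled as a function of car weight wt (in 1000 lbs).
# Data: the mtcars data frame that ships with base R (one row per car),
# used here only as an illustrative example.
data(mtcars)

fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
summary(fit)

# Predicted probability of a manual transmission for a car weighing 3000 lbs
p_manual <- predict(fit, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```

The pattern is the same for most models covered in class. You choose the right function (here `glm()` with a `family` argument), supply a data frame with one row per observation, and then query the fitted object with generic functions such as `summary()` and `predict()`.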
The course begins with the general issue of how to work most effectively with R. We look at some principles of how to write good, comprehensible and efficient R code and how to structure your files and folders, so that your work runs smoothly. Thereafter we look at two related topics – simulations and bootstrapping – which apply across specific methods: simulations as a means to learn and understand a method, and bootstrapping as a way to gauge the uncertainty in your data and in the results you derive from them.
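Here is a minimal sketch of both ideas using only base R. It assumes a toy data generating process (y = 2 + 0.5x + noise). The variable names, sample size and number of resamples are illustrative choices, not taken from the course materials. The code simulates data whose true slope is known, checks that `lm()` recovers it, and then uses a simple nonparametric bootstrap to estimate the uncertainty of that slope.

```r
set.seed(42)

# 1. Simulate data from a known (assumed) process: y = 2 + 0.5 * x + noise
n <- 200
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
sim_data <- data.frame(x = x, y = y)

# 2. Fit a linear model and see whether it recovers the true slope (0.5)
fit <- lm(y ~ x, data = sim_data)
slope_hat <- coef(fit)[["x"]]

# 3. Nonparametric bootstrap: resample rows with replacement and refit
n_boot <- 1000
boot_slopes <- replicate(n_boot, {
  idx <- sample(nrow(sim_data), replace = TRUE)
  coef(lm(y ~ x, data = sim_data[idx, ]))[["x"]]
})

# 4. Percentile bootstrap 95% interval for the slope
boot_ci <- quantile(boot_slopes, probs = c(0.025, 0.975))

print(slope_hat)
print(boot_ci)
```

Because the true slope is known, the simulation shows directly how close the estimate gets. The bootstrap interval shows how much that estimate would vary across samples like the one at hand.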
Over the next two and a half days, we will go over the implementation in R of some of the most common methods of statistical analysis. We will look at how to do factor analysis and structural equation models, how to work with panel data and multilevel data, and many related analyses (for a more detailed list, see the day-to-day schedule). The structure of all these micro-topics will be the same:
Most models come with their own basic plotting options, but sometimes you want more. Therefore, towards the end of the course, we take a longer look at ggplot2, an R package that allows you to make almost any kind of data visualisation you desire.
ggplot2 is like a language within R, and learning it fully would take more than a week. So we will only dip a toe in the water to see what's possible with this package and how to make the basic kinds of plots – bar charts, line charts, scatterplots and the like – as well as how to customise them to your liking and save them as PDF or another format of your choice.
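Here is a minimal sketch of a customised scatterplot saved to PDF. It assumes the ggplot2 package is installed and uses the `mpg` example data that ships with it. The file name `mpg_scatter.pdf` and the plot dimensions are arbitrary choices for illustration.

```r
library(ggplot2)

# Scatterplot of engine displacement vs highway mileage, coloured by
# vehicle class, using the mpg example data included in ggplot2
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class",
    title = "Fuel efficiency by engine size"
  ) +
  theme_minimal()  # one of several built-in themes for quick customisation

# Save to PDF; width and height are in inches by default.
# Changing the extension (e.g. ".png") changes the output format.
ggsave("mpg_scatter.pdf", plot = p, width = 7, height = 5)
```

Each `+` adds a layer or a setting to the plot. This is what makes ggplot2 feel like a small language of its own.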
Often, the end user of your data or your analyses needs to perform basic data operations, analyses and visualisations on their own. For this purpose, I will quickly introduce Shiny, an R package that allows you to make, with relatively little effort, interactive web pages where users can have a look at the underlying data on their own. There are many other ways in R to bring your results to the web, but Shiny is perhaps the most common, and a good place to start.
Creating visualisations is one way – and perhaps the best way – to get your results out of R, but sometimes you also need good-looking tables of model output. In that case, an R package called stargazer can save the day. Copy-pasting numbers from R into a table that you have built manually in your document takes time (sometimes countless hours) that could be better spent on something else. Stargazer lets you export your model results from R directly into LaTeX or HTML format, and the latter can easily be transferred to Word or any other software you might be using.
Sometimes it makes sense to integrate your code and its output, including data visualisations, with text and other media (like images and other online sources of information). R offers R Markdown and R Notebooks, which let you create HTML, PDF or Word documents that include your R code and its output, as well as any text you want to add, all within R.
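Here is a minimal sketch of this workflow. It assumes the stargazer package is installed. The two models are fitted to R's built-in `mtcars` data purely for illustration, and the output file name `models.html` is hypothetical.

```r
library(stargazer)

# Two illustrative regression models on the built-in mtcars data
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Write both models side by side as an HTML table; the file can then
# be opened in a browser and copied into Word
stargazer(m1, m2, type = "html", out = "models.html",
          title = "Fuel efficiency models")

# Print the same table as LaTeX code to the console instead
stargazer(m1, m2, type = "latex")
```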
When you have covered the topics of this class – when you know how to work in R more effectively, what kinds of analyses it is possible to do and the ways in which you can get them out of R quickly and effortlessly – you will be well on your way to becoming a proficient user of R.
I will assume you have solid basic knowledge of R on a level comparable to Akos Mate's R Basics course, for which this course is best taken as a continuation.
I also expect a very basic knowledge of the most common statistical methods used in the social sciences: what they are, what they do, and when they should be used.
Each course includes pre-course assignments, including readings and pre-recorded videos, as well as daily live lectures totalling at least two hours. The instructor will conduct live Q&A sessions and offer designated office hours for one-to-one consultations.
Please check your course format before registering.
Live classes will be held daily for two hours on a video meeting platform, allowing you to interact with both the instructor and other participants in real-time. To avoid online fatigue, the course employs a pedagogy that includes small-group work, short and focused tasks, as well as troubleshooting exercises that utilise a variety of online applications to facilitate collaboration and engagement with the course content.
In-person courses will consist of daily three-hour classroom sessions, featuring a range of interactive in-class activities including short lectures, peer feedback, group exercises, and presentations.
This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed at the time of change.
By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.
| Day | Topic | Details |
|---|---|---|
| 1 | Structuring your workflow in R; simulations as a tool of analysis and learning. | How to set up R and the structure of your work in R for increased efficiency; how to simulate a data generating process in R and how this can help us learn R as well as a particular method; setting up bootstrapping in R for evaluating uncertainty. |
| 2 | Beyond simple OLS: R’s capacities for multilevel and panel data, survival/event history analysis, generalised linear models. | What R can offer when we want to go beyond a simple linear model; how to implement basic multilevel analyses; how to implement analyses where your variable of interest is not continuous; what the possibilities are for analysing data in which the same units (e.g. people or countries) are observed over time. |
| 3 | Seeing the unseen: factor and principal components analysis, structural equation models, and multidimensional scaling in R. | How to analyse unobserved structures in your data with R; how to empirically aggregate your variables and create lower-dimensional representations; how to perform basic analyses that involve unobserved variables. |
| 4 | Visualisation with R and ggplot2; Shiny and web applications. | How to use R and ggplot2 to create high-quality, professional visualisations of your data and your analyses; how Shiny web apps work and what you can do with them. |
| 5 | Exporting your results with stargazer; R markdown; R notebooks; participants’ choice. | Often the last step in working with R is getting your results from R into your document. Thus, at the end, we will focus on how to do this quickly and effectively (with minimal manual effort) using a package called stargazer. The content of the very last session can be agreed on in the first session, so that we cover the topics of most interest to you. |
This is a hands-on course in which we will go over R code and examples I have prepared specifically for this class. This will be the main material, made available when the course starts.
Most reading materials specified here are reference manuals for the R packages we will be using. They are not the most fun to read in terms of style, but since they describe all the functionality of a package, they are essential sources of information about it.
When expanding your knowledge of R on your own, it's very important to familiarise yourself with these reference texts, especially for the packages we will be using. Some of them are quite extensive, and you don't have to read all their content. But it would be good to have a look, find the most important functions, and go over their descriptions.
All these reference manuals have the same structure, so if you have looked at a few of them, you will know what to look for elsewhere.
Most of the topics we cover are also included in many R overview books, a comprehensive list of which I give below.
glm (stats package)
princomp (stats package)
factanal (stats package)
cmdscale (stats package)
R and RStudio.
Please bring your own laptop.
There is a wealth of materials about R in the form of reference materials, official documentation, textbooks and online blogs and forums. In most cases, if you have a problem with R, then googling the right question will easily give you the right answer (assuming that you know what the right question is). Nevertheless, here is a list of sources and materials to consult.
Burns, Patrick 2012, Tao Te Programming
More about programming in general, but useful for R as well.
Burns, Patrick 2011, The R Inferno
Adler, Joseph 2012, R in a Nutshell (O’Reilly)
Fox, John and Weisberg, Sanford 2011, An R Companion to Applied Regression, Second Edition (Sage Publications)
Verzani, John 2014, Using R for Introductory Statistics (Chapman and Hall)
Crawley, Michael 2013, The R Book (Wiley)
Teetor, Paul 2011, R Cookbook (O’Reilly)
Lander, Jared 2014, R for Everyone: Advanced Analytics and Graphics (Addison-Wesley)
Cotton, Richard 2013, Learning R (O’Reilly)
Abedin, Jaynal 2014, Data Manipulation With R (Packt Publishing)
Conway, Drew and White, John Myles 2012, Machine Learning for Hackers (O’Reilly)
A good overview of using R with practical and interesting examples.
Black, Kelly 2014, R Object-oriented Programming (Packt Publishing)
The book starts easy, but soon gets into more complex topics related to R and programming. A good overview of what programming in R can entail.
Grolemund, Garrett 2014, Hands-On Programming with R (O’Reilly)
Introduction to R through dice, cards and slot machines. For those who like gambling.
Data visualisation is a world of its own in terms of materials and literature, and if you want to go further into the basic principles of visualisation or some of the nuances that we cover in class, the following would be a good start:
The Joy of Stats: it is worth watching this documentary and some of Hans Rosling's presentations, which emphasise the importance of effective data visualisation.
Edward Tufte's books are classics on data visualisation in the broadest sense, and form the context for any kind of information display.
Tufte, Edward R. 1992, The Visual Display of Quantitative Information (Graphics Press)
Tufte, Edward R. 2003, Envisioning Information (Graphics Press)
Tufte, Edward R. 2006, Beautiful Evidence (Graphics Press)
Cleveland, William S. 1985, The Elements of Graphing Data (Wadsworth Advanced Books and Software)
Wilkinson, Leland 2006, The Grammar of Graphics (Springer Science & Business Media)
Wickham, Hadley 2009, ggplot2: Elegant Graphics for Data Analysis (Springer)
Chang, Winston 2013, R Graphics Cookbook (O’Reilly)
Abedin, Jaynal and Mittal, Hrishi V. 2014, R Graphs Cookbook (Packt Publishing)
Lillis, Alexander David 2014, R Graph Essentials (Packt Publishing)
Unwin, Antony 2015, Graphical Data Analysis with R (Chapman and Hall)
Zeileis, A., Hornik, K., and Murrell, P. 2009, 'Escaping RGBland: selecting colors for statistical graphics', Computational Statistics & Data Analysis, 53(9), pp. 3259–3270.
Summer School: Introduction to R; Effective Data Management with R
Winter School: Introduction to R