Basic knowledge of R is an asset both within and outside academia, as it is a widely used and highly valued tool for data processing, analysis and visualisation. With just a little effort to learn, it can make your life as an empirical social scientist much easier. It can be used for everything from classical regressions through multilevel and structural equation modelling to network analysis and text analysis. It can also turn your analyses into professional-looking web pages or PDF documents with fully customisable data visualisations.
The power of R comes from the fact that it is not a pre-packaged piece of software, but a programming language specifically tuned for quantitative analysis. This also makes R at first a bit harder to learn and use than point-and-click software. One has to learn to speak to R in its own language, which takes a little effort, but the effort is rewarding: once you can, you can tell R to do almost anything you need and it will do it.
The purpose of this class is to provide the essential knowledge of R as a programming language and an overview of the basic operations one might encounter while using R. We will not focus much on methods of data analysis – these will be the content of most of the other classes in this summer school. Instead, this class aims to give a solid foundation in the most important topics that come before using R as a tool for data analysis proper. This foundational knowledge will allow you to use R subsequently for whatever purpose you have in mind.
We will divide the three rather condensed days into 15 topics, which together cover the basics of using R.
The first day will be devoted to getting started with R and doing basic operations with data. We will cover issues related to installing R and the different user interfaces that are available for it. In this course we will be using RStudio, as it is one of the most user-friendly environments for using R. We will look at how to install additional packages to expand R's functionality and how to organise your work in a script file. Basic mathematical operations in R, as well as its nature as an object-oriented programming language, will be introduced. We will look at what the different kinds of “objects” are and how one can work with them. From there, we move on to reading different kinds of data files into R and to saving them afterwards. Once we have our data in, we will spend some time getting a basic overview of what our data looks like and how to summarise it. We will end the first day by looking into some basic manipulations of data objects and cover the distinction between wide and long data formats – this will be very useful when you later get to plotting or certain kinds of analysis.
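As a rough preview of the first day's topics, the following sketch creates a few objects, summarises a small data frame, writes it to a file and reads it back, and converts it from wide to long format. The data, the column names (`score_2020`, `score_2021`) and the temporary file are made up for illustration; only base R functions are used, and the class materials may well take a different route.

```r
# A numeric vector is one of the simplest kinds of objects in R.
ages <- c(23, 35, 41)
mean(ages)

# A data frame holds a data set: one row per respondent, one column per variable.
# This one is in "wide" format: one score column per year.
wide <- data.frame(
  id         = 1:3,
  score_2020 = c(10, 12, 9),
  score_2021 = c(11, 14, 10)
)

str(wide)      # structure: variable names and types
summary(wide)  # basic descriptive statistics for each column

# Save the data to a CSV file and read it back in (a temporary file is used here).
path <- tempfile(fileext = ".csv")
write.csv(wide, path, row.names = FALSE)
wide_again <- read.csv(path)

# Convert from wide to long format: one row per respondent and year.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_2020", "score_2021"),
  v.names   = "score",
  timevar   = "year",
  times     = c(2020, 2021),
  idvar     = "id"
)
long
```

In long format each respondent appears once per year, which is the layout many plotting and modelling functions expect.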
On the second day we will continue with getting to know our data in R. We will begin the day with an overview of the basic plotting functionalities of R. From there we will move on to dealing with missing data, sorting your dataset, recoding variables and selecting subsets of data. After this we are ready to have a look at some of the basic mechanisms of data manipulation – loops (which repeat an operation over a set of values or while a condition holds) and conditional statements (if/else, usually combined with the familiar logical operators AND, OR and NOT), which are essential in structuring the workflow of loops. Thereafter, we will look at how to write our own functions and how these can make one’s life with R a lot easier. We will end the day with an introduction to the apply family of functions, which are some of the essential tools for working with data in R.
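To give a flavour of these building blocks, here is a minimal sketch that labels a handful of made-up scores three ways: with a loop and a conditional statement, with a user-written function, and with `sapply()` from the apply family. The vector `scores`, the function name `classify` and the cut-off values are hypothetical and chosen only for illustration.

```r
scores <- c(4, 7, NA, 9, 2)

# 1. A for loop with an if/else statement; && is the logical AND operator.
labels_loop <- character(length(scores))
for (i in seq_along(scores)) {
  if (is.na(scores[i])) {
    labels_loop[i] <- "missing"
  } else if (scores[i] >= 5 && scores[i] <= 8) {
    labels_loop[i] <- "middle"
  } else {
    labels_loop[i] <- "extreme"
  }
}

# 2. The same logic wrapped in a function that handles a single value.
classify <- function(x) {
  if (is.na(x)) {
    return("missing")
  }
  if (x >= 5 && x <= 8) "middle" else "extreme"
}

# 3. sapply() applies the function to every element, replacing the explicit loop.
labels_apply <- sapply(scores, classify)

# Both approaches should give the same result.
identical(labels_loop, labels_apply)
```

Writing the rule once as a function and applying it with `sapply()` keeps the code shorter and makes the rule easy to reuse on other variables.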
On the third day we will end our introduction to preparing our data and move on to having a first look at how to do some of the basic analyses. R provides a lot of functionality for manipulating data objects, much of it through add-on packages, and we will first look at two of them – plyr and dplyr – which will make transforming and summarising your data much faster. From there we will move to the basic functions of data analysis and have a look at how to do t-tests, ANOVA, correlations and basic regression in R. The results of analyses in R are stored in objects, such as model objects, and so we will devote a bit of time to familiarising ourselves with them and with how to get the information that we need out of them. We will end the day with basic topics that relate to many analyses – data distributions, data simulation (i.e. generating data that matches certain characteristics) and re-sampling (generating “new” sets of data from the data that we have for the purposes of evaluating the uncertainty of our results).
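The idea of pulling information out of result objects, and of re-sampling to gauge uncertainty, can be sketched as follows. The example uses the `mtcars` data set that ships with R rather than any course data, and the choice of variables, the seed and the number of re-samples (1000) are arbitrary assumptions made for illustration.

```r
# Fit a simple linear regression; the result is stored in a model object.
fit <- lm(mpg ~ wt, data = mtcars)

# Extract pieces of information from the model object.
coef(fit)                # estimated intercept and slope
summary(fit)$r.squared   # proportion of variance explained

# A t-test also returns an object whose components can be accessed by name.
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# Re-sampling (bootstrap): draw rows with replacement, refit, keep the slope.
set.seed(1)
boot_slopes <- replicate(1000, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))["wt"]
})

# The spread of the re-sampled slopes indicates the uncertainty of the estimate.
quantile(boot_slopes, c(0.025, 0.975))
```

The percentile interval printed at the end is one simple way to summarise the re-sampled estimates; other interval types exist and may suit a given analysis better.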
The main purpose of this class is to convey practical skills and knowledge in R, and the only way to learn these is by doing. Therefore, we will spend most of the time in the sessions working with R on these topics – writing code and going over examples that have been prepared to illustrate the topics that we cover. You will save this code and will be able to reuse parts of it for similar problems that you tackle with R in the future. For each topic, I will also refer to a textbook, which will give you the necessary context and reference material. Reading the indicated chapters beforehand will greatly facilitate what we will be doing in the classes.
At the end of this class you will be ready to start using R on your own – you will know the basics, but you will also have a better understanding of what you do not yet know – so you will be able to ask the right questions to continue learning and using R on your own.