The course aims to equip early-career social scientists with the basics of programming for data acquisition, management, and analysis. By learning to code and adopting basic best practices, participants will add a valuable transferable skill to their portfolios. Beyond academic applications, coding skills will help participants succeed on the labor market, should they decide to seek opportunities outside academia. Finally, even participants whose future work does not involve programming will benefit from being able to form an informed opinion on important developments in social science research, such as the increased use of big data, machine learning algorithms, and network science techniques.
The course takes a pragmatic "learning by doing" approach, centered on developing programming solutions to common problems faced by data-focused social scientists. The aim is not, naturally, to build a full-fledged programmer's skill set, but rather to develop coding skills in a way that complements quantitative research skills. Summer school participants with an advanced background in coding should bear in mind that the main intention of the course is to help everyone get up to speed, and thus much of the course time is devoted to topics they may find trivial.
Python has been chosen as the programming language for the class due to its "easy to pick up, hard to master" learning curve, which makes it a more convenient choice for a beginner than more abstract programming languages. Additionally, its well-established ecosystem of community-developed libraries makes Python arguably a must-have tool in the skill set of data-oriented social scientists. We will use the Jupyter Notebook environment, which allows Python code to be combined with R syntax and Markdown, enabling powerful synergies for data-driven social science research.
The topics covered will include variable types, functions and parameters, conditionals, and "while" and "for" loops, as well as more advanced features such as lambdas, list comprehensions, and slicing. Working with structured (XML, CSV, JSON), semi-structured (HTML pages), and unstructured (text) data will be covered using native Python functions as well as those offered by popular libraries such as Beautiful Soup. The course will also include an introduction to the popular NumPy and Pandas packages as an alternative to dedicated statistical software. To complete the basic package of data skills, I/O operations will be covered, along with ways to write to and read from the web-based service Google Docs.
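To give participants a flavor of these constructs ahead of the class, a minimal sketch (the variable names and data are invented for illustration) combining a for loop with a conditional, a list comprehension, a lambda, and slicing might look like this:

```python
# Hypothetical exam scores, used only to illustrate the constructs.
scores = [72, 95, 88, 61, 79]

# A "for" loop with a conditional: collect passing scores.
passing = []
for s in scores:
    if s >= 70:
        passing.append(s)

# The same filter expressed as a one-line list comprehension.
passing_lc = [s for s in scores if s >= 70]

# A lambda used as a sort key, here sorting in descending order.
ranked = sorted(scores, key=lambda s: -s)

# Slicing: take the first three items of the sorted list.
top_three = ranked[:3]

print(passing_lc)  # [72, 95, 88, 79]
print(top_three)   # [95, 88, 79]
```

The loop and the comprehension produce identical results; the course discusses when each form is the clearer choice.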
In addition to this general knowledge, the course aims to develop the ability to use Python to add data "scraped" from websites, or obtained through the APIs of various web services (such as Twitter), to the list of possible data sources for participants' future research projects. This entails both a technical understanding of how "spiders"/"crawlers" are coded and an understanding of the scope of possibilities open to a researcher without special "big data" resources.
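As a small sketch of the parsing half of scraping (the HTML snippet, tag names, and class names below are invented for illustration; a real crawler would first fetch pages over HTTP), Beautiful Soup can turn semi-structured HTML into a clean Python list:

```python
from bs4 import BeautifulSoup

# An invented HTML fragment standing in for a downloaded web page.
html = """
<html><body>
  <div class="post"><h2>First post</h2><span class="date">2021-01-01</span></div>
  <div class="post"><h2>Second post</h2><span class="date">2021-01-02</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Find every <div class="post"> and pull out its <h2> text.
titles = [div.h2.get_text() for div in soup.find_all("div", class_="post")]

print(titles)  # ['First post', 'Second post']
```

The course walks through how such extraction rules are adapted to the layout of the particular site being scraped.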
Given its focus on practice, the course does not come with any prescribed mandatory readings; however, participants should devote a couple of minutes to the blog post linked in the reading list section for the first day. Reading it should give them a realistic idea of what they are getting into and perhaps provoke some preliminary thinking on integrating Python coding into their own research projects.