ECPR

SA105 - Python Programming for Social Scientists: Web Data, Scraping and Other Useful Programming Tricks

Instructor Details

Brian Fabo

Institution:
Central European University

Instructor Bio

Brian Fabo is a Research Fellow at CEU's School of Public Policy, and a Researcher at the Centre for European Policy Studies in Brussels.

His research focuses on the use of new (web) data in social science research, and skills supply and demand in the Post-Fordist economies, particularly in the CEE region. When he has time, Brian likes to write programs for automatic data collection and processing, and engage in general data crunching.

Brian has taught Python at the Central European University, University of Amsterdam and at OECD.

  @BrianosaurRex


Course Dates and Times

Thursday 28 - Saturday 30 July

10:00-12:00 and 14:00-17:00

15 hours over 3 days

Prerequisite Knowledge

No specific knowledge is required, but participants would benefit from prior coding experience in some form (high school Turbo Pascal classes or use of syntax in SPSS, Stata, or R are all fine).

Short Outline

With growing interest in phenomena such as new (big) data in social science, data visualisation, and interdisciplinary cooperation (the “data science” framework), social scientists are increasingly expected to master at least some degree of programming. Whether graduates pursue a career inside or outside academia, the ability to code is an important skill that helps them stand out from the crowd.

 

Nonetheless, the barriers to entry are often high for a social scientist who wants to start coding. This course takes a pragmatic approach to lowering these barriers: it leads participants through the first steps of coding and explains the wider logic behind the code, while focusing on “real life” applications of programming. After completing the course, participants should be able to develop their programming skills further through individual effort.

Long Course Outline

The course aims to equip early-career social scientists with the basics of programming for the purposes of data acquisition, management, and analysis. By acquiring coding knowledge and basic best practices, participants will add a valuable transferable skill to their portfolio. Beyond its academic applications, knowledge of coding will help participants succeed on the labour market, should they decide to seek opportunities outside academia. Finally, even participants whose future work will not involve programming will benefit from being able to form an informed opinion on important developments in social science research, such as the increased use of big data, machine learning algorithms, and network science techniques.

 

The course takes a pragmatic “learning by doing” approach, centred on developing programming solutions for common problems faced by data-focused social scientists. The aim is not, naturally, to develop a full-fledged programming skill set, but rather to develop coding skills in a way that yields strong synergies with quantitative research skills. Summer school participants with an advanced background in coding should bear in mind that the main intention of the course is to help everyone get up to speed, and thus much of the course time is devoted to topics they might find trivial.

 

Python has been chosen as the programming language for the class because of its “easy to pick up, hard to master” learning curve, which makes it a more convenient choice for a beginner than more abstract programming languages. Additionally, its well-established ecosystem of community-developed libraries makes Python arguably a must-have tool in the skill set of data-based social scientists. We will use the Jupyter Notebook environment, which allows Python code to be combined with R syntax and Markdown, a powerful combination for data-driven social science research.

 

The topics covered will include types of variables, functions and parameters, conditionals, “while” and “for” loops, as well as more advanced topics such as lambdas, list comprehensions, and slicing. Work with structured (XML, CSV, JSON), semi-structured (HTML pages), and unstructured (text) data will be covered using built-in Python functions as well as those offered by popular libraries, such as Beautiful Soup. The course will also introduce the popular packages NumPy and Pandas as an alternative to dedicated statistical software. To complete the basic package of data skills, I/O operations will be covered, along with some ways to write to and read from the web-based service Google Docs.
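
As a rough illustration of the kind of material listed above, the following sketch touches on loops, conditionals, a function, a lambda, a list comprehension, slicing, and reading CSV and JSON with the standard library. It is not taken from the course materials: the `respondents` records, the `average` helper, and the sample CSV and JSON strings are all invented for this example.

```python
import csv
import io
import json

# Hypothetical survey records, invented purely for illustration.
respondents = [
    {"name": "Ana", "age": 34, "country": "HU"},
    {"name": "Ben", "age": 27, "country": "SK"},
    {"name": "Cyril", "age": 45, "country": "CZ"},
    {"name": "Dana", "age": 29, "country": "PL"},
]


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


# A "for" loop with a conditional: collect names of respondents over 30.
over_30 = []
for person in respondents:
    if person["age"] > 30:
        over_30.append(person["name"])

# A list comprehension: the same kind of loop written in one expression.
ages = [person["age"] for person in respondents]
mean_age = average(ages)

# A lambda used as a sort key, followed by slicing to keep the first two.
by_age = sorted(respondents, key=lambda person: person["age"])
two_youngest = [person["name"] for person in by_age[:2]]

# A "while" loop: repeat until a condition stops being true.
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1

# Structured data: parse CSV and JSON text with the standard library.
csv_text = "name,age\nAna,34\nBen,27\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

json_text = '{"country": "HU", "respondents": 2}'
record = json.loads(json_text)
```

Note that `csv.DictReader` returns every field as a string, so `rows[0]["age"]` is the text `"34"` and has to be converted with `int()` before any arithmetic.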

 

In addition to this general knowledge, the course aims to develop the ability to use Python to add “scraped” data from websites, or data obtained through the APIs of various web services (such as Twitter), to the list of data sources that participants can exploit in their future research projects. This entails both a technical understanding of how “spiders”/“crawlers” are coded and an understanding of the scope of possibilities open to a researcher without special “big data” resources.
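
To give a sense of what the core of a simple scraper can look like, here is a minimal sketch that pulls link targets out of an HTML page. It deliberately uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra installs, and it is written for Python 3. The `LinkCollector` class, the `extract_links` function, and the sample page are hypothetical and invented for this example; a real crawler would also download pages and follow the collected links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# A made-up page standing in for a downloaded website.
SAMPLE_PAGE = """
<html><body>
<h1>Job postings</h1>
<ul>
<li><a href="/jobs/1">Data analyst</a></li>
<li><a href="/jobs/2">Survey researcher</a></li>
<li><a name="top">No link here</a></li>
</ul>
</body></html>
"""

links = extract_links(SAMPLE_PAGE)
```

The third anchor has no `href` attribute, so it is skipped and `links` contains only the two job URLs.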


The course does not come with any prescribed mandatory readings because of its focus on practice. However, participants should devote a couple of minutes to reading the blog post linked in the reading list section for the first day. Reading it should help them get a realistic idea of what they are getting into and perhaps provoke some preliminary thinking on integrating Python coding into their own research projects.

Day-to-Day Schedule

Day-to-Day Reading List

Software Requirements

Anaconda (Python 2.7), available from http://continuum.io/downloads. We will use the IPython Notebook.

Hardware Requirements

None.

Literature

Gries, Paul, Jennifer Campbell, and Jason Montojo. 2013. Practical Programming: An Introduction to Computer Science Using Python 3. 2nd ed. Pragmatic Bookshelf.

 

McKinney, Wes. 2012. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Beijing: O’Reilly Media.

Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Convenors

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.

