ECPR

Python Programming for Social Sciences: Collecting, Analysing and Presenting Social Media Data

Course Dates and Times

Monday 17 – Friday 21 February 2019, 14:00 – 17:30 (finishing slightly earlier on Friday)
15 hours over five days

Taehee Kim

taehee.kim@uni-oldenburg.de

Carl von Ossietzky Universität Oldenburg

This course provides a basis in Python programming and its application in particular to online social media data.

Python is one of the most popular and versatile scripting languages, and has a large user community. It has become increasingly popular among social scientists because it is easy to learn, flexible in handling massive datasets, and fast at calculation.

A large set of libraries helps users solve complex problems – making Python particularly attractive for those who need to handle massive amounts of diversely structured data collected online.

The course covers:

  • fundamental principles of programming and their implementation in Python
  • collecting social media data (Twitter)
  • handling large amounts of data in diverse formats
  • using Python libraries – essential for data analysis

The course involves hands-on exercises in collecting, managing, and analysing data using Python.

Tasks for ECTS Credits

2 credits (pass/fail grade): Attend at least 90% of the course, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded): As above, plus complete small daily programming tasks. These must be submitted the following morning, before the next class session.

4 credits (to be graded): As above, plus one of the tasks below, due within a week of the end of the course:

A: conduct a small independent project, collecting and analysing social media data, and summarising the results in a short paper

B: solve a programming task.


Instructor Bio

Taehee Kim is a postdoctoral researcher at Carl von Ossietzky University in Oldenburg.

Her research interests include political behaviour, computational social science methods, network analysis, and Japanese politics.

Nowadays, very large and diverse kinds of data are becoming available to researchers. Online social media data in particular has great potential to provide a new approach to social science questions.

However, these kinds of data have diverse structures, which often differ from those of traditional social science data. Some are provided in a structured way through application programming interfaces (APIs), e.g. those of Twitter and Facebook; others are semi-structured, such as web pages. Moreover, these data can come in different formats, and the amount of data available is much larger than before.
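To make the contrast with a traditional rectangular dataset concrete, here is a minimal sketch of turning nested, API-style records into flat rows. The sample records are invented for illustration, and the field names (`user`, `screen_name`, `entities`, `hashtags`) are assumptions loosely modelled on typical social media payloads, not a guaranteed schema.

```python
import json

# Invented sample resembling an API response; field names are assumptions.
raw = """
[
  {"id": 1, "user": {"screen_name": "alice"}, "text": "Voting today! #election",
   "entities": {"hashtags": [{"text": "election"}]}},
  {"id": 2, "user": {"screen_name": "bob"}, "text": "Rainy morning.",
   "entities": {"hashtags": []}}
]
"""


def flatten(record):
    """Turn one nested record into a flat row, as in a survey-style table."""
    return {
        "id": record["id"],
        "author": record["user"]["screen_name"],
        "text": record["text"],
        "hashtags": [tag["text"] for tag in record["entities"]["hashtags"]],
    }


rows = [flatten(record) for record in json.loads(raw)]
for row in rows:
    print(row)
```

Each record becomes one row with one value per column, which is the shape most statistical software expects.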

Although widely used software packages such as R, STATA, and Matlab are practical for statistical analysis, they are of only limited use for gathering, transforming, managing, and analysing new, massive and diversely structured types of data.

As an alternative to these packages, many scholars have begun to use Python, a popular, versatile scripting language with a large user community. Python has become popular because it is open source, easy to learn (even for beginners), and allows researchers to handle massive datasets quickly.

A large, rapidly expanding set of libraries helps users solve complex problems with ease. These include TensorFlow and Keras, two newly developed libraries for deep learning.

Familiarity with the Python language opens up new possibilities for conducting your research more efficiently.

The course covers:

  • fundamental principles of programming and their implementation in Python
  • collecting social media data
  • handling large amounts of data in diverse formats
  • using Python libraries for data analysis.

The course will include concrete examples of how to collect, manage and analyse social media data, especially data from Twitter.

Course structure

First, I introduce the basic concepts of programming. You will learn about data types, operators, conditions, loops, functions, data structures, and object-oriented programming.
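As a rough preview of how these concepts fit together, the sketch below combines a data structure, a function with a loop and a condition, and a small class. The data and all names (`posts`, `total_likes`, `Account`) are invented for illustration and are not taken from the course materials.

```python
# A data structure: a list of dictionaries holding strings and integers.
posts = [
    {"author": "alice", "likes": 12},
    {"author": "bob", "likes": 3},
    {"author": "alice", "likes": 7},
]


def total_likes(items, author):
    """Function with a loop and a condition: sum the likes of one author."""
    total = 0
    for item in items:
        if item["author"] == author:
            total += item["likes"]
    return total


class Account:
    """A small class: bundles data (name, likes) with behaviour."""

    def __init__(self, name):
        self.name = name
        self.likes = []

    def add_post(self, likes):
        self.likes.append(likes)

    def average_likes(self):
        # Avoid dividing by zero for accounts without posts.
        return sum(self.likes) / len(self.likes) if self.likes else 0.0


alice = Account("alice")
for post in posts:
    if post["author"] == alice.name:
        alice.add_post(post["likes"])

print(total_likes(posts, "alice"))  # 19
print(alice.average_likes())        # 9.5
```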

Then you will learn how to implement these concepts in Python. I will set programming tasks for you to solve, to teach you how to program efficiently.

After basic programming, I will introduce a couple of methods for obtaining social media data, such as scraping web pages and using APIs, in particular Twitter’s. I will also introduce useful Python libraries for data collection, such as urllib and Beautiful Soup.
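The scraping step can be sketched as follows. To keep the example runnable without extra installs or network access, it parses an invented HTML string with the standard library's `html.parser` in place of Beautiful Soup, and uses `urllib.parse.urljoin` to resolve relative links. The page content, the base URL, and the class name `LinkCollector` are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Invented page content; a real script would first download the page,
# for example with urllib.request.urlopen.
PAGE = """
<html><body>
  <h1>Party press releases</h1>
  <a href="/news/1.html">Budget statement</a>
  <a href="https://example.org/about">About us</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            url = urljoin(self.base_url, self._href)
            self.links.append((url, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector("https://example.org/")
collector.feed(PAGE)
for url, text in collector.links:
    print(url, "|", text)
```

A parser library such as Beautiful Soup offers a more convenient interface for the same task; the logic of locating tags and extracting attributes and text stays the same.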

I will demonstrate several analytical methods for text data:

  • how to obtain basic statistics of the collected data
  • how to conduct a simple text analysis
  • text classification using supervised machine learning algorithms (Naive Bayes and Support Vector Machine).
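To give a feel for supervised text classification, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is written in plain Python for transparency and is not the scikit-learn implementation; the training texts and labels are invented toy data.

```python
import math
from collections import Counter, defaultdict

# Invented toy training data: (text, label) pairs.
TRAIN = [
    ("great debate tonight", "positive"),
    ("love this policy", "positive"),
    ("great policy great speech", "positive"),
    ("terrible debate", "negative"),
    ("hate this speech", "negative"),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)  # log prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("great speech", model))
print(predict("terrible debate", model))
```

With so little training data the predictions are only illustrative; real applications need far more labelled examples and a held-out test set.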

You will learn basic regular expressions for handling text data, and Python libraries for analysis such as NumPy, pandas, NLTK, and scikit-learn.
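As a small illustration of regular expressions applied to text data, the sketch below extracts hashtags, mentions, and URLs from a few invented posts and computes simple counts. It uses only the standard library (`re` and `collections.Counter`); the patterns are deliberately simple assumptions and would miss some edge cases in real data.

```python
import re
from collections import Counter

# Invented example posts.
POSTS = [
    "Great turnout at the rally! #Election2020 @cityhall",
    "Polls close at 8pm #election2020 #vote",
    "Thanks @cityhall for the update https://example.org/info",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")

hashtags = Counter()
mentions = Counter()
for post in POSTS:
    # Lower-case the matches so "#Election2020" and "#election2020" are merged.
    hashtags.update(tag.lower() for tag in HASHTAG.findall(post))
    mentions.update(name.lower() for name in MENTION.findall(post))

urls_per_post = [len(URL.findall(post)) for post in POSTS]

print(hashtags.most_common(2))
print(mentions.most_common(1))
print(urls_per_post)
```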

I will ask you to submit a small programming assignment every day. The task will be directly related to the content of the corresponding day and should take one or two hours maximum.

Required literature

Lubanovic, Bill. 2014
Introducing Python: Modern Computing in Simple Packages
O’Reilly Media

Mitchell, Ryan. 2015
Web Scraping with Python: Collecting Data from the Modern Web
O’Reilly Media

Installation and setup
Please install and configure Python and PyCharm on your laptop before the course starts, using these step-by-step instructions.

Experience in other languages
You should have some experience with basic programming or data analysis in another language, e.g. R, Matlab, or STATA. In other words, you should be able to write basic code in that language: for example, assign a value to a variable, write a for loop, and write an if condition.

If you do not fulfil the above requirements
If you have problems following the installation instructions, or do not have experience in other languages, take the course WA108, Basics of Programming in Python.

Day         Topic Details
Monday      Introduction to Python and Basic Principles of Programming
Tuesday     Programming in Python
Wednesday   Collecting Online Data: Utilising APIs and Web Scraping
Thursday    Analysing Data: Basic Statistics, Visualisation
Friday      Analysing Data: Text Analysis and Machine Learning

Day         Readings
Monday      Materials distributed during the course; Lubanovic (2014), ch. 1-2
Tuesday     Materials distributed during the course; Lubanovic (2014), ch. 3-7
Wednesday   Materials distributed during the course; Mitchell (2015), ch. 1-4
Thursday    Materials distributed during the course
Friday      Materials distributed during the course

Software Requirements


Please prepare the following free, open-source environments on your laptop using these step-by-step instructions:

Python 3: version > 3.5. 
Among several possibilities, I recommend using Anaconda to install Python.
We will use PyCharm (Community version) as a Python editor.
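One way to confirm that an installation meets the version requirement is a short check like the sketch below. The function name `python_is_recent_enough` is hypothetical; it reads "version > 3.5" as "Python 3.6 or later".

```python
import sys


def python_is_recent_enough(version_info=sys.version_info, minimum=(3, 5)):
    """Return True if the (major, minor) version is newer than `minimum`."""
    return tuple(version_info[:2]) > minimum


# Print the running interpreter's version and whether it passes the check.
print(sys.version.split()[0], python_is_recent_enough())
```

Running this inside PyCharm also confirms that the editor is using the interpreter you intended.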

Please also apply for a Twitter developer account. Once you are approved, you can create Twitter apps, which you need in order to access the Twitter API. The review process can take anything from a couple of days to several weeks. If you are not approved by the time the course starts, I can give you access during the course. You will be given further instructions after you register.

Hardware Requirements

Please bring your own laptop with Python and PyCharm installed, as described in the software section.

Literature

Swaroop, C. H. 2013
A Byte of Python

Raschka, Sebastian and Vahid Mirjalili. 2017
Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2nd ed
PACKT Publishing

Jürgens, Pascal and Andreas Jungherr. 2016
A Tutorial for Using Twitter Data in the Social Sciences: Data Collection, Preparation, and Analysis

Russell, Matthew A. 2013
Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More 2nd ed
Sebastopol, CA: O’Reilly Media

Guttag, John V. 2013
Introduction to Computation and Programming Using Python: Revised and Expanded Edition
The MIT Press

Recommended Courses to Cover Before this One

Summer School

Introduction to R

Winter School

Basics of Programming in Python
Introduction to R