ECPR


Python Programming for Social Sciences: Collecting, Managing and Analysing Social Media Data

Taehee Kim
taehee.kim@uni-oldenburg.de

Carl Von Ossietzky Universität Oldenburg

Taehee Kim is a postdoctoral researcher at Carl von Ossietzky University in Oldenburg.

Her research interests include political behaviour, computational social science methods, network analysis, and Japanese politics.


Course Dates and Times

Monday 25 February – Friday 1 March, 09:00–12:30
15 hours over 5 days

Prerequisite Knowledge

Some experience with basic programming or data analysis in another language (e.g. R, Stata, or Matlab) is advantageous.


Short Outline

This course provides a solid basis in Python programming and its application in particular to online social media data.

Python is one of the most popular and versatile scripting languages, and has a large user community. It has become increasingly popular among social scientists because of its attractive features: it is easy to learn, flexible in handling massive datasets, and fast in computation.

A large set of libraries helps users solve complex problems – making Python particularly attractive for those who need to handle massive amounts of diversely structured data collected online.

The course covers:

  • fundamental principles of programming and their implementation in Python
  • collecting social media data
  • handling large amounts of data in diverse formats, in particular with databases
  • using Python libraries – essential for data analysis.

The course involves hands-on exercises in collecting, managing, and analysing data using Python.

Tasks for ECTS Credits

2 credits (pass/fail grade): Attend at least 90% of the course, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded): As above, plus complete small daily programming tasks. These must be submitted before class the following morning.

4 credits (to be graded): As above, plus conduct your own small independent project, collecting and analysing social media data, and summarising the results in a short take-home paper, due within one week of the end of the course.

 


Long Course Outline

Nowadays, very large and diverse kinds of data are becoming available to researchers. Online social media data in particular has great potential to provide a new approach to social science questions.

However, these kinds of data have diverse structures, which often differ from those of traditional social science data. Some are provided in a structured way through application programming interfaces (APIs), e.g. those of Twitter and Facebook; others are semi-structured, such as web pages. Moreover, the data may come in different formats, and the amount of available data is much larger than before.

Although widely used software packages such as R, Stata, and Matlab are practical for statistical analysis, they are of only limited use for gathering, transforming, managing, and analysing new, massive and diversely structured types of data.

As an alternative to these packages, many scholars have begun to use Python, a popular, versatile scripting language with a large user community. Python has become popular because it is open source, easy to learn (even for beginners), and allows researchers to handle massive datasets quickly.

A large, rapidly expanding set of libraries helps users solve complex problems with ease. These include TensorFlow and Keras, newly developed libraries for deep learning.

Familiarity with the Python language opens up new possibilities for conducting your research more efficiently.

The aim of this course is to introduce the Python programming language, and to provide a solid basis in Python programming and its applications, in particular for text data obtained from online social media.

The course covers:

  • fundamental principles of programming and their implementation in Python
  • collecting social media data
  • handling large amounts of data in diverse formats, including in databases
  • using Python libraries for data analysis.

The course will include concrete examples of how to collect, manage and analyse social media data.

You will carry out practical exercises covering the whole process of data collection, management, and analysis for your own research project.

Course structure

First, I introduce the basic concepts of programming and database systems. You will learn about data types, operators, conditions, loops, functions, data structures, and object-oriented programming. You will also learn the differences between relational and non-relational database systems.
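As a rough illustration of the programming concepts listed above (not taken from the course materials), the following sketch touches on data types, a condition, a loop, a function, built-in data structures, and a small class. The `Post` class, its fields, and the sample posts are hypothetical and chosen only for this example.

```python
class Post:
    """A hypothetical social media post (object-oriented programming)."""

    def __init__(self, author, text, likes=0, tags=None):
        self.author = author      # str
        self.text = text          # str
        self.likes = likes        # int
        self.tags = tags or []    # list, a built-in data structure

    def is_popular(self, threshold=10):
        """Condition: compare the like count against a threshold."""
        return self.likes >= threshold


def count_by_author(posts):
    """Loop over posts and tally how many each author wrote (dict)."""
    counts = {}
    for post in posts:
        counts[post.author] = counts.get(post.author, 0) + 1
    return counts


posts = [
    Post("alice", "Election night!", likes=25, tags=["election"]),
    Post("bob", "Reading the manifesto", likes=3),
    Post("alice", "Turnout looks high", likes=12, tags=["turnout"]),
]

# List comprehension combining a loop and a condition.
popular = [p.text for p in posts if p.is_popular()]
print(popular)                 # ['Election night!', 'Turnout looks high']
print(count_by_author(posts))  # {'alice': 2, 'bob': 1}
```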

Then you will learn how to implement these programming and database concepts in Python. I will set programming tasks for you to solve, to teach you how to program efficiently. We will use the relational database SQLite3, or possibly the non-relational database MongoDB if there is sufficient demand from students. You will also learn basic regular expressions in Python for handling text data.
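To give a flavour of how regular expressions and a relational database can work together, here is a minimal sketch using Python's built-in `re` and `sqlite3` modules. The table names, column names, hashtag pattern, and sample posts are assumptions made for this example, not part of the course materials.

```python
import re
import sqlite3

# Assumed pattern: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

# Hypothetical sample data: (author, text) pairs.
posts = [
    ("alice", "Polls close at 8pm #Election #vote"),
    ("bob", "No opinion today"),
    ("carol", "Counting begins #election"),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.execute("CREATE TABLE hashtags (post_id INTEGER REFERENCES posts(id), tag TEXT)")

for author, body in posts:
    cur = conn.execute(
        "INSERT INTO posts (author, body) VALUES (?, ?)", (author, body)
    )
    post_id = cur.lastrowid
    # Extract hashtags with the regular expression and normalise case.
    tags = [(post_id, tag.lower()) for tag in HASHTAG.findall(body)]
    conn.executemany("INSERT INTO hashtags (post_id, tag) VALUES (?, ?)", tags)
conn.commit()

# Relational query: how often does each hashtag occur?
tag_counts = conn.execute(
    "SELECT tag, COUNT(*) AS n FROM hashtags GROUP BY tag ORDER BY n DESC, tag"
).fetchall()
print(tag_counts)  # [('election', 2), ('vote', 1)]
```

Storing hashtags in their own table, linked to posts by `post_id`, is one typical relational design; a non-relational store such as MongoDB would more likely keep the tags as a list inside each post document.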

After basic programming, I will introduce a couple of methods for obtaining social media data, such as scraping web pages and using APIs, in particular Twitter’s.
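The two routes differ mainly in what comes back: an API typically returns structured data such as JSON, whereas a scraped web page is HTML from which the relevant parts must be extracted. The sketch below illustrates both using only the standard library (`json`, `html.parser`, `urllib.request`). The JSON payload, the HTML snippet, and the `User-Agent` string are invented for this example and do not correspond to any real service; `fetch` is defined for illustration but not called, so the example runs without network access.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a URL and return its body as text (not called below)."""
    request = Request(url, headers={"User-Agent": "course-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Structured data, as an API might return it (hypothetical payload).
api_body = '{"posts": [{"id": 1, "text": "Hello"}, {"id": 2, "text": "Polls open"}]}'
texts = [post["text"] for post in json.loads(api_body)["posts"]]
print(texts)  # ['Hello', 'Polls open']

# Semi-structured data, as a scraped web page (hypothetical snippet).
page = (
    '<html><body><p>News</p><a href="/a1">First story</a> '
    '<a href="/a2">Second <b>story</b></a></body></html>'
)
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # [('/a1', 'First story'), ('/a2', 'Second story')]
```

Before collecting real data, check the terms of service of the site or API in question.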

During the course, you will collect data according to your research interests and build a database with it. I will introduce useful Python libraries for data collection, such as urllib and BeautifulSoup. I will demonstrate several analytical methods, in particular for text data, introducing Python libraries such as NumPy, pandas and SciPy for the analysis. I will show you how to obtain basic statistics from the collected data and how to carry out simple text analysis and text classification. We will also cover several machine learning algorithms.
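As an indication of what simple text analysis and text classification can look like, the sketch below counts word frequencies and trains a very small naive Bayes classifier with add-one smoothing. It deliberately uses only the standard library rather than the libraries named above, so that it runs without installation; the tokenizer, the labels, and the four training sentences are assumptions made for this example, and a real analysis would need far more data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_texts):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_texts:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(label_counts[label] / total_docs)  # prior
        for word in tokenize(text):
            # Add-one smoothing avoids log(0) for unseen words.
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
training = [
    ("great debate and a strong speech", "positive"),
    ("love this policy great news", "positive"),
    ("terrible debate and a weak answer", "negative"),
    ("awful policy bad news", "negative"),
]
word_counts, label_counts = train(training)

# Basic statistic: overall frequency of one word.
all_words = Counter(w for text, _ in training for w in tokenize(text))
print(all_words["great"])  # 2

print(classify("great strong speech", word_counts, label_counts))  # positive
print(classify("awful weak answer", word_counts, label_counts))    # negative
```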

I will ask you to submit a small programming assignment every day. The task will be directly related to the content of the corresponding day and will take one or two hours maximum.

By the end of this course, you will have acquired a solid knowledge of Python programming and basic methods for collecting, managing and analysing data obtained from online materials.

Such data are typically large and come in diverse forms. The course also teaches you some essential principles of programming common to many languages, so your knowledge of Python will make it easier for you to learn other programming languages in the future.

Required literature

Lubanovic, Bill. 2014. Introducing Python: Modern Computing in Simple Packages. O’Reilly Media.

Mitchell, Ryan. 2015. Web Scraping with Python: Collecting Data from the Modern Web. O’Reilly Media.

Day: Topic (Details)
Monday: Introduction to Python and Basic Principles of Programming
Tuesday: Programming in Python
Wednesday: Concept of Databases and their Application in Python (SQLite and MongoDB)
Thursday: Collecting Online Data: Utilizing APIs and Web Scraping
Friday: Analysing Data: Text Analysis and Machine Learning

Day: Readings
Monday: Materials distributed during the course; Lubanovic (2014), ch. 1-2
Tuesday: Materials distributed during the course; Lubanovic (2014), ch. 3-7
Wednesday: Materials distributed during the course; Lubanovic (2014), ch. 8
Thursday: Materials distributed during the course; Mitchell (2015), ch. 1-4
Friday: Materials distributed during the course

Software Requirements

Please install Python 3.6 and PyCharm (an IDE) before the course. Both are free.

Python 3.6: There are several ways to install Python; I recommend the Anaconda distribution.

Python IDE: The course uses PyCharm. Download the free 'Community' version.

Please apply for a Twitter developer account (you must already have a normal Twitter account). Once you are approved, you can create Twitter apps, which you will need in order to use the Twitter API. I will distribute further instructions before the course.

 

Hardware Requirements

Please bring your own laptop.

Literature

Russell, Matthew A. 2013. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. 2nd ed. Sebastopol, CA: O’Reilly Media.

Raschka, Sebastian and Vahid Mirjalili. 2017. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. 2nd ed. Packt Publishing.

Guttag, John V. 2013. Introduction to Computation and Programming Using Python: Revised and Expanded Edition. MIT Press.

Recommended Courses to Cover Before this One

Introduction to R


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Conveners

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.