Member rate £492.50
Non-Member rate £985.00
Save £45 Loyalty discount applied automatically*
Save 5% on each additional course booked
*If you attended our Methods School in the last calendar year, you qualify for £45 off your course fee.
Monday 25 February – Friday 1 March, 09:00–12:30
15 hours over 5 days
This course provides a solid basis in Python programming and its application, in particular to online social media data.
Python is one of the most popular and versatile scripting languages, and has a large user community. It has become increasingly popular among social scientists because of its attractive features: it is easy to learn, flexible in handling massive datasets, and fast at calculation.
A large set of libraries helps users solve complex problems – making Python particularly attractive for those who need to handle massive amounts of diversely structured data collected online.
The course covers:
The course involves hands-on exercises in collecting, managing, and analysing data using Python.
Tasks for ECTS Credits
2 credits (pass/fail grade): Attend at least 90% of the course, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.
3 credits (to be graded): As above, plus complete small daily programming tasks. These must be submitted before class the following morning.
4 credits (to be graded): As above, plus conduct your own small independent project, collecting and analysing social media data, and summarising the results in a short take-home paper, due within one week of the end of the course.
Taehee Kim is a postdoctoral researcher at Carl von Ossietzky University in Oldenburg.
Her research interests include political behaviour, computational social science methods, network analysis, and Japanese politics.
Nowadays, very large and diverse kinds of data are becoming available to researchers. Online social media data in particular has great potential to provide a new approach to social science questions.
However, these kinds of data have diverse structures, which often differ from traditional social science data. Some are provided in a structured way through application programming interfaces (APIs), e.g. those of Twitter and Facebook; others are semi-structured, such as web pages. Moreover, the data may come in different formats, and the volume available is much larger than before.
Although widely used software packages such as R, Stata, and MATLAB are practical for statistical analysis, they are of only limited use for gathering, transforming, managing, and analysing new, massive and diversely structured types of data.
As an alternative to these packages, many scholars have begun to use Python, a popular, versatile scripting language with a large user community. Python has become popular because it is open source, easy to learn (even for beginners), and allows researchers to handle massive datasets quickly.
A large, rapidly expanding set of libraries helps users solve complex problems with ease. These include TensorFlow and Keras: newly developed libraries for deep learning.
Familiarity with the Python language opens up new possibilities for conducting your research more efficiently.
The aim of this course is to introduce the Python programming language and to provide a solid basis in Python programming and its applications, in particular for text data obtained from online social media.
The course covers:
The course will include concrete examples of how to collect, manage and analyse social media data.
You will complete practical exercises covering the whole process of data collection, management, and analysis for your own research project.
Course structure
First, I introduce the basic concepts of programming and database systems. You will learn about data types, operators, conditions, loops, functions, data structures, and object-oriented programming. You will also learn the differences between relational and non-relational database systems.
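To give a flavour of these building blocks, here is a minimal sketch that combines a class, a function, a loop and a condition. The `Post` class, the `long_posts` function and the sample data are hypothetical and are not taken from the course materials.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """A hypothetical container for one social media post."""
    author: str
    text: str

    def word_count(self) -> int:
        # str.split() with no argument splits on any run of whitespace
        return len(self.text.split())


def long_posts(posts, min_words=5):
    """Return the posts that contain at least `min_words` words."""
    selected = []
    for post in posts:                      # loop over a list
        if post.word_count() >= min_words:  # condition
            selected.append(post)
    return selected


# Invented sample data, for illustration only
posts = [
    Post("alice", "Voting opens at nine tomorrow morning"),
    Post("bob", "Agreed"),
]

for post in long_posts(posts):
    print(post.author, post.word_count())
```

Only the first post is printed, because the second has fewer than five words.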
Then you will learn how to put these programming and database concepts into practice in Python. I will set programming tasks for you to solve, to teach you how to program efficiently. We will use the relational database SQLite3, or possibly the non-relational database MongoDB, if there is sufficient demand from students. You will also learn basic regular expressions in Python for handling text data.
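As a rough illustration of how a regular expression and a relational database can work together, the sketch below extracts hashtags from a few invented posts and stores them in an in-memory SQLite database using Python's built-in `sqlite3` module. The table layout, the `extract_hashtags` helper and the sample posts are assumptions made for this example only.

```python
import re
import sqlite3

# \w+ matches letters, digits and underscores after the '#'
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return all hashtags in `text`, lower-cased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


# ":memory:" creates a temporary database that lives only in RAM
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, text TEXT)")
conn.execute("CREATE TABLE hashtags (post_id INTEGER REFERENCES posts(id), tag TEXT)")

# Invented sample data, for illustration only
sample_posts = [
    ("alice", "Polls open at 9am #Election2019 #vote"),
    ("bob", "Watching the debate tonight #election2019"),
]

for author, text in sample_posts:
    # '?' placeholders let sqlite3 handle quoting safely
    cur = conn.execute(
        "INSERT INTO posts (author, text) VALUES (?, ?)", (author, text)
    )
    post_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO hashtags (post_id, tag) VALUES (?, ?)",
        [(post_id, tag) for tag in extract_hashtags(text)],
    )
conn.commit()

# Count how often each hashtag occurs across all posts
tag_counts = conn.execute(
    "SELECT tag, COUNT(*) AS n FROM hashtags GROUP BY tag ORDER BY n DESC, tag"
).fetchall()
print(tag_counts)
```

Keeping posts and hashtags in separate tables linked by `post_id` is one common relational design; a document store such as MongoDB would more likely keep the hashtags inside each post record.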
After basic programming, I will introduce a couple of methods for obtaining social media data, such as scraping web pages and using APIs – in particular Twitter’s.
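The core idea of web scraping can be sketched without any network access. The example below parses an invented HTML snippet with the standard-library `html.parser` module and collects the text of every `<h2>` element. The `HeadlineParser` class and the sample page are hypothetical; with a live page, the HTML string would first have to be downloaded, for instance with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text found inside every <h2> element."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Text inside nested tags such as <em> is also reported here
        if self._in_h2:
            self.headlines[-1] += data


# Invented page content, for illustration only
SAMPLE_HTML = """
<html><body>
  <h2>Budget vote delayed</h2>
  <p>Some article text.</p>
  <h2>Minister <em>resigns</em></h2>
</body></html>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
headlines = [h.strip() for h in parser.headlines]
print(headlines)
```

Third-party parsers offer a more convenient interface for the same task; the standard-library version is used here only so that the sketch runs without extra installation.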
During the course, you will collect data according to your research interests, and build a database with it. I will introduce useful Python libraries for data collection, such as urllib and BeautifulSoup. I will demonstrate several analytical methods, in particular for text data, introducing Python libraries such as NumPy, pandas and SciPy for the analysis. I will show you how to obtain basic statistics from the collected data, carry out simple text analysis, and apply text classification techniques. We will also cover several machine learning algorithms.
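A very simple form of text analysis is counting terms. The sketch below tokenises three invented posts, computes the average post length in words, and finds the most frequent terms after dropping a few stopwords. It uses only the standard library as a stand-in for the analysis libraries named above, and the `tokenize` helper, the stopword list and the sample texts are assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean

# A deliberately tiny stopword list, for illustration only
STOPWORDS = {"the", "a", "is", "on", "to", "of", "and", "in"}


def tokenize(text):
    """Lower-case `text` and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Invented sample data, for illustration only
posts = [
    "The vote on the budget is tomorrow",
    "Budget talks continue and the vote may slip",
    "A quiet day in parliament",
]

tokens_per_post = [tokenize(p) for p in posts]

# Basic statistic: average number of tokens per post
avg_length = mean(len(tokens) for tokens in tokens_per_post)

# Term frequencies across all posts, ignoring stopwords
counts = Counter(
    token
    for tokens in tokens_per_post
    for token in tokens
    if token not in STOPWORDS
)
top_terms = counts.most_common(2)

print(round(avg_length, 2))
print(top_terms)
```

Term counts like these are also a typical starting point for building the features used in text classification.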
I will ask you to submit a small programming assignment every day. The task will be directly related to the content of the corresponding day and will take one or two hours maximum.
By the end of this course, you will have acquired a solid knowledge of Python programming and basic methods for collecting, managing and analysing data obtained from online materials.
These types of data, in particular, are large and have diverse forms. The course teaches you some essential principles of programming common to many languages, so your knowledge of Python programming will make it easier for you to learn other programming languages in the future.
Required literature
Lubanovic, Bill. 2014. Introducing Python: Modern Computing in Simple Packages. O’Reilly Media.
Mitchell, Ryan. 2015. Web Scraping with Python: Collecting Data from the Modern Web. O’Reilly Media.
Some experience with basic programming or data analysis in other languages, e.g. R, Stata, or MATLAB, is advantageous.
Day | Topic | Details |
---|---|---|
Monday | Introduction to Python and Basic Principles of Programming | |
Tuesday | Programming in Python | |
Wednesday | Database Concepts and their Application in Python: SQLite and MongoDB | |
Thursday | Collecting Online Data: Utilizing APIs and Web Scraping | |
Friday | Analysing Data: Text Analysis and Machine Learning | |
Day | Readings |
---|---|
Monday | Materials distributed during the course; Lubanovic (2014), ch. 1–2 |
Tuesday | Materials distributed during the course; Lubanovic (2014), ch. 3–7 |
Wednesday | Materials distributed during the course; Lubanovic (2014), ch. 8 |
Thursday | Materials distributed during the course; Mitchell (2015), ch. 1–4 |
Friday | Materials distributed during the course |
Please install Python 3.6 and PyCharm (IDE) prior to the course. Both are free.
Python 3.6: There are several ways to install Python; I recommend Anaconda.
Python IDE: The course uses PyCharm. Download the free 'Community' version.
Please apply for a Twitter developer account (you must already have a normal Twitter account). Once you are approved, you can create Twitter apps, which you'll need in order to use the Twitter API. I will distribute further instructions before the course.
Please bring your own laptop.
Russell, Matthew A. 2013. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. 2nd ed. Sebastopol, CA: O’Reilly Media.
Raschka, Sebastian and Vahid Mirjalili. 2017. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. 2nd ed. Packt Publishing.
Guttag, John V. 2013. Introduction to Computation and Programming Using Python: Revised and Expanded Edition. MIT Press.