ECPR

Advanced Social Network Analysis and Visualization with R

Course Dates and Times

Monday 6 August - Friday 10 August

09:00-10:30 / 11:00-12:30

Balázs Vedres

vedresb@ceu.hu

Central European University

The aim of this course is to give scalable computational tools to researchers interested in investigating social network research questions.  The course will introduce R (via R notebooks) as a key tool to analyze networks: to import and manage network data, to describe a network, to visualize networks from a dozen to tens of thousands of nodes, and to use statistical tests for network hypotheses at node or dyadic levels. We will discuss centralities, community detection, blockmodeling, brokerage, rewiring-based null models, and multiplex networks.  Datasets from diverse research projects will be provided: collaboration and communication networks, world trade, animal social networks.

Tasks for ECTS Credits

  • Participants attending the course: 2 credits (pass/fail grade). The workload for the calculation of ECTS credits is based on the assumption that students attend classes and carry out the necessary reading and/or other work prior to, and after, classes.
  • Participants attending the course and completing one task (see below): 3 credits (to be graded)
  • Participants attending the course and completing two tasks (see below): 4 credits (to be graded)
  • For one additional credit: a take-home paper (2,000-3,000 words) testing a network hypothesis with data, including a visualization of the network and a statistical test of the hypothesis. Datasets and possible research questions will be made available in the course.
  • For two additional credits: the take-home paper described above, plus completion of two assignments, handing in outputs (one page each).

Instructor Bio

Balázs Vedres' research furthers the agenda of understanding historical dynamics in network systems, combining insights from network science, historical sociology, and studies of complex systems in physics and biology.

His contribution is to combine historical sensitivities to patterns of processes in time with a network analytic sensitivity to patterns of connectedness cross-sectionally. A key element of this work was the adoption of optimal matching sequence analysis to historical network data.

Balázs' research has been published in top journals of sociology, with two recent articles in the American Journal of Sociology exploring the notion of structural folds: creative tensions in intersecting yet cognitively diverse cohesive communities. His recent research follows video game developers and jazz musicians as they weave collaborative networks through their projects and recording sessions.

He is the recipient of several awards and prizes, and is the founder and director of CEU's Center for Network Science.

  @balazsvedres

Social network analysis benefits greatly from computational tools, and the R statistical language is a suitable environment to describe, visualize, and analyze network datasets of various sizes. Social network analysis and network science form an area where intensive learning across disciplines is happening. This course makes new ideas from many disciplines – statistical physics, ecology, computer science – available to the social scientist, through libraries in R and via the interactive possibilities of R notebooks in RStudio. We will use libraries such as ‘network’, ‘sna’, and ‘igraph’.

The course is a good choice for those not taking the first-week course (Social Networks: Theoretically Informed Analysis with UCINET), as we will briefly revisit the main concepts covered there. In this case, an awareness of the basic concepts of social network analysis is a plus.

This course is also a good choice for those who took the first-week social network course using UCINET. It adds the scalability of the R framework, which UCINET lacks (to analyze larger networks, many networks, or fork the analytic strategy to various alternative methods and indices). It also adds multiple approaches to hypothesis testing and comparisons to random baseline null hypotheses, sophisticated visualizations, and a broader range of algorithm choices. Finally, it offers an outlook on methods from network science that are not present in UCINET (such as the replication of the Watts-Strogatz small-world rewiring experiment).

The first session will introduce R and programming basics, data import, and network descriptives. We will discuss object types and programming constructs relevant to network analysis. Various ways to import data into R will be explored. We will also learn about the generation of random graphs with diverse methods. We will learn about various data structures – sociomatrices, edgelists, nodelists – and ways to transform data among these formats. We will import data from various sources – Excel sheets, CSV and other delimited text formats, UCINET and other special data formats.
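
To make the data structures named above concrete, here is a minimal sketch of converting between an edgelist and a sociomatrix. The course itself works in R; this library-free Python version is only an illustration, and the function names and the toy data are assumptions, not course material. Note that a separate nodelist is needed to keep isolates (here "dan"), which an edgelist alone cannot represent.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def edgelist_to_sociomatrix(edges: List[Edge], nodes: List[str]) -> List[List[int]]:
    """Build an undirected 0/1 sociomatrix; rows and columns follow `nodes`."""
    index: Dict[str, int] = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected tie: mirror across the diagonal
    return matrix


def sociomatrix_to_edgelist(matrix: List[List[int]], nodes: List[str]) -> List[Edge]:
    """Recover each undirected tie once by scanning the upper triangle."""
    n = len(nodes)
    return [
        (nodes[i], nodes[j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j]
    ]


# Hypothetical toy data: three connected people and one isolate.
nodes = ["ann", "bob", "cat", "dan"]
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat")]

matrix = edgelist_to_sociomatrix(edges, nodes)
roundtrip = sociomatrix_to_edgelist(matrix, nodes)
```

The round trip returns the same three ties, while the all-zero row for "dan" shows how the sociomatrix carries the isolate that the edgelist omits.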

The second session is about what makes social networks distinctive: a strong tendency to closure and positive degree correlation (assortativity). We will compare human social networks to animal social networks (and to decidedly non-social network systems) to test the idea that animal social networks are indeed social. We will discuss various centrality measures, and also centralization at the graph level. We will compare centrality measures, and discuss the theoretical imagery behind these measures, with detailed explanation of the formulas for the various indices.
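
The quantities mentioned in this session can be sketched in a few lines. The following is an illustrative, library-free Python sketch (the course uses R packages instead); it assumes an undirected network without duplicate ties, and all function names are hypothetical. Closure is measured as transitivity (the share of connected triples that are closed), assortativity as the Pearson correlation of the degrees at the two ends of each tie, and degree centrality as degree divided by n - 1.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def transitivity(adj):
    """Share of connected triples (paths of length two) that are closed."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


def degree_assortativity(edges, adj):
    """Pearson correlation of the degrees at the two ends of each tie."""
    xs, ys = [], []
    for a, b in edges:
        # Count every undirected tie in both directions so the measure is symmetric.
        xs += [len(adj[a]), len(adj[b])]
        ys += [len(adj[b]), len(adj[a])]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")


# Hypothetical toy networks: a star (no closure) and a triangle (full closure).
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
star_adj = build_adjacency(star)
triangle_adj = build_adjacency(triangle)

star_closure = transitivity(star_adj)              # 0.0
triangle_closure = transitivity(triangle_adj)      # 1.0
star_assort = degree_assortativity(star, star_adj) # -1.0
hub_centrality = degree_centrality(star_adj)["hub"]  # 1.0
```

The star is the extreme disassortative case: every tie links the high-degree hub to a degree-one leaf, so the correlation is -1, whereas social networks are expected to show positive values.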

In the third session we will introduce measures for brokerage (structural autonomy), and also various brokerage roles at the boundaries of groups. We will use rewirings to generate the random baseline expectation for the relative frequency of brokerage roles, and diagnose an organizational network of a thousand workers to detect bottlenecks in information flow between divisions.
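
The logic of a rewiring-based baseline can be sketched as follows. This is a simplified, library-free Python illustration, not the course's R code: `rewire` and `count_triangles` are hypothetical names, the toy network is invented, and the triangle count stands in for whatever statistic is under test (for example, the frequency of a brokerage role). Each swap replaces ties a-b and c-d with a-d and c-b, so every node keeps its degree.

```python
import random


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring by repeated double-edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible swap orientations
        if len({a, b, c, d}) < 4:
            continue  # shared node: swap would give a self-loop or no change
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would duplicate an existing tie
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def count_triangles(edges):
    """Number of closed triads in an undirected network without duplicate ties."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Each triangle is found once per edge, hence the division by three.
    return sum(len(adj[a] & adj[b]) for a, b in edges) // 3


def degrees(edges):
    """Degree of every node that appears in the edgelist."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical observed network: two triangles joined by one bridging tie.
observed = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
observed_stat = count_triangles(observed)

# Null distribution: the same statistic on 200 rewired copies.
null_stats = [count_triangles(rewire(observed, n_swaps=20, seed=s)) for s in range(200)]

# Empirical share of rewired networks with at least as many triangles.
p_value = sum(stat >= observed_stat for stat in null_stats) / len(null_stats)
```

A small `p_value` would suggest that the observed closure is unlikely under a baseline that keeps only the degree sequence; the number of swaps and of rewired copies are arbitrary choices in this sketch.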

The fourth session looks at blockmodeling and cohesive community detection, with an emphasis on goodness of fit measures, and comparisons with random baselines. We will compare several methods of community detection, and discuss underlying assumptions about the nature of network data and processes of tie formation.
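
One widely used goodness-of-fit measure for a community partition is Newman's modularity, which compares the share of ties inside communities with the share expected when ties are placed at random given the degrees. The source does not say which measures the course covers, so the following library-free Python sketch is only an assumed example; the function name and toy data are hypothetical.

```python
def modularity(edges, membership):
    """Modularity Q = sum over communities of L_c / m - (d_c / (2m))^2.

    L_c is the number of ties inside community c, d_c the summed degree of
    its members, and m the total number of ties (undirected, unweighted).
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for a, b in edges:
        ca, cb = membership[a], membership[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy network: two triangles joined by one bridging tie.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]

two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}

q_two = modularity(edges, two_groups)  # 2 * (3/7 - (7/14)^2) = 5/14
q_one = modularity(edges, one_group)   # 7/7 - (14/14)^2 = 0
```

Splitting the network at the bridge scores 5/14 (about 0.36), while lumping everyone together scores 0, which is the value a partition earns when it does no better than the random baseline.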

The last session is dedicated to advanced visualizations of network data. We discuss layout algorithms, and parameter choices for these – both for small and large graphs.  We also introduce  node density heat maps for extremely large graphs. We learn to convey information on graphs using node size and color, edge size and color. We discuss graph creation and export for various purposes: research article, PowerPoint, poster, interactive online presentation.

Participants should have a basic knowledge of statistics and probability (probability density functions, tests of statistical significance, ordinary least squares estimator) and of data entry and manipulation (for example using Excel sheets, CSV, TXT data formats).

A basic knowledge of R (objects, vectors, matrices) and awareness of the basic concepts in programming (for and while loops, if-else conditional statements) are a plus, but not required. These basics will be covered briefly, to the extent that enables independent learning.

Day: Monday
Topic: R basics, network data import and descriptives
Details: R intro, data formats, basic network descriptives, basic graph drawing, random graphs.

Day: Tuesday
Topic: Strong and weak ties, assortativity, centralities and centralizations
Details: What makes social networks distinctive? Testing Granovetter’s strong ties hypothesis and Newman’s positive degree correlation hypothesis with human social networks, animal social networks, and non-social networks. Centrality measures and graph centralizations.

Day: Wednesday
Topic: Brokerage and rewiring-based statistical testing
Details: Brokerage measures and brokerage roles; statistical testing for network data using random rewirings as null model.

Day: Thursday
Topic: Community detection and blockmodeling
Details: Methods to identify communities, equivalent blocks; methods to test the goodness of fit of cohesive or equivalent blockings. Comparison of disjunct and overlapping community partitions.

Day: Friday
Topic: Network visualization in-depth
Details: Graph layout algorithms and their parameters, scalability by graph size, information conveyed by node and edge attributes. Visualizing for article, presentation, poster, or online. Density heatmaps.

Day: Monday
Readings: César A. Hidalgo (2016): Disconnected, fragmented, or united? A trans-disciplinary review of network science. Applied Network Science 1:6

Day: Tuesday
Readings: Mark S. Granovetter (1973): The Strength of Weak Ties. American Journal of Sociology 78(6)

M. E. J. Newman and Juyong Park (2003): Why social networks are different from other types of networks. (https://arxiv.org/abs/cond-mat/0305612v1)

Day: Wednesday
Readings: Ronald S. Burt (1995): Structural Holes: The Social Structure of Competition. Harvard University Press. Chapter 1

Roger V. Gould and Roberto M. Fernandez (1989): Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks. Sociological Methodology 19

Day: Thursday
Readings: James Moody and Douglas R. White (2003): Structural Cohesion and Embeddedness: A Hierarchical Concept of Social Groups. American Sociological Review 68(1)

Day: Friday
Readings: No readings

Software Requirements

We will use RStudio (https://www.rstudio.com/products/rstudio/download/). RStudio requires R, which is installed separately. Use the newest version of both.

Hardware Requirements

Participants should bring their own laptop (Windows, Mac, or Linux).

Info on requirements from the support pages:

https://support.rstudio.com/hc/en-us/articles/201853926-What-are-your-system-recommendations-for-the-RStudio-IDE-

“RStudio itself doesn't require a lot of computational power, so your requirements are going to be dependent on how you're using R. The number of cores, speed of the cores and the amount of RAM that you need is highly dependent on the work/analysis you will be doing. R itself is single threaded, and as such, you won't benefit from additional cores unless you are familiar with the various libraries that parallelize work and are then able to leverage multiple cores. If you are new to R and data analysis, it is unlikely that you would use more than 1 of your cores and more than 1 GB of RAM for most of your analyses.  However, if you intend to be analyzing larger data sets (>1GB) then it would be wise to invest in more RAM.  Generally speaking, most people don't leverage the parallelization in R, and so you are better off with fewer cores that are faster than more cores that are slower.”

Literature

A useful reading is the handbook: Wasserman, Stanley, and Katherine Faust (1994): Social Network Analysis. Cambridge University Press: Cambridge.

Also, you can read more about R: Torgo, Luis (2011): Data Mining with R. Chapman & Hall.

Recommended Courses to Cover Before this One

Summer School

Social Networks: Theoretically Informed Analysis with UCINET