Social Network Analysis - AA - Article - Session 12

How To Get Started with Social Network Analysis
A Complete Beginner’s Guide to Getting Up and Running Making Beautiful Network Graphs
Preface
This guide is intended to get complete beginners in social network analysis up to
speed on terminology and concepts to create and analyze their first network graph.
This is a long article. Feel free to bookmark this article and come back to it from time
to time as you are first learning social network analysis. An outline of this article is
also provided below:
Part 1: Background
1. Why Should I Care About Social Network Analysis?
2. What Does a Social Network Graph Look Like?
3. What Tools Do I Need To Get Started?

Part 2: Terms and Concepts
1. Nodes and Edges
2. Edge Direction
3. Edge Weight
4. Centrality Measures
5. Network-Level Measures
6. Path-Level Measures
7. Connected Components and Bridges
8. Hubs and Authorities
9. Dyads and Cliques
Part 3: Application
1. Implementation
2. Download and Install Gephi
3. Dataset
4. Loading Network Data into Gephi
5. Using Layout Functions
6. Calculating Network-Level Measures
7. Calculating Centrality Measures
8. Calculating Edge Weights
9. Using Color in Network Graphs
10. Using Size in Network Graphs
11. Changing Background Color
12. Recap
Why Should I Care About Social Network Analysis?
Social network analysis (SNA), also known as network science, is a field of data
analytics that uses networks and graph theory to understand social structures. SNA
techniques can also be applied to networks outside of the societal realm.
In order to build SNA graphs, two key components are required: actors and
relationships. A common application of SNA techniques is with the internet. Web
pages on the internet often link to other webpages — either on their own website or
another website. These links can be considered relationships between actors (web
pages). This is actually a key component of search engine architecture.
Networks are all around us — such as road networks, internet networks, and online
social networks like Facebook. While this article focuses on social network analysis
(keyword: social), learning these techniques will give you valuable tools in your
toolbelt to provide insight on a variety of data sources.
What Does a Social Network Graph Look Like?
A social network graph contains points and lines connecting those points, similar
to a connect-the-dot puzzle. The points represent the actors and the lines represent
the relationships. An example of a social network graph can be seen below (taken
from my article “Community Detection of ISIS Twitter Accounts”).
Example Social Network Graph
What Tools Do I Need To Get Started?
Like many things in data science, there are a variety of tools you can use to conduct SNA.
This guide focuses on a specific set of tools in order to get you started making network
graphs and conducting analysis on them. In no way are these the only or best tools
available.
Gephi
This guide will use Gephi, free software for Mac, PC, and Linux, to build
network graphs and run some analytics on them. Gephi provides a graphical interface
(seen below) and will not require any coding to use. Gephi can be downloaded here.
Gephi Overview Page
Python/Excel
To build network graphs in Gephi, the data must be in a specific format, so a tool
is needed to create CSV files in that format. With simple data, Excel should suffice.
However, for large amounts of data, or data whose relationships must be extracted,
Python is recommended.
Don’t fret if you do not have any Python skills — you should still be able to build some
basic networks.
Data Source
You will also need a data source for your network. Network data have two
requirements: actors and relationships. Some datasets will require these relationships
to be extracted, while others state them explicitly. I recommend using
datasets from Kaggle to get started. Some recommendations are listed below:
1. Marvel Universe Social Network
2. Wikipedia Article Network
3. Deezer Social Network
Nodes and Edges
Up until now, I have referred to both actors and relationships. In network science,
actors are referred to as nodes (the dots on the graph) and relationships as edges (the
lines on the graph). You will see me use this terminology throughout the rest of this
article.
Examples for nodes and relations
Nodes can represent a variety of ‘actors’. In internet networks, nodes can represent
web pages. In social networks, nodes can represent people. In supply chain networks,
nodes can represent organizations. In foreign relations networks, nodes can represent
countries. While nodes can represent a variety of things, each one is an entity that
has a relationship with another entity.
Edges can represent a variety of ‘relationships’. In internet networks, edges can
represent hyperlinks. In social networks, edges can represent connections. In supply
chain networks, edges can represent the transfer of goods. In foreign relations
networks, edges can represent policies. Like nodes, edges can represent a variety of
things.
Nodes and edges are a key concept in networks, so make sure you have a good
understanding of them before tackling the other concepts.
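As a minimal sketch of how nodes and edges can be written down as data, the plain-Python snippet below stores a toy network as a list of edges and derives the nodes from it. The names (Alice, Bob, and so on) and the variable names are hypothetical and chosen only for illustration; they do not come from any dataset used in this guide.

```python
# Hypothetical toy network: each tuple is one edge between two nodes.
edges = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Dave"),
]

# The nodes are every distinct name that appears at either end of an edge.
nodes = sorted({node for edge in edges for node in edge})

# An adjacency mapping records, for each node, the nodes it shares an edge with.
adjacency = {node: set() for node in nodes}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

print(nodes)
print(adjacency["Alice"])
```

An edge list like this is the same shape as the edges table that Gephi imports later in this guide.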
Edge Direction
There are two types of edges: directed and undirected. It will be necessary to decipher
what type of edge your data contains when building a network graph.
Directed edges are applied from one node to another with a starting node and an
ending node. For example, when a Twitter user tags another Twitter user in a tweet,
that relationship is directed. The user who wrote the tweet (starting node) applied
that relationship to the user who they tagged (ending node). The tagged user has not
necessarily reciprocated that relationship. Another example of a directed edge is
a payment. If a customer (starting node) pays a coffee shop (ending node) for a coffee,
that relationship is not necessarily reciprocated because the coffee shop has not also
paid the customer.

Undirected edges are the opposite of directed edges. These relationships are
reciprocated by both parties without a clear starting node and ending node. For
example, if two people are friends on Facebook, that relationship is undirected. This is
because it can be said that Person A is friends with Person B, but it can also be said
that Person B is friends with Person A. Another example of an undirected edge is
shared membership in a Meetup group. This is because it can be said that Person A is
in a group with Person B, but it can also be said that Person B is in a group with
Person A.
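The difference between the two edge types can be sketched in a few lines of plain Python. The function name build_adjacency and the example node names are assumptions made for illustration: a directed edge is stored in one direction only, while an undirected edge is stored in both.

```python
def build_adjacency(edges, directed):
    """Map each node to the set of nodes it has an edge to."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if not directed:
            # An undirected edge is reciprocated, so store it both ways.
            adjacency[target].add(source)
    return adjacency


# Hypothetical examples: a tweet mention (directed) and a friendship (undirected).
mentions = [("user_a", "user_b")]
friendships = [("person_a", "person_b")]

directed_graph = build_adjacency(mentions, directed=True)
undirected_graph = build_adjacency(friendships, directed=False)

print(directed_graph)
print(undirected_graph)
```

In the directed case user_b has no edge back to user_a, which mirrors the tagged user not having reciprocated the relationship.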

Edge Weight
An edge’s weight is the number of times that edge appears between two specific
nodes. For example, if Person A buys a coffee from a coffee shop 3 times, the edge
connecting Person A and the coffee shop will have a weight of 3. However, if Person B
only buys coffee from the coffee shop once, the edge connecting Person B and the
coffee shop will have a weight of 1.
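The coffee shop example can be reproduced with a short sketch that counts repeated edges. It assumes each purchase is recorded as one (buyer, seller) pair; the variable names are illustrative.

```python
from collections import Counter

# Each tuple is one purchase, i.e. one occurrence of the edge.
purchases = [
    ("Person A", "Coffee Shop"),
    ("Person A", "Coffee Shop"),
    ("Person A", "Coffee Shop"),
    ("Person B", "Coffee Shop"),
]

# The weight of an edge is how many times it occurs in the list.
weights = Counter(purchases)

for (buyer, seller), weight in weights.items():
    print(buyer, "->", seller, "weight", weight)
```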
Centrality Measures
Centrality is a collection of metrics used to quantify how important and influential a
specific node is to the network as a whole. It is important to remember that centrality
measures are used on specific nodes within the network, and do not provide
information on a network level. There are several centrality measures, but this guide
will cover degree, closeness, and betweenness.
Degree
A node’s degree is the number of edges the node has. In an undirected network
(see the Edge Direction section), there is only one measure for degree. For
example, if Node A has edges connecting it to Node B and Node D, then Node A’s
degree is 2.
However, in a directed network, there are actually three different degree measures.
Because these edges have a starting and ending node, the in-degree (number of edges
the node is an ending node of), out-degree (number of edges the node is a starting
node of), and degree (number of edges the node is either a starting node or ending
node of) can be calculated.
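A minimal sketch of the three degree measures for a directed network follows. The function name degree_measures and the toy edge list are assumptions for illustration; each edge is written as a (starting node, ending node) pair.

```python
from collections import Counter


def degree_measures(edges):
    """Return in-degree, out-degree, and total degree for directed edges."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    # Total degree counts every edge the node starts or ends.
    degree = out_degree + in_degree
    return in_degree, out_degree, degree


# Hypothetical directed edges: A points to B and D, and C points to A.
edges = [("A", "B"), ("A", "D"), ("C", "A")]
in_degree, out_degree, degree = degree_measures(edges)

print("A in:", in_degree["A"], "out:", out_degree["A"], "total:", degree["A"])
```

For an undirected network only the total is meaningful, which matches the single degree measure described above.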
Closeness
Closeness measures how well connected a node is to every other node in the network.
A node’s closeness is based on the average number of hops along the shortest paths
from that node to every other node in the network. It is commonly reported as the
inverse of that average, so that nodes needing fewer hops score higher. A hop is the
traversal of one edge from one node to another. For example, as seen in the diagram
below, Node A is connected to Node B, and Node B is connected to Node C. For Node A
to reach Node C it would take two hops.
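The A, B, C example can be worked through with a breadth-first search that counts hops. This is a sketch under the assumption that the network is undirected and connected; the function names hop_counts and average_hops are illustrative. Closeness is often reported as the inverse of the average, which is also printed.

```python
from collections import deque


def hop_counts(adjacency, start):
    """Fewest hops from start to every node reachable from it."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def average_hops(adjacency, node):
    """Average number of hops from node to every other reachable node."""
    hops = hop_counts(adjacency, node)
    others = [count for other, count in hops.items() if other != node]
    return sum(others) / len(others)


# Node A is connected to Node B, and Node B is connected to Node C.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}

for node in sorted(adjacency):
    average = average_hops(adjacency, node)
    print(node, "average hops:", average, "inverse:", 1 / average)
```

Node B needs the fewest hops on average, so it is the best-connected node in this small network.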
Betweenness
Betweenness measures how important a node is in allowing other nodes to reach each
other. A node’s betweenness is the number of shortest paths between other pairs of
nodes that pass through the node, divided by the total number of shortest paths
between those pairs. This gives the share of shortest paths in the network that the
node lies on. (The standard definition computes this share separately for each pair
of nodes and sums the results, but the intuition is the same.)
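The sketch below follows the simplified description used in this guide: it counts the shortest paths between all other pairs of nodes and reports the share that pass through a given node. It assumes an undirected network, and the function names are illustrative. Network software typically uses the standard per-pair formulation, so its numbers can differ from this simple share.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, start):
    """Fewest hops from start, and the number of shortest paths, per node."""
    hops = {start: 0}
    counts = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                counts[neighbor] = 0
                queue.append(neighbor)
            if hops[neighbor] == hops[node] + 1:
                # Every shortest path to node extends to this neighbor.
                counts[neighbor] += counts[node]
    return hops, counts


def shortest_path_share(adjacency, node):
    """Share of shortest paths between other node pairs that pass through node."""
    info = {n: bfs_counts(adjacency, n) for n in adjacency}
    hops_v, counts_v = info[node]
    through = 0
    total = 0
    others = [n for n in adjacency if n != node]
    for s, t in combinations(others, 2):
        hops_s, counts_s = info[s]
        if t not in hops_s:
            continue  # s and t are not connected at all
        total += counts_s[t]
        on_shortest_path = (
            node in hops_s
            and t in hops_v
            and hops_s[node] + hops_v[t] == hops_s[t]
        )
        if on_shortest_path:
            through += counts_s[node] * counts_v[t]
    return through / total if total else 0.0


# Hypothetical chain of four nodes: A - B - C - D.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

for node in sorted(adjacency):
    print(node, shortest_path_share(adjacency, node))
```

The end nodes A and D score zero because no shortest path between other nodes passes through them.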
Network-Level Measures
Metrics can also be calculated on the network level, evaluating the entire network
instead of a single node in it. Like centrality measures, there are a variety of network-
level measures. This guide will cover size and density.
Network Size
Network size is the number of nodes in the network. The size of a network does not
take into consideration the number of edges. For example, a network with nodes A, B,
and C has a size of 3.

Network Density
Network density is the number of edges divided by the total possible edges. For
example, in an undirected network with Node A connected to Node B, and Node B
connected to Node C, the network density is 2/3 because there are two edges out of a
possible 3.
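The 2/3 result can be checked with a small function. This sketch assumes no self-loops and at most one edge per pair of nodes, so an undirected network with n nodes has n(n-1)/2 possible edges and a directed one has n(n-1). The function name density is illustrative.

```python
def density(node_count, edge_count, directed=False):
    """Number of edges divided by the number of possible edges."""
    possible = node_count * (node_count - 1)
    if not directed:
        # Each undirected pair is counted once rather than twice.
        possible //= 2
    return edge_count / possible if possible else 0.0


# Three nodes (A, B, C) with two undirected edges (A-B and B-C).
print(density(3, 2))
```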
Path-Level Measures
Path-level measures provide information for a path between one node and another
node. Paths follow edges between nodes, known as hops. There are also many
different path-level measures, but this article will cover length and distance.
Length
Length is the number of edges, or hops, along a given path between the starting and
ending nodes. To calculate a length, you must first specify which path is being
measured.
Distance
Distance is the number of edges or hops between the starting and ending nodes
following the shortest path. Unlike length, the distance between two nodes uses only
the shortest path, that is, the path that requires the fewest hops.
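To make the distinction concrete, the sketch below measures the length of one chosen path and then finds the distance with a breadth-first search. The toy network and the function names path_length and distance are assumptions for illustration.

```python
from collections import deque


def path_length(path):
    """Number of hops along a given path, written as a list of nodes."""
    return len(path) - 1


def distance(adjacency, start, end):
    """Fewest hops from start to end, or None if no path exists."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return hops[node]
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return None


# Hypothetical network: A, B, and C are all connected, and D hangs off C.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(path_length(["A", "B", "C", "D"]))  # one possible path from A to D
print(distance(adjacency, "A", "D"))      # the shortest path goes A, C, D
```

The chosen path has a length of 3, while the distance between A and D is 2.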
Connected Components and Bridges
Not all nodes in a network will necessarily be connected to each other. A connected
component is a group of nodes that are connected to each other, but not connected to
another group of nodes. Another way of thinking of this is a group of connected nodes
that have no path to a node from another group. Depending on the network, there can
be many connected components, or even only one. The below diagram shows a
network with two connected components.

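A bridge is a node that, when removed, splits a connected component into separate
components. Another way of thinking about it is that a bridge is a node that is the
sole connection between one group of connected nodes and another. (In formal graph
theory, the term bridge refers to an edge with this property, and a node with this
property is called a cut vertex or articulation point.)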
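Both ideas can be sketched together: the code below finds the connected components of a toy undirected network, then removes a node that is the sole connection between two groups and counts the components again. The network and the function names connected_components and without_node are hypothetical.

```python
def connected_components(adjacency):
    """Return a list of sets, one per connected component."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def without_node(adjacency, removed):
    """Copy of the network with one node and all of its edges removed."""
    return {
        node: neighbors - {removed}
        for node, neighbors in adjacency.items()
        if node != removed
    }


# Hypothetical network: a chain A - B - C - D, plus a separate pair E - F.
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
}

before = connected_components(adjacency)
after = connected_components(without_node(adjacency, "C"))
print(len(before), "components before removing C")
print(len(after), "components after removing C")
```

Removing C cuts D off from A and B, so the number of components goes up.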

Hubs and Authorities
Hubs and Authorities are node classifications used in directed networks. A hub is a
node that has many edges pointing out of it. You can also think of a hub as a node that
is the starting node of many edges. An authority, on the other hand, is a node that has
many edges pointing to it. You can also think of authority as a node that is the ending
node of many edges. There is no pre-defined number of edges that makes a node a
hub or an authority; the threshold depends on the network. In addition, remember that
not all nodes in a directed network will be a hub or an authority.
Dyads and Cliques
Dyads and cliques are small groups of nodes connected by edges. A dyad is a pair of
two nodes, while a clique is a group of three or more nodes in which every node is
connected to every other node in the group. While a dyad or clique may be a connected
component on its own, it can also be part of a larger connected component.
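In standard graph-theory usage, a clique is a group in which every pair of nodes is directly connected. A minimal check for that property, using a hypothetical toy network and an illustrative function name, might look like this:

```python
from itertools import combinations


def is_clique(adjacency, group):
    """True if every pair of nodes in group shares an edge."""
    return all(b in adjacency[a] for a, b in combinations(group, 2))


# Hypothetical network: A, B, and C are all connected; D is connected only to C.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(is_clique(adjacency, ["A", "B", "C"]))
print(is_clique(adjacency, ["B", "C", "D"]))
```

Here A, B, and C form a clique inside a larger connected component, while B, C, and D do not because B and D share no edge.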
Implementation
Now that you have an understanding of social network analysis terms and concepts,
this guide will walk you through applying these techniques to a dataset using the
Gephi software.
Download and Install Gephi
First, download and install the Gephi software for the operating system your machine
is running. Gephi is available for Mac, PC, and Linux and can be downloaded here.
Dataset
For this guide, we will be using the Marvel Universe Social Network dataset from
Kaggle. This dataset is already laid out with a node and edge list. Datasets that are
not structured as a network will require some data transformation first; I recommend
using Python and Pandas in these situations.
The dataset can be downloaded here.
After downloading the dataset, there will be three CSV files: nodes, edges, and network.
Open the file nodes.csv in Excel.

nodes.csv in Excel
The nodes file contains a list of all the nodes in the network. This file has two
columns: node and type. This network contains two different types of nodes that
represent different actor types. If you are familiar with object-oriented programming,
you can think of the node type as a class and nodes as objects. The two types of nodes
in this network are heroes and comics.
There is no data preparation needed to import this node list into Gephi, so we will
close the file.
Open the file edges.csv in Excel.

edges.csv in Excel
The edges file also contains two columns: hero and comic. Each row in this table
represents a single edge. The hero node and comic node are the two nodes connected
by the edge.
In Gephi, an edges table requires the column headers of ‘source’ and ‘target’. In an
undirected network it does not matter which node is in which column. However, in a
directed network the source column contains the starting node and the target column
contains the ending node. Rename column A to ‘source’ and column B to ‘target’.
Then save the file.

edges.csv with renamed column headers
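If you would rather not edit the file by hand, the header rename can also be scripted. The sketch below uses only Python's built-in csv module; the function name rename_edge_headers and the output file name edges_gephi.csv are assumptions for illustration. It writes a new file instead of overwriting the original.

```python
import csv


def rename_edge_headers(in_path, out_path):
    """Copy an edge list, replacing its header row with source,target."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        next(reader, None)  # skip the original header row (hero, comic)
        writer.writerow(["source", "target"])
        writer.writerows(reader)  # copy every edge row unchanged


# Example call, assuming edges.csv is in the current folder:
# rename_edge_headers("edges.csv", "edges_gephi.csv")
```

The first column (hero) becomes the source and the second column (comic) becomes the target, matching the manual rename of columns A and B.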
Loading Network Data into Gephi
Now that the node and edge lists are properly formatted for Gephi, it is time to load
the data!
Open the Gephi software. You should see the below screen.
Click on ‘New Project’. If you do not see the welcome screen, go to File > New Project.
Then, click the Data Laboratory tab.

The data laboratory tab is where we will load in our edge and node list files. To import
a list click the import spreadsheet button.
Then navigate to the folder containing the datasets and open the nodes file.
An import wizard will then step you through correctly importing the node list. Set
Separator to Comma, Import as to Nodes table, and Charset as UTF-8. Then click
next.
After clicking next, the wizard will provide additional setting configurations. Set Time
representation to Intervals. For Imported columns, check the node and type boxes
and set their data types to string. Then, click finish.
There is one more step in importing the nodes list. Set Graph Type to Undirected and
Edges merge strategy to Sum. Ensure that it is set up to append to the existing
workspace. Then, click OK.

You should now see some data in the data laboratory window! Next we need to import
the edges list.
To import the edges list, click on Import Spreadsheet and open the edges.csv file. In
the import wizard set Separator to Comma, Import as to Edges table, and Charset to
UTF-8. Then click next.

Set Time representation to Intervals then click Finish.
Then, set Graph Type to Undirected and Edges merge strategy to Sum. Choose
append to existing workspace. Click OK.

Congrats! You have just imported the node and edge lists! In the data laboratory, you
can switch your view between these two lists by clicking on Nodes or Edges in the top
left-hand corner.
Now that the data has been imported it is time to view the graph. Click on the
overview tab.
Using Layout Functions

You might be disappointed in the graph that was visualized. It will likely look like the
black mess below.
In order to make the graph more readable, we will need to use a layout function to
change the position of nodes in the graph.

There are a variety of layout functions in Gephi, but in this guide we will be
using the ForceAtlas 2 function. Select this function and then click run. You will see
the nodes move in real-time, and you can stop the function when you like the position
of the nodes.
After running the layout function your graph should look something like the one
shown below. You can continue to play with other layout functions if you wish to get a
better node position. This is by no means the best node layout for this graph. In
addition, you can change the parameters of layout functions. While this guide uses the
stock ForceAtlas 2 parameters, changing them can give you better control over the
node positions.
Calculating Network-Level Measures
This guide previously covered the network-level measures of Size and Density. Let's
calculate what the network size and density of this Marvel network are.
The network size is easy to find. In the upper right-hand corner is a pane called
Context. This window provides the number of nodes and edges in the graph. Because
a network’s size is the number of nodes in it, the network size of our Marvel network
is 19,090.
To find the network density, we will take our first dive into the statistics window. Click
on the statistics tab shown below.
You should then see the below window.

The statistics window contains many measures that can be calculated on the network.
To find the network density, click run for Graph Density.
Select undirected, and then click OK.

A new window will then pop up showing the results. This Marvel network has a
density of 0.001.
You can save this report by clicking the save button in the bottom left-hand corner, or
close it by clicking the close button in the bottom right-hand corner.
Calculating Centrality Measures
Recall that centrality measures are calculated at the node level, not the network
level. However, centrality measures can also be averaged to get a network-level metric.
In Gephi, you calculate centrality measures through network-level statistics, which
also write the node-level values into the node table in the data laboratory tab.
Node Degree
To calculate node degree, click run on the average degree algorithm in the statistics
window.
The report will provide you with the average degree for the network, as well as a
distribution graph. While these can be useful in some applications, we are more
interested in the degree on a node-level. Click close on the report.

To see the degree for each node in the network, go back to the data laboratory window
and click on the node table. You will see a new column in the data titled degree.
Node Closeness and Betweenness
Calculating node closeness and betweenness is similar to calculating node
degree. In the statistics window, click run on the network diameter algorithm.
Select undirected and click OK. Depending on the specs of your machine this may
take a little bit to calculate.
As with node degree, Gephi will provide a network-level report. Click close on
this report and go to the data laboratory.

In the data laboratory, you will find additional columns in the node table including
the node betweenness and closeness.

Calculating Edge Weights
Edge weights are auto-calculated in Gephi and can be found in the edge list in the
data laboratory.
Using Color in Network Graphs
Currently, our graph nodes and edges are black, providing no additional information.
Both nodes and edges can be color-coded in Gephi to provide additional information.
Coloring options can be found in the appearance window.
To color-code the nodes of the graph based on the node degree, click on the nodes
button and the color palette button in the appearance window.

There are three options to encode information in the color of nodes: Unique,
Partition, and Ranking. If you want to change the color of all nodes of the graph to the
same color, use the Unique window. Partition will break the nodes into color-coded
groups. Ranking will color-code the nodes on a scale.
Let's color the nodes by their degree. To do this, click on the ranking section and select
degree.
A color scale will be used to color the nodes. To select a new scale, click on the color
selector button to the right of the color scale.

You can select any color scale to use. Then click apply.
As you can see in the above image, coloring our nodes also colored our edges. You can
change the color of edges to a specific color using the Unique color tab for edges, or
apply a ranking or partitioning color scale to them.
Using Size in Network Graphs
You may also notice that the majority of the graph is colored red. This is because most
nodes in the graph have a low degree. Zooming in will show that some nodes are
yellow or blue.
To make these nodes easier to see in the graph let’s scale the size of the nodes to the
node degree as well. To do this click on the nodes and size buttons in the appearance
window.
Then, click on ranking and select degree. Change the minimum size to 1 and the
maximum size to 100.
Then click apply.
As can be seen in the above graph, we can now better see what nodes have a high
degree.
Changing Background Color
Let's also change the background from white to black. Depending on the colors used
in a graph, either background may look better, and the choice is often up to personal
preference. To change the color to black, press the lightbulb button.

Recap
That is the end of this guide. While this should get you started making your first
network graph using the Marvel dataset, I encourage you to continue playing around
with this graph in Gephi. There are many more measures you can calculate and other
appearances you could use.
Your next step should be to take another dataset and try to reproduce these steps on
that data. Finally, you can try to collect your own data and transform it into network
data.
I hope this guide was useful to you! Feel free to reach out if you have any questions!
