Assignment 10 - Network Visualization
Make sure you submit all screenshots with a clearly visible menu bar including the
date and timestamp.
Objective:
The objective of this exercise is to develop skills in visualizing and analyzing large
networks with Gephi. The exercise focuses on building a social network visualization
from a large Facebook dataset.
Prerequisites:
1. Install Java (JDK 8)
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
2. Install Gephi
http://gephi.github.io/users/download/
3. In case Gephi does not work after completing steps 1 & 2, follow the steps
below:
a. Open your Gephi installation folder (probably C:\Program Files (x86)\Gephi-0.8.2)
and locate the etc folder.
b. Within the etc folder, you will find a file named Gephi.conf. Open the file
with Notepad.
c. Search for the line #jdkhome="/path/to/jdk"
d. Remove the #; otherwise the line is treated as a comment and is not
executed.
e. Replace the text /path/to/jdk inside the double quotation marks with the
directory path of your Java folder. To find it, go to the Windows
drive (probably C:), open Program Files, and find the Java folder. Inside the Java
folder you will find a folder whose name starts with “jdk” (for example “jdk1.7” or
“jdk1.8”, depending on the installed version). Copy the path of this folder and
paste it in place of /path/to/jdk in the Gephi.conf file.
f. Save the file. In case you are not allowed to save it in place, save it
somewhere else and then replace the original file with the saved copy.
4. Open Gephi. If the Welcome pop-up screen appears, click the Cancel button.
5. Download the JavaScript GEXF viewer to view the Gephi file in the web browser. After
downloading, place the extracted folder on the desktop or in
any other known location; your exported Gephi file will later need to be saved in
this folder.
https://github.com/raphv/gexf-js
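The edit described in step 3 (uncommenting the jdkhome line and pointing it at the JDK folder) can also be sketched in a few lines of Python. This is only an illustration of the text change, applied to a sample string rather than to the real file; the function name set_jdkhome and the JDK path are hypothetical placeholders, not part of the assignment.

```python
import re

def set_jdkhome(conf_text: str, jdk_path: str) -> str:
    """Return conf_text with the jdkhome line uncommented and set to jdk_path."""
    # Matches the line with or without a leading '#', e.g. #jdkhome="/path/to/jdk"
    pattern = re.compile(r'^[ \t]*#?[ \t]*jdkhome[ \t]*=.*$', re.MULTILINE)
    if not pattern.search(conf_text):
        raise ValueError("no jdkhome line found")
    new_line = f'jdkhome="{jdk_path}"'
    # A function replacement keeps the backslashes of a Windows path untouched
    return pattern.sub(lambda _match: new_line, conf_text, count=1)

# Sample text standing in for the relevant part of Gephi.conf
sample = '# example settings\n#jdkhome="/path/to/jdk"\n'
# Hypothetical JDK folder; substitute the folder found in step 3e
updated = set_jdkhome(sample, r"C:\Program Files\Java\jdk-folder")
print(updated)
```

To apply it to the real file, the text would be read from and written back to Gephi.conf, subject to the same save permissions mentioned in step 3f.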
About Gephi
Concept: Gephi is open-source software for network visualization. It can read many
file formats, including Gephi, GEXF, GML, GDF, and CSV. As part of this
exercise, we will use CSV files.
Before starting with Gephi, we should familiarize ourselves with 2 terms – Nodes and
Edges.
To import data from an Excel or CSV file into Gephi, you will usually need to prepare
two files: a nodes table and an edges table.
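A minimal sketch of what such a pair of files can look like, a nodes table and an edges table, written here with Python's csv module. The rows are made-up toy data. The column names Id, Label, Source and Target are the ones Gephi's spreadsheet import conventionally recognizes; page_type is borrowed from later in this exercise, and the actual columns in the assignment files may differ.

```python
import csv
import io

# Hypothetical toy data: three pages and two friendships between them
nodes = [
    {"Id": "1", "Label": "Page A", "page_type": "company"},
    {"Id": "2", "Label": "Page B", "page_type": "government"},
    {"Id": "3", "Label": "Page C", "page_type": "company"},
]
edges = [
    {"Source": "1", "Target": "2"},
    {"Source": "2", "Target": "3"},
]

def to_csv(rows, fieldnames):
    """Render a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

nodes_csv = to_csv(nodes, ["Id", "Label", "page_type"])
edges_csv = to_csv(edges, ["Source", "Target"])
print(nodes_csv)
print(edges_csv)
```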
Click on New Project in the welcome screen as highlighted in the screen shot below or
alternatively follow the menu path File -> New project
2. Data Laboratory Tab : Import, Examine & Edit network data (Nodes & Edges
table)
3. Preview Tab : Configure the rendering settings, for instance color, label
sizes etc. and preview the visualization.
To upload the data, click on the data laboratory tab as highlighted below
Gephi automatically tries to match the imported columns with suitable data types. You
can configure it according to the required usage. Here, we change the facebook_id from
Long to String.
6. A new popup called Import report is displayed. Change the Graph type from Mixed
to Undirected and click on “OK”.
You will notice there are multiple graph types supported by Gephi. Since we are using a
Facebook network for this assignment, we will choose undirected i.e., both accounts
(nodes) are friends with each other. Undirected relationships indicate a connection that
flows in both directions, while a directed relationship moves in a single direction that
always originates in one node and moves toward another node.
In case of directed graphs, edges may also be weighted, to show the strength of the
connection between nodes. In a supply chain network where a vendor A supplies to both
companies B and C, the strength (width) of the edge will show where the stronger
relationship exists. If A supplies B 3 products, while A supplies C a total of 9 products, we
should weight the A to C connection three times that of the A to B connection.
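The supply chain example can be written down as a small directed, weighted edge list. This is only a sketch of the arithmetic; the dictionary layout is an assumption chosen for illustration, not Gephi's internal representation.

```python
# Directed, weighted edges keyed by (source, target): vendor A supplies B and C
supply = {
    ("A", "B"): 3,  # A supplies B 3 products
    ("A", "C"): 9,  # A supplies C 9 products
}

# Relative strength of the A -> C connection compared with A -> B
ratio = supply[("A", "C")] / supply[("A", "B")]
print(ratio)  # 3.0

# Direction matters: there is no edge from B back to A
print(("B", "A") in supply)  # False
```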
A. Twitter Network – directed; you can follow people without them following you.
B. Railway Network – undirected; trains can travel east along a railroad and
later travel west, so traffic goes in both directions.
C. Pedestrian Pathway Network – undirected; while people often keep to the right
side of a pathway (in the US), the pathway itself goes in both directions. This
allows pedestrians to come and go on the same path.
You should now be able to view the contents of the node file by clicking on the nodes tab
under Data Laboratory tab.
Follow the same steps as followed while uploading the nodes file.
3. A popup will appear. Make sure that the file is being imported as “Edges table”.
Confirm that the file selected correctly displays all the data and click “Next”.
5. A new popup called Import report is displayed. Make sure to select Graph type as
Undirected and check “Append to existing workspace” option. Click on “OK”.
An Issues after import process window will be displayed. A successful import should not
have any errors. Go ahead and close the report.
You can view the contents of the edges file by clicking on edges tab, under data
laboratory.
Notice that Gephi has already provided us with a default visualization based on our data
set once you click on the overview tab as highlighted below
This data set has a huge number of nodes and edges. Our first step will be to determine
the average degree of the nodes using the Average Degree parameter, found
in the right panel under Network Overview within the Statistics tab.
Note: In case the Statistics tab is not visible on the right side of the window, go to the
menu Window -> Statistics. After this step, the Statistics tab should be visible in the
right-side panel.
Click on Run beside the Average Degree parameter. This assigns each node id a weight
(its degree) equal to the number of edges or relationships recorded against it. Every entry
for that node id in the edges table is counted, whether the entry is a duplicate or
distinct.
Once you click on run, you will get the degree report.
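As a rough illustration of the calculation, the sketch below counts every row of a toy edges table against both of its endpoints, duplicates included as described above, and then averages the counts. The edge list is made up, and nodes without any edge are not represented here.

```python
from collections import Counter

# Hypothetical undirected edges table; the last row duplicates the first
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (1, 2)]

def degrees(edge_rows):
    """Count, for each node id, how many edge rows mention it."""
    deg = Counter()
    for source, target in edge_rows:
        deg[source] += 1
        deg[target] += 1
    return deg

deg = degrees(edges)
# Each row contributes 2 to the total, so this equals 2 * rows / nodes
average_degree = sum(deg.values()) / len(deg)
print(dict(deg))       # {1: 3, 2: 3, 3: 3, 4: 1}
print(average_degree)  # 2.5
```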
After this step click on the nodes tab under data laboratory tab. Notice that new
columns have been added to the nodes table with weights assigned to different records
as highlighted below.
Observe that the accounts have a wide range of degrees (friends). We want to
filter out the nodes with a degree below 100. Go ahead and close the Degree Report.
After this step, go to the Filters tab, next to the Statistics tab, and select Topology. Drag
the Degree Range filter to the “Drag filter here” area under the Queries section below.
Increase the lower bound from 1 to 100 as shown below. Click on Filter.
Notice that the numbers of nodes and edges have decreased significantly.
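The effect of a degree range filter can be sketched as follows: keep the nodes whose degree meets the lower bound, then keep only the edges whose two endpoints both survive. The toy edge list and the lower bound of 2 are illustrative assumptions (the exercise uses 100), and the degree here is computed once on the unfiltered graph.

```python
from collections import Counter

def degree_range_filter(edge_rows, lower_bound):
    """Return (kept_nodes, kept_edges) for nodes with degree >= lower_bound."""
    deg = Counter()
    for source, target in edge_rows:
        deg[source] += 1
        deg[target] += 1
    kept_nodes = {node for node, d in deg.items() if d >= lower_bound}
    # An edge survives only if both of its endpoints survive
    kept_edges = [(s, t) for s, t in edge_rows if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges

# Hypothetical toy network
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
kept_nodes, kept_edges = degree_range_filter(edges, lower_bound=2)
print(sorted(kept_nodes))  # ['a', 'b', 'c', 'd']
print(kept_edges)          # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c')]
```

Removing node e also removes the edge d-e, which is why both the node count and the edge count drop after filtering.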
Question 6: Paste a screenshot of the filtered network visualization in the overview
tab.
Go to the Appearance section on the left and under the Nodes tab select Partition.
Make sure you have selected the Color icon. Select page_type from the dropdown. Click
on Apply.
Question 7: How many categories are there under page_type? Rank them in correct
order.
4 categories
Rank:
1. Government
2. Company
3. Politician
4. Tv show
Similarly, go to the Appearance section again and this time select the Size icon. Under
the Nodes tab select Ranking. Select degree from the dropdown. Set the Min size and
Max size to 20 and 200 respectively. Click on Apply. After this step you will notice that
the node size changes.
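A size ranking of this kind can be pictured as a mapping from the degree range onto the size range. The sketch below assumes a plain linear interpolation between Min size and Max size; Gephi's exact mapping may differ, and the degrees are made-up values.

```python
def ranked_size(value, min_value, max_value, min_size=20.0, max_size=200.0):
    """Map value from [min_value, max_value] linearly onto [min_size, max_size]."""
    if max_value == min_value:
        return min_size  # all nodes share one degree, so no spread is possible
    fraction = (value - min_value) / (max_value - min_value)
    return min_size + fraction * (max_size - min_size)

# Hypothetical degrees for three pages
degrees = {"p1": 100, "p2": 250, "p3": 400}
lowest, highest = min(degrees.values()), max(degrees.values())
sizes = {node: ranked_size(d, lowest, highest) for node, d in degrees.items()}
print(sizes)  # {'p1': 20.0, 'p2': 110.0, 'p3': 200.0}
```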
Choose the Fruchterman Reingold algorithm under the Layout section in the left panel.
Increase the Area to 50000.0 and the Speed to 2.0, and then click on Run. Let it run for
about 1 minute; once the visualization has expanded and is clearly visible, stop the
algorithm by clicking on the Stop button.
Notice that the visualization is changed now. Clustering has now become more
prominent, and nodes seem to have re-organized based on page type (color).
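To give a feel for why connected nodes end up grouped, here is a minimal sketch of the textbook Fruchterman-Reingold scheme: every pair of nodes repels, every edge attracts, and a cooling temperature caps each move. It is not Gephi's implementation; the Speed setting is not modelled, and the iteration count, cooling rate, and toy graph are assumptions chosen for illustration.

```python
import math
import random

def fruchterman_reingold(nodes, edges, area=50000.0, iterations=100, seed=0):
    """Return {node: [x, y]} after a simple force-directed layout."""
    rng = random.Random(seed)
    side = math.sqrt(area)
    k = math.sqrt(area / len(nodes))  # ideal distance between nodes
    pos = {n: [rng.uniform(0, side), rng.uniform(0, side)] for n in nodes}
    temperature = side / 10  # maximum move per iteration, reduced over time
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, magnitude k^2 / distance
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along each edge, magnitude distance^2 / k
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.95  # cool down so the layout settles
    return pos

# Hypothetical toy graph: two triangles joined by one bridge edge
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = fruchterman_reingold(nodes, edges)
for name, (x, y) in layout.items():
    print(name, round(x, 1), round(y, 1))
```

The pairwise repulsion loop is quadratic in the number of nodes, so this sketch is only suitable for small graphs.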
Question 9: Which accounts have the highest degree for each page_type?
1. Government: US Army
2. Company: Facebook
3. Politician: Barack Obama
4. Tv show: Today Show
Hint: Find the node with the biggest circle, right-click on it, and then click on Select in data
laboratory. Go to the Data Laboratory and give the selected label in the answer.
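The same lookup can be cross-checked from an exported nodes table by keeping, for each page_type, the row with the largest degree. The rows below are made-up placeholders, and the column names label, page_type, and degree are assumptions about how the table is exported.

```python
# Hypothetical rows from a nodes table
rows = [
    {"label": "Page A", "page_type": "company", "degree": 120},
    {"label": "Page B", "page_type": "company", "degree": 340},
    {"label": "Page C", "page_type": "government", "degree": 210},
    {"label": "Page D", "page_type": "government", "degree": 150},
]

# Keep the highest-degree row seen so far for each page_type
top = {}
for row in rows:
    best = top.get(row["page_type"])
    if best is None or row["degree"] > best["degree"]:
        top[row["page_type"]] = row

for page_type, row in top.items():
    print(page_type, row["label"], row["degree"])
```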
Click on Run beside the Modularity parameter under the Community Detection section
of the Statistics tab. Modularity is a measure of the strength of the network graph's
division into clusters. High modularity means there are dense connections between nodes
in the same cluster and few connections to nodes in different clusters. Gephi uses the
Louvain method for modularity. This method is used to uncover communities in large
networks (millions of nodes) quickly.
Judd D. Bradbury Page 18
Data Visualization
Once you click on run, you will get the modularity report.
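The modularity score itself can be illustrated on a toy graph. The sketch below only evaluates the standard modularity formula for a partition that is already given (for each community, the share of edges inside it minus the share expected from its total degree); it does not implement the Louvain search for the partition, and the graph and community labels are made up.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy graph: two triangles joined by one bridge edge
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, community)
print(round(q, 3))  # 0.357
```

Putting every node into a single community gives a score of 0, which is why a clearly positive value indicates real cluster structure.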
Question 10: How many communities are created in your dataset?
Results:
Modularity: 0.387
Number of Communities: 8
Go back to the Appearance section and under the Nodes tab select Partition. Make
sure you have selected the Color icon. Select Modularity Class from the dropdown this
time. Click on Apply.
YIFAN HU: Choose the Yifan Hu algorithm under the Layout section in the left panel. Change
the Optimal Distance to 1000.0 and the Relative Strength to 1.0, and then click on Run.
Question 11: Paste a screenshot of the network visualization in the overview tab.
FORCEATLAS2: Choose the ForceAtlas 2 algorithm under the Layout section in the left panel.
Scroll down to the Behavior Alternatives and check LinLog mode and Prevent Overlap,
and then click on Run. Let it run for about 1 minute; once the visualization has
expanded and is clearly visible, stop the algorithm by clicking on the Stop button.
Question 12: Paste a screenshot of the network visualization in the overview tab.
Question 13: Name the account (label) that is part of a modularity class with the
lowest number of members.
Go to the menu File -> Save and save the file as
StudentName_Website.gephi
Also export the file as a graph file (*.gexf) in order to view it in the web browser.
Note: You will have to do this step for each of the .gexf files
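For orientation, a .gexf file is an XML document that lists nodes and edges. The sketch below builds a bare-bones GEXF 1.2 document with Python's ElementTree from made-up data. It leaves out the position, color, and size information that Gephi's own export writes and that a browser viewer typically relies on, so it illustrates the format and is not a substitute for the export step.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def build_gexf(nodes, edges):
    """Return GEXF 1.2 text for (id, label) nodes and (source, target) edges."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph",
                          {"mode": "static", "defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge",
                      {"id": str(index), "source": str(source), "target": str(target)})
    return ET.tostring(root, encoding="unicode")

# Hypothetical toy data
gexf_text = build_gexf(
    nodes=[(1, "Page A"), (2, "Page B"), (3, "Page C")],
    edges=[(1, 2), (2, 3)],
)
print(gexf_text)
```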
Right-click on Index.html in the JS folder, select Open with, and pick any web browser of
your choice. You should be able to view the Gephi file in the browser.
Note: Use Mozilla Firefox if it doesn’t open in your regular browser.
If you cannot view your visualization in a Firefox browser, you will need to go into the
Terminal Program for Apple, or the Command Prompt for Windows.
If it still does not work, your system may have security restrictions. To
overcome those, please download the “Web Server for Chrome” extension:
https://chrome.google.com/webstore/detail/web-server-for-chrome/ofhbbkphhbklhfoeikjpcbhemlocgigb?hl=en
After you install it, launch the app. You will see this window:
Go ahead and click on ‘Choose Folder’ and direct it to the directory where you have
stored your gexf master file. You can then click on the web server URL or go to
http://127.0.0.1:8887 to access your files locally.
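Serving a folder over a local address is also possible with Python's standard http.server module. This is an alternative sketch, not part of the original instructions. The helper name make_server is hypothetical, and the example checks itself against a throwaway folder on an automatically chosen port instead of the real viewer folder.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

def make_server(folder, port=8887):
    """Create an HTTP server for `folder`, reachable only at 127.0.0.1."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(folder))
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

# Self-check on a temporary folder; port 0 lets the OS pick a free port
with tempfile.TemporaryDirectory() as folder:
    pathlib.Path(folder, "index.html").write_text("<p>viewer placeholder</p>")
    server = make_server(folder, port=0)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/index.html"
    with urllib.request.urlopen(url) as response:
        body = response.read().decode()
    server.shutdown()
    server.server_close()

print(body)
```

For the viewer itself, the same helper would be pointed at the folder that holds the exported .gexf file and left running with serve_forever() while the page is open at http://127.0.0.1:8887.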
Question 14: Display the exported .gexf file in the browser and paste the
screenshot.
1. Attach the assignment document (only with the answers) in Microsoft Word
2. Attach the .gephi file
3. Attach the .gexf file