Assignment 10 - Network Visualization

This document provides a comprehensive guide for visualizing and analyzing large networks using Gephi, specifically focusing on a Facebook dataset. It includes prerequisites for software installation, steps for importing data, visualizing networks, and filtering nodes based on degree. Additionally, it outlines how to save and display the Gephi file in a web browser and details the assignment submission process.

Uploaded by

kotoole8r

Data Visualization

Network Data Visualization


As you proceed with the assignment, follow the written instructions. Screenshots
are provided ONLY as a reference.

Make sure you submit all screenshots with a clearly visible menu bar including the
date and timestamp.

Objective:

The objective of this exercise is to develop skills on how to visualize and analyze large
networks using Gephi. This exercise focuses on building a social network visualization
using Gephi. For this exercise, we will work on a large Facebook dataset.

Prerequisites:

1. Install Java SE Development Kit 8

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Create an Oracle account if needed to sign in

Judd D. Bradbury  Page 1


2. Install Gephi
http://gephi.github.io/users/download/

3. In case Gephi does not work after completing steps 1 & 2, follow the steps
below:
a. Open your Gephi installation folder (probably C:\Program Files (x86)\Gephi-0.8.2)
and locate the etc folder.
b. Within the etc folder, you will find a file named Gephi.conf. Open the file
with Notepad.
c. Search for the line #jdkhome="/path/to/jdk"
d. Remove the #; otherwise the line is treated as a comment and will not be
executed.
e. Replace the text /path/to/jdk inside the double quotation marks with the
directory address of your Java folder. To find the Java folder, go to the
Windows drive (probably C:), go to Program Files and find the Java folder.
Inside the Java folder you will find a folder whose name starts with “jdk”
(for example “jdk1.8” for the JDK 8 installed in step 1). Copy the path of this
folder and paste it in place of /path/to/jdk in the Gephi.conf file.
f. Save the file. If you are not allowed to save it in place, save it
somewhere else and then replace the original file with the saved copy.

4. Open Gephi. If the Welcome pop-up screen appears then click on cancel button.

5. Download the JavaScript GEXF viewer to view the Gephi file in the web browser.
After downloading, place the extracted folder on the desktop or in any other
known location; your exported Gephi file will later need to be saved in this
folder.

https://github.com/raphv/gexf-js
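
The edit described in prerequisite 3 (steps c to e) is a one-line text substitution. The sketch below shows that substitution in Python on the text of the file only; the function name set_jdkhome and the example JDK path are hypothetical, and you would still save the result as described in step f.

```python
import re

def set_jdkhome(conf_text: str, jdk_path: str) -> str:
    """Uncomment the jdkhome line and point it at jdk_path."""
    # Match the commented-out setting: optional indentation, '#', then jdkhome="..."
    pattern = re.compile(r'^[ \t]*#[ \t]*jdkhome="[^"]*"', flags=re.MULTILINE)
    # A function replacement keeps backslashes in Windows paths literal.
    return pattern.sub(lambda _m: f'jdkhome="{jdk_path}"', conf_text, count=1)

# Example with an assumed (hypothetical) JDK location.
sample = '# default location of JDK\n#jdkhome="/path/to/jdk"\n'
print(set_jdkhome(sample, r"C:\Program Files\Java\your_jdk_folder"))
```

Only the first commented jdkhome line is changed; text without such a line is returned unchanged.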

Step 1: Download the provided CSV files from eLearning

About Gephi

Concept: Gephi is open-source software for network visualization. It can read many
file formats, including Gephi, GEXF, GML, GDF, CSV and many others. In this
exercise, we will use CSV files.

Before starting with Gephi, we should familiarize ourselves with 2 terms – Nodes and
Edges.

Node: A node is a unique identifier of an object within a data set.

Edge: An edge is a line, or relationship, that connects two nodes.

To import data from an Excel or CSV file into Gephi, you will usually need to prepare
the following 2 files:

1. Node File – Containing the nodes and their attributes
a. The node file must include a column named ‘ID’
b. The ID column should contain unique entries
2. Edge File – Containing the edges and their attributes
a. The edge file must include columns named ‘Source’ & ‘Target’, which
contain the start and the end nodes for each edge
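
The two file requirements above can be checked before importing. The following is a minimal sketch in Python using only the standard library; the helper names (read_table, check_nodes, check_edges) and the toy CSV contents are hypothetical and are not the assignment data.

```python
import csv
import io

def read_table(csv_text):
    """Parse CSV text and return (header, rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return reader.fieldnames or [], rows

def check_nodes(csv_text):
    """Return a list of problems found in a node file."""
    header, rows = read_table(csv_text)
    if "ID" not in header:
        return ["node file has no 'ID' column"]
    ids = [row["ID"] for row in rows]
    if len(ids) != len(set(ids)):
        return ["ID column contains duplicate entries"]
    return []

def check_edges(csv_text):
    """Return a list of problems found in an edge file."""
    header, _rows = read_table(csv_text)
    missing = [c for c in ("Source", "Target") if c not in header]
    return [f"edge file has no '{c}' column" for c in missing]

# Toy data for illustration only.
print(check_nodes("ID,Label\n1,Page A\n2,Page B\n"))   # no problems
print(check_nodes("ID,Label\n1,Page A\n1,Page B\n"))   # duplicate ID
print(check_edges("Source,Target\n1,2\n"))             # no problems
```

An empty list means the file meets the requirements listed above; it does not check anything else about the data.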

Step 2: Import data into Gephi

Once Gephi is installed properly, the following screen should appear:

Click on New Project in the welcome screen as highlighted in the screen shot below or
alternatively follow the menu path File -> New project

There are 3 main sections within Gephi:

1. Overview Tab : Setup the network visualization

2. Data Laboratory Tab : Import, Examine & Edit network data (Nodes & Edges
table)
3. Preview Tab : Configure the rendering settings, for instance color, label
sizes etc. and preview the visualization.

To import the data, click on the Data Laboratory tab as highlighted below.

Import Nodes File:

1. Click on Import Spreadsheet under Data Laboratory tab

2. Select the csv file downloaded earlier called “facebook_accounts”.


3. A popup will appear. Make sure that the file is being imported as “Nodes table”.
Confirm that the file selected correctly displays all the data and click “Next”.

4. Configure the import settings on the second page as shown below.

Gephi automatically tries to match the imported columns with suitable data types. You
can configure it according to the required usage. Here, we change the facebook_id from
Long to String.

5. Click on “Finish” to complete the import.

6. A new popup called Import report is displayed. Change the Graph type from Mixed
to Undirected and click on “OK”.

You will notice there are multiple graph types supported by Gephi. Since we are using a
Facebook network for this assignment, we will choose undirected i.e., both accounts
(nodes) are friends with each other. Undirected relationships indicate a connection that
flows in both directions, while a directed relationship moves in a single direction that
always originates in one node and moves toward another node.

In case of directed graphs, edges may also be weighted, to show the strength of the
connection between nodes. In a supply chain network where a vendor A supplies to both
companies B and C, the strength (width) of the edge will show where the stronger
relationship exists. If A supplies B 3 products, while A supplies C a total of 9 products, we
should weight the A to C connection three times that of the A to B connection.
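
The supply chain example can be written down as a small data structure. This is only an illustrative sketch in Python: the dictionary layout and the helper name relative_weight are assumptions, not something Gephi requires.

```python
# Directed, weighted edges stored as (source, target) -> weight.
supply = {
    ("A", "B"): 3,  # A supplies B 3 products
    ("A", "C"): 9,  # A supplies C 9 products
}

def relative_weight(edges, edge, reference):
    """How many times heavier `edge` is than `reference`."""
    return edges[edge] / edges[reference]

print(relative_weight(supply, ("A", "C"), ("A", "B")))  # 3.0

# Direction matters in a directed graph: B does not supply A.
print(("B", "A") in supply)  # False

# An undirected edge has no direction, so an unordered pair is enough.
friends = {frozenset(("acct1", "acct2"))}
print(frozenset(("acct2", "acct1")) in friends)  # True
```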

Question 1: Identify whether the following networks are directed or undirected:

A. Twitter Network – directed; you can follow people without them following you.

B. Railway Network – undirected; trains can travel east along a railroad and
later travel west, so traffic goes in both directions.

C. Pedestrian Pathway Network – undirected; while people often stick to the right
side of the pathways (in the US), the pathway goes in both directions. This
allows pedestrians to come and go on the same path.

You should now be able to view the contents of the node file by clicking on the nodes tab
under Data Laboratory tab.

Question 2: Paste a screenshot of your nodes table.

Import Edges File:

Follow the same steps used to import the nodes file.

1. Click on Import spreadsheet icon under Data Laboratory tab.

2. Select the csv file downloaded earlier called “facebook_friends”.

3. A popup will appear. Make sure that the file is being imported as “Edges table”.
Confirm that the file selected correctly displays all the data and click “Next”.

4. Click on “Finish” to complete the import.

5. A new popup called Import report is displayed. Make sure to select Graph type as
Undirected and check “Append to existing workspace” option. Click on “OK”.

An Issues after import process window will be displayed. A successful import should not
have any errors. Go ahead and close the report.

You can view the contents of the edges file by clicking on edges tab, under data
laboratory.

Question 3: Paste a screenshot of your edges table.

Step 3: Prepare Visualization

In order to create visualizations, click on the overview tab.

Notice that, once you click on the overview tab, Gephi has already provided a default
visualization based on our data set, as highlighted below.

Question 4: Paste a screenshot of the visualization in the overview tab.

Step 4: Filtering in Gephi

This data set has a huge number of nodes and edges. Our first step will be to determine
the average degree of the nodes using the Average Degree parameter in the right panel,
under Network Overview within the Statistics tab.

Note: If the Statistics tab is not visible on the right side of the window, go to the
menu Window -> Statistics. After this step, the Statistics tab should be visible on the
right-side panel.

Click on Run beside the Average Degree parameter. This assigns each node id a degree
based on the number of edges (relationships) attached to it, irrespective of whether
the entries are duplicate or distinct: it counts every entry for that node id in the
edges table.

Once you click on run, you will get the degree report.

Question 5: What is the average degree of the nodes in this dataset?


Average Degree: 15.220
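
The degree count described above can be reproduced by hand on a small example. The sketch below, in Python, counts every edge entry against both of its end nodes and then averages over all nodes; the toy edge list and the function names are hypothetical, and Gephi's own result may differ if duplicate edges are merged at import.

```python
from collections import Counter

def degrees(edges):
    """Count, for each node, how many edge entries mention it."""
    deg = Counter()
    for source, target in edges:
        deg[source] += 1
        deg[target] += 1
    return deg

def average_degree(nodes, edges):
    """Sum of all node degrees divided by the number of nodes."""
    deg = degrees(edges)
    return sum(deg[n] for n in nodes) / len(nodes)

# Toy undirected network: node "d" has no friends.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(average_degree(nodes, edges))  # 1.5
```

For an undirected network this is the same as twice the number of edges divided by the number of nodes.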

After this step, click on the nodes tab under the Data Laboratory tab. Notice that new
columns have been added to the nodes table, with degree values assigned to the
different records as highlighted below.

Observe that the accounts have a wide range of degrees (friends). We want to filter
out the nodes with a degree below 100. Go ahead and close the Degree Report. Then go
to the Filters tab, next to the Statistics tab, and select Topology. Drag the Degree
Range filter to the “Drag filter here” area under the Queries section below.

Increase the lower bound from 1 to 100 as shown below. Click on Filter.
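
What the Degree Range filter does can be sketched on a toy network. The Python below is an illustration under the assumption that the filter keeps nodes whose degree is at least the lower bound and then keeps only the edges between kept nodes; the function name and data are hypothetical, and the toy threshold is 2 rather than 100.

```python
from collections import Counter

def degree_range_filter(edges, lower):
    """Keep nodes with degree >= lower and the edges between them."""
    deg = Counter()
    for source, target in edges:
        deg[source] += 1
        deg[target] += 1
    kept_nodes = {n for n, d in deg.items() if d >= lower}
    kept_edges = [(s, t) for s, t in edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
kept_nodes, kept_edges = degree_range_filter(edges, lower=2)
print(sorted(kept_nodes))  # ['a', 'b', 'c']
print(kept_edges)          # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Node "d" has degree 1, so it is dropped together with its only edge.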

Notice that the number of nodes and edges has decreased significantly.

Question 6: Paste a screenshot of the filtered network visualization in the overview
tab.

Step 5.1: Visualization based on page_type

Go to the Appearance section on the left and under the Nodes tab select Partition.
Make sure you have selected the Color icon. Select page_type from the dropdown. Click
on Apply.

Question 7: How many categories are there under page_type? Rank them in correct
order.

4 categories

Rank:

1. Government
2. Company
3. Politician
4. Tv show
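
The category count can also be cross-checked outside Gephi. The sketch below, in Python, counts how many nodes fall under each page_type and ranks the categories from most to fewest nodes; it assumes the ranking is by number of nodes, and the rows shown are made-up placeholders rather than the assignment data.

```python
from collections import Counter

# Placeholder rows; in practice these would come from the nodes table.
rows = [
    {"ID": "1", "page_type": "government"},
    {"ID": "2", "page_type": "company"},
    {"ID": "3", "page_type": "government"},
    {"ID": "4", "page_type": "politician"},
    {"ID": "5", "page_type": "company"},
    {"ID": "6", "page_type": "government"},
]

counts = Counter(row["page_type"] for row in rows)
print(len(counts))            # 3
print(counts.most_common())   # [('government', 3), ('company', 2), ('politician', 1)]
```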

Similarly, go to the Appearance section again and this time select the Size icon. Under
the Nodes tab select Ranking. Select degree from the dropdown. Set the Min size and
Max size to 20 and 200 respectively. Click on Apply. After this step you will notice that
the node size changes.
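
The effect of the Min size and Max size settings can be pictured as a mapping from degree to node size. The Python sketch below assumes a plain linear interpolation between the smallest and largest degree; Gephi may apply a different interpolation curve, and the function name and degree values are hypothetical.

```python
def rank_size(value, vmin, vmax, size_min=20.0, size_max=200.0):
    """Map a degree in [vmin, vmax] linearly onto [size_min, size_max]."""
    if vmax == vmin:
        return size_min  # all nodes have the same degree
    t = (value - vmin) / (vmax - vmin)
    return size_min + t * (size_max - size_min)

# Assumed degree range of 100 to 1000 after filtering.
print(rank_size(100, 100, 1000))   # 20.0
print(rank_size(550, 100, 1000))   # 110.0
print(rank_size(1000, 100, 1000))  # 200.0
```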

Choose the Fruchterman Reingold algorithm under the Layout section on the left panel.
Increase the Area to 50000.0 and the Speed to 2.0, and then click on Run. Let it run
for about 1 minute; once the visualization has expanded and is clearly visible, stop
the algorithm by clicking on the Stop button.

Notice that the visualization is changed now. Clustering has now become more
prominent, and nodes seem to have re-organized based on page type (color).

Question 8: Paste a screenshot of the updated network visualization in the
overview tab.

Question 9: Which accounts have the highest degree for each page_type?

Page_type  highest degree

1. Government  US Army
2. Company  Facebook
3. Politician Barack Obama
4. Tv show  today Show

Hint: Find the node with the biggest circle, right click on it, and then click on
“Select in data laboratory”. Go to the Data Laboratory and give the selected label
in your answer.
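
The same lookup can be expressed as a small computation over the nodes table. This Python sketch picks, for each page_type, the node with the largest degree; the helper name top_by_group and the placeholder rows are hypothetical and do not reproduce the assignment data.

```python
def top_by_group(nodes):
    """Return {page_type: label of the node with the highest degree}."""
    best = {}
    for node in nodes:
        current = best.get(node["page_type"])
        if current is None or node["degree"] > current["degree"]:
            best[node["page_type"]] = node
    return {group: node["label"] for group, node in best.items()}

# Placeholder rows with made-up labels and degrees.
nodes = [
    {"label": "Page A", "page_type": "company", "degree": 120},
    {"label": "Page B", "page_type": "company", "degree": 340},
    {"label": "Page C", "page_type": "politician", "degree": 210},
]
print(top_by_group(nodes))  # {'company': 'Page B', 'politician': 'Page C'}
```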

Step 5.2: Visualization based on modularity

Coloring the nodes by modularity.

Click on Run beside the Modularity parameter under the Community Detection section
of the Statistics tab. Modularity is a measure of the strength of the network graph's
division into clusters. High modularity means there are dense connections between nodes
in the same cluster and few connections to nodes in different clusters. Gephi uses the
Louvain method for modularity. This method is used to uncover communities in large
networks (millions of nodes) quickly.

Once you click on Run, you will get the modularity report.
Question 10: How many communities are created in your dataset?

Results:

Modularity: 0.387

Modularity with resolution: 0.387

Number of Communities: 8
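
To make the modularity score more concrete, the sketch below computes it for a given division of a toy undirected network. It is written in Python and only evaluates the score of a partition that is already known; it is not the Louvain method, which searches for a good partition. The function name and the toy network (two triangles joined by one edge) are hypothetical.

```python
from collections import Counter

def modularity(edges, community):
    """Modularity of an undirected, unweighted graph for a given partition.

    For each community c: (edges inside c) / m - (degree sum of c / 2m) ** 2,
    where m is the total number of edges.
    """
    m = len(edges)
    internal = Counter()    # edges with both ends in the same community
    degree_sum = Counter()  # total degree of the nodes in each community
    for source, target in edges:
        degree_sum[community[source]] += 1
        degree_sum[community[target]] += 1
        if community[source] == community[target]:
            internal[community[source]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles (1-2-3 and 4-5-6) connected by the single edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}
print(round(modularity(edges, community), 3))  # 0.357
```

Putting all six nodes in one community gives a score of 0, which is why a positive score indicates a meaningful division.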

Go back to the Appearance section and under the Nodes tab select Partition. Make
sure you have selected the Color icon. Select Modularity Class from the dropdown this
time. Click on Apply.

YIFAN HU: Choose algorithm Yifan Hu under layout section on the left panel. Change
the Optimal Distance to 1000.0 and Relative Strength to 1.0. And then click on Run.

Question 11: Paste a screenshot of the network visualization in the overview tab.

FORCEATLAS2: Choose the ForceAtlas 2 algorithm under the Layout section on the left
panel. Scroll down to the Behavior Alternatives and check Linlog mode and Prevent
Overlap, and then click on Run. Let it run for about 1 minute; once the visualization
has expanded and is clearly visible, stop the algorithm by clicking on the Stop button.

Question 12: Paste a screenshot of the network visualization in the overview tab.

Question 13: Name the account (label) that is part of a modularity class with the
lowest number of members.

Home & Family

Step 5.3: Save Gephi File

Go to context menu File -> Save and then save the file as
StudentName_Website.Gephi

Also export the file as a graph file (*.GEXF file) in order to view it in the web browser.

Follow the menu path shown below.

A popup window appears. Keep the filename as StudentName_Website, select .GEXF as the
file type, and then click Save.
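
A GEXF file is XML, so the export can be sanity-checked by counting its nodes and edges. The Python sketch below does this with the standard library; the function name gexf_counts and the tiny inline sample are hypothetical, and the tag names are matched without their namespace so that the GEXF version does not matter.

```python
import xml.etree.ElementTree as ET

def gexf_counts(gexf_text):
    """Return (number of <node> elements, number of <edge> elements)."""
    root = ET.fromstring(gexf_text)

    def local(tag):
        # Strip a leading "{namespace}" from the tag name, if present.
        return tag.rsplit("}", 1)[-1]

    tags = [local(el.tag) for el in root.iter()]
    return tags.count("node"), tags.count("edge")

# Tiny hand-written sample, not an actual Gephi export.
sample = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes><node id="a" label="A"/><node id="b" label="B"/></nodes>
    <edges><edge id="0" source="a" target="b"/></edges>
  </graph>
</gexf>"""
print(gexf_counts(sample))  # (2, 1)
```

For your own export, read the file contents with open("StudentName_Website.gexf", encoding="utf-8").read() and compare the counts with those shown in Gephi after filtering.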

Step 6: Display the Gephi file in the browser

• Double click on the gexf_js_master folder created in step 5 of the prerequisites
section and then open the config.js file using a text editor (Sublime,
Notepad++, Brackets etc.)
• Copy all the .gexf files created in the steps above into the gexf_js_master
folder.
• Enter the name of your exported .gexf file in config.js as highlighted below

Note: You will have to do this step for each of the .gexf files

• Right click on Index.html in the JS folder, select Open with, and choose any web
browser. You should be able to view the Gephi file in the browser.
• Note: Use Mozilla Firefox if it doesn’t open in your regular browser

If you cannot view your visualization in a Firefox browser, you will need to go into the
Terminal Program for Apple, or the Command Prompt for Windows.

If it still does not work, then you might have some security issues in your system. To
overcome those, please download “Web Server for Chrome extension” :

https://chrome.google.com/webstore/detail/web-server-for-chrome/ofhbbkphhbklhfoeikjpcbhemlocgigb?hl=en

After you install it, launch the app. You will see this window:

Go ahead and click on ‘Choose Folder’ and point it to the directory where you have
stored your gexf_js_master folder. You can then click on the web server URL or go to
http://127.0.0.1:8887 to access your files locally.
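
If installing the Chrome extension is not an option, a local web server can also be started with Python's built-in http.server module. The sketch below assumes Python 3.7 or later is installed; make_server is a hypothetical helper, and the port 8887 simply mirrors the address used above.

```python
import functools
import http.server

def make_server(directory, port=8887):
    """Create a local web server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 so the files are only reachable from this computer.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

# To serve the viewer folder, uncomment the next line, then open
# http://127.0.0.1:8887 in the browser (stop the server with Ctrl+C):
# make_server("gexf_js_master").serve_forever()
```

The folder name passed to make_server should be the path to wherever you extracted the viewer.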

Question 14: Display the exported .gexf file in the browser and paste the screen
shot.

Step 7: Attach assignments in eLearning

1. Attach the assignment document (only with the answers) in Microsoft Word
2. Attach the .gephi file
3. Attach the .gexf file
