Rome All-Stata Ppt1
USING STATA
Thomas Grund
University College Dublin
thomas.u.grund@gmail.com
Rome, 2016
http://nwcommands.org
/TUTORIALS AND SLIDES
Dialog boxes and Stata menus
Commands in the command line
Own simple .do files
Own advanced .do files
Own .ado files
Code in Mata
Object-oriented programming
Plugins
Networks? Never heard of it
Isn't this something about Facebook?
Some basic knowledge about networks
Am using it somehow in my work
Sunbelt hospitality suite!
Experienced network scholar
I am a network guru
NETWORK ANALYSIS
- Simple description/characterization of networks
- Calculation of node-level characteristics (e.g. centrality)
- Components, blocks, cliques, equivalences
- Visualization of networks
- Statistical modeling of networks, network dynamics
- ...
Purpose-built
Excel/R extensions
C++/Python libraries
NWCOMMANDS
Software package for Stata. Almost 100 new Stata commands
for handling, manipulating, plotting and analyzing networks.
Ideal for existing Stata users. Corresponds to the R packages
network, sna, igraph, networkDynamic.
Designed for small to medium-sized networks (< 10000).
Almost all commands have menus. Can be used like Ucinet
or Pajek. Ideal for beginners and teaching.
Not just specialized commands, but whole infrastructure for
handling/dealing with networks in Stata.
Writing own network commands that build on the
nwcommands is very easy.
BOOK
Grund, T. and Hedström, P. (in preparation) Social
Network Analysis Using Stata. StataPress.
Twitter: nwcommands
Example datasets
. sysuse auto, clear
. help dta_examples
GETTING DATA
. import
. insheet
MANAGING VARIABLES
In Stata a variable is a column of data. Later on we will use
macros (what programmers normally call a variable).
Setting observations
. set obs 100
Generate variable:
. gen myvar = _n
Replace variable:
. replace myvar = 50 if _n > 50
MANAGING VARIABLES
Recode variables
. recode myvar (1/10=1) (11/20=2) (21/49=3)
Tabulate variable
. tab myvar
Summarize variable
. sum myvar
. help command
. help adopath
. help nwcommands
Data editor
. edit
RETURN VECTOR
Many commands, e.g. summarize, display output, but also leave
some information in the so-called return vector. When you write
sophisticated programs, it makes sense to return your result in the
return vector as well.
Or
. net from http://nwcommands.org
. net install nwcommands-ado
. nwinstall, all
UPDATE
Latest version of the software is: 1.6
In case you have a previous version you can update with:
. nwinstall, all
. help nwcommands
THEORETICAL
MOTIVATION
How can we explain
something?
[Figure: macro-micro-macro diagram with situational, behavioral and transformational mechanisms]
ANALYTICAL SOCIOLOGY
Adoption of innovation
Smoking behavior
Divorce
Suicide
...
Crime
International art fairs
NETWORK DYNAMICS
Changes 2005 - 2006
Yogev, T. and Grund, T. (2012) Structural Dynamics and the Market for Contemporary Art: The Case
of International Art Fairs. Sociological Focus, 54(1), 23-40.
CO-OFFENDING
IN YOUTH GANG
Grund, T. and Densley, J. (2012) Ethnic Heterogeneity in the Activity and Structure of a Black Street
Gang. European Journal of Criminology, 9(3), 388-406.
Grund, T. and Densley, J. (2015). Ethnic homophily and triad closure: Mapping internal gang structure
using exponential random graph models. Journal of Contemporary Criminal Justice, 31(3), 354-370.
MANCHESTER UTD 9/9/2006, Old Trafford
TOTTENHAM
Grund, T. (2012) Network Structure and Team Performance: The Case of English
Premier League Soccer Teams. Social Networks, 34(4), 682-690.
SOCIAL NETWORKS
Social
Friendship, kinship, romantic relationships
Government
Political alliances, government agencies
Markets
Trade: flow of goods, supply chains, auctions
Labor markets: vacancy chains, getting jobs
Organizations and teams
Interlocking directorates
Within-team communication, email exchange
NETWORKS ARE EVEN
MORE UNIVERSAL
Food webs
Internet
Power grids, airline networks
Metabolic networks
Neural networks
Economics networks
NETWORK PARADIGM
Not just composition of elements of a system that matters, but
also how the elements are arranged and related with each
other.
An individual's position in a network (social structure)
determines the opportunities and constraints this individual will
encounter.
Individuals change the social world of others. Individuals are
dependent and embedded in a web of relations.
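$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix}$$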
ADJACENCY MATRIX
    1 2 3 4 5 6 7
1   0 1 0 0 0 0 0
2   0 0 0 1 0 1 1
3   1 0 0 0 0 0 0
4   1 1 0 0 1 1 0
5   0 0 0 1 0 0 0
6   1 0 0 0 0 0 0
7   0 0 0 0 0 0 0
[Figure: the same network drawn as a graph with nodes 1-7]
ADJACENCY LIST
[Figure: example network and its adjacency list]
Use network analysis for the right reasons. Not
because it is cool, but because you think it is likely to
help you answer your research question in a better
way.
Be very clear on the type of relation you are looking at and do not
throw obviously different types of relationships in the same pot and
treat them equally.
PITFALLS
Networks by themselves do not do anything. Distinguish
between pure structure and what might happen on the
structure.
Networks might be there, but they might not matter for what
you want to explain.
Syntax:
var4
var1 var3
var2
1 5
4
3
LIST ALL NETWORKS
These are the names of the
networks in memory. You can
refer to these networks by
their name.
. help netexample
MARRIAGE TIES BETWEEN
FLORENTINE FAMILIES
BUSINESS TIES BETWEEN
FLORENTINE FAMILIES
IMPORT NETWORK
A wide array of popular network file-formats are supported, e.g.
Pajek, Ucinet, by nwimport.
Files can be imported directly from the internet as well.
Similarly, networks can be exported to other formats with
nwexport.
SAVE/USE NETWORKS
You can save network data (networks plus all normal Stata
variables in your dataset) in almost exactly the same way as
normal data.
Instead of save, the relevant command is nwsave.
Instead of use, the relevant command is nwuse.
DROP/KEEP NETWORKS
Dropping and keeping networks works almost exactly like
dropping and keeping variables.
DROP/KEEP NODES
You can also drop/keep nodes of a specific network.
nwdrop webnwuse
nwkeep nwimport
nwclear nwexport
nwuse
nwsave
Import Sampson's monastery networks from this website:
http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm
Drop the first four nodes from the networks: SAMPLK1, SAMPLK2,
SAMPLK3
Check the network size with nwset
Import any other Ucinet dataset from this website.
Matrix format -> Edgelist format: nwtoedge
Edgelist format -> Matrix format: nwfromedge
NODE ATTRIBUTES
[Figure: Florentine families network with family names as node labels]
. webnwuse florentine
. nwplot flomarriage, lab
. nwplotmatrix flomarriage, lab
. nwplotmatrix flomarriage, sortby(wealth) label(wealth)
ANIMATION
. webnwuse klas12
. nwmovie klas12_wave1-klas12_wave4
. nwmovie _all, colors(col_t*) sizes(siz_t*) edgecolors(edge_t*)
nwplot
nwplotmatrix
nwmovie
[Figures: network plots with node colour or size mapped to seat (0/1), wealth (3 to 146), sex (1/2) and delinq1 (0 to 4)]
Session X
Examine networks
Dyads
Triads
Simmelian ties
Components
SUMMARIZE
DENSITY
The density of a network is defined as the proportion of actually
observed ties among the potentially observable ones.
Remember, in a directed, binary network with $n$ actors there
could be $n(n-1)$ ties. In an undirected network, there could
be $n(n-1)/2$ ties.
$$density = \frac{5}{4(4-1)} = \frac{5}{12} \approx 0.417$$
DENSITY
We could also calculate density from the dyad census. Remember,
M = mutual dyads, A = asymmetric dyads.
Actually observed ties are $2M + A$.
Potential ties are $n(n-1)$.
$$density = \frac{2M + A}{n(n-1)}$$
$$density = \frac{5}{4(4-1)} = \frac{5}{12} \approx 0.417$$
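As a rough check of the 5/12 figure, the sketch below computes density from an adjacency matrix in plain Python (not nwcommands). The four-actor network is a hypothetical example with 5 directed ties, and the function name `density` is an assumption made for illustration.

```python
# Hypothetical directed network: 4 actors, 5 ties (rows send, columns receive).
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]


def density(matrix):
    """Share of realised ties among the n(n-1) possible directed ties."""
    n = len(matrix)
    observed = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return observed / (n * (n - 1))


print(round(density(adjacency), 3))
```

For an undirected network stored as a symmetric matrix the same ratio applies, because both the numerator and the denominator are halved.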
RECIPROCITY
The reciprocity of a network is defined as the proportion of actually
reciprocated ties among the potentially reciprocable ones.
Remember, M = mutual dyads, A = asymmetric dyads.
Actually reciprocated ties are $2M$.
Potentially reciprocated ties are $2M + A$.
$$reciprocity = \frac{2M}{2M + A}$$
$$reciprocity = \frac{2}{5} = 0.4$$
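The dyad census behind these two formulas can be sketched in plain Python as follows. The network is a hypothetical four-actor example chosen so that M = 1 and A = 3, and `dyad_census` is an illustrative helper, not an nwcommands function.

```python
# Hypothetical directed network with one mutual and three asymmetric dyads.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]


def dyad_census(matrix):
    """Count mutual, asymmetric and null dyads of a directed binary network."""
    n = len(matrix)
    mutual = asymmetric = null = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = matrix[i][j] + matrix[j][i]
            if ties == 2:
                mutual += 1
            elif ties == 1:
                asymmetric += 1
            else:
                null += 1
    return mutual, asymmetric, null


mutual, asymmetric, null = dyad_census(adjacency)
n = len(adjacency)
density_from_census = (2 * mutual + asymmetric) / (n * (n - 1))
reciprocity = 2 * mutual / (2 * mutual + asymmetric)
print(mutual, asymmetric, null, density_from_census, reciprocity)
```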
OBTAIN TIE VALUES
TABULATE NETWORK
TABULATE TWO NETWORKS
TABULATE NETWORK
AND ATTRIBUTE
seat = 0 seat = 1
nwsummarize
nwtabulate
Clear all data from current memory with nwclear.
Load the gang data from the nwcommands-Server using webnwuse.
List the networks in this file (either nwds or nwset will do).
Summarize all networks in this dataset with nwsummarize.
Clear all data from current memory with nwclear.
Load the gang data from the nwcommands-Server using webnwuse.
Show how many ties in the gang network are between nodes with
different Birthplace.
DYAD
A dyad is a pair of actors $(i, j)$ in the network, plus the
configuration of the tie variables $y_{ij}, y_{ji}$ between them.
In a directed, binary network, there are $n(n-1)$ tie variables
located in $n(n-1)/2$ dyads.
Dyads can be of three types:
M: mutual
A: asymmetric
N: null
DYAD
A dyad is a pair of actors $(i, j)$ in the network, plus the
configuration of the tie variables $y_{ij}, y_{ji}$ between them.
In an undirected, binary network, there are $n(n-1)/2$ tie
variables located in $n(n-1)/2$ dyads.
Dyads can be of two types:
M: mutual
N: null
DYAD CENSUS
We can describe a network by counting the number of mutual,
asymmetric and null dyads. It is like taking a fingerprint of a
network.
isomorph
TRIAD
A triad is cyclical when there is a tie $(i, j)$, another one $(j, k)$ and a
third one back $(k, i)$. Cyclicity indicates the absence of hierarchy!
[Figure: non-cyclical vs. cyclical and transitive vs. non-transitive triads on nodes i, j, k]
TRIAD
We can describe a triad as before by counting the number of
mutual, asymmetric, and null dyads plus (where necessary) a
distinguishing letter.
TRANSITIVITY
The transitivity of a network gives you an idea of how
locally connected the network is. It is defined as the proportion of
actually observed transitively closed triples $(i, j, k)$ of nodes among
the observed potentially closed paths of length 2 from $i$ to $j$ via $k$.
Triad labels: 120D (M = 1, A = 2, N = 0, Down) and 111D (M = 1, A = 1, N = 1, Down).
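The definition of transitivity can be sketched by counting directed paths of length 2 and checking which of them are closed. This is plain Python under a literal reading of the definition above; the example matrix is hypothetical, and a package such as nwcommands may normalise differently.

```python
# Hypothetical directed network: 0->1, 0->2, 1->2, 2->3.
adjacency = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def transitivity(matrix):
    """Share of paths i -> k -> j that are closed by a direct tie i -> j."""
    n = len(matrix)
    two_paths = closed = 0
    for i in range(n):
        for k in range(n):
            if k == i or not matrix[i][k]:
                continue
            for j in range(n):
                if j in (i, k) or not matrix[k][j]:
                    continue
                two_paths += 1
                closed += matrix[i][j]
    return closed / two_paths if two_paths else 0.0


print(transitivity(adjacency))
```

In this example there are three paths of length 2 (0->1->2, 0->2->3, 1->2->3) and only the first is closed by the tie 0->2.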
Load the glasgow data from the nwcommands-Server using
webnwuse
Calculate the triad census for all networks
SIMMELIAN TIE
A Simmelian tie is a reciprocally connected pair with mutual
ties to third parties and hence it is an edge embedded in a
clique or triple (see Krackhardt 1998).
A tie between A and B is Simmelian when it is reciprocated (there is
also a tie from B to A) and both A and B have a reciprocal
relationship with a third actor D.
[Figure: network with nodes A-F; ties are marked _simmelian = 0 or _simmelian = 1]
nwsimmelian
COMPONENTS
A component of a network is a subgraph of connected nodes
(that does not mean they all need to be connected with each
other).
Isolate nodes form their own component.
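A minimal sketch of how components can be found with a breadth-first search, written in plain Python rather than with nwcomponents. The six-node edge list is hypothetical, ties are treated as undirected, and the function name `components` is an assumption.

```python
from collections import deque


def components(n, edges):
    """Return the components of an undirected network as sorted node lists."""
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            v = queue.popleft()
            members.append(v)
            for w in sorted(neighbors[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(sorted(members))
    return result


# Hypothetical network: a chain 0-1-2, a pair 3-4, and the isolate 5.
found = components(6, [(0, 1), (1, 2), (3, 4)])
largest = max(found, key=len)
print(found, largest)
```

Nodes 0 and 2 are not directly tied but still belong to the same component, and the isolate 5 forms a component of its own.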
EXTRACT LARGEST
COMPONENT
nwcomponents
nwgen
Load the florentine data from the nwcommands-Server using webnwuse
Extract the largest component of the network flomarriage.
Plot the largest component only.
Session XI
Distance and paths
Distance distribution
Shortest paths
Bridges
Kevin Bacon?
http://oracleofbacon.org/
Paul Erdős?
http://academic.research.microsoft.com/VisualExplorer
DISTANCE
Length of a shortest connecting path defines the (geodesic)
distance between two nodes.
DISTANCE
How can we calculate the distance?
$$\frac{\ln n}{\ln k}$$
$n$ = number of nodes, $k$ = average degree of nodes
DISTANCE
[Figure: directed network with nodes 1-5]
$$distances = \begin{pmatrix} 0 & 1 & 1 & 2 & 2 \\ 1 & 0 & 2 & 1 & 1 \\ 1 & 2 & 0 & 3 & 3 \\ 1 & 2 & 2 & 0 & 3 \\ 2 & 1 & 3 & 1 & 0 \end{pmatrix}$$
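The distance matrix can be reproduced with a breadth-first search from every node. In the plain-Python sketch below the tie list is an assumption: it is read off the entries equal to 1 in the matrix, since the original drawing is not available. The helper name `geodesic_distances` is illustrative.

```python
from collections import Counter, deque

nodes = [1, 2, 3, 4, 5]
# Assumed directed ties: the cells of the distance matrix that equal 1.
edges = [(1, 2), (1, 3), (2, 1), (2, 4), (2, 5), (3, 1), (4, 1), (5, 2), (5, 4)]


def geodesic_distances(nodes, edges):
    """Shortest path length from every node to every node it can reach."""
    successors = {v: [] for v in nodes}
    for sender, receiver in edges:
        successors[sender].append(receiver)
    distance = {}
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in successors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        distance[source] = dist
    return distance


distance = geodesic_distances(nodes, edges)
matrix = [[distance[i].get(j) for j in nodes] for i in nodes]
pairs = [distance[i][j] for i in nodes for j in nodes if i != j and j in distance[i]]
distribution = Counter(pairs)
average = sum(pairs) / len(pairs)
print(matrix, dict(distribution), average)
```

The same run also gives the distance distribution and the average shortest path length, which is how two networks with equal averages can still be told apart.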
DISTANCE DISTRIBUTION
Networks can have the same average shortest path length,
but still be vastly different from each other.
distances (rows = from, columns = to):
0 1 1 2 2
1 0 2 1 1
1 2 0 3 3
1 2 2 0 3
2 1 3 1 0
[Figure: histogram of the distances (x-axis: distance; y-axis: freq)]
PATHS
[Figures: paths between families in the Florentine marriage network; ties on the selected path are marked mypath_1 = 1]
PATHS OF SPECIFIC
LENGTH
nwgeodesic
nwpath
nwplot
What is the average shortest path length and
what is the distance distribution?
1 2
3 4
BRIDGES
[Figures: bridges in the Florentine families network]
LOCAL BRIDGES
Local bridges are ties between two nodes in a network
that are the shortest route by which information might
travel from those connected to one to those connected to
the other.
If removed, a local bridge (between nodes A and B) would
increase the distance between two nodes A and B by at
least 2.
The length by which the removal of a local bridge
increases the distance between two nodes is called the
span of the local bridge.
LOCAL BRIDGES
[Figures: local bridges in the Florentine families network]
nwbridge
Session XII
Network neighbors
Attributes of neighbors
FLORENTINE FAMILIES
[Figure: Florentine families network]
CENTRALITY
Getting jobs
Better informed
Higher status
What is well-connected?
[Figure: star network with centre a and peripheral nodes b, c, d, e]
DEGREE CENTRALITY
Degree centrality
We already know this. Simply the number of incoming/outgoing
ties => indegree centrality, outdegree centrality
How many ties does an individual have?
$$C_{degree}(i) = \sum_{j=1}^{N} y_{ij}$$
$C_{degree}(a) = 4$
$C_{degree}(b) = 1$
$C_{degree}(c) = 1$
[Figure: star network with centre a and peripheral nodes b, c, d, e]
DEGREE
The indegree $indeg(v)$ of node $v$ is simply the number of ties
that point towards $v$.
The outdegree $outdeg(v)$ of node $v$ is simply the number of
ties that point away from $v$.
[Figure: directed network with nodes 1-4]
indeg(1) = 3   outdeg(1) = 2
indeg(2) = 1   outdeg(2) = 2
indeg(3) = 1   outdeg(3) = 1
indeg(4) = 1   outdeg(4) = 1
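The degree counts above can be reproduced by tallying senders and receivers in an edge list. The drawing of the four-node network is not recoverable, so the ties in this plain-Python sketch are an assumption: one arrangement that is consistent with the listed indegrees and outdegrees.

```python
from collections import Counter

nodes = [1, 2, 3, 4]
# Assumed directed ties (sender, receiver) that match the degrees on the slide.
edges = [(1, 2), (1, 3), (2, 1), (2, 4), (3, 1), (4, 1)]

received = Counter(receiver for _, receiver in edges)
sent = Counter(sender for sender, _ in edges)

indeg = {v: received[v] for v in nodes}
outdeg = {v: sent[v] for v in nodes}

# Degree distribution: how many nodes have each degree value.
indegree_distribution = Counter(indeg.values())
outdegree_distribution = Counter(outdeg.values())
print(indeg, outdeg, dict(indegree_distribution), dict(outdegree_distribution))
```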
DEGREE DISTRIBUTION
[Figure: histograms of indegree and outdegree for the four-node example (x-axis: degree; y-axis: freq)]
CLOSENESS CENTRALITY
Closeness centrality
How close is an individual (on average) to all other individuals?
Farness
How many steps (on average) does it take an individual to reach all
other individuals?
$$farness(i) = \frac{1}{N-1} \sum_{j=1}^{N} l_{ij}$$
$l_{ij}$ = shortest path between $i$ and $j$
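FARNESS
Farness
$$farness(i) = \frac{1}{N-1} \sum_{j=1}^{N} l_{ij}$$
$$farness(a) = \frac{1}{4}(1 + 1 + 1 + 1) = 1$$
$$farness(b) = \frac{1}{4}(1 + 2 + 2 + 2) = \frac{7}{4}$$
[Figure: star network with centre a and peripheral nodes b, c, d, e]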
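CLOSENESS CENTRALITY
$$C_{closeness}(i) = \frac{1}{farness(i)}$$
$$C_{closeness}(a) = 1 / \left[\frac{1}{4}(1 + 1 + 1 + 1)\right] = 1$$
$$C_{closeness}(b) = 1 / \left[\frac{1}{4}(1 + 2 + 2 + 2)\right] = \frac{4}{7}$$
[Figure: star network with centre a and peripheral nodes b, c, d, e]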
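The farness and closeness values for the star network can be checked with a short breadth-first search in plain Python. The sketch assumes an undirected, connected network; `farness` and `closeness` are illustrative helper names, not nwcommands functions.

```python
from collections import deque

# Star network from the slides: a in the centre, b, c, d, e around it.
star = {
    "a": ["b", "c", "d", "e"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a"],
    "e": ["a"],
}


def farness(neighbors, source):
    """Average shortest path length from source to all other nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / (len(neighbors) - 1)


def closeness(neighbors, source):
    """Closeness is the inverse of farness."""
    return 1 / farness(neighbors, source)


print(farness(star, "a"), farness(star, "b"), closeness(star, "b"))
```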
BETWEENNESS CENTRALITY
Betweenness centrality
How many shortest paths go through an individual?
$C_{betweenness}(a) = 6$
$C_{betweenness}(b) = 0$
[Figure: star network with centre a and peripheral nodes b, c, d, e]
Give each shortest path a weight inverse to
how many shortest paths there are
between two nodes.
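The weighting rule can be sketched as follows in plain Python: for every pair of nodes, each shortest path counts 1 divided by the number of shortest paths between that pair. The sketch assumes an undirected network and counts each pair once; the four-node ring is a hypothetical second example, and the helper names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_counts(neighbors, source):
    """Distances from source and the number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness(neighbors):
    """Weighted count of shortest paths passing through each node."""
    info = {v: bfs_counts(neighbors, v) for v in neighbors}
    score = {v: 0.0 for v in neighbors}
    for s, t in combinations(neighbors, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are in different components
        for v in neighbors:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


star = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
ring = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(betweenness(star), betweenness(ring))
```

In the ring, opposite nodes are connected by two shortest paths, so each intermediate node receives a weight of one half for that pair.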
nwdegree
nwbetween
nwevcent
nwcloseness
nwkatz
What is the closeness centrality of node a?
[Figure: network with node a highlighted]
$$C_{closeness}(a) = \frac{1}{\frac{1}{3}(1 + 2 + 2)} = \frac{3}{5}$$
CENTRALIZATION
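How equally/unequally distributed are the
centrality scores of all individuals?
$$C_x = \frac{\sum_i \left[ \max_j C_x(j) - C_x(i) \right]}{\text{max sum}}$$
where "max sum" is the largest value the numerator can take in a network of the same size.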
CENTRALIZATION
[Figure: three scatter plots with fitted lines and 95% CIs relating Goals to Weight centralization, Out-strength centralization and In-strength centralization]
Grund, T. (2012) Network Structure and Team Performance: The Case of English
Premier League Soccer Teams. Social Networks, Vol. 34, Issue 4, pp. 682-690.
nwdegree
nwbetween
nwcloseness
nwsummarize
Load the florentine data with webnwuse
Calculate betweenness centrality for the networks flomarriage and
flobusiness.
Calculate betweenness centralization for the flomarriage network.
Load the glasgow data from the nwcommands-Server using webnwuse
Calculate indegree centralization for network glasgow1
Tabulate and visualize the indegree distribution (check out tab and hist)
Calculate betweenness centralization for network glasgow2
Session XIV
Change networks
Symmetrize networks
GANG NETWORK
TABULATE NETWORK
RECODE TIE VALUES
FLORENTINE FAMILIES
[Figure: two plots of the Florentine families network]
AFFILIATION DATA
SETTING AFFILIATION DATA
Level 1
Level 1
AFFILIATION DATA
[Figure: two-mode network linking persons (Peter, Tim, Andreas, Richard, Clemens, Thomas) to institutions (Oxford, Humboldt, Cologne, UCD)]
ONE-MODE PROJECTION
LEVEL 1 PROJECTION
[Figure: network among Humboldt, Oxford, Cologne and UCD]
LEVEL 2 PROJECTION
[Figure: network among Peter, Tim, Andreas, Richard, Clemens and Thomas]
PROJECTION WEIGHTS
(DIS)SIMILARITIES
The dissimilarity between two nodes reflects how dissimilar these
nodes are regarding the ties they have to other nodes (tie
vectors). Different distance measures can be used.
Euclidean distance
The Euclidean distance between two tie vectors is equal to the
square root of the sum of the squared differences between them.
That is, the strength of actor A's tie to C is subtracted from the
strength of actor B's tie to C, and the difference is squared. This
is then repeated across all the other actors (D, E, F, etc.), and
summed. The square root of the sum is then taken.
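The verbal recipe can be sketched in plain Python. The sketch follows the text literally and compares outgoing tie vectors only, skipping the two actors being compared; the four-node matrix is hypothetical, and the Manhattan variant (sum of absolute differences) is included for comparison. Function names are assumptions.

```python
from math import sqrt

# Hypothetical binary network on four nodes (index 0 stands for node 1, etc.).
ties = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]


def tie_differences(matrix, i, j):
    """Differences between the tie vectors of i and j towards all other actors."""
    others = [k for k in range(len(matrix)) if k not in (i, j)]
    return [matrix[i][k] - matrix[j][k] for k in others]


def euclidean(matrix, i, j):
    """Square root of the sum of squared differences."""
    return sqrt(sum(d * d for d in tie_differences(matrix, i, j)))


def manhattan(matrix, i, j):
    """Sum of absolute differences."""
    return sum(abs(d) for d in tie_differences(matrix, i, j))


print(euclidean(ties, 0, 1), manhattan(ties, 0, 1))
```

Nodes 1 and 4 in this example have identical ties to the other actors, so their dissimilarity is 0.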
EUCLIDEAN DISTANCE
How (dis)similar are nodes 4 and 5 in their tie vectors?
$$d_{ij} = \sqrt{\sum_k \left[ (y_{ik} - y_{jk})^2 + (y_{ki} - y_{kj})^2 \right]}, \quad k \neq i \text{ and } k \neq j$$
Differences between the tie vectors: -1, 0, -1, 0, 0, 0
$$d_{45} = \sqrt{(-1)^2 + (-1)^2} = \sqrt{2}$$
What is the Euclidean dissimilarity
between nodes 1 and 2?
[Figure: network with nodes 1-4]
$$d_{12} = \sqrt{(y_{13} - y_{23})^2 + (y_{14} - y_{24})^2} = \sqrt{(1 - 0)^2 + (0 - 1)^2} = \sqrt{2}$$
MANHATTAN DISTANCE
1
0
1
0 0 0
=2
0
0
0
1 1 0
2 1
= 1 =
4 2
Calculate the dissimilarities between
nodes using the Manhattan distance.
Transform the dissimilarities you just
generated into a long edgelist and make
a histogram of the dissimilarities (see
graph)
[Figure: histogram of _dissimilar (x-axis: 0 to 50; y-axis: Density)]
Session XVII
Expand networks
Homophily
EXPAND VARIABLE TO
NETWORK
[Figure: a node-level variable (seat = 0 / seat = 1) for the Florentine families and the expanded network derived from it]
mode(absdist)
= abs(20 - 10) = 10
Load the florentine data from the nwcommands-Server using
webnwuse
Calculate for each node in the flomarriage network the average
wealth of those network neighbors who have the same value on
the variable seat (use nwexpand and then nwcontext).
Question: Are co-offending ties between gang members
from the same ethnicity more likely than ties between gang
members from different ethnicities?
Match(ethnicity) 63
Match(British) 23
Match(Jamaican) 14
Match(Somali) 6
Match(West African) 20
[Figure legend: Caribbean, East Africa, UK, West Africa]
ADJACENCY MATRIX
0 1 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 1 0 1 0 1 0 0
0 0 0 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
GROUP SIZE MATTERS
[Figure: groups with shares of 50%/50% and 75%/25%]
Let's say we have 20 red, 10 green and 10
yellow individuals. Furthermore, there are 100
directed network ties. How many of these ties
will be between similarly colored individuals
(when ties are assumed to be assigned
randomly)?
Ties between two red individuals:
100 * 20/40 * 20/40 = 25
Ties between two green individuals:
100 * 10/40 * 10/40 = 6.25
Ties between two yellow individuals:
100 * 10/40 * 10/40 = 6.25
Total = 25 + 6.25 + 6.25 = 37.5
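The same arithmetic as a short plain-Python sketch: under random assignment, a tie links two members of a group with probability equal to the squared share of that group. Variable names are illustrative.

```python
group_sizes = {"red": 20, "green": 10, "yellow": 10}
ties = 100

total = sum(group_sizes.values())

# Expected number of ties within each colour when ties are assigned at random.
expected_within = {
    colour: ties * (size / total) ** 2 for colour, size in group_sizes.items()
}
expected_total = sum(expected_within.values())
print(expected_within, expected_total)
```

The larger group contributes most of the expected same-colour ties, which is why group size has to be taken into account before reading homophily into observed counts.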
ETHNICITY DISTRIBUTION
Ethnicity distribution
Observed count
Total 54
British 24
Jamaican 12
Somali 6
West African 12
Caribbean East Africa UK West Africa
                     British (24)      Jamaican (12)     Somali (6)      West African (12)
British (24)         (24/54)*(24/54)
Jamaican (12)        (12/54)*(24/54)   (12/54)*(12/54)
Somali (6)           (6/54)*(24/54)    ...               (6/54)*(6/54)
West African (12)    (12/54)*(24/54)   ...               ...             (12/54)*(12/54)
Total: 54
ETHNICITY IN DYADS
Expected British-British ties = 133*(24/54)*(24/54) = 26.27
HOMOPHILY
Observed and expected (for networks with same density) match statistics
Observed Expected
Match(ethnicity) 63 41.05
Match(British) 23 26.27
Match(Jamaican) 14 6.57
Match(Somali) 6 1.64
Match(West African) 20 6.57
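The expected column can be reproduced from the group sizes. In this plain-Python sketch the factor 133 from the slide is taken to be the number of ties, which is an assumption; group sizes and observed counts are copied from the tables above.

```python
group_sizes = {"British": 24, "Jamaican": 12, "Somali": 6, "West African": 12}
observed = {"British": 23, "Jamaican": 14, "Somali": 6, "West African": 20}
ties = 133  # assumption: the factor 133 on the slide is the number of ties

total = sum(group_sizes.values())

# Expected same-ethnicity ties if ties were formed independently of ethnicity.
expected = {
    group: ties * (size / total) ** 2 for group, size in group_sizes.items()
}

for group in group_sizes:
    print(group, observed[group], round(expected[group], 2))
print("Match(ethnicity)", sum(observed.values()), round(sum(expected.values()), 2))
```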
nwtabulate
nwcorrelate
nwpermute
nwexpand
Load the florentine data from the nwcommands-Server
using webnwuse
Calculate the transitivity of the flomarriage network
(nwsummarize, detail)
Generate 100 undirected random networks with exactly the
same density as the flomarriage network.
Calculate the transitivity scores for the random networks
and plot their distribution.
Compare the observed score against the simulated scores.
Session XVIII
Simulating networks
RANDOM NETWORK
Each tie has the same probability to exist, regardless of any other ties.
LATTICE RING LATTICE
PREFERENTIAL ATTACHMENT
MECHANISMS
Each step one new node enters and forms m = 2 new ties to the existing
nodes. With prob = 1, these two new ties are uniformly sampled, i.e.
each existing node has the same probability to become friends with the
new node.
[Figure: Step 1 and Step 2 with m = 2, prob = 1; nodes 3 and 4 join the network of nodes 1 and 2]
PREFERENTIAL ATTACHMENT
MECHANISMS
When prob < 1, some new ties are more likely than other new ties. The
weights are proportional to the current indegree of the established nodes.
[Figure: nodes 3 and 4 join the network of nodes 1 and 2]
PREFERENTIAL
ATTACHMENT NETWORK
[Figure: histogram of _in_degree (x-axis: 0 to 8; y-axis: Density)]
General strategy:
1. Calculate a network statistic that you are interested in.
2. Think about the properties of the network that you want to
conserve.
3. Generate many random networks that have the same
properties as the observed network.
4. Calculate the network statistic on these conditional random
networks and compare this baseline distribution against the
actually observed network statistic in the observed network.
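The four steps can be sketched in plain Python for one choice of statistic and one conserved property: transitivity as the statistic, and network size plus number of ties (hence density) as the conserved property. The observed network, the seed and the number of simulations are hypothetical.

```python
import random
from itertools import combinations


def transitivity(n, ties):
    """Share of undirected two-paths i - k - j that are closed by a tie i - j."""
    neighbors = {v: set() for v in range(n)}
    for a, b in ties:
        neighbors[a].add(b)
        neighbors[b].add(a)
    closed = paths = 0
    for k in range(n):
        for i, j in combinations(sorted(neighbors[k]), 2):
            paths += 1
            if j in neighbors[i]:
                closed += 1
    return closed / paths if paths else 0.0


# Step 1: statistic on a hypothetical observed network (two triangles and a bridge).
n = 8
observed_ties = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (6, 7)]
observed_score = transitivity(n, observed_ties)

# Steps 2 and 3: random networks with the same size and number of ties.
rng = random.Random(12345)
all_dyads = list(combinations(range(n), 2))
simulated_scores = [
    transitivity(n, rng.sample(all_dyads, len(observed_ties))) for _ in range(200)
]

# Step 4: share of random networks at least as transitive as the observed one.
share_as_high = sum(s >= observed_score for s in simulated_scores) / len(simulated_scores)
print(observed_score, share_as_high)
```

A small share suggests more clustering than expected for networks of this size and density.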
1 Test-statistic
e.g. number of triads,
number of reciprocal ties,
number of ties between
similar individuals
2 Distribution of test-
statistic under null
hypothesis
e.g. distribution of triads
we can expect when there
is no clustering
Question: Is there more or less
clustering (triads) than expected?
transitivity score is 0.36
Is this a lot?
2 Distribution of test-statistic under null hypothesis
$transitivity_{rnd} = ??$
(NON-)RANDOMNESS
=0 =1
CONDITIONAL UNIFORM
GRAPHS
Generate random networks with the same size, density, or dyad census
as the observed network and then calculate the test-statistic (transitivity)
on these conditional uniform graphs.
$$Density = \frac{2M + A}{n(n-1)} = \frac{2 \cdot 1 + 3}{4(4-1)} = \frac{5}{12} \approx 0.417$$
$$Reciprocity = \frac{2M}{2M + A} = \frac{2 \cdot 1}{2 \cdot 1 + 3} = \frac{2}{5} = 0.4$$
M=1, A=2, N=1 (121)
TIE PROBABILITIES
i ? j
There are 12 possible ties. And 5 of these 12 are realized. That means:
$$\Pr(y_{ij} = 1) = \frac{5}{12} \approx 0.417$$
[Figure: density plot of the transitivity of random networks with the same density as the gang network, with the transitivity of the gang network marked (x-axis: transitivity, about .05 to .2; y-axis: Density)]
. webnwuse gang2
. nwsummarize gang, detail
(the output reports the transitivity of the gang network and the density of the observed network)
. nwclear
. nwrandom 54, prob(.092) undirected ntimes(20)
. nwsummarize _all, detail save(myfile)
...
Question: Is there more or less correlation
between these two networks than expected?
Padgett, J. and Ansell, C. (1993) Robust Action and the Rise of the Medici, 1400-1434.
American Journal of Sociology 98: 1259-1319
GRAPH CORRELATION
Network 1        Network 2
  a b c            a b c
a 0 1 0          a 0 0 0
b 0 0 1          b 0 0 1
c 1 0 0          c 1 1 0
Each adjacency matrix is reshaped into one column with a row per ordered pair of nodes:
row col net1 net2
a   b   1    0
a   c   0    0
b   a   0    0
b   c   1    1
c   a   1    1
c   b   0    1
GRAPH CORRELATION
$corr_{obs} = 0.372$
Is this a lot?
2 Distribution of test-statistic under null hypothesis
$corr_{rnd} = ??$
QUADRATIC ASSIGNMENT
PROCEDURE
Network structure is
controlled for. Keeps
dependencies.
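A minimal sketch of the idea in plain Python, using the two three-node example networks (Network 1 and Network 2) from the graph correlation slides. The node labels of one network are permuted, rows and columns together, so its structure is kept while its alignment with the other network is broken. With three nodes all 6 permutations can be enumerated; with larger networks one would draw random permutations instead. The helper names are assumptions, and this is not the nwqap implementation.

```python
from itertools import permutations
from math import sqrt

net1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
net2 = [[0, 0, 0], [0, 0, 1], [1, 1, 0]]


def off_diagonal(matrix):
    """Reshape a matrix into one vector with an entry per ordered pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def correlation(x, y):
    """Pearson correlation between two equally long vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permute(matrix, order):
    """Relabel nodes: rows and columns are reordered in the same way."""
    return [[matrix[i][j] for j in order] for i in order]


observed = correlation(off_diagonal(net1), off_diagonal(net2))
permuted = [
    correlation(off_diagonal(permute(net1, order)), off_diagonal(net2))
    for order in permutations(range(3))
]
share_as_high = sum(score >= observed for score in permuted) / len(permuted)
print(observed, permuted, share_as_high)
```

The observed correlation of these toy networks differs from the 0.372 reported for the Florentine networks; only the procedure is the same.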
PERMUTATION TEST
Original network      Permuted network
- 1 0 1               - 1 1 1
1 - 1 1               0 - 0 0
0 0 - 0               1 1 - 0
0 0 0 -               0 0 0 -
[Figure: the node labels 1-4 are permuted while the structure of the network stays the same]
GRAPH CORRELATION
$corr_{obs} = 0.372$
GRAPH CORRELATION
$corr = 0.034$
GRAPH CORRELATION
$corr = 0.101$
GRAPH CORRELATION
[Figure: Corr(flobusiness, flomarriage), based on 100 QAP permutations of network flobusiness (x-axis: correlation; y-axis: density)]
[Figure: Corr(gang, same_Birthplace) (y-axis: density)]
ADJACENCY MATRIX
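We write $y_{ij} = 1$ if actors $i$ and $j$ are related to each other (i.e.,
if there is a tie $(i, j)$), and $y_{ij} = 0$ otherwise.
The matrix $Y$ is called the adjacency matrix and is a convenient
representation of a network.
$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix}$$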
LOGISTIC REGRESSION
Dependent variable = binary (1 or 0)
Logistic regression: $\Pr(y_i = 1) = p_i$
$$logit(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_k \beta_k x_{ki}$$
Logistic function: $logistic(x) = \frac{1}{1 + e^{-x}}$
Odds-ratio: $e^{\beta}$
[Figure: logistic function]
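The relation between the logit, the logistic function and the odds ratio can be checked numerically. The coefficient values in this plain-Python sketch are hypothetical and only serve to illustrate the formulas.

```python
from math import exp, log


def logistic(x):
    """Map a value on the logit scale to a probability."""
    return 1 / (1 + exp(-x))


def logit(p):
    """Log-odds of a probability; the inverse of the logistic function."""
    return log(p / (1 - p))


# Hypothetical model with an intercept and one binary predictor.
intercept = -2.0
beta = 0.8

p_without = logistic(intercept)       # predictor = 0
p_with = logistic(intercept + beta)   # predictor = 1

odds_without = p_without / (1 - p_without)
odds_with = p_with / (1 - p_with)
odds_ratio = odds_with / odds_without
print(p_without, p_with, odds_ratio, exp(beta))
```

The ratio of the two odds equals the exponentiated coefficient, which is why exponentiated coefficients are reported as odds ratios.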
QAP REGRESSION
Sender and
receiver ID
DYAD-LEVEL REGRESSION
Dyad values
DYAD-LEVEL REGRESSION
Independent
variables
QAP REGRESSION
Grund, T. and Densley, J. (2012) Ethnic Heterogeneity in the Activity and Structure of a
Black Street Gang. European Journal of Criminology, 9(3), 388-406.
nwqap
Load the klas12b data from the nwcommands-Server using
webnwuse
Use nwqap to show if ties in the klas12b_wave1 network are
more likely between individuals with the same sex.
-sex pers
Now, consider not only similarity in sex, but also absolute
difference in age. Are ties between individuals who are similar
when it comes to age more likely?
Encore
EXAMPLE: OUTDEGREE
Simply the number of outgoing ties for each node.
How many friends does an individual nominate?
John
4f8
Susan
most nwcommands
1. Parse network
Parse networks.
Populate local
netname.
EXAMPLE: OUTDEGREE
Obtain
adjacency matrix
net
EXAMPLE: OUTDEGREE
Functionality
_NWSYNTAX
Parse networks (and obtain some meta-information)
- Unabbreviate network list
- Obtain network meta-information
- Populate locals with meta-information and parsed network list
NWNAME
Obtain meta-information
- Get ID of a network
- Get meta-information
NWTOMATA
Obtain adjacency matrix
- Parse network and populate local id
- Make copy of adjacency matrix
_nwsyntax
nwname
nwtomata
mata
INDEPENDENCE
Independence means that whatever you observe as outcome
variable in one case does not depend on the value the
outcome variable has in other cases.
I = I I
$$\Pr(Y = y) = \frac{e^{\theta^{T} s(y)}}{c}$$
$\theta$ = parameters, to be estimated
$\theta^{T} s(y)$ = a score given to our network y using some parameters $\theta$ and the network features s of y
If parameter vector $\theta$ is set to [0,0,0], then all unique classes of isomorphic
networks get the same score. All possible combinations of network features
have the same probability.
If parameter vector $\theta$ is not [0,0,0], then some networks get a higher
score than others, which means we assume that they are more likely to be
drawn.
Although we only look at classes of networks defined by their
features, there are still too many of them to calculate the constant $c$.
$$\Pr(Y = y) \propto e^{\theta^{T} s(y)}$$
Probability that there is a tie from i to j, given n actors AND the rest
of the network, excluding the dyad in question!
ERGM: INTERPRETATION
ERGMs ultimately give you an estimate for various
parameters $\theta_k$, which mean
$$\Pr(Y = y) \propto e^{\theta_1 s_1(y) + \theta_2 s_2(y) + \theta_3 s_3(y)}$$
EXAMPLE
$$\Pr(Y = y) \propto e^{\theta_1 s_1(y) + \theta_2 s_2(y) + \theta_3 s_3(y)}$$
                 y    y'
s_edges          4    3
s_2-stars        5    3
s_triangles      1    0
EXAMPLE
we do not know the proportionality constant
$$\Pr(Y = y) \propto e^{4\theta_1 + 5\theta_2 + 1\theta_3}$$
$$\Pr(Y = y') \propto e^{3\theta_1 + 3\theta_2}$$
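EXAMPLE
although we do not know the proportionality constant we can calculate
the ratio between the two probabilities!
$$\frac{\Pr(Y = y)}{\Pr(Y = y')} = \frac{e^{4\theta_1 + 5\theta_2 + 1\theta_3}}{e^{3\theta_1 + 3\theta_2}} = e^{\theta_1 + 2\theta_2 + \theta_3}$$
How much more likely is $y$ in contrast to $y'$?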
EXAMPLE
so, suppose in a larger network the estimation gave the
following parameters:
$$\frac{\Pr(Y = y)}{\Pr(Y = y')} = e^{\theta_1 + 2\theta_2 + \theta_3} \approx \frac{1}{5.5}$$
i.e. the middle tie is about 5.5 times
as likely NOT to exist as to exist (given
the rest of the network)
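The ratio can be computed without knowing the proportionality constant, because it cancels. In the plain-Python sketch below the feature counts come from the example table (edges, 2-stars, triangles); the parameter values are hypothetical, picked only so that the ratio comes out close to the 1/5.5 quoted above, and are not the estimates from the slides.

```python
from math import exp

# Network features of the two networks compared in the example.
stats_y = {"edges": 4, "two_stars": 5, "triangles": 1}
stats_y_prime = {"edges": 3, "two_stars": 3, "triangles": 0}

# Hypothetical parameter values (not the estimates shown on the slide).
theta = {"edges": -1.0, "two_stars": -0.5, "triangles": 0.3}


def score(stats, theta):
    """Weighted sum of network features: the exponent of the ERGM formula."""
    return sum(theta[name] * stats[name] for name in theta)


# The unknown constant cancels, so only the difference in scores matters.
ratio = exp(score(stats_y, theta) - score(stats_y_prime, theta))
print(ratio, 1 / ratio)
```

With these values the network containing the extra tie is roughly 5.5 times less likely than the one without it, given the rest of the network.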
ERGM FEATURES
Think of ERG models as a probability distribution on a (huge)
space of all possible networks.