Assignment 4. Transitivity & Hierarchy.

(Most of the solutions were embedded in the file…here I expand on those that were not, see the BOLD stuff)

1)  The key to measuring transitivity in a network is identifying what proportion of the potentially transitive relations are actually transitive. That is:

·  First identify every place that i sends to j, and j sends to k.

·  At each such place, a transitive triad would form if i were then to send to k.

The transitivity ratio is the number of transitive triads over the number of possible transitive triads.

The first condition is a simple two-path, and we know how to find all paths of length two using the reach program. Thus, the number of possible transitive triples equals the number of two-paths in the network. The number of transitive triples equals the number of two-paths that are also one-paths, that is, every place in the network where there is both a two-path and a direct arc. We can calculate this from the adjacency matrix using the following formulas:

Transitive relations = T3 = Sum(A^2 # A)

Possible transitive relations = I3 = Sum(A^2) - Trace(A^2)

Transitivity ratio = T3 / I3

where A^2 is the matrix product of the adjacency matrix A with itself, the # sign means element-wise multiplication, and the trace is the sum of the diagonal (subtracted to get rid of two-paths from ego back to ego).

To do this in IML, you would use the lines:

T3 = sum((X**2)#X);

I3 = sum(X**2)-trace(X**2);

Tranrat = T3/I3;

Print Tranrat;

where X is the adjacency matrix you have entered into IML.
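
The same calculation can be sketched outside SAS as a cross-check. This is a minimal Python illustration, not the course program: the helper names (matmul, transitivity_ratio) and the 4-node toy network are made up for the example, and the adjacency matrix is assumed to have zeros on the diagonal.

```python
def matmul(a, b):
    """Plain matrix product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitivity_ratio(adj):
    """T3 / I3 for a 0/1 adjacency matrix with an empty diagonal."""
    n = len(adj)
    two_paths = matmul(adj, adj)  # two_paths[i][k] = number of i -> j -> k paths
    # Two-paths that are closed by a direct arc i -> k (element-wise product).
    t3 = sum(two_paths[i][k] * adj[i][k] for i in range(n) for k in range(n))
    # All two-paths, minus those that return to the starting node (the trace).
    i3 = sum(map(sum, two_paths)) - sum(two_paths[i][i] for i in range(n))
    return t3 / i3  # raises ZeroDivisionError if the network has no two-paths


# Hypothetical toy network (not one of the class data sets):
# arcs 0->1, 0->2, 1->2, 2->3.
toy = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
ratio = transitivity_ratio(toy)
print(ratio)
```

In the toy network there are three two-paths (0->1->2, 0->2->3, 1->2->3) and only the first is closed by a direct arc, so the ratio is one third.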

Use this formula to write a SAS IML program that calculates the transitivity ratio for the graduate student ‘help’ network. The program for reading the network & creating PAJEK files is in osugrd_read.sas.

The transitivity ratio for help should be 0.3352228

(see triads1.sas for a sample program on different data).

Compare the transitivity ratio for (a) the high school friendship network to the graduate school best friendship network, and (b) the grad school best friendship network to the grad school help network.

Hint. This will require two separate IML statements: one for the graduate school numbers, a second for the high school numbers.

Grad school help is above (0.335).

Grad school best friend: 0.3271303

High school friendship: 0.2300127

2)  To get at the global structure of a network, we would want to look at the triad census: the frequency count of every type of triad in the network. Calculate the triad census by hand for the small network below.


To do this, you need to categorize each triad, i.e.:

Node 1 / Node 2 / Node 3 / Type
1 / 2 / 3 / 021D
1 / 2 / 4 / 012
1 / 2 / 5 / 120C
1 / 3 / 4 / 111D
1 / 3 / 5 / 210
1 / 4 / 5 / 201
2 / 3 / 4 / 102
2 / 3 / 5 / 111D
2 / 4 / 5 / 111D
3 / 4 / 5 / 300

Giving a triad census of:

003 0

012 1

102 1

021D 1

021U 0

021C 0

111D 3

111U 0

030T 0

030C 0

201 1

120D 0

120U 0

120C 1

210 1

300 1
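
The hand classification above can also be automated. The sketch below is a minimal Python illustration, not part of the course programs: the function names and the 4-node example network are hypothetical (the small network from the assignment is not reproduced here), and the D/U/C/T letters follow the usual M-A-N labelling as I understand it (for example, 021D is one node sending to both others, 111D is an asymmetric arc pointing into the mutual pair).

```python
from itertools import combinations


def triad_type(arcs, a, b, c):
    """Return the M-A-N label for nodes a, b, c; arcs is a set of (sender, receiver)."""
    nodes = (a, b, c)
    mutual, asym = [], []
    for u, v in combinations(nodes, 2):
        uv, vu = (u, v) in arcs, (v, u) in arcs
        if uv and vu:
            mutual.append((u, v))
        elif uv:
            asym.append((u, v))
        elif vu:
            asym.append((v, u))
    code = f"{len(mutual)}{len(asym)}{3 - len(mutual) - len(asym)}"
    # Does any node send (or receive) both asymmetric arcs?
    out2 = any(sum(1 for s, _ in asym if s == x) == 2 for x in nodes)
    in2 = any(sum(1 for _, r in asym if r == x) == 2 for x in nodes)
    if code in ("021", "120"):
        return code + ("D" if out2 else "U" if in2 else "C")
    if code == "030":
        return code + ("T" if out2 else "C")
    if code == "111":
        receiver = asym[0][1]
        return code + ("D" if receiver in mutual[0] else "U")
    return code  # 003, 012, 102, 201, 210, 300 need no letter


def triad_census(nodes, arcs):
    """Count every triple of nodes by triad type."""
    census = {}
    for a, b, c in combinations(sorted(nodes), 3):
        label = triad_type(arcs, a, b, c)
        census[label] = census.get(label, 0) + 1
    return census


# Hypothetical example: 1 <-> 2, 2 -> 3, 3 -> 1, and node 4 isolated.
arcs = {(1, 2), (2, 1), (2, 3), (3, 1)}
census = triad_census([1, 2, 3, 4], arcs)
print(census)
```

With n nodes the loop visits every one of the n(n-1)(n-2)/6 triples, which is fine for a five-node exercise but is the reason a dedicated program is used for the larger networks.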

3)  Of course, we can’t calculate the triad census by hand for networks of any real size. Using PAJEK, calculate the triad census for (a) the prison network, (b) the high-school network, and (c) the osugrad_help network. To do this you need to:

a)  Create the PAJEK files. You may have already done so and saved them (from the other homeworks); otherwise, use the same programs we’ve used in the earlier homeworks.

b)  Once you have the PAJEK file open, go to: INFO > NETWORK > TRIADIC CENSUS

and you will get a copy of the census.

Prison:

------

1 - 003 39221

------

2 - 012 5860

3 - 102 2336

4 - 021D 61

5 - 021U 80

6 - 021C 103

7 - 111D 105

8 - 111U 69

9 - 030T 13

10 - 030C 1

11 - 201 12

12 - 120D 15

13 - 120U 7

14 - 120C 5

15 - 210 12

16 - 300 5

------

Sum (2 - 16): 8684
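
A quick sanity check on a census like this one: the 16 counts must add up to the number of distinct triples of nodes, n(n-1)(n-2)/6. The short Python sketch below uses the prison counts printed above; the helper name nodes_for_triad_total is made up for the illustration.

```python
from math import comb

# Prison triad census, copied from the PAJEK output above.
prison_census = {
    "003": 39221, "012": 5860, "102": 2336, "021D": 61, "021U": 80,
    "021C": 103, "111D": 105, "111U": 69, "030T": 13, "030C": 1,
    "201": 12, "120D": 15, "120U": 7, "120C": 5, "210": 12, "300": 5,
}

total = sum(prison_census.values())
non_empty = total - prison_census["003"]  # matches PAJEK's "Sum (2 - 16)"


def nodes_for_triad_total(total_triads):
    """Return n such that C(n, 3) equals total_triads, or None if no n fits."""
    n = 3
    while comb(n, 3) < total_triads:
        n += 1
    return n if comb(n, 3) == total_triads else None


n_nodes = nodes_for_triad_total(total)
print(total, non_empty, n_nodes)
```

If the function returns None, a count has been mistyped or a row is missing.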

Fake School:

------

Type Number of triads (ni) Expected (ei)

------

1 - 003 11856124 11688510.46

2 - 012 484386 812746.29

3 - 102 172755 4709.44

4 - 021D 1844 4709.44

5 - 021U 2147 4709.44

6 - 021C 2471 9418.89

7 - 111D 2075 109.16

8 - 111U 1731 109.16

9 - 030T 265 109.16

10 - 030C 12 36.39

11 - 201 571 0.63

12 - 120D 192 0.63

13 - 120U 156 0.63

14 - 120C 105 1.26

15 - 210 242 0.01

16 - 300 95 0.00

------

4)  It turns out that the triad census contains a lot of information about the graph (see Wasserman and Faust, chapter 14). We can use this information by looking at frequencies of combinations of triads. In real data, however, the number of triads of each type is partly random, so we want to compare against the distribution in a “similar” random graph. The program triads1.sas shows you how to do this. The line

s_tstat=tstat(tcen,1);

in the program will calculate a set of statistics for the triad census. The output will look something like this (this is from a different graph):

Triad Census

T TPCNT PU EVT VARTU STDDIF

003 21 0.25 0.2157 18.118 3.9576 1.4489

012 26 0.3095 0.3235 27.176 11.838 -0.342

102 11 0.131 0.1294 10.871 4.5459 0.0607

021D 1 0.0119 0.0347 2.9118 2.5215 -1.204

021U 5 0.0595 0.0347 2.9118 2.5215 1.3151

021C 3 0.0357 0.0693 5.8235 4.2626 -1.368

111D 2 0.0238 0.0616 5.1765 4.0205 -1.584

111U 5 0.0595 0.0616 5.1765 4.0205 -0.088

030T 3 0.0357 0.0126 1.0588 0.8292 2.1317

030C 1 0.0119 0.0042 0.3529 0.3274 1.1308

201 1 0.0119 0.0185 1.5529 1.0074 -0.551

120D 1 0.0119 0.0063 0.5294 0.4946 0.6691

120U 1 0.0119 0.0063 0.5294 0.4946 0.6691

120C 1 0.0119 0.0126 1.0588 0.9196 -0.061

210 1 0.0119 0.0084 0.7059 0.5754 0.3877

300 1 0.0119 0.0006 0.0471 0.0448 4.5013

The first column (T) is the standard triad census, exactly what PAJEK produces (in fact, you could use the results from PAJEK as input into the statistics function), and gives the count of triads of each type. Here I am using the small network that is the class logo, so we see there are 26 type 012 triads in this network. The second column (TPCNT) gives the proportion of all triads that are of that type, the third (PU) the probability of observing a triad of that type given the distribution of dyads, the fourth (EVT) the expected count, and the fifth (VARTU) the variance of that count. The last column (STDDIF) is the standardized difference between observed and expected, (T - EVT) / sqrt(VARTU), and is read like a t-test: values above 2 indicate more triads of that type than chance would produce, and values below -2 indicate fewer. For example, we observe 1 complete triad (type 300) in the network but would expect about 0.05 (i.e. none), giving a t-value of 4.50, meaning that there are more complete triads in this network than we would expect by chance.
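
The STDDIF column can be reproduced from the other columns. The Python lines below are a small sketch of that arithmetic using three rows copied from the class-logo output above; std_diff is a made-up helper name, and the results agree with the printed values only up to the rounding of the printed inputs.

```python
from math import sqrt


def std_diff(observed, expected, variance):
    """Standardized difference: (T - EVT) / sqrt(VARTU)."""
    return (observed - expected) / sqrt(variance)


# (T, EVT, VARTU) for three rows of the class-logo table.
rows = {
    "012": (26, 27.176, 11.838),
    "030T": (3, 1.0588, 0.8292),
    "300": (1, 0.0471, 0.0448),
}

for label, (t, evt, vartu) in rows.items():
    print(label, round(std_diff(t, evt, vartu), 2))
```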

In addition to the counts and individual triad frequencies, we can take a weighted sum over particular triad types to see how closely the network matches an ideal distribution (such as the rank-cluster system). To do this, we include a vector of zeros and ones that indicates whether each triad type is allowed in the ideal model, and calculate tau using the formula from the notes. The program calculates tau for the rank-cluster model and for the transitivity model. A value of zero would indicate that the model fits no better than random; the larger the value, the better the fit.
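
The formula from the notes is not reproduced here. As a rough sketch only, assuming tau takes the usual standardized form (the weighted observed count minus the weighted expected count, divided by the standard deviation of the weighted count, which is where the covariance matrix comes in), the calculation looks like the Python below. The three-type census, expected values, covariances and weight vector are all made-up numbers for illustration, and the sign convention and choice of weights in triads1.sas may differ.

```python
from math import sqrt


def tau(weights, observed, expected, cov):
    """(w'T - w'mu) / sqrt(w' Sigma w) for a weight vector w."""
    k = len(weights)
    weighted_obs = sum(w * t for w, t in zip(weights, observed))
    weighted_exp = sum(w * m for w, m in zip(weights, expected))
    weighted_var = sum(weights[i] * cov[i][j] * weights[j]
                       for i in range(k) for j in range(k))
    return (weighted_obs - weighted_exp) / sqrt(weighted_var)


# Hypothetical three-type census (real use would have all 16 triad types).
weights = [1, 0, 1]      # 1 = type counted toward the model, 0 = ignored
observed = [5, 3, 2]     # observed counts
expected = [3, 3, 1]     # expected counts under the random baseline
cov = [                  # covariance matrix of the counts
    [4, 0, 1],
    [0, 2, 0],
    [1, 0, 1],
]

tau_value = tau(weights, observed, expected, cov)
print(round(tau_value, 4))
```

Because the counts of different triad types are correlated, the variance of the weighted sum needs the off-diagonal covariances and not just the individual variances.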

Run the program and discuss the observed distribution of triads for the school friendship network. What do the values of tau tell us? If you were to modify this program to run on the OSU graduate student network, what would you have to do? (You don’t need to do it, but it might be a nice challenge to see if you can).

Your output should look something like this:

Triad Census

T TPCNT PU EVT VARTU STDDIF

003 1.19E7 0.9466 0.9465 1.19E7 11440 6.3217

012 484386 0.0387 0.0387 485009 29196 -3.643

102 172755 0.0138 0.0138 172493 7645.5 3.0014

021D 1844 0.0001 0.0001 1652.1 1627.2 4.7577

021U 2147 0.0002 0.0001 1652.1 1627.2 12.269

021C 2471 0.0002 0.0003 3304.2 3204.8 -14.72

111D 2075 0.0002 0.0002 2352.2 2300.8 -5.779

111U 1731 0.0001 0.0002 2352.2 2300.8 -12.95

030T 265 212E-7 18E-7 22.491 22.476 51.153

030C 12 958E-9 599E-9 7.4969 7.4952 1.6448

201 571 456E-7 0.0001 834.59 811.63 -9.252

120D 192 153E-7 64E-8 8.0122 8.0108 65.006

120U 156 125E-7 64E-8 8.0122 8.0108 52.286

120C 105 838E-8 128E-8 16.024 16.019 22.231

210 242 193E-7 909E-9 11.381 11.375 68.378

300 95 758E-8 107E-9 1.3428 1.3426 80.83

Note the higher and lower than chance counts for certain triads. The standardized difference for 030T is about 51 (265 observed against roughly 22 expected), and for 300 it is nearly 81 (95 observed against roughly 1 expected), so both occur far more often than chance. 030C, though rare, occurs essentially at random (the rule of thumb is that a triad type is significant when its std dif is greater than 2 in absolute value). 111U and 201 are found less often than chance (likely indicating that when i and j agree on k as a friend, they nominate each other).

(The Covariance matrix really doesn’t help you much substantively. It is a statistical necessity for calculating tau, however.)

TAU_RC TAU_TR

19.005026 17.652434

These two values are very similar. The Rank-Cluster model fits marginally better than the transitivity model, but not by much. The substantive difference between the two is that the transitivity model implies a single, unified hierarchy, whereas the rank-cluster model allows multiple hierarchical tracks in a setting. Because allowing multiple tracks adds so little to the fit, any hierarchy within the school appears to be fairly unified.