A Guide for the Visually Perplexed: Visually Representing Social Networks

Sean F. Everton

Stanford University

January 2004


Version .30

Introduction

Network analysts have long used sociograms (network diagrams) to visualize the networks they are analyzing. A common technique that analysts use to draft a sociogram is to construct it around the circumference of a circle. The circle helps organize the data, but the order in which analysts place the points is determined only by their attempt to keep the number of crossings among the lines connecting the points to a minimum. Typically, researchers using this technique engage in a trial-and-error drafting process until they reach an aesthetically pleasing result (Scott 2000). While such a process can make the structure of relations clearer, the relations between the sociogram’s points reflect no specific mathematical properties. The points are arranged arbitrarily and the distances between them are meaningless.

Not surprisingly, how social network data are spatially arranged in graphs influences how viewers perceive a social network’s structural characteristics (McGrath, Blythe, and Krackhardt 1997). Thus, if we wish to infer “something about the actual sociometric properties of a network, then the physical distance between points should correspond as closely as possible to the graph theoretical distances between them” (Scott 2000:148). To this end, researchers have in recent years developed a number of techniques (e.g., metric and non-metric multidimensional scaling, correspondence analysis, and spring-embedded algorithms) that position the points in space mathematically. This guide provides an overview of how to use these techniques to visually represent one- and two-mode networks.
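To make the notion of graph theoretical distance concrete, the following minimal Python sketch counts the ties on the shortest path between points using a breadth-first search. The four-point network (A, B, C, D) and the function name geodesic_distances are hypothetical and serve only as an illustration; this is not a procedure taken from any of the packages discussed in this guide.

```python
from collections import deque


def geodesic_distances(ties, source):
    """Return the number of ties on the shortest path from source to each reachable point."""
    neighbors = {}
    for a, b in ties:  # nondirectional ties: record each one in both directions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        point = queue.popleft()
        for other in neighbors.get(point, ()):
            if other not in distance:  # first visit is the shortest route
                distance[other] = distance[point] + 1
                queue.append(other)
    return distance


# Hypothetical chain of four points: A - B - C - D
ties = [("A", "B"), ("B", "C"), ("C", "D")]
print(geodesic_distances(ties, "A"))
```

A layout that respects these distances would place D farther from A than B is, which a circle drawn by trial and error does not guarantee.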

It begins by examining how to enter, manipulate, and prepare social network data using Microsoft’s Access and Excel programs (Chapter 1). It then demonstrates how to perform initial network analysis in Ucinet (Borgatti and Everett 1997),[1] a network analysis software program. With the data prepared, it looks at how to visually represent one-mode (Chapter 2) and two-mode (Chapter 3) networks using two visualization packages, Mage and Pajek.

Mage was developed as a device to be used in molecular modeling (Richardson and Richardson 1992). It produces elegant three-dimensional illustrations that appear as interactive computer displays. Researchers can rotate Mage images, turn parts of the displays on or off, use the mouse to select and identify various points of the network, and animate changes between different arrangements of objects.[2] Appendix A provides guidance for editing Mage files (kinemages) in order to take advantage of these features.

Pajek, which is Slovenian for “Spider,” is a network analysis and graph drawing program that has specifically been designed to handle extremely large data sets. It is still in its development stage and can be downloaded for noncommercial use free of charge from the Pajek web site.[3] An advantage of Pajek is that its developers are continually updating it, including more and more features that social network analysts use to explore social networks.[4]

After exploring how to visualize simple one- and two-mode social networks, the manual then turns to more complex visualization issues. Chapter 4 explores how to visualize social networks over time, while Chapter 5 (forthcoming) looks at various block-modeling techniques available in Ucinet and Pajek.

Note: Version .42 of the manual corrects typographical errors and incorrect references to various figures throughout the manual. It also includes an updated glossary.


1. Gathering and Preparing Social Network Data

We can gather and prepare social network data in a variety of ways. Here we use Microsoft Access 97 and Excel 97 in order to demonstrate how to gather and prepare the data of one- and two-mode networks.

1.1 Gathering and preparing one-mode social network data

One-mode networks consist of a single set of actors. They differ from two-mode networks in that two-mode networks consist of two sets of actors or one set of actors and one set of events. Actors can be people, groups, organizations, corporations, nation-states, etc. The connections (i.e., relations) between such actors can be friendship or kinship ties, material transactions such as business transactions, the import or export of goods, communication networks involving the sending or receiving of messages, etc.

An example of a one-mode network, one that we will use throughout this manual, is Padgett’s Florentine Families Network (Breiger and Pattison 1986; Padgett and Ansell 1993). Padgett and Ansell collected data on the marriage and business ties (i.e., relations) between 16 prominent Florentine families in 15th century Florence. Both sets of ties were nondirectional and dichotomous. A marital tie was determined to exist if a member of one family married a member of another family while a business tie was determined to exist if a member of one family granted credits, made a loan, or entered into a joint partnership with a member of another family (Wasserman and Faust 1994). For our purposes here we will use the marital tie data.

1.1.1 Gathering and manipulating one-mode social network data

Because of the interchangeability of Microsoft programs we can use either Access or Excel to enter social network data. Excel includes an “autocomplete” feature that compares the text you are typing into a cell with text already entered into the same column. If the same word has been used before, it completes the entry for you. This feature improves accuracy (e.g., by spelling the same name the same way each time) and reduces input time, so we recommend that, when possible, you enter social network data initially into Microsoft Excel. You can later import the Excel data into Access. Because we use relatively small networks as examples, it would actually be quicker to enter them directly into Access. We use Excel here, however, in order to demonstrate the steps you will want to take with much larger datasets.

We begin by entering the Padgett data into Excel.[5] To do so we enter the data into two columns. As can be seen in Figure 1.1 the first column lists the 16 families while the second lists the families with which they have marital ties. Obviously, families with more than one marital tie will be listed more than once in the first column. For example, the Albizzi family has marital ties with the Ginori, Guadagni and Medici families, so it appears three times in the first column. If you look down the first column to the Guadagni family, you will note that it lists a marital tie with the Albizzi family. This is as it should be since the marital ties between the families are reciprocal.

In this dataset, the Pucci family has no marital ties with any of the other families. To record this so that we ultimately end up with a square matrix, we first list the Pucci family in column A with a blank cell next to it in column B. Then we list the Pucci family in column B with a blank cell next to it in column A.
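The two-column layout can be checked mechanically before it is imported. The Python sketch below is only an illustration under stated assumptions: it holds an excerpt of the worksheet (the Albizzi ties named above, their reciprocals, and the two Pucci rows), uses None for a blank cell, and the helper names missing_reciprocals and labels are hypothetical.

```python
# Excerpt of the two-column layout as (column A, column B) pairs.
# Only ties named in this guide are included; the full worksheet has more rows.
rows = [
    ("Albizzi", "Ginori"),
    ("Albizzi", "Guadagni"),
    ("Albizzi", "Medici"),
    ("Ginori", "Albizzi"),
    ("Guadagni", "Albizzi"),
    ("Medici", "Albizzi"),
    ("Pucci", None),  # isolate listed in column A with a blank cell in column B
    (None, "Pucci"),  # and again in column B with a blank cell in column A
]


def missing_reciprocals(rows):
    """Return ties (a, b) whose reverse (b, a) was never entered."""
    ties = {(a, b) for a, b in rows if a is not None and b is not None}
    return sorted((a, b) for a, b in ties if (b, a) not in ties)


def labels(rows, column):
    """Return the non-blank names found in one column (0 = column A, 1 = column B)."""
    return {row[column] for row in rows if row[column] is not None}


print(missing_reciprocals(rows))           # an empty list means every tie is reciprocated
print(labels(rows, 0) == labels(rows, 1))  # True means row and column labels will match
```

If the two label sets differ, the eventual crosstabulation will not be square, which is why the isolate has to appear once in each column.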

Figure 1.1: Padgett Data Entered into Microsoft Excel 97 Worksheet

After you finish entering the data, you will, of course, want to save it and exit Excel, so that you can move to the next step of importing it into Access.

1.2 Gathering and preparing two-mode social network data

Two-mode networks differ from one-mode networks in that rather than consisting of a single set of actors, they consist of either two sets of actors or one set of actors and one set of events. Typically, researchers refer to them as affiliation networks, but they have also been referred to as membership networks, dual networks and hypernetworks (Faust 1997; Wasserman and Faust 1994). Affiliation networks are “non-dyadic because the affiliation relation relates each actor to a subset of events, and relates each event to a subset of actors” (Faust 1997:158).

An example of a two-mode network is Davis’s Southern Club Women (Breiger 1974; Davis, Gardner, and Gardner 1941). Davis and his colleagues recorded the observed attendance of 18 Southern women at 14 social events.

1.2.1 Gathering and manipulating two-mode social network data

As we did with the Padgett data, we enter the data into two columns.[6] However, in this case the form of the data differs in that the first column lists the women while the second lists the number of the event that they attended.

Figure 1.2: Southern Women Data Entered Into Microsoft Excel 97 Worksheet

It is important to note that each woman is listed separately for every event she attended. Thus, Laura is listed seven times (with the corresponding event number) because she attended seven different events (1, 2, 3, 5, 6, 7 & 8).
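As a small illustration of this one-row-per-attendance layout, the Python sketch below builds Laura's rows from the event numbers given above and converts them into a single row of a woman-by-event matrix. The function name incidence_row is hypothetical, and only Laura's attendance is included because it is the only attendance record stated in the text.

```python
# Laura's attendance as stated in the text: one (woman, event) row per event.
laura_events = [1, 2, 3, 5, 6, 7, 8]
rows = [("Laura", event) for event in laura_events]

NUM_EVENTS = 14  # the dataset records attendance at 14 social events


def incidence_row(rows, woman, num_events=NUM_EVENTS):
    """Return a 0/1 list with a 1 for each event the woman attended."""
    attended = {event for name, event in rows if name == woman}
    return [1 if event in attended else 0 for event in range(1, num_events + 1)]


print(len(rows))                     # number of worksheet rows that mention Laura
print(incidence_row(rows, "Laura"))  # her row in the women-by-events matrix
```

Repeating the same step for each woman would give the rectangular women-by-events matrix that a two-mode analysis works from.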

After we finish entering the data, we need to save it, so that we can then import it into Access. Because we import, manipulate, export and read two-mode data in the same way we do one-mode data, in what follows we illustrate the process with only one-mode data, but there is no reason why the same techniques cannot be applied to two-mode data.

1.3 Importing social network data into Access 97

The next step in the process is importing this data into Microsoft Access 97. When you first open Access you will see a dialog box that looks like the one in Figure 1.3. Because we are creating a new database, we will choose between the “Blank Database” and “Database Wizard” options. The former, as its name implies, opens up a blank database while the latter initiates a “wizard” that is quite helpful in setting up databases. It provides users with a series of “ready-made” databases that can be readily adapted for other purposes. Our purpose here, however, is not to provide an introduction to Access but simply to show how we can import and manipulate network data using Access. Thus, we will choose the “Blank Database” option. For those who are interested in learning more about Access, we suggest you consult the book, Sams Teach Yourself Access 97 in 21 Days (Eddy, Cassel, Goodling, and Stewart 1998). Once you have created a database, you will instead choose the “Open an Existing Database” option; your database should appear in the list of files just below this option.

Figure 1.3: Access’s Opening Dialog Box

After choosing the “Blank Database” option, you will see a screen that looks similar (but probably not identical) to the one that appears in Figure 1.4.

Figure 1.4: Access’s New Database Dialog Box

Figure 1.5: Database Window for Visualization Database

At this point you will want to give your file a name and then select the “Create” button. (Here we have given it the name “Visualization.”) Selecting this opens a new database window similar to the one shown in Figure 1.5. Under the “File” menu select “Get External Data.” This provides you with two choices: either to “Import” data or to “Link Files.” Select “Import.” This will bring up a dialog box (Figure 1.6) that allows you to first find the Excel spreadsheet you created earlier and then import it. Note that the box provides a number of criteria by which to locate your files. It even provides a “Find” function if you are unsure as to where you saved your Excel file. The important thing here, though, is that in the “Files of Type” box you have selected “Microsoft Excel.”

Figure 1.6: Access’s Import Dialog Box

Click on the “Import” button, and Access will bring up its Import Spreadsheet Wizard (see Figure 1.7). As you can see, this wizard initially asks which Excel worksheet you want to import. Currently, we are only interested in the Padgett data, which in this case is the default that Access has selected.

Click on the “Next” button, which takes you to the next dialog box (see Figure 1.8). It asks whether the first row of the data contains column headings. In this case it does not, so we do not check the box and move on to the following dialog box by clicking on the “Next” button.

Figure 1.7: Access’s Import Spreadsheet Wizard – Worksheet Options

Figure 1.8: Access’s Import Spreadsheet Wizard – Column Heading Options

This next dialog box (Figure 1.9) asks where we want to store the data: in an existing table or in a new one. Here, we select the new table option.

Figure 1.9: Access’s Import Spreadsheet Wizard – Data Storage Options

The next dialog box (Figure 1.10) provides users with the opportunity to assign names to fields. Here, we assign Field 1 the name “Family” and Field 2 the name “Marital Tie.”

Figure 1.10: Access’s Import Spreadsheet Wizard – Field Options

The next dialog box (Figure 1.11) asks whether you want Access to add the table’s primary key. In this case, we will say yes, although whether you do will largely depend on the data being imported and whether it already contains a field you wish to designate as the primary key. For more information on primary keys see Eddy et al. (1998). The final dialog box (not shown) asks you to assign a name to the table you are creating. In this case we use the name “Padgett.”
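For readers who want to see what these wizard choices amount to, the Python sketch below builds a comparable table with the standard sqlite3 module. SQLite is used purely as a stand-in for Access, so this is an analogy rather than what the wizard does internally: the ID column plays the role of the automatically added primary key, the field names follow the ones assigned above, and the three rows are an excerpt of the worksheet.

```python
import sqlite3

# Excerpt of the imported worksheet: (Family, Marital Tie) pairs named in the text.
rows = [("Albizzi", "Ginori"), ("Albizzi", "Guadagni"), ("Albizzi", "Medici")]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Padgett ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, "  # stand-in for the added primary key
    "Family TEXT, "
    '"Marital Tie" TEXT)'
)
conn.executemany('INSERT INTO Padgett (Family, "Marital Tie") VALUES (?, ?)', rows)

records = conn.execute(
    'SELECT ID, Family, "Marital Tie" FROM Padgett ORDER BY ID'
).fetchall()
for record in records:
    print(record)  # each row now carries a unique ID alongside the two named fields
conn.close()
```

The unique ID matters later because the crosstabulation counts ID values to decide whether a tie exists between two families.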

Figure 1.11: Access’s Import Spreadsheet Wizard – Primary Key Options

Once the import process is complete Access will return to the standard database window displayed in Figure 1.5 except now it will contain a new table. Clicking on the “Open” button opens a table similar to the one displayed in Figure 1.12.

Figure 1.12: Opened Padgett Table in Access

1.4 Creating social network matrices in Access 97

The next step in the process is to create a crosstabulation of the Padgett data such that we can export it as a matrix to Excel and ultimately to Ucinet. At the database window (see Figure 1.5) select the “Queries” tab. Click on the “New” query button, and this will bring up a dialog box similar to the one displayed in Figure 1.13. Select the “Crosstab Query Wizard” option and click “OK.” This will bring up the Crosstab Query Wizard, which guides us through the process of creating a crosstabulation.

Figure 1.13: Access’s Query Dialog Box

The wizard first asks (see Figure 1.14) which tables or queries will be used to create the crosstab. Since Access is a relational database, it allows us to use multiple tables in creating our queries. What is extremely helpful is that if we change the table(s) on which a crosstab (or other query) is based after creating it, Access automatically updates the crosstab.

Figure 1.14: Access’s Crosstab Query Wizard

In this case we only have one table to select (Padgett) so we highlight it and click on the “Next” button. The wizard then asks (Figure 1.15) which field’s values we want as the row heading. Here we select “Family,” move it (using the arrow button) from the “Available Fields” to the “Selected Fields” box and then click on the “Next” button.

Figure 1.15: Access Crosstab Query Wizard – Row Heading Options

Next, the wizard (Figure 1.16) asks which field’s values we want as the column heading. Here we select “Marital Tie” and again click on the “Next” button.

Finally, Access asks what number we want calculated for each column and row intersection (Figure 1.17). Access provides a number of options. In this instance we select “ID” in the field box and “count” in the function box. Access also asks whether we want to summarize each row. This can be a helpful statistic, so select this box as well.

Figure 1.16: Access Crosstab Query Wizard – Column Heading Options

Figure 1.17: Access Crosstab Query Wizard – Calculation Options

The final dialog box (not shown) asks what we wish to name the crosstab (it does provide a default name). Type in a name and click on the “Finish” button. This will open a crosstab similar to the one that appears in Figure 1.18.

Figure 1.18: Access 97 Crosstabulation Query of Padgett Data

Notice that the names of the families appear both down the left side (rows) and across the top (columns) as you would find in a typical matrix. The query includes a “Total of ID” column that tabulates (in this case) the number of marital ties that each family has with other families. It also includes a “<>” column that indicates, at least in this case, families that have no ties, as is the case with the Pucci family. The blank row indicates that none of the families has a marital tie with the Pucci family. A quick comparison of this data with Wasserman and Faust (1994:744) indicates that we have indeed imported and manipulated the data correctly.
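The logic of the crosstabulation can be sketched in a few lines of Python. This is a minimal illustration, not the query Access generates: the records are an excerpt built from ties named earlier in this chapter, None stands for a blank cell, and the function name crosstab is hypothetical. It counts ID values for each pairing of row heading (Family) and column heading (Marital Tie) and keeps a total per row.

```python
from collections import Counter

# (ID, Family, Marital Tie) records; None marks a blank cell.
# An excerpt built from ties named in the text, not the full Padgett table.
records = [
    (1, "Albizzi", "Ginori"),
    (2, "Albizzi", "Guadagni"),
    (3, "Albizzi", "Medici"),
    (4, "Ginori", "Albizzi"),
    (5, "Guadagni", "Albizzi"),
    (6, "Medici", "Albizzi"),
    (7, "Pucci", None),
    (8, None, "Pucci"),
]


def crosstab(records):
    """Count IDs for each (row heading, column heading) pair, plus a total per row."""
    cells = Counter((family, tie) for _id, family, tie in records)
    totals = Counter(family for _id, family, _tie in records)
    return cells, totals


cells, totals = crosstab(records)
print(totals["Albizzi"])             # records in the Albizzi row
print(cells[("Albizzi", "Medici")])  # count at the Albizzi/Medici intersection
print(cells[("Pucci", None)])        # Pucci's only entry sits under the blank column heading
print(cells[(None, "Pucci")])        # the blank row holds the only entry in the Pucci column
```

Note that in this sketch the row total counts records rather than ties, so the Pucci row's total reflects its placeholder record; the two placeholder rows are what produce the blank row and the blank-heading column.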

1.5 Preparing data for Ucinet

The next step in the process is to prepare the data for analysis in Ucinet. To do this we first export the data from Access to Excel, and then copy the data from Excel into Ucinet. With the query open that you want to export to Excel, click on the “Tools” menu, select “Office Links,” and click on “Analyze It with MS Excel.” This opens the Excel program and exports the data into Excel (Figure 1.19) in a format that looks almost identical to the Access crosstabulation.

First, delete the second row (blank) and the second (Total of ID) and third (<>) columns since these will not be part of our final matrix.[7] Next, open Ucinet. Along the top of the screen you will find four buttons. The second opens the “Ucinet Spreadsheet.” In principle, the Ucinet spreadsheet should allow us to import Excel data directly into Ucinet. Unfortunately, it does not always work properly. If it does not, simply copy and paste the data from Excel to Ucinet. Once pasted, the data should look something like what you see in Figure 1.20.
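The clean-up step can also be expressed as a short Python sketch, which is useful for confirming that what remains is a square, symmetric matrix before it is pasted into Ucinet. The table below is a hypothetical three-family excerpt of an exported crosstab, the labels "Total" and "(blank)" are placeholders for the two columns being deleted, and filling empty cells with 0 is an assumption made for this illustration.

```python
# Hypothetical excerpt of an exported crosstab: a header row, then one row per
# family. None marks an empty cell. Column 2 is the row total and column 3 is
# the blank-heading column; their labels here are placeholders.
table = [
    ["Family", "Total", "(blank)", "Albizzi", "Ginori", "Pucci"],
    [None, 1, None, None, None, 1],  # blank row created by the Pucci placeholder
    ["Albizzi", 1, None, None, 1, None],
    ["Ginori", 1, None, 1, None, None],
    ["Pucci", 1, 1, None, None, None],
]


def to_square_matrix(table):
    """Drop the second row and the second and third columns, then fill empty cells with 0."""
    kept_rows = [row for i, row in enumerate(table) if i != 1]
    trimmed = [[cell for j, cell in enumerate(row) if j not in (1, 2)] for row in kept_rows]
    header, body = trimmed[0], trimmed[1:]
    column_labels = header[1:]
    row_labels = [row[0] for row in body]
    matrix = [[cell or 0 for cell in row[1:]] for row in body]  # assumption: empty means 0
    return column_labels, row_labels, matrix


column_labels, row_labels, matrix = to_square_matrix(table)
size = len(matrix)
is_square = row_labels == column_labels and all(len(row) == size for row in matrix)
is_symmetric = all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))
print(column_labels)
print(matrix)
print(is_square, is_symmetric)
```

A False result for either check would suggest that a tie was entered in only one direction or that an isolate was listed in only one column of the original worksheet.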