Not by the Book: Facebook as a Sampling Frame

Christine Brickman-Bhutta

October 27, 2009

Social networking sites and online questionnaires make it possible to do survey research faster, cheaper, and with less assistance than ever before. The methods are especially well-suited for snowball sampling of elusive subpopulations. This note describes my experience surveying thousands of Catholics via Facebook in less than a month, at little expense, and without hired help. Although the respondents were disproportionately female, young, educated, and religiously active, their responses preserved key correlations found in standard surveys conducted by Gallup and the GSS. I relate my methods to existing web-based methods and offer concrete suggestions for future work.

Keywords: social networking websites, Facebook, Internet sampling, snowball sampling, chain-referral sampling, coverage error, inference

Online social networking sites (SNS’s) offer new ways for researchers to run surveys quickly, cheaply, and single-handedly – especially when seeking to construct “snowball” samples of small or stigmatized subsets of the general population. Facebook is currently the SNS best suited for survey research, thanks to its size (currently exceeding 300 million users worldwide), intensive use, and continuing growth. Each Facebook user is directly linked to his or her personal “friends,” while also having access to membership in one or more of the 35 million Facebook groups that link millions of other users throughout the world. Facebook groups are virtual communities linking people with some shared interest, attribute, or cause. Researchers can readily sample populations of interest by working through existing groups or creating new ones.

Although researchers and journalists devote much attention to social networking, I have yet to locate any work that exploits SNS’s as a tool of research. Existing SNS research focuses on questions related specifically to the phenomenon of online social networking: what functions do SNS’s serve for those who use them, and what benefits do users derive (Joinson 2008)? Is the accumulation of social capital one of those benefits (Ellison, Steinfield, and Lampe 2007)? Do SNS users behave differently or look different from non-users (Acquisti and Gross 2006; Ellison et al. 2007)? What privacy concerns does the rise of SNS’s raise (Jones and Soltren 2005; Dwyer, Hiltz, and Passerini 2007), and to what extent do these concerns influence online behavior (Acquisti and Gross 2006)? Can online social interactions predict tie strength (Gilbert and Karahalios 2007)?

My work shifts the emphasis from research about SNS’s to research through SNS’s. Within five days of releasing a 12-minute online survey to a Facebook group of potential volunteers, I harvested 2,788 completed questionnaires. Within a month, the total number of respondents increased to about 4,000. Total monetary costs averaged less than one cent per survey – vastly less than the cost of surveys obtained through mail, phone, or even email.[1] Moreover, the responses became available for review the moment they were entered. Hence, if the survey turned out to contain any substantial errors or omissions, I could repair the damage within minutes. By working through social networks, I was also able to reach a population that is difficult to reach through conventional survey methods. Although the respondents by no means constituted a random sample of the relevant (Catholic) population, their responses preserved many of the statistical relationships obtained by traditional means. These and other advantages described below make Facebook a useful tool both for researchers with limited means and for rapid pre-testing of surveys destined for dissemination via traditional methods.

The paper proceeds as follows. Section one reviews the relevant literature on chain-referral sampling and electronic survey methods, highlighting strengths and limitations of both of these methods. Section two follows with a description of the Facebook features that make it an effective tool for snowball sampling. Section three discusses attempts to recruit study volunteers, and section four details the results of those efforts. Sections five and six address the nature of the bias in the data, and section seven concludes the paper with suggestions for others interested in replicating this method.

1. Related Research

Chain-referral sampling first emerged in response to the neglect of social structure and interpersonal relationships in survey research methods. As Coleman (1958) notes, most early analyses overlooked the role of relationships, “never including (except by accident) two persons who were friends” (28). Snowball sampling is a chain-referral technique that accumulates data through existing social structures. The researcher begins with a small sample from the target subpopulation and then extends the sample by asking those individuals to recommend others for the study. Chain-referral techniques have the added benefit of providing relatively easy access to “hidden” subpopulations that are almost impossible to sample by standard (phone, mail, or door-to-door) methods, due to their small size or distrust of outsiders. Examples include studies of prostitutes (Faugier 1996), the homeless (Anderson and Calhoun 1992), AIDS victims (Martin and Dean 1990), drug users (Biernacki and Waldorf 1981; Griffiths et al. 2006), and religious “cults” (Lewis 1986).
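The referral process just described can be sketched as a breadth-first traversal of a referral network. The following is only a minimal illustration under stated assumptions: the function name `snowball_sample`, the `max_waves` cutoff, and the toy network of names are all hypothetical and are not drawn from any study cited here.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_waves):
    """Collect a sample by following referrals outward from the seeds.

    referrals: dict mapping each person to the people he or she recommends
               (a hypothetical toy network, not data from any study).
    seeds:     the initial small sample from the target subpopulation.
    max_waves: how many rounds of referrals to follow.
    Returns a dict mapping each sampled person to the wave that reached them.
    """
    wave_of = {person: 0 for person in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if wave_of[person] == max_waves:
            continue  # the final wave is not asked for further referrals
        for referred in referrals.get(person, []):
            if referred not in wave_of:  # each person is sampled only once
                wave_of[referred] = wave_of[person] + 1
                queue.append(referred)
    return wave_of


# Hypothetical referral network; nobody names "frank".
toy_referrals = {
    "ana": ["ben", "cruz"],
    "ben": ["dee"],
    "cruz": ["dee", "eli"],
    "dee": [],
    "eli": ["gus"],
    "frank": [],
}

sample = snowball_sample(toy_referrals, seeds=["ana"], max_waves=2)
print(sample)
```

In this toy network the sample reaches five people in two waves, while "frank", whom nobody names, is never reached regardless of how many waves are followed.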

Sample bias is the principal downside of the chain-referral approach. On the one hand, study volunteers may try to protect their friends by not referring them, a problem known as “masking.” On the other hand, “referrals occur through network links, so subjects with larger personal networks will be oversampled, and relative isolates will be excluded” (Heckathorn 1996). Thus Faugier’s (1996) study of prostitutes undersampled women who were new to the business or who had been ostracized by their peers. Participants may also recruit inappropriate volunteers, especially if they misinterpret the study’s design or purpose (Biernacki and Waldorf 1981). And response rates are difficult to define, much less estimate, when participation spreads through forwarded surveys and undocumented invitations.

Despite these limitations, no one disputes the value of chain-referral methods for studies of elusive subpopulations and exploratory work (Penrod et al. 2003; Faugier 1996). Moreover, new techniques can help to overcome some of the problems discussed above.[2]

Facebook and other social network sites allow us to carry chain-referral methods into the age of the Internet, while also exploiting the strengths of online questionnaires. A single scholar can complete projects that previously required large teams. Costs of printing, postage, and data entry virtually disappear. Feedback is instantaneous. Turnaround times shrink from weeks to days. It becomes much easier to reach remote, diffuse, and alienated subpopulations. (For recent work on the costs and benefits of web-based research, see Bachmann et al. 1996; Berge and Collins 1996; Goree and Marszalek 1995; Kiesler and Sproull 1986; Parker 1992; Schmidt 1997; Sproull 1986; Weible and Wallace 1998; Roselle and Neufeld 1998; Coomber 1997; and Evans and Mathur 1995).

SNS sampling shares most of the limitations associated with other forms of web-based research. We cannot reach those who lack the requisite computer skills and equipment. Nor are we likely to reach many people with serious concerns about Internet privacy. The layout and readability of surveys can vary across hardware and software. Electronic surveys can easily reach unintended recipients and are more readily taken multiple times. And response rates tend to be lower than those associated with phone, mail, and interviews. (For more on these difficulties, see Berge and Collins 1996; Kiesler and Sproull 1986; Parker 1992; Sproull 1986; Best and Krueger 2002; O’Lear 1996; Sell 1997; Evans and Mathur 1995; Kittleson 1995; Greene, Speizer, and Wiitala 2008; McDonald and Adam 2003; Converse et al. 2008; Cole 2005; Swoboda et al. 1997; Griffis, Goldsby, and Cooper 2003; Smith and Leigh 1997; Goree and Marszalek 1995).

Bearing in mind all these considerations, let us turn to a specific SNS-based project.

2. Facebook as a Sampling Frame

Facebook is currently the world’s largest and fastest-growing SNS. Each user creates a personal profile (basically a personal webpage) with information about his or her interests, hobbies, education, occupation, contact information, and the like. Most users also post a personal profile picture. Users can invite people to become their Facebook “friends,” thereby creating networks for public postings and private messages. Users can also create specific Facebook “groups” based on shared interests, workplaces, regions, schools, or anything else. Each group or network has its own Facebook page where members can post messages or chat, and many users list the networks and groups that they belong to on their profile pages.

Table 1. Proportion of the U.S. Population (a) with Facebook Profiles
Age / Oct-08 / Feb-09 / Jun-09 / Sep-09
15-19 / 57.8% / 62.9% / 65.1% / 81.3%
20-24 / 63.7% / 73.2% / 72.1% / 87.4%
25-29 / 32.6% / 47.1% / 56.7% / 73.7%
30-34 / 19.9% / 36.6% / 47.7% / 62.3%
35-39 / 12.2% / 27.4% / 39.3% / 52.4%
40-44 / 5.7% / 15.7% / 27.8% / 38.6%
45-49 / 3.4% / 9.3% / 19.4% / 30.9%
>49 / 1.2% / 3.4% / 9.1% / 16.3%
All / 17.0% / 24.4% / 31.5% / 42.8%

(a) Based on 2007 U.S. Census population estimates

Starting with one or more groups or networks, researchers can create snowball samples by gathering respondents via links to additional friends, groups, and networks. To illustrate the potential of this simple approach, consider the result achieved by one enterprising Facebook user who created a group called “Six Degrees of Separation: The Experiment.” In order to maximize the number of group members, he invited all his friends to join and encouraged all of them to do likewise ad infinitum. The group currently numbers more than five million!

As Facebook has grown, it has also become increasingly representative of the U.S. population. Though created in 2004 for college students alone, Facebook soon launched a high school version and in September 2006 provided free memberships to anyone. Table 1 documents the spread of Facebook from October 2008 through September 2009. Over those twelve months, the proportion of Americans with Facebook profiles increased from 17.0 percent to 42.8 percent, and the greatest growth (in percentage points) occurred among adults aged 25 to 44. Significantly, whereas just over one percent of Americans aged 50 and older had Facebook profiles in October 2008, a year later this figure had increased to 16.3 percent. No longer is Facebook solely a tool for young adults.
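The growth comparison can be checked directly from the first and last columns of Table 1. The short sketch below simply subtracts the October 2008 figure from the September 2009 figure for each age group; the variable names are illustrative.

```python
# Percent of each age group with a Facebook profile, taken from Table 1
# as (October 2008, September 2009).
table1 = {
    "15-19": (57.8, 81.3),
    "20-24": (63.7, 87.4),
    "25-29": (32.6, 73.7),
    "30-34": (19.9, 62.3),
    "35-39": (12.2, 52.4),
    "40-44": (5.7, 38.6),
    "45-49": (3.4, 30.9),
    ">49": (1.2, 16.3),
}

# Growth in percentage points, rounded to one decimal to match the table.
growth = {age: round(sep09 - oct08, 1) for age, (oct08, sep09) in table1.items()}

# Rank age groups from largest to smallest gain.
ranked = sorted(growth, key=growth.get, reverse=True)
for age in ranked:
    print(f"{age:>5}: +{growth[age]:.1f} points")
```

The four largest gains, each above 30 percentage points, fall in the 25-29, 30-34, 35-39, and 40-44 brackets.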

3. Recruiting Survey Participants

As noted above, snowball sampling is especially effective when targeting hard-to-reach populations. This was the case in the present investigation, which included inactive- and ex-Catholics who are by definition underrepresented in pews and parish rosters.

I began my search in December 2008 by creating a new group named “Please Help Me Find Baptized Catholics!” The group’s description explained the purpose of the group, outlined eligibility requirements, and provided instructions on how to get involved. Though I wished only to survey baptized Roman Catholics, the text of my group page invited any viewer to join the group and likewise encouraged them to forward invitations to all their Facebook friends and groups. This strategy was designed both to maximize sample size and to avoid the biases associated with sampling down social chains composed entirely of Catholics. After about a month, all group members would receive the link to the survey, which explored Catholic identity.

I then sought to identify the administrators of Facebook groups who could help me recruit members for my study group. Because there are literally tens of thousands of Catholic groups on Facebook, I excluded groups with large proportions of foreign members and groups with narrow membership criteria, such as those created for specific Facebook networks, college alumni groups, or ethnic groups. This process yielded fifty Catholic groups which I then tried to further categorize by level of religious participation and identification. From group names, descriptions, and postings, I concluded that inactive- or ex-Catholics predominated in twenty-five of the groups; active Catholics predominated in seventeen groups; and the remaining eight groups were of mixed or uncertain composition.

I then set out to contact the administrators of all 50 Facebook groups, soliciting their help in recruiting volunteers for the study. Each administrator received a personal message that explained the purpose of the research and asked them to send a message to the members of their groups with an invitation to join the research group. I also encouraged them to contact me with any questions or concerns.

Over the course of three days, I sent personal messages to 43 of the 50 administrators of the Catholic groups. On the first day of solicitations, I contacted nine administrators, and six responded positively. As a result, my group initially grew quite quickly. As time went on, however, my requests for help yielded fewer responses. Of the 25 administrators contacted on the second day, 18 failed to follow up, and no one responded to messages sent on the third day. I suspect that this rapid decline in responses was a consequence of my own initial success. As my group grew, the administrators of other groups perhaps concluded I no longer needed their help. In any case, in light of the rapid growth of my own group and the rapid decline in responses from other group administrators, I decided not to contact the last seven of fifty administrators.

Table 2 summarizes attempts to recruit study volunteers. Although I asked administrators to send messages to their members, some elected to post the information to their group’s pages instead, and they did so for one of two reasons: either they considered messaging their members obtrusive, or the size of their groups inhibited their ability to send mass messages.[3] Because viewing the posting required users to visit the group page, and because Facebook users often join groups that they rarely if ever return to, this approach left users less likely to learn about the study. Nevertheless, I preferred some assistance to no assistance. Moreover, one person who posted the link administers a group with over 30,000 members; even if the rate of awareness in his group was low, the potential for recruitment from his group in absolute numbers was substantial.

Table 2. Administrator Response by Group Classification and Type of Assistance Provided
ADMINISTRATOR SUMMARY
Number contacted: 42 (a)
Number who helped: 15
Response Rate: 35.7%
RESPONSE BY TYPE
Sent a Message to All Group Members
Group Type / # Members / % Members / # Groups
Former/Inactive / 2,329 / 50.5% / 5
Mixed / 743 / 16.1% / 3
Active / 1,542 / 33.4% / 2
Total / 4,614 / 100.0% / 10
Posted a Link to the Group
Group Type / # Members / % Members / # Groups
Former/Inactive / 976 / 3.0% / 3
Mixed / 0 / 0.0% / 0
Active / 31,873 / 97.0% / 2
Total / 32,849 / 100.0% / 5
(a) 43 administrators were contacted, but one group was misclassified: it appeared to be composed of Catholics but was not.
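The summary figures in Table 2 follow from simple arithmetic on the reported counts. The sketch below reproduces the administrator response rate and the membership shares for groups whose administrators sent a message; the variable names are illustrative.

```python
# Administrator counts reported in Table 2 and its footnote.
contacted = 43          # administrators who received a personal message
misclassified = 1       # group that turned out not to be Catholic
helped = 15

eligible = contacted - misclassified
response_rate = helped / eligible
print(f"Response rate: {response_rate:.1%}")

# Membership of groups whose administrators messaged all members.
messaged = {"Former/Inactive": 2329, "Mixed": 743, "Active": 1542}
total_messaged = sum(messaged.values())
shares = {kind: round(100 * n / total_messaged, 1) for kind, n in messaged.items()}
print(total_messaged, shares)
```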

After working through existing Catholic groups, I also contacted over 200 of my own Facebook friends, seeking their help in recruiting volunteers. Though I failed to collect systematic information on the results of this process, the feedback that I received suggests that many of them passed along my group link to other people, and a few also posted information about the study on their personal Facebook pages.

Facebook is designed in such a way that group administrators lose the capacity to send mass messages to their membership if the group size exceeds 5,000. Because I planned to use mass mailings to direct people to my online survey, I closed my initial group when membership approached 4,500 and opened a second group for additional volunteers. After 2,500 people joined the second group,[4] I closed it and opened a third and final group. Altogether, nearly 7,500 people joined one of the three groups over the course of one month and received the link to the survey.

4. Response Rate

Although I cannot know how many people ultimately received the survey link, the web-based survey software (QuestionPro) collects basic statistics that indicate the response rate of those who start the survey. The survey was started 4,709 times, yielding 4,016 completed surveys for a completion rate of 85.3 percent. Only 18 responses came from ineligible participants, leaving a total of 3,998 usable surveys. On average, respondents completed the survey in 12 minutes, and the high completion rate suggests this was a reasonable request of one’s time.
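As a quick check on these figures, the completion rate and the count of usable surveys can be recomputed from the reported totals. The variable names below are illustrative.

```python
started = 4709      # times the survey was started
completed = 4016    # completed surveys
ineligible = 18     # completed surveys from ineligible participants

completion_rate = completed / started
usable = completed - ineligible

print(f"Completion rate: {completion_rate:.1%}")  # Completion rate: 85.3%
print(f"Usable surveys: {usable}")                # Usable surveys: 3998
```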

Speed of response represents a significant advantage of web-based surveys,[5] one that this research captures quite well. Figure 1 plots the number of completed surveys within the first month after the survey’s release. Although I kept the survey link active for 100 days, the vast majority of completed surveys arrived within days.[6] Members of groups two and three received the survey link at time zero; members of group one received the survey link on day three. Just five hours after releasing the survey, I tallied 426 completed surveys, a little more than 10 percent of the total number of completed surveys. Within five days, the number grew to 2,788, or 70 percent of the total. Within ten days, the figure increased to 80 percent. Participation continued to taper off, and 90 percent of the data arrived by the end of the first month.

5. Sample Characteristics

To assess the representativeness of the Facebook sample, I compared respondents aged 18 and older to adult respondents of the 2008 GSS who identified Catholicism as the religion of their upbringing. Compared to the general population of U.S. adult Catholics and ex-Catholics, as represented by the GSS, the Facebook sample is younger on average (30.7 years versus 46.7 years), more female (69.9 percent versus 54.6 percent), and less likely to identify as Latino/Hispanic (5.6 percent versus 29.1 percent). The Facebook sample is also much better educated and more religious. Two-thirds of Facebook respondents have at least a bachelor’s degree (versus roughly one-fourth of GSS respondents), and 66.8 percent claim to attend Mass at least once per week (versus 29.2 percent in the GSS). Table 3 summarizes these comparisons.

Table 3. Sample Characteristics: GSS 2008 (a) vs. Facebook (a)
Characteristic / Facebook (N=3,767) / GSS (N=672)
Demographics
Mean Age / 30.7 / 46.7
% Female / 69.9 / 54.6
% Latino / 5.6 / 29.1
% College Grad / 66.8 / 26.6
Mass Attendance (%)
Never or seldom / 13.3 / 41.7
Sometimes / 19.9 / 29.2
Once/week / 43.1 / 23.8
>Once/week / 23.7 / 5.4
Total / 100.0 / 100.1
(a) Adults (age 18+) raised Catholic and converts to Catholicism
Column totals may exceed 100 percent due to rounding
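The share attending Mass at least once per week is obtained by collapsing the top two attendance categories of Table 3. The sketch below does this for both samples and also recomputes the column totals; the dictionary layout and variable names are illustrative.

```python
# Mass attendance percentages from Table 3.
attendance = {
    "Facebook": {"Never or seldom": 13.3, "Sometimes": 19.9,
                 "Once/week": 43.1, ">Once/week": 23.7},
    "GSS": {"Never or seldom": 41.7, "Sometimes": 29.2,
            "Once/week": 23.8, ">Once/week": 5.4},
}

# Weekly-or-more attendance: sum of the top two categories.
weekly_or_more = {
    source: round(rows["Once/week"] + rows[">Once/week"], 1)
    for source, rows in attendance.items()
}

# Column totals, which can differ from 100 because of rounding.
totals = {source: round(sum(rows.values()), 1) for source, rows in attendance.items()}

print(weekly_or_more)
print(totals)
```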

Religious activity was especially striking among the Facebook males – 27.9 percent report attending Mass more than once per week, versus 21.9 percent for females (Table 4). This result directly contradicts not only the GSS data, but also a huge and varied body of research demonstrating greater religiosity among women than men for all aspects of religious behavior, all regions of the world, and all known eras (Miller and Stark 2002; Beit-Hallahmi 1997).