Cliff Lampe – Position Paper

TMSP Workshop February 11-12, 2010

Socio-Technical Infrastructures for Research

Cliff Lampe

College of Communication Arts and Sciences

Michigan State University

A great deal of the work in the study of socio-technical systems has had to conform to the available opportunities for data collection. In looking at how online communities work, we have either studied them “in the wild” using a variety of techniques, studied principles of interaction in lab settings, or created online communities that could be manipulated for scientific purposes.

Systems in the wild. The study of socio-technical systems has long depended on “real world” systems, where interaction occurs naturally, as sites for understanding online interaction. Email, BBSs, Usenet, and MUDs were all early forms of online community studied from a variety of perspectives (Curtis, 1992; Rheingold, 2000; Sproull & Kiesler, 1991; Whittaker, Terveen, Hill, & Cherny, 1998). In the early days of networked computing, these were open systems where it was relatively easy to access user interactions, and most information was public and globally available. It is interesting to reflect on the fact that there are few (if any) studies of interactions that occurred in the closed gardens of the time, like CompuServe and AOL. AOL in particular hosted a monumental amount of user interaction, most of which has since been lost.

More recently, new crops of “in the wild” sites have been studied. Interaction on Wikipedia has led to numerous papers examining a variety of social phenomena occurring on that system (Kittur, Suh, Pendleton, & Chi, 2007; Priedhorsky, et al., 2007; Viegas, Wattenberg, & Dave, 2004). Other researchers have obtained data through collaboration with a corporate partner (Lampe, Ellison, & Steinfield, 2007; Lampe, Johnston, & Resnick, 2007), or have worked within a corporation where they had access to large-scale systems (Burke, Marlow, & Lento, 2009; Leskovec & Horvitz, 2008; Marlow, Naaman, boyd, & Davis, 2006).

An example of how beneficial access to “closed garden” data can be for researchers is the Notre Dame collaboration to release SourceForge data. SourceForge is one of the largest repositories of open source software projects on the Internet, and in the early 2000s it made data available to academic researchers, an effort partially supported by an NSF grant[1]. Access to this data has led to hundreds, if not thousands, of research papers on the social and technical practices of open source software development. Another example of a corporation making its user data available is the Netflix Prize dataset, which Netflix released to researchers as part of its recommender-system competition. This dataset has been central to refinements in collaborative filtering techniques, and has been used for other types of research beyond its machine learning implications.
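For readers unfamiliar with collaborative filtering, the sketch below illustrates the basic idea behind the techniques the Netflix data was used to refine: predicting an unseen rating from similarities between items. The tiny ratings dictionary and movie names are hypothetical; this is a minimal illustration of item-based filtering, not a reconstruction of any method actually developed on that dataset.

```python
# Minimal item-based collaborative filtering sketch on Netflix-style ratings.
# The example data below is hypothetical; the real dataset held ~100M ratings.
from math import sqrt

# ratings[user][item] = rating on a 1-5 scale (hypothetical example data)
ratings = {
    "u1": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
    "u2": {"Movie A": 4, "Movie B": 2, "Movie C": 5},
    "u3": {"Movie A": 1, "Movie B": 5},
}

def item_similarity(item_x, item_y):
    """Cosine similarity between two items, over users who rated both."""
    common = [u for u in ratings if item_x in ratings[u] and item_y in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_x] * ratings[u][item_y] for u in common)
    norm_x = sqrt(sum(ratings[u][item_x] ** 2 for u in common))
    norm_y = sqrt(sum(ratings[u][item_y] ** 2 for u in common))
    return dot / (norm_x * norm_y)

def predict(user, item):
    """Predict a user's rating for an unseen item as a similarity-weighted
    average of that user's ratings for the items they have rated."""
    sims = [(item_similarity(item, other), r) for other, r in ratings[user].items()]
    weight = sum(s for s, _ in sims)
    return sum(s * r for s, r in sims) / weight if weight else None

print(predict("u3", "Movie C"))  # predicted rating for a movie u3 has not rated
```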

Issues in working with “in the wild” data include:

  • Negotiating access. This can include contacting decision makers in organizations, reconciling university and corporate requirements (in particular, IRB approval), and managing disclosure agreements.
  • Sharing data. Access agreements often prohibit sharing the data, which limits replicability.
  • Control of interface elements. Even when private or public data sources are available, the administrators of those sites rarely allow researchers to manipulate site features.

Interactions in the lab. Another, less common route to understanding socio-technical systems is to run controlled lab experiments and then map those experiments to core principles of online communities. In some cases, this takes the form of addressing a specific known problem in online communities, such as eliciting responses (Hsieh, Kraut, Hudson, & Weber, 2008); in other cases, the system is used more as a platform for addressing basic theoretical questions (Tong, Van Der Heide, Langwell, & Walther, 2008).

Sandboxes for research. In rare cases, researchers or teams create a new socio-technical system solely for the purpose of scientific research. With control not just over variables, but over the tools within the technology that shape interaction, these sandboxes can be incredibly valuable for examining causal relationships between technical and social systems.

Often these sites are created in the context of corporate research labs. One recent example is the Beehive project at IBM, which tested principles of social network sites within the organization as well as more general principles of online communities (Chen, Geyer, Dugan, Muller, & Guy, 2009; DiMicco & Millen, 2007).

An example of a research sandbox created in academia is the MovieLens project at the University of Minnesota. MovieLens is a movie recommender that has been used to test basic principles of recommender systems (Cosley, Lam, Albert, Konstan, & Riedl, 2003). By controlling the user interface, this group was able to empirically test both design questions and how social science theory could be mapped to online interaction (Ling, et al., 2005).

Creating sandboxes raises several issues:

  • Cost of the design and implementation of the system.
  • Populating the site with a critical mass of users can be difficult.
  • Ongoing management and promotion of the site can be costly.
  • Sites are often created for a primary purpose where failure must be avoided, which limits the types of research that can be conducted on them.

Position. I propose that the NSF devote money and attention to addressing the issue of data access for researchers of socio-technical systems. Just as large infrastructures must be put in place for other sciences (e.g. earthquake shake tables, particle colliders), so this discipline must have appropriate tools in order to advance the state of the art in the field. Continued butterfly collecting will yield insights, but it will always be hemmed in by the natural limits on control over variables that individual site studies face, or by the lack of external validity of many lab experiments.

This type of investment would be nontrivial, and is comparable to the infrastructure investments made in other sciences. Initial design and development, hardware maintenance, content creation and regeneration, moderation, and more would make this hard for any single group to do, and consequently it is a strong candidate for support.

Burke, M., Marlow, C., & Lento, T. (2009). Feed me: Motivating newcomer contribution in social network sites. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), Boston, MA.

Chen, J., Geyer, W., Dugan, C., Muller, M., & Guy, I. (2009). Make new friends, but keep the old: Recommending people on social networking sites. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), Boston, MA.

Cosley, D., Lam, S. K., Albert, I., Konstan, J. A., & Riedl, J. (2003). Is Seeing Believing? How Recommender Interfaces Affect Users’ Opinions. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), Ft. Lauderdale, FL.

Curtis, P. (1992). Mudding: Social phenomena in text-based virtual realities. Paper presented at the Conference on Directions and Implications of Advanced Computing, Berkeley, CA.

DiMicco, J. M., & Millen, D. R. (2007). Identity management: Multiple presentations of self in Facebook. Paper presented at the 2007 International ACM Conference on Supporting Group Work (GROUP), Sanibel Island, FL.

Hsieh, G., Kraut, R., Hudson, S. E., & Weber, R. (2008). Can markets help? Applying market mechanisms to improve synchronous communication. Paper presented at the ACM 2008 Conference on Computer Supported Cooperative Work (CSCW), San Diego, CA.

Kittur, A., Suh, B., Pendleton, B. A., & Chi, E. H. (2007). He Says, She Says: Conflict and Coordination in Wikipedia. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), San Jose, CA.

Lampe, C., Ellison, N., & Steinfield, C. (2007). Profile Elements as Signals in an Online Social Network. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), San Jose, CA.

Lampe, C., Johnston, E., & Resnick, P. (2007). Follow the Reader: Filtering Comments on Slashdot. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), San Jose, CA.

Leskovec, J., & Horvitz, E. (2008). Planetary-scale views on a large instant-messaging network. Paper presented at the 17th International Conference on World Wide Web (WWW), Beijing, China.

Ling, K., Beenen, G., Ludford, P., Wang, X., Chang, K., Li, X., et al. (2005). Using social psychology to motivate contributions to online communities. Journal of Computer-Mediated Communication, 10(4).

Marlow, C., Naaman, M., boyd, d., & Davis, M. (2006). HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead. Paper presented at the Conference on Hypertext and Hypermedia, Odense, Denmark.

Priedhorsky, R., Chen, J., Lam, S. K., Panciera, K., Terveen, L., & Riedl, J. (2007). Creating, destroying, and restoring value in Wikipedia. Paper presented at the 2007 International ACM Conference on Supporting Group Work (GROUP), Sanibel Island, FL.

Rheingold, H. (2000). The Virtual Community: Homesteading on the Electronic Frontier (2nd ed.). Cambridge, MA: MIT Press.

Sproull, L., & Kiesler, S. (1991). Connections: New ways of working in the networked organization. Cambridge, MA: MIT Press.

Tong, S. T., Van Der Heide, B., Langwell, L., & Walther, J. B. (2008). Too much of a good thing? The relationship between number of friends and interpersonal impressions on Facebook. Journal of Computer-Mediated Communication, 13(3), 531-549.

Viegas, F. B., Wattenberg, M., & Dave, K. (2004). Studying cooperation and conflict between authors with history flow visualizations. Paper presented at the ACM Conference on Human Factors in Computing Systems (CHI), Vienna, Austria.

Whittaker, S., Terveen, L., Hill, W., & Cherny, L. (1998). The Dynamics of Mass Interaction. Paper presented at the ACM Conference on Computer-Supported Cooperative Work (CSCW), Seattle, WA.

[1]