Bartal & Ravid

PREDICTING LINKS INSOCIAL NETWORKS: THE ISRAELI LAW SYSTEM and NETFLIX

A.Bartal and G.Ravid

Department of IndustrialEngineering & Management

Ben-GurionUniversity of the Negev

P.O.Box 653, Beer-Sheva84105

ABSTRACT

A substantial foundation of the legal system is assigning judges randomly to a case, which makes it nearly impossible, to predict this pairing.A two mode social network, comprised of judges and lawyers, was build. We focused on the group of lawyers which appeared in front of a Supreme Court judge and in front of 255 different judges. One judge was randomly chosen, trying to predict the lawyers that will appear in front of him in the following years.Our results shows that some measures achieved significant performance in predicting new judge- lawyer interactions of 89.51%.The second experiment tried to predict viewers that will watch a certain movie in the future. The experiment was based upon Netflix data base, as a two mode network. We focused on the group of viewers who saw a certain movie. One movie was randomly chosen, tring to predict which of the viewers will see it.The model was able to achieve 95.27% of accuracy.

  1. INTRODUCTION and THEORETICAL BACKGROUND

The challenge of predicting changes in a social network is called the link prediction problem [8]. Liben- Nowell and Kleinberg, [6] explain it as: Given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to the network during the interval from time t to a given future time t+1.

Social networks are highly dynamic objects which grow and change rapidly over time. The mechanisms by which they evolve is a fundamental question that is still not well understood, thus it forms the motivation for this work. For example, two people who are not connected and are close in a friendships network will have friends in common.The goal of this research is to find which measures of the network can lead to the most accurate link predictions. According to Wasserman and Faust, [12] "a social network consists of a finite set or sets of actors and the relation or relations defined on them”. Social networks are typically affiliation networks in which participants collaborate in groups and links are established by common group membership [11]. Examples of a relationship between people include friendship, email exchange etc. [7].Potgieter el al., [3] indicate that SNA is a research area aimed at understanding social complexity by representing and analyzing social networks using mathematical graphs.A graph G is a structure consisting of a set of vertices V and a set of links E. Social networks can represent many different types of collaboration links, such as marriages, classmates, friendships, the Hollywood-actor collaboration graph [10], [4] and the academic collaboration graph [1], where two people (i.e. vertices) who collaborated together are connected by an edge (i.e. link).

  1. DATA and EXPERIMENTALSETUP

Two experiments were conducted in order to discover new links by classification of forming links.The first experiment used a data set that contains verdicts, sentences in trial cases and the judges and lawyers which were involved in those cases from website.The second experiment used the Netflix data set. Netflix is an on-line movie subscription rental service which allows people to rent movies for a monthly fee.The data contains the movie and person's ID that saw the movie, plus a rank of the movie which describes the movie's quality by that person. We tried to predict who will see a certain movie in the upcoming period given a particular group of people.The data bases are mapped into two-mode networks. Wasserman and Faust [12] discuss networks affiliation as two-mode data. Two-mode social networks are defined for two sets of social actors (vertices) and contain measurements of a relation from the actors in one set to actors in the other set. In our experiment actors are considered as judges or lawyers, in the first experiment or as movies and viewers at the second experiment.To focus on the class of actors, we transformed the two-mode graph into a one-mode network. In the bipartite framework an affiliation tie is added to the network if an actor has participated in the given context.In a two-mode network, N = (U, V, R, w), one set of social units is denoted by and the second set of units is denoted by,, The social relation RU×V is defined as one between the units in these two sets and is represented by the set of arcs or edges with initial vertices in the set U and terminal vertices in the set V. The mapping w: R → R is a weight, (Doreian, Batagelj and Ferligoj, 2004). We changed the two-mode networks into two one-mode networks and analyzed them.

  1. CLASSIFICATION ALGORITHM

A graph G is a structure consisting of vertices V and links (edges) E and can be indicated as G= (V, E). A link eE represents a connection between two vertices from the set V.The problem of link prediction is trying to find quantitative classification rules that classify dyads into two classes, C1 and C2 where C1 includes vertices which will not form new links and C2 contains vertices which will form a new link, based only on the values of features (tables 2). Each pair of vertices either represents a positive example (a link will form) or a negative example (a link will not form).We build a two mode social network which contains 418 vertices comprised of judges, lawyers and 1051 edges at the year 2005. Judges and lawyers are the vertices and there is a link between a lawyer and a judge, only if they were involved in the same case. The two mode network was converted into one mode network containing 163 vertices (only lawyers were considered) and has 4559 edges. If two lawyers had engaged the same judge –then they have a common link. We focused on the group of prosecution lawyers which appeared in front of the Supreme Court judge: Aharon Barak at 2005 which was chosen because of his position. This group contains 163 lawyers which appeared in front of 255 judges at the year 2005.One of the judges waschosen randomly from the network trying to predict the lawyers that will stand before him at the following years (2006-2008).

The second experiment was based upon the Netflix data base and contained a two mode network with 263 vertices comprised of movies and people who saw a movie and 518 edges between people and movies at the year 2004. A link between a movie and a viewer exist if that viewer saw that specific movie. The two mode network was converted into one mode network containing 199 vertices– only viewers were considered and has 3489 edges. A link between two viewers indicates they saw the same movie. We focused on the group of people who saw a certain movie in the time interval 1/1/04 to 31/12/04. This group contains 199 people and 64 movies and then randomly chose one movie that was included in the network and tried to predict which of the viewers that haven't seen this movie will order it at the following years (2005-2008).A list of features and definitions is detailed in table 2.

Table 2- metric definitions
Description / Definition / Name
Weight is defined as the number of papers two authors have co authored together. / Weight(vi,vj) / Link's Weight(vi,vj)
The number of Vertices linked to both examined Vertices (i.e. mutual friends). / / Common neighbors [14]
Number of common neighbors of the examined Vertices divided by the number of Vertices that are neighbors of either examined Vertex. / / Jaccard's coefficient [14]
The number of features shared by the Vertices, divided by the log of the frequency of the features. This metric rates rarer features more heavily. / / Adamic\ Adar similarity[7]
The product of the number of edges incident to the two Vertices. / / Preferential attachment [2]
Clustering Coefficients considering only 1-neighborhood / / Cc1 [9]
Clustering Coefficients considering 2-neighborhood / / Cc2 [9]
The number of links from vertex vi to any other vertex / / Degree
The length of the shortest path between two vertices / Min{} / Shortest path
How close the vertex is to all other vertices. Ranges from 0 (far away) to / / Closeness [13]
Sum of all shortest paths between all vertices that contain vi as a percentage of all shortest paths between all vertices. is the number of shortest paths from v1 to v2 that pass through a vertex vi / / Betweenness [13]
  1. RESULTS

A DT model was created based on 70% of Psakdin'snetwork data which was used as training set, and cross validation for 10 times. The DT created has 42 levels and 89.51% prediction accuracy over the testing set.The most effective features areCC2, and Pcore.In order to verify the results we have created a network of judges& lawyers, similar to the network of Aharon Barak, only this time the network is focused on the randomlychosenjudge Zahava Agi at the year 2005. We tried to predict new links in a random network by composing a two mode social network with129 vertices comprised of judges and lawyers and 156 edges at the year 2005. The two mode network was converted into one mode network containing 34 vertices (only lawyers were considered) and has 61 edges.Those lawyers appeared in front of 95 different judges (besides judgeAgi) at the year 2005. We randomly chose one of the judges and tried to predict which of the lawyers will appear before him at the following years (2006-2008). 261 new links were created between lawyers in the one mode network during the prediction period.A DT model was created based on 70% of Psakdin's network data which was used as training set, using cross validation for 10 times. The DT created using those parameters has 5 levels and 83.23% prediction accuracy over the testing set. The most effective features are SumInput, Betweenness andCC2.In the second part of the judges& lawyers experiment we examine the hypothesis that judges are appointed randomly to cases, in contradiction to the finding discussed above. We took the initial one-mode network of judges and lawyers at the year 2005 with 163 lawyers and randomly created the same number of links that were created at the testing period. The next stage is predicting the random generated links. If the model will succeed in the prediction- it will be reasonable to assume that the judges and lawyers network is a random. The model contains 10 times cross validation, boosting 10 times, and was build based on 70% of the data as training and 30% of the data as testing. The DT created using those parameters has 8 levels and 50.56% prediction accuracy over the testing set.

The second experiment used the Netflix Data base. A DT model was created based on 70% of the data which was used as training set, using the Boosting option for 10 times, and cross validation. The DT created using those parameters has 33 levels and 95.27% prediction accuracy over the testing set. The most effective features areCC1, MaxInput and Pcore.

  1. CONCLUSIONS

The experiments discovered that prediction based on graph metrics could aid link prediction by finding new links between entities. Although all metrics are significantly very different no individual metric alone is useful for classifying the two classes. The results of the judges & lawyers network prediction support the assumption that judges and lawyers are not being appointed randomly to cases. A reasonable explanation for this finding is the fact that every judge isspecializing in a certainfiled at the law system. In the same manner lawyers has specialization in a certainfiled at the law system, thus in many cases specific lawyers and judges will meet at court several times.The contribution of this work emphasizes the ability of predicting links in a graph using graph theory and uncovering hidden relationships.

  1. REFERENCES
  1. A.L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek, “Evolution of the social network of scientific collaboration”, Physica A 311, 2002, 590-614
  2. A.L. Barabasi, and R. Albert, “Emergence of scaling in random networks”, Science, AAAS, 286, 1999, 509-512
  3. Potgieter, K. A. April, R. J. E. Cooke, and I. O. Osunmakinde, "Temporality in Link Prediction: Understanding Social Complexity", Respectively Department of Computer Science, University of Cape Town, Republic of South Africa, 2007
  4. Aleman-Meza, M. Nagarajan, C.Ramakrishnan, A. Sheth, I. Arpinar, L. Ding, P. Kolari, A. Joshi, and T. Finin, “Semantic analytics on social networks: Experiences in addressing the problem of conflict of interest detection”. Proceedings of the 15th international conference on World Wide Web, ACM New York, 2006, 407-416
  5. Liben-Nowell, "An Algorithmic Approach to Social Networks", PhD thesis at MIT Computer Science and Artificial Intelligence Laboratory, 2005
  6. D. Liben-Nowell, and J. Kleinberg, “The link prediction problem for social networks”, Proceedings of the twelfth international conference on information and knowledge management, 2003, 556-559
  7. L.A. Adamic., and E. Adar, “Friends and Neighbors on the Web”, Social Networks, Elsevier, 25(3), 2003, 211-230
  8. L. Getoor, and C. Diehl, “Link Mining: A Survey”, ACM SIGKDD Explorations Newsletter, 7(2),2005, 3-12
  9. M. E. J. Newman,"The structure of scientific collaboration networks", Proceedings of the National Academy of Science,USA, 2001, 404-409
  10. M. E. J. Newman, "Assortative Mixing in Networks", Physical Review Letters, 89(5), APS, Santa Fe, New Mexico, 2002, 1120
  11. M. E. J. Newman, "The structure and function of complex networks", SIAM Review 45,Ann Arbor, Department of Physics and Center for the Study of Complex Systems, University of Michigan, 2003, 167-256
  12. S. Wasserman, and K. Faust, "Social Network Analysis: Methods and Applications", Structural Analysis in the Social Sciences, 8, Cambridge University Press, Cambridge, England, 1994
  13. V.V. Raghavan, and S. K. M. A. Wongcritical, "A critical analysis of vector space model for information retrieval", Journal of the American Society for Information Science, 37(5),Computer Science Department, University of Regina, Regina, Saskatchewan, Canada , 1986, 279-287
  14. Z. Huang, X. Li, and H. Chen, “Link Prediction Approach to Collaborative Filtering”, Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, IEEE, 2005, 7-11
  1. ACKNOWLEDGMENT

The partial support for this research by the Ira Center for Business, Technology & Society, Ben-Gurion University, is gratefully acknowledged.