Analysis of the US Congress Twitter Network Using NodeXL
Data Set: - The US Congress Twitter network data set was fetched directly through the Twitter APIs. NodeXL does support fetching any user's network; however, it is less efficient at fetching a very large Twitter network. Sample code was therefore written in C# to fetch the networks of the Congressmen on Twitter. Tweet Congress was used to identify the Congressmen active on twitter.com. Figure 1 shows the number of Congressmen on Twitter from each party.
Figure 1: Number of Congressmen on Twitter by party (D = Democrat, R = Republican, I = Independent)
The sample code generated tab-delimited data, which was easy to copy and paste into the NodeXL template. The US Congress Twitter data contain 181 nodes and 5001 edges with only one relationship type, Following. Whereas NodeXL fetches two relationship types, only one was kept here for simplicity and ease of understanding.
Here is a simple example to illustrate the relationship conversion. Suppose A and B follow each other, and NodeXL is used to fetch the Twitter networks of A and B. We get four edges:
1. A – B (Relationship Type = Following)
2. A – B (Relationship Type = Followed)
3. B – A (Relationship Type = Following)
4. B – A (Relationship Type = Followed)
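The edge list above contains two duplicate edges: edge 4 (B is followed by A) describes the same relationship as edge 1, and edge 2 (A is followed by B) describes the same relationship as edge 3. The sample code simply merges the duplicate edges and produces unique edges with only the Following relationship type: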
1. A – B (Relationship Type = Following)
2. B – A (Relationship Type = Following)
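The original C# sample code is not included in this report. As a rough illustration only, the Python sketch below performs the same conversion on the example above. It assumes each fetched edge is a (vertex1, vertex2, relationship) tuple in which "Following" means vertex1 follows vertex2 and "Followed" means vertex1 is followed by vertex2; the function names are hypothetical.

```python
def to_following_edges(edges):
    """Convert Following/Followed edges into unique (follower, followee) pairs.

    Assumption: "Following" means v1 follows v2; "Followed" means v1 is
    followed by v2 (so v2 follows v1).
    """
    unique = set()
    for v1, v2, relationship in edges:
        if relationship == "Following":
            unique.add((v1, v2))
        elif relationship == "Followed":
            unique.add((v2, v1))  # reverse the direction: v2 follows v1
        else:
            raise ValueError(f"unknown relationship type: {relationship}")
    return sorted(unique)


def to_tab_delimited(pairs):
    """Render (follower, followee) pairs as tab-delimited lines."""
    return "\n".join(f"{a}\t{b}\tFollowing" for a, b in pairs)


# The four edges from the example above.
raw = [
    ("A", "B", "Following"),
    ("A", "B", "Followed"),
    ("B", "A", "Following"),
    ("B", "A", "Followed"),
]
pairs = to_following_edges(raw)
print(pairs)  # [('A', 'B'), ('B', 'A')]
print(to_tab_delimited(pairs))
```

Storing each relationship as a (follower, followee) pair in a set is what removes the duplicates: both representations of the same relationship map to the same pair.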
Here are a few headlines found during the analysis and exploration of the US Congress Twitter network.
Headline 1: Republicans are very active and strongly connected on Twitter.
Figure 2: Red, blue, and yellow spheres indicate Republican, Democratic, and Independent leaders, respectively.
Description: - The data were loaded into NodeXL, and the “Fruchterman-Reingold” layout, a force-directed layout, was selected to generate the network graph. The AutoFill option was not used to set the node colors, which is why no legend is available. To reduce the clutter caused by the edges, the edge opacity was lowered to 5%.
It can easily be seen from the network graph above that Republicans are much better connected than Democrats. The high number of connected Republicans reflects their activity on Twitter and their understanding of the importance of social media in the contemporary world.
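The connectivity judgment above is made visually. As a hedged sketch that was not part of the original analysis, within-party connectivity could also be quantified as directed density: the number of following edges among a party's members divided by n(n - 1), the number of possible directed edges among its n members. The function name and the data below are hypothetical; the toy data are not the Congress data.

```python
def party_density(edges, party):
    """Directed density of following edges inside each party.

    edges: iterable of (follower, followee) pairs.
    party: dict mapping each leader to a party code such as "D", "R" or "I".
    Returns {party_code: within_party_edges / (n * (n - 1))}.
    """
    members = {}
    for leader, code in party.items():
        members.setdefault(code, set()).add(leader)

    within = {code: 0 for code in members}
    for follower, followee in set(edges):  # set() drops duplicate edges
        if (follower in party and followee in party
                and follower != followee
                and party[follower] == party[followee]):
            within[party[follower]] += 1

    density = {}
    for code, group in members.items():
        n = len(group)
        possible = n * (n - 1)  # ordered pairs of distinct members
        density[code] = within[code] / possible if possible else 0.0
    return density


# Toy data (hypothetical): three Republicans who all follow each other,
# three Democrats with a single following edge, and one cross-party edge.
party = {"r1": "R", "r2": "R", "r3": "R", "d1": "D", "d2": "D", "d3": "D"}
edges = [("r1", "r2"), ("r2", "r1"), ("r1", "r3"), ("r3", "r1"),
         ("r2", "r3"), ("r3", "r2"), ("d1", "d2"), ("r1", "d1")]
print(party_density(edges, party))  # R: 1.0, D: about 0.167
```

Cross-party edges such as r1 to d1 are ignored here, since the measure only looks at how densely each party follows its own members.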
Headline 2: Mike Quigley (Democrat) is the only leader in the US Congress Twitter network who is not followed by any other leader of Congress.
Figure 3: US Congress Twitter network with the Grid layout
Description: - NodeXL provides a very nice feature that displays an image associated with each node. To get a quick overview of who is in the Congress Twitter network, the Grid layout was used. Users can easily select the image of a person they are interested in and get a quick overview of that leader's network. The in-degree of each node was calculated, and a dynamic filter was applied on in-degree to display only the nodes with an in-degree of 0.
Mike Quigley is the only leader in the US Congress Twitter network who is not followed by any other Congress leader. This may be due to his lack of popularity in the US Congress, or to the absence from Twitter of the Congressmen with whom he works closely.
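The in-degree filter described above can be mimicked outside NodeXL. The sketch below assumes the edges are (follower, followee) pairs like the merged edge list in the Data Set section, so a vertex's in-degree is the number of distinct leaders who follow it, and keeping in-degree 0 returns the leaders nobody follows. The vertex names and helper functions are hypothetical.

```python
from collections import Counter


def in_degrees(vertices, edges):
    """Number of distinct followers of each vertex within the network."""
    counts = Counter(followee for _, followee in set(edges))
    return {v: counts[v] for v in vertices}  # Counter returns 0 for missing keys


def not_followed(vertices, edges):
    """Vertices whose in-degree is 0, like a dynamic filter set to 0..0."""
    return sorted(v for v, d in in_degrees(vertices, edges).items() if d == 0)


# Toy network (hypothetical): A and B follow each other, C follows A.
vertices = ["A", "B", "C"]
edges = [("A", "B"), ("B", "A"), ("C", "A")]
print(in_degrees(vertices, edges))    # {'A': 2, 'B': 1, 'C': 0}
print(not_followed(vertices, edges))  # ['C']
```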
Headline 3: The indifference of Pete Hoekstra (Republican) in the US Congress Twitter network.
Figure 4: Congressmen who are not following anyone
Description: - To find which leaders do not follow anyone in the US Congress Twitter network, the out-degree was calculated and a dynamic filter was applied on out-degree to display the nodes with an out-degree of 0. The “Fruchterman-Reingold” layout was used with the same coloring scheme and edge opacity as in Headline 1.
The filtered graph above (Figure 4) shows more Democratic leaders than Republicans, which is consistent with the lower connectivity of Democrats in the US Congress network. To explore whether these leaders show the same behavior in the overall Twitter network, a dynamic filter was applied on the “following” column and its minimum value was gradually increased. Surprisingly, once the minimum exceeded 65, all records were filtered out except one: Pete Hoekstra.
Republican leader Pete Hoekstra (the selected node in the graph in Figure 5) appears to be an outlier in this list. He is very active and well connected in the overall Twitter network (Followers: 8561, Following/Friends: 955, Tweets: 478); however, within the US Congress Twitter network he does not follow any Congress leader. This may be due to a lack of interest in following other Congress leaders.
Figure 5: Network after applying filters on Out-degree and following columns
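The two filters used in this headline can be combined in a short sketch. It assumes (follower, followee) edges within the Congress network plus a hypothetical following_count dictionary holding each leader's overall Twitter "following" value; an out-degree of 0 means the leader follows no one inside the network. The function names and toy values are invented for illustration.

```python
from collections import Counter


def out_degrees(vertices, edges):
    """Number of network members each vertex follows."""
    counts = Counter(follower for follower, _ in set(edges))
    return {v: counts[v] for v in vertices}


def silent_but_active(vertices, edges, following_count, threshold):
    """Vertices with out-degree 0 whose overall Twitter following value
    is strictly greater than `threshold` (the report's cut-off was 65)."""
    out = out_degrees(vertices, edges)
    return sorted(
        v for v in vertices
        if out[v] == 0 and following_count.get(v, 0) > threshold
    )


# Toy data (hypothetical): C and D follow nobody in the network,
# but only C follows many accounts on Twitter overall.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "A")]
following_count = {"A": 300, "B": 40, "C": 500, "D": 20}
print(silent_but_active(vertices, edges, following_count, 65))  # ['C']
```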
Headline 4: Eric Cantor (Republican) and John Boehner (Republican) are the most followed leaders in the US Congress Twitter network.
Figure 6: Graph showing the most followed leaders in the US Congress network.
Description: - The in-degree of each node was calculated, and the same size, opacity, and color settings as in Headline 3 were applied. A dynamic filter was applied on in-degree to display only the nodes with a very high in-degree. Two nodes stand out, with in-degrees of 75 and 78.
The graph above (Figure 6) shows that John Boehner (upper node) and Eric Cantor (lower node) are the most followed leaders in the US Congress, followed by 76 and 78 leaders, respectively. To learn more about their followers, a new graph was created using the Circle layout. Figures 7 and 8 show the networks of Eric Cantor and John Boehner, respectively. The graphs show that John Boehner is not followed by any Democrats, while Eric Cantor has some Democratic followers. This suggests that Eric Cantor is also popular among Democrats.
Figure 7: Network of Eric Cantor
Figure 8: Network of John Boehner
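A similar sketch ranks leaders by in-degree and breaks a leader's followers down by party, which is what Figures 7 and 8 show visually. It assumes (follower, followee) edges and a hypothetical party dictionary mapping each leader to D, R, or I; the function names are hypothetical and the data are a toy example, not the Congress network.

```python
from collections import Counter


def most_followed(edges, k):
    """The k vertices with the highest in-degree, as (vertex, in_degree) pairs."""
    counts = Counter(followee for _, followee in set(edges))
    return counts.most_common(k)


def followers_by_party(edges, target, party):
    """Count the followers of `target` in each party."""
    return Counter(party[f] for f, t in set(edges) if t == target and f in party)


# Toy data (hypothetical): X is followed by two Republicans and one Democrat,
# Y only by the two Republicans.
party = {"r1": "R", "r2": "R", "d1": "D", "X": "R", "Y": "R"}
edges = [("r1", "X"), ("r2", "X"), ("d1", "X"), ("r1", "Y"), ("r2", "Y")]
print(most_followed(edges, 2))                # [('X', 3), ('Y', 2)]
print(followers_by_party(edges, "X", party))  # Counter({'R': 2, 'D': 1})
print(followers_by_party(edges, "Y", party))  # Counter({'R': 2})
```

A count of zero Democratic followers (as for Y here) corresponds to the pattern seen for John Boehner in Figure 8.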
NodeXL Critique:-
Firstly, I would like to thank the NodeXL team for creating this fantastic social network analysis tool. Multiple layout algorithms, various vertex metrics, dynamic filtering, and ease of use make the tool very suitable for analyzing social networks. Moreover, the learning curve is gentle, as it comes as an Excel template.
However, there are a few points that could be addressed to make the tool better. The main concern is its performance on larger data sets. Even with the US Congress Twitter network, which has 181 nodes and about 5,000 edges, response time started to increase. The graph window could not be widened beyond a certain limit, which restricts the user's ability to focus entirely on the network graph. The graph pane can be detached from Excel, but after that the user finds it difficult to make changes to the workbook data.
The values of the workbook columns used by the AutoFill option are not cleared when the “Reset All” button is pressed, so users have to clear them manually. Legends are not generated if the AutoFill feature is not used, and there is no way to add a legend to the graph manually.
Moreover, NodeXL sometimes crashes during network analysis. If NodeXL is left minimized for a long period of time, it sometimes cannot be maximized or restored.
Finally, the features and capabilities provided by NodeXL are really fantastic. NodeXL is becoming increasingly popular, as it is being used in many classrooms to teach how networks work. I wish the NodeXL team the best of luck and congratulate them on building a very strong and promising social network analysis tool.
References:
1. http://www.twitter.com
2. http://www.tweetcongress.org
3. http://www.wikipedia.org
4. http://nodexl.codeplex.com/
Name: Puneet Sharma
Department: Computer Science
Email:
Date: 4th November 2009
