School of Computer and Information Science

CIS Research Placement Report

Agile Visualisation Using Roassal in Smalltalk.

Marc Seyfang

Date: 08/11/16

Supervisor: Dr. Georg Grossmann

Abstract

The first aspect of this project was to research what agile visualisation is and how it can be useful. The second was to learn how to code in the Smalltalk programming language in the VisualWorks development environment, specifically how to use the Roassal visualisation engine to create data visualisations that aid data analysis. The final part of the project was to construct a number of simple visualisations to help decide which more complex visualisations to work on.

Contents

1 Introduction

2 Visualisation

3 Smalltalk and Roassal

4 Implementation

4.1 Circular Tree Map

5 Conclusion

6 Bibliography

1 Introduction

The aim of this project was to create visualisations of data using the agile visualisation method, programming in Smalltalk with VisualWorks. The first step in achieving this was to research what agile visualisation is and how it can be useful. The next part of the project was to learn how to code in the Smalltalk programming language in the VisualWorks development environment, specifically how to use the Roassal visualisation engine to create data visualisations that aid data analysis. The final part of the project was to construct a number of simple visualisations to help decide which more complex visualisations to work on. The first visualisation examined is the weighting of nodes by the number of connections they have to other nodes. The second is a circular tree map that represents the data as a hierarchy. Together these visualisations provide an insight into the benefits of visualising data.

2 Visualisation

Data is a collection of facts, which can include measurements, observations and descriptions. Raw data is data that has not been processed, such as a list of the eye colour of everyone in an area; such a list could contain thousands of entries and can be somewhat meaningless until it is processed (Steele 14 Feb. 2012). Data visualisation is the presentation of data such as this in a graphical or pictorial format; common data visualisations include graphs, charts, trees and maps. Once the data is processed into one of these visualisations it can have greater meaning. With the example data above, a pie chart could show the percentage of people with each eye colour; this gives the viewer useful information that they would not easily get from the raw data.
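The eye colour example can be made concrete with a short sketch. The code below is written in Python purely for illustration (it is not part of the project's Smalltalk code), and the list of observations is invented. It computes the percentage share of each colour, which is the information a pie chart of this data would display.

```python
from collections import Counter

# Invented raw data: one eye-colour observation per person.
raw_data = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]


def colour_percentages(observations):
    """Return the percentage share of each distinct value in the list."""
    counts = Counter(observations)
    total = len(observations)
    return {colour: 100 * count / total for colour, count in counts.items()}


percentages = colour_percentages(raw_data)
```

Each slice of the pie chart would then be sized in proportion to the corresponding entry of percentages.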

This process of extracting useful information from raw data is called data analysis. During a data analysis the useful information can sometimes be difficult to extract from the data, and this is where agile visualisation can be useful. Agile visualisation is the process of creating many data visualisations in a short time period; the quicker a visualisation can be created, the more can be produced, which helps data analysts arrive at the useful information more quickly (Bergel 6 September 2016).

3 Smalltalk and Roassal

There are many programming languages to consider when choosing one for a project. Smalltalk is an object-oriented programming language that is well suited to rapid, iterative development (Leon 3 April 2007). The Roassal visualisation engine is also available in Smalltalk and provides many methods for agile visualisation. For these reasons VisualWorks, a Smalltalk development environment, was chosen as the program used to create the data visualisations.

Roassal creates a visualisation from the following components: views, elements, shapes, edges and interactions. A view is a container for the graphical elements, or nodes, of the visualisation; each element is a representation of an object that contains information such as a number or a string. Elements can be added to and removed from the view. The graphical representation of an element can be changed using the Roassal shape component, which includes circles, boxes and labels. Edges can be created to connect elements to one another and so represent a relationship between the two elements. Finally, the user is able to interact with the visualisation and reposition the elements within the view; hovering over an element displays the name of the corresponding object. All of these Roassal features can be used for the benefit of data analysis to create many different visualisations in as short a time as possible.

4 Implementation

For this project, some data was provided to try out some visualisations. The data was a CSV file containing three columns: one each for subject, predicate and object nodes. Each piece of data in these columns is a string representing a URL. Each row contains one subject, one predicate and one object node, and these nodes can be connected to form a relationship; this relationship can then be extracted into an ordered collection called edgeAssociations. While learning how to program in Smalltalk, possible ways to visualise this data were considered; figure 1 below shows some of the starting layouts of the data.
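The extraction of the relationships from the CSV rows can be sketched as follows. This is an illustrative Python sketch rather than the Smalltalk code used in the project; the sample rows and URLs are invented, the file is assumed to have no header row, and pairing the subject with the object of each row is an assumption about how the associations were formed.

```python
import csv
import io

# Invented rows standing in for the provided CSV file (subject, predicate, object).
SAMPLE_CSV = """\
http://example.org/a,http://example.org/linksTo,http://example.org/b
http://example.org/a,http://example.org/linksTo,http://example.org/c
http://example.org/b,http://example.org/linksTo,http://example.org/c
"""


def read_edge_associations(text):
    """Return an ordered list of (subject, object) pairs, one per CSV row."""
    associations = []
    for subject, _predicate, obj in csv.reader(io.StringIO(text)):
        associations.append((subject, obj))
    return associations


edge_associations = read_edge_associations(SAMPLE_CSV)
```

The resulting list keeps the row order of the file, mirroring the ordered collection described above.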

Figure 1: (left) Circular layout, (center) force based layout and (right) rectangle pack layout

The above visualisations use the view RTMondrian; Mondrian is a code library designed to build expressive and flexible visualisations. The circular layout evenly distributes the elements around a circle, which is useful for seeing the connections between the nodes. In a force based layout, elements repel one another much like electric charges, while a rectangle pack layout packs the elements together as tightly as possible.
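The idea behind the circular layout can be illustrated with a small calculation. The Python sketch below is not Roassal's implementation; the function name and the default radius are chosen only for this example. It places a given number of elements at equal angular steps around a circle centred on the origin.

```python
import math


def circular_layout(count, radius=100.0):
    """Return (x, y) positions for `count` elements spaced evenly on a circle."""
    step = 2 * math.pi / count if count else 0.0
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step))
        for i in range(count)
    ]


positions = circular_layout(4)
```

Because every element sits on the circumference, edges between elements cross the interior of the circle, which is why this layout makes the connections easy to see.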

After learning more about Smalltalk and visualisation, a specific visualisation was chosen to work on: increasing the size of the nodes based on how many connections they have. First, a Roassal feature called the normalizer was found that can adjust the size of the nodes based upon a variable or method of the underlying object. However, this did not solve the problem, as the objects in the given data are just strings, meaning that only a few variables and methods are available for use, such as #size, which returns the length of the string. This can be seen in figure 2 below.
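The effect of sizing nodes by a property of the object can be sketched as a simple linear scaling. The Python code below is only an illustration of the idea and is not how the Roassal normalizer is implemented; the function name, the size range and the sample URLs are assumptions made for this example.

```python
def normalize_sizes(items, metric, min_size=5.0, max_size=30.0):
    """Map metric(item) linearly onto the range [min_size, max_size]."""
    values = [metric(item) for item in items]
    low, high = min(values), max(values)
    span = high - low
    sizes = []
    for value in values:
        # If every item has the same metric value, fall back to the minimum size.
        fraction = (value - low) / span if span else 0.0
        sizes.append(min_size + fraction * (max_size - min_size))
    return sizes


# Invented URLs; with plain strings the only readily available metric is length.
urls = ["http://example.org/a", "http://example.org/a/longer/path"]
sizes = normalize_sizes(urls, len)
```

With plain strings the node size reflects only the length of the URL, which says nothing about how connected the node is.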

Figure 2: The data adjusted by the size of the string.

To solve this problem, a new class was created called AVNode, which stands for Agile Visualisation Node. The AVNode class contains a variable, URL, which holds the string from the data, and a sorted collection called edges, which stores a list of the nodes that connect to it. Finally, the AVNode class was given a method called countEdges, which returns the number of nodes stored in the AVNode’s sorted collection, edges. An AVNode was created for each piece of data and placed in an ordered collection called allNodes. The allNodes collection was then looped over and each node was compared to the list of edgeAssociations; when a match was found, the corresponding nodes were added to the edges collection of the AVNode. This allowed allNodes to be used as the nodes of the RTMondrian and the normalizer to use the method #countEdges, as can be seen in figure 3 below.
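The structure of the AVNode class and the loop that fills its edges can be sketched as below. This is a Python illustration under stated assumptions, not the Smalltalk class itself: a plain list stands in for the sorted collection, each entry in edges is the URL at the other end of the association, and the sample URLs are invented.

```python
class AVNode:
    """Illustrative stand-in for the Agile Visualisation Node."""

    def __init__(self, url):
        self.url = url    # the string taken from the data
        self.edges = []   # URLs of the nodes connected to this one

    def count_edges(self):
        """Return the number of connections stored for this node."""
        return len(self.edges)


def build_nodes(urls, edge_associations):
    """Create one AVNode per URL and record every association that touches it."""
    all_nodes = [AVNode(url) for url in urls]
    for node in all_nodes:
        for source, target in edge_associations:
            if source == node.url:
                node.edges.append(target)
            elif target == node.url:
                node.edges.append(source)
    return all_nodes


# Invented data: node "a" is connected to three other nodes.
urls = ["a", "b", "c", "d"]
edge_associations = [("a", "b"), ("a", "c"), ("a", "d")]
all_nodes = build_nodes(urls, edge_associations)
counts = {node.url: node.count_edges() for node in all_nodes}
```

A size normalisation can then be driven by count_edges rather than by the length of the string, so that well connected nodes are drawn larger.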

Figure 3: Data weighted based on the number of edge connections.

As shown in figure 3, the nodes now vary in size; however, the connections are no longer shown. It was found that, because the nodes now represent AVNodes rather than strings, the edgeAssociations collection used to generate the edges no longer worked. This was because the edgeAssociations collection contains associations of strings to strings rather than AVNode to AVNode. A new collection had to be created using AVNodes; this was done by looping over the edgeAssociations collection and the allNodes collection and, for each edge association, adding the nodes that correspond to the association to a new collection of associations called edgeNodes. This new collection was in the correct format of AVNode to AVNode, which allowed edgeNodes to be used to form the connections in the visualisation, as seen in figure 4.
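The conversion from string associations to node associations can be sketched as follows. The Python code is illustrative only; the AVNode class here is reduced to its URL, the sample data is invented, and a dictionary lookup is used in place of the nested loop over both collections described above, which gives the same result.

```python
class AVNode:
    """Minimal stand-in for the AVNode class, holding only the URL."""

    def __init__(self, url):
        self.url = url


def to_edge_nodes(edge_associations, all_nodes):
    """Translate (string, string) associations into (AVNode, AVNode) pairs."""
    by_url = {node.url: node for node in all_nodes}
    return [
        (by_url[source], by_url[target])
        for source, target in edge_associations
        if source in by_url and target in by_url
    ]


# Invented data for the example.
all_nodes = [AVNode("a"), AVNode("b"), AVNode("c")]
edge_associations = [("a", "b"), ("b", "c")]
edge_nodes = to_edge_nodes(edge_associations, all_nodes)
```

Each pair in edge_nodes refers to the same node objects that are drawn in the view, which is what allows the edges to be matched to the elements again.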

Figure 4: Data weighted and showing connections.

Originally, when the nodes were strings, hovering over a node would display the string; this is part of the interaction that Roassal provides. However, now that the nodes are AVNodes, hovering over a node does not display the string, only the text ‘an AVNode’. This slightly limits the usefulness of the visualisation; for example, if the user hovers over the largest node it will no longer tell them which URL it represents. This problem could be solved using highlights, popups or labels; labels were investigated and it was found that they can be added when specifying the shape of the nodes. A label is added by specifying an aspect of the node; in this case the label is the URL string of the AVNode, using the code ‘withTextAbove: #url’. A number of layouts were tried for the final visualisation, including cluster, sugiyama and force; however, the best layout found for visualising the size differences and the connections was the circle layout, because the spacing between the nodes makes the visualisation clearer, as seen in figure 5.

Figure 5: Final weighted and labelled visualisation of the data.

4.1 Circular Tree Map

A second visualisation attempted for this data was the circular tree map. A tree is a hierarchy visualisation in which a root node branches into child nodes, and the child nodes can branch further into more nodes. A circular tree map is very similar: the root node is a large circular graphical element, and child nodes are circular graphical elements inside their parent node, as seen in figure 6 (Bergel 6 September 2016).

Figure 6: A circular tree map visualising the root node RTObject and all subclasses.

In figure 6 the smaller circles within larger circles are subclasses of the classes represented by the larger circles, their superclasses. The transparent circles are those that contain smaller circles and hence have subclasses, and the purple circles are classes that do not have any subclasses.

This tree visualisation is not directly compatible with the original data, as the data is not in a parent and child format. However, the strings in the data are URLs that all come from two different websites, and they branch further into the subfolders and subpages of those websites. In order to create data compatible with the circular tree map, a new class was created, StringNode; a StringNode contains a string called substring, which holds the portion of the URL that makes it unique. Each StringNode also contains a collection called substringNodes, which holds all the StringNodes that are substrings of the URL.

This allows a hierarchy to be formed using the strings and substrings; for example, the string ‘ and the string ‘ would be subNodes of the parent node ‘ as this is where the website pages diverge. The resulting circular tree map can be seen in figure 7.
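One way such a hierarchy could be built is sketched below. This is an illustrative Python sketch, not the Smalltalk StringNode class itself: splitting each URL on ‘/’ to obtain the substrings is an assumption, the empty root node and the helper names are introduced only for this example, and the URLs are invented.

```python
class StringNode:
    """Illustrative stand-in for StringNode: a URL portion plus its child nodes."""

    def __init__(self, substring):
        self.substring = substring
        self.substring_nodes = []

    def child(self, substring):
        """Return the child holding `substring`, creating it if it is missing."""
        for node in self.substring_nodes:
            if node.substring == substring:
                return node
        node = StringNode(substring)
        self.substring_nodes.append(node)
        return node


def build_hierarchy(urls):
    """Build a tree in which URLs sharing a prefix share the same ancestors."""
    root = StringNode("")
    for url in urls:
        node = root
        # Drop the scheme, then walk down the tree one path segment at a time.
        for segment in url.split("://", 1)[-1].strip("/").split("/"):
            node = node.child(segment)
    return root


# Invented URLs from two sites.
urls = [
    "http://example.org/docs/a",
    "http://example.org/docs/b",
    "http://example.net/home",
]
root = build_hierarchy(urls)
```

In this sketch the two pages under ‘docs’ become sibling nodes of a shared parent, and each website forms its own top-level circle in the tree map.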

Figure 7: Circular tree map of the hierarchy of the websites and subpages on the site.

5 Conclusion

Research was conducted to find out what visualisation is and how it can be useful; it was found that visualisation is the process of presenting data in a graphical or pictorial way and that its main benefit is to improve the usefulness and meaning of raw data. In order to visualise the data a program had to be used; the chosen program was VisualWorks because of Smalltalk’s benefits for rapid, iterative development along with the capabilities of the Roassal visualisation engine. It was found that Roassal uses views, elements, shapes, edges and interactions to create visualisations for data analysis.

Agile visualisation was found to be the process of creating many data visualisations in a short time period to help find useful information in data as quickly as possible. This approach was followed by first attempting a number of basic visualisations and then two more specific ones. The first visualisation adjusted the size of the nodes depending on their number of connections. The second was a circular tree map representing the data as a hierarchy based on the subfolders of the website URLs. In conclusion, it was found that different visualisations can be constructed from a single data source and that these visualisations can be useful in finding more meaning in the data.

6 Bibliography

Bergel, A. (6 September 2016). Agile Visualization. Accessed 10/11/16. Web URL:

Leon, R. (3 April 2007). "Why Smalltalk." Accessed 13/11/16. Web URL:

Steele, J. (14 Feb. 2012). "Why Data Visualization Matters." Accessed 12/11/16. Web URL: