Final technical report from the Mapping Service Quality project
Quality of Experience and Quality of Service in the Context of an Internet-based Map Service
Markus Fiedler*, Charlott Eliasson*, Patrik Arlos*, Sara Eriksén+ and Annelie Ekelin+
Blekinge Institute of Technology
*Dept. of Telecommunication Systems, 371 79 Karlskrona, Sweden
+Dept. of Interaction and System Design, 372 25 Ronneby, Sweden
{markus.fiedler | charlott.eliasson | patrik.arlos | sara.eriksen | annelie.ekelin}@bth.se
Abstract
This report targets the relationship between Quality of Experience (QoE) and Quality of Service (QoS) in the particular context of a network-based interactive map application. This relationship is derived from user ratings (QoE) of response times on the one hand, and from measurements of how QoS degradations in the form of constant delays, variable delays and losses affect those response times on the other hand. While a doubling of the response time lowers the user rating by one mark on average, the response time itself has proven to be very sensitive to delays and in particular to delay variations. The quantitative relationships shown are expected to help application designers optimise their applications so as to provide users with good QoE even under unfavourable network conditions such as those found in wireless or mobile networks. The tests were performed within the interdisciplinary research project Mapping Service Quality, in which ethnographic field studies of map services in use were combined with laboratory-based user tests. Quantitatively oriented researchers from the telecommunication systems area and qualitatively oriented researchers from informatics and human work science have wrestled with each other's diverse research traditions and terminologies while attempting to cooperate and share the work of exploring, measuring and interpreting the results. In this report, the main focus is on the perspective of the telecommunication systems researchers.
1. Introduction
Networked ICT services keep on invading our professional and private lives, and increasingly many work processes depend on them. These services thus largely determine how efficiently tasks are carried out. The perceived usability of a service is therefore closely linked to its performance, for instance its responsiveness. Users do not like to wait unnecessarily; long response times might interrupt their flow of thought and eventually entail a loss of interest [Nie-94].
Quality aspects, seen from the user's point of view, have gained importance. Users rate the quality of ICT services explicitly (by rating and commenting on them) or implicitly (by using them happily, reluctantly or not at all). In cases where several alternative ICT services exist and pricing is not a primary concern, quality may become the discriminating factor between providers and might determine whether a service becomes a success or a failure. While the rather traditional notion of Quality of Service (QoS) mostly refers to technical quality parameters of the actual data transport, the more recently established notion of Quality of Experience (QoE) extends the notion of quality to include user perception and expectations. QoS and QoE are interconnected through measurable so-called Key Performance Indicators (KPI).
Obviously, QoE has strong subjective[1] components and is also connected to the situation and context in which the user finds him- or herself. For instance, the normal response time (RT) of a service (e.g. providing an answer to a user request within four seconds) might be considered sufficient in a relaxed situation, but not when the user is in a stressed situation, e.g. having to make an urgent decision based on the result displayed by the service. User stress thus affects the threshold(s) marking off different levels of perceived quality (excellent; good; fair; bad; etc.). More background concerning this particular issue is found in [EEF+07].
In the course of this work, we have aimed at finding quantitative relationships between QoE and QoS in order to provide application designers with means to adapt their applications to the available networks and the related conditions. We have particularly focused on a Geographical Information System (GIS) Map Service, a web-based client-server application provided by ESRI S-Group and used, amongst others, by professional users for planning tasks. We have conducted user experiments to derive the relationship between user-perceived QoE, expressed qualitatively (through comments) and quantitatively (on the well-known scale from 1 = worst to 5 = best), and the response time, i.e. the time the user has to wait for the execution of a command. We have then established quantitative relationships between the response time and network problems such as losses and delays, which were generated in a controlled way using a traffic shaper between client and server.
The structure of the report reflects the steps described above. Section 2 describes typical map planning tasks and investigates the response time and the related KPI in this context. Section 3 describes the measurement set-up, including its instrumentation. Section 4 addresses the user experiments, determining critical response times, and Section 5 shows the measured impact of network disturbances on response times, thus completing the bridge between QoE and QoS. Section 6 concludes and provides an outlook on future work.
2. Planning tasks and response times
Based on initial ethnographic field studies of the map services in use, the following generic planning tasks were identified:
1. Type 1 task: Map manipulations, typically in order to find the desired location;
2. Type 2 task: Selecting the located object and carrying out some related action, e.g. displaying related information, measuring an area, etc.
Internet-based map services generally belong to the category of interactive services [FCI+05]. Users interact with the system in several steps and are thus sensitive to delays that make the system less responsive. From the perspective of a user, an interactive service should display the desired result as quickly as possible after the user has issued the request. Fast response is an important part of what the user considers as good QoE. We define the response time (RT) as the elapsed time between when a user requests information and when the information is received and displayed by the application.
A number of studies have investigated the willingness of users to wait for web-based information, cf. [BBK-00], [BKB-00], [RE-01] and [Zon-99]. Based on these and [Nie-94], [Fie-04] summarises reported thresholds in user perception as follows (a minimal encoding of these thresholds is sketched after the list):
· t > 100 ms: the user notices that the system is not reacting instantaneously;
· t > 1 s: the user's flow of thought is interrupted;
· t > 4 s: the user gets bored;
· t > 10 s: the user's attention is lost.
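These thresholds translate directly into a simple classification rule. The following minimal Python sketch encodes them; the category labels are our own illustrative wording, not taken verbatim from the cited studies:

    def perception_level(rt_seconds: float) -> str:
        """Map a response time (in seconds) to the perception thresholds
        reported in [Nie-94] and [Fie-04]; labels are illustrative."""
        if rt_seconds <= 0.1:
            return "reaction perceived as instantaneous"
        if rt_seconds <= 1.0:
            return "delay noticed, flow of thought preserved"
        if rt_seconds <= 4.0:
            return "flow of thought interrupted"
        if rt_seconds <= 10.0:
            return "user gets bored"
        return "user's attention is lost"

    print(perception_level(2.5))  # -> flow of thought interrupted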
Indeed, field studies have shown that users feel disturbed by extraordinary delays and try to avoid or “bridge” excessive response times, e.g. by fetching a cup of coffee, while waiting for a task to be completed.
The probability that the response time of the application is less than or equal to a threshold t, Pr{RT≤t}, is a KPI. For instance, a service might be considered to work sufficiently well if Pr{RT ≤ 4s} = 95%, i.e. if the chance that the response time does not exceed four seconds is 95%. This type of specified service level objective is usually part of a Service Level Agreement (SLA).
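For illustration, this KPI can be estimated from a series of measured response times as the fraction of samples that stay below the threshold. The following Python sketch, using invented sample values and the 4 s / 95 % objective mentioned above, checks such a service level objective:

    def kpi_fraction(response_times, threshold_s):
        """Empirical estimate of Pr{RT <= t}: the fraction of measured
        response times that do not exceed the threshold (in seconds)."""
        return sum(rt <= threshold_s for rt in response_times) / len(response_times)

    # Invented sample measurements in seconds, for illustration only
    measured_rt = [1.2, 0.8, 3.9, 2.5, 4.6, 1.1, 0.9, 2.2, 3.1, 1.7]

    p = kpi_fraction(measured_rt, threshold_s=4.0)
    print(f"Pr{{RT <= 4 s}} = {p:.0%}")              # 90% for this sample
    print("Objective Pr{RT <= 4 s} >= 95% met:", p >= 0.95)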
The response time itself is influenced by the ICT system, consisting of applications (clients; servers; peers) and networks (devices; links). Any kind of disturbance in the ICT system may cause increases of the response times. Examples of such disturbances are
· competing processes or performance limitations at the client and server side (e.g. in the terminal hardware and software), yielding delays in sending and receiving data;
· network-level perturbations such as situations of overload and/or resource limitations on network links and within network equipment, yielding loss and delay when transferring data. Data loss entails extra delays because of necessary data retransmissions. Such perturbations are typically considered to be QoS problems.
The end user perceives the response time at the end of a supply chain, i.e. each entity along the path between client and server application might add additional delays. The more and worse the QoS problems (additional delays and/or loss), the longer the user-perceived response time becomes. It gets closer to, or even crosses, the threshold t more frequently, which means a reduction of the user’s QoE perception.
In order to quantitatively assess service performance, we define a test sequence of tasks as follows:
1. Zoom out: Map manipulation in order to show the full map, which includes a map update (type 1 task).
2. Coordinates: Map manipulation in order to zoom in to a given coordinate, which also includes a map update (type 1 task).
3. Info+format: The user clicks an object (e.g. the borders of a certain area) in order to retrieve related information (type 2 task). The application gets the actual information first (XML file), then the formatting information (XSLT file). After that, the user gets to see the results.
4. Map update: After showing the results, the map is updated automatically. It appears as an automatically initiated type 1 task.
5. Remove data: When the user closes the information window, the map is updated, which is again an automatically initiated type 1 task.
Such tasks provide the foundation for the results shown in Section 4 and in particular in Section 5.
3. Measurement setup
Figure 1 shows the basic measurement setup used for the experiments described in the following.
Figure 1. Illustration of the measurement setup for response time measurements
with measurement points M1 and M2.
The user issues a request, e.g. by clicking on the zoom out symbol in the map (arrow 0), which also starts the perception of the response time. The request is time-stamped by the client application at time T0. Then, the client sends a request towards the server (arrow 1), which can be delayed by the traffic shaper that emulates irregular network behaviour. The server time-stamps the successfully received incoming request at time T1, processes it and initiates the transmission of data towards the client (arrow 2), time-stamped at time T2. Finally, the results are displayed to the user (arrow 3), which is time-stamped by the client at time T3. We thus obtain the response time in the user interface as RT(M1) = T3 – T0 and the response time at the server as RT(M2) = T2 – T1, respectively. Observing that T0 < T1 < T2 < T3, we immediately see that the response time obtained from server logs is smaller than the one observed by the user, i.e. RT(M2) < RT(M1). The latter also captures the influence of the network stacks at client and server as well as potential extra delays introduced by the traffic shaper. The difference between the two will be shown in Section 5.
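As an illustration of how the two response times follow from the four timestamps, the Python sketch below mirrors the notation above; the timestamp values are invented, and only timestamps taken by the same host are subtracted from each other, so client and server clocks need not be synchronised:

    from dataclasses import dataclass

    @dataclass
    class RequestTimestamps:
        """Timestamps (in seconds) taken at measurement points M1 (client) and M2 (server)."""
        t0: float  # M1: user issues the request
        t1: float  # M2: server receives the request
        t2: float  # M2: server starts sending the reply
        t3: float  # M1: result displayed to the user

        def rt_m1(self) -> float:
            """User-perceived response time, RT(M1) = T3 - T0."""
            return self.t3 - self.t0

        def rt_m2(self) -> float:
            """Server-side response time, RT(M2) = T2 - T1."""
            return self.t2 - self.t1

    # Invented example values; since T0 < T1 < T2 < T3, RT(M2) < RT(M1).
    ts = RequestTimestamps(t0=0.00, t1=0.35, t2=1.10, t3=1.80)
    print(f"RT(M1) = {ts.rt_m1():.2f} s, RT(M2) = {ts.rt_m2():.2f} s")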
Additionally, network traffic traces have been taken at the entrance and exit of the traffic shaper in order to illustrate its effect at the network level, i.e. on the timing of individual packets. These observations will help to clarify some of the behaviours seen in Section 5.
4. User experiments
We first turn our attention to the user perception of response times. Here, we confine ourselves to average behaviours in order to illustrate trends. An experienced professional user was asked to perform a self-chosen set of planning tasks and to provide Opinion Scores (OS) on a scale of 1 (= worst) to 5 (= best) as a QoE measure, while the response times were varied in the background by setting different network-level delays in the traffic shaper. Out of the 172 experiments, the user rated 16 times ”5”, 32 times ”4”, 55 times ”3”, 40 times ”2”, and 29 times ”1”. For each of these ratings, the corresponding RTs were averaged, yielding mRT. The result is shown in Figure 2, together with a logarithmic best-fit approximation.
Figure 2. Quality of Experience (QoE) in the form of Opinion Scores (OS) vs. average response time mRT in a user experiment.
The logarithmic relationship OS = a + b×log(mRT) proved to be superior to linear, exponential or power relationships. We can thus confirm the results presented in the ITU-T Recommendation G.1030 [G.1030] and also the general results from [Nie-94] and [Fie-04] as outlined in Section 2: an average delay of one second obviously does not do much harm (OS = 5), while for a delay of about four seconds, the user gets “mixed” feelings (OS = 3). Furthermore, each time RT (roughly) doubles, the OS drops by about one unit.
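A minimal sketch of how such a logarithmic relationship can be fitted by least squares is given below; the (mRT, OS) pairs are invented for illustration and do not reproduce the data behind Figure 2, and the choice of logarithm base only rescales the coefficient b:

    import numpy as np

    # Invented (mRT, OS) pairs for illustration; the measured data is shown in Figure 2.
    m_rt = np.array([1.0, 2.0, 4.0, 8.0, 16.0])      # average response times [s]
    os_score = np.array([5.0, 4.1, 3.0, 2.2, 1.1])   # averaged Opinion Scores

    # Least-squares fit of OS = a + b * log(mRT)
    b, a = np.polyfit(np.log(m_rt), os_score, 1)
    print(f"OS = {a:.2f} + {b:.2f} * ln(mRT)")

    # Change in OS per doubling of the response time: b * ln(2)
    print(f"OS change per doubling of RT: {b * np.log(2):.2f}")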
The logarithmic match works best in the area of high OS, while the deviations become larger for low OS. In the case of long RTs, the test person sometimes empathised with the system having to handle such a large amount of data. This can explain at least some of the deviations, in particular for OS = 2.
5. Network level experiments
In this section, we investigate the impact of controlled QoS degradations on the RT, which in turn relates to QoE as outlined in Section 4. Combining the results of this section with those of Section 4, we will be able to draw conclusions about the relationship between user perception and (measurable) network performance.
Constant delay, varying delay and loss affect the data flowing in the downstream direction, i.e. towards the end user. For each of the tasks introduced in Section 2, 40 consecutive RT measurements have been carried out in the cases with less than one second of delay, and 20 otherwise. Again, in order to focus on the trends, we confine ourselves to showing averages of the measurements.
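The report does not prescribe a particular traffic shaper; as one possible way to emulate comparable degradations, the Linux netem queueing discipline can impose constant delay, delay variation and packet loss on outgoing traffic. The following Python sketch drives it via the tc command; the interface name and parameter values are assumptions for illustration only:

    import subprocess

    IFACE = "eth0"  # assumed egress interface towards the client

    def set_netem(delay_ms=0, jitter_ms=0, loss_pct=0.0):
        """Emulate constant delay, delay variation and loss on IFACE using netem."""
        # Remove any existing root qdisc; ignore the error if none is configured.
        subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"],
                       stderr=subprocess.DEVNULL)
        subprocess.run(["tc", "qdisc", "add", "dev", IFACE, "root", "netem",
                        "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
                        "loss", f"{loss_pct}%"],
                       check=True)

    # Example scenario: 200 ms constant delay, 50 ms variation, 1% packet loss.
    set_netem(delay_ms=200, jitter_ms=50, loss_pct=1.0)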