Use of the E-Model to Estimate Call Quality in eHealth
Legal
Copyright 2018 Concord Communications, Inc. All rights reserved. The information provided in this document is the proprietary and confidential information of Concord Communications, Inc. (Concord). You may use this information solely for your internal evaluation as requested by Concord. This information is subject to change at any time without notice, and Concord makes no representation, warranty or covenant regarding this information or any software or other
Table of Contents
Legal
I. Introduction
II. Basic Technology Issues
  E-Model and Voice Quality
  Jitter
III. VQM Parameters and Variables
  Input Parameters
  Performance Variables
  Live Exceptions and Traps
IV. Reports
V. Further Technical Detail
  E-model Parameters, Defaults, Approximations and Calculations
  Codec Considerations
  Jitter Parameters
VI. Throughput Calculations
VII. Performance Benchmarks
VIII. References
Appendix A E-model Calculations
  E-model Transmission Rating Factor
  Mean Opinion Score (MOS)
  Impairment/Calculated Impairment Planning Factor (Icpif)
Appendix B Impairments Due to Packet Loss
Appendix C Constraints on Jitter/MOS Parameters
I. Introduction
Voice Quality Monitor (VQM) permits eHealth to generate synthetic traffic that looks to the network like voice-over-IP traffic, and to gather metrics critical to voice quality, such as latency, jitter and packet loss. In addition, VQM calculates voice quality metrics such as transmission rating factors and estimated Mean Opinion Scores.
Voice Quality Monitor consists of pairs of stations, which exchange traffic streams. These streams look to the network like the Real-time Transport Protocol (RTP) characteristic of voice-over-IP traffic.
Each VQM pair consists of a transmit/receive module (T/R) and a Reflector (Rf). The T/R module initiates tests in the form of voice-over-IP packet streams. Each packet is received by the Rf module and reflected back to the T/R module. Performance information is collected and consolidated at the T/R module, where it is incorporated into an SNMP Agent and made available to eHealth through SNMP polling. Aview is used to configure the modules, and eHealth learns about them through explicit discovery. Figure 1 is a VQM layout.
Voice-over-IP traffic is generated by codecs (short for coder/decoder), which convert human voice directly into network packets or, alternatively, convert circuit-switched T1 or E1 bits into network voice packets. The VQM stations are designed to emulate the behavior of the codecs.
Figure 1. Voice Quality Monitor Layout
II. Basic Technology Issues
Two concepts are basic to an understanding of eHealth VQM: voice quality, as represented by the E-Model, and the notion of jitter. We discuss each in turn.
E-Model and Voice Quality
Speech quality for voice-over-IP or, indeed, any type of voice call, is typically evaluated by use of Mean Opinion Score (MOS) ratings. A Mean Opinion Score is a subjective benchmark for measuring quality of speech. A large number of listeners evaluate the quality of a voice sample on a scale of 1 to 5 where 5 is excellent and a 4 or above is considered toll quality. The resultant average is defined as the Mean Opinion Score.
There are analytical techniques that are capable of predicting MOS ratings with a reasonable degree of accuracy. The technique most commonly used in this category is the E-model, which was designed as a tool to enable system designers to predict call quality from first principles, but can be adapted to serve as a performance measurement tool as well. [9]
The E-Model is an analytical model that predicts call quality from mathematical calculations based on a set of 18 input parameters. Details of the E-Model calculation can be found in Appendix A.
Table 1, obtained from Reference [9], offers the E-model parameters and recommended ranges. The E-model is not considered valid outside these ranges.
Table 1. E-Model System Parameters
Descriptions of these parameters and how they can be used can be found in References [8], [9], [10], [11] and [12].
The E-model does not calculate the MOS directly. Rather it calculates a figure of merit known as a transmission rating factor, generally expressed as R, which can be converted to a MOS. Strictly speaking, MOS values are subjective. What we are really calculating is an analytical approximation to a subjective MOS value.
Values of the transmission rating factor range up to roughly 100, with value ranges having the following significance: [10]
90 – 100: users very satisfied
80 – 90: users satisfied
70 – 80: some users dissatisfied
60 – 70: many users dissatisfied
50 – 60: nearly all users dissatisfied
The transmission rating factor is obtained from five terms:
R = Ro – Is – Id – Ie + A
Where the terms are defined as follows:
Ro: E-model basic signal-to-noise ratio
Is: E-model simultaneous impairment factor
Id: E-model delay impairment factor
Ie: E-model equipment impairment factor
A: E-model advantage factor
Details of how to calculate these terms from the system parameters in Table 1 as well as the algorithm for converting R to a MOS rating can be found in Appendix A.
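As a rough illustration of how the five terms combine and how R maps to an estimated MOS, the following Python sketch may help. The function names are hypothetical, the input values are purely illustrative (they are not the Table 1 defaults), and the R-to-MOS conversion is the standard mapping published in ITU-T G.107, which is assumed here to be the one given in Appendix A.

```python
def transmission_rating(ro, i_s, i_d, i_e, a=0.0):
    """Combine the five E-model terms: R = Ro - Is - Id - Ie + A."""
    return ro - i_s - i_d - i_e + a


def r_to_mos(r):
    """Convert R to an estimated MOS (ITU-T G.107 mapping, assumed to match Appendix A)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def satisfaction(r):
    """Map R to the user-satisfaction ranges listed above."""
    if r >= 90:
        return "users very satisfied"
    if r >= 80:
        return "users satisfied"
    if r >= 70:
        return "some users dissatisfied"
    if r >= 60:
        return "many users dissatisfied"
    if r >= 50:
        return "nearly all users dissatisfied"
    return "below the listed ranges"


# Illustrative inputs only; real values come from Table 1 and measurements.
r = transmission_rating(ro=94.0, i_s=1.0, i_d=3.0, i_e=10.0)
print(r, round(r_to_mos(r), 2), satisfaction(r))
```

The sketch treats each boundary value (for example R = 80) as belonging to the higher range, which is a choice made here and not something the listed ranges specify.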
Recognizing that measured values of all 18 system parameters might not be available (they include such variables as loudness ratings, room noise and a rather subjective advantage factor), the framers of the standard included a comprehensive list of defaults, as shown in Table 1.
Jitter
Jitter is a measure of variation in inter-packet delay and has a critical impact on voice quality. Most packet streams tend to spread out in time as they traverse a network, increasing the delays between successive packets. This behavior is called positive jitter. Under some circumstances the packets experience the opposite effect, with the inter-packet delays decreasing, which produces negative jitter. Either way, if the magnitude of the jitter is too large, voice quality suffers. See Figure 2.
Figure 2. Jitter Diagrams
To compensate for excessive jitter, codecs make use of buffers, which delay packets at the receiving end and force them to arrive in a packet stream with constant inter-packet times. This is done at the expense of adding to the overall delay, which also has an effect on voice quality, implying a tradeoff of effects. There are limits to how long a packet can be delayed. If a packet is too late in arriving at the jitter buffer, it has to be discarded.
Mathematically, jitter is defined as the difference in time between the receipt of two successive packets, minus the difference in time between the transmission of those same two packets:
J_i = (R_{i+1} - R_i) - (S_{i+1} - S_i)
Where
J_i is the jitter between packet i and packet i+1,
S_i is the time packet i is sent, and
R_i is the time packet i is received.
Thus jitter is positive if the receive separation is larger than the transmit separation and negative if receive separation is smaller than send separation. Jitter can be measured in each direction, from source to destination (JitterOut) and from destination to source (JitterIn). [17]
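The per-pair definition can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the timestamps are made-up values in milliseconds for a stream sent at a constant 20 ms spacing.

```python
def packet_jitter(send_times, recv_times):
    """Return the jitter for each pair of successive packets.

    Each value is (receive separation) - (send separation), so it is positive
    when the packets spread out in transit and negative when they bunch up.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send and receive lists must have the same length")
    return [
        (recv_times[i + 1] - recv_times[i]) - (send_times[i + 1] - send_times[i])
        for i in range(len(send_times) - 1)
    ]


# Four packets sent 20 ms apart; receive times are illustrative.
send = [0, 20, 40, 60]
recv = [5, 27, 44, 66]
print(packet_jitter(send, recv))  # [2, -3, 2]
```

Four packets yield three jitter values, one per pair of successive packets.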
Relevant definitions are as follows:
Jitter = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} \left( |Jout_i| + |Jin_i| \right)
JitterIn = \frac{1}{n-1} \sum_{i=1}^{n-1} |Jin_i|
JitterOut = \frac{1}{n-1} \sum_{i=1}^{n-1} |Jout_i|
NegativeJitter = \frac{1}{2(n-1)} \left( \sum_{i=1}^{n-1} |\min(Jout_i, 0)| + \sum_{i=1}^{n-1} |\min(Jin_i, 0)| \right)
PositiveJitter = \frac{1}{2(n-1)} \left( \sum_{i=1}^{n-1} |\max(Jout_i, 0)| + \sum_{i=1}^{n-1} |\max(Jin_i, 0)| \right)
where n represents the number of packets in a stream and n-1 the number of pairs of successive packets.
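The following Python sketch shows one way these five summary values could be computed from the per-pair jitter values in each direction. The function name and the sample values are hypothetical; the arithmetic follows the definitions above.

```python
def jitter_summary(j_out, j_in):
    """Compute the five jitter metrics from per-pair jitter lists.

    j_out and j_in each hold n-1 values, one per pair of successive packets.
    """
    if len(j_out) != len(j_in) or not j_out:
        raise ValueError("need two non-empty lists of equal length")
    pairs = len(j_out)  # this is n-1 in the definitions
    abs_out = sum(abs(j) for j in j_out)
    abs_in = sum(abs(j) for j in j_in)
    negative = sum(abs(min(j, 0)) for j in j_out) + sum(abs(min(j, 0)) for j in j_in)
    positive = sum(abs(max(j, 0)) for j in j_out) + sum(abs(max(j, 0)) for j in j_in)
    return {
        "Jitter": (abs_out + abs_in) / (2 * pairs),
        "JitterIn": abs_in / pairs,
        "JitterOut": abs_out / pairs,
        "NegativeJitter": negative / (2 * pairs),
        "PositiveJitter": positive / (2 * pairs),
    }


# Illustrative per-pair jitter values (milliseconds) for a four-packet stream.
summary = jitter_summary(j_out=[2, -3, 2], j_in=[1, 1, -4])
print(summary)
```

One consequence of the definitions is that PositiveJitter and NegativeJitter always add up to Jitter, which makes a convenient sanity check on measured data.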
III. VQM Parameters and Variables
Input Parameters
eHealth support for VQM requires the user to provide a set of configuration parameters. VQM input parameters consist of
- a set of explicitly configurable parameters,
- several parameters related to the characteristics of the codec,
- several parameters that characterize the jitter buffers, and
- all E-model inputs that are not obtained from performance data.
The explicitly configurable parameters are input via Aview while the other three sets of parameters are included in parameter templates. There are three predefined templates, each representing parameters associated with a particular type of codec. These parameters are not entirely independent. Rather they are subject to the mathematical constraints described in Appendix C. Table 2a displays the Aview configuration parameters and Table 2b the parameter templates.
Users can create additional templates if the predefined templates do not suit their needs, by copying a template and editing it as necessary. Under these circumstances, the user is responsible for satisfying the parameter constraints.
Only the codecs represented in Table 2b can be supported (G.711, G.729-A + VAD, and G.723.1-A + VAD), since the E-model calculation depends on impairment-versus-packet-loss curves (Appendix B) and, at present, such curves are available only for those three codecs. As additional data becomes available, support for additional codecs can be added.
No GUI is required to edit a parameter template. The user modifies a flat file using a text editor.
Performance Variables
Performance variables consist of
- polled performance variables, and
- trend variables
Trend variables are a superset of polled performance variables. All polled performance variables are potentially trend variables (although some may not make much sense to present to the user). The additional trend variables are the results either of calculations on one or more polled performance variables or of internal eHealth calculations. The polled performance variables are shown in Table 2c. These are the MIB variables made available to eHealth via SNMP. The calculated trend variables are shown in Table 2d.
The eighteen E-model input parameters are obtained partly from the parameter templates and partly from calculations based on polled performance variables. While the user may change the template parameters (subject to the parameter constraints discussed above), there is no provision for the user to override any E-model input parameters calculated from the performance variables.
A number of performance variables offer quality metrics based on the E-Model calculation. The E-Model calculation itself is invoked for each test cycle (that is, each defined packet stream) and yields a set of gauges such as MOS, R, Ro, Io and others (Section V). However, eHealth polls the agent only every five minutes or so and needs averages over all the test cycles during that interval. To accomplish this, the pollable variables provide the sums of the gauges, defined as counters, and the number of test cycles, also defined as a counter. eHealth obtains counter differences over the poll period, and a simple division yields an average, e.g., the average value of MOS over the poll period.
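The counter-difference averaging can be sketched as follows. This is a minimal illustration, not eHealth's actual code: the function and argument names are hypothetical, the sample values are arbitrary, and the sketch ignores counter wrap and any scaling the agent may apply to fit gauge values into integer counters.

```python
def poll_average(prev_sum, curr_sum, prev_cycles, curr_cycles):
    """Average a gauge over one poll period from two counter samples.

    prev_*/curr_* are the counter values read at the previous and current
    polls. Returns None when no test cycle completed in the period.
    """
    cycles = curr_cycles - prev_cycles
    if cycles <= 0:
        return None  # nothing to average (or a counter reset, not handled here)
    return (curr_sum - prev_sum) / cycles


# Arbitrary sample: the sum counter grew by 410 over 10 completed test cycles.
print(poll_average(prev_sum=1200, curr_sum=1610, prev_cycles=30, curr_cycles=40))  # 41.0
```

Returning None for an empty period is a design choice in this sketch; it avoids reporting a misleading zero when no test cycle finished between polls.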
Another noteworthy category of performance variables provides packet loss information. Network packet losses in either direction are explicitly measured. However, to these measured losses must be added those due to inter-packet jitter exceeding the configured buffer discard threshold.
Both the T/R and Rf modules need to be able to tally the number of packet losses due to jitter, i.e., the number of received packets where jitter exceeds the jitter buffer discard threshold. There are no actual jitter buffers and the T/R and Rf modules do not actually drop packets due to jitter. Rather they keep track of any packets which exceed the jitter buffer discard threshold (Table 2a).
This permits a calculation of the total number of packets lost in either direction by adding the number of pseudo jitter losses to the measured network packet losses.
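A minimal Python sketch of this bookkeeping is shown below. The function names and sample values are hypothetical. The sketch also assumes that a packet counts as a pseudo jitter loss when its signed jitter value is greater than the discard threshold (that is, when it arrives too late); whether the modules compare the signed value or its magnitude is not stated here.

```python
def jitter_pseudo_losses(jitters, discard_threshold):
    """Count packets whose jitter exceeds the jitter buffer discard threshold.

    Assumption: a signed comparison, so only late packets are counted.
    """
    return sum(1 for j in jitters if j > discard_threshold)


def total_packets_lost(network_lost, jitters, discard_threshold):
    """Total loss = measured network losses + pseudo jitter losses."""
    return network_lost + jitter_pseudo_losses(jitters, discard_threshold)


# Illustrative values: 3 packets lost in the network, 40 ms discard threshold.
jitters_ms = [2, 45, -3, 61, 10]
print(total_packets_lost(network_lost=3, jitters=jitters_ms, discard_threshold=40))  # 5
```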
Live Exceptions and Traps
Live Exceptions are based on trend variables and are summarized in Table 2e. For the most part they identify cases where voice quality is degraded or is threatening to degrade. There are no defined traps, since all the obvious cases are amenable to Live Exception definitions and there seems little reason to invoke two mechanisms when one is sufficient.
Table 2a. Jitter/MOS Path Configuration Parameters
Table 2b. Jitter/MOS Path Parameter Templates
Table 2c. Polled Performance Variables
Table 2d. Other Trend Variables.
Table 2e. Live Exceptions
IV. Reports
eHealth views each pair of T/R and Rf modules as a Response Path. eHealth support for VQM includes all existing Response Path reports, including the Health, Top N and Trend Reports. As with all other Response Path tests, a special At-a-Glance Report has been created for VQM. See Figure 3.
The usual drill-downs and drill-ups are supported, including those involving Live Exceptions and Aview, specifically:
Drill downs from the Live Exceptions screen to the following locations:
- View Complete Exception/Trap Contents
- View, Edit, Disable, or Delete the Triggering Monitoring Entry, to permit the user to fine-tune thresholds
Drill Up from AdvantEDGE View to the following reports:
- At-a-Glance
- Trend
- Health
- Top N
- Live Trend
- Live Status
Figure 3. Voice Quality Monitor At-A-Glance Report
V. Further Technical Detail
This section provides more detail on how VQM handles E-Model calculations, modeling of codecs, and jitter calculations.
E-model Parameters, Defaults, Approximations and Calculations
The E-Model calculation of the MOS, described in Appendix A, consists of 17 inputs and 7 outputs:
- Inputs: SLR, RLR, STMR, LSTR, Ds, TELR, WEPL, T, Tr, Ta, qdu, Ie, Nc, Nfor, Ps, Pr, A,
where T, Tr, Ta and Ie are calculated from performance variables and the rest are obtained from the service profiles
- Outputs: MOS, Ro, Io, Idte, Iq, Idd
There are 17 rather than 18 inputs since the inputs are not entirely independent, as indicated in Table 1, Note 2.
A variety of relevant voice quality variables can be calculated from the outputs, including the Impairment/Calculated Impairment Planning Factor (Icpif) described in Appendix A.
The E-Model calculation is invoked once for each test cycle. The values of T, Tr, Ta and Ie are calculated from polled performance variables averaged over the appropriate test cycle. In general, there are multiple test cycles in each polling interval, each test cycle representing a “conversation” and requiring an invocation of the E-Model calculation.
To keep the reported variables other than those derived from the E-Model consistent with the E-Model variables, counters such as successful_attempts, failed_attempts and the various jitter sums are incremented only at the end of each test cycle.
eHealth uses standard default values for the following E-Model input parameters:
SLR: Send Loudness Rating
RLR: Receive Loudness Rating
STMR: Sidetone Masking Rating
LSTR: Listener Sidetone Rating
Ds: D-value of telephone at send-side
TELR: Talker Echo Loudness Rating
WEPL: Weighted Echo Path Loss
Nc: Circuit noise referred to the 0 dBr-point
Nfor: Noise floor at the receive-side
Ps: Room noise at the send-side
Pr: Room noise at the receive-side
We set the values for the remaining E-model input parameters as discussed below:
Ta: Absolute Delay in echo-free Connections
This is the mean one-way delay between the transmitting and receiving telephone sets [10]. It represents the sum of codec delay, network delay and jitter buffer delay. These delays can be obtained from a combination of configuration parameters and measured performance variables:
- Codec delays are codec-specific and require knowledge of the codecs in use.
- Network delay represents one-way trip time, which is estimated by taking the average measured network round-trip time and dividing by 2.
- Jitter buffer delay is the delay that results from buffering packets at the receive end to smooth out jitter.
Ta can be calculated from configuration and performance variables as follows:
Ta (in) = (1/2) * sum_response_times/number_successful_attempts + codec_delay (in) + jitter_buffer_delay (in)
Ta (out) = (1/2) * sum_response_times/number_successful_attempts + codec_delay (out) + jitter_buffer_delay (out)
The performance variables used in the delay calculations above (sum_response_times, number_successful_attempts) are sums over a single test cycle.
T: Mean one-way Delay of the Echo Path
This parameter represents the delay from the transmitting station to the point at which there is an echo. We assume the echo to be from the receiving station and set T equal to Ta.
T (in) = Ta (in); T (out) = Ta (out) as defined above.
Tr: Round Trip Delay in a 4-wire Loop
This variable represents the round-trip delay due to 4-wire connections. We simply add together the two one-way delays defined above.
Tr = sum_response_times/number_successful_attempts + codec_delay (in) + jitter_buffer_delay (in) + codec_delay (out) + jitter_buffer_delay (out)
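The three delay parameters can be sketched in Python as follows. The function names are hypothetical, the units are assumed to be milliseconds, and the sample values are illustrative and not taken from any template.

```python
def one_way_delay(sum_response_times, number_successful_attempts,
                  codec_delay, jitter_buffer_delay):
    """Ta for one direction: half the mean round-trip time plus fixed delays."""
    if number_successful_attempts <= 0:
        raise ValueError("no successful attempts in this test cycle")
    mean_rtt = sum_response_times / number_successful_attempts
    return 0.5 * mean_rtt + codec_delay + jitter_buffer_delay


def round_trip_delay(sum_response_times, number_successful_attempts,
                     codec_delay_in, jitter_buffer_delay_in,
                     codec_delay_out, jitter_buffer_delay_out):
    """Tr: the full mean round-trip time plus the fixed delays of both directions."""
    if number_successful_attempts <= 0:
        raise ValueError("no successful attempts in this test cycle")
    mean_rtt = sum_response_times / number_successful_attempts
    return (mean_rtt + codec_delay_in + jitter_buffer_delay_in
            + codec_delay_out + jitter_buffer_delay_out)


# Illustrative test cycle: 50 successful packets, 4000 ms of summed round-trip time.
ta_in = one_way_delay(4000, 50, codec_delay=25, jitter_buffer_delay=40)
ta_out = one_way_delay(4000, 50, codec_delay=25, jitter_buffer_delay=60)
t_in, t_out = ta_in, ta_out  # T is set equal to Ta in each direction
tr = round_trip_delay(4000, 50, 25, 40, 25, 60)
print(ta_in, ta_out, tr)  # 105.0 125.0 230.0
```

Because Tr simply adds the two one-way delays, it equals Ta (in) + Ta (out), which the sample values confirm (105 + 125 = 230).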
qdu: Number of Quantization Distortion Units
A qdu is the noise that results from a complete A-law/mu-law encoding cycle of analog to digital and digital back to analog. The noise resulting from encoding/decoding by a pair of G.711 codecs represents 1 qdu. [10] For encoding other than A-law or mu-law, we use Ie in place of qdu, but leave the value of qdu at 1. See Reference [10], page 53.
Ie: Equipment Impairment Factor
This parameter represents all impairments resulting from equipment, including packet loss and jitter. We calculate it from total packet loss, which includes losses due to the network as well as losses due to jitter.
To obtain Ie we make use of curves fitted to experimental data in Appendix B:
Ie (in) = c0 + c1 * ploss (in)
where
c0, c1 depend on the codec and ploss range, as discussed in Appendix B
ploss (in) = 100 * total_packets_lost_in/(successful_attempts + network_packets_lost_in)
Ie (out) = c0 + c1 * ploss (out)
where
c0, c1 depend on the codec and ploss range, as discussed in Appendix B
ploss (out) = 100 * total_packets_lost_out/(successful_attempts + failed_attempts)
The performance variables that are used in the ploss calculation (total_packets_lost_in, total_packets_lost_out, successful_attempts and failed_attempts) represent averages over a single test cycle and not an entire poll period.
The total packets lost figures are obtained by adding jitter buffer discard losses to measured network packet losses. The jitter buffer discard losses are obtained by the T/R and Rf plug-in modules by comparing the jitter buffer discard threshold against measured values of jitter to determine the number of packets lost due to jitter for each test cycle.
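The calculation of Ie from packet loss can be sketched as below. The function names are hypothetical, and the coefficient table is a placeholder with made-up values: the real c0 and c1 for each codec and ploss range come from the fitted curves in Appendix B and are not reproduced here.

```python
import math

# HYPOTHETICAL coefficients, for illustration only. Each entry is
# (upper bound of ploss range in percent, c0, c1). Use Appendix B values in practice.
HYPOTHETICAL_SEGMENTS = [
    (2.0, 0.0, 4.0),
    (math.inf, 4.0, 2.0),
]


def ploss_in(total_packets_lost_in, successful_attempts, network_packets_lost_in):
    """Inbound packet loss, in percent."""
    return 100.0 * total_packets_lost_in / (successful_attempts + network_packets_lost_in)


def ploss_out(total_packets_lost_out, successful_attempts, failed_attempts):
    """Outbound packet loss, in percent."""
    return 100.0 * total_packets_lost_out / (successful_attempts + failed_attempts)


def equipment_impairment(ploss, segments):
    """Ie = c0 + c1 * ploss, with c0 and c1 chosen by the range ploss falls in."""
    for upper_bound, c0, c1 in segments:
        if ploss <= upper_bound:
            return c0 + c1 * ploss
    raise ValueError("ploss is outside every configured range")


# Illustrative test cycle: 97 successes, 3 network losses inbound, 2 jitter discards.
loss_in = ploss_in(total_packets_lost_in=5, successful_attempts=97, network_packets_lost_in=3)
print(loss_in, equipment_impairment(loss_in, HYPOTHETICAL_SEGMENTS))  # 5.0 14.0
```

Keeping the coefficients in a table indexed by ploss range means that support for another codec only requires another table, which fits the way the curves are described as codec-specific.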
A: Advantage Factor
The advantage factor represents adjustments to the model based on “advantage of access,” i.e., a lowering of customer expectations based on an access advantage such as use of a mobile wireless medium. For standard wired access, A is zero. A assumes nonzero values for mobile and satellite situations and, as the expression for R in Section II shows, is added to the overall rating, offsetting part of the impairment.
For our default, we make the conventional wirebound assumption in Reference [9] and set A to zero.
Codec Considerations
Codecs convert between analog voice and packetized data. Analog-to-packet conversion involves the following three processes:
- Analog to pulse-code-modulated (PCM) conversion, subject to A-law or mu-law, which converts analog voice to non-packetized digital data suitable, in principle, for T1 or E1 transport.
- A compression algorithm, which compresses the data.
- A packetization process, which places digital data into packets.
Packet-to-analog conversion reverses these three processes. Each process contributes delay and, in practice, these delays overlap somewhat. We combine all of them into a single codec delay, representing analog-to-packet and packet-to-analog delays, with data taken from Reference [14].