Chapter 13: QoS Congestion Management

Resources used:

IP Telephony Self-Study - Cisco QoS Exam Certification Guide, Second Edition

CCIE Routing and Switching Exam Certification Guide 3rd Edition

A queue is nothing more than a series of pointers to the memory buffers that hold the individual packets that are waiting to exit the interface.

If a queue is full, tail drop occurs.

Increasing the length of the queue increases the average delay and jitter for a packet, but makes the packet less likely to be tail dropped.

Each individual queue is FIFO; the scheduler controls how often packets are taken from each queue.

Queue length is measured in packets, so the size of a packet doesn't affect the length of the queue.

There is a TX ring (a FIFO queue) just before the outbound interface, where packets queue after the scheduler has moved them from the software queues towards the interface. The point of this queue is to keep link utilization as high as possible: the interface ASIC knows how to fetch packets from the TX ring itself, so it doesn't have to wait for the CPU to send each packet to the actual interface.

The TX ring is unaffected by QoS settings; it can, however, affect queuing performance depending on its length.

Short hardware queue lengths mean packets are more likely to sit in the controllable software queues, giving software queuing more control over the traffic leaving the interface.

IOS shortens the interface Hardware Queue automatically when a software queuing method is configured.

On subinterfaces (e.g. Frame Relay PVCs), queuing can be configured on the subinterface, the physical interface, or both, depending on exactly how you want to control the traffic.

IOS automatically uses WFQ on links running at E1 speed (2.048 Mbps) or slower (to use a FIFO queue instead, the no fair-queue command must be issued, as in the sketch below).
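
A minimal sketch (the interface name is illustrative):

interface Serial0/0
 no fair-queue
 ! disables the default WFQ, reverting the interface to a single FIFO queue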

Priority Queuing

PQ has 4 queues (high, medium, normal, and low); if there are any packets in the high queue, it is always served first, which can lead to starvation of the lower queues. A minimal configuration is sketched below.
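
A minimal legacy PQ sketch (the ACL, UDP port range, and interface name are illustrative); VoIP is classified into the high queue and everything else falls into the default (normal) queue:

access-list 101 permit udp any any range 16384 32767
priority-list 1 protocol ip high list 101
priority-list 1 default normal
interface Serial0/0
 priority-group 1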

Custom Queuing

Has 16 queues but doesn't have a "service the high queue first" feature, which can harm jitter- and delay-sensitive packets (VoIP etc.).

Performs round-robin servicing across the queues.

CQ takes packets from a queue until the total byte count specified for that queue has been met or exceeded. After the queue has been serviced for that many bytes, or the queue runs out of packets, CQ moves on to the next queue and repeats the process.

CQ does not configure the exact link bandwidth percentage, but rather it configures the number of bytes taken from each queue during each round-robin pass through the queues.

Queues are numbered 1 to 16

Because the configured byte count is only checked after a whole packet has been sent, if a queue is configured for 1501 bytes and two 1500-byte packets are waiting, both will be served: the queue is actually served for 3000 bytes, not 1501. A minimal configuration is sketched below.
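
A minimal legacy CQ sketch (the ACL and byte counts are illustrative); queue 1 gets roughly twice as many bytes as queue 2 per round-robin pass:

queue-list 1 protocol ip 1 list 101
queue-list 1 queue 1 byte-count 3000
queue-list 1 queue 2 byte-count 1500
queue-list 1 default 2
interface Serial0/0
 custom-queue-list 1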

Modified Deficit Round-Robin (MDRR)

Works like CQ, but any extra bytes taken during one service of a queue are subtracted from that queue's allowance the next time around; see the example below.
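
For example (illustrative numbers): with a 1500-byte allowance per pass, if a queue sends a 1000-byte packet and then a 1500-byte packet (2500 bytes total, 1000 over the allowance), its allowance on the next pass is reduced to 1500 - 1000 = 500 bytes.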

Weighted Fair Queuing (WFQ)

Doesn’t require any configuration

Is based on flows (source and destination IP address, transport protocol, source and destination port numbers, and IP precedence).

Each flow has its own queue (WFQ supports a maximum of 4096 queues; the configured number of queues must be a power of 2, e.g. 256 or 1024).

WFQ prefers high ip precedence small packets over low precedence large packets.

A flow only exists as long as packets belonging to it are queued.

WFQ tries to service all queues fairly

WFQ gives each flow bandwidth approximately proportional to its IP precedence value plus 1. So a flow with precedence 7 gets 8 times (7+1) the bandwidth of a flow with precedence 0 (0+1).

Lower-volume flows get relatively better service, and higher-volume flows get worse service. Higher-precedence flows get better service than lower-precedence flows. If lower-volume flows are given higher precedence values, their bandwidth/delay/jitter/loss characteristics improve even more.

WFQ uses sequence numbers (SN) to determine which packet should be served first: the packet with the lowest SN across all queues is served first.

SN calculation:

SN = Previous_SN + (weight * new_packet_length)

where "weight" is calculated as follows:

weight = 32384 / (IP_Precedence + 1)

A worked example follows the weight table below.

Precedence   Weight (after 12.0(5)T/12.1)
0            32384
1            16192
2            10794
3            8096
4            6476
5            5397
6            4626
7            4048
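
Worked example (illustrative packet sizes; queues assumed empty, so Previous_SN = 0):

1500-byte packet, precedence 0: SN = 0 + (32384 * 1500) = 48,576,000
1500-byte packet, precedence 5: SN = 0 + (5397 * 1500) = 8,095,500
100-byte packet, precedence 0: SN = 0 + (32384 * 100) = 3,238,400

The small precedence-0 packet goes first, then the precedence-5 packet, then the large precedence-0 packet, which shows how WFQ favors both small packets and high precedence.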

(The book's worked example figure is not reproduced here; in it, the order of packets out is 13, 14, 15, 16, 9, 5, 10, 1, 11, 6, 12, 2, 7, 8, 3, 4.)

WFQ places an absolute limit on the number of packets queued among all queues, called the hold-queue limit. If a new packet arrives and the hold-queue limit has been reached, the packet is discarded.

If a new packet arrives when the hold-queue limit has not been reached, but the depth of the queue the packet would join exceeds the congestive discard threshold (CDT), WFQ discards the packet with the highest SN among all queues, which may be in a different queue than the new arrival. (The original notes referenced a flow diagram of this logic, not reproduced here.)
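
The CDT, number of dynamic queues, and hold-queue limit can all be tuned (the values here are illustrative):

interface Serial0/0
 fair-queue 64 256 0
 ! CDT of 64, 256 dynamic flow queues, 0 RSVP-reservable queues
 hold-queue 500 out
 ! absolute packet limit across all WFQ queues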

WFQ has 8 "special queues" that have a very low weight; these are used for router-generated traffic (e.g. routing protocols) and RSVP (IntServ).

Class-Based WFQ (CBWFQ)

Define classes by using MQC.

One class can be configured to use WFQ inside it (the class-default queue).

Cisco 7500 series routers support either FIFO or WFQ inside each and every CBWFQ queue, whereas other platforms support WFQ only inside CBWFQ's class-default queue (the other class queues are FIFO).

Supports tail drop and Weighted Random Early Detection (WRED).

Supports up to 64 classes/queues.

The scheduler gives a percentage of the bandwidth to each class, based on the configured values.

Maximum queue length varies per router model and memory.

If the bandwidth statements configured within the policy map total more than 75 percent of the interface bandwidth, IOS rejects the service-policy when it is applied to the interface. This is where bandwidth percent is useful. A sketch follows.
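
A minimal CBWFQ sketch (class, policy, and interface names are illustrative):

class-map match-all web
 match protocol http
policy-map cbwfq-example
 class web
  bandwidth percent 30
  ! reserves 30 percent of the interface bandwidth for this class
 class class-default
  fair-queue
  ! WFQ inside the default class
interface Serial0/0
 bandwidth 1536
 service-policy output cbwfq-example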

Low Latency Queuing (LLQ)

Adds a priority queue to CBWFQ, which is always serviced first if it has a packet in it.

LLQ polices the priority queue: if the queue uses more than its allotted bandwidth, packets are discarded.

Multiple low-latency queues can be configured in one policy map. This enables you to police traffic more granularly, but it does not reorder packets among the various low-latency queues: all the LLQ classes are treated as one FIFO queue by the scheduler.

To create an LLQ configuration with some non-LLQ queues and subdivide the bandwidth amongst the non-LLQ classes based on percentages, use the bandwidth remaining percent command.

So remaining bandwidth = interface bandwidth - the 25 percent that is not reservable by default - the bandwidth given to the LLQ. A sketch follows.
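
A minimal LLQ sketch (class names and numbers are illustrative): the voice class is a priority queue policed at 128 kbps, and the non-LLQ classes split the remaining bandwidth 60/30, leaving 10 percent for class-default:

policy-map llq-example
 class voice
  priority 128
  ! LLQ: always served first, policed to 128 kbps
 class data1
  bandwidth remaining percent 60
 class data2
  bandwidth remaining percent 30
 class class-default
  fair-queue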

TCP Windowing

There are two window types:

Advertised window: the normal TCP window, granted by the receiver, which grows slowly over time unless a packet is dropped (not acknowledged). Windowing works by the receiver of the TCP connection changing the window size as time goes on (larger and larger).

Congestion window (CWND): varies in size much more quickly to react to congestion. It is never sent in any header; it is calculated and maintained by the sender.

The TCP sender always uses the smaller of the two as its effective window size.

A dropped packet is handled like this:

A TCP sender fails to receive an acknowledgment in time, signifying a possible lost packet.

The TCP sender sets CWND to the size of a single segment.

Another variable, called the slow start threshold (SSTHRESH), is set to 50 percent of the CWND value before the lost segment.

After CWND has been lowered, slow start governs how fast the CWND grows up until the CWND has been increased to the value of SSTHRESH.

After the slow start phase is complete, congestion avoidance governs how fast CWND grows after CWND > SSTHRESH.

Slow start increases CWND by the maximum segment size for every packet for which it receives an acknowledgment. Because TCP receivers may, and typically do, acknowledge segments well before the full window has been sent by the sender, CWND grows at an exponential rate.

At SSTHRESH, congestion avoidance sets in and forces linear growth (roughly one segment per round trip). A worked example follows.
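
Worked example (illustrative numbers): suppose CWND is 16 segments when a drop is detected. SSTHRESH is set to 8 segments and CWND to 1 segment. Slow start roughly doubles CWND each round trip (1, 2, 4, 8); once CWND reaches SSTHRESH (8), congestion avoidance grows it by about one segment per round trip (9, 10, 11, ...).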

Global synchronization is caused by many flows on the link following the above process at the same time: when tail drops occur, many flows lose packets simultaneously, all shrink their windows, and then all grow them again in lockstep, so the aggregate load on the link oscillates in a sawtooth pattern (the original notes included a graph of this here, not reproduced).

The overall rate does not drop to almost nothing, because not all TCP connections happen to have packets dropped when tail drop occurs, and some traffic uses UDP, which does not slow down in reaction to lost packets.

Random Early Detection (RED)

RED first measures the average queue depth; this is used to determine when the link is congested.

RED aims to avoid global synchronization, so it must drop packets in a controlled manner rather than in a jerky, sporadic fashion.

RED algorithm:

New_average = (Old_average * (1 - 2^-n)) + (Current_Q_depth * 2^-n)

The default for n is 9 (n is also known as the exponential weighting constant), so the current queue depth only accounts for about 0.2 percent (1/512) of the new average each time it is calculated.
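
Worked example (illustrative numbers): with n = 9 (2^-9 = 1/512), an old average of 20 packets, and a current actual depth of 60 packets:

New_average = (20 * 511/512) + (60 * 1/512) = 19.96 + 0.12 = 20.08

Even a burst to triple the old depth barely moves the average, so RED reacts to sustained congestion rather than momentary bursts.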

RED has two thresholds

Minimum threshold: above this, a percentage of packets is dropped. The drop percentage increases from 0 to a maximum percentage as the average depth moves from the minimum threshold to the maximum threshold.

Maximum threshold: above this, all new packets are discarded, similar to tail drop. While this action might seem like tail drop, technically it is not, because the actual queue might not be full yet.

In between the two thresholds, however, RED discards a percentage of packets, with the percentage growing linearly as the average queue depth grows.

RED randomly picks the packets that will be discarded.

Weighted RED (WRED)

WRED is much the same as RED, but it creates a drop profile for each traffic class (precedence or DSCP value), consisting of a minimum threshold, a maximum threshold, and a percentage of packets to discard.

Instead of directly configuring the discard percentage, you configure the Mark Probability Denominator (MPD); the maximum discard percentage is 1/MPD. For example, the default MPD of 10 gives a maximum drop rate of 1/10 = 10 percent at the maximum threshold.

WRED can be enabled directly on a physical interface; there it cannot be concurrently enabled along with any other queuing tool. When using the Modular QoS command-line interface (MQC) to configure queuing, WRED can instead be used on individual class queues.

WRED is keyed off IP precedence or DSCP, which means that even if you turn it on in a class within a policy map, packets in that queue must be marked with an IP precedence or DSCP value for the per-class drop profiles to be meaningful.

WRED calculates the average queue depth

WRED then compares the average queue depth to the minimum and maximum thresholds to decide whether it should discard packets.

Default drop settings vary by precedence/DSCP value (the book's table of defaults is not reproduced here). Class Selector DSCP values use the same WRED profile settings as their corresponding precedence values.

You cannot enable WRED inside a class configured as the low-latency queue

WRED only operates on FIFO queues, so for CBWFQ it operates within each class queue, as in the following example:

policy-map queue-on-dscp
 ! class-maps matching the named DSCP values are assumed to be defined elsewhere
 class dscp-ef
  priority 58
  ! low-latency queue; WRED cannot be enabled in this class
 class dscp-af21
  bandwidth 20
  random-detect dscp-based
 class dscp-af23
  bandwidth 8
  random-detect dscp-based
 class class-default
  fair-queue
  random-detect dscp-based

Explicit Congestion Notification (ECN)

Based on WRED, but instead of dropping a packet it sets the ECN bits in the packet's IP header and forwards it. The marking ultimately causes the sending host to reduce its congestion window (CWND) by 50 percent.

ECN causes the sender of the randomly chosen packet to slow down, as follows:

1. A TCP sender has negotiated a TCP connection, and both endpoints agree that they can support ECN. To indicate that support, the TCP sender sets the ECN bits to either 01 or 10. (If a TCP sender does not support ECN, the bits should be set to 00.)

2. The router uses WRED to recognize congestion, and the router randomly chooses this packet for discard. However, with ECN configured, the router checks the packet’s ECN bits, and finds them set to “01”. So, the router sets the bits to “11”, and forwards the packet instead of discarding it.

3. The packet continues on its trip to the TCP receiver.

4. The TCP receiver receives the packet, and notices ECN = 11. As a result, the receiver sets a TCP flag in the next TCP segment it sends back to the TCP sender. The flag is called the Explicit Congestion Experienced (ECE) flag. (The ECN bit settings are not important in the packet at step 4, but they can be used to signal congestion in the right-to-left direction in this example.)

5. The packet passes through the router just like any other packet—there is no need for the router to watch for this return packet, or to set any other bits at this point.

6. The TCP sender receives the TCP segment with TCP ECE flag set, telling it to slow down. So, the TCP sender reduces its congestion window (CWND) by half.

7. The TCP sender wants the TCP receiver to know that it "got the message" and slowed down. To do so, the TCP sender, in its next TCP segment, sets another new TCP flag called the Congestion Window Reduced (CWR) flag.

In summary, when WRED selects a packet for discard and ECN is configured: if the packet's ECN bits = 00, discard the packet; otherwise, set ECN = 11 and forward the packet.

Configuration needed (these lines sit inside a policy-map applied with service-policy output):

class class1
 bandwidth percent 50
 random-detect dscp-based
 ! DSCP-based WRED for the class
 random-detect ecn
 ! mark ECN-capable packets instead of dropping them

LAN Switching QoS

Switches can translate CoS into DSCP and vice versa.

You would translate into DSCP if a Layer 3 device needs to make a QoS decision.

You would translate to CoS if a switch needs to make a QoS decision (e.g. on a trunk link).

The mechanisms used for this translation are CoS-DSCP maps and DSCP-CoS maps

In the case of voice traffic, the translation maps need to be changed because a VoIP phone marks CoS and DSCP differently from the defaults. A phone marks CoS 3 and AF31 for signaling, and CoS 5 and EF for voice.

Based on the default CoS-to-DSCP map (CoS 0-7 map to DSCP 0, 8, 16, 24, 32, 40, 48, 56), the DSCP and CoS values do not align with the phone's markings: CoS 3 maps to DSCP 24 rather than AF31 (26), and CoS 5 maps to DSCP 40 rather than EF (46), so the map needs to be adjusted, for example:
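
A sketch using Catalyst mls qos syntax (platform assumed, e.g. 3550/3560): remap CoS 3 to DSCP 26 (AF31) and CoS 5 to DSCP 46 (EF):

mls qos map cos-dscp 0 8 16 26 32 46 48 56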

By default, the Ethernet interfaces on a Catalyst switch are in the untrusted state for QoS.

Any CoS value received on an untrusted interface is overwritten with a CoS of 0, and all DSCP markings are overwritten with 0 as well. To accept incoming markings, configure the interface to trust them, for example:
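
A sketch of trusting markings on a port (Catalyst mls qos syntax assumed; the interface name is illustrative):

mls qos
interface FastEthernet0/1
 mls qos trust cos
 ! or "mls qos trust dscp" to trust the Layer 3 marking instead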