Diff: draft-ietf-tsvwg-aqm-dualq-coupled-25.original - draft-ietf-tsvwg-aqm-dualq-coupled-26v2.txt

	draft-ietf-tsvwg-aqm-dualq-coupled-25.original		draft-ietf-tsvwg-aqm-dualq-coupled-26v2.txt

	Transport Area working group (tsvwg) K. De Schepper		Transport Area working group (tsvwg) K. De Schepper
	Internet-Draft Nokia Bell Labs		Internet-Draft Nokia Bell Labs
	Intended status: Experimental B. Briscoe, Ed.		Intended status: Experimental B. Briscoe, Ed.

	Expires: 2 March 2023 Independent		Expires: April 24, 2023 Independent
	G. White		G. White
	CableLabs		CableLabs

	29 August 2022		October 21, 2022

	DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput		DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
	(L4S)		(L4S)

	draft-ietf-tsvwg-aqm-dualq-coupled-25		draft-ietf-tsvwg-aqm-dualq-coupled-26

	Abstract		Abstract

	This specification defines a framework for coupling the Active Queue		This specification defines a framework for coupling the Active Queue
	Management (AQM) algorithms in two queues intended for flows with		Management (AQM) algorithms in two queues intended for flows with
	different responses to congestion. This provides a way for the		different responses to congestion. This provides a way for the
	Internet to transition from the scaling problems of standard TCP		Internet to transition from the scaling problems of standard TCP
	Reno-friendly ('Classic') congestion controls to the family of		Reno-friendly ('Classic') congestion controls to the family of
	'Scalable' congestion controls. These are designed for consistently		'Scalable' congestion controls. These are designed for consistently
	very Low queuing Latency, very Low congestion Loss and Scaling of		very Low queuing Latency, very Low congestion Loss and Scaling of

	skipping to change at page 1, line 49 ¶		skipping to change at page 1, line 49 ¶
	Internet-Drafts are working documents of the Internet Engineering		Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute		Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-		working documents as Internet-Drafts. The list of current Internet-
	Drafts is at https://datatracker.ietf.org/drafts/current/.		Drafts is at https://datatracker.ietf.org/drafts/current/.

	Internet-Drafts are draft documents valid for a maximum of six months		Internet-Drafts are draft documents valid for a maximum of six months
	and may be updated, replaced, or obsoleted by other documents at any		and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference		time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."		material or to cite them other than as "work in progress."


	This Internet-Draft will expire on 2 March 2023.		This Internet-Draft will expire on April 24, 2023.

	Copyright Notice		Copyright Notice

	Copyright (c) 2022 IETF Trust and the persons identified as the		Copyright (c) 2022 IETF Trust and the persons identified as the
	document authors. All rights reserved.		document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal		This document is subject to BCP 78 and the IETF Trust's Legal

	Provisions Relating to IETF Documents (https://trustee.ietf.org/		Provisions Relating to IETF Documents
	license-info) in effect on the date of publication of this document.		(https://trustee.ietf.org/license-info) in effect on the date of
	Please review these documents carefully, as they describe your rights		publication of this document. Please review these documents
	and restrictions with respect to this document. Code Components		carefully, as they describe your rights and restrictions with respect
	extracted from this document must include Revised BSD License text as		to this document. Code Components extracted from this document must
	described in Section 4.e of the Trust Legal Provisions and are		include Simplified BSD License text as described in Section 4.e of
	provided without warranty as described in the Revised BSD License.		the Trust Legal Provisions and are provided without warranty as
			described in the Simplified BSD License.

	Table of Contents		Table of Contents

	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3		1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
	1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3		1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3
	1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6		1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6
	1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7		1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
	1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9		1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9
	2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11		2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11
	2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11		2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11
	2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12		2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12
	2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12		2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12
	2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13		2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13
	2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17		2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17
	2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17		2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17
	2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18		2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18
	2.5.2. Management Requirements . . . . . . . . . . . . . . . 19		2.5.2. Management Requirements . . . . . . . . . . . . . . . 19
	2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19		2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19
	2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21		2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21

	2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22		2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21
	2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22		2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22
	3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22		3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
	4. Security Considerations . . . . . . . . . . . . . . . . . . . 22		4. Security Considerations . . . . . . . . . . . . . . . . . . . 22
	4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22		4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22
	4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23		4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23
	4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24		4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24
	4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S		4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S

	Throughput or Delay? . . . . . . . . . . . . . . . . 25		Throughput or Delay? . . . . . . . . . . . . . . . . 24
	4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26		4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26

	4.2.3.1. Protecting against Overload by Unresponsive		4.2.3.1. Protecting against Overload by Unresponsive ECN-
	ECN-Capable Traffic . . . . . . . . . . . . . . . . 28		Capable Traffic . . . . . . . . . . . . . . . . . 27
	5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28		5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28
	5.1. Normative References . . . . . . . . . . . . . . . . . . 28		5.1. Normative References . . . . . . . . . . . . . . . . . . 28

	5.2. Informative References . . . . . . . . . . . . . . . . . 29		5.2. Informative References . . . . . . . . . . . . . . . . . 28
	Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35
			Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 34
	A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35		A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35

	A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 46		A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 45
	Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 51		Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 50
	B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 51		B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 50
	B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 57		B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 56
	Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 59		Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 58
	C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 59		C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 58
	C.2. Guidance on Controlling Throughput Equivalence . . . . . 60		C.2. Guidance on Controlling Throughput Equivalence . . . . . 59
	Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 64		Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 63
	Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 64		Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 63
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 65		Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64

	1. Introduction		1. Introduction

	This document specifies a framework for DualQ Coupled AQMs, which can		This document specifies a framework for DualQ Coupled AQMs, which can
	serve as the network part of the L4S		serve as the network part of the L4S
	architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists		architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists
	of two queues; L4S and Classic. The L4S queue is intended for		of two queues; L4S and Classic. The L4S queue is intended for
	Scalable congestion controls that can maintain very low queuing		Scalable congestion controls that can maintain very low queuing
	latency (sub-millisecond on average) and high throughput at the same		latency (sub-millisecond on average) and high throughput at the same
	time. The Coupled DualQ acts like a semi-permeable membrane: the L4S		time. The Coupled DualQ acts like a semi-permeable membrane: the L4S

	skipping to change at page 4, line 15 ¶		skipping to change at page 4, line 16 ¶
	now it has not been possible to allow any number of low latency, high		now it has not been possible to allow any number of low latency, high
	throughput applications to seek to fully utilize available capacity,		throughput applications to seek to fully utilize available capacity,
	because the capacity-seeking process itself causes too much queuing		because the capacity-seeking process itself causes too much queuing
	delay.		delay.

	To reduce this queuing delay caused by the capacity seeking process,		To reduce this queuing delay caused by the capacity seeking process,
	changes either to the network alone or to end-systems alone are in		changes either to the network alone or to end-systems alone are in
	progress. L4S involves a recognition that both approaches are		progress. L4S involves a recognition that both approaches are
	yielding diminishing returns:		yielding diminishing returns:


	* Recent state-of-the-art active queue management (AQM) in the		o Recent state-of-the-art active queue management (AQM) in the
	network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive		network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive
	RED [ARED01] ) has reduced queuing delay for all traffic, not just		RED [ARED01] ) has reduced queuing delay for all traffic, not just
	a select few applications. However, no matter how good the AQM,		a select few applications. However, no matter how good the AQM,
	the capacity-seeking (sawtoothing) rate of TCP-like congestion		the capacity-seeking (sawtoothing) rate of TCP-like congestion
	controls represents a lower limit that will either cause queuing		controls represents a lower limit that will either cause queuing
	delay to vary or cause the link to be under-utilized. These AQMs		delay to vary or cause the link to be under-utilized. These AQMs
	are tuned to allow a typical capacity-seeking Reno-friendly flow		are tuned to allow a typical capacity-seeking Reno-friendly flow
	to induce an average queue that roughly doubles the base RTT,		to induce an average queue that roughly doubles the base RTT,
	adding 5-15 ms of queuing on average (cf. 500 microseconds with		adding 5-15 ms of queuing on average (cf. 500 microseconds with
	L4S for the same mix of long-running and web traffic). However,		L4S for the same mix of long-running and web traffic). However,
	for many applications low delay is not useful unless it is		for many applications low delay is not useful unless it is
	consistently low. With these AQMs, 99th percentile queuing delay		consistently low. With these AQMs, 99th percentile queuing delay
	is 20-30 ms (cf. 2 ms with the same traffic over L4S).		is 20-30 ms (cf. 2 ms with the same traffic over L4S).


	* Similarly, recent research into using e2e congestion control		o Similarly, recent research into using e2e congestion control
	without needing an AQM in the network (e.g. BBR		without needing an AQM in the network (e.g. BBR
	[I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a		[I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a
	similar lower limit to queuing delay of about 20ms on average, but		similar lower limit to queuing delay of about 20ms on average, but
	there are also regular 25ms delay spikes due to bandwidth probes		there are also regular 25ms delay spikes due to bandwidth probes
	and 60ms spikes due to flow-starts.		and 60ms spikes due to flow-starts.

	L4S learns from the experience of Data Center TCP [RFC8257], which		L4S learns from the experience of Data Center TCP [RFC8257], which
	shows the power of complementary changes both in the network and on		shows the power of complementary changes both in the network and on
	end-systems. DCTCP teaches us that two small but radical changes to		end-systems. DCTCP teaches us that two small but radical changes to
	congestion control are needed to cut the two major outstanding causes		congestion control are needed to cut the two major outstanding causes

	skipping to change at page 7, line 14 ¶		skipping to change at page 7, line 14 ¶

	For the public Internet, nearly all the benefit will typically be		For the public Internet, nearly all the benefit will typically be
	achieved by deploying the Coupled AQM into either end of the access		achieved by deploying the Coupled AQM into either end of the access
	link between a 'site' and the Internet, which is invariably the		link between a 'site' and the Internet, which is invariably the
	bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about		bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about
	deployment, which also defines the term 'site' to mean a home, an		deployment, which also defines the term 'site' to mean a home, an
	office, a campus or mobile user equipment).		office, a campus or mobile user equipment).

	Latency is not the only concern of L4S:		Latency is not the only concern of L4S:


	* The "Low Loss" part of the name denotes that L4S generally		o The "Low Loss" part of the name denotes that L4S generally
	achieves zero congestion loss (which would otherwise cause		achieves zero congestion loss (which would otherwise cause
	retransmission delays), due to its use of ECN.		retransmission delays), due to its use of ECN.


	* The "Scalable throughput" part of the name denotes that the per-		o The "Scalable throughput" part of the name denotes that the per-
	flow throughput of Scalable congestion controls should scale		flow throughput of Scalable congestion controls should scale
	indefinitely, avoiding the imminent scaling problems with 'TCP-		indefinitely, avoiding the imminent scaling problems with 'TCP-
	Friendly' congestion control algorithms [RFC3649].		Friendly' congestion control algorithms [RFC3649].

	The former is clearly in scope of this AQM document. However, the		The former is clearly in scope of this AQM document. However, the
	latter is an outcome of the end-system behaviour, and therefore		latter is an outcome of the end-system behaviour, and therefore
	outside the scope of this AQM document, even though the AQM is an		outside the scope of this AQM document, even though the AQM is an
	enabler.		enabler.

	The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more		The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more
	detail, including on wider deployment aspects such as backwards		detail, including on wider deployment aspects such as backwards
	compatibility of Scalable congestion controls in bottlenecks where a		compatibility of Scalable congestion controls in bottlenecks where a
	DualQ Coupled AQM has not been deployed. The supporting papers		DualQ Coupled AQM has not been deployed. The supporting papers

	[DualPI2Linux], [PI2], [DCttH19] and [PI2param] give the full		[L4Seval22], [DualPI2Linux], [PI2] and [PI2param] give the full
	rationale for the AQM's design, both discursively and in more precise		rationale for the AQM's design, both discursively and in more precise
	mathematical form, as well as the results of performance evaluations.		mathematical form, as well as the results of performance evaluations.
	The main results have been validated independently when using the		The main results have been validated independently when using the
	Prague congestion control [Boru20] (experiments are run using Prague		Prague congestion control [Boru20] (experiments are run using Prague
	and DCTCP, but only the former are relevant for validation, because		and DCTCP, but only the former are relevant for validation, because
	Prague fixes a number of problems with the Linux DCTCP code that make		Prague fixes a number of problems with the Linux DCTCP code that make
	it unsuitable for the public Internet).		it unsuitable for the public Internet).

	1.3. Terminology		1.3. Terminology


	skipping to change at page 8, line 46 ¶		skipping to change at page 8, line 48 ¶
	averages 2 congestion signals per round-trip whatever the flow		averages 2 congestion signals per round-trip whatever the flow
	rate, as do other recently developed scalable congestion controls,		rate, as do other recently developed scalable congestion controls,
	e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague		e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague
	[I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux],		[I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux],
	BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the		BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the
	L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]).		L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]).
	For the public Internet a Scalable transport has to comply with		For the public Internet a Scalable transport has to comply with
	the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id]		the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id]
	(aka. the 'Prague L4S requirements').		(aka. the 'Prague L4S requirements').


	C: Abbreviation for Classic, e.g. when used as a subscript.		C: Abbreviation for Classic, e.g. when used as a subscript.


	L: Abbreviation for L4S, e.g. when used as a subscript.		L: Abbreviation for L4S, e.g. when used as a subscript.

	The terms Classic or L4S can also qualify other nouns, such as		The terms Classic or L4S can also qualify other nouns, such as
	'codepoint', 'identifier', 'classification', 'packet', 'flow'.		'codepoint', 'identifier', 'classification', 'packet', 'flow'.
	For example: an L4S packet means a packet with an L4S identifier		For example: an L4S packet means a packet with an L4S identifier
	sent from an L4S congestion control.		sent from an L4S congestion control.

	Both Classic and L4S services can cope with a proportion of		Both Classic and L4S services can cope with a proportion of
	unresponsive or less-responsive traffic as well, but in the L4S		unresponsive or less-responsive traffic as well, but in the L4S
	case its rate has to be smooth enough or low enough not to build a		case its rate has to be smooth enough or low enough not to build a
	queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ		queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ

	skipping to change at page 10, line 6 ¶		skipping to change at page 10, line 6 ¶
	traffic.		traffic.

	Thousands of tests have been conducted in a typical fixed residential		Thousands of tests have been conducted in a typical fixed residential
	broadband setting. Experiments used a range of base round trip		broadband setting. Experiments used a range of base round trip
	delays up to 100ms and link rates up to 200 Mb/s between the data		delays up to 100ms and link rates up to 200 Mb/s between the data
	centre and home network, with varying amounts of background traffic		centre and home network, with varying amounts of background traffic
	in both queues. For every L4S packet, the AQM kept the average		in both queues. For every L4S packet, the AQM kept the average
	queuing delay below 1ms (or 2 packets where serialization delay		queuing delay below 1ms (or 2 packets where serialization delay
	exceeded 1ms on slower links), with 99th percentile no worse than		exceeded 1ms on slower links), with 99th percentile no worse than
	2ms. No losses at all were introduced by the L4S AQM. Details of		2ms. No losses at all were introduced by the L4S AQM. Details of

	the extensive experiments are available [DualPI2Linux], [PI2],		the extensive experiments are available [L4Seval22], [DualPI2Linux].
	[DCttH19]. Subjective testing using very demanding high bandwidth		Subjective testing using very demanding high bandwidth low latency
	low latency applications over a single shared access link is also		applications over a single shared access link is also described
	described in [L4Sdemo16] and summarized in the section about		in [L4Sdemo16] and summarized in the section about applications in
	applications in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] .		the L4S architecture [I-D.ietf-tsvwg-l4s-arch] .

	In all these experiments, the host was connected to the home network		In all these experiments, the host was connected to the home network
	by fixed Ethernet, in order to quantify the queuing delay that can be		by fixed Ethernet, in order to quantify the queuing delay that can be
	achieved by a user who cares about delay. It should be emphasized		achieved by a user who cares about delay. It should be emphasized
	that L4S support at the bottleneck link cannot 'undelay' bursts		that L4S support at the bottleneck link cannot 'undelay' bursts
	introduced by another link on the path, for instance by legacy Wi-Fi		introduced by another link on the path, for instance by legacy Wi-Fi
	equipment. However, if L4S support is added to the queue feeding the		equipment. However, if L4S support is added to the queue feeding the
	_outgoing_ WAN link of a home gateway, it would be counterproductive		_outgoing_ WAN link of a home gateway, it would be counterproductive
	not to also reduce the burstiness of the _incoming_ Wi-Fi. Also,		not to also reduce the burstiness of the _incoming_ Wi-Fi. Also,
	trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the		trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the

	skipping to change at page 10, line 37 ¶		skipping to change at page 10, line 37 ¶
	achieve low delay. The L4S queue can be filled with a heavy load of		achieve low delay. The L4S queue can be filled with a heavy load of
	capacity-seeking flows (TCP Prague etc.) and still achieve low delay.		capacity-seeking flows (TCP Prague etc.) and still achieve low delay.
	The L4S queue does not rely on the presence of other traffic in the		The L4S queue does not rely on the presence of other traffic in the
	Classic queue that can be 'overtaken'. It gives low latency to L4S		Classic queue that can be 'overtaken'. It gives low latency to L4S
	traffic whether or not there is Classic traffic. The tail latency of		traffic whether or not there is Classic traffic. The tail latency of
	traffic served by the Classic AQM is sometimes a little better		traffic served by the Classic AQM is sometimes a little better
	sometimes a little worse, when a proportion of the traffic is L4S.		sometimes a little worse, when a proportion of the traffic is L4S.

	The two queues are only necessary because:		The two queues are only necessary because:


	* the large variations (sawteeth) of Classic flows need roughly a		o the large variations (sawteeth) of Classic flows need roughly a
	base RTT of queuing delay to ensure full utilization		base RTT of queuing delay to ensure full utilization


	* Scalable flows do not need a queue to keep utilization high, but		o Scalable flows do not need a queue to keep utilization high, but
	they cannot keep latency predictably low if they are mixed with		they cannot keep latency predictably low if they are mixed with
	Classic traffic,		Classic traffic,

	The L4S queue has latency priority within sub-round trip timescales,		The L4S queue has latency priority within sub-round trip timescales,
	but over longer periods the coupling from the Classic to the L4S AQM		but over longer periods the coupling from the Classic to the L4S AQM
	(explained below) ensures that it does not have bandwidth priority		(explained below) ensures that it does not have bandwidth priority
	over the Classic queue.		over the Classic queue.

	2. DualQ Coupled AQM		2. DualQ Coupled AQM

	There are two main aspects to the approach:		There are two main aspects to the approach:


	* The Coupled AQM that addresses throughput equivalence between		o The Coupled AQM that addresses throughput equivalence between
	Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the		Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the
	Prague L4S requirements).		Prague L4S requirements).


	* The Dual Queue structure that provides latency separation for L4S		o The Dual Queue structure that provides latency separation for L4S
	flows to isolate them from the typically large Classic queue.		flows to isolate them from the typically large Classic queue.

	2.1. Coupled AQM		2.1. Coupled AQM

	In the 1990s, the `TCP formula' was derived for the relationship		In the 1990s, the `TCP formula' was derived for the relationship
	between the steady-state congestion window, cwnd, and the drop		between the steady-state congestion window, cwnd, and the drop
	probability, p of standard Reno congestion control [RFC5681]. To a		probability, p of standard Reno congestion control [RFC5681]. To a
	first order approximation, the steady-state cwnd of Reno is inversely		first order approximation, the steady-state cwnd of Reno is inversely
	proportional to the square root of p.		proportional to the square root of p.


	skipping to change at page 15, line 28 ¶		skipping to change at page 15, line 28 ¶
	`----------'\\ \| AQM \|---->: ,'\|`-.___.-'		`----------'\\ \| AQM \|---->: ,'\|`-.___.-'
	\\ \| \|p' \| <' \|		\\ \| \|p' \| <' \|
	\\ `-------' (p'^2) //`'		\\ `-------' (p'^2) //`'
	\\ ^ \| //		\\ ^ \| //
	\\,. \| v p_C //		\\,. \| v p_C //
	< \| _________ .------.//		< \| _________ .------.//
	`\\| \| \| \| Drop \|/		`\\| \| \| \| Drop \|/
	Classic (C) \|queue \|===>\|/mark \|		Classic (C) \|queue \|===>\|/mark \|
	__\|______\| `------'		__\|______\| `------'


	Figure 1: DualQ Coupled AQM Schematic

	Legend: ===> traffic flow; ---> control dependency.		Legend: ===> traffic flow; ---> control dependency.


			Figure 1: DualQ Coupled AQM Schematic

	After the AQMs have applied their dropping or marking, the scheduler		After the AQMs have applied their dropping or marking, the scheduler
	forwards their packets to the link. Even though the scheduler gives		forwards their packets to the link. Even though the scheduler gives
	priority to the L queue, it is not as strong as the coupling from the		priority to the L queue, it is not as strong as the coupling from the
	C queue. This is because, as the C queue grows, the base AQM applies		C queue. This is because, as the C queue grows, the base AQM applies
	more congestion signals to L traffic (as well as C). As L flows		more congestion signals to L traffic (as well as C). As L flows
	reduce their rate in response, they use less than the scheduling		reduce their rate in response, they use less than the scheduling
	share for L traffic. So, because the scheduler is work preserving,		share for L traffic. So, because the scheduler is work preserving,
	it schedules any C traffic in the gaps.		it schedules any C traffic in the gaps.

	Giving priority to the L queue has the benefit of very low L queue		Giving priority to the L queue has the benefit of very low L queue

	skipping to change at page 18, line 41 ¶		skipping to change at page 18, line 41 ¶
	2.5.1.1. Requirements in Unexpected Cases		2.5.1.1. Requirements in Unexpected Cases

	The flexibility to allow operator-specific classifiers (Section 2.3)		The flexibility to allow operator-specific classifiers (Section 2.3)
	leads to the need to specify what the AQM in each queue ought to do		leads to the need to specify what the AQM in each queue ought to do
	with packets that do not carry the ECN field expected for that queue.		with packets that do not carry the ECN field expected for that queue.
	It is expected that the AQM in each queue will inspect the ECN field		It is expected that the AQM in each queue will inspect the ECN field
	to determine what sort of congestion notification to signal, then it		to determine what sort of congestion notification to signal, then it
	will decide whether to apply congestion notification to this		will decide whether to apply congestion notification to this
	particular packet, as follows:		particular packet, as follows:


	* If a packet that does not carry an ECT(1) or CE codepoint is		o If a packet that does not carry an ECT(1) or CE codepoint is
	classified into the L queue:		classified into the L queue:


	- if the packet is ECT(0), the L AQM SHOULD apply CE-marking		* if the packet is ECT(0), the L AQM SHOULD apply CE-marking
	using a probability appropriate to Classic congestion control		using a probability appropriate to Classic congestion control
	and appropriate to the target delay in the L queue		and appropriate to the target delay in the L queue


	- if the packet is Not-ECT, the appropriate action depends on		* if the packet is Not-ECT, the appropriate action depends on
	whether some other function is protecting the L queue from		whether some other function is protecting the L queue from
	misbehaving flows (e.g. per-flow queue protection		misbehaving flows (e.g. per-flow queue protection
	[I-D.briscoe-docsis-q-protection] or latency policing):		[I-D.briscoe-docsis-q-protection] or latency policing):


	o If separate queue protection is provided, the L AQM SHOULD		+ If separate queue protection is provided, the L AQM SHOULD
	ignore the packet and forward it unchanged, meaning it		ignore the packet and forward it unchanged, meaning it
	should not calculate whether to apply congestion		should not calculate whether to apply congestion
	notification and it should neither drop nor CE-mark the		notification and it should neither drop nor CE-mark the
	packet (for instance, the operator might classify EF traffic		packet (for instance, the operator might classify EF traffic
	that is unresponsive to drop into the L queue, alongside		that is unresponsive to drop into the L queue, alongside
	responsive L4S-ECN traffic)		responsive L4S-ECN traffic)


	o if separate queue protection is not provided, the L AQM		+ if separate queue protection is not provided, the L AQM
	SHOULD apply drop using a drop probability appropriate to		SHOULD apply drop using a drop probability appropriate to
	Classic congestion control and appropriate to the target		Classic congestion control and appropriate to the target
	delay in the L queue		delay in the L queue


	* If a packet that carries an ECT(1) codepoint is classified into		o If a packet that carries an ECT(1) codepoint is classified into
	the C queue:		the C queue:


	- the C AQM SHOULD apply CE-marking using the coupled AQM		* the C AQM SHOULD apply CE-marking using the coupled AQM
	probability p_CL (= k*p').		probability p_CL (= k*p').

	The above requirements are worded as "SHOULDs", because operator-		The above requirements are worded as "SHOULDs", because operator-
	specific classifiers are for flexibility, by definition. Therefore,		specific classifiers are for flexibility, by definition. Therefore,
	alternative actions might be appropriate in the operator's specific		alternative actions might be appropriate in the operator's specific
	circumstances. An example would be where the operator knows that		circumstances. An example would be where the operator knows that
	certain legacy traffic marked with one codepoint actually has a		certain legacy traffic marked with one codepoint actually has a
	congestion response associated with another codepoint.		congestion response associated with another codepoint.

	If the DualQ Coupled AQM has detected overload, it MUST introduce		If the DualQ Coupled AQM has detected overload, it MUST introduce

	skipping to change at page 20, line 5 ¶		skipping to change at page 19, line 47 ¶

	2.5.2. Management Requirements		2.5.2. Management Requirements

	2.5.2.1. Configuration		2.5.2.1. Configuration

	By default, a DualQ Coupled AQM SHOULD NOT need any configuration for		By default, a DualQ Coupled AQM SHOULD NOT need any configuration for
	use at a bottleneck on the public Internet [RFC7567]. The following		use at a bottleneck on the public Internet [RFC7567]. The following
	parameters MAY be operator-configurable, e.g. to tune for non-		parameters MAY be operator-configurable, e.g. to tune for non-
	Internet settings:		Internet settings:


	* Optional packet classifier(s) to use in addition to the ECN field		o Optional packet classifier(s) to use in addition to the ECN field
	(see Section 2.3);		(see Section 2.3);


	* Expected typical RTT, which can be used to determine the queuing		o Expected typical RTT, which can be used to determine the queuing
	delay of the Classic AQM at its operating point, in order to		delay of the Classic AQM at its operating point, in order to
	prevent typical lone flows from under-utilizing capacity. For		prevent typical lone flows from under-utilizing capacity. For
	example:		example:


	- for the PI2 algorithm (Appendix A) the queuing delay target is		* for the PI2 algorithm (Appendix A) the queuing delay target is
	dependent on the typical RTT;		dependent on the typical RTT;


	- for the Curvy RED algorithm (Appendix B) the queuing delay at		* for the Curvy RED algorithm (Appendix B) the queuing delay at
	the desired operating point of the curvy ramp is configured to		the desired operating point of the curvy ramp is configured to
	encompass a typical RTT;		encompass a typical RTT;


	- if another Classic AQM was used, it would be likely to need an		* if another Classic AQM was used, it would be likely to need an
	operating point for the queue based on the typical RTT, and if		operating point for the queue based on the typical RTT, and if
	so it SHOULD be expressed in units of time.		so it SHOULD be expressed in units of time.

	An operating point that is manually calculated might be directly		An operating point that is manually calculated might be directly
	configurable instead, e.g. for links with large numbers of flows		configurable instead, e.g. for links with large numbers of flows
	where under-utilization by a single flow would be unlikely.		where under-utilization by a single flow would be unlikely.


	* Expected maximum RTT, which can be used to set the stability		o Expected maximum RTT, which can be used to set the stability
	parameter(s) of the Classic AQM. For example:		parameter(s) of the Classic AQM. For example:


	- for the PI2 algorithm (Appendix A), the gain parameters of the		* for the PI2 algorithm (Appendix A), the gain parameters of the
	PI algorithm depend on the maximum RTT.		PI algorithm depend on the maximum RTT.


	- for the Curvy RED algorithm (Appendix B) the smoothing		* for the Curvy RED algorithm (Appendix B) the smoothing
	parameter is chosen to filter out transients in the queue		parameter is chosen to filter out transients in the queue
	within a maximum RTT.		within a maximum RTT.

	Stability parameter(s) that are manually calculated assuming a		Stability parameter(s) that are manually calculated assuming a
	maximum RTT might be directly configurable instead.		maximum RTT might be directly configurable instead.


	* Coupling factor, k (see Appendix C.2);		o Coupling factor, k (see Appendix C.2);


	* A limit to the conditional priority of L4S. This is scheduler-		o A limit to the conditional priority of L4S. This is scheduler-
	dependent, but it SHOULD be expressed as a relation between the		dependent, but it SHOULD be expressed as a relation between the
	max delay of a C packet and an L packet. For example:		max delay of a C packet and an L packet. For example:


	- for a WRR scheduler a weight ratio between L and C of w:1 means		* for a WRR scheduler a weight ratio between L and C of w:1 means
	that the maximum delay to a C packet is w times that of an L		that the maximum delay to a C packet is w times that of an L
	packet.		packet.


	- for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2)		* for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2)
	a time-shift of tshift means that the maximum delay to a C		a time-shift of tshift means that the maximum delay to a C
	packet is tshift greater than that of an L packet. tshift could		packet is tshift greater than that of an L packet. tshift could
	be expressed as a multiple of the typical RTT rather than as an		be expressed as a multiple of the typical RTT rather than as an
	absolute delay.		absolute delay.


	* The maximum Classic ECN marking probability, p_Cmax, before		o The maximum Classic ECN marking probability, p_Cmax, before
	introducing drop.		introducing drop.

	2.5.2.2. Monitoring		2.5.2.2. Monitoring

	An experimental DualQ Coupled AQM SHOULD allow the operator to		An experimental DualQ Coupled AQM SHOULD allow the operator to
	monitor each of the following operational statistics on demand, per		monitor each of the following operational statistics on demand, per
	queue and per configurable sample interval, for performance		queue and per configurable sample interval, for performance
	monitoring and perhaps also for accounting in some cases:		monitoring and perhaps also for accounting in some cases:


	* Bits forwarded, from which utilization can be calculated;		o Bits forwarded, from which utilization can be calculated;


	* Total packets in the three categories: arrived, presented to the		o Total packets in the three categories: arrived, presented to the
	AQM, and forwarded. The difference between the first two will		AQM, and forwarded. The difference between the first two will
	measure any non-AQM tail discard. The difference between the last		measure any non-AQM tail discard. The difference between the last
	two will measure proactive AQM discard;		two will measure proactive AQM discard;


	* ECN packets marked, non-ECN packets dropped, ECN packets dropped,		o ECN packets marked, non-ECN packets dropped, ECN packets dropped,
	which can be combined with the three total packet counts above to		which can be combined with the three total packet counts above to
	calculate marking and dropping probabilities;		calculate marking and dropping probabilities;


	* Queue delay (not including serialization delay of the head packet		o Queue delay (not including serialization delay of the head packet
	or medium acquisition delay) - see further notes below.		or medium acquisition delay) - see further notes below.

	Unlike the other statistics, queue delay cannot be captured in a		Unlike the other statistics, queue delay cannot be captured in a
	simple accumulating counter. Therefore, the type of queue delay		simple accumulating counter. Therefore, the type of queue delay
	statistics produced (mean, percentiles, etc.) will depend on		statistics produced (mean, percentiles, etc.) will depend on
	implementation constraints. To facilitate comparative evaluation		implementation constraints. To facilitate comparative evaluation
	of different implementations and approaches, an implementation		of different implementations and approaches, an implementation
	SHOULD allow mean and 99th percentile queue delay to be derived		SHOULD allow mean and 99th percentile queue delay to be derived
	(per queue per sample interval). A relatively simple way to do		(per queue per sample interval). A relatively simple way to do
	this would be to store a coarse-grained histogram of queue delay.		this would be to store a coarse-grained histogram of queue delay.

	skipping to change at page 22, line 10 ¶		skipping to change at page 21, line 49 ¶
	a sample interval, each bin would accumulate a count of the number		a sample interval, each bin would accumulate a count of the number
	of packets that had fallen within each range. The maximum queue		of packets that had fallen within each range. The maximum queue
	delay per queue per interval MAY also be recorded, to aid		delay per queue per interval MAY also be recorded, to aid
	diagnosis of faults and anomalous events.		diagnosis of faults and anomalous events.

	2.5.2.3. Anomaly Detection		2.5.2.3. Anomaly Detection

	An experimental DualQ Coupled AQM SHOULD asynchronously report the		An experimental DualQ Coupled AQM SHOULD asynchronously report the
	following data about anomalous conditions:		following data about anomalous conditions:


	* Start-time and duration of overload state.		o Start-time and duration of overload state.

	A hysteresis mechanism SHOULD be used to prevent flapping in and		A hysteresis mechanism SHOULD be used to prevent flapping in and
	out of overload causing an event storm. For instance, exit from		out of overload causing an event storm. For instance, exit from
	overload state could trigger one report, but also latch a timer.		overload state could trigger one report, but also latch a timer.
	Then, during that time, if the AQM enters and exits overload state		Then, during that time, if the AQM enters and exits overload state
	any number of times, the duration in overload state is		any number of times, the duration in overload state is
	accumulated, but no new report is generated until the first time		accumulated, but no new report is generated until the first time
	the AQM is out of overload once the timer has expired.		the AQM is out of overload once the timer has expired.

	2.5.2.4. Deployment, Coexistence and Scaling		2.5.2.4. Deployment, Coexistence and Scaling

	skipping to change at page 24, line 5 ¶		skipping to change at page 23, line 38 ¶

	A trade-off needs to be made between complexity and the risk of		A trade-off needs to be made between complexity and the risk of
	either traffic class harming the other. In overloaded conditions the		either traffic class harming the other. In overloaded conditions the
	higher priority L4S service will have to sacrifice some aspect of its		higher priority L4S service will have to sacrifice some aspect of its
	performance. Depending on the degree of overload, alternative		performance. Depending on the degree of overload, alternative
	solutions may relax a different factor: e.g. throughput, delay, drop.		solutions may relax a different factor: e.g. throughput, delay, drop.
	These choices need to be made either by the developer or by operator		These choices need to be made either by the developer or by operator
	policy, rather than by the IETF. Subsequent subsections discuss		policy, rather than by the IETF. Subsequent subsections discuss
	aspects relating to handling of different degrees of overload:		aspects relating to handling of different degrees of overload:


	* Unresponsive flows (L and/or C) but not overloaded, i.e. the sum		o Unresponsive flows (L and/or C) but not overloaded, i.e. the sum
	of unresponsive load before adding any responsive traffic is below		of unresponsive load before adding any responsive traffic is below
	capacity;		capacity;

	This case is handled by the regular Coupled DualQ (Section 2.1)		This case is handled by the regular Coupled DualQ (Section 2.1)
	but not discussed there. So below, Section 4.2.1 explains the		but not discussed there. So below, Section 4.2.1 explains the
	design goal, and how it is achieved in practice;		design goal, and how it is achieved in practice;


	* Unresponsive flows (L and/or C) causing persistent overload,		o Unresponsive flows (L and/or C) causing persistent overload,
	i.e. the sum of unresponsive load even before adding any		i.e. the sum of unresponsive load even before adding any
	responsive traffic persistently exceeds capacity;		responsive traffic persistently exceeds capacity;

	This case is not covered by the regular Coupled DualQ mechanism		This case is not covered by the regular Coupled DualQ mechanism
	(Section 2.1) but the last para in Section 2.5.1.1 sets out a		(Section 2.1) but the last para in Section 2.5.1.1 sets out a
	requirement to handle the case where ECN-capable traffic could		requirement to handle the case where ECN-capable traffic could
	starve non-ECN-capable traffic. Section 4.2.3 below discusses		starve non-ECN-capable traffic. Section 4.2.3 below discusses
	the general options and gives specific examples.		the general options and gives specific examples.


	* Short-term overload that lies between the 'not overloaded' and		o Short-term overload that lies between the 'not overloaded' and
	'persistently overloaded' cases.		'persistently overloaded' cases.

	For the period before overload is deemed persistent,		For the period before overload is deemed persistent,
	Section 4.2.2 discusses options for more immediate mechanisms		Section 4.2.2 discusses options for more immediate mechanisms
	at the scheduler timescale. These prevent short-term		at the scheduler timescale. These prevent short-term
	starvation of the C queue by making the priority of the L queue		starvation of the C queue by making the priority of the L queue
	conditional, as required in Section 2.5.1.		conditional, as required in Section 2.5.1.

	4.2.1. Unresponsive Traffic without Overload		4.2.1. Unresponsive Traffic without Overload

	When one or more L flows and/or C flows are unresponsive, but their		When one or more L flows and/or C flows are unresponsive, but their
	total load is within the link capacity so that they do not saturate		total load is within the link capacity so that they do not saturate
	the coupled marking (below 100%), the goal of a DualQ AQM is to		the coupled marking (below 100%), the goal of a DualQ AQM is to
	behave no worse than a single-queue AQM.		behave no worse than a single-queue AQM.

	Tests have shown that this is indeed the case with no additional		Tests have shown that this is indeed the case with no additional
	mechanism beyond the regular Coupled DualQ of Section 2.1 (see the		mechanism beyond the regular Coupled DualQ of Section 2.1 (see the

	results of 'overload experiments' in [DCttH19]). Perhaps counter-		results of 'overload experiments' in [L4Seval22]). Perhaps counter-
	intuitively, whether the unresponsive flow classifies itself into the		intuitively, whether the unresponsive flow classifies itself into the
	L or the C queue, the DualQ system behaves as if it has subtracted		L or the C queue, the DualQ system behaves as if it has subtracted
	from the overall link capacity. Then, the coupling shares out the		from the overall link capacity. Then, the coupling shares out the
	remaining capacity between any competing responsive flows (in either		remaining capacity between any competing responsive flows (in either
	queue). See also Section 4.2.2, which discusses scheduler-specific		queue). See also Section 4.2.2, which discusses scheduler-specific
	details.		details.

	4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput		4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput
	or Delay?		or Delay?


	skipping to change at page 25, line 19 ¶		skipping to change at page 24, line 47 ¶
	Section 2.5.1) to avoid short-term starvation of Classic. Otherwise,		Section 2.5.1) to avoid short-term starvation of Classic. Otherwise,
	as explained in Section 2.4, even a lone responsive L4S flow could		as explained in Section 2.4, even a lone responsive L4S flow could
	temporarily block a small finite set of C packets (e.g. an initial		temporarily block a small finite set of C packets (e.g. an initial
	window or DNS request). The blockage would only be brief, but it		window or DNS request). The blockage would only be brief, but it
	could be longer for certain AQM implementations that can only		could be longer for certain AQM implementations that can only
	increase the congestion signal coupled from the C queue when C		increase the congestion signal coupled from the C queue when C
	packets are actually being dequeued. There is then the question of		packets are actually being dequeued. There is then the question of
	whether to sacrifice L4S throughput or L4S delay (or some other		whether to sacrifice L4S throughput or L4S delay (or some other
	policy) to make the priority conditional:		policy) to make the priority conditional:


	Sacrifice L4S throughput: By using weighted round-robin as the		Sacrifice L4S throughput: By using weighted round-robin as the
	conditional priority scheduler, the L4S service can sacrifice some		conditional priority scheduler, the L4S service can sacrifice some
	throughput during overload. This can either be thought of as		throughput during overload. This can either be thought of as
	guaranteeing a minimum throughput service for Classic traffic, or		guaranteeing a minimum throughput service for Classic traffic, or
	as guaranteeing a maximum delay for a packet at the head of the		as guaranteeing a maximum delay for a packet at the head of the
	Classic queue.		Classic queue.

	Cautionary note: a WRR scheduler can only guarantee Classic		Cautionary note: a WRR scheduler can only guarantee Classic
	throughput if Classic sources are sending enough to use it --		throughput if Classic sources are sending enough to use it --
	congestion signals can undermine scheduling because they determine		congestion signals can undermine scheduling because they determine
	how much responsive traffic of each class arrives for scheduling		how much responsive traffic of each class arrives for scheduling

	skipping to change at page 28, line 30 ¶		skipping to change at page 28, line 10 ¶
	As an example, experiments with the DualPI2 AQM (Appendix A) have		As an example, experiments with the DualPI2 AQM (Appendix A) have
	shown that introducing 'drop on saturation' at 100% coupled L4S		shown that introducing 'drop on saturation' at 100% coupled L4S
	marking addresses this problem with unresponsive ECN as well as		marking addresses this problem with unresponsive ECN as well as
	addressing the saturation problem. At saturation, DualPI2 switches		addressing the saturation problem. At saturation, DualPI2 switches
	into overload mode, where the base AQM is driven by the max delay of		into overload mode, where the base AQM is driven by the max delay of
	both queues and it introduces probabilistic drop to both queues		both queues and it introduces probabilistic drop to both queues
	equally. It leaves only a small range of congestion levels just		equally. It leaves only a small range of congestion levels just
	below saturation where unresponsive traffic gains any advantage from		below saturation where unresponsive traffic gains any advantage from
	using the ECN capability (relative to being unresponsive without		using the ECN capability (relative to being unresponsive without
	ECN), and the advantage is hardly detectable (see [DualQ-Test] and		ECN), and the advantage is hardly detectable (see [DualQ-Test] and

	section IV-E of [DCttH19]. Also overload with an unresponsive ECT(1)		section IV-G of [L4Seval22]. Also overload with an unresponsive
	flow gets no more bandwidth advantage than with ECT(0).		ECT(1) flow gets no more bandwidth advantage than with ECT(0).

	5. References		5. References

	5.1. Normative References		5.1. Normative References

	[I-D.ietf-tsvwg-ecn-l4s-id]		[I-D.ietf-tsvwg-ecn-l4s-id]

	Schepper, K. D. and B. Briscoe, "Explicit Congestion		Schepper, K. and B. Briscoe, "Explicit Congestion
	Notification (ECN) Protocol for Very Low Queuing Delay		Notification (ECN) Protocol for Ultra-Low Queuing Delay
	(L4S)", Work in Progress, Internet-Draft, draft-ietf-		(L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress),
	tsvwg-ecn-l4s-id-28, 8 August 2022,		March 2021.
	<https://datatracker.ietf.org/api/v1/doc/document/draft-
	ietf-tsvwg-ecn-l4s-id/>.

	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate		[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
	Requirement Levels", BCP 14, RFC 2119,		Requirement Levels", BCP 14, RFC 2119,
	DOI 10.17487/RFC2119, March 1997,		DOI 10.17487/RFC2119, March 1997,
	<https://www.rfc-editor.org/info/rfc2119>.		<https://www.rfc-editor.org/info/rfc2119>.

	[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition		[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
	of Explicit Congestion Notification (ECN) to IP",		of Explicit Congestion Notification (ECN) to IP",
	RFC 3168, DOI 10.17487/RFC3168, September 2001,		RFC 3168, DOI 10.17487/RFC3168, September 2001,
	<https://www.rfc-editor.org/info/rfc3168>.		<https://www.rfc-editor.org/info/rfc3168>.

	skipping to change at page 29, line 32 ¶		skipping to change at page 29, line 7 ¶

	[AQMmetrics]		[AQMmetrics]
	Kwon, M. and S. Fahmy, "A Comparison of Load-based and		Kwon, M. and S. Fahmy, "A Comparison of Load-based and
	Queue- based Active Queue Management Algorithms", Proc.		Queue- based Active Queue Management Algorithms", Proc.
	Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI:		Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI:
	10.1117/12.473021, 2002,		10.1117/12.473021, 2002,
	<https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>.		<https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>.

	[ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An		[ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An
	Algorithm for Increasing the Robustness of RED's Active		Algorithm for Increasing the Robustness of RED's Active

	Queue Management", ACIRI Technical Report , August 2001,		Queue Management", ACIRI Technical Report 301, August
	<https://www.icir.org/floyd/red.html>.		2001, <http://www.icsi.berkeley.edu/icsi/node/2032>.

	[BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub		[BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub
	repository; Linux congestion control module,		repository; Linux congestion control module,
	<https://github.com/google/bbr/blob/v2alpha/README.md>.		<https://github.com/google/bbr/blob/v2alpha/README.md>.

	[Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J.		[Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J.
	Taheri, "Validating the Sharing Behavior and Latency		Taheri, "Validating the Sharing Behavior and Latency
	Characteristics of the L4S Architecture", ACM CCR		Characteristics of the L4S Architecture", ACM CCR
	50(2):37--44, May 2020,		50(2):37--44, May 2020,
	<https://dl.acm.org/doi/abs/10.1145/3402413.3402419>.		<https://dl.acm.org/doi/abs/10.1145/3402413.3402419>.

	skipping to change at page 30, line 15 ¶		skipping to change at page 29, line 37 ¶
	[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay",		[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay",
	ACM Queue 10(5), May 2012,		ACM Queue 10(5), May 2012,
	<https://queue.acm.org/issuedetail.cfm?issue=2208917>.		<https://queue.acm.org/issuedetail.cfm?issue=2208917>.

	[CRED_Insights]		[CRED_Insights]
	Briscoe, B., "Insights from Curvy RED (Random Early		Briscoe, B., "Insights from Curvy RED (Random Early
	Detection)", BT Technical Report TR-TUB8-2015-003		Detection)", BT Technical Report TR-TUB8-2015-003
	arXiv:1904.07339 [cs.NI], July 2015,		arXiv:1904.07339 [cs.NI], July 2015,
	<https://arxiv.org/abs/1904.07339>.		<https://arxiv.org/abs/1904.07339>.


	[DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B.
	Briscoe, "`Data Centre to the Home': Ultra-Low Latency for
	All", Updated RITE project Technical Report , July 2019,
	<https://bobbriscoe.net/pubs.html#DCttH_TR>.

	[DOCSIS3.1]		[DOCSIS3.1]
	CableLabs, "MAC and Upper Layer Protocols Interface		CableLabs, "MAC and Upper Layer Protocols Interface
	(MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable		(MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable

	Service Interface Specifications DOCSIS® 3.1 Version i17		Service Interface Specifications DOCSIS(R) 3.1 Version i17
	or later, 21 January 2019, <https://specification-		or later, January 2019, <https://specification-
	search.cablelabs.com/CM-SP-MULPIv3.1>.		search.cablelabs.com/CM-SP-MULPIv3.1>.

	[DualPI2Linux]		[DualPI2Linux]
	Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O.,		Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O.,
	and H. Steen, "DUALPI2 - Low Latency, Low Loss and		and H. Steen, "DUALPI2 - Low Latency, Low Loss and
	Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019,		Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019,
	<https://www.netdevconf.org/0x13/session.html?talk-		<https://www.netdevconf.org/0x13/session.html?talk-
	DUALPI2-AQM>.		DUALPI2-AQM>.

	[DualQ-Test]		[DualQ-Test]
	Steen, H., "Destruction Testing: Ultra-Low Delay using		Steen, H., "Destruction Testing: Ultra-Low Delay using
	Dual Queue Coupled Active Queue Management", Master's		Dual Queue Coupled Active Queue Management", Master's

	Thesis, Dept of Informatics, Uni Oslo , May 2017,		Thesis, Dept of Informatics, Uni Oslo , May 2017.
	<https://www.duo.uio.no/bitstream/handle/10852/57424/
	thesis-henrste.pdf?sequence=1>.

	[Dukkipati06]		[Dukkipati06]
	Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is		Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is
	the Right Metric for Congestion Control", ACM CCR		the Right Metric for Congestion Control", ACM CCR
	36(1):59--62, January 2006,		36(1):59--62, January 2006,
	<https://dl.acm.org/doi/10.1145/1111322.1111336>.		<https://dl.acm.org/doi/10.1145/1111322.1111336>.

	[Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README,		[Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README,
	August 2021, <https://github.com/heistp/l4s-		August 2021, <https://github.com/heistp/l4s-
	tests/#underutilization-with-bursty-traffic>.		tests/#underutilization-with-bursty-traffic>.

	[I-D.briscoe-docsis-q-protection]		[I-D.briscoe-docsis-q-protection]

	Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection		Briscoe, B. and G. White, "Queue Protection to Preserve
	Algorithm to Preserve Low Latency", Work in Progress,		Low Latency", draft-briscoe-docsis-q-protection-00 (work
	Internet-Draft, draft-briscoe-docsis-q-protection-06, 13		in progress), July 2019.
	May 2022,
	<https://datatracker.ietf.org/api/v1/doc/document/draft-
	briscoe-docsis-q-protection/>.

	[I-D.briscoe-iccrg-prague-congestion-control]		[I-D.briscoe-iccrg-prague-congestion-control]
	Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague		Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague

	Congestion Control", Work in Progress, Internet-Draft,		Congestion Control", draft-briscoe-iccrg-prague-
	draft-briscoe-iccrg-prague-congestion-control-01, 11 July		congestion-control-01 (work in progress), July 2022,
	2022, <https://datatracker.ietf.org/api/v1/doc/document/		<https://www.ietf.org/archive/id/draft-briscoe-iccrg-
	draft-briscoe-iccrg-prague-congestion-control/>.		prague-congestion-control-01.txt>.

	[I-D.briscoe-tsvwg-l4s-diffserv]		[I-D.briscoe-tsvwg-l4s-diffserv]
	Briscoe, B., "Interactions between Low Latency, Low Loss,		Briscoe, B., "Interactions between Low Latency, Low Loss,
	Scalable Throughput (L4S) and Differentiated Services",		Scalable Throughput (L4S) and Differentiated Services",

	Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s-		draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress),
	diffserv-02, 2 July 2018,		November 2018.
	<https://datatracker.ietf.org/api/v1/doc/document/draft-
	briscoe-tsvwg-l4s-diffserv/>.

	[I-D.cardwell-iccrg-bbr-congestion-control]		[I-D.cardwell-iccrg-bbr-congestion-control]

	Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V.		Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
	Jacobson, "BBR Congestion Control", Work in Progress,		"BBR Congestion Control", draft-cardwell-iccrg-bbr-
	Internet-Draft, draft-cardwell-iccrg-bbr-congestion-		congestion-control-00 (work in progress), July 2017.
	control-02, 7 March 2022,
	<https://datatracker.ietf.org/api/v1/doc/document/draft-
	cardwell-iccrg-bbr-congestion-control/>.

	[I-D.ietf-tsvwg-l4s-arch]		[I-D.ietf-tsvwg-l4s-arch]

	Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White,		Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low
	"Low Latency, Low Loss, Scalable Throughput (L4S) Internet		Latency, Low Loss, Scalable Throughput (L4S) Internet
	Service: Architecture", Work in Progress, Internet-Draft,		Service: Architecture", draft-ietf-tsvwg-l4s-arch-08 (work
	draft-ietf-tsvwg-l4s-arch-19, 27 July 2022,		in progress), November 2020.
	<https://datatracker.ietf.org/api/v1/doc/document/draft-
	ietf-tsvwg-l4s-arch/>.

	[I-D.mathis-iccrg-relentless-tcp]		[I-D.mathis-iccrg-relentless-tcp]

	Mathis, M., "Relentless Congestion Control", Work in		Mathis, M., "Relentless Congestion Control", draft-mathis-
	Progress, Internet-Draft, draft-mathis-iccrg-relentless-		iccrg-relentless-tcp-00 (work in progress), March 2009.
	tcp-00, 4 March 2009, <https://www.ietf.org/archive/id/
	draft-mathis-iccrg-relentless-tcp-00.txt>.		[L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Oestberg, C.,
			Johansson, I., Strand, J., Ledl, P., and D. Schnieders,
			"Enabling time-critical applications over 5G with rate
			adaptation", Ericsson - Deutsche Telekom White Paper BNEW-
			21:025455 Uen, May 2021, <https://www.ericsson.com/en/
			reports-and-papers/white-papers/enabling-time-critical-
			applications-over-5g-with-rate-adaptation>.

	[L4Sdemo16]		[L4Sdemo16]
	Bondarenko, O., De Schepper, K., Tsang, I., and B.		Bondarenko, O., De Schepper, K., Tsang, I., and B.
	Briscoe, "Ultra-Low Delay for All: Live Experience, Live		Briscoe, "Ultra-Low Delay for All: Live Experience, Live
	Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016,		Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016,
	<https//dl.acm.org/citation.cfm?doid=2910017.2910633		<https//dl.acm.org/citation.cfm?doid=2910017.2910633
	(videos of demos:		(videos of demos:
	https://riteproject.eu/dctth/#1511dispatchwg )>.		https://riteproject.eu/dctth/#1511dispatchwg )>.


	[L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C.,		[L4Seval22]
	Johansson, I., Strand, J., Lédl, P., and D. Schnieders,		De Schepper, K., Albisser, O., Tilmans, O., and B.
	"Enabling time-critical applications over 5G with rate		Briscoe, "Dual Queue Coupled AQM: Deployable Very Low
	adaptation", Ericsson - Deutsche Telekom White Paper BNEW-		Queuing Delay for All", Preprint submitted to IEEE/ACM
	21:025455 Uen, May 2021, <https://www.ericsson.com/en/		Transactions on Networking arXiv:2209.01078 [cs.NI],
	reports-and-papers/white-papers/enabling-time-critical-		September 2022, <https://arxiv.org/abs/2209.01078>.
	applications-over-5g-with-rate-adaptation>.

	[Labovitz10]		[Labovitz10]
	Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide,		Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide,
	J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc		J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc
	ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010,		ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010,
	<https://doi.org/10.1145/1851275.1851194>.		<https://doi.org/10.1145/1851275.1851194>.

	[LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency		[LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency
	DOCSIS: Technology Overview", CableLabs White Paper ,		DOCSIS: Technology Overview", CableLabs White Paper ,
	February 2019, <https://cablela.bs/low-latency-docsis-		February 2019, <https://cablela.bs/low-latency-docsis-
	technology-overview-february-2019>.		technology-overview-february-2019>.

	[MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a		[MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a
	simple scheduling algorithm for two real-time transport		simple scheduling algorithm for two real-time transport
	service classes with application in the UTRAN", Proc. IEEE		service classes with application in the UTRAN", Proc. IEEE
	Conference on Computer Communications (INFOCOM'03) Vol.2		Conference on Computer Communications (INFOCOM'03) Vol.2

	pp.1116-1122, March 2003,		pp.1116-1122, March 2003.
	<https://infocom2003.ieee-infocom.org/papers/27_04.PDF>.

	[PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I.		[PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I.
	Tsang, "PI2: A Linearized AQM for both Classic and		Tsang, "PI2: A Linearized AQM for both Classic and
	Scalable TCP", ACM CoNEXT'16 , December 2016,		Scalable TCP", ACM CoNEXT'16 , December 2016,

	<https://riteproject.files.wordpress.com/2015/10/		<https://dl.acm.org/doi/10.1145/2999572.2999578>.
	pi2_conext.pdf>.


	[PI2param] Briscoe, B., "PI2 Parameters", Technical Report TR-BB-		[PI2param]
			Briscoe, B., "PI2 Parameters", Technical Report TR-BB-
	2021-001 arXiv:2107.01003 [cs.NI], July 2021,		2021-001 arXiv:2107.01003 [cs.NI], July 2021,
	<https://arxiv.org/abs/2107.01003>.		<https://arxiv.org/abs/2107.01003>.

	[PragueLinux]		[PragueLinux]
	Briscoe, B., De Schepper, K., Albisser, O., Misund, J.,		Briscoe, B., De Schepper, K., Albisser, O., Misund, J.,

	Tilmans, O., Kühlewind, M., and A.S. Ahmed, "Implementing		Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing
	the `TCP Prague' Requirements for Low Latency Low Loss		the `TCP Prague' Requirements for Low Latency Low Loss
	Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 ,		Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 ,
	March 2019, <https://www.netdevconf.org/0x13/		March 2019, <https://www.netdevconf.org/0x13/
	session.html?talk-tcp-prague-l4s>.		session.html?talk-tcp-prague-l4s>.

	[RFC0970] Nagle, J., "On Packet Switches With Infinite Storage",		[RFC0970] Nagle, J., "On Packet Switches With Infinite Storage",
	RFC 970, DOI 10.17487/RFC0970, December 1985,		RFC 970, DOI 10.17487/RFC0970, December 1985,
	<https://www.rfc-editor.org/info/rfc970>.		<https://www.rfc-editor.org/info/rfc970>.

	[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,		[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,

	skipping to change at page 33, line 21 ¶		skipping to change at page 32, line 34 ¶
	Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,		Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
	S., Wroclawski, J., and L. Zhang, "Recommendations on		S., Wroclawski, J., and L. Zhang, "Recommendations on
	Queue Management and Congestion Avoidance in the		Queue Management and Congestion Avoidance in the
	Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,		Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
	<https://www.rfc-editor.org/info/rfc2309>.		<https://www.rfc-editor.org/info/rfc2309>.

	[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41,		[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41,
	RFC 2914, DOI 10.17487/RFC2914, September 2000,		RFC 2914, DOI 10.17487/RFC2914, September 2000,
	<https://www.rfc-editor.org/info/rfc2914>.		<https://www.rfc-editor.org/info/rfc2914>.


	[RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le		[RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
	Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D.		J., Courtney, W., Davari, S., Firoiu, V., and D.
	Stiliadis, "An Expedited Forwarding PHB (Per-Hop		Stiliadis, "An Expedited Forwarding PHB (Per-Hop
	Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002,		Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002,
	<https://www.rfc-editor.org/info/rfc3246>.		<https://www.rfc-editor.org/info/rfc3246>.

	[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows",		[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows",
	RFC 3649, DOI 10.17487/RFC3649, December 2003,		RFC 3649, DOI 10.17487/RFC3649, December 2003,
	<https://www.rfc-editor.org/info/rfc3649>.		<https://www.rfc-editor.org/info/rfc3649>.

	[RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion		[RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion
	Control Algorithms", BCP 133, RFC 5033,		Control Algorithms", BCP 133, RFC 5033,

	skipping to change at page 35, line 9 ¶		skipping to change at page 34, line 19 ¶

	[RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of		[RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of
	Pervasive Encryption on Operators", RFC 8404,		Pervasive Encryption on Operators", RFC 8404,
	DOI 10.17487/RFC8404, July 2018,		DOI 10.17487/RFC8404, July 2018,
	<https://www.rfc-editor.org/info/rfc8404>.		<https://www.rfc-editor.org/info/rfc8404>.

	[SCReAM] Johansson, I., "SCReAM", GitHub repository; ,		[SCReAM] Johansson, I., "SCReAM", GitHub repository; ,
	<https://github.com/EricssonResearch/scream/blob/master/		<https://github.com/EricssonResearch/scream/blob/master/
	README.md>.		README.md>.


	[SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics",		[SigQ-Dyn]
			Briscoe, B., "Rapid Signalling of Queue Dynamics",
	Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI],		Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI],
	September 2017, <https://arxiv.org/abs/1904.07044>.		September 2017, <https://arxiv.org/abs/1904.07044>.

	Appendix A. Example DualQ Coupled PI2 Algorithm		Appendix A. Example DualQ Coupled PI2 Algorithm

	As a first concrete example, the pseudocode below gives the DualPI2		As a first concrete example, the pseudocode below gives the DualPI2
	algorithm. DualPI2 follows the structure of the DualQ Coupled AQM		algorithm. DualPI2 follows the structure of the DualQ Coupled AQM
	framework in Figure 1. A simple ramp function (configured in units		framework in Figure 1. A simple ramp function (configured in units
	of queuing time) with unsmoothed ECN marking is used for the Native		of queuing time) with unsmoothed ECN marking is used for the Native
	L4S AQM. The ramp can also be configured as a step function. The		L4S AQM. The ramp can also be configured as a step function. The

	skipping to change at page 36, line 5 ¶		skipping to change at page 35, line 13 ¶
	[DualPI2Linux]. The specification of the DualQ Coupled AQM for		[DualPI2Linux]. The specification of the DualQ Coupled AQM for
	DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and		DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and
	explained in [LLD].		explained in [LLD].

	A.1. Pass #1: Core Concepts		A.1. Pass #1: Core Concepts

	The pseudocode manipulates three main structures of variables: the		The pseudocode manipulates three main structures of variables: the
	packet (pkt), the L4S queue (lq) and the Classic queue (cq). The		packet (pkt), the L4S queue (lq) and the Classic queue (cq). The
	pseudocode consists of the following six functions:		pseudocode consists of the following six functions:


	* The initialization function dualpi2_params_init(...) (Figure 2)		o The initialization function dualpi2_params_init(...) (Figure 2)
	that sets parameter defaults (the API for setting non-default		that sets parameter defaults (the API for setting non-default
	values is omitted for brevity)		values is omitted for brevity)


	* The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3)		o The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3)


	* The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4)		o The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4)


	* The recurrence function recur(q, likelihood) for de-randomized ECN		o The recurrence function recur(q, likelihood) for de-randomized ECN
	marking (shown at the end of Figure 4).		marking (shown at the end of Figure 4).


	* The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the		o The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the
	ECN-marking probability for the L4S queue		ECN-marking probability for the L4S queue


	* The base AQM function that implements the PI algorithm		o The base AQM function that implements the PI algorithm
	dualpi2_update(lq, cq) (Figure 6) used to regularly update the		dualpi2_update(lq, cq) (Figure 6) used to regularly update the
	base probability (p'), which is squared for the Classic AQM as		base probability (p'), which is squared for the Classic AQM as
	well as being coupled across to the L4S queue.		well as being coupled across to the L4S queue.

	It also uses the following functions that are not shown in full here:		It also uses the following functions that are not shown in full here:


	* scheduler(), which selects between the head packets of the two		o scheduler(), which selects between the head packets of the two
	queues; the choice of scheduler technology is discussed later;		queues; the choice of scheduler technology is discussed later;


	* cq.byt() or lq.byt() returns the current length (aka. backlog) of		o cq.byt() or lq.byt() returns the current length (aka. backlog) of
	the relevant queue in bytes;		the relevant queue in bytes;


	* cq.len() or lq.len() returns the current length of the relevant		o cq.len() or lq.len() returns the current length of the relevant
	queue in packets;		queue in packets;


	* cq.time() or lq.time() returns the current queuing delay of the		o cq.time() or lq.time() returns the current queuing delay of the
	relevant queue in units of time (see Note a);		relevant queue in units of time (see Note a);


	* mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;		o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;

	In experiments so far (building on experiments with PIE) on broadband		In experiments so far (building on experiments with PIE) on broadband
	access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms		access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms
	to 100 ms, DualPI2 achieves good results with the default parameters		to 100 ms, DualPI2 achieves good results with the default parameters
	in Figure 2. The parameters are categorised by whether they relate		in Figure 2. The parameters are categorised by whether they relate
	to the Base PI2 AQM, the L4S AQM or the framework coupling them		to the Base PI2 AQM, the L4S AQM or the framework coupling them
	together. Constants and variables derived from these parameters are		together. Constants and variables derived from these parameters are
	also included at the end of each category. Each parameter is		also included at the end of each category. Each parameter is
	explained as it is encountered in the walk-through of the pseudocode		explained as it is encountered in the walk-through of the pseudocode
	below, and the rationale for the chosen defaults are given so that		below, and the rationale for the chosen defaults are given so that

	skipping to change at page 38, line 16 ¶		skipping to change at page 37, line 16 ¶
	2: if ( lq.byt() + cq.byt() + MTU > limit)		2: if ( lq.byt() + cq.byt() + MTU > limit)
	3: drop(pkt) % drop packet if buffer is full		3: drop(pkt) % drop packet if buffer is full
	4: timestamp(pkt) % only needed if using the sojourn technique		4: timestamp(pkt) % only needed if using the sojourn technique
	5: % Packet classifier		5: % Packet classifier
	6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE		6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE
	7: lq.enqueue(pkt)		7: lq.enqueue(pkt)
	8: else % ECN bits = not-ECT or ECT(0)		8: else % ECN bits = not-ECT or ECT(0)
	9: cq.enqueue(pkt)		9: cq.enqueue(pkt)
	10: }		10: }


	Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM		Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM

	1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues		1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
	2: while ( lq.byt() + cq.byt() > 0 ) {		2: while ( lq.byt() + cq.byt() > 0 ) {
	3: if ( scheduler() == lq ) {		3: if ( scheduler() == lq ) {
	4: lq.dequeue(pkt) % Scheduler chooses lq		4: lq.dequeue(pkt) % Scheduler chooses lq
	5: p'_L = laqm(lq.time()) % Native LAQM		5: p'_L = laqm(lq.time()) % Native LAQM
	6: p_L = max(p'_L, p_CL) % Combining function		6: p_L = max(p'_L, p_CL) % Combining function
	7: if ( recur(lq, p_L) ) % Linear marking		7: if ( recur(lq, p_L) ) % Linear marking
	8: mark(pkt)		8: mark(pkt)
	9: } else {		9: } else {

	skipping to change at page 38, line 50 ¶		skipping to change at page 37, line 50 ¶

	23: recur(q, likelihood) { % Returns TRUE with a certain likelihood		23: recur(q, likelihood) { % Returns TRUE with a certain likelihood
	24: q.count += likelihood		24: q.count += likelihood
	25: if (q.count > 1) {		25: if (q.count > 1) {
	26: q.count -= 1		26: q.count -= 1
	27: return TRUE		27: return TRUE
	28: }		28: }
	29: return FALSE		29: return FALSE
	30: }		30: }


	Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM		Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM

	When packets arrive, first a common queue limit is checked as shown		When packets arrive, first a common queue limit is checked as shown
	in line 2 of the enqueuing pseudocode in Figure 3. This assumes a		in line 2 of the enqueuing pseudocode in Figure 3. This assumes a
	shared buffer for the two queues (Note b discusses the merits of		shared buffer for the two queues (Note b discusses the merits of
	separate buffers). In order to avoid any bias against larger		separate buffers). In order to avoid any bias against larger
	packets, 1 MTU of space is always allowed, and the limit is		packets, 1 MTU of space is always allowed, and the limit is
	deliberately tested before enqueue.		deliberately tested before enqueue.

	If limit is not exceeded, the packet is timestamped in line 4 (only		If limit is not exceeded, the packet is timestamped in line 4 (only
	if the sojourn time technique is being used to measure queue delay;		if the sojourn time technique is being used to measure queue delay;

	skipping to change at page 39, line 43 ¶		skipping to change at page 38, line 43 ¶
	loop sloppier (for a typical RTT it would double the Classic queue's		loop sloppier (for a typical RTT it would double the Classic queue's
	feedback delay).		feedback delay).

	All the dequeue code is contained within a large while loop so that		All the dequeue code is contained within a large while loop so that
	if it decides to drop a packet, it will continue until it selects a		if it decides to drop a packet, it will continue until it selects a
	packet to schedule. Line 3 of the dequeue pseudocode is where the		packet to schedule. Line 3 of the dequeue pseudocode is where the
	scheduler chooses between the L4S queue (lq) and the Classic queue		scheduler chooses between the L4S queue (lq) and the Classic queue
	(cq). Detailed implementation of the scheduler is not shown (see		(cq). Detailed implementation of the scheduler is not shown (see
	discussion later).		discussion later).


	* If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN-		o If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN-
	marked with likelihood p_L. The recur() function at the end of		marked with likelihood p_L. The recur() function at the end of
	Figure 4 is used, which is preferred over random marking because		Figure 4 is used, which is preferred over random marking because
	it avoids delay due to randomization when interpreting congestion		it avoids delay due to randomization when interpreting congestion
	signals, but it still desynchronizes the saw-teeth of the flows.		signals, but it still desynchronizes the saw-teeth of the flows.
	Line 6 calculates p_L as the maximum of the coupled L4S		Line 6 calculates p_L as the maximum of the coupled L4S
	probability p_CL and the probability from the native L4S AQM p'_L.		probability p_CL and the probability from the native L4S AQM p'_L.
	This implements the max() function shown in Figure 1 to couple the		This implements the max() function shown in Figure 1 to couple the
	outputs of the two AQMs together. Of the two probabilities input		outputs of the two AQMs together. Of the two probabilities input
	to p_L in line 6:		to p_L in line 6:


	- p'_L is calculated per packet in line 5 by the laqm() function		* p'_L is calculated per packet in line 5 by the laqm() function
	(see Figure 5),		(see Figure 5),


	- Whereas p_CL is maintained by the dualpi2_update() function		* Whereas p_CL is maintained by the dualpi2_update() function
	which runs every Tupdate (Tupdate is set in line 12 of		which runs every Tupdate (Tupdate is set in line 12 of
	Figure 2).		Figure 2).


	* If a Classic packet is scheduled, lines 10 to 17 drop or mark the		o If a Classic packet is scheduled, lines 10 to 17 drop or mark the
	packet with probability p_C.		packet with probability p_C.

	The Native L4S AQM algorithm (Figure 5) is a ramp function, similar		The Native L4S AQM algorithm (Figure 5) is a ramp function, similar
	to the RED algorithm, but simplified as follows:		to the RED algorithm, but simplified as follows:


	* The extent of the ramp is defined in units of queuing delay, not		o The extent of the ramp is defined in units of queuing delay, not
	bytes, so that configuration remains invariant as the queue		bytes, so that configuration remains invariant as the queue
	departure rate varies.		departure rate varies.


	* It uses instantaneous queueing delay, which avoids the complexity		o It uses instantaneous queueing delay, which avoids the complexity
	of smoothing, but also avoids embedding a worst-case RTT of		of smoothing, but also avoids embedding a worst-case RTT of
	smoothing delay in the network (see Section 2.1).		smoothing delay in the network (see Section 2.1).


	* The ramp rises linearly directly from 0 to 1, not to an		o The ramp rises linearly directly from 0 to 1, not to an
	intermediate value of p'_L as RED would, because there is no need		intermediate value of p'_L as RED would, because there is no need
	to keep ECN marking probability low.		to keep ECN marking probability low.


	* Marking does not have to be randomized. Determinism is used		o Marking does not have to be randomized. Determinism is used
	instead of randomness; to reduce the delay necessary to smooth out		instead of randomness; to reduce the delay necessary to smooth out
	the noise of randomness from the signal.		the noise of randomness from the signal.

	The ramp function requires two configuration parameters, the minimum		The ramp function requires two configuration parameters, the minimum
	threshold (minTh) and the width of the ramp (range), both in units of		threshold (minTh) and the width of the ramp (range), both in units of
	queuing time, as shown in lines 17 & 18 of the initialization		queuing time, as shown in lines 17 & 18 of the initialization
	function in Figure 2. The ramp function can be configured as a step		function in Figure 2. The ramp function can be configured as a step
	(see Note c).		(see Note c).

	Although the DCTCP paper [Alizadeh-stability] recommends an ECN		Although the DCTCP paper [Alizadeh-stability] recommends an ECN

	skipping to change at page 41, line 24 ¶		skipping to change at page 40, line 24 ¶
	Figure 5: Example Pseudocode for the Native L4S AQM		Figure 5: Example Pseudocode for the Native L4S AQM

	1: dualpi2_update(lq, cq) { % Update p' every Tupdate		1: dualpi2_update(lq, cq) { % Update p' every Tupdate
	2: curq = cq.time() % use queuing time of first-in Classic packet		2: curq = cq.time() % use queuing time of first-in Classic packet
	3: p' = p' + alpha * (curq - target) + beta * (curq - prevq)		3: p' = p' + alpha * (curq - target) + beta * (curq - prevq)
	4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor		4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor
	5: p_C = p'^2 % Classic prob = (base prob)^2		5: p_C = p'^2 % Classic prob = (base prob)^2
	6: prevq = curq		6: prevq = curq
	7: }		7: }


	Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM

	(Clamping p' within the range [0,1] omitted for clarity - see text)		(Clamping p' within the range [0,1] omitted for clarity - see text)


			Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM

	The coupled marking probability, p_CL depends on the base probability		The coupled marking probability, p_CL depends on the base probability
	(p'), which is kept up to date by the core PI algorithm in Figure 6		(p'), which is kept up to date by the core PI algorithm in Figure 6
	executed every Tupdate.		executed every Tupdate.

	Note that p' solely depends on the queuing time in the Classic queue.		Note that p' solely depends on the queuing time in the Classic queue.
	In line 2, the current queuing delay (curq) is evaluated from how		In line 2, the current queuing delay (curq) is evaluated from how
	long the head packet was in the Classic queue (cq). The function		long the head packet was in the Classic queue (cq). The function
	cq.time() (not shown) subtracts the time stamped at enqueue from the		cq.time() (not shown) subtracts the time stamped at enqueue from the
	current time (see Note a) and implicitly takes the current queuing		current time (see Note a) and implicitly takes the current queuing
	delay as 0 if the queue is empty.		delay as 0 if the queue is empty.

	skipping to change at page 42, line 13 ¶		skipping to change at page 41, line 12 ¶
	[PI2param], the target queuing delay on line 9 of Figure 2 is related		[PI2param], the target queuing delay on line 9 of Figure 2 is related
	to the typical base RTT worldwide, RTT_typ, by two factors: target =		to the typical base RTT worldwide, RTT_typ, by two factors: target =
	RTT_typ * g * f. Below we summarize the rationale behind these		RTT_typ * g * f. Below we summarize the rationale behind these
	factors and introduce a further adjustment. The two factors ensure		factors and introduce a further adjustment. The two factors ensure
	that, in a large proportion of cases (say 90%), the sawtooth		that, in a large proportion of cases (say 90%), the sawtooth
	variations in RTT of a single flow will fit within the buffer without		variations in RTT of a single flow will fit within the buffer without
	underutilizing the link. Frankly, these factors are educated		underutilizing the link. Frankly, these factors are educated
	guesses, but with the emphasis closer to 'educated' than to 'guess'		guesses, but with the emphasis closer to 'educated' than to 'guess'
	(see [PI2param] for full background):		(see [PI2param] for full background):


	* RTT_typ is taken as 25 ms. This is based on an average CDN		o RTT_typ is taken as 25 ms. This is based on an average CDN
	latency measured in each country weighted by the number of		latency measured in each country weighted by the number of
	Internet users in that country to produce an overall weighted		Internet users in that country to produce an overall weighted
	average for the Internet [PI2param]. Countries were ranked by		average for the Internet [PI2param]. Countries were ranked by
	number of Internet users, and once 90% of Internet users were		number of Internet users, and once 90% of Internet users were
	covered, smaller countries were excluded to avoid		covered, smaller countries were excluded to avoid
	unrepresentatively small sample sizes. Also, importantly, the		unrepresentatively small sample sizes. Also, importantly, the
	data for the average CDN latency in China (with the largest number		data for the average CDN latency in China (with the largest number
	of Internet users) has been removed, because the CDN latency was a		of Internet users) has been removed, because the CDN latency was a
	significant outlier and, on reflection, the experimental technique		significant outlier and, on reflection, the experimental technique
	seemed inappropriate to the CDN market in China.		seemed inappropriate to the CDN market in China.


	* g is taken as 0.38. The factor g is a geometry factor that		o g is taken as 0.38. The factor g is a geometry factor that
	characterizes the shape of the sawteeth of prevalent Classic		characterizes the shape of the sawteeth of prevalent Classic
	congestion controllers. The geometry factor is the fraction of		congestion controllers. The geometry factor is the fraction of
	the amplitude of the sawtooth variability in queue delay that lies		the amplitude of the sawtooth variability in queue delay that lies
	below the AQM's target. For instance, at low bit rate, the		below the AQM's target. For instance, at low bit rate, the
	geometry factor of standard Reno is 0.5, but at higher rates it		geometry factor of standard Reno is 0.5, but at higher rates it
	tends to just under 1. According to the census of congestion		tends to just under 1. According to the census of congestion
	controllers conducted by Mishra et al. in Jul-Oct		controllers conducted by Mishra et al. in Jul-Oct
	2019 [CCcensus19], most Classic TCP traffic uses Cubic. And,		2019 [CCcensus19], most Classic TCP traffic uses Cubic. And,
	according to the analysis in [PI2param], if running over a PI2		according to the analysis in [PI2param], if running over a PI2
	AQM, a large proportion of this Cubic traffic would be in its		AQM, a large proportion of this Cubic traffic would be in its
	Reno-Friendly mode, which has a geometry factor of ~0.39 (all		Reno-Friendly mode, which has a geometry factor of ~0.39 (all
	known implementations). The rest of the Cubic traffic would be in		known implementations). The rest of the Cubic traffic would be in
	true Cubic mode, which has a geometry factor of ~0.36. Without		true Cubic mode, which has a geometry factor of ~0.36. Without
	modelling the sawtooth profiles from all the other less prevalent		modelling the sawtooth profiles from all the other less prevalent
	congestion controllers, we estimate a 7:3 weighted average of		congestion controllers, we estimate a 7:3 weighted average of
	these two, resulting in an average geometry factor of 0.38.		these two, resulting in an average geometry factor of 0.38.


	* f is taken as 2. The factor f is a safety factor that increases		o f is taken as 2. The factor f is a safety factor that increases
	the target queue to allow for the distribution of RTT_typ around		the target queue to allow for the distribution of RTT_typ around
	its mean. Otherwise, the target queue would only avoid		its mean. Otherwise, the target queue would only avoid
	underutilization for those users below the mean. It also provides		underutilization for those users below the mean. It also provides
	a safety margin for the proportion of paths in use that span		a safety margin for the proportion of paths in use that span
	beyond the distance between a user and their local CDN.		beyond the distance between a user and their local CDN.
	Currently, no data is available on the variance of queue delay		Currently, no data is available on the variance of queue delay
	around the mean in each region, so there is plenty of room for		around the mean in each region, so there is plenty of room for
	this guess to become more educated.		this guess to become more educated.


	* [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 =		o [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 =
	19 ms. However, a further adjustment is warranted, because target		19 ms. However, a further adjustment is warranted, because target
	is moving year-on-year. The paper is based on data collected in		is moving year-on-year. The paper is based on data collected in
	2019, and it mentions evidence from speedtest.net that suggests		2019, and it mentions evidence from speedtest.net that suggests
	RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and		RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and
	2021. Therefore, we recommend a default of target = 15 ms at the		2021. Therefore, we recommend a default of target = 15 ms at the
	time of writing (2021).		time of writing (2021).

	Operators can always use the data and discussion in [PI2param] to		Operators can always use the data and discussion in [PI2param] to
	configure a more appropriate target for their environment. For		configure a more appropriate target for their environment. For
	instance, an operator might wish to question the assumptions called		instance, an operator might wish to question the assumptions called

	skipping to change at page 45, line 5 ¶		skipping to change at page 43, line 48 ¶
	Tupdate dependent on p'. Instead, in PI2, alpha and beta are		Tupdate dependent on p'. Instead, in PI2, alpha and beta are
	independent of p' because the squaring applied to Classic traffic		independent of p' because the squaring applied to Classic traffic
	tunes them inherently. This is explained in [PI2], which also		tunes them inherently. This is explained in [PI2], which also
	explains why this more principled approach removes the need for most		explains why this more principled approach removes the need for most
	of the heuristics that had to be added to PIE.		of the heuristics that had to be added to PIE.

	Nonetheless, an implementer might wish to add selected details to		Nonetheless, an implementer might wish to add selected details to
	either AQM. For instance the Linux reference DualPI2 implementation		either AQM. For instance the Linux reference DualPI2 implementation
	includes the following (not shown in the pseudocode above):		includes the following (not shown in the pseudocode above):


	* Classic and coupled marking or dropping (i.e. based on p_C and		o Classic and coupled marking or dropping (i.e. based on p_C and
	p_CL from the PI controller) is not applied to a packet if the		p_CL from the PI controller) is not applied to a packet if the
	aggregate queue length in bytes is < 2 MTU (prior to enqueuing the		aggregate queue length in bytes is < 2 MTU (prior to enqueuing the
	packet or dequeuing it, depending on whether the AQM is configured		packet or dequeuing it, depending on whether the AQM is configured
	to be applied at enqueue or dequeue);		to be applied at enqueue or dequeue);


	* In the WRR scheduler, the 'credit' indicating which queue should		o In the WRR scheduler, the 'credit' indicating which queue should
	transmit is only changed if there are packets in both queues		transmit is only changed if there are packets in both queues
	(i.e. if there is actual resource contention). This means that a		(i.e. if there is actual resource contention). This means that a
	properly paced L flow might never be delayed by the WRR. The WRR		properly paced L flow might never be delayed by the WRR. The WRR
	credit is reset in favour of the L queue when the link is idle.		credit is reset in favour of the L queue when the link is idle.

	An implementer might also wish to add other heuristics, e.g. burst		An implementer might also wish to add other heuristics, e.g. burst
	protection [RFC8033] or enhanced burst protection [RFC8034].		protection [RFC8033] or enhanced burst protection [RFC8034].

	Notes:		Notes:


	skipping to change at page 47, line 23 ¶		skipping to change at page 46, line 20 ¶

	The pseudocode given here applies where the link rate is unknown,		The pseudocode given here applies where the link rate is unknown,
	which is more common for software implementations that might be		which is more common for software implementations that might be
	deployed in scenarios where the link is shared with other queues. In		deployed in scenarios where the link is shared with other queues. In
	lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L,		lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L,
	is zeroed if the queue is only 1 packet (in the default		is zeroed if the queue is only 1 packet (in the default
	configuration).		configuration).

	Linux implementation note:		Linux implementation note:


	* In Linux, the check that the queue exceeds Th_len before marking		o In Linux, the check that the queue exceeds Th_len before marking
	with the native L4S AQM is actually at enqueue, not dequeue,		with the native L4S AQM is actually at enqueue, not dequeue,
	otherwise it would exempt the last packet of a burst from being		otherwise it would exempt the last packet of a burst from being
	marked. The result of the check is conveyed from enqueue to the		marked. The result of the check is conveyed from enqueue to the
	dequeue function via a boolean in the packet metadata.		dequeue function via a boolean in the packet metadata.

	Persistent overload is deemed to have occurred when Classic drop/		Persistent overload is deemed to have occurred when Classic drop/
	marking probability reaches p_Cmax. Above this point, the Classic		marking probability reaches p_Cmax. Above this point, the Classic
	drop probability is applied to both L and C queues, irrespective of		drop probability is applied to both L and C queues, irrespective of
	whether any packet is ECN-capable. ECT packets that are not dropped		whether any packet is ECN-capable. ECT packets that are not dropped
	can still be ECN-marked.		can still be ECN-marked.

	skipping to change at page 49, line 41 ¶		skipping to change at page 48, line 41 ¶
	14: continue % continue to the top of the while loop		14: continue % continue to the top of the while loop
	15: }		15: }
	16: mark(pkt) % squared mark		16: mark(pkt) % squared mark
	17: }		17: }
	18: }		18: }
	19: return(pkt) % return the packet and stop		19: return(pkt) % return the packet and stop
	20: }		20: }
	21: return(NULL) % no packet to dequeue		21: return(NULL) % no packet to dequeue
	22: }		22: }


	Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM		Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM
	(Including Code for Edge-Cases)		(Including Code for Edge-Cases)

	1: dualpi2_update(lq, cq) { % Update p' every Tupdate		1: dualpi2_update(lq, cq) { % Update p' every Tupdate
	2a: curq = max(cq.time(), lq.time()) % use greatest queuing time		2a: curq = max(cq.time(), lq.time()) % use greatest queuing time
	3: p' = p' + alpha * (curq - target) + beta * (curq - prevq)		3: p' = p' + alpha * (curq - target) + beta * (curq - prevq)
	4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor		4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor
	5: p_C = p'^2 % Classic prob = (base prob)^2		5: p_C = p'^2 % Classic prob = (base prob)^2
	6: prevq = curq		6: prevq = curq
	7: }		7: }

	Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM
			Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM
	(Including Overload Code)		(Including Overload Code)

	The choice of scheduler technology is critical to overload protection		The choice of scheduler technology is critical to overload protection
	(see Section 4.2.2).		(see Section 4.2.2).


	* A well-understood weighted scheduler such as weighted round-robin		o A well-understood weighted scheduler such as weighted round-robin
	(WRR) is recommended. As long as the scheduler weight for Classic		(WRR) is recommended. As long as the scheduler weight for Classic
	is small (e.g. 1/16), its exact value is unimportant because it		is small (e.g. 1/16), its exact value is unimportant because it
	does not normally determine capacity shares. The weight is only		does not normally determine capacity shares. The weight is only
	important to prevent unresponsive L4S traffic starving Classic		important to prevent unresponsive L4S traffic starving Classic
	traffic in the short term (see Section 4.2.2). This is because		traffic in the short term (see Section 4.2.2). This is because
	capacity sharing between the queues is normally determined by the		capacity sharing between the queues is normally determined by the
	coupled congestion signal, which overrides the scheduler, by		coupled congestion signal, which overrides the scheduler, by
	making L4S sources leave roughly equal per-flow capacity available		making L4S sources leave roughly equal per-flow capacity available
	for Classic flows.		for Classic flows.


	* Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It		o Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It
	works by selecting the head packet that has waited the longest,		works by selecting the head packet that has waited the longest,
	biased against the Classic traffic by a time-shift of tshift. To		biased against the Classic traffic by a time-shift of tshift. To
	implement time-shifted FIFO, the scheduler() function in line 3 of		implement time-shifted FIFO, the scheduler() function in line 3 of
	the dequeue code would simply be implemented as the scheduler()		the dequeue code would simply be implemented as the scheduler()
	function at the bottom of Figure 10 in Appendix B. For the public		function at the bottom of Figure 10 in Appendix B. For the public
	Internet a good value for tshift is 50ms. For private networks		Internet a good value for tshift is 50ms. For private networks
	with smaller diameter, about 4*target would be reasonable. TS-		with smaller diameter, about 4*target would be reasonable. TS-
	FIFO is a very simple scheduler, but complexity might need to be		FIFO is a very simple scheduler, but complexity might need to be
	added to address some deficiencies (which is why it is not		added to address some deficiencies (which is why it is not
	recommended over WRR):		recommended over WRR):


	- TS-FIFO does not fully isolate latency in the L4S queue from		* TS-FIFO does not fully isolate latency in the L4S queue from
	uncontrolled bursts in the Classic queue;		uncontrolled bursts in the Classic queue;


	- Using sojourn time for TS-FIFO is only appropriate if time-		* Using sojourn time for TS-FIFO is only appropriate if time-
	stamping of packets is feasible;		stamping of packets is feasible;


	- Even if time-stamping is supported, the sojourn time of the		* Even if time-stamping is supported, the sojourn time of the
	head packet is always stale, so a more instantaneous measure of		head packet is always stale, so a more instantaneous measure of
	queue delay could be used (see Note a in Appendix A.1).		queue delay could be used (see Note a in Appendix A.1).


	* A strict priority scheduler would be inappropriate as discussed in		o A strict priority scheduler would be inappropriate as discussed in
	Section 4.2.2.		Section 4.2.2.

	Appendix B. Example DualQ Coupled Curvy RED Algorithm		Appendix B. Example DualQ Coupled Curvy RED Algorithm

	As another example of a DualQ Coupled AQM algorithm, the pseudocode		As another example of a DualQ Coupled AQM algorithm, the pseudocode
	below gives the Curvy RED based algorithm. Although the AQM was		below gives the Curvy RED based algorithm. Although the AQM was
	designed to be efficient in integer arithmetic, to aid understanding		designed to be efficient in integer arithmetic, to aid understanding
	it is first given using floating point arithmetic (Figure 10). Then,		it is first given using floating point arithmetic (Figure 10). Then,
	one possible optimization for integer arithmetic is given, also in		one possible optimization for integer arithmetic is given, also in
	pseudocode (Figure 11). To aid comparison, the line numbers are kept		pseudocode (Figure 11). To aid comparison, the line numbers are kept
	in step between the two by using letter suffixes where the longer		in step between the two by using letter suffixes where the longer
	code needs extra lines.		code needs extra lines.

	B.1. Curvy RED in Pseudocode		B.1. Curvy RED in Pseudocode

	The pseudocode manipulates three main structures of variables: the		The pseudocode manipulates three main structures of variables: the
	packet (pkt), the L4S queue (lq) and the Classic queue (cq) and		packet (pkt), the L4S queue (lq) and the Classic queue (cq) and
	consists of the following five functions:		consists of the following five functions:


	* The initialization function cred_params_init(...) (Figure 2) that		o The initialization function cred_params_init(...) (Figure 2) that
	sets parameter defaults (the API for setting non-default values is		sets parameter defaults (the API for setting non-default values is
	omitted for brevity);		omitted for brevity);


	* The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4);		o The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4);


	* The scheduling function scheduler(), which selects between the		o The scheduling function scheduler(), which selects between the
	head packets of the two queues.		head packets of the two queues.

	It also uses the following functions that are either shown elsewhere,		It also uses the following functions that are either shown elsewhere,
	or not shown in full here:		or not shown in full here:


	* The enqueue function, which is identical to that used for DualPI2,		o The enqueue function, which is identical to that used for DualPI2,
	dualpi2_enqueue(lq, cq, pkt) in Figure 3;		dualpi2_enqueue(lq, cq, pkt) in Figure 3;


	* mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;		o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;


	* cq.byt() or lq.byt() returns the current length (aka. backlog) of		o cq.byt() or lq.byt() returns the current length (aka. backlog) of
	the relevant queue in bytes;		the relevant queue in bytes;


	* cq.time() or lq.time() returns the current queuing delay of the		o cq.time() or lq.time() returns the current queuing delay of the
	relevant queue in units of time (see Note a in Appendix A.1).		relevant queue in units of time (see Note a in Appendix A.1).

	Because Curvy RED was evaluated before DualPI2, certain improvements		Because Curvy RED was evaluated before DualPI2, certain improvements
	introduced for DualPI2 were not evaluated for Curvy RED. In the		introduced for DualPI2 were not evaluated for Curvy RED. In the
	pseudocode below, the straightforward improvements have been added on		pseudocode below, the straightforward improvements have been added on
	the assumption they will provide similar benefits, but that has not		the assumption they will provide similar benefits, but that has not
	been proven experimentally. They are: i) a conditional priority		been proven experimentally. They are: i) a conditional priority
	scheduler instead of strict priority ii) a time-based threshold for		scheduler instead of strict priority ii) a time-based threshold for
	the native L4S AQM; iii) ECN support for the Classic AQM. A recent		the native L4S AQM; iii) ECN support for the Classic AQM. A recent
	evaluation has proved that a minimum ECN-marking threshold (minTh)		evaluation has proved that a minimum ECN-marking threshold (minTh)

	skipping to change at page 58, line 45 ¶		skipping to change at page 57, line 37 ¶
	13: continue % continue to the top of the while loop		13: continue % continue to the top of the while loop
	14: }		14: }
	15: mark(pkt)		15: mark(pkt)
	16: }		16: }
	17: }		17: }
	18: return(pkt) % return the packet and stop here		18: return(pkt) % return the packet and stop here
	19: }		19: }
	20: return(NULL) % no packet to dequeue		20: return(NULL) % no packet to dequeue
	21: }		21: }


	Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled		Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled AQM
	AQM using Integer Arithmetic		using Integer Arithmetic

	The two ranges, range_L and range_C are expressed as powers of 2 so		The two ranges, range_L and range_C are expressed as powers of 2 so
	that division can be implemented as a right bit-shift (>>) in lines 5		that division can be implemented as a right bit-shift (>>) in lines 5
	and 10 of the integer variant of the pseudocode (Figure 11).		and 10 of the integer variant of the pseudocode (Figure 11).

	For the integer variant of the pseudocode, an integer version of the		For the integer variant of the pseudocode, an integer version of the
	rand() function used at line 25 of the maxrand(function) in Figure 10		rand() function used at line 25 of the maxrand(function) in Figure 10
	would be arranged to return an integer in the range 0 <= maxrand() <		would be arranged to return an integer in the range 0 <= maxrand() <
	2^32 (not shown). This would scale up all the floating point		2^32 (not shown). This would scale up all the floating point
	probabilities in the range [0,1] by 2^32.		probabilities in the range [0,1] by 2^32.

	skipping to change at page 60, line 38 ¶		skipping to change at page 59, line 34 ¶
	operator needs to decide at what base RTT it wants L4S and Classic		operator needs to decide at what base RTT it wants L4S and Classic
	flows to have roughly equal throughput, once the effect of the		flows to have roughly equal throughput, once the effect of the
	additional Classic queue on Classic throughput has been taken into		additional Classic queue on Classic throughput has been taken into
	account. With this approach, a network operator can determine a good		account. With this approach, a network operator can determine a good
	coupling factor without knowing the precise L4S algorithm for		coupling factor without knowing the precise L4S algorithm for
	reducing RTT-dependence - or even in the absence of any algorithm.		reducing RTT-dependence - or even in the absence of any algorithm.

	The following additional terminology will be used, with appropriate		The following additional terminology will be used, with appropriate
	subscripts:		subscripts:


	r: Packet rate [pkt/s]		r: Packet rate [pkt/s]


	R: RTT [s/round]		R: RTT [s/round]


	p: ECN marking probability []		p: ECN marking probability []

	On the Classic side, we consider Reno as the most sensitive and		On the Classic side, we consider Reno as the most sensitive and
	therefore worst-case Classic congestion control. We will also		therefore worst-case Classic congestion control. We will also
	consider Cubic in its Reno-friendly mode ('CReno'), as the most		consider Cubic in its Reno-friendly mode ('CReno'), as the most
	prevalent congestion control, according to the references and		prevalent congestion control, according to the references and
	analysis in [PI2param]. In either case, the Classic packet rate in		analysis in [PI2param]. In either case, the Classic packet rate in
	steady state is given by the well-known square root formula for Reno		steady state is given by the well-known square root formula for Reno
	congestion control:		congestion control:

	r_C = 1.22 / (R_C * p_C^0.5) (5)		r_C = 1.22 / (R_C * p_C^0.5) (5)

	skipping to change at page 65, line 7 ¶		skipping to change at page 64, line 7 ¶
	Henderson <tomh@tomh.org> updated that earlier model and created a		Henderson <tomh@tomh.org> updated that earlier model and created a
	model for the DualQ variant specified as part of the Low Latency		model for the DualQ variant specified as part of the Low Latency
	DOCSIS specification, as well as conducting extensive evaluations.		DOCSIS specification, as well as conducting extensive evaluations.

	Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data		Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data
	Centre to the Home broadband testbed on which DualQ Coupled AQM		Centre to the Home broadband testbed on which DualQ Coupled AQM
	implementations were tested.		implementations were tested.

	Authors' Addresses		Authors' Addresses


	Koen De Schepper		Koen De Schepper
	Nokia Bell Labs		Nokia Bell Labs
	Antwerp		Antwerp
	Belgium		Belgium
	Email: koen.de_schepper@nokia.com
	URI: https://www.bell-labs.com/about/researcher-profiles/		Email: koen.de_schepper@nokia.com
	koende_schepper/		URI: https://www.bell-labs.com/about/researcher-profiles/koende_schepper/

	Bob Briscoe (editor)		Bob Briscoe (editor)
	Independent		Independent

	United Kingdom		UK

	Email: ietf@bobbriscoe.net		Email: ietf@bobbriscoe.net
	URI: https://bobbriscoe.net/		URI: https://bobbriscoe.net/

	Greg White		Greg White
	CableLabs		CableLabs

	Louisville, CO,		Louisville, CO
	United States of America		US

	Email: G.White@CableLabs.com		Email: G.White@CableLabs.com

End of changes. 125 change blocks.
	201 lines changed or deleted		192 lines changed or added
This html diff was produced by rfcdiff 1.48.