| draft-ietf-tsvwg-aqm-dualq-coupled-25.original | draft-ietf-tsvwg-aqm-dualq-coupled-26v2.txt | |||
|---|---|---|---|---|
| Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
| Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
| Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
| Expires: 2 March 2023 Independent | Expires: April 24, 2023 Independent | |||
| G. White | G. White | |||
| CableLabs | CableLabs | |||
| 29 August 2022 | October 21, 2022 | |||
| DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
| (L4S) | (L4S) | |||
| draft-ietf-tsvwg-aqm-dualq-coupled-25 | draft-ietf-tsvwg-aqm-dualq-coupled-26 | |||
| Abstract | Abstract | |||
| This specification defines a framework for coupling the Active Queue | This specification defines a framework for coupling the Active Queue | |||
| Management (AQM) algorithms in two queues intended for flows with | Management (AQM) algorithms in two queues intended for flows with | |||
| different responses to congestion. This provides a way for the | different responses to congestion. This provides a way for the | |||
| Internet to transition from the scaling problems of standard TCP | Internet to transition from the scaling problems of standard TCP | |||
| Reno-friendly ('Classic') congestion controls to the family of | Reno-friendly ('Classic') congestion controls to the family of | |||
| 'Scalable' congestion controls. These are designed for consistently | 'Scalable' congestion controls. These are designed for consistently | |||
| very Low queuing Latency, very Low congestion Loss and Scaling of | very Low queuing Latency, very Low congestion Loss and Scaling of | |||
| skipping to change at page 1, line 49 ¶ | skipping to change at page 1, line 49 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 2 March 2023. | This Internet-Draft will expire on April 24, 2023. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Simplified BSD License text as described in Section 4.e of | |||
| provided without warranty as described in the Revised BSD License. | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | |||
| 1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6 | 1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | |||
| 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | |||
| 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | |||
| 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | |||
| 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | |||
| 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | |||
| 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | |||
| 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | |||
| 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 | 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21 | |||
| 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | |||
| 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | |||
| 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | |||
| 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | |||
| 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | |||
| 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | |||
| Throughput or Delay? . . . . . . . . . . . . . . . . 25 | Throughput or Delay? . . . . . . . . . . . . . . . . 24 | |||
| 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | |||
| 4.2.3.1. Protecting against Overload by Unresponsive | 4.2.3.1. Protecting against Overload by Unresponsive ECN- | |||
| ECN-Capable Traffic . . . . . . . . . . . . . . . . 28 | Capable Traffic . . . . . . . . . . . . . . . . . 27 | |||
| 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 5.1. Normative References . . . . . . . . . . . . . . . . . . 28 | 5.1. Normative References . . . . . . . . . . . . . . . . . . 28 | |||
| 5.2. Informative References . . . . . . . . . . . . . . . . . 29 | 5.2. Informative References . . . . . . . . . . . . . . . . . 28 | |||
| Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35 | ||||
| Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 34 | ||||
| A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35 | A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35 | |||
| A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 46 | A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 45 | |||
| Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 51 | Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 50 | |||
| B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 51 | B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 50 | |||
| B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 57 | B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 56 | |||
| Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 59 | Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 58 | |||
| C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 59 | C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 58 | |||
| C.2. Guidance on Controlling Throughput Equivalence . . . . . 60 | C.2. Guidance on Controlling Throughput Equivalence . . . . . 59 | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 64 | Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
| Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 64 | Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 65 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 | |||
| 1. Introduction | 1. Introduction | |||
| This document specifies a framework for DualQ Coupled AQMs, which can | This document specifies a framework for DualQ Coupled AQMs, which can | |||
| serve as the network part of the L4S | serve as the network part of the L4S | |||
| architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists | architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists | |||
| of two queues; L4S and Classic. The L4S queue is intended for | of two queues; L4S and Classic. The L4S queue is intended for | |||
| Scalable congestion controls that can maintain very low queuing | Scalable congestion controls that can maintain very low queuing | |||
| latency (sub-millisecond on average) and high throughput at the same | latency (sub-millisecond on average) and high throughput at the same | |||
| time. The Coupled DualQ acts like a semi-permeable membrane: the L4S | time. The Coupled DualQ acts like a semi-permeable membrane: the L4S | |||
| skipping to change at page 4, line 15 ¶ | skipping to change at page 4, line 16 ¶ | |||
| now it has not been possible to allow any number of low latency, high | now it has not been possible to allow any number of low latency, high | |||
| throughput applications to seek to fully utilize available capacity, | throughput applications to seek to fully utilize available capacity, | |||
| because the capacity-seeking process itself causes too much queuing | because the capacity-seeking process itself causes too much queuing | |||
| delay. | delay. | |||
| To reduce this queuing delay caused by the capacity seeking process, | To reduce this queuing delay caused by the capacity seeking process, | |||
| changes either to the network alone or to end-systems alone are in | changes either to the network alone or to end-systems alone are in | |||
| progress. L4S involves a recognition that both approaches are | progress. L4S involves a recognition that both approaches are | |||
| yielding diminishing returns: | yielding diminishing returns: | |||
| * Recent state-of-the-art active queue management (AQM) in the | o Recent state-of-the-art active queue management (AQM) in the | |||
| network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive | network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive | |||
| RED [ARED01] ) has reduced queuing delay for all traffic, not just | RED [ARED01] ) has reduced queuing delay for all traffic, not just | |||
| a select few applications. However, no matter how good the AQM, | a select few applications. However, no matter how good the AQM, | |||
| the capacity-seeking (sawtoothing) rate of TCP-like congestion | the capacity-seeking (sawtoothing) rate of TCP-like congestion | |||
| controls represents a lower limit that will either cause queuing | controls represents a lower limit that will either cause queuing | |||
| delay to vary or cause the link to be under-utilized. These AQMs | delay to vary or cause the link to be under-utilized. These AQMs | |||
| are tuned to allow a typical capacity-seeking Reno-friendly flow | are tuned to allow a typical capacity-seeking Reno-friendly flow | |||
| to induce an average queue that roughly doubles the base RTT, | to induce an average queue that roughly doubles the base RTT, | |||
| adding 5-15 ms of queuing on average (cf. 500 microseconds with | adding 5-15 ms of queuing on average (cf. 500 microseconds with | |||
| L4S for the same mix of long-running and web traffic). However, | L4S for the same mix of long-running and web traffic). However, | |||
| for many applications low delay is not useful unless it is | for many applications low delay is not useful unless it is | |||
| consistently low. With these AQMs, 99th percentile queuing delay | consistently low. With these AQMs, 99th percentile queuing delay | |||
| is 20-30 ms (cf. 2 ms with the same traffic over L4S). | is 20-30 ms (cf. 2 ms with the same traffic over L4S). | |||
| * Similarly, recent research into using e2e congestion control | o Similarly, recent research into using e2e congestion control | |||
| without needing an AQM in the network (e.g. BBR | without needing an AQM in the network (e.g. BBR | |||
| [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | |||
| similar lower limit to queuing delay of about 20ms on average, but | similar lower limit to queuing delay of about 20ms on average, but | |||
| there are also regular 25ms delay spikes due to bandwidth probes | there are also regular 25ms delay spikes due to bandwidth probes | |||
| and 60ms spikes due to flow-starts. | and 60ms spikes due to flow-starts. | |||
| L4S learns from the experience of Data Center TCP [RFC8257], which | L4S learns from the experience of Data Center TCP [RFC8257], which | |||
| shows the power of complementary changes both in the network and on | shows the power of complementary changes both in the network and on | |||
| end-systems. DCTCP teaches us that two small but radical changes to | end-systems. DCTCP teaches us that two small but radical changes to | |||
| congestion control are needed to cut the two major outstanding causes | congestion control are needed to cut the two major outstanding causes | |||
| skipping to change at page 7, line 14 ¶ | skipping to change at page 7, line 14 ¶ | |||
| For the public Internet, nearly all the benefit will typically be | For the public Internet, nearly all the benefit will typically be | |||
| achieved by deploying the Coupled AQM into either end of the access | achieved by deploying the Coupled AQM into either end of the access | |||
| link between a 'site' and the Internet, which is invariably the | link between a 'site' and the Internet, which is invariably the | |||
| bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about | bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about | |||
| deployment, which also defines the term 'site' to mean a home, an | deployment, which also defines the term 'site' to mean a home, an | |||
| office, a campus or mobile user equipment). | office, a campus or mobile user equipment). | |||
| Latency is not the only concern of L4S: | Latency is not the only concern of L4S: | |||
| * The "Low Loss" part of the name denotes that L4S generally | o The "Low Loss" part of the name denotes that L4S generally | |||
| achieves zero congestion loss (which would otherwise cause | achieves zero congestion loss (which would otherwise cause | |||
| retransmission delays), due to its use of ECN. | retransmission delays), due to its use of ECN. | |||
| * The "Scalable throughput" part of the name denotes that the per- | o The "Scalable throughput" part of the name denotes that the per- | |||
| flow throughput of Scalable congestion controls should scale | flow throughput of Scalable congestion controls should scale | |||
| indefinitely, avoiding the imminent scaling problems with 'TCP- | indefinitely, avoiding the imminent scaling problems with 'TCP- | |||
| Friendly' congestion control algorithms [RFC3649]. | Friendly' congestion control algorithms [RFC3649]. | |||
| The former is clearly in scope of this AQM document. However, the | The former is clearly in scope of this AQM document. However, the | |||
| latter is an outcome of the end-system behaviour, and therefore | latter is an outcome of the end-system behaviour, and therefore | |||
| outside the scope of this AQM document, even though the AQM is an | outside the scope of this AQM document, even though the AQM is an | |||
| enabler. | enabler. | |||
| The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more | The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more | |||
| detail, including on wider deployment aspects such as backwards | detail, including on wider deployment aspects such as backwards | |||
| compatibility of Scalable congestion controls in bottlenecks where a | compatibility of Scalable congestion controls in bottlenecks where a | |||
| DualQ Coupled AQM has not been deployed. The supporting papers | DualQ Coupled AQM has not been deployed. The supporting papers | |||
| [DualPI2Linux], [PI2], [DCttH19] and [PI2param] give the full | [L4Seval22], [DualPI2Linux], [PI2] and [PI2param] give the full | |||
| rationale for the AQM's design, both discursively and in more precise | rationale for the AQM's design, both discursively and in more precise | |||
| mathematical form, as well as the results of performance evaluations. | mathematical form, as well as the results of performance evaluations. | |||
| The main results have been validated independently when using the | The main results have been validated independently when using the | |||
| Prague congestion control [Boru20] (experiments are run using Prague | Prague congestion control [Boru20] (experiments are run using Prague | |||
| and DCTCP, but only the former are relevant for validation, because | and DCTCP, but only the former are relevant for validation, because | |||
| Prague fixes a number of problems with the Linux DCTCP code that make | Prague fixes a number of problems with the Linux DCTCP code that make | |||
| it unsuitable for the public Internet). | it unsuitable for the public Internet). | |||
| 1.3. Terminology | 1.3. Terminology | |||
| skipping to change at page 8, line 46 ¶ | skipping to change at page 8, line 48 ¶ | |||
| averages 2 congestion signals per round-trip whatever the flow | averages 2 congestion signals per round-trip whatever the flow | |||
| rate, as do other recently developed scalable congestion controls, | rate, as do other recently developed scalable congestion controls, | |||
| e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague | e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague | |||
| [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | |||
| BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | |||
| L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]). | L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]). | |||
| For the public Internet a Scalable transport has to comply with | For the public Internet a Scalable transport has to comply with | |||
| the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | |||
| (aka. the 'Prague L4S requirements'). | (aka. the 'Prague L4S requirements'). | |||
| C: Abbreviation for Classic, e.g. when used as a subscript. | C: Abbreviation for Classic, e.g. when used as a subscript. | |||
| L: Abbreviation for L4S, e.g. when used as a subscript. | L: Abbreviation for L4S, e.g. when used as a subscript. | |||
| The terms Classic or L4S can also qualify other nouns, such as | The terms Classic or L4S can also qualify other nouns, such as | |||
| 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | |||
| For example: an L4S packet means a packet with an L4S identifier | For example: an L4S packet means a packet with an L4S identifier | |||
| sent from an L4S congestion control. | sent from an L4S congestion control. | |||
| Both Classic and L4S services can cope with a proportion of | Both Classic and L4S services can cope with a proportion of | |||
| unresponsive or less-responsive traffic as well, but in the L4S | unresponsive or less-responsive traffic as well, but in the L4S | |||
| case its rate has to be smooth enough or low enough not to build a | case its rate has to be smooth enough or low enough not to build a | |||
| queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ | queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ | |||
| skipping to change at page 10, line 6 ¶ | skipping to change at page 10, line 6 ¶ | |||
| traffic. | traffic. | |||
| Thousands of tests have been conducted in a typical fixed residential | Thousands of tests have been conducted in a typical fixed residential | |||
| broadband setting. Experiments used a range of base round trip | broadband setting. Experiments used a range of base round trip | |||
| delays up to 100ms and link rates up to 200 Mb/s between the data | delays up to 100ms and link rates up to 200 Mb/s between the data | |||
| centre and home network, with varying amounts of background traffic | centre and home network, with varying amounts of background traffic | |||
| in both queues. For every L4S packet, the AQM kept the average | in both queues. For every L4S packet, the AQM kept the average | |||
| queuing delay below 1ms (or 2 packets where serialization delay | queuing delay below 1ms (or 2 packets where serialization delay | |||
| exceeded 1ms on slower links), with 99th percentile no worse than | exceeded 1ms on slower links), with 99th percentile no worse than | |||
| 2ms. No losses at all were introduced by the L4S AQM. Details of | 2ms. No losses at all were introduced by the L4S AQM. Details of | |||
| the extensive experiments are available [DualPI2Linux], [PI2], | the extensive experiments are available [L4Seval22], [DualPI2Linux]. | |||
| [DCttH19]. Subjective testing using very demanding high bandwidth | Subjective testing using very demanding high bandwidth low latency | |||
| low latency applications over a single shared access link is also | applications over a single shared access link is also described | |||
| described in [L4Sdemo16] and summarized in the section about | in [L4Sdemo16] and summarized in the section about applications in | |||
| applications in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] . | the L4S architecture [I-D.ietf-tsvwg-l4s-arch] . | |||
| In all these experiments, the host was connected to the home network | In all these experiments, the host was connected to the home network | |||
| by fixed Ethernet, in order to quantify the queuing delay that can be | by fixed Ethernet, in order to quantify the queuing delay that can be | |||
| achieved by a user who cares about delay. It should be emphasized | achieved by a user who cares about delay. It should be emphasized | |||
| that L4S support at the bottleneck link cannot 'undelay' bursts | that L4S support at the bottleneck link cannot 'undelay' bursts | |||
| introduced by another link on the path, for instance by legacy Wi-Fi | introduced by another link on the path, for instance by legacy Wi-Fi | |||
| equipment. However, if L4S support is added to the queue feeding the | equipment. However, if L4S support is added to the queue feeding the | |||
| _outgoing_ WAN link of a home gateway, it would be counterproductive | _outgoing_ WAN link of a home gateway, it would be counterproductive | |||
| not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | |||
| trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | |||
| skipping to change at page 10, line 37 ¶ | skipping to change at page 10, line 37 ¶ | |||
| achieve low delay. The L4S queue can be filled with a heavy load of | achieve low delay. The L4S queue can be filled with a heavy load of | |||
| capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | |||
| The L4S queue does not rely on the presence of other traffic in the | The L4S queue does not rely on the presence of other traffic in the | |||
| Classic queue that can be 'overtaken'. It gives low latency to L4S | Classic queue that can be 'overtaken'. It gives low latency to L4S | |||
| traffic whether or not there is Classic traffic. The tail latency of | traffic whether or not there is Classic traffic. The tail latency of | |||
| traffic served by the Classic AQM is sometimes a little better | traffic served by the Classic AQM is sometimes a little better | |||
| sometimes a little worse, when a proportion of the traffic is L4S. | sometimes a little worse, when a proportion of the traffic is L4S. | |||
| The two queues are only necessary because: | The two queues are only necessary because: | |||
| * the large variations (sawteeth) of Classic flows need roughly a | o the large variations (sawteeth) of Classic flows need roughly a | |||
| base RTT of queuing delay to ensure full utilization | base RTT of queuing delay to ensure full utilization | |||
| * Scalable flows do not need a queue to keep utilization high, but | o Scalable flows do not need a queue to keep utilization high, but | |||
| they cannot keep latency predictably low if they are mixed with | they cannot keep latency predictably low if they are mixed with | |||
| Classic traffic, | Classic traffic, | |||
| The L4S queue has latency priority within sub-round trip timescales, | The L4S queue has latency priority within sub-round trip timescales, | |||
| but over longer periods the coupling from the Classic to the L4S AQM | but over longer periods the coupling from the Classic to the L4S AQM | |||
| (explained below) ensures that it does not have bandwidth priority | (explained below) ensures that it does not have bandwidth priority | |||
| over the Classic queue. | over the Classic queue. | |||
| 2. DualQ Coupled AQM | 2. DualQ Coupled AQM | |||
| There are two main aspects to the approach: | There are two main aspects to the approach: | |||
| * The Coupled AQM that addresses throughput equivalence between | o The Coupled AQM that addresses throughput equivalence between | |||
| Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the | Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the | |||
| Prague L4S requirements). | Prague L4S requirements). | |||
| * The Dual Queue structure that provides latency separation for L4S | o The Dual Queue structure that provides latency separation for L4S | |||
| flows to isolate them from the typically large Classic queue. | flows to isolate them from the typically large Classic queue. | |||
| 2.1. Coupled AQM | 2.1. Coupled AQM | |||
| In the 1990s, the `TCP formula' was derived for the relationship | In the 1990s, the `TCP formula' was derived for the relationship | |||
| between the steady-state congestion window, cwnd, and the drop | between the steady-state congestion window, cwnd, and the drop | |||
| probability, p of standard Reno congestion control [RFC5681]. To a | probability, p of standard Reno congestion control [RFC5681]. To a | |||
| first order approximation, the steady-state cwnd of Reno is inversely | first order approximation, the steady-state cwnd of Reno is inversely | |||
| proportional to the square root of p. | proportional to the square root of p. | |||
| skipping to change at page 15, line 28 ¶ | skipping to change at page 15, line 28 ¶ | |||
| `----------'\\ | AQM |---->: ,'|`-.___.-' | `----------'\\ | AQM |---->: ,'|`-.___.-' | |||
| \\ | |p' | <' | | \\ | |p' | <' | | |||
| \\ `-------' (p'^2) //`' | \\ `-------' (p'^2) //`' | |||
| \\ ^ | // | \\ ^ | // | |||
| \\,. | v p_C // | \\,. | v p_C // | |||
| < | _________ .------.// | < | _________ .------.// | |||
| `\| | | | Drop |/ | `\| | | | Drop |/ | |||
| Classic (C) |queue |===>|/mark | | Classic (C) |queue |===>|/mark | | |||
| __|______| `------' | __|______| `------' | |||
| Figure 1: DualQ Coupled AQM Schematic | ||||
| Legend: ===> traffic flow; ---> control dependency. | Legend: ===> traffic flow; ---> control dependency. | |||
| Figure 1: DualQ Coupled AQM Schematic | ||||
| After the AQMs have applied their dropping or marking, the scheduler | After the AQMs have applied their dropping or marking, the scheduler | |||
| forwards their packets to the link. Even though the scheduler gives | forwards their packets to the link. Even though the scheduler gives | |||
| priority to the L queue, it is not as strong as the coupling from the | priority to the L queue, it is not as strong as the coupling from the | |||
| C queue. This is because, as the C queue grows, the base AQM applies | C queue. This is because, as the C queue grows, the base AQM applies | |||
| more congestion signals to L traffic (as well as C). As L flows | more congestion signals to L traffic (as well as C). As L flows | |||
| reduce their rate in response, they use less than the scheduling | reduce their rate in response, they use less than the scheduling | |||
| share for L traffic. So, because the scheduler is work preserving, | share for L traffic. So, because the scheduler is work preserving, | |||
| it schedules any C traffic in the gaps. | it schedules any C traffic in the gaps. | |||
| Giving priority to the L queue has the benefit of very low L queue | Giving priority to the L queue has the benefit of very low L queue | |||
| skipping to change at page 18, line 41 ¶ | skipping to change at page 18, line 41 ¶ | |||
| 2.5.1.1. Requirements in Unexpected Cases | 2.5.1.1. Requirements in Unexpected Cases | |||
| The flexibility to allow operator-specific classifiers (Section 2.3) | The flexibility to allow operator-specific classifiers (Section 2.3) | |||
| leads to the need to specify what the AQM in each queue ought to do | leads to the need to specify what the AQM in each queue ought to do | |||
| with packets that do not carry the ECN field expected for that queue. | with packets that do not carry the ECN field expected for that queue. | |||
| It is expected that the AQM in each queue will inspect the ECN field | It is expected that the AQM in each queue will inspect the ECN field | |||
| to determine what sort of congestion notification to signal, then it | to determine what sort of congestion notification to signal, then it | |||
| will decide whether to apply congestion notification to this | will decide whether to apply congestion notification to this | |||
| particular packet, as follows: | particular packet, as follows: | |||
| * If a packet that does not carry an ECT(1) or CE codepoint is | o If a packet that does not carry an ECT(1) or CE codepoint is | |||
| classified into the L queue: | classified into the L queue: | |||
| - if the packet is ECT(0), the L AQM SHOULD apply CE-marking | * if the packet is ECT(0), the L AQM SHOULD apply CE-marking | |||
| using a probability appropriate to Classic congestion control | using a probability appropriate to Classic congestion control | |||
| and appropriate to the target delay in the L queue | and appropriate to the target delay in the L queue | |||
| - if the packet is Not-ECT, the appropriate action depends on | * if the packet is Not-ECT, the appropriate action depends on | |||
| whether some other function is protecting the L queue from | whether some other function is protecting the L queue from | |||
| misbehaving flows (e.g. per-flow queue protection | misbehaving flows (e.g. per-flow queue protection | |||
| [I-D.briscoe-docsis-q-protection] or latency policing): | [I-D.briscoe-docsis-q-protection] or latency policing): | |||
| o If separate queue protection is provided, the L AQM SHOULD | + If separate queue protection is provided, the L AQM SHOULD | |||
| ignore the packet and forward it unchanged, meaning it | ignore the packet and forward it unchanged, meaning it | |||
| should not calculate whether to apply congestion | should not calculate whether to apply congestion | |||
| notification and it should neither drop nor CE-mark the | notification and it should neither drop nor CE-mark the | |||
| packet (for instance, the operator might classify EF traffic | packet (for instance, the operator might classify EF traffic | |||
| that is unresponsive to drop into the L queue, alongside | that is unresponsive to drop into the L queue, alongside | |||
| responsive L4S-ECN traffic) | responsive L4S-ECN traffic) | |||
| o if separate queue protection is not provided, the L AQM | + if separate queue protection is not provided, the L AQM | |||
| SHOULD apply drop using a drop probability appropriate to | SHOULD apply drop using a drop probability appropriate to | |||
| Classic congestion control and appropriate to the target | Classic congestion control and appropriate to the target | |||
| delay in the L queue | delay in the L queue | |||
| * If a packet that carries an ECT(1) codepoint is classified into | o If a packet that carries an ECT(1) codepoint is classified into | |||
| the C queue: | the C queue: | |||
| - the C AQM SHOULD apply CE-marking using the coupled AQM | * the C AQM SHOULD apply CE-marking using the coupled AQM | |||
| probability p_CL (= k*p'). | probability p_CL (= k*p'). | |||
| The above requirements are worded as "SHOULDs", because operator- | The above requirements are worded as "SHOULDs", because operator- | |||
| specific classifiers are for flexibility, by definition. Therefore, | specific classifiers are for flexibility, by definition. Therefore, | |||
| alternative actions might be appropriate in the operator's specific | alternative actions might be appropriate in the operator's specific | |||
| circumstances. An example would be where the operator knows that | circumstances. An example would be where the operator knows that | |||
| certain legacy traffic marked with one codepoint actually has a | certain legacy traffic marked with one codepoint actually has a | |||
| congestion response associated with another codepoint. | congestion response associated with another codepoint. | |||
| If the DualQ Coupled AQM has detected overload, it MUST introduce | If the DualQ Coupled AQM has detected overload, it MUST introduce | |||
| skipping to change at page 20, line 5 ¶ | skipping to change at page 19, line 47 ¶ | |||
| 2.5.2. Management Requirements | 2.5.2. Management Requirements | |||
| 2.5.2.1. Configuration | 2.5.2.1. Configuration | |||
| By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | |||
| use at a bottleneck on the public Internet [RFC7567]. The following | use at a bottleneck on the public Internet [RFC7567]. The following | |||
| parameters MAY be operator-configurable, e.g. to tune for non- | parameters MAY be operator-configurable, e.g. to tune for non- | |||
| Internet settings: | Internet settings: | |||
| * Optional packet classifier(s) to use in addition to the ECN field | o Optional packet classifier(s) to use in addition to the ECN field | |||
| (see Section 2.3); | (see Section 2.3); | |||
| * Expected typical RTT, which can be used to determine the queuing | o Expected typical RTT, which can be used to determine the queuing | |||
| delay of the Classic AQM at its operating point, in order to | delay of the Classic AQM at its operating point, in order to | |||
| prevent typical lone flows from under-utilizing capacity. For | prevent typical lone flows from under-utilizing capacity. For | |||
| example: | example: | |||
| - for the PI2 algorithm (Appendix A) the queuing delay target is | * for the PI2 algorithm (Appendix A) the queuing delay target is | |||
| dependent on the typical RTT; | dependent on the typical RTT; | |||
| - for the Curvy RED algorithm (Appendix B) the queuing delay at | * for the Curvy RED algorithm (Appendix B) the queuing delay at | |||
| the desired operating point of the curvy ramp is configured to | the desired operating point of the curvy ramp is configured to | |||
| encompass a typical RTT; | encompass a typical RTT; | |||
| - if another Classic AQM was used, it would be likely to need an | * if another Classic AQM was used, it would be likely to need an | |||
| operating point for the queue based on the typical RTT, and if | operating point for the queue based on the typical RTT, and if | |||
| so it SHOULD be expressed in units of time. | so it SHOULD be expressed in units of time. | |||
| An operating point that is manually calculated might be directly | An operating point that is manually calculated might be directly | |||
| configurable instead, e.g. for links with large numbers of flows | configurable instead, e.g. for links with large numbers of flows | |||
| where under-utilization by a single flow would be unlikely. | where under-utilization by a single flow would be unlikely. | |||
| * Expected maximum RTT, which can be used to set the stability | o Expected maximum RTT, which can be used to set the stability | |||
| parameter(s) of the Classic AQM. For example: | parameter(s) of the Classic AQM. For example: | |||
| - for the PI2 algorithm (Appendix A), the gain parameters of the | * for the PI2 algorithm (Appendix A), the gain parameters of the | |||
| PI algorithm depend on the maximum RTT. | PI algorithm depend on the maximum RTT. | |||
| - for the Curvy RED algorithm (Appendix B) the smoothing | * for the Curvy RED algorithm (Appendix B) the smoothing | |||
| parameter is chosen to filter out transients in the queue | parameter is chosen to filter out transients in the queue | |||
| within a maximum RTT. | within a maximum RTT. | |||
| Stability parameter(s) that are manually calculated assuming a | Stability parameter(s) that are manually calculated assuming a | |||
| maximum RTT might be directly configurable instead. | maximum RTT might be directly configurable instead. | |||
| * Coupling factor, k (see Appendix C.2); | o Coupling factor, k (see Appendix C.2); | |||
| * A limit to the conditional priority of L4S. This is scheduler- | o A limit to the conditional priority of L4S. This is scheduler- | |||
| dependent, but it SHOULD be expressed as a relation between the | dependent, but it SHOULD be expressed as a relation between the | |||
| max delay of a C packet and an L packet. For example: | max delay of a C packet and an L packet. For example: | |||
| - for a WRR scheduler a weight ratio between L and C of w:1 means | * for a WRR scheduler a weight ratio between L and C of w:1 means | |||
| that the maximum delay to a C packet is w times that of an L | that the maximum delay to a C packet is w times that of an L | |||
| packet. | packet. | |||
| - for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2) | * for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2) | |||
| a time-shift of tshift means that the maximum delay to a C | a time-shift of tshift means that the maximum delay to a C | |||
| packet is tshift greater than that of an L packet. tshift could | packet is tshift greater than that of an L packet. tshift could | |||
| be expressed as a multiple of the typical RTT rather than as an | be expressed as a multiple of the typical RTT rather than as an | |||
| absolute delay. | absolute delay. | |||
| * The maximum Classic ECN marking probability, p_Cmax, before | o The maximum Classic ECN marking probability, p_Cmax, before | |||
| introducing drop. | introducing drop. | |||
| 2.5.2.2. Monitoring | 2.5.2.2. Monitoring | |||
| An experimental DualQ Coupled AQM SHOULD allow the operator to | An experimental DualQ Coupled AQM SHOULD allow the operator to | |||
| monitor each of the following operational statistics on demand, per | monitor each of the following operational statistics on demand, per | |||
| queue and per configurable sample interval, for performance | queue and per configurable sample interval, for performance | |||
| monitoring and perhaps also for accounting in some cases: | monitoring and perhaps also for accounting in some cases: | |||
| * Bits forwarded, from which utilization can be calculated; | o Bits forwarded, from which utilization can be calculated; | |||
| * Total packets in the three categories: arrived, presented to the | o Total packets in the three categories: arrived, presented to the | |||
| AQM, and forwarded. The difference between the first two will | AQM, and forwarded. The difference between the first two will | |||
| measure any non-AQM tail discard. The difference between the last | measure any non-AQM tail discard. The difference between the last | |||
| two will measure proactive AQM discard; | two will measure proactive AQM discard; | |||
| * ECN packets marked, non-ECN packets dropped, ECN packets dropped, | o ECN packets marked, non-ECN packets dropped, ECN packets dropped, | |||
| which can be combined with the three total packet counts above to | which can be combined with the three total packet counts above to | |||
| calculate marking and dropping probabilities; | calculate marking and dropping probabilities; | |||
| * Queue delay (not including serialization delay of the head packet | o Queue delay (not including serialization delay of the head packet | |||
| or medium acquisition delay) - see further notes below. | or medium acquisition delay) - see further notes below. | |||
| Unlike the other statistics, queue delay cannot be captured in a | Unlike the other statistics, queue delay cannot be captured in a | |||
| simple accumulating counter. Therefore, the type of queue delay | simple accumulating counter. Therefore, the type of queue delay | |||
| statistics produced (mean, percentiles, etc.) will depend on | statistics produced (mean, percentiles, etc.) will depend on | |||
| implementation constraints. To facilitate comparative evaluation | implementation constraints. To facilitate comparative evaluation | |||
| of different implementations and approaches, an implementation | of different implementations and approaches, an implementation | |||
| SHOULD allow mean and 99th percentile queue delay to be derived | SHOULD allow mean and 99th percentile queue delay to be derived | |||
| (per queue per sample interval). A relatively simple way to do | (per queue per sample interval). A relatively simple way to do | |||
| this would be to store a coarse-grained histogram of queue delay. | this would be to store a coarse-grained histogram of queue delay. | |||
| skipping to change at page 22, line 10 ¶ | skipping to change at page 21, line 49 ¶ | |||
| a sample interval, each bin would accumulate a count of the number | a sample interval, each bin would accumulate a count of the number | |||
| of packets that had fallen within each range. The maximum queue | of packets that had fallen within each range. The maximum queue | |||
| delay per queue per interval MAY also be recorded, to aid | delay per queue per interval MAY also be recorded, to aid | |||
| diagnosis of faults and anomalous events. | diagnosis of faults and anomalous events. | |||
| 2.5.2.3. Anomaly Detection | 2.5.2.3. Anomaly Detection | |||
| An experimental DualQ Coupled AQM SHOULD asynchronously report the | An experimental DualQ Coupled AQM SHOULD asynchronously report the | |||
| following data about anomalous conditions: | following data about anomalous conditions: | |||
| * Start-time and duration of overload state. | o Start-time and duration of overload state. | |||
| A hysteresis mechanism SHOULD be used to prevent flapping in and | A hysteresis mechanism SHOULD be used to prevent flapping in and | |||
| out of overload causing an event storm. For instance, exit from | out of overload causing an event storm. For instance, exit from | |||
| overload state could trigger one report, but also latch a timer. | overload state could trigger one report, but also latch a timer. | |||
| Then, during that time, if the AQM enters and exits overload state | Then, during that time, if the AQM enters and exits overload state | |||
| any number of times, the duration in overload state is | any number of times, the duration in overload state is | |||
| accumulated, but no new report is generated until the first time | accumulated, but no new report is generated until the first time | |||
| the AQM is out of overload once the timer has expired. | the AQM is out of overload once the timer has expired. | |||
| 2.5.2.4. Deployment, Coexistence and Scaling | 2.5.2.4. Deployment, Coexistence and Scaling | |||
| skipping to change at page 24, line 5 ¶ | skipping to change at page 23, line 38 ¶ | |||
| A trade-off needs to be made between complexity and the risk of | A trade-off needs to be made between complexity and the risk of | |||
| either traffic class harming the other. In overloaded conditions the | either traffic class harming the other. In overloaded conditions the | |||
| higher priority L4S service will have to sacrifice some aspect of its | higher priority L4S service will have to sacrifice some aspect of its | |||
| performance. Depending on the degree of overload, alternative | performance. Depending on the degree of overload, alternative | |||
| solutions may relax a different factor: e.g. throughput, delay, drop. | solutions may relax a different factor: e.g. throughput, delay, drop. | |||
| These choices need to be made either by the developer or by operator | These choices need to be made either by the developer or by operator | |||
| policy, rather than by the IETF. Subsequent subsections discuss | policy, rather than by the IETF. Subsequent subsections discuss | |||
| aspects relating to handling of different degrees of overload: | aspects relating to handling of different degrees of overload: | |||
| * Unresponsive flows (L and/or C) but not overloaded, i.e. the sum | o Unresponsive flows (L and/or C) but not overloaded, i.e. the sum | |||
| of unresponsive load before adding any responsive traffic is below | of unresponsive load before adding any responsive traffic is below | |||
| capacity; | capacity; | |||
| This case is handled by the regular Coupled DualQ (Section 2.1) | This case is handled by the regular Coupled DualQ (Section 2.1) | |||
| but not discussed there. So below, Section 4.2.1 explains the | but not discussed there. So below, Section 4.2.1 explains the | |||
| design goal, and how it is achieved in practice; | design goal, and how it is achieved in practice; | |||
| * Unresponsive flows (L and/or C) causing persistent overload, | o Unresponsive flows (L and/or C) causing persistent overload, | |||
| i.e. the sum of unresponsive load even before adding any | i.e. the sum of unresponsive load even before adding any | |||
| responsive traffic persistently exceeds capacity; | responsive traffic persistently exceeds capacity; | |||
| This case is not covered by the regular Coupled DualQ mechanism | This case is not covered by the regular Coupled DualQ mechanism | |||
| (Section 2.1) but the last para in Section 2.5.1.1 sets out a | (Section 2.1) but the last para in Section 2.5.1.1 sets out a | |||
| requirement to handle the case where ECN-capable traffic could | requirement to handle the case where ECN-capable traffic could | |||
| starve non-ECN-capable traffic. Section 4.2.3 below discusses | starve non-ECN-capable traffic. Section 4.2.3 below discusses | |||
| the general options and gives specific examples. | the general options and gives specific examples. | |||
| * Short-term overload that lies between the 'not overloaded' and | o Short-term overload that lies between the 'not overloaded' and | |||
| 'persistently overloaded' cases. | 'persistently overloaded' cases. | |||
| For the period before overload is deemed persistent, | For the period before overload is deemed persistent, | |||
| Section 4.2.2 discusses options for more immediate mechanisms | Section 4.2.2 discusses options for more immediate mechanisms | |||
| at the scheduler timescale. These prevent short-term | at the scheduler timescale. These prevent short-term | |||
| starvation of the C queue by making the priority of the L queue | starvation of the C queue by making the priority of the L queue | |||
| conditional, as required in Section 2.5.1. | conditional, as required in Section 2.5.1. | |||
| 4.2.1. Unresponsive Traffic without Overload | 4.2.1. Unresponsive Traffic without Overload | |||
| When one or more L flows and/or C flows are unresponsive, but their | When one or more L flows and/or C flows are unresponsive, but their | |||
| total load is within the link capacity so that they do not saturate | total load is within the link capacity so that they do not saturate | |||
| the coupled marking (below 100%), the goal of a DualQ AQM is to | the coupled marking (below 100%), the goal of a DualQ AQM is to | |||
| behave no worse than a single-queue AQM. | behave no worse than a single-queue AQM. | |||
| Tests have shown that this is indeed the case with no additional | Tests have shown that this is indeed the case with no additional | |||
| mechanism beyond the regular Coupled DualQ of Section 2.1 (see the | mechanism beyond the regular Coupled DualQ of Section 2.1 (see the | |||
| results of 'overload experiments' in [DCttH19]). Perhaps counter- | results of 'overload experiments' in [L4Seval22]). Perhaps counter- | |||
| intuitively, whether the unresponsive flow classifies itself into the | intuitively, whether the unresponsive flow classifies itself into the | |||
| L or the C queue, the DualQ system behaves as if it has subtracted | L or the C queue, the DualQ system behaves as if it has subtracted | |||
| from the overall link capacity. Then, the coupling shares out the | from the overall link capacity. Then, the coupling shares out the | |||
| remaining capacity between any competing responsive flows (in either | remaining capacity between any competing responsive flows (in either | |||
| queue). See also Section 4.2.2, which discusses scheduler-specific | queue). See also Section 4.2.2, which discusses scheduler-specific | |||
| details. | details. | |||
| 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput | |||
| or Delay? | or Delay? | |||
| skipping to change at page 25, line 19 ¶ | skipping to change at page 24, line 47 ¶ | |||
| Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | |||
| as explained in Section 2.4, even a lone responsive L4S flow could | as explained in Section 2.4, even a lone responsive L4S flow could | |||
| temporarily block a small finite set of C packets (e.g. an initial | temporarily block a small finite set of C packets (e.g. an initial | |||
| window or DNS request). The blockage would only be brief, but it | window or DNS request). The blockage would only be brief, but it | |||
| could be longer for certain AQM implementations that can only | could be longer for certain AQM implementations that can only | |||
| increase the congestion signal coupled from the C queue when C | increase the congestion signal coupled from the C queue when C | |||
| packets are actually being dequeued. There is then the question of | packets are actually being dequeued. There is then the question of | |||
| whether to sacrifice L4S throughput or L4S delay (or some other | whether to sacrifice L4S throughput or L4S delay (or some other | |||
| policy) to make the priority conditional: | policy) to make the priority conditional: | |||
| Sacrifice L4S throughput: By using weighted round-robin as the | Sacrifice L4S throughput: By using weighted round-robin as the | |||
| conditional priority scheduler, the L4S service can sacrifice some | conditional priority scheduler, the L4S service can sacrifice some | |||
| throughput during overload. This can either be thought of as | throughput during overload. This can either be thought of as | |||
| guaranteeing a minimum throughput service for Classic traffic, or | guaranteeing a minimum throughput service for Classic traffic, or | |||
| as guaranteeing a maximum delay for a packet at the head of the | as guaranteeing a maximum delay for a packet at the head of the | |||
| Classic queue. | Classic queue. | |||
| Cautionary note: a WRR scheduler can only guarantee Classic | Cautionary note: a WRR scheduler can only guarantee Classic | |||
| throughput if Classic sources are sending enough to use it -- | throughput if Classic sources are sending enough to use it -- | |||
| congestion signals can undermine scheduling because they determine | congestion signals can undermine scheduling because they determine | |||
| how much responsive traffic of each class arrives for scheduling | how much responsive traffic of each class arrives for scheduling | |||
| skipping to change at page 28, line 30 ¶ | skipping to change at page 28, line 10 ¶ | |||
| As an example, experiments with the DualPI2 AQM (Appendix A) have | As an example, experiments with the DualPI2 AQM (Appendix A) have | |||
| shown that introducing 'drop on saturation' at 100% coupled L4S | shown that introducing 'drop on saturation' at 100% coupled L4S | |||
| marking addresses this problem with unresponsive ECN as well as | marking addresses this problem with unresponsive ECN as well as | |||
| addressing the saturation problem. At saturation, DualPI2 switches | addressing the saturation problem. At saturation, DualPI2 switches | |||
| into overload mode, where the base AQM is driven by the max delay of | into overload mode, where the base AQM is driven by the max delay of | |||
| both queues and it introduces probabilistic drop to both queues | both queues and it introduces probabilistic drop to both queues | |||
| equally. It leaves only a small range of congestion levels just | equally. It leaves only a small range of congestion levels just | |||
| below saturation where unresponsive traffic gains any advantage from | below saturation where unresponsive traffic gains any advantage from | |||
| using the ECN capability (relative to being unresponsive without | using the ECN capability (relative to being unresponsive without | |||
| ECN), and the advantage is hardly detectable (see [DualQ-Test] and | ECN), and the advantage is hardly detectable (see [DualQ-Test] and | |||
| section IV-E of [DCttH19]. Also overload with an unresponsive ECT(1) | section IV-G of [L4Seval22]. Also overload with an unresponsive | |||
| flow gets no more bandwidth advantage than with ECT(0). | ECT(1) flow gets no more bandwidth advantage than with ECT(0). | |||
| 5. References | 5. References | |||
| 5.1. Normative References | 5.1. Normative References | |||
| [I-D.ietf-tsvwg-ecn-l4s-id] | [I-D.ietf-tsvwg-ecn-l4s-id] | |||
| Schepper, K. D. and B. Briscoe, "Explicit Congestion | Schepper, K. and B. Briscoe, "Explicit Congestion | |||
| Notification (ECN) Protocol for Very Low Queuing Delay | Notification (ECN) Protocol for Ultra-Low Queuing Delay | |||
| (L4S)", Work in Progress, Internet-Draft, draft-ietf- | (L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress), | |||
| tsvwg-ecn-l4s-id-28, 8 August 2022, | March 2021. | |||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| ietf-tsvwg-ecn-l4s-id/>. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
| skipping to change at page 29, line 32 ¶ | skipping to change at page 29, line 7 ¶ | |||
| [AQMmetrics] | [AQMmetrics] | |||
| Kwon, M. and S. Fahmy, "A Comparison of Load-based and | Kwon, M. and S. Fahmy, "A Comparison of Load-based and | |||
| Queue- based Active Queue Management Algorithms", Proc. | Queue- based Active Queue Management Algorithms", Proc. | |||
| Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | |||
| 10.1117/12.473021, 2002, | 10.1117/12.473021, 2002, | |||
| <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | |||
| [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | |||
| Algorithm for Increasing the Robustness of RED's Active | Algorithm for Increasing the Robustness of RED's Active | |||
| Queue Management", ACIRI Technical Report , August 2001, | Queue Management", ACIRI Technical Report 301, August | |||
| <https://www.icir.org/floyd/red.html>. | 2001, <http://www.icsi.berkeley.edu/icsi/node/2032>. | |||
| [BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub | [BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub | |||
| repository; Linux congestion control module, | repository; Linux congestion control module, | |||
| <https://github.com/google/bbr/blob/v2alpha/README.md>. | <https://github.com/google/bbr/blob/v2alpha/README.md>. | |||
| [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | |||
| Taheri, "Validating the Sharing Behavior and Latency | Taheri, "Validating the Sharing Behavior and Latency | |||
| Characteristics of the L4S Architecture", ACM CCR | Characteristics of the L4S Architecture", ACM CCR | |||
| 50(2):37--44, May 2020, | 50(2):37--44, May 2020, | |||
| <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | |||
| skipping to change at page 30, line 15 ¶ | skipping to change at page 29, line 37 ¶ | |||
| [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | |||
| ACM Queue 10(5), May 2012, | ACM Queue 10(5), May 2012, | |||
| <https://queue.acm.org/issuedetail.cfm?issue=2208917>. | <https://queue.acm.org/issuedetail.cfm?issue=2208917>. | |||
| [CRED_Insights] | [CRED_Insights] | |||
| Briscoe, B., "Insights from Curvy RED (Random Early | Briscoe, B., "Insights from Curvy RED (Random Early | |||
| Detection)", BT Technical Report TR-TUB8-2015-003 | Detection)", BT Technical Report TR-TUB8-2015-003 | |||
| arXiv:1904.07339 [cs.NI], July 2015, | arXiv:1904.07339 [cs.NI], July 2015, | |||
| <https://arxiv.org/abs/1904.07339>. | <https://arxiv.org/abs/1904.07339>. | |||
| [DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B. | ||||
| Briscoe, "`Data Centre to the Home': Ultra-Low Latency for | ||||
| All", Updated RITE project Technical Report , July 2019, | ||||
| <https://bobbriscoe.net/pubs.html#DCttH_TR>. | ||||
| [DOCSIS3.1] | [DOCSIS3.1] | |||
| CableLabs, "MAC and Upper Layer Protocols Interface | CableLabs, "MAC and Upper Layer Protocols Interface | |||
| (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | |||
| Service Interface Specifications DOCSIS® 3.1 Version i17 | Service Interface Specifications DOCSIS(R) 3.1 Version i17 | |||
| or later, 21 January 2019, <https://specification- | or later, January 2019, <https://specification- | |||
| search.cablelabs.com/CM-SP-MULPIv3.1>. | search.cablelabs.com/CM-SP-MULPIv3.1>. | |||
| [DualPI2Linux] | [DualPI2Linux] | |||
| Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | |||
| and H. Steen, "DUALPI2 - Low Latency, Low Loss and | and H. Steen, "DUALPI2 - Low Latency, Low Loss and | |||
| Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, | Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, | |||
| <https://www.netdevconf.org/0x13/session.html?talk- | <https://www.netdevconf.org/0x13/session.html?talk- | |||
| DUALPI2-AQM>. | DUALPI2-AQM>. | |||
| [DualQ-Test] | [DualQ-Test] | |||
| Steen, H., "Destruction Testing: Ultra-Low Delay using | Steen, H., "Destruction Testing: Ultra-Low Delay using | |||
| Dual Queue Coupled Active Queue Management", Master's | Dual Queue Coupled Active Queue Management", Master's | |||
| Thesis, Dept of Informatics, Uni Oslo , May 2017, | Thesis, Dept of Informatics, Uni Oslo , May 2017. | |||
| <https://www.duo.uio.no/bitstream/handle/10852/57424/ | ||||
| thesis-henrste.pdf?sequence=1>. | ||||
| [Dukkipati06] | [Dukkipati06] | |||
| Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | |||
| the Right Metric for Congestion Control", ACM CCR | the Right Metric for Congestion Control", ACM CCR | |||
| 36(1):59--62, January 2006, | 36(1):59--62, January 2006, | |||
| <https://dl.acm.org/doi/10.1145/1111322.1111336>. | <https://dl.acm.org/doi/10.1145/1111322.1111336>. | |||
| [Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README, | [Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README, | |||
| August 2021, <https://github.com/heistp/l4s- | August 2021, <https://github.com/heistp/l4s- | |||
| tests/#underutilization-with-bursty-traffic>. | tests/#underutilization-with-bursty-traffic>. | |||
| [I-D.briscoe-docsis-q-protection] | [I-D.briscoe-docsis-q-protection] | |||
| Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | Briscoe, B. and G. White, "Queue Protection to Preserve | |||
| Algorithm to Preserve Low Latency", Work in Progress, | Low Latency", draft-briscoe-docsis-q-protection-00 (work | |||
| Internet-Draft, draft-briscoe-docsis-q-protection-06, 13 | in progress), July 2019. | |||
| May 2022, | ||||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| briscoe-docsis-q-protection/>. | ||||
| [I-D.briscoe-iccrg-prague-congestion-control] | [I-D.briscoe-iccrg-prague-congestion-control] | |||
| Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | |||
| Congestion Control", Work in Progress, Internet-Draft, | Congestion Control", draft-briscoe-iccrg-prague- | |||
| draft-briscoe-iccrg-prague-congestion-control-01, 11 July | congestion-control-01 (work in progress), July 2022, | |||
| 2022, <https://datatracker.ietf.org/api/v1/doc/document/ | <https://www.ietf.org/archive/id/draft-briscoe-iccrg- | |||
| draft-briscoe-iccrg-prague-congestion-control/>. | prague-congestion-control-01.txt>. | |||
| [I-D.briscoe-tsvwg-l4s-diffserv] | [I-D.briscoe-tsvwg-l4s-diffserv] | |||
| Briscoe, B., "Interactions between Low Latency, Low Loss, | Briscoe, B., "Interactions between Low Latency, Low Loss, | |||
| Scalable Throughput (L4S) and Differentiated Services", | Scalable Throughput (L4S) and Differentiated Services", | |||
| Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s- | draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), | |||
| diffserv-02, 2 July 2018, | November 2018. | |||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| briscoe-tsvwg-l4s-diffserv/>. | ||||
| [I-D.cardwell-iccrg-bbr-congestion-control] | [I-D.cardwell-iccrg-bbr-congestion-control] | |||
| Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. | Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson, | |||
| Jacobson, "BBR Congestion Control", Work in Progress, | "BBR Congestion Control", draft-cardwell-iccrg-bbr- | |||
| Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | congestion-control-00 (work in progress), July 2017. | |||
| control-02, 7 March 2022, | ||||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| cardwell-iccrg-bbr-congestion-control/>. | ||||
| [I-D.ietf-tsvwg-l4s-arch] | [I-D.ietf-tsvwg-l4s-arch] | |||
| Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low | |||
| "Low Latency, Low Loss, Scalable Throughput (L4S) Internet | Latency, Low Loss, Scalable Throughput (L4S) Internet | |||
| Service: Architecture", Work in Progress, Internet-Draft, | Service: Architecture", draft-ietf-tsvwg-l4s-arch-08 (work | |||
| draft-ietf-tsvwg-l4s-arch-19, 27 July 2022, | in progress), November 2020. | |||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| ietf-tsvwg-l4s-arch/>. | ||||
| [I-D.mathis-iccrg-relentless-tcp] | [I-D.mathis-iccrg-relentless-tcp] | |||
| Mathis, M., "Relentless Congestion Control", Work in | Mathis, M., "Relentless Congestion Control", draft-mathis- | |||
| Progress, Internet-Draft, draft-mathis-iccrg-relentless- | iccrg-relentless-tcp-00 (work in progress), March 2009. | |||
| tcp-00, 4 March 2009, <https://www.ietf.org/archive/id/ | ||||
| draft-mathis-iccrg-relentless-tcp-00.txt>. | [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Oestberg, C., | |||
| Johansson, I., Strand, J., Ledl, P., and D. Schnieders, | ||||
| "Enabling time-critical applications over 5G with rate | ||||
| adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | ||||
| 21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | ||||
| reports-and-papers/white-papers/enabling-time-critical- | ||||
| applications-over-5g-with-rate-adaptation>. | ||||
| [L4Sdemo16] | [L4Sdemo16] | |||
| Bondarenko, O., De Schepper, K., Tsang, I., and B. | Bondarenko, O., De Schepper, K., Tsang, I., and B. | |||
| Briscoe, "Ultra-Low Delay for All: Live Experience, Live | Briscoe, "Ultra-Low Delay for All: Live Experience, Live | |||
| Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | |||
| <https//dl.acm.org/citation.cfm?doid=2910017.2910633 | <https//dl.acm.org/citation.cfm?doid=2910017.2910633 | |||
| (videos of demos: | (videos of demos: | |||
| https://riteproject.eu/dctth/#1511dispatchwg )>. | https://riteproject.eu/dctth/#1511dispatchwg )>. | |||
| [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | [L4Seval22] | |||
| Johansson, I., Strand, J., Lédl, P., and D. Schnieders, | De Schepper, K., Albisser, O., Tilmans, O., and B. | |||
| "Enabling time-critical applications over 5G with rate | Briscoe, "Dual Queue Coupled AQM: Deployable Very Low | |||
| adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | Queuing Delay for All", Preprint submitted to IEEE/ACM | |||
| 21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | Transactions on Networking arXiv:2209.01078 [cs.NI], | |||
| reports-and-papers/white-papers/enabling-time-critical- | September 2022, <https://arxiv.org/abs/2209.01078>. | |||
| applications-over-5g-with-rate-adaptation>. | ||||
| [Labovitz10] | [Labovitz10] | |||
| Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | |||
| J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc | J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc | |||
| ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | |||
| <https://doi.org/10.1145/1851275.1851194>. | <https://doi.org/10.1145/1851275.1851194>. | |||
| [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | |||
| DOCSIS: Technology Overview", CableLabs White Paper , | DOCSIS: Technology Overview", CableLabs White Paper , | |||
| February 2019, <https://cablela.bs/low-latency-docsis- | February 2019, <https://cablela.bs/low-latency-docsis- | |||
| technology-overview-february-2019>. | technology-overview-february-2019>. | |||
| [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | |||
| simple scheduling algorithm for two real-time transport | simple scheduling algorithm for two real-time transport | |||
| service classes with application in the UTRAN", Proc. IEEE | service classes with application in the UTRAN", Proc. IEEE | |||
| Conference on Computer Communications (INFOCOM'03) Vol.2 | Conference on Computer Communications (INFOCOM'03) Vol.2 | |||
| pp.1116-1122, March 2003, | pp.1116-1122, March 2003. | |||
| <https://infocom2003.ieee-infocom.org/papers/27_04.PDF>. | ||||
| [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | |||
| Tsang, "PI2: A Linearized AQM for both Classic and | Tsang, "PI2: A Linearized AQM for both Classic and | |||
| Scalable TCP", ACM CoNEXT'16 , December 2016, | Scalable TCP", ACM CoNEXT'16 , December 2016, | |||
| <https://riteproject.files.wordpress.com/2015/10/ | <https://dl.acm.org/doi/10.1145/2999572.2999578>. | |||
| pi2_conext.pdf>. | ||||
| [PI2param] Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | [PI2param] | |||
| Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | ||||
| 2021-001 arXiv:2107.01003 [cs.NI], July 2021, | 2021-001 arXiv:2107.01003 [cs.NI], July 2021, | |||
| <https://arxiv.org/abs/2107.01003>. | <https://arxiv.org/abs/2107.01003>. | |||
| [PragueLinux] | [PragueLinux] | |||
| Briscoe, B., De Schepper, K., Albisser, O., Misund, J., | Briscoe, B., De Schepper, K., Albisser, O., Misund, J., | |||
| Tilmans, O., Kühlewind, M., and A.S. Ahmed, "Implementing | Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing | |||
| the `TCP Prague' Requirements for Low Latency Low Loss | the `TCP Prague' Requirements for Low Latency Low Loss | |||
| Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , | Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , | |||
| March 2019, <https://www.netdevconf.org/0x13/ | March 2019, <https://www.netdevconf.org/0x13/ | |||
| session.html?talk-tcp-prague-l4s>. | session.html?talk-tcp-prague-l4s>. | |||
| [RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | [RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | |||
| RFC 970, DOI 10.17487/RFC0970, December 1985, | RFC 970, DOI 10.17487/RFC0970, December 1985, | |||
| <https://www.rfc-editor.org/info/rfc970>. | <https://www.rfc-editor.org/info/rfc970>. | |||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
| skipping to change at page 33, line 21 ¶ | skipping to change at page 32, line 34 ¶ | |||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
| S., Wroclawski, J., and L. Zhang, "Recommendations on | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
| Queue Management and Congestion Avoidance in the | Queue Management and Congestion Avoidance in the | |||
| Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, | Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, | |||
| <https://www.rfc-editor.org/info/rfc2309>. | <https://www.rfc-editor.org/info/rfc2309>. | |||
| [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, | [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, | |||
| RFC 2914, DOI 10.17487/RFC2914, September 2000, | RFC 2914, DOI 10.17487/RFC2914, September 2000, | |||
| <https://www.rfc-editor.org/info/rfc2914>. | <https://www.rfc-editor.org/info/rfc2914>. | |||
| [RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le | [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | |||
| Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D. | J., Courtney, W., Davari, S., Firoiu, V., and D. | |||
| Stiliadis, "An Expedited Forwarding PHB (Per-Hop | Stiliadis, "An Expedited Forwarding PHB (Per-Hop | |||
| Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | |||
| <https://www.rfc-editor.org/info/rfc3246>. | <https://www.rfc-editor.org/info/rfc3246>. | |||
| [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | |||
| RFC 3649, DOI 10.17487/RFC3649, December 2003, | RFC 3649, DOI 10.17487/RFC3649, December 2003, | |||
| <https://www.rfc-editor.org/info/rfc3649>. | <https://www.rfc-editor.org/info/rfc3649>. | |||
| [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion | [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion | |||
| Control Algorithms", BCP 133, RFC 5033, | Control Algorithms", BCP 133, RFC 5033, | |||
| skipping to change at page 35, line 9 ¶ | skipping to change at page 34, line 19 ¶ | |||
| [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | |||
| Pervasive Encryption on Operators", RFC 8404, | Pervasive Encryption on Operators", RFC 8404, | |||
| DOI 10.17487/RFC8404, July 2018, | DOI 10.17487/RFC8404, July 2018, | |||
| <https://www.rfc-editor.org/info/rfc8404>. | <https://www.rfc-editor.org/info/rfc8404>. | |||
| [SCReAM] Johansson, I., "SCReAM", GitHub repository; , | [SCReAM] Johansson, I., "SCReAM", GitHub repository; , | |||
| <https://github.com/EricssonResearch/scream/blob/master/ | <https://github.com/EricssonResearch/scream/blob/master/ | |||
| README.md>. | README.md>. | |||
| [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", | [SigQ-Dyn] | |||
| Briscoe, B., "Rapid Signalling of Queue Dynamics", | ||||
| Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | |||
| September 2017, <https://arxiv.org/abs/1904.07044>. | September 2017, <https://arxiv.org/abs/1904.07044>. | |||
| Appendix A. Example DualQ Coupled PI2 Algorithm | Appendix A. Example DualQ Coupled PI2 Algorithm | |||
| As a first concrete example, the pseudocode below gives the DualPI2 | As a first concrete example, the pseudocode below gives the DualPI2 | |||
| algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | |||
| framework in Figure 1. A simple ramp function (configured in units | framework in Figure 1. A simple ramp function (configured in units | |||
| of queuing time) with unsmoothed ECN marking is used for the Native | of queuing time) with unsmoothed ECN marking is used for the Native | |||
| L4S AQM. The ramp can also be configured as a step function. The | L4S AQM. The ramp can also be configured as a step function. The | |||
| skipping to change at page 36, line 5 ¶ | skipping to change at page 35, line 13 ¶ | |||
| [DualPI2Linux]. The specification of the DualQ Coupled AQM for | [DualPI2Linux]. The specification of the DualQ Coupled AQM for | |||
| DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and | DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and | |||
| explained in [LLD]. | explained in [LLD]. | |||
| A.1. Pass #1: Core Concepts | A.1. Pass #1: Core Concepts | |||
| The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
| packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | |||
| pseudocode consists of the following six functions: | pseudocode consists of the following six functions: | |||
| * The initialization function dualpi2_params_init(...) (Figure 2) | o The initialization function dualpi2_params_init(...) (Figure 2) | |||
| that sets parameter defaults (the API for setting non-default | that sets parameter defaults (the API for setting non-default | |||
| values is omitted for brevity) | values is omitted for brevity) | |||
| * The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3) | o The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3) | |||
| * The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4) | o The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4) | |||
| * The recurrence function recur(q, likelihood) for de-randomized ECN | o The recurrence function recur(q, likelihood) for de-randomized ECN | |||
| marking (shown at the end of Figure 4). | marking (shown at the end of Figure 4). | |||
| * The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the | o The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the | |||
| ECN-marking probability for the L4S queue | ECN-marking probability for the L4S queue | |||
| * The base AQM function that implements the PI algorithm | o The base AQM function that implements the PI algorithm | |||
| dualpi2_update(lq, cq) (Figure 6) used to regularly update the | dualpi2_update(lq, cq) (Figure 6) used to regularly update the | |||
| base probability (p'), which is squared for the Classic AQM as | base probability (p'), which is squared for the Classic AQM as | |||
| well as being coupled across to the L4S queue. | well as being coupled across to the L4S queue. | |||
| It also uses the following functions that are not shown in full here: | It also uses the following functions that are not shown in full here: | |||
| * scheduler(), which selects between the head packets of the two | o scheduler(), which selects between the head packets of the two | |||
| queues; the choice of scheduler technology is discussed later; | queues; the choice of scheduler technology is discussed later; | |||
| * cq.byt() or lq.byt() returns the current length (aka. backlog) of | o cq.byt() or lq.byt() returns the current length (aka. backlog) of | |||
| the relevant queue in bytes; | the relevant queue in bytes; | |||
| * cq.len() or lq.len() returns the current length of the relevant | o cq.len() or lq.len() returns the current length of the relevant | |||
| queue in packets; | queue in packets; | |||
| * cq.time() or lq.time() returns the current queuing delay of the | o cq.time() or lq.time() returns the current queuing delay of the | |||
| relevant queue in units of time (see Note a); | relevant queue in units of time (see Note a); | |||
| * mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | |||
| In experiments so far (building on experiments with PIE) on broadband | In experiments so far (building on experiments with PIE) on broadband | |||
| access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms | access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms | |||
| to 100 ms, DualPI2 achieves good results with the default parameters | to 100 ms, DualPI2 achieves good results with the default parameters | |||
| in Figure 2. The parameters are categorised by whether they relate | in Figure 2. The parameters are categorised by whether they relate | |||
| to the Base PI2 AQM, the L4S AQM or the framework coupling them | to the Base PI2 AQM, the L4S AQM or the framework coupling them | |||
| together. Constants and variables derived from these parameters are | together. Constants and variables derived from these parameters are | |||
| also included at the end of each category. Each parameter is | also included at the end of each category. Each parameter is | |||
| explained as it is encountered in the walk-through of the pseudocode | explained as it is encountered in the walk-through of the pseudocode | |||
| below, and the rationale for the chosen defaults are given so that | below, and the rationale for the chosen defaults are given so that | |||
| skipping to change at page 38, line 16 ¶ | skipping to change at page 37, line 16 ¶ | |||
| 2: if ( lq.byt() + cq.byt() + MTU > limit) | 2: if ( lq.byt() + cq.byt() + MTU > limit) | |||
| 3: drop(pkt) % drop packet if buffer is full | 3: drop(pkt) % drop packet if buffer is full | |||
| 4: timestamp(pkt) % only needed if using the sojourn technique | 4: timestamp(pkt) % only needed if using the sojourn technique | |||
| 5: % Packet classifier | 5: % Packet classifier | |||
| 6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE | 6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE | |||
| 7: lq.enqueue(pkt) | 7: lq.enqueue(pkt) | |||
| 8: else % ECN bits = not-ECT or ECT(0) | 8: else % ECN bits = not-ECT or ECT(0) | |||
| 9: cq.enqueue(pkt) | 9: cq.enqueue(pkt) | |||
| 10: } | 10: } | |||
| Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | |||
| 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
| 2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
| 3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
| 4: lq.dequeue(pkt) % Scheduler chooses lq | 4: lq.dequeue(pkt) % Scheduler chooses lq | |||
| 5: p'_L = laqm(lq.time()) % Native LAQM | 5: p'_L = laqm(lq.time()) % Native LAQM | |||
| 6: p_L = max(p'_L, p_CL) % Combining function | 6: p_L = max(p'_L, p_CL) % Combining function | |||
| 7: if ( recur(lq, p_L) ) % Linear marking | 7: if ( recur(lq, p_L) ) % Linear marking | |||
| 8: mark(pkt) | 8: mark(pkt) | |||
| 9: } else { | 9: } else { | |||
| skipping to change at page 38, line 50 ¶ | skipping to change at page 37, line 50 ¶ | |||
| 23: recur(q, likelihood) { % Returns TRUE with a certain likelihood | 23: recur(q, likelihood) { % Returns TRUE with a certain likelihood | |||
| 24: q.count += likelihood | 24: q.count += likelihood | |||
| 25: if (q.count > 1) { | 25: if (q.count > 1) { | |||
| 26: q.count -= 1 | 26: q.count -= 1 | |||
| 27: return TRUE | 27: return TRUE | |||
| 28: } | 28: } | |||
| 29: return FALSE | 29: return FALSE | |||
| 30: } | 30: } | |||
| Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
| When packets arrive, first a common queue limit is checked as shown | When packets arrive, first a common queue limit is checked as shown | |||
| in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | |||
| shared buffer for the two queues (Note b discusses the merits of | shared buffer for the two queues (Note b discusses the merits of | |||
| separate buffers). In order to avoid any bias against larger | separate buffers). In order to avoid any bias against larger | |||
| packets, 1 MTU of space is always allowed, and the limit is | packets, 1 MTU of space is always allowed, and the limit is | |||
| deliberately tested before enqueue. | deliberately tested before enqueue. | |||
| If limit is not exceeded, the packet is timestamped in line 4 (only | If limit is not exceeded, the packet is timestamped in line 4 (only | |||
| if the sojourn time technique is being used to measure queue delay; | if the sojourn time technique is being used to measure queue delay; | |||
| skipping to change at page 39, line 43 ¶ | skipping to change at page 38, line 43 ¶ | |||
| loop sloppier (for a typical RTT it would double the Classic queue's | loop sloppier (for a typical RTT it would double the Classic queue's | |||
| feedback delay). | feedback delay). | |||
| All the dequeue code is contained within a large while loop so that | All the dequeue code is contained within a large while loop so that | |||
| if it decides to drop a packet, it will continue until it selects a | if it decides to drop a packet, it will continue until it selects a | |||
| packet to schedule. Line 3 of the dequeue pseudocode is where the | packet to schedule. Line 3 of the dequeue pseudocode is where the | |||
| scheduler chooses between the L4S queue (lq) and the Classic queue | scheduler chooses between the L4S queue (lq) and the Classic queue | |||
| (cq). Detailed implementation of the scheduler is not shown (see | (cq). Detailed implementation of the scheduler is not shown (see | |||
| discussion later). | discussion later). | |||
| * If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN- | o If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN- | |||
| marked with likelihood p_L. The recur() function at the end of | marked with likelihood p_L. The recur() function at the end of | |||
| Figure 4 is used, which is preferred over random marking because | Figure 4 is used, which is preferred over random marking because | |||
| it avoids delay due to randomization when interpreting congestion | it avoids delay due to randomization when interpreting congestion | |||
| signals, but it still desynchronizes the saw-teeth of the flows. | signals, but it still desynchronizes the saw-teeth of the flows. | |||
| Line 6 calculates p_L as the maximum of the coupled L4S | Line 6 calculates p_L as the maximum of the coupled L4S | |||
| probability p_CL and the probability from the native L4S AQM p'_L. | probability p_CL and the probability from the native L4S AQM p'_L. | |||
| This implements the max() function shown in Figure 1 to couple the | This implements the max() function shown in Figure 1 to couple the | |||
| outputs of the two AQMs together. Of the two probabilities input | outputs of the two AQMs together. Of the two probabilities input | |||
| to p_L in line 6: | to p_L in line 6: | |||
| - p'_L is calculated per packet in line 5 by the laqm() function | * p'_L is calculated per packet in line 5 by the laqm() function | |||
| (see Figure 5), | (see Figure 5), | |||
| - Whereas p_CL is maintained by the dualpi2_update() function | * Whereas p_CL is maintained by the dualpi2_update() function | |||
| which runs every Tupdate (Tupdate is set in line 12 of | which runs every Tupdate (Tupdate is set in line 12 of | |||
| Figure 2). | Figure 2). | |||
| * If a Classic packet is scheduled, lines 10 to 17 drop or mark the | o If a Classic packet is scheduled, lines 10 to 17 drop or mark the | |||
| packet with probability p_C. | packet with probability p_C. | |||
| The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | |||
| to the RED algorithm, but simplified as follows: | to the RED algorithm, but simplified as follows: | |||
| * The extent of the ramp is defined in units of queuing delay, not | o The extent of the ramp is defined in units of queuing delay, not | |||
| bytes, so that configuration remains invariant as the queue | bytes, so that configuration remains invariant as the queue | |||
| departure rate varies. | departure rate varies. | |||
| * It uses instantaneous queueing delay, which avoids the complexity | o It uses instantaneous queueing delay, which avoids the complexity | |||
| of smoothing, but also avoids embedding a worst-case RTT of | of smoothing, but also avoids embedding a worst-case RTT of | |||
| smoothing delay in the network (see Section 2.1). | smoothing delay in the network (see Section 2.1). | |||
| * The ramp rises linearly directly from 0 to 1, not to an | o The ramp rises linearly directly from 0 to 1, not to an | |||
| intermediate value of p'_L as RED would, because there is no need | intermediate value of p'_L as RED would, because there is no need | |||
| to keep ECN marking probability low. | to keep ECN marking probability low. | |||
| * Marking does not have to be randomized. Determinism is used | o Marking does not have to be randomized. Determinism is used | |||
| instead of randomness; to reduce the delay necessary to smooth out | instead of randomness; to reduce the delay necessary to smooth out | |||
| the noise of randomness from the signal. | the noise of randomness from the signal. | |||
| The ramp function requires two configuration parameters, the minimum | The ramp function requires two configuration parameters, the minimum | |||
| threshold (minTh) and the width of the ramp (range), both in units of | threshold (minTh) and the width of the ramp (range), both in units of | |||
| queuing time, as shown in lines 17 & 18 of the initialization | queuing time, as shown in lines 17 & 18 of the initialization | |||
| function in Figure 2. The ramp function can be configured as a step | function in Figure 2. The ramp function can be configured as a step | |||
| (see Note c). | (see Note c). | |||
| Although the DCTCP paper [Alizadeh-stability] recommends an ECN | Although the DCTCP paper [Alizadeh-stability] recommends an ECN | |||
| skipping to change at page 41, line 24 ¶ | skipping to change at page 40, line 24 ¶ | |||
| Figure 5: Example Pseudocode for the Native L4S AQM | Figure 5: Example Pseudocode for the Native L4S AQM | |||
| 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
| 2: curq = cq.time() % use queuing time of first-in Classic packet | 2: curq = cq.time() % use queuing time of first-in Classic packet | |||
| 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
| 4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor | 4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor | |||
| 5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
| 6: prevq = curq | 6: prevq = curq | |||
| 7: } | 7: } | |||
| Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
| (Clamping p' within the range [0,1] omitted for clarity - see text) | (Clamping p' within the range [0,1] omitted for clarity - see text) | |||
| Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
| The coupled marking probability, p_CL depends on the base probability | The coupled marking probability, p_CL depends on the base probability | |||
| (p'), which is kept up to date by the core PI algorithm in Figure 6 | (p'), which is kept up to date by the core PI algorithm in Figure 6 | |||
| executed every Tupdate. | executed every Tupdate. | |||
| Note that p' solely depends on the queuing time in the Classic queue. | Note that p' solely depends on the queuing time in the Classic queue. | |||
| In line 2, the current queuing delay (curq) is evaluated from how | In line 2, the current queuing delay (curq) is evaluated from how | |||
| long the head packet was in the Classic queue (cq). The function | long the head packet was in the Classic queue (cq). The function | |||
| cq.time() (not shown) subtracts the time stamped at enqueue from the | cq.time() (not shown) subtracts the time stamped at enqueue from the | |||
| current time (see Note a) and implicitly takes the current queuing | current time (see Note a) and implicitly takes the current queuing | |||
| delay as 0 if the queue is empty. | delay as 0 if the queue is empty. | |||
| skipping to change at page 42, line 13 ¶ | skipping to change at page 41, line 12 ¶ | |||
| [PI2param], the target queuing delay on line 9 of Figure 2 is related | [PI2param], the target queuing delay on line 9 of Figure 2 is related | |||
| to the typical base RTT worldwide, RTT_typ, by two factors: target = | to the typical base RTT worldwide, RTT_typ, by two factors: target = | |||
| RTT_typ * g * f. Below we summarize the rationale behind these | RTT_typ * g * f. Below we summarize the rationale behind these | |||
| factors and introduce a further adjustment. The two factors ensure | factors and introduce a further adjustment. The two factors ensure | |||
| that, in a large proportion of cases (say 90%), the sawtooth | that, in a large proportion of cases (say 90%), the sawtooth | |||
| variations in RTT of a single flow will fit within the buffer without | variations in RTT of a single flow will fit within the buffer without | |||
| underutilizing the link. Frankly, these factors are educated | underutilizing the link. Frankly, these factors are educated | |||
| guesses, but with the emphasis closer to 'educated' than to 'guess' | guesses, but with the emphasis closer to 'educated' than to 'guess' | |||
| (see [PI2param] for full background): | (see [PI2param] for full background): | |||
| * RTT_typ is taken as 25 ms. This is based on an average CDN | o RTT_typ is taken as 25 ms. This is based on an average CDN | |||
| latency measured in each country weighted by the number of | latency measured in each country weighted by the number of | |||
| Internet users in that country to produce an overall weighted | Internet users in that country to produce an overall weighted | |||
| average for the Internet [PI2param]. Countries were ranked by | average for the Internet [PI2param]. Countries were ranked by | |||
| number of Internet users, and once 90% of Internet users were | number of Internet users, and once 90% of Internet users were | |||
| covered, smaller countries were excluded to avoid | covered, smaller countries were excluded to avoid | |||
| unrepresentatively small sample sizes. Also, importantly, the | unrepresentatively small sample sizes. Also, importantly, the | |||
| data for the average CDN latency in China (with the largest number | data for the average CDN latency in China (with the largest number | |||
| of Internet users) has been removed, because the CDN latency was a | of Internet users) has been removed, because the CDN latency was a | |||
| significant outlier and, on reflection, the experimental technique | significant outlier and, on reflection, the experimental technique | |||
| seemed inappropriate to the CDN market in China. | seemed inappropriate to the CDN market in China. | |||
| * g is taken as 0.38. The factor g is a geometry factor that | o g is taken as 0.38. The factor g is a geometry factor that | |||
| characterizes the shape of the sawteeth of prevalent Classic | characterizes the shape of the sawteeth of prevalent Classic | |||
| congestion controllers. The geometry factor is the fraction of | congestion controllers. The geometry factor is the fraction of | |||
| the amplitude of the sawtooth variability in queue delay that lies | the amplitude of the sawtooth variability in queue delay that lies | |||
| below the AQM's target. For instance, at low bit rate, the | below the AQM's target. For instance, at low bit rate, the | |||
| geometry factor of standard Reno is 0.5, but at higher rates it | geometry factor of standard Reno is 0.5, but at higher rates it | |||
| tends to just under 1. According to the census of congestion | tends to just under 1. According to the census of congestion | |||
| controllers conducted by Mishra et al. in Jul-Oct | controllers conducted by Mishra et al. in Jul-Oct | |||
| 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | |||
| according to the analysis in [PI2param], if running over a PI2 | according to the analysis in [PI2param], if running over a PI2 | |||
| AQM, a large proportion of this Cubic traffic would be in its | AQM, a large proportion of this Cubic traffic would be in its | |||
| Reno-Friendly mode, which has a geometry factor of ~0.39 (all | Reno-Friendly mode, which has a geometry factor of ~0.39 (all | |||
| known implementations). The rest of the Cubic traffic would be in | known implementations). The rest of the Cubic traffic would be in | |||
| true Cubic mode, which has a geometry factor of ~0.36. Without | true Cubic mode, which has a geometry factor of ~0.36. Without | |||
| modelling the sawtooth profiles from all the other less prevalent | modelling the sawtooth profiles from all the other less prevalent | |||
| congestion controllers, we estimate a 7:3 weighted average of | congestion controllers, we estimate a 7:3 weighted average of | |||
| these two, resulting in an average geometry factor of 0.38. | these two, resulting in an average geometry factor of 0.38. | |||
| * f is taken as 2. The factor f is a safety factor that increases | o f is taken as 2. The factor f is a safety factor that increases | |||
| the target queue to allow for the distribution of RTT_typ around | the target queue to allow for the distribution of RTT_typ around | |||
| its mean. Otherwise, the target queue would only avoid | its mean. Otherwise, the target queue would only avoid | |||
| underutilization for those users below the mean. It also provides | underutilization for those users below the mean. It also provides | |||
| a safety margin for the proportion of paths in use that span | a safety margin for the proportion of paths in use that span | |||
| beyond the distance between a user and their local CDN. | beyond the distance between a user and their local CDN. | |||
| Currently, no data is available on the variance of queue delay | Currently, no data is available on the variance of queue delay | |||
| around the mean in each region, so there is plenty of room for | around the mean in each region, so there is plenty of room for | |||
| this guess to become more educated. | this guess to become more educated. | |||
| * [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | o [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | |||
| 19 ms. However, a further adjustment is warranted, because target | 19 ms. However, a further adjustment is warranted, because target | |||
| is moving year-on-year. The paper is based on data collected in | is moving year-on-year. The paper is based on data collected in | |||
| 2019, and it mentions evidence from speedtest.net that suggests | 2019, and it mentions evidence from speedtest.net that suggests | |||
| RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | |||
| 2021. Therefore, we recommend a default of target = 15 ms at the | 2021. Therefore, we recommend a default of target = 15 ms at the | |||
| time of writing (2021). | time of writing (2021). | |||
| Operators can always use the data and discussion in [PI2param] to | Operators can always use the data and discussion in [PI2param] to | |||
| configure a more appropriate target for their environment. For | configure a more appropriate target for their environment. For | |||
| instance, an operator might wish to question the assumptions called | instance, an operator might wish to question the assumptions called | |||
| skipping to change at page 45, line 5 ¶ | skipping to change at page 43, line 48 ¶ | |||
| Tupdate dependent on p'. Instead, in PI2, alpha and beta are | Tupdate dependent on p'. Instead, in PI2, alpha and beta are | |||
| independent of p' because the squaring applied to Classic traffic | independent of p' because the squaring applied to Classic traffic | |||
| tunes them inherently. This is explained in [PI2], which also | tunes them inherently. This is explained in [PI2], which also | |||
| explains why this more principled approach removes the need for most | explains why this more principled approach removes the need for most | |||
| of the heuristics that had to be added to PIE. | of the heuristics that had to be added to PIE. | |||
| Nonetheless, an implementer might wish to add selected details to | Nonetheless, an implementer might wish to add selected details to | |||
| either AQM. For instance the Linux reference DualPI2 implementation | either AQM. For instance the Linux reference DualPI2 implementation | |||
| includes the following (not shown in the pseudocode above): | includes the following (not shown in the pseudocode above): | |||
| * Classic and coupled marking or dropping (i.e. based on p_C and | o Classic and coupled marking or dropping (i.e. based on p_C and | |||
| p_CL from the PI controller) is not applied to a packet if the | p_CL from the PI controller) is not applied to a packet if the | |||
| aggregate queue length in bytes is < 2 MTU (prior to enqueuing the | aggregate queue length in bytes is < 2 MTU (prior to enqueuing the | |||
| packet or dequeuing it, depending on whether the AQM is configured | packet or dequeuing it, depending on whether the AQM is configured | |||
| to be applied at enqueue or dequeue); | to be applied at enqueue or dequeue); | |||
| * In the WRR scheduler, the 'credit' indicating which queue should | o In the WRR scheduler, the 'credit' indicating which queue should | |||
| transmit is only changed if there are packets in both queues | transmit is only changed if there are packets in both queues | |||
| (i.e. if there is actual resource contention). This means that a | (i.e. if there is actual resource contention). This means that a | |||
| properly paced L flow might never be delayed by the WRR. The WRR | properly paced L flow might never be delayed by the WRR. The WRR | |||
| credit is reset in favour of the L queue when the link is idle. | credit is reset in favour of the L queue when the link is idle. | |||
| An implementer might also wish to add other heuristics, e.g. burst | An implementer might also wish to add other heuristics, e.g. burst | |||
| protection [RFC8033] or enhanced burst protection [RFC8034]. | protection [RFC8033] or enhanced burst protection [RFC8034]. | |||
| Notes: | Notes: | |||
| skipping to change at page 47, line 23 ¶ | skipping to change at page 46, line 20 ¶ | |||
| The pseudocode given here applies where the link rate is unknown, | The pseudocode given here applies where the link rate is unknown, | |||
| which is more common for software implementations that might be | which is more common for software implementations that might be | |||
| deployed in scenarios where the link is shared with other queues. In | deployed in scenarios where the link is shared with other queues. In | |||
| lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L, | lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L, | |||
| is zeroed if the queue is only 1 packet (in the default | is zeroed if the queue is only 1 packet (in the default | |||
| configuration). | configuration). | |||
| Linux implementation note: | Linux implementation note: | |||
| * In Linux, the check that the queue exceeds Th_len before marking | o In Linux, the check that the queue exceeds Th_len before marking | |||
| with the native L4S AQM is actually at enqueue, not dequeue, | with the native L4S AQM is actually at enqueue, not dequeue, | |||
| otherwise it would exempt the last packet of a burst from being | otherwise it would exempt the last packet of a burst from being | |||
| marked. The result of the check is conveyed from enqueue to the | marked. The result of the check is conveyed from enqueue to the | |||
| dequeue function via a boolean in the packet metadata. | dequeue function via a boolean in the packet metadata. | |||
| Persistent overload is deemed to have occurred when Classic drop/ | Persistent overload is deemed to have occurred when Classic drop/ | |||
| marking probability reaches p_Cmax. Above this point, the Classic | marking probability reaches p_Cmax. Above this point, the Classic | |||
| drop probability is applied to both L and C queues, irrespective of | drop probability is applied to both L and C queues, irrespective of | |||
| whether any packet is ECN-capable. ECT packets that are not dropped | whether any packet is ECN-capable. ECT packets that are not dropped | |||
| can still be ECN-marked. | can still be ECN-marked. | |||
| skipping to change at page 49, line 41 ¶ | skipping to change at page 48, line 41 ¶ | |||
| 14: continue % continue to the top of the while loop | 14: continue % continue to the top of the while loop | |||
| 15: } | 15: } | |||
| 16: mark(pkt) % squared mark | 16: mark(pkt) % squared mark | |||
| 17: } | 17: } | |||
| 18: } | 18: } | |||
| 19: return(pkt) % return the packet and stop | 19: return(pkt) % return the packet and stop | |||
| 20: } | 20: } | |||
| 21: return(NULL) % no packet to dequeue | 21: return(NULL) % no packet to dequeue | |||
| 22: } | 22: } | |||
| Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
| (Including Code for Edge-Cases) | (Including Code for Edge-Cases) | |||
| 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
| 2a: curq = max(cq.time(), lq.time()) % use greatest queuing time | 2a: curq = max(cq.time(), lq.time()) % use greatest queuing time | |||
| 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
| 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | |||
| 5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
| 6: prevq = curq | 6: prevq = curq | |||
| 7: } | 7: } | |||
| Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
| Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
| (Including Overload Code) | (Including Overload Code) | |||
| The choice of scheduler technology is critical to overload protection | The choice of scheduler technology is critical to overload protection | |||
| (see Section 4.2.2). | (see Section 4.2.2). | |||
| * A well-understood weighted scheduler such as weighted round-robin | o A well-understood weighted scheduler such as weighted round-robin | |||
| (WRR) is recommended. As long as the scheduler weight for Classic | (WRR) is recommended. As long as the scheduler weight for Classic | |||
| is small (e.g. 1/16), its exact value is unimportant because it | is small (e.g. 1/16), its exact value is unimportant because it | |||
| does not normally determine capacity shares. The weight is only | does not normally determine capacity shares. The weight is only | |||
| important to prevent unresponsive L4S traffic starving Classic | important to prevent unresponsive L4S traffic starving Classic | |||
| traffic in the short term (see Section 4.2.2). This is because | traffic in the short term (see Section 4.2.2). This is because | |||
| capacity sharing between the queues is normally determined by the | capacity sharing between the queues is normally determined by the | |||
| coupled congestion signal, which overrides the scheduler, by | coupled congestion signal, which overrides the scheduler, by | |||
| making L4S sources leave roughly equal per-flow capacity available | making L4S sources leave roughly equal per-flow capacity available | |||
| for Classic flows. | for Classic flows. | |||
| * Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It | o Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It | |||
| works by selecting the head packet that has waited the longest, | works by selecting the head packet that has waited the longest, | |||
| biased against the Classic traffic by a time-shift of tshift. To | biased against the Classic traffic by a time-shift of tshift. To | |||
| implement time-shifted FIFO, the scheduler() function in line 3 of | implement time-shifted FIFO, the scheduler() function in line 3 of | |||
| the dequeue code would simply be implemented as the scheduler() | the dequeue code would simply be implemented as the scheduler() | |||
| function at the bottom of Figure 10 in Appendix B. For the public | function at the bottom of Figure 10 in Appendix B. For the public | |||
| Internet a good value for tshift is 50ms. For private networks | Internet a good value for tshift is 50ms. For private networks | |||
| with smaller diameter, about 4*target would be reasonable. TS- | with smaller diameter, about 4*target would be reasonable. TS- | |||
| FIFO is a very simple scheduler, but complexity might need to be | FIFO is a very simple scheduler, but complexity might need to be | |||
| added to address some deficiencies (which is why it is not | added to address some deficiencies (which is why it is not | |||
| recommended over WRR): | recommended over WRR): | |||
| - TS-FIFO does not fully isolate latency in the L4S queue from | * TS-FIFO does not fully isolate latency in the L4S queue from | |||
| uncontrolled bursts in the Classic queue; | uncontrolled bursts in the Classic queue; | |||
| - Using sojourn time for TS-FIFO is only appropriate if time- | * Using sojourn time for TS-FIFO is only appropriate if time- | |||
| stamping of packets is feasible; | stamping of packets is feasible; | |||
| - Even if time-stamping is supported, the sojourn time of the | * Even if time-stamping is supported, the sojourn time of the | |||
| head packet is always stale, so a more instantaneous measure of | head packet is always stale, so a more instantaneous measure of | |||
| queue delay could be used (see Note a in Appendix A.1). | queue delay could be used (see Note a in Appendix A.1). | |||
| * A strict priority scheduler would be inappropriate as discussed in | o A strict priority scheduler would be inappropriate as discussed in | |||
| Section 4.2.2. | Section 4.2.2. | |||
| Appendix B. Example DualQ Coupled Curvy RED Algorithm | Appendix B. Example DualQ Coupled Curvy RED Algorithm | |||
| As another example of a DualQ Coupled AQM algorithm, the pseudocode | As another example of a DualQ Coupled AQM algorithm, the pseudocode | |||
| below gives the Curvy RED based algorithm. Although the AQM was | below gives the Curvy RED based algorithm. Although the AQM was | |||
| designed to be efficient in integer arithmetic, to aid understanding | designed to be efficient in integer arithmetic, to aid understanding | |||
| it is first given using floating point arithmetic (Figure 10). Then, | it is first given using floating point arithmetic (Figure 10). Then, | |||
| one possible optimization for integer arithmetic is given, also in | one possible optimization for integer arithmetic is given, also in | |||
| pseudocode (Figure 11). To aid comparison, the line numbers are kept | pseudocode (Figure 11). To aid comparison, the line numbers are kept | |||
| in step between the two by using letter suffixes where the longer | in step between the two by using letter suffixes where the longer | |||
| code needs extra lines. | code needs extra lines. | |||
| B.1. Curvy RED in Pseudocode | B.1. Curvy RED in Pseudocode | |||
| The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
| packet (pkt), the L4S queue (lq) and the Classic queue (cq) and | packet (pkt), the L4S queue (lq) and the Classic queue (cq) and | |||
| consists of the following five functions: | consists of the following five functions: | |||
| * The initialization function cred_params_init(...) (Figure 2) that | o The initialization function cred_params_init(...) (Figure 2) that | |||
| sets parameter defaults (the API for setting non-default values is | sets parameter defaults (the API for setting non-default values is | |||
| omitted for brevity); | omitted for brevity); | |||
| * The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4); | o The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4); | |||
| * The scheduling function scheduler(), which selects between the | o The scheduling function scheduler(), which selects between the | |||
| head packets of the two queues. | head packets of the two queues. | |||
| It also uses the following functions that are either shown elsewhere, | It also uses the following functions that are either shown elsewhere, | |||
| or not shown in full here: | or not shown in full here: | |||
| * The enqueue function, which is identical to that used for DualPI2, | o The enqueue function, which is identical to that used for DualPI2, | |||
| dualpi2_enqueue(lq, cq, pkt) in Figure 3; | dualpi2_enqueue(lq, cq, pkt) in Figure 3; | |||
| * mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | |||
| * cq.byt() or lq.byt() returns the current length (aka. backlog) of | o cq.byt() or lq.byt() returns the current length (aka. backlog) of | |||
| the relevant queue in bytes; | the relevant queue in bytes; | |||
| * cq.time() or lq.time() returns the current queuing delay of the | o cq.time() or lq.time() returns the current queuing delay of the | |||
| relevant queue in units of time (see Note a in Appendix A.1). | relevant queue in units of time (see Note a in Appendix A.1). | |||
| Because Curvy RED was evaluated before DualPI2, certain improvements | Because Curvy RED was evaluated before DualPI2, certain improvements | |||
| introduced for DualPI2 were not evaluated for Curvy RED. In the | introduced for DualPI2 were not evaluated for Curvy RED. In the | |||
| pseudocode below, the straightforward improvements have been added on | pseudocode below, the straightforward improvements have been added on | |||
| the assumption they will provide similar benefits, but that has not | the assumption they will provide similar benefits, but that has not | |||
| been proven experimentally. They are: i) a conditional priority | been proven experimentally. They are: i) a conditional priority | |||
| scheduler instead of strict priority ii) a time-based threshold for | scheduler instead of strict priority ii) a time-based threshold for | |||
| the native L4S AQM; iii) ECN support for the Classic AQM. A recent | the native L4S AQM; iii) ECN support for the Classic AQM. A recent | |||
| evaluation has proved that a minimum ECN-marking threshold (minTh) | evaluation has proved that a minimum ECN-marking threshold (minTh) | |||
| skipping to change at page 58, line 45 ¶ | skipping to change at page 57, line 37 ¶ | |||
| 13: continue % continue to the top of the while loop | 13: continue % continue to the top of the while loop | |||
| 14: } | 14: } | |||
| 15: mark(pkt) | 15: mark(pkt) | |||
| 16: } | 16: } | |||
| 17: } | 17: } | |||
| 18: return(pkt) % return the packet and stop here | 18: return(pkt) % return the packet and stop here | |||
| 19: } | 19: } | |||
| 20: return(NULL) % no packet to dequeue | 20: return(NULL) % no packet to dequeue | |||
| 21: } | 21: } | |||
| Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled | Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled AQM | |||
| AQM using Integer Arithmetic | using Integer Arithmetic | |||
| The two ranges, range_L and range_C are expressed as powers of 2 so | The two ranges, range_L and range_C are expressed as powers of 2 so | |||
| that division can be implemented as a right bit-shift (>>) in lines 5 | that division can be implemented as a right bit-shift (>>) in lines 5 | |||
| and 10 of the integer variant of the pseudocode (Figure 11). | and 10 of the integer variant of the pseudocode (Figure 11). | |||
| For the integer variant of the pseudocode, an integer version of the | For the integer variant of the pseudocode, an integer version of the | |||
| rand() function used at line 25 of the maxrand(function) in Figure 10 | rand() function used at line 25 of the maxrand(function) in Figure 10 | |||
| would be arranged to return an integer in the range 0 <= maxrand() < | would be arranged to return an integer in the range 0 <= maxrand() < | |||
| 2^32 (not shown). This would scale up all the floating point | 2^32 (not shown). This would scale up all the floating point | |||
| probabilities in the range [0,1] by 2^32. | probabilities in the range [0,1] by 2^32. | |||
| skipping to change at page 60, line 38 ¶ | skipping to change at page 59, line 34 ¶ | |||
| operator needs to decide at what base RTT it wants L4S and Classic | operator needs to decide at what base RTT it wants L4S and Classic | |||
| flows to have roughly equal throughput, once the effect of the | flows to have roughly equal throughput, once the effect of the | |||
| additional Classic queue on Classic throughput has been taken into | additional Classic queue on Classic throughput has been taken into | |||
| account. With this approach, a network operator can determine a good | account. With this approach, a network operator can determine a good | |||
| coupling factor without knowing the precise L4S algorithm for | coupling factor without knowing the precise L4S algorithm for | |||
| reducing RTT-dependence - or even in the absence of any algorithm. | reducing RTT-dependence - or even in the absence of any algorithm. | |||
| The following additional terminology will be used, with appropriate | The following additional terminology will be used, with appropriate | |||
| subscripts: | subscripts: | |||
| r: Packet rate [pkt/s] | r: Packet rate [pkt/s] | |||
| R: RTT [s/round] | R: RTT [s/round] | |||
| p: ECN marking probability [] | p: ECN marking probability [] | |||
| On the Classic side, we consider Reno as the most sensitive and | On the Classic side, we consider Reno as the most sensitive and | |||
| therefore worst-case Classic congestion control. We will also | therefore worst-case Classic congestion control. We will also | |||
| consider Cubic in its Reno-friendly mode ('CReno'), as the most | consider Cubic in its Reno-friendly mode ('CReno'), as the most | |||
| prevalent congestion control, according to the references and | prevalent congestion control, according to the references and | |||
| analysis in [PI2param]. In either case, the Classic packet rate in | analysis in [PI2param]. In either case, the Classic packet rate in | |||
| steady state is given by the well-known square root formula for Reno | steady state is given by the well-known square root formula for Reno | |||
| congestion control: | congestion control: | |||
| r_C = 1.22 / (R_C * p_C^0.5) (5) | r_C = 1.22 / (R_C * p_C^0.5) (5) | |||
| skipping to change at page 65, line 7 ¶ | skipping to change at page 64, line 7 ¶ | |||
| Henderson <tomh@tomh.org> updated that earlier model and created a | Henderson <tomh@tomh.org> updated that earlier model and created a | |||
| model for the DualQ variant specified as part of the Low Latency | model for the DualQ variant specified as part of the Low Latency | |||
| DOCSIS specification, as well as conducting extensive evaluations. | DOCSIS specification, as well as conducting extensive evaluations. | |||
| Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | |||
| Centre to the Home broadband testbed on which DualQ Coupled AQM | Centre to the Home broadband testbed on which DualQ Coupled AQM | |||
| implementations were tested. | implementations were tested. | |||
| Authors' Addresses | Authors' Addresses | |||
| Koen De Schepper | Koen De Schepper | |||
| Nokia Bell Labs | Nokia Bell Labs | |||
| Antwerp | Antwerp | |||
| Belgium | Belgium | |||
| Email: koen.de_schepper@nokia.com | ||||
| URI: https://www.bell-labs.com/about/researcher-profiles/ | Email: koen.de_schepper@nokia.com | |||
| koende_schepper/ | URI: https://www.bell-labs.com/about/researcher-profiles/koende_schepper/ | |||
| Bob Briscoe (editor) | Bob Briscoe (editor) | |||
| Independent | Independent | |||
| United Kingdom | UK | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: https://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
| Greg White | Greg White | |||
| CableLabs | CableLabs | |||
| Louisville, CO, | Louisville, CO | |||
| United States of America | US | |||
| Email: G.White@CableLabs.com | Email: G.White@CableLabs.com | |||
| End of changes. 125 change blocks. | ||||
| 201 lines changed or deleted | 192 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||