How It Works: Tunnels, Path Selection, and Dynamic Path Control

Section 1: Building the Fabric

Pre-Quiz — Test Your Knowledge

1. In the three-layer EdgeConnect model, what does the fabric layer actually consist of?

  • The physical MPLS, broadband, and LTE circuits bought from carriers
  • Automatically built IPsec tunnels between sites, typically one per underlay per site pair
  • The Business Intent Overlays that assign SLA and path policy to applications
  • The Orchestrator management plane that pushes configuration

2. Why does keeping a tunnel over every transport up continuously matter for failover later on?

  • It lets Orchestrator skip the auto-discovery step entirely
  • It means a switch to a backup path needs no IKE renegotiation or routing reconvergence, only a forwarding-decision change
  • It reduces the number of tunnels the network has to maintain
  • It forces all traffic onto MPLS by default for predictability

3. A retailer has 200 stores that mostly talk to two data centers and rarely to each other. Which topology fits best, and why?

  • Full mesh, because every site should have a direct tunnel to every other site
  • Hub-and-spoke, because traffic is centralized and it keeps tunnel counts manageable
  • Regional mesh, because the stores are geographically close
  • No topology; tunnels should be configured by hand per store

4. What is the role of auto-discovery in Orchestrator?

  • It scans the Internet for unknown EdgeConnect appliances to add automatically
  • It takes your expressed topology intent and computes which sites peer with which, over which labels
  • It discovers application traffic and assigns it to overlays
  • It measures loss and latency to discover the healthiest path

5. Once the fabric is built, what decides which tunnel actually carries a given flow's packets?

  • The routing protocol (OSPF or BGP) that advertised the LAN networks
  • Dynamic Path Control, not the routing protocol
  • The carrier's MPLS provider edge
  • Whichever tunnel was built first during IKE negotiation

Key Points

  • EdgeConnect organizes the network into three layers: underlay (physical labeled circuits), fabric (auto-built IPsec tunnels), and overlays (Business Intent policy).
  • The fabric typically holds one IPsec tunnel per underlay per site pair — a branch with MPLS and Internet keeps two live, monitored tunnels to its hub.
  • Orchestrator auto-discovers peerings from your chosen topology and builds/maintains every tunnel automatically — no manual per-pair configuration.
  • Topology choices — hub-and-spoke, full mesh, regional mesh — trade direct connectivity against tunnel count.
  • Because every transport's tunnel stays up continuously, the system is always ready to switch paths without renegotiating anything — the foundation of fast failover.

Before EdgeConnect can make any clever forwarding decisions, it needs something to forward across. That "something" is the SD-WAN fabric: a web of encrypted tunnels connecting your sites over whatever physical circuits you happen to own.

Three layers: underlay, fabric, and overlays

It helps to picture the architecture as three stacked layers:

| Layer | What it is | Example |
| --- | --- | --- |
| Underlay | The physical WAN circuits you buy from carriers, each tagged with a label | MPLS, INET (broadband/DIA), LTE/5G |
| Fabric | A set of IPsec tunnels built between sites over those circuits, typically one tunnel per underlay per site pair | Branch↔Hub over MPLS and over Internet |
| Overlays | Logical "virtual WANs" (Business Intent Overlays) that group applications and assign SLA and path policy | "RealTime" overlay for voice, "BestEffort" for backups |

A useful analogy: the underlay is the set of physical roads (a toll highway, a free surface street, a backup gravel road). The fabric is the set of armored delivery routes you've pre-mapped on those roads. The overlays are your shipping policies — "perishable goods take the fastest clean route; bulk freight takes whatever is cheapest." Dynamic Path Control is the dispatcher who reads the policy and assigns each shipment to a route in real time.

An IPsec tunnel is an encrypted, authenticated connection between two appliances that protects traffic as it crosses an untrusted network like the public Internet. EdgeConnect builds these tunnels automatically and keeps them up continuously — the secret behind its fast failover.

Auto-discovery and automatic tunnel creation

In a traditional WAN an engineer configures every tunnel by hand. A 100-site full mesh has 4,950 site pairs, so at one tunnel per underlay that is thousands of tunnels to configure and maintain. EdgeConnect replaces this with orchestration:

  1. Site definition and WAN uplinks. Each appliance registers with Orchestrator. You declare each WAN interface, give it a label (MPLS, INET, LTE), and set addressing (static, DHCP, or behind NAT).
  2. Topology selection. You choose full mesh, hub-and-spoke, or regional hubs. Orchestrator computes which sites tunnel to which, over which labels — the auto-discovery step.
  3. Automatic IPsec tunnel creation. Orchestrator pushes endpoints, IKE/IPsec parameters, and NAT-traversal settings; appliances negotiate IKE, build the security associations, and register each tunnel as a logical "WAN path."
  4. Tunnel health monitoring. Every tunnel is then continuously monitored for latency, loss, jitter, and availability — the live data that feeds path selection.
  5. Route distribution. LAN networks learned via static, OSPF, or BGP are advertised across the fabric — but DPC, not the routing protocol, decides which tunnel carries which flow.

The result: between any two sites you typically have multiple parallel IPsec tunnels, one per underlay, all up and all monitored at once.
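The auto-discovery step can be pictured with a small sketch. This is not Orchestrator's algorithm or API: `plan_tunnels`, the site names, and the rule that a tunnel is planned only where both sites share the same underlay label are simplifying assumptions for illustration.

```python
from itertools import combinations

def plan_tunnels(sites, topology, hubs=()):
    """Return (site_a, site_b, label) tuples: one planned tunnel per shared label.

    sites maps a site name to the set of underlay labels it has.
    Hypothetical helper; assumes tunnels only form between matching labels.
    """
    names = sorted(sites)
    if topology == "full-mesh":
        pairs = list(combinations(names, 2))
    elif topology == "hub-and-spoke":
        hub_set = set(hubs)
        pairs = [(s, h) for s in names if s not in hub_set for h in sorted(hub_set)]
    else:
        raise ValueError(f"unsupported topology: {topology}")

    tunnels = []
    for a, b in pairs:
        for label in sorted(sites[a] & sites[b]):  # labels both ends share
            tunnels.append((a, b, label))
    return tunnels

sites = {
    "hub1": {"MPLS", "INET"},
    "branch1": {"MPLS", "INET"},
    "branch2": {"INET", "LTE"},
}
tunnels = plan_tunnels(sites, "hub-and-spoke", hubs=["hub1"])
for t in tunnels:
    print(t)
# ('branch1', 'hub1', 'INET')
# ('branch1', 'hub1', 'MPLS')
# ('branch2', 'hub1', 'INET')
```

branch1 ends up with two parallel tunnels to the hub, one per underlay, which is the "multiple parallel tunnels" result described above.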

Figure 3.1: The three-layer EdgeConnect model.

graph TD
    subgraph Overlays["Overlays (Business Intent)"]
        RT["RealTime overlay (voice/video)"]
        BE["BestEffort overlay (backups)"]
    end
    subgraph Fabric["Fabric (IPsec tunnels)"]
        Branch["Branch appliance"]
        Hub["Hub appliance"]
    end
    subgraph Underlay["Underlay (physical circuits)"]
        MPLS["MPLS circuit"]
        INET["INET broadband circuit"]
        LTE["LTE/5G circuit"]
    end
    RT -.rides across.-> Branch
    BE -.rides across.-> Branch
    Branch ==IPsec tunnel over MPLS==> Hub
    Branch ==IPsec tunnel over INET==> Hub
    Branch -.runs on.-> MPLS
    Branch -.runs on.-> INET
    Branch -.runs on.-> LTE

Topology options

| Topology | How tunnels are built | Best for | Trade-off |
| --- | --- | --- | --- |
| Hub-and-spoke | Each branch tunnels only to one or more hubs; branch-to-branch transits a hub | Centralized apps; simple, few tunnels | Branch-to-branch takes an extra hop (added latency) |
| Full mesh | Every site tunnels directly to every other site | Heavy site-to-site traffic (VoIP, collaboration) | Tunnel count grows with the square of site count |
| Regional mesh | Sites mesh within a region; regions connect through regional hubs | Large, distributed networks | Balances direct connectivity against tunnel scale |
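To make the tunnel-count trade-off concrete, here is a rough calculation. It assumes one tunnel per underlay per site pair and, for hub-and-spoke, ignores any hub-to-hub tunnels; the function names are illustrative only.

```python
def full_mesh_tunnels(n_sites, n_underlays):
    """Every site pairs with every other site: n(n-1)/2 pairs per underlay."""
    return n_sites * (n_sites - 1) // 2 * n_underlays

def hub_and_spoke_tunnels(n_spokes, n_hubs, n_underlays):
    """Each spoke tunnels to each hub only (hub-to-hub tunnels not counted)."""
    return n_spokes * n_hubs * n_underlays

print(full_mesh_tunnels(100, 2))         # 9900: 100 sites, 2 underlays
print(hub_and_spoke_tunnels(200, 2, 2))  # 800: 200 stores, 2 data centers, 2 underlays
```

Doubling the site count roughly quadruples the full-mesh figure but only doubles the hub-and-spoke figure, which is why centralized traffic patterns usually favor hub-and-spoke.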

Animation: The three-layer fabric comes online

Circuits power up first, IPsec tunnels stitch the fabric together, then overlays ride across. Tunnels over MPLS and INET both stay up at once.

Section 2: Dynamic Path Control (DPC)

Pre-Quiz — Test Your Knowledge

1. A Business Intent Overlay (BIO) primarily expresses which of the following?

  • The IKE encryption parameters used to build each IPsec tunnel
  • A policy: what traffic matches, which transports it may use, its SLA thresholds, and forwarding behavior
  • The physical wiring diagram of the carrier circuits
  • A fixed static route table distributed by BGP

2. With per-flow steering (the default), how are packets of a single flow handled?

  • Every packet is independently hashed to a different tunnel
  • The whole flow is pinned to one best tunnel until it fails or breaks SLA, preserving packet order
  • Packets are duplicated onto all tunnels simultaneously
  • The flow is dropped if more than one tunnel meets the SLA

3. Why is per-flow steering the sensible default for TCP and most transactional apps?

  • It guarantees the highest possible aggregate bandwidth per flow
  • Because all packets take the same path, they arrive in order, which TCP prefers
  • It encrypts each packet with a different key for security
  • It avoids the need to measure loss and latency

4. EdgeConnect distinguishes a brownout from a blackout. What defines a brownout?

  • Probes stop returning entirely and the tunnel is declared down
  • The tunnel is still up and passing traffic, but a metric (loss/latency/jitter) breaches the SLA, so the path is marked degraded
  • The appliance loses power and reboots
  • Orchestrator pushes a new configuration

5. When DPC detects a brownout on the path a voice flow is using, what is the typical behavior?

  • All flows are dropped until the path recovers
  • New flows are steered to a healthier transport, and real-time flows like voice can migrate almost instantly
  • Traffic is paused while a new IPsec tunnel is negotiated
  • The degraded path is permanently removed and never reused

Key Points

  • Dynamic Path Control continuously matches each flow to a Business Intent Overlay, filters tunnels to those allowed and meeting the SLA, ranks survivors, and steers traffic to the best path.
  • A Business Intent Overlay (BIO) states: what traffic matches, which transports it may use (priority/roles), its SLA thresholds, and forwarding behavior (per-flow vs per-packet, FEC, QoS).
  • Per-flow steering (default) pins a 5-tuple flow to one tunnel so packets arrive in order — ideal for TCP. Per-packet steering (bonding) stripes packets for bandwidth/resiliency but can reorder them.
  • DPC keeps a real-time, per-tunnel, per-direction picture of loss, latency, and jitter from timestamped probes and overlay sequence numbers.
  • A hard blackout (probes stop) removes the tunnel from the candidate set; a brownout (SLA breached but tunnel up) marks it degraded and re-steers accordingly.

With the fabric in place, the question becomes: of the several tunnels available between two sites, which should carry a given packet right now? Answering continuously is the job of Dynamic Path Control — the engine that matches application traffic to policy and steers it onto the tunnel that currently meets the application's needs.

Business Intent Overlays: expressing what you want

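DPC steers according to Business Intent Overlays (BIOs). Each BIO specifies:

  • What traffic matches the overlay
  • Which transports it may use, with their priority or roles
  • Its SLA thresholds
  • Its forwarding behavior (per-flow vs per-packet, FEC, QoS)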

You describe business intent once ("voice must stay under 80 ms and under 1% loss, preferring MPLS"), and DPC enforces it everywhere, automatically choosing tunnels to satisfy it.
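One way to picture a BIO is as a small policy record. The sketch below is not EdgeConnect's data model or configuration schema; the class names, fields, and the application names are hypothetical, and only the 80 ms / 1% voice thresholds and the MPLS preference come from the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlaThresholds:
    """Hypothetical SLA limits; a path must stay strictly under each one."""
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float = float("inf")  # no jitter limit unless one is set

    def is_met(self, latency_ms, loss_pct, jitter_ms=0.0):
        return (latency_ms < self.max_latency_ms
                and loss_pct < self.max_loss_pct
                and jitter_ms < self.max_jitter_ms)

@dataclass(frozen=True)
class BusinessIntentOverlay:
    """Hypothetical record holding the four things a BIO expresses."""
    name: str
    match_apps: frozenset       # what traffic matches
    transport_priority: tuple   # allowed transports, most preferred first
    sla: SlaThresholds          # SLA thresholds
    steering: str = "per-flow"  # forwarding behavior

realtime = BusinessIntentOverlay(
    name="RealTime",
    match_apps=frozenset({"voice", "video"}),
    transport_priority=("MPLS", "INET"),
    sla=SlaThresholds(max_latency_ms=80, max_loss_pct=1.0),
)

print(realtime.sla.is_met(latency_ms=40, loss_pct=0.2))  # True
print(realtime.sla.is_met(latency_ms=95, loss_pct=0.2))  # False
```

Keeping the intent in one declarative record is what lets the same policy be evaluated against every tunnel at every site.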

Per-flow versus per-packet steering

Per-flow (default). A flow is identified by its 5-tuple. DPC filters out disallowed or SLA-violating tunnels, ranks the survivors, then hashes each new flow to the single best tunnel and pins it there until that tunnel fails or breaks SLA. Same path for every packet means in-order arrival — what TCP prefers.

Per-packet (tunnel bonding). DPC stripes the packets of a single flow across multiple tunnels at once for bandwidth and resiliency, but packets can arrive out of order, so the receiver must resequence them.

| | Per-flow (default) | Per-packet (bonding) |
| --- | --- | --- |
| Granularity | Whole flow pinned to one tunnel | Individual packets spread across tunnels |
| Packet order | Preserved naturally | Must be reconstructed at the receiver |
| Best for | TCP, transactional, most apps | High-throughput and high-resiliency real-time |
| Main risk | One flow limited to one link's bandwidth | Reordering and added buffering on dissimilar links |
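The two steering modes can be contrasted in a short sketch. It is a simplified model, not the product's selection logic: the tunnel dictionaries, the ranking key (label priority, then latency), the least-degraded fallback ordering, and the round-robin striping are all assumptions made for illustration.

```python
def candidate_tunnels(tunnels, allowed_labels, max_latency_ms, max_loss_pct):
    """Filter to allowed tunnels meeting the SLA, ranked best first.

    If none meet the SLA, fall back to the single least-degraded allowed tunnel.
    """
    allowed = [t for t in tunnels if t["label"] in allowed_labels]
    meeting = [t for t in allowed
               if t["latency_ms"] < max_latency_ms and t["loss_pct"] < max_loss_pct]
    if meeting:
        return sorted(meeting,
                      key=lambda t: (allowed_labels.index(t["label"]), t["latency_ms"]))
    return sorted(allowed, key=lambda t: (t["loss_pct"], t["latency_ms"]))[:1]

class PerFlowSteering:
    """Pins each 5-tuple to one tunnel until that tunnel drops out of the ranking."""
    def __init__(self):
        self.pinned = {}

    def pick(self, five_tuple, ranked):
        if not ranked:
            raise LookupError("no allowed tunnel for this flow")
        usable = {t["name"] for t in ranked}
        if self.pinned.get(five_tuple) not in usable:
            self.pinned[five_tuple] = ranked[0]["name"]  # (re)pin to the best tunnel
        return self.pinned[five_tuple]

def per_packet_stripe(packet_seq, ranked):
    """Round-robin striping: consecutive packets alternate across tunnels."""
    return ranked[packet_seq % len(ranked)]["name"]

tunnels = [
    {"name": "mpls-1", "label": "MPLS", "latency_ms": 30, "loss_pct": 0.0},
    {"name": "inet-1", "label": "INET", "latency_ms": 45, "loss_pct": 0.3},
    {"name": "lte-1", "label": "LTE", "latency_ms": 70, "loss_pct": 0.5},
]
allowed = ["MPLS", "INET"]  # LTE is not permitted for this overlay
ranked = candidate_tunnels(tunnels, allowed, 80, 1.0)

steer = PerFlowSteering()
flow = ("10.1.1.5", 5060, "10.2.2.9", 5060, "UDP")
first = steer.pick(flow, ranked)

tunnels[0]["loss_pct"] = 5.0  # MPLS now breaches the 1% loss limit
second = steer.pick(flow, candidate_tunnels(tunnels, allowed, 80, 1.0))

print(first, second)                                    # mpls-1 inet-1
print([per_packet_stripe(i, ranked) for i in range(3)])  # ['mpls-1', 'inet-1', 'mpls-1']
```

The per-flow path stays on one tunnel until the SLA filter removes it, while the striping function shows why per-packet mode needs resequencing at the receiver.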

Figure 3.4: The Dynamic Path Control decision flow.

flowchart TD
    Start(["New flow arrives"]) --> Match["Match flow to Business Intent Overlay"]
    Match --> Measure["Read live tunnel metrics: loss, latency, jitter"]
    Measure --> Filter{"Tunnels allowed by BIO and meeting SLA?"}
    Filter -->|"None"| Best["Fall back to least-degraded allowed tunnel"]
    Filter -->|"One or more"| Rank["Rank survivors by priority and health"]
    Rank --> Mode{"Bonding mode for overlay?"}
    Best --> Steer
    Mode -->|"Per-flow (default)"| Steer["Pin flow to single best tunnel"]
    Mode -->|"Per-packet (bonding)"| Stripe["Stripe packets across tunnels"]
    Steer --> Monitor["Monitor chosen path continuously"]
    Stripe --> Monitor
    Monitor --> Health{"Path healthy?"}
    Health -->|"Yes"| Monitor
    Health -->|"Brownout or blackout"| Measure

Brownout and blackout detection and failover

A link can fail in two ways. A blackout is a hard failure: probes stop, the tunnel is declared down, and it leaves the candidate set immediately. A brownout is more insidious: the tunnel is still up, and probes and traffic still pass, but a metric breaches the overlay's SLA (for example, loss above 2% or latency above 150 ms). The path is marked degraded, not down, and the degradation can be direction-specific.

Each path lives in one of three states: Healthy, Degraded, or Down. When a path becomes Degraded, new flows go to a healthier transport; real-time flows like voice can migrate almost instantly (sometimes with a brief burst of packet duplication for a hitless switch), while long-lived TCP sessions may stay put unless degradation is severe.

Figure 3.5: Path state machine.

stateDiagram-v2
    [*] --> Healthy
    Healthy --> Degraded: SLA breached (brownout)
    Healthy --> Down: Probes stop (blackout)
    Degraded --> Down: Probes stop
    Degraded --> Healthy: Metrics back within SLA + hold-down elapsed
    Down --> Healthy: Probes return + hold-down elapsed
    Healthy: Healthy carries flows normally
    Degraded: Degraded new flows re-steered; voice migrates
    Down: Down removed from candidate set
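The state machine can be sketched as a small class. This is an illustrative model rather than the appliance's implementation: the `PathState` name, the 10-second hold-down, and the treatment of a Down path whose probes return while still breaching the SLA (moved to Degraded here, a transition the figure does not show) are assumptions. The 2% loss and 150 ms latency limits are the example thresholds from the text.

```python
class PathState:
    """Tracks Healthy / Degraded / Down for one tunnel, with a recovery hold-down."""

    def __init__(self, max_loss_pct, max_latency_ms, hold_down_s):
        self.max_loss_pct = max_loss_pct
        self.max_latency_ms = max_latency_ms
        self.hold_down_s = hold_down_s
        self.state = "Healthy"
        self._good_since = None  # when clean measurements resumed, if recovering

    def update(self, now_s, probes_ok, loss_pct=0.0, latency_ms=0.0):
        if not probes_ok:
            # Blackout: probes stopped.
            self.state, self._good_since = "Down", None
        elif loss_pct > self.max_loss_pct or latency_ms > self.max_latency_ms:
            # Brownout: tunnel is up but a metric breaches the SLA.
            self.state, self._good_since = "Degraded", None
        elif self.state != "Healthy":
            # Clean again: only return to Healthy after the hold-down elapses.
            if self._good_since is None:
                self._good_since = now_s
            if now_s - self._good_since >= self.hold_down_s:
                self.state, self._good_since = "Healthy", None
        return self.state

path = PathState(max_loss_pct=2.0, max_latency_ms=150, hold_down_s=10)
history = [
    path.update(0, True, loss_pct=0.1, latency_ms=30),   # Healthy
    path.update(1, True, loss_pct=5.0, latency_ms=30),   # Degraded (brownout)
    path.update(2, True, loss_pct=0.1, latency_ms=30),   # Degraded (hold-down running)
    path.update(12, True, loss_pct=0.1, latency_ms=30),  # Healthy (10 s clean)
    path.update(13, False),                              # Down (blackout)
    path.update(14, True, loss_pct=0.1, latency_ms=30),  # Down (hold-down running)
    path.update(24, True, loss_pct=0.1, latency_ms=30),  # Healthy
]
print(history)
```

The hold-down keeps a flapping link from bouncing straight back into the candidate set the moment one clean measurement arrives.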

Animation: DPC steers a voice flow — then re-steers on a brownout

The voice flow rides MPLS while it is healthy. When MPLS loss crosses the SLA, DPC marks it degraded and re-steers the very next packet onto the pre-built INET tunnel — no renegotiation, subsecond.

Section 3: Tunnel Bonding and Resiliency

Pre-Quiz — Test Your Knowledge

1. In a simple 4+1 FEC scheme, what happens when a single data packet (say D3) is lost in transit?

  • The receiver requests a retransmission of D3 from the sender
  • The receiver rebuilds D3 from the parity packet plus the three surviving data packets, with no retransmission
  • The whole block of four packets is discarded and re-sent
  • TCP slow-start halves the congestion window before recovering

2. What problem does Packet Order Correction (POC) solve, and how?

  • It encrypts packets so they cannot be read in transit
  • It reorders out-of-order packets using an EdgeConnect sequence ID and a resequencing buffer with a tunable reorder wait
  • It compresses packets to fit more on a slow link
  • It rebuilds lost packets from parity without retransmission

3. Why does out-of-order delivery hurt TCP throughput if left uncorrected?

  • TCP cannot decrypt packets that arrive in the wrong order
  • TCP interprets it as loss, triggering duplicate ACKs, spurious fast-retransmits, and congestion-window cuts
  • Out-of-order packets are always dropped by the receiver's NIC
  • It doubles the MTU and fragments every packet

4. EdgeConnect's FEC is adaptive. What does that mean in practice?

  • It raises the FEC ratio when loss is high and lowers it when the link is clean, minimizing overhead
  • It only applies FEC to TCP traffic, never UDP
  • It switches encryption ciphers based on the time of day
  • It disables FEC whenever any loss is detected

5. How should the POC reorder wait time generally be tuned for voice/real-time versus bulk TCP?

  • Longer for voice, shorter for TCP, because voice tolerates delay better
  • Shorter for voice/real-time (hates delay), longer for TCP (avoiding retransmits matters more than a few ms)
  • The same fixed value for all traffic regardless of type
  • Reorder wait should always be set to zero to minimize latency

Key Points

  • Tunnel bonding uses multiple IPsec tunnels per overlay on a per-packet basis — either for bandwidth aggregation or high availability (data on one link, parity on another).
  • FEC sends redundant parity packets in parallel with the data so the receiver can rebuild a lost packet (e.g. 4+1, XOR) with no retransmission and no TCP slow-start penalty.
  • FEC is adaptive — ratio rises with loss, falls when clean; both sliders at 100% approximates 1:1 HA FEC, at 0% disables it. Single-tunnel overhead caps near 20%.
  • POC resequences out-of-order packets using each packet's own EdgeConnect sequence ID and a tunable, RTT-based reorder wait time, sparing TCP from spurious retransmits.
  • Together, FEC + POC = path conditioning; in independent Miercom testing it preserved acceptable voice MOS at roughly 55% aggregate underlay loss.

Tunnel bonding policy modes

| Bonding goal | What it does | When to use |
| --- | --- | --- |
| Bandwidth aggregation | Stripes a flow's packets across links to sum throughput; two 100-Mbps links let one flow approach ~200 Mbps | Backups, replication, bulk transfers |
| High availability (HA) | Sends data on one link and FEC/parity on another so loss on either is recoverable | Critical real-time traffic that must survive a degraded link |

Trade-offs are real: striping across links of different latency increases reordering, adds buffering delay, and parity/duplication add bandwidth overhead. So bonding is reserved for flows that genuinely benefit.

Forward Error Correction (FEC)

FEC trades a little extra bandwidth for a lot of reliability. The sender groups packets into blocks, computes one or more redundant parity packets per block, and transmits the parity alongside the originals — in parallel with the data, so the receiver reconstructs a lost packet without waiting for retransmission.

Mechanism (illustrative). Take D1–D4 and compute parity P (think XOR of the four). Transmit D1, D2, D3, D4, P. If one Dx is lost but P and the other three arrive, the receiver rebuilds the missing packet — no retransmission, no slow-start penalty. A 4+1 scheme cannot recover two losses in one block, which is why ratios are adjustable (4+1, 8+2).
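The 4+1 XOR idea can be shown in a few lines. This is a toy illustration of single-parity recovery, not EdgeConnect's actual FEC encoding; it assumes all packets in a block have the same length (a real scheme would have to pad or otherwise handle unequal sizes), and the helper names are made up.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(block):
    """Parity packet P = D1 xor D2 xor ... xor Dn."""
    return reduce(xor_bytes, block)

def recover(received, parity):
    """Rebuild the single missing packet (marked None) from parity and survivors."""
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR can rebuild exactly one lost packet")
    survivors = [pkt for pkt in received if pkt is not None]
    rebuilt = reduce(xor_bytes, survivors, parity)  # P xor all surviving packets
    out = list(received)
    out[missing[0]] = rebuilt
    return out

block = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # D1..D4
parity = make_parity(block)

received = [b"AAAA", b"BBBB", None, b"DDDD"]  # D3 lost in transit
print(recover(received, parity))
# [b'AAAA', b'BBBB', b'CCCC', b'DDDD']
```

Calling `recover` with two `None` entries raises an error, which mirrors the point above that a 4+1 scheme cannot repair two losses in one block.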

FEC is adaptive: it raises the ratio when loss is high and lowers it when clean. Both sliders at 100% approximates HA-style 1:1 FEC; both at 0% disables it. With a single tunnel, FEC overhead caps around 20%. FEC pays off most on Internet/LTE/satellite at ~1–5% loss and for real-time UDP where retransmission is useless.
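The bandwidth cost of a given ratio is simple arithmetic. The sketch below reports it two ways, since the text does not say which measure the roughly 20% single-tunnel cap refers to; the function name is illustrative.

```python
def fec_overhead(data_pkts, parity_pkts):
    """Return (parity relative to data, parity as a share of everything sent)."""
    extra_vs_data = parity_pkts / data_pkts
    share_of_total = parity_pkts / (data_pkts + parity_pkts)
    return extra_vs_data, share_of_total

for data, parity in [(4, 1), (8, 2), (1, 1)]:
    extra, share = fec_overhead(data, parity)
    print(f"{data}+{parity}: {extra:.0%} extra over data, {share:.0%} of total traffic")
# 4+1: 25% extra over data, 20% of total traffic
# 8+2: 25% extra over data, 20% of total traffic
# 1+1: 100% extra over data, 50% of total traffic
```

A 4+1 and an 8+2 block cost the same proportionally; the 1:1 case shows why HA-style protection is reserved for traffic that justifies doubling its bandwidth.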

Figure 3.2: FEC reconstructing a lost packet.

sequenceDiagram
    participant S as Sender appliance
    participant W as WAN path
    participant R as Receiver appliance
    participant L as LAN
    S->>W: D1
    S->>W: D2
    S->>W: D3
    S->>W: D4
    S->>W: P (parity = XOR of D1..D4)
    W->>R: D1
    W->>R: D2
    Note over W: D3 dropped in transit (X)
    W->>R: D4
    W->>R: P
    Note over R: Rebuild D3 from P + D1, D2, D4
    R->>L: Deliver D1, D2, D3, D4 in order
    Note over R,L: No retransmission requested

Animation: FEC rebuilds a dropped packet from parity

D1–D4 and parity P leave the sender. D3 is dropped mid-path. The receiver XORs P with the three survivors to regenerate D3 and delivers all four in order — no round-trip back to the sender.

Packet Order Correction (POC)

FEC fixes lost packets. POC fixes out-of-order packets — an unavoidable side effect of per-packet steering across links of different latency, ECMP, and asymmetric routing. To TCP, out-of-order delivery looks like loss: duplicate ACKs, spurious fast-retransmits, needless congestion-window cuts.

POC uses a resequencing buffer on the receiver. EdgeConnect stamps each overlay packet with its own sequence ID — independent of IP/TCP numbers — caches out-of-order packets, and uses a configurable, RTT-based reorder wait time before giving up on a late packet. Packets are then delivered to the LAN in order.

Example. Path A has 20 ms latency, Path B has 60 ms. The sender stripes 1→A, 2→B, 3→A; they arrive 1, 3, 2. Without POC, TCP may fast-retransmit. With POC, the receiver holds 3, waits briefly for 2, then delivers 1, 2, 3 — trading a few ms of buffering for a clean stream.

Tuning is a balance: too short and reordering leaks through; too long and added latency hurts interactive traffic. Rule of thumb: a shorter wait for voice/real-time, a longer wait for TCP and high-throughput flows.
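A resequencing buffer with a reorder wait can be simulated in a few lines. This is a simplified model, not the product's algorithm: the function name, the 1 ms send spacing, the fixed (rather than RTT-derived) wait values, and the choice to discard a packet that arrives after the buffer has given up on it are all assumptions.

```python
def resequence(arrivals, reorder_wait_ms):
    """arrivals: (arrival_ms, seq) pairs. Returns (release_ms, seq) in delivery order."""
    buffered = {}   # seq -> arrival time of packets held back
    released = []
    next_seq = min(seq for _, seq in arrivals)  # assumption: lowest seq starts the stream

    def flush(at_ms):
        nonlocal next_seq
        while next_seq in buffered:      # release everything now contiguous
            buffered.pop(next_seq)
            released.append((at_ms, next_seq))
            next_seq += 1

    def expire(until_ms):
        nonlocal next_seq
        while buffered:                  # anything buffered implies a gap at next_seq
            deadline = min(buffered.values()) + reorder_wait_ms
            if deadline > until_ms:
                break
            next_seq = min(buffered)     # give up on the missing packet(s)
            flush(deadline)

    for now, seq in sorted(arrivals):
        expire(now)
        if seq < next_seq:
            continue                     # too late; discarded in this sketch
        buffered[seq] = now
        flush(now)
    expire(float("inf"))
    return released

# Packets 1, 2, 3 sent 1 ms apart; 1 and 3 take Path A (20 ms), 2 takes Path B (60 ms).
arrivals = [(20, 1), (61, 2), (22, 3)]

print(resequence(arrivals, reorder_wait_ms=50))  # [(20, 1), (61, 2), (61, 3)]
print(resequence(arrivals, reorder_wait_ms=10))  # [(20, 1), (32, 3)]
```

With a 50 ms wait the stream is delivered in order at the cost of holding packet 3 for 39 ms; with a 10 ms wait the buffer gives up before packet 2 arrives, so the gap reaches the endpoint.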

Animation: POC resequences packets that arrive out of order

Packets arrive 1, 3, 2 because Path B is slower. POC buffers 3, waits briefly for the late 2, then releases 1, 2, 3 to the LAN in order — sparing TCP from a spurious fast-retransmit.

Path Conditioning: FEC and POC together

FEC and POC together are path conditioning, applied inside the overlay to make a messy WAN path look clean. On a typical Internet path, loss and reordering coexist; FEC handles loss, POC handles reordering, and endpoints experience something close to a private, in-order WAN.

The independent proof point comes from Miercom: path conditioning preserved acceptable voice MOS with up to roughly 55% aggregate underlay loss (50% on one link, 5% on another), because the parity stream on the second link reconstructed what the first dropped.
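A back-of-the-envelope calculation shows why splitting data and parity across links helps so much. It assumes 1:1 protection (every data packet can be rebuilt from the second link) and independent loss on the two links; neither assumption is stated for the Miercom test, so treat the figure as an illustration rather than a reproduction of that result.

```python
def residual_loss(data_link_loss, parity_link_loss):
    """Chance a packet is lost on the data link AND its parity is lost on the other link."""
    return data_link_loss * parity_link_loss

effective = residual_loss(0.50, 0.05)
print(f"{effective:.1%}")  # 2.5%
```

Under those assumptions, a link pair losing 50% and 5% leaves only about 2.5% of packets unrecoverable, far below either the 50% raw loss or the 55% aggregate.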
