MODULE 7 · THEORY + KPI OPTIMIZATION SCENARIOS

The Sound Itself

Voice quality end to end: the codec-rate loop the RAN can close, RLC's two worldviews, the 280-millisecond corridor and its queue defenders, packet-loss forensics, and the transport boundary where DSCP promises are proven — with five KPI-optimization case studies.

Nothing below is invented. Every FAJ/CXC, MO attribute, threshold, and feature state is from the Ericsson CPI (LTE RAN 25.Q4.4), the live NYC node kget (25.Q3), the operator golden file, and the LTE PM counter inventory. 3GPP citations name exact specifications.

CH 1 · The loop the RAN can close

The phones negotiate the codec (Module 1's SDP); the network knows the radio. Classically the codec adapts blindly — after damage. VoLTE Rate Recommendation (FAJ 121 5014) closes the loop: the eNodeB sends in-band bit-rate recommendations (TS 36.321 MAC CE) and AMR-WB/EVS shift gears mid-call, per direction — down before the cell edge bites, up when headroom returns. The inversion worth teaching: a 23.85 kbps stream losing 8% sounds far worse than a 12.65 kbps stream losing nothing; downshifting is a quality feature, and the lower rate also survives deeper fades — retainability engineering disguised as audio engineering. On this node: bitRateRecommendationEnabled=false on the qci1 profile — staged dark, license-gated per the MOM, reopened by edge-concentrated loss evidence.
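A minimal sketch of the receiving end of that loop, assuming the recommendation has already been reduced to a codec-payload rate in kbps (the real MAC CE carries an index and the UE must account for header overhead; both are ignored here). The mode list is the standard AMR-WB set; `pick_mode` and its fallback policy are hypothetical, for illustration only, and not the feature's implementation.

```python
# AMR-WB codec modes in kbps. 23.85 and 12.65 are the two rates the text compares.
AMR_WB_MODES_KBPS = (6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85)


def pick_mode(recommended_kbps, modes=AMR_WB_MODES_KBPS):
    """Return the highest codec mode that fits under the recommended rate.

    Hypothetical policy: if even the lowest mode exceeds the recommendation,
    stay on the lowest mode rather than stop sending.
    """
    fitting = [m for m in modes if m <= recommended_kbps]
    return max(fitting) if fitting else min(modes)


if __name__ == "__main__":
    for rec in (24.0, 13.0, 5.0):
        print(f"recommended {rec:5.1f} kbps -> mode {pick_mode(rec):5.2f} kbps")
```

With a recommendation of 13 kbps the sketch settles on 12.65, the downshift the text argues for near the cell edge; at 24 kbps it returns to 23.85 when headroom comes back.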

CH 2 · RLC — two worldviews

AM spends time to guarantee delivery (status reports, polls, retransmission); UM spends packets to guarantee time (numbering, a window, and nothing else). Voice can only choose UM: a frame delivered late by an RLC retry cycle is a dead frame the de-jitter buffer discards — integrity is timely delivery. SIP rides AM, where persistence is right. The numbers are cut to the deadline: voice runs 10-bit RLC / 12-bit PDCP sequence numbers (small, frequent packets; headers matter at 30-byte payloads — Module 4's ROHC arithmetic), data runs 16/15 for CA-scale flight volumes, and tReorderingUl/Dl=60 ms is sized to outlast the 7-rung HARQ ladder (7×8 ms) — barely, deliberately. The AM refinements on the SIP bearer (Adaptive RLC Poll-Retransmission, Load-Based UL RLC Retx Threshold) tune persistence's enthusiasm without abandoning it.
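The timer arithmetic above, written out. This only sets the text's own 7×8 ms ladder against the 60 ms reordering timer; the constant and function names are illustrative, not MO attributes.

```python
HARQ_RTT_MS = 8        # one rung of the ladder, per the text's 7x8 ms
HARQ_RUNGS = 7         # the "7-rung" HARQ ladder
T_REORDERING_MS = 60   # tReorderingUl/Dl value quoted in the text


def reordering_margin_ms(t_reordering_ms=T_REORDERING_MS,
                         rungs=HARQ_RUNGS,
                         rtt_ms=HARQ_RTT_MS):
    """Time left on the reordering timer after the full HARQ ladder.

    A negative result would mean the timer expires while HARQ is still
    trying, so a frame HARQ could have recovered is declared lost.
    """
    return t_reordering_ms - rungs * rtt_ms


if __name__ == "__main__":
    print("HARQ ladder:", HARQ_RUNGS * HARQ_RTT_MS, "ms")
    print("margin:", reordering_margin_ms(), "ms")
```

The margin is 4 ms: enough to outlast the ladder, small enough that the timer adds almost nothing to the delay budget.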

Case law

The mode someone “improved”: QCI-1 set to AM made the loss KPI fall and complaints explode — frames arrived embalmed, 60–100 ms late, and the de-jitter buffer stretched then dropped them. Loss improved + jitter exploded = wrong ruler. The review checklist owns this rule now.

CH 3 · The 280 ms corridor

At ~280 ms mouth-to-ear, conversation begins to die (ITU-T G.114's working region). The corridor is shared: codec framing ×2 (~20 ms + encode), the de-jitter buffer's deliberate hold, core + transport transit (possibly ×2 operators), and two radio legs inheriting the remainder, which is why Module 1 gave the eNodeB pdb=80. Jitter is the hidden tax: low average + high variance forces deep buffers, and a packet beyond the buffer's patience is functionally lost, however intact it arrives. The defender is per-service AQM, configured on this node exactly as the physics demand: aqmMode=MODE2 on qci1 with pdbOffset=50 (deadline-respecting voice variant), OFF on qci5 (never drop SIP to manage latency; AM retransmits anyway), MODE1 on qci9 (classic early-drop; TCP understands loss). Three relationships with time, live in the kget.
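The per-service AQM matrix lends itself to a mechanical audit. The sketch below encodes the three postures quoted above and diffs another node's settings against them. The dictionary layout and the `audit` helper are assumptions for illustration; they are not the kget format or any Ericsson tool.

```python
# This node's AQM posture as quoted in the text (layout is hypothetical).
GOLDEN_AQM = {
    "qci1": {"aqmMode": "MODE2", "pdbOffset": 50},  # deadline-respecting voice
    "qci5": {"aqmMode": "OFF"},                     # never drop SIP for latency
    "qci9": {"aqmMode": "MODE1"},                   # classic early-drop for data
}


def audit(node_cfg, golden=GOLDEN_AQM):
    """Return (profile, attribute, expected, actual) for every mismatch."""
    findings = []
    for profile, expected in golden.items():
        actual = node_cfg.get(profile, {})
        for attr, want in expected.items():
            got = actual.get(attr)  # None if the attribute is missing
            if got != want:
                findings.append((profile, attr, want, got))
    return findings


if __name__ == "__main__":
    # Hypothetical node that applies early-drop AQM to the SIP bearer.
    other_node = {
        "qci1": {"aqmMode": "MODE2", "pdbOffset": 50},
        "qci5": {"aqmMode": "MODE1"},
        "qci9": {"aqmMode": "MODE1"},
    }
    for finding in audit(other_node):
        print(finding)
```

A node that passes returns an empty list; a node running MODE1 on qci5 yields a single finding naming the profile, the attribute, and both values.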

The diagnostic discipline is subtraction: radio variance from HARQ/scheduler statistics, transport from TWAMP, the residual attributed honestly to the far side. The bufferbloat case is the genre: talk-over at 300 ms, radio slices normal, one new router buffering deeply mid-corridor — no loss, beautiful throughput, +80 ms queue under load. Voice quality is a supply chain; any unmanaged queue anywhere taxes every call.
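The subtraction can be sketched in a few lines. Only the 300 ms total and the +80 ms growth come from the bufferbloat case; the segment names and every other number are invented for illustration, and the helpers are hypothetical.

```python
def residual_ms(total_ms, measured_ms):
    """Delay left unexplained after subtracting every measured part."""
    return total_ms - sum(measured_ms.values())


def delay_growth_ms(idle_ms, loaded_ms):
    """Per-segment delay added under load relative to an idle baseline."""
    return {seg: loaded_ms[seg] - idle_ms[seg] for seg in idle_ms}


if __name__ == "__main__":
    # Hypothetical per-segment one-way delays from segment TWAMP sessions.
    idle = {"seg_a": 5, "seg_b": 6, "seg_c": 4}
    loaded = {"seg_a": 6, "seg_b": 86, "seg_c": 5}

    growth = delay_growth_ms(idle, loaded)
    worst = max(growth, key=growth.get)
    print("growth under load:", growth)
    print("worst segment:", worst)

    # Hypothetical split of a 300 ms mouth-to-ear estimate.
    parts = {"radio_legs": 70, "transport": sum(loaded.values())}
    print("residual for the far side:", residual_ms(300, parts), "ms")
```

The design point is that no segment is accused until its own measurement has been subtracted; whatever remains is labelled residual rather than assigned by guesswork.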

CH 4 · Loss forensics

The first forensic question is “where”, because “why” has different suspects per surface: pmPdcpPktLostUlQci[1] (uplink air: power, adaptation, interference), pmPdcpPktDiscDlPelrQci[1] (in-node discard: queues, AQM, congestion with timestamps), pmPdcpPktDiscDlPelrUuQci[1] (downlink air: transmitted, never acknowledged). Traces convert ratios into stories: loss + collapsing CQI = coverage/interference; loss + healthy CQI + DCI misses = the control channel (Module 4's paradox); loss clustered in handover windows = mobility timing (Module 6). And the ear hears patterns, not percentages: concealment absorbs isolated loss, but three consecutive lost frames are a syllable's corpse, so the same 1% can be inaudible when scattered or infuriating when clustered. Report surface, rate, burstiness, and a labeled perceptual band; let MOS frameworks rank, never replace, the evidence.
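Rate and burstiness can be reported side by side from the same per-frame loss trace. The sketch below assumes such a trace is available as a sequence of booleans (an assumption; PM counters alone do not provide it) and uses the text's three-consecutive-frames mark as the audibility threshold. `loss_profile` is a hypothetical helper.

```python
from itertools import groupby


def loss_profile(lost, audible_run=3):
    """Summarise a loss trace. `lost` is a sequence of bools, True = frame lost."""
    # Lengths of each run of consecutive lost frames.
    runs = [sum(1 for _ in grp) for is_lost, grp in groupby(lost) if is_lost]
    n_lost = sum(runs)
    return {
        "lost": n_lost,
        "loss_ratio": n_lost / len(lost) if lost else 0.0,
        "bursts": len(runs),
        "max_burst": max(runs, default=0),
        # Frames inside runs long enough to defeat concealment.
        "frames_in_audible_runs": sum(r for r in runs if r >= audible_run),
    }


if __name__ == "__main__":
    # Two synthetic 600-frame traces with the same six lost frames (1%).
    scattered = [i % 100 == 50 for i in range(600)]
    clustered = [300 <= i < 306 for i in range(600)]
    print("scattered:", loss_profile(scattered))
    print("clustered:", loss_profile(clustered))
```

Both traces report the same loss ratio; only the burst fields separate the cell that concealment can cover from the one it cannot.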

CH 5 · The transport boundary

The radio's promises end at the cabinet door. The markings as a system: dscp 40 (voice, expedited), 26 (SIP, assured), 10 (data, best-effort intent), with the per-ARP refinement staged unused (dscpArpMap all −1; emergency could mark differently). The failure mode is silent demotion: one re-marking interface and voice queues behind bulk, invisible until congestion. TWAMP (RFC 5357) is the proof: timestamped probes per marking class, compared under load, per segment. Three rules make it useful: probes must carry the markings (unmarked tests test nothing about voice), windows must be busy-hour (empty networks honor everything), sessions must be per-segment (end-to-end detects, segments diagnose). The maintenance-window case: a vendor swap shipped default queues collapsing all DSCP into one; the radio saw clean air and rising in-node discards, and the TWAMP differential exposed the demotion within an hour. The differential is a permanent canary, not a one-time test.
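One way to turn the differential into a check, sketched under stated assumptions: delay samples per marking class are already collected for one segment, the 95th percentile is the comparison statistic, and both thresholds are placeholders to be tuned. None of the names or values below come from the source or from a TWAMP implementation.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of delay samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def demotion_suspected(voice_ms, bulk_ms, loaded_floor_ms=20.0, min_gap_ms=10.0):
    """Flag a segment where voice-marked probes queue like bulk under load.

    loaded_floor_ms: bulk p95 below this means the segment is idle, and an
                     idle network honours every marking (hypothetical value).
    min_gap_ms:      smallest bulk-minus-voice p95 gap still counted as
                     priority treatment (hypothetical value).
    """
    voice_p95 = percentile(voice_ms, 95)
    bulk_p95 = percentile(bulk_ms, 95)
    under_load = bulk_p95 >= loaded_floor_ms
    return under_load and (bulk_p95 - voice_p95) < min_gap_ms


if __name__ == "__main__":
    bulk = [30, 45, 60, 50, 40]          # DSCP 10 probes, busy hour
    healthy_voice = [3, 4, 5, 4, 3]      # DSCP 40 probes, priority honoured
    demoted_voice = [28, 44, 58, 49, 41] # DSCP 40 probes, queued with bulk
    print("healthy:", demotion_suspected(healthy_voice, bulk))
    print("demoted:", demotion_suspected(demoted_voice, bulk))
```

The load gate encodes the busy-hour rule: without it, an idle segment with two equally small delays would look like a demotion.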

CH 6 · Five KPI-optimization case studies

CASE 1

Edge loss that concealment can't hide — opening the rate loop

KPI symptom
Commute-corridor complaints (“breaking up”) before SRVCC; edge cells chew full-rate streams.
Evidence
DL air loss concentrated at low-RSRP bins; burst patterns defeating concealment; codec stuck at 23.85 kbps to the bitter end (SIP traces).
Root cause
No in-call rate adaptation signal — the loop is open, bitRateRecommendationEnabled=false.
Action
License check first (FAJ 121 5014 is gated), then Module-8 trial on corridor cells: enable on the qci1 profile.
Verification
Recommendation activity counters move; edge loss ratios fall at constant traffic; complaint phrase shifts from “breaking up” to “nothing notable”. SRVCC fires from stable calls.
Rollback
One profile boolean.
CASE 2

The AQM mode matrix audit

KPI symptom
Sister market: SIP timeouts under congestion (setup failures), voice fine.
Evidence
Config diff vs this node's matrix: their qci5 runs MODE1 — AQM dropping SIP to manage latency; AM retransmits, INVITEs survive but late, ladders time out.
Root cause
Latency management applied to a bearer whose service needs delivery, not deadline.
Action
qci5 aqmMode=OFF (this node's posture); keep MODE2+offset50 on qci1, MODE1 on qci9.
Verification
Setup SR under congestion recovers; per-service AQM matrix enters their golden file.
Rollback
Per-QCI attribute.
CASE 3

Two cells, one loss number, opposite verdicts

KPI symptom
Two cells at identical 1% voice loss; one generates fury, the other silence.
Evidence
Burstiness split: cell A loses isolated frames (concealment absorbs); cell B loses 4–8 in bursts every few minutes — a reflecting crane, audible dropouts.
Root cause
The KPI cannot see clustering; the ear hears nothing else.
Action
Urgent interference action on B; routine schedule for A. Burstiness joins the integrity reporting permanently.
Verification
Complaint rates diverge from the (still identical) loss ratio — proving the ruler change.
Rollback
n/a.
CASE 4

The router that confessed to TWAMP

KPI symptom
Inter-market calls develop talk-over; mouth-to-ear estimated ~300 ms; both radio ends exonerated by their own statistics.
Evidence
Segment TWAMP: one transport hop adds +80 ms queueing under load, zero loss — bufferbloat signature (delay rises, loss doesn't).
Root cause
A new router's deep unmanaged buffer mid-corridor, undoing the radio's AQM discipline two hops away.
Action
Transport team fixes the queue config — the conversation starts with timestamps, not suspicion.
Verification
Segment delay distribution collapses; talk-over complaints cease.
Rollback
n/a (transport-side).
CASE 5

Standing up the demotion canary

KPI symptom
None — pre-emptive, after the vendor-swap incident (CH 5's case).
Evidence
The incident's timeline: queue collapse at maintenance, detection via complaint volume days later, radio KPIs blind throughout.
Root cause
No continuous per-class transport measurement existed.
Action
Permanent TWAMP sessions per marking class (40 vs 10) per transport segment; the 40-vs-10 differential under load becomes a war-room tile with an amber threshold (Module 8).
Verification
The next demotion event alerts within the hour — the canary's value is measured in detection latency.
Rollback
n/a.
