MODULE 8 · THEORY + KPI OPTIMIZATION SCENARIOS

The Instruments

Measure, trial & test: the counter bible's four questions, the war-room dashboard as a theory of what matters, traces as biography, the feature-trial protocol, the golden audit industrialized, the acceptance test, and the Big Four playbooks — with five KPI-optimization case studies woven through the method.

Nothing here is invented. Counter names come from the live LTE PM inventory (7,856 MeasurementTypes on this node); audit numbers come from the real kget (25,224 MOs) compared against the real golden file (6,027 rows, 533 feature activations).

CH 1 · Four questions, four shelves

The 7,856 MeasurementTypes file under four questions. Can calls start? RRC establishment by cause (mo-VoiceCall visible at the door), the pmErabEstabAtt/SuccAddedQci[1] bearer pair, and the refusal ledger (GBR fails, per-ARP admission rejects, license rejections: each a different closed door, diagnosis pre-sorted). Do they survive? The release taxonomy (normal · MME · abnormal, with the Act qualifier and handover exclusions making the drop surgical), normalized by pmSessionTimeDrbQci (drops per exposure minute, the truth that percentages hide), plus the handover and SRVCC per-phase shelves. Was the sound good? The three loss surfaces (Module 7), mean HARQ transmissions as the leading indicator, CQI/SINR distributions as MOS inputs. Did the machinery engage? Bundling engagement, prescheduling activity, paging discards, rate-recommendation events. Activation without engagement evidence is decoration, and trials live on this shelf.
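A minimal sketch of why the exposure-normalized form matters, assuming session time is exported in seconds and that drops are counted from the abnormal release bucket. The class, field names, and numbers are illustrative, not taken from the PM inventory.

```python
from dataclasses import dataclass


@dataclass
class CellHour:
    """One cell-hour of retainability inputs (illustrative values only)."""
    cell: str
    abnormal_releases: int   # abnormal releases counted as drops
    total_releases: int      # normal + MME + abnormal releases
    session_seconds: float   # accumulated session time, assumed in seconds


def drop_ratio_pct(c: CellHour) -> float:
    """Percentage form: drops over all releases."""
    if c.total_releases == 0:
        return 0.0
    return 100.0 * c.abnormal_releases / c.total_releases


def drops_per_minute(c: CellHour) -> float:
    """Exposure form: drops per minute of session time."""
    if c.session_seconds == 0:
        return 0.0
    return 60.0 * c.abnormal_releases / c.session_seconds


# Same drop count and release count, very different exposure.
short_calls = CellHour("A", abnormal_releases=10, total_releases=1000, session_seconds=30_000)
long_calls = CellHour("B", abnormal_releases=10, total_releases=1000, session_seconds=600_000)

for c in (short_calls, long_calls):
    print(c.cell, f"{drop_ratio_pct(c):.2f}%", f"{drops_per_minute(c):.4f} drops/min")
```

Both cells report the same percentage, yet cell A drops twenty times as often per minute of exposure. That is the case for showing both retainability forms side by side.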

The reading law

Volume before ratio · definition before comparison · bins before aggregates. (counterActiveMode changes what “active” means — check before comparing across vendors' dashboards.)
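"Volume before ratio" can be enforced in code rather than remembered. The sketch below refuses to report a success rate when the attempt volume is under a floor. The floor of 50 attempts is a hypothetical value, not one from the course.

```python
from typing import Optional

MIN_ATTEMPTS = 50  # hypothetical volume floor; tune per counter family


def guarded_success_rate(successes: int, attempts: int,
                         min_attempts: int = MIN_ATTEMPTS) -> Optional[float]:
    """Return the success rate in percent, or None when the volume is too
    low for the ratio to mean anything (volume before ratio)."""
    if attempts < min_attempts:
        return None
    return 100.0 * successes / attempts


print(guarded_success_rate(3, 4))       # None: 75% of four attempts is noise
print(guarded_success_rate(990, 1000))  # 99.0
```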

CH 2 · The dashboard as theory

Three layers, three decisions: network (aggregated families — investment decisions), cell (distributions, never averages — one disaster vanishes into 500 healthy cells; dispatch decisions), hour (ceilings have calendars — a schedule is half a diagnosis). Tiles carry their disciplines: attempt-volume sparklines under every accessibility ratio, both retainability forms side by side, the three loss surfaces unblended with the mean-HARQ leading tile beside them, and the stealth tiles (paging discards, engagement deltas). The threshold contract: green from each tile's own history, amber looks, red pages a human and names its playbook — a red without a playbook is anxiety as a service. The anti-patterns are refused by name: the single blended voice score, the average without distribution, the ratio celebrating a redial storm. And the instrument evolves by its failures: any complaint that beats the dashboard triggers an audit of the dashboard.
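One way to encode the threshold contract is sketched below. It assumes a tile where higher values are worse, and it takes "green from each tile's own history" to mean a band around the historical mean. The mean-plus-k-sigma rule and the k values are assumptions made for illustration. The rule that a red must name its playbook is enforced by refusing to classify a tile that has none.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Tile:
    name: str
    history: Sequence[float]   # the tile's own past values (higher = worse)
    playbook: Optional[str]    # what a red state must point the on-call to


def classify(tile: Tile, value: float,
             amber_k: float = 2.0, red_k: float = 4.0) -> str:
    """Classify a new value against the tile's own history."""
    if tile.playbook is None:
        # A red without a playbook is not allowed to exist.
        raise ValueError(f"tile {tile.name!r} has no playbook")
    mean = statistics.fmean(tile.history)
    spread = statistics.pstdev(tile.history)
    if value > mean + red_k * spread:
        return f"red -> {tile.playbook}"
    if value > mean + amber_k * spread:
        return "amber"
    return "green"


drops = Tile("drop rate", [0.4, 0.5, 0.6, 0.5, 0.5], "Playbook 2: drop spike")
for v in (0.5, 0.7, 1.2):
    print(v, classify(drops, v))
```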

CH 3 · Traces: the biography

Counters locate; traces explain. Cell traces net all users on chosen cells; UE traces follow chosen subscribers; PM-initiated measurements commission the fleet as a drive test without trucks. The reading method is altitude discipline at microscope scale: skeleton from signalling first, bookmarks at anomalies, radio events only in the bookmarked windows — never start by scrolling ten thousand radio events. The riverfront case is the genre: a drop spike with no cell pattern, one traced evening hour, 31 of 40 drops sharing one shape (A3 ping-pong then RLF), one relation-scoped fix. A thousand statistics often hide one biography told a thousand times.
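The riverfront pattern, many drops sharing one shape, can be surfaced mechanically once each dropped call is reduced to its signalling skeleton. The sketch below groups calls by their last few skeleton events. The event labels and the sample data are hypothetical, constructed only to mirror the 31-of-40 split described above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Signature = Tuple[str, ...]


def drop_signature(events: List[str], tail: int = 3) -> Signature:
    """Reduce one dropped call to its last `tail` skeleton events."""
    return tuple(events[-tail:])


def rank_signatures(calls: Iterable[List[str]]) -> List[Tuple[Signature, int]]:
    """Count how many dropped calls share each ending shape, most common first."""
    return Counter(drop_signature(c) for c in calls).most_common()


# Hypothetical traced hour: 31 ping-pong drops and 9 unrelated ones.
ping_pong = ["SETUP", "A3_REPORT", "HO_EXEC", "A3_REPORT", "HO_EXEC", "RLF"]
other = ["SETUP", "A2_REPORT", "RLF"]
traced_drops = [ping_pong] * 31 + [other] * 9

for signature, count in rank_signatures(traced_drops):
    print(count, " > ".join(signature))
```

Only the windows around the dominant signature then need the radio-level events opened.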

CH 4 · The trial protocol

Five phases, pre-committed. Phase 0: hypothesis as numbers, matched control cells, abort gates, rollback rehearsed, interactions cleared in config. Phase 1: a full baseline week — its product is distributions, not anecdotes; neighbors are guard rails. Phase 2: activation verified thrice — FeatureState (licensed + activated + operable), engagement counters moving, configuration on every trial cell. Phase 3: hands off; abort conditions act without debate, improvement curiosity waits; every intervention restarts the clock. Phase 4: three honest outcomes — accept scoped (golden updated through process), reject documented (a clean negative is knowledge), extend sharpened. The five sins it refuses: uncontrolled rollout, moving targets, confounded second changes, orphan trials, unverified activation.
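The Phase 2 triple check lends itself to a small gate that returns every failure rather than stopping at the first. This is a sketch under assumptions: the data shapes (per-cell engagement counter values before and after activation, per-cell config dictionaries) are invented for illustration, and only the parameter name voltePreschedulingEnabled comes from the course.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureState:
    licensed: bool
    activated: bool
    operable: bool


def verify_activation(state: FeatureState,
                      engagement_before: Dict[str, int],
                      engagement_after: Dict[str, int],
                      cell_config: Dict[str, Dict[str, object]],
                      param: str, expected: object) -> List[str]:
    """Return a list of failures; an empty list means all three checks passed."""
    failures: List[str] = []
    # Check 1: the feature is licensed, activated and operable.
    if not (state.licensed and state.activated and state.operable):
        failures.append(f"feature state incomplete: {state}")
    for cell, config in cell_config.items():
        # Check 2: the engagement counter moved on this trial cell.
        if engagement_after.get(cell, 0) <= engagement_before.get(cell, 0):
            failures.append(f"{cell}: engagement counter did not move")
        # Check 3: the configuration is present on this trial cell.
        if config.get(param) != expected:
            failures.append(f"{cell}: {param} is {config.get(param)!r}, expected {expected!r}")
    return failures


problems = verify_activation(
    FeatureState(licensed=True, activated=True, operable=True),
    engagement_before={"c1": 100, "c2": 50},
    engagement_after={"c1": 180, "c2": 50},
    cell_config={"c1": {"voltePreschedulingEnabled": True},
                 "c2": {"voltePreschedulingEnabled": False}},
    param="voltePreschedulingEnabled", expected=True,
)
print("\n".join(problems) or "activation verified")
```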

CH 5 · The golden audit, industrialized

Policy vs photograph: 25,224 live objects against 6,027 golden rows + 533 activation rows. The machine: normalize enums and user-vs-internal forms, join on canonical keys (MO class + attribute + struct member, per instance), evaluate exceptions before flagging (and audit the exceptions in reverse — stale ones rot), output three populations: matches, deviations, and the ungoverned (policy gaps, reported, never silently ignored). Rank by blast radius and service impact; every deviation gets one of three verdicts — fix (drift; restore through change process — the majority), challenge (the golden may be stale; file with evidence), investigate (correlates with a KPI anomaly; route to the playbooks). Never silent mass reversion: a deviation is a question. Cadence: monthly, post-upgrade, pre-trial — trials on drifted config measure noise. The archaeology case: 40 cells' coherent deviation pattern traced to a festival config three years old, rollback never executed — every temporary config now registers an expiry. Temporary is a date, not an intention.
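The audit machine described above can be sketched in a few dozen lines. Everything named here is an assumption for illustration: the MO class and attribute names are placeholders, the struct member is folded into the attribute string, and the normalizer only handles an "N (NAME)" enum form plus case differences. A real normalizer would need the vendor's enum tables.

```python
import re
from typing import Dict, List, Set, Tuple

LiveKey = Tuple[str, str, str]   # (instance, MO class, attribute)
GoldenKey = Tuple[str, str]      # (MO class, attribute)

_ENUM_WITH_NAME = re.compile(r"^-?\d+\s*\((\w+)\)$")


def normalize(value: str) -> str:
    """Fold 'N (NAME)' enum dumps and case differences into one form."""
    text = value.strip()
    match = _ENUM_WITH_NAME.match(text)
    return (match.group(1) if match else text).upper()


def audit(live: Dict[LiveKey, str], golden: Dict[GoldenKey, str],
          exceptions: Set[LiveKey]) -> Dict[str, List[LiveKey]]:
    out: Dict[str, List[LiveKey]] = {
        "matches": [], "deviations": [], "ungoverned": [],
        "excepted": [], "stale_exceptions": [],
    }
    for key, value in live.items():
        _, mo_class, attribute = key
        wanted = golden.get((mo_class, attribute))
        if wanted is None:
            out["ungoverned"].append(key)            # policy gap: report it
        elif normalize(value) == normalize(wanted):
            out["matches"].append(key)
            if key in exceptions:
                out["stale_exceptions"].append(key)  # exception no longer needed
        elif key in exceptions:
            out["excepted"].append(key)              # approved deviation
        else:
            out["deviations"].append(key)
    # Exceptions pointing at objects that no longer exist are stale too.
    out["stale_exceptions"] += sorted(exceptions - set(live))
    return out


live = {
    ("Cell=1", "ExampleMo", "attrA"): "1 (ACTIVATED)",
    ("Cell=2", "ExampleMo", "attrA"): "0 (DEACTIVATED)",
    ("Cell=3", "ExampleMo", "attrA"): "0 (DEACTIVATED)",
    ("Cell=1", "ExampleMo", "attrB"): "40",
}
golden = {("ExampleMo", "attrA"): "activated"}
exceptions = {("Cell=3", "ExampleMo", "attrA"), ("Cell=9", "ExampleMo", "attrA")}

report = audit(live, golden, exceptions)
for population, keys in report.items():
    print(population, keys)
```

The function only sorts findings into populations. Ranking by blast radius and assigning the fix, challenge, or investigate verdict stay with a human.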

CH 6 · The acceptance test

Acceptance is sampling the failure modes, not exhausting them: fire each mechanism once, deliberately, instrumented. Five blocks — the setup matrix (MO/MT × idle/connected; ladders clean, causes correct, paging exercised), the long call (30+ minutes, silences scripted; HARQ flat, the DRX rhythm visible, the inactivity machinery's patience holding), the mobility routes (repeatable streets forcing the cascade, SRVCC at the true edge), the edge verification (defenses engage in order; staged features earn their keep or expose their absence), the quality sample (labeled scores, TWAMP, codec negotiation verified in SDP against what marketing promised). Criteria are numbers agreed before the test; failures get verdicts; the report is a filed, versioned artifact that seeds the site's baselines. The refusal case: calendar pressure said “accept with a note”; the team found the missing A5 stair instead, and three months later the transition carried thousands of calls flawlessly. Problems found by tests are cheap; found by customers, expensive — refusal is what the signature is worth.
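"Criteria are numbers agreed before the test" can be made literal by freezing the criteria before any measurement is loaded. In the sketch below, the 30-minute long-call floor comes from the text. The other metric names and limits are hypothetical placeholders.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)   # frozen: criteria cannot be edited after agreement
class Criterion:
    block: str
    metric: str
    op: Callable[[float, float], bool]
    limit: float


Result = Tuple[str, str, Optional[float], str]


def evaluate(criteria: Tuple[Criterion, ...], measured: Dict[str, float]) -> List[Result]:
    """Score each pre-agreed criterion; a missing measurement is not a pass."""
    results: List[Result] = []
    for c in criteria:
        value = measured.get(c.metric)
        if value is None:
            status = "NOT MEASURED"
        elif c.op(value, c.limit):
            status = "PASS"
        else:
            status = "FAIL"
        results.append((c.block, c.metric, value, status))
    return results


def accepted(results: List[Result]) -> bool:
    return all(status == "PASS" for *_, status in results)


criteria = (
    Criterion("long call", "call_duration_min", operator.ge, 30.0),
    Criterion("setup matrix", "setup_success_pct", operator.ge, 99.0),  # hypothetical limit
    Criterion("quality sample", "mos_p10", operator.ge, 3.5),           # hypothetical limit
)
results = evaluate(criteria, {"call_duration_min": 32.0, "setup_success_pct": 98.2})
for row in results:
    print(row)
print("accept" if accepted(results) else "refuse")
```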

CH 7 · The Big Four

The course indexed by symptom, executable at 3 a.m. Accessibility dip: volume first (attempts collapsed = upstream — paging health, barring state), then the refusal ledger (GBR → corridor calendar; ARP → policy; license → escalate), else trace the ladders. Drop spike: twin-spike check (with attempts = redial storm — release policy, not radio), then geography (everywhere = software/core; cluster = mobility surfaces; one cell = radio forensics). One-way audio: direction first (“they can't hear me” = your uplink), surface second (clean surfaces = transport + core; escalate with the radio exonerated by TWAMP), the glass-tower asymmetry where surfaces do show loss. MOS degradation: leading indicators (mean HARQ + delay percentiles — lateness before loss), burst patterns over rates, jitter legs through the corridor, and the population split (one OEM / band / EN-DC state degrading alone = configuration interaction, not illness). The meta-habits wrap all four: question first, reading law always, one change at a time even under pressure, and every executed playbook ends with “would we see this faster next time?” — the playbooks and the dashboard improve each other, or both rot.
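The drop-spike playbook is a short decision tree, sketched below. The branch order and outcomes follow the text. The function name, the enum, and the idea of passing geography as a pre-judged category are illustrative assumptions, since the course does not say how "everywhere" or "cluster" is decided.

```python
from enum import Enum


class Geography(Enum):
    EVERYWHERE = "everywhere"
    CLUSTER = "cluster"
    ONE_CELL = "one cell"


def triage_drop_spike(attempts_also_spiking: bool, geography: Geography) -> str:
    """Walk the drop-spike tree: twin-spike check first, then geography."""
    if attempts_also_spiking:
        # Drops and attempts rising together point at redials, not radio.
        return "redial storm: review release policy, not radio"
    return {
        Geography.EVERYWHERE: "software/core",
        Geography.CLUSTER: "mobility surfaces",
        Geography.ONE_CELL: "radio forensics",
    }[geography]


print(triage_drop_spike(True, Geography.EVERYWHERE))
print(triage_drop_spike(False, Geography.CLUSTER))
```

Putting the twin-spike check first keeps a redial storm from being sent down a radio branch, whatever its geography.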

CH 8 · Five KPI-optimization case studies

CASE 1

The question asked backwards

KPI symptom
A new engineer pulls 200 counter series for “mid-call audio gaps” and reaches no conclusion.
Evidence
Reframed by shelf: the complaint is an integrity question — open that shelf only. DL air loss on 4 cells, HARQ climbing, CQI sagging.
Root cause
Interference (the Module 3/4 diagnosis path) — found in 20 minutes once the question chose the shelf.
Action
Interference hunt on the 4 cells; method note filed: instruments answer questions — start from the question.
Verification
Loss surface clears post-mitigation.
Rollback
n/a.
CASE 2

The dashboard's blind spot

KPI symptom
A green week, rising complaints: “can't hear the first seconds.”
Evidence
Integrity tiles were loss-only; clipped onsets are a latency-tail symptom no loss counter sees.
Root cause
The instrument, not the network: no UL delay percentile tile existed.
Action
Add UL delay percentiles + SR-to-grant tiles; retro-test against history (amber would have fired 3 days early).
Verification
This class of problem now alerts ahead of the complaints. Policy: any complaint that beats the dashboard triggers an audit of the dashboard.
Rollback
n/a.
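The retro-test in Case 2 amounts to replaying a proposed threshold over stored history and measuring how far ahead of the first complaint it would have fired. A minimal sketch follows. The daily UL delay values, the 55 ms amber level, and the complaint day are invented so that the arithmetic reproduces a three-day lead.

```python
from typing import Optional, Sequence


def first_breach(series: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first day the series exceeds the threshold, if any."""
    for day, value in enumerate(series):
        if value > threshold:
            return day
    return None


def lead_time_days(series: Sequence[float], threshold: float,
                   first_complaint_day: int) -> Optional[int]:
    """Days of warning the tile would have given before the first complaint."""
    breach = first_breach(series, threshold)
    return None if breach is None else first_complaint_day - breach


# Hypothetical daily p95 UL delay in ms; the first complaint arrives on day 6.
daily_p95_ms = [42, 44, 43, 61, 66, 70, 75]
print(lead_time_days(daily_p95_ms, threshold=55, first_complaint_day=6))  # 3
```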
CASE 3

Prescheduling through the full protocol

KPI symptom
UL latency tail on SR-heavy cells (the Module 3 evidence), staged profile awaiting trial.
Evidence
Phase 0: hypothesis in 3 numbers (delay tail ↓, SR volume ↓, PUSCH cost < ceiling); DRX interaction cleared in config first.
Root cause
n/a — this is the protocol run end to end as the worked example.
Action
Baseline week → activate voltePreschedulingEnabled on trial cells (verified thrice) → hands-off observation → pre-agreed gates decide.
Verification
Engagement counters prove the machinery fired; accept/reject/extend recorded; golden updated through process.
Rollback
One boolean, rehearsed in Phase 0.
CASE 4

The audit as archaeology

KPI symptom
Quarterly audit: 40 cells share a coherent deviation pattern — too coherent for drift.
Evidence
Change archaeology: a festival config from three years prior, rollback never executed, surviving two upgrades, team disbanded. Two of its values were hurting a new band's mobility (the missing stair).
Root cause
Temporary config without an expiry.
Action
Restore through change process; institute the expiry register for every temporary config.
Verification
The following month's audit residue shrinks by this whole class of deviation.
Rollback
n/a — restores policy.
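The expiry register from Case 4 needs very little machinery. The point is that the expiry date is a mandatory field, so a temporary config cannot be registered without one. The record layout and the sample entries below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class TemporaryConfig:
    change_id: str
    reason: str
    cells: int
    expires: date   # mandatory: temporary is a date, not an intention


def overdue(register: Iterable[TemporaryConfig], today: date) -> List[TemporaryConfig]:
    """Temporary configs whose expiry has passed, oldest first."""
    return sorted((t for t in register if t.expires < today), key=lambda t: t.expires)


register = [
    TemporaryConfig("CHG-001", "festival capacity", cells=40, expires=date(2021, 7, 15)),
    TemporaryConfig("CHG-002", "stadium event", cells=6, expires=date(2030, 1, 1)),
]
for item in overdue(register, today=date(2024, 7, 1)):
    print(item.change_id, item.reason, "expired", item.expires.isoformat())
```

Running the overdue check as part of each audit turns a forgotten rollback into a dated finding.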
CASE 5

Three in the morning, executed

KPI symptom
Red drop tile pages the on-call at 03:10.
Evidence
Playbook 2: attempts also spiking → twin-spike → redial storm; a release-policy change shipped in the evening window.
Root cause
Inactivity timer cut in the change — Module 5's economics, recognized in four minutes by the tree.
Action
Revert the change (rollback was documented); postmortem adds an engagement-delta tile for release-policy edits.
Verification
Both spikes collapse together; detection-to-revert time logged.
Rollback
The revert is the action.
