A single dropped call costs an operator far more than the revenue lost on that call: it erodes trust, drives churn, and generates support tickets. Industry data shows that subscribers who experience three or more drops per month are 4x more likely to churn. ML models can predict call drops 30-120 seconds before they occur, which gives the network enough time to take preventive action: preemptive handover, carrier activation, power boost, or load rebalancing. In this article, we build the complete ML pipeline, from raw RAN data to real-time prevention.
What Causes Call Drops?
RF degradation, HO failure, congestion, hardware, interference
Call drops are classified by the cause recorded at release (3GPP TS 36.331): Radio Link Failure (T310 expiry, roughly 50% of drops), Handover Failure (T304 expiry, roughly 25%), Congestion (resource preemption, roughly 15%), and Other (hardware fault, interference, core network; roughly 10%). Each cause has different predictive features: RLF correlates with RSRP/SINR degradation, HO failure with UE speed and the rate of RSRP change, and congestion with PRB utilization.
| Cause | 3GPP Cause Indicator | Share | Key Predictor |
|---|---|---|---|
| Radio Link Failure | t310-Expiry (rlf-Cause, TS 36.331) | ~50% | RSRP trend, SINR variance |
| Handover Failure | handoverFailure (after T304 expiry, TS 36.331) | ~25% | UE speed, dRSRP/dt |
| Congestion | No dedicated RRC cause; e.g. S1AP reduce-load-in-serving-cell (TS 36.413) | ~15% | PRB utilization, RRC count |
| Hardware/Other | other | ~10% | VSWR, PA temp, alarms |
Data Collection Pipeline
PM counters + CHR traces + MDT for drop prediction
The prediction pipeline ingests three data streams: performance management (PM) counters (ERAB.RelAbnormal.RLF, RRC.ConnMean, PRB.Used.DL.Avg; 15-minute granularity), call history record (CHR) traces (per-call RRC messages with measurement reports; real-time), and Minimization of Drive Tests (MDT) data (geo-tagged RSRP for coverage context). The PM counters provide the baseline features, CHR traces add the per-call detail needed for labeling, and MDT provides spatial context.
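To make the stream alignment concrete, here is a minimal sketch of joining per-call CHR events onto the 15-minute PM row of the same cell. The record layouts (`cell_id`, `ts`, `interval_start`, `release`) and the function names are assumptions made for illustration, not a vendor schema, and MDT is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def interval_start(ts: datetime, minutes: int = 15) -> datetime:
    """Floor a timestamp to the start of its PM reporting interval."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def join_pm_with_chr(pm_rows, chr_events):
    """Attach CHR events to the PM row of the same cell and interval."""
    events_by_key = defaultdict(list)
    for ev in chr_events:
        events_by_key[(ev["cell_id"], interval_start(ev["ts"]))].append(ev)

    joined = []
    for row in pm_rows:
        key = (row["cell_id"], row["interval_start"])
        joined.append({**row, "chr_events": events_by_key.get(key, [])})
    return joined


# Hypothetical sample records (field names are illustrative).
pm_rows = [
    {"cell_id": "cell-1",
     "interval_start": datetime(2024, 1, 1, 10, 30),
     "PRB.Used.DL.Avg": 62.0},
]
chr_events = [
    {"cell_id": "cell-1", "ts": datetime(2024, 1, 1, 10, 37, 12),
     "release": "ERAB.RelAbnormal.RLF"},
    {"cell_id": "cell-1", "ts": datetime(2024, 1, 1, 10, 46, 0),
     "release": "ERAB.RelNormal"},
]
joined = join_pm_with_chr(pm_rows, chr_events)
```

Only the first CHR event lands in the 10:30 interval; the second belongs to the 10:45 interval, for which this sample has no PM row.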
Feature Engineering
15+ features from raw counters and traces
Raw counters become ML features through statistical aggregation and temporal derivatives. The most predictive features are RSRP_trend_30s (slope of RSRP over the last 30 seconds), SINR_variance_5min (signal instability), PRB_util_peak_15min (congestion risk), BLER_mean_1min (link quality), HO_attempt_rate (mobility stress), and active_users_delta (rate of load change).
| Feature | Source | Engineering | Importance |
|---|---|---|---|
| RSRP_trend_30s | MeasReport | Linear slope | 0.23 (highest) |
| SINR_variance_5min | MeasReport | Rolling std dev | 0.18 |
| PRB_util_peak_15min | PM counter | Max over window | 0.14 |
| BLER_mean_1min | PM counter | Rolling mean | 0.12 |
| HO_attempt_rate | PM counter | Count/interval | 0.09 |
| active_users_delta | PM counter | Difference | 0.08 |
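The engineering column above can be sketched in plain Python. This is a minimal illustration that assumes each input is a list of `(timestamp_seconds, value)` samples; the function names and the sample values are hypothetical, and HO_attempt_rate is omitted because it is a simple count divided by the interval length.

```python
from statistics import mean, pstdev


def linear_slope(samples):
    """Least-squares slope of (time_s, value) samples, in units per second."""
    if len(samples) < 2:
        return 0.0
    t_bar = mean(t for t, _ in samples)
    v_bar = mean(v for _, v in samples)
    denom = sum((t - t_bar) ** 2 for t, _ in samples)
    if denom == 0:
        return 0.0
    return sum((t - t_bar) * (v - v_bar) for t, v in samples) / denom


def last(samples, now_s, span_s):
    """Keep the samples that fall within the last span_s seconds."""
    return [(t, v) for t, v in samples if now_s - span_s <= t <= now_s]


def build_features(rsrp, sinr, prb_util, bler, active_users, now_s):
    """Compute the windowed features from raw (time, value) series."""
    sinr_w = [v for _, v in last(sinr, now_s, 300)]
    prb_w = [v for _, v in last(prb_util, now_s, 900)]
    bler_w = [v for _, v in last(bler, now_s, 60)]
    users = [v for _, v in active_users]
    return {
        "RSRP_trend_30s": linear_slope(last(rsrp, now_s, 30)),
        "SINR_variance_5min": pstdev(sinr_w) if sinr_w else 0.0,
        "PRB_util_peak_15min": max(prb_w) if prb_w else 0.0,
        "BLER_mean_1min": mean(bler_w) if bler_w else 0.0,
        "active_users_delta": users[-1] - users[-2] if len(users) >= 2 else 0.0,
    }


# Hypothetical samples: RSRP falls 2 dB every 10 seconds.
rsrp = [(0, -100.0), (10, -102.0), (20, -104.0), (30, -106.0)]
sinr = [(0, 4.0), (10, 2.0), (20, 0.0), (30, -2.0)]
prb_util = [(0, 35.0), (30, 48.0)]
bler = [(0, 0.02), (30, 0.06)]
active_users = [(0, 40), (30, 46)]

features = build_features(rsrp, sinr, prb_util, bler, active_users, now_s=30)
```

With these samples the RSRP slope comes out at -0.2 dB/s, which is the kind of steady decline the trend feature is meant to capture.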
Labeling Strategy
Defining "drop" vs. "normal" from RRC release causes
Labeling is critical. A "drop" is any abnormal release, as counted by ERAB.RelAbnormal.RLF (radio link failure), ERAB.RelAbnormal.HO (handover failure), or ERAB.RelAbnormal.Other. Normal releases (ERAB.RelNormal) form the negative class. The challenge is that drops make up only 2-5% of all releases, which creates severe class imbalance. Common remedies are SMOTE oversampling, class weights, or focal loss.
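As a minimal sketch of the labeling rule and of the class-weight option, the snippet below maps release counters to binary labels and derives inverse-frequency weights, n_samples / (n_classes * class_count), which is the same heuristic scikit-learn uses for "balanced" class weights. The function names and the 97/3 sample mix are assumptions for illustration.

```python
from collections import Counter

# Counter names taken from the text; 1 = drop, 0 = normal release.
ABNORMAL_RELEASES = {
    "ERAB.RelAbnormal.RLF",
    "ERAB.RelAbnormal.HO",
    "ERAB.RelAbnormal.Other",
}
NORMAL_RELEASE = "ERAB.RelNormal"


def label_release(release_type: str) -> int:
    """Return 1 for an abnormal release (drop), 0 for a normal release."""
    if release_type in ABNORMAL_RELEASES:
        return 1
    if release_type == NORMAL_RELEASE:
        return 0
    raise ValueError(f"unknown release type: {release_type}")


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Hypothetical sample with a 3% drop rate.
releases = (["ERAB.RelNormal"] * 97
            + ["ERAB.RelAbnormal.RLF"] * 2
            + ["ERAB.RelAbnormal.HO"])
labels = [label_release(r) for r in releases]
weights = class_weights(labels)
```

With a 3% drop rate the drop class is weighted about 32 times more heavily than the normal class, which is what pushes the loss to take the minority class seriously.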
Model Architecture
XGBoost for classification + LSTM for time-series risk scoring
The best results come from a two-stage architecture. (1) An XGBoost classifier predicts drop probability from the current cell state (PM features); it is fast and interpretable, with 85% accuracy. Because drops are only 2-5% of releases, that accuracy figure should be read alongside precision and recall. (2) An LSTM processes the time series of measurement reports for UEs in "at-risk" cells and predicts per-UE drop probability 30-120 seconds ahead. The LSTM catches temporal patterns, such as rapid RSRP degradation, that XGBoost's snapshot features miss.
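The control flow of the two stages can be sketched independently of the model libraries. In the snippet below the two models are passed in as plain callables; `toy_cell_model` and `toy_ue_model` are hypothetical stand-ins for a trained XGBoost classifier and an LSTM, and the `cell_gate` value of 0.5 is an assumed cut-off rather than a figure from this article.

```python
def score_two_stage(cell_features, ue_series, cell_model, ue_model, cell_gate=0.5):
    """Run the per-UE model only for UEs in cells the cell model flags as at-risk."""
    ue_scores = {}
    for cell_id, feats in cell_features.items():
        if cell_model(feats) < cell_gate:
            continue  # stage 1 says the cell is healthy: skip its UEs
        for ue_id, series in ue_series.get(cell_id, {}).items():
            ue_scores[(cell_id, ue_id)] = ue_model(series)  # stage 2
    return ue_scores


def toy_cell_model(feats):
    """Stand-in for the cell-level classifier (not a trained model)."""
    return 0.9 if feats["RSRP_trend_30s"] < -0.1 else 0.1


def toy_ue_model(rsrp_series):
    """Stand-in for the sequence model: maps total RSRP loss to [0, 1]."""
    loss_db = rsrp_series[0] - rsrp_series[-1]
    return max(0.0, min(1.0, loss_db / 10.0))


cell_features = {
    "A": {"RSRP_trend_30s": -0.2},  # degrading cell
    "B": {"RSRP_trend_30s": 0.0},   # stable cell
}
ue_series = {
    "A": {"ue-1": [-100, -104, -108], "ue-2": [-95, -95, -96]},
    "B": {"ue-3": [-90, -100, -110]},
}
scores = score_two_stage(cell_features, ue_series, toy_cell_model, toy_ue_model)
```

The gating is what keeps the expensive sequence model off the majority of UEs. The trade-off is visible in the sample: ue-3 is degrading fast but is never scored, because its cell passed stage 1.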
Training & Validation
Temporal splits, SMOTE, cross-validation, F1-score
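Training uses a temporal split, never a random one: train on weeks 1-6, validate on week 7, and test on week 8. A random split would leak future network state into the training set. SMOTE oversamples the minority class (drops) to 20-30% of the training data; it is applied to the training split only, so that validation and test keep the real class ratio. The evaluation metric is the F1-score (the harmonic mean of precision and recall), because both false positives (unnecessary actions) and false negatives (missed drops) are costly. Target: F1 > 0.80.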
The prediction threshold controls this trade-off: a lower threshold catches more drops but raises more false alarms.
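The split, the metric, and the threshold choice can be sketched together. This is a minimal pure-Python illustration; the `week` field, the function names, the candidate thresholds, and the sample scores are assumptions. In practice the threshold would be chosen on the validation week and then reported once on the test week.

```python
def temporal_split(rows, train_weeks=range(1, 7), val_week=7, test_week=8):
    """Split rows by week number so that no future data reaches training."""
    train = [r for r in rows if r["week"] in train_weeks]
    val = [r for r in rows if r["week"] == val_week]
    test = [r for r in rows if r["week"] == test_week]
    return train, val, test


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the drop class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores, candidates):
    """Pick the candidate threshold with the highest F1."""
    return max(
        candidates,
        key=lambda th: f1_score(y_true, [int(s >= th) for s in scores]),
    )


rows = [{"week": w} for w in range(1, 9)]
train, val, test = temporal_split(rows)

# Hypothetical validation labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
chosen = best_threshold(y_true, scores, candidates=[0.3, 0.5, 0.8])
```

On this toy data the lowest threshold wins because it recovers all three drops at the price of two false alarms; on real data the balance depends on how well the scores separate the classes.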
Real-Time Prediction System
Scoring pipeline, threshold tuning, alert generation
The real-time system scores every cell every 15 seconds (a PM mini-interval, which assumes counters are streamed at a finer granularity than the 15-minute reporting period) and every at-risk UE every second (from measurement reports delivered over the O-RAN E2 interface). When the drop probability exceeds the threshold (tuned for F1 > 0.80), the system raises an alert with the predicted cause, the affected UE, and the recommended action. Alerts are prioritized by probability and subscriber value, with VIP and enterprise customers given priority.
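Alert generation and prioritization can be sketched as a filter followed by a sort. The `Alert` fields follow the description above, but the field names, the 0.6 threshold, and the ordering rule (priority tiers first, then descending probability) are assumptions; the text does not say how probability and subscriber value are combined.

```python
from dataclasses import dataclass

PRIORITY_TIERS = {"vip", "enterprise"}


@dataclass(frozen=True)
class Alert:
    ue_id: str
    cell_id: str
    probability: float
    predicted_cause: str
    recommended_action: str
    tier: str


def build_alerts(predictions, threshold):
    """Turn predictions at or above the threshold into alerts."""
    return [Alert(**p) for p in predictions if p["probability"] >= threshold]


def prioritize(alerts):
    """Priority-tier subscribers first, then highest probability first."""
    return sorted(
        alerts,
        key=lambda a: (a.tier not in PRIORITY_TIERS, -a.probability),
    )


# Hypothetical per-UE predictions.
predictions = [
    {"ue_id": "ue-1", "cell_id": "A", "probability": 0.95,
     "predicted_cause": "rlf", "recommended_action": "preemptive_handover",
     "tier": "standard"},
    {"ue_id": "ue-2", "cell_id": "A", "probability": 0.70,
     "predicted_cause": "congestion", "recommended_action": "activate_carrier",
     "tier": "vip"},
    {"ue_id": "ue-3", "cell_id": "B", "probability": 0.40,
     "predicted_cause": "rlf", "recommended_action": "preemptive_handover",
     "tier": "standard"},
    {"ue_id": "ue-4", "cell_id": "B", "probability": 0.85,
     "predicted_cause": "ho_failure", "recommended_action": "conditional_handover",
     "tier": "enterprise"},
]
queue = prioritize(build_alerts(predictions, threshold=0.6))
```

A strict tier-first ordering means a standard subscriber at 0.95 waits behind a VIP at 0.70; a weighted score would be an alternative if that is too aggressive.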
Preventive Actions
What the network does when a drop is predicted
Prediction without action is useless. The system triggers automated preventive actions based on the predicted cause: RLF predicted → preemptive handover to best neighbor. Congestion predicted → activate carrier aggregation or redirect new sessions. Interference predicted → adjust power/scheduling. HO failure predicted → prepare multiple target cells (conditional handover). Each action has measurable KPI impact: preemptive HO alone reduces drops by 30-40%.
| Predicted Cause | Preventive Action | Drop Reduction |
|---|---|---|
| Radio Link Failure | Preemptive handover to best neighbor | -40% |
| Congestion | Activate carrier / redirect new sessions | -35% |
| HO Failure | Conditional HO (multiple targets prepared) | -50% |
| Interference | Power adjustment / ICIC activation | -25% |
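The table above is effectively a dispatch map from predicted cause to action. A minimal sketch follows, with hypothetical cause keys and action identifiers; the fallback for an unrecognized cause is an assumption, not something the article specifies.

```python
# Cause keys and action identifiers are illustrative names, not a real API.
PREVENTIVE_ACTIONS = {
    "rlf": "preemptive_handover_to_best_neighbor",
    "congestion": "activate_carrier_or_redirect_new_sessions",
    "ho_failure": "conditional_handover_multiple_targets",
    "interference": "power_adjustment_or_icic_activation",
}

# Assumed fallback: hand unknown causes to a human instead of acting blindly.
FALLBACK_ACTION = "escalate_to_operator"


def choose_action(predicted_cause: str) -> str:
    """Return the preventive action for a predicted drop cause."""
    return PREVENTIVE_ACTIONS.get(predicted_cause, FALLBACK_ACTION)


action = choose_action("ho_failure")
```

Keeping the mapping in data rather than in branching code makes it easy to review and to extend when a new cause category is added.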