Comparative analysis of ML algorithms for 5G coverage prediction review

January 5, 2026 · about 3 min read

Eunkwang Shin

Owner

Summary

Model performance in 5G coverage prediction is primarily determined by the alignment between data characteristics, feature design, and model inductive bias, rather than by model complexity alone.
Using real-world 5G NR drive-test data with physics-informed numerical features, this study demonstrates that Random Forest can achieve state-of-the-art (SOTA) performance, outperforming more complex models such as XGBoost and deep neural networks.
The results highlight the continued importance of domain-informed feature engineering and show that deep learning becomes advantageous only when the data representation and scale justify its use.

Introduction

Coverage prediction in 5G networks is a core component of network planning, optimization, and resource allocation.
Conventional propagation and path loss models are limited in their ability to accurately capture the complexity of dense urban environments and the unique characteristics of 5G systems.
Machine Learning and Deep Learning have emerged as promising alternatives, as they can model complex non-linear relationships across multiple parameters.
However, prior studies typically suffer from several limitations:
- Most focus on 4G networks or rely on a limited set of input features.
- Comparisons across a wide range of algorithms are often insufficient.
- Systematic analyses of feature importance are largely lacking.

The objectives of this study are to:

- Conduct a comprehensive comparison of multiple ML and DL algorithms using a unified dataset.
- Identify dominant feature parameters that significantly influence 5G coverage prediction.
- Demonstrate performance improvements over previously reported methods.

Methods

Data Collection

Real-world 5G NR drive-test measurements conducted in Bandung, Indonesia (Batununggal area).
Approximately 1,500 SS-RSRP samples collected.
Deployment includes 10 gNodeBs, each configured with three sectors.
Measurement vehicle speed maintained below 30 km/h to minimize fast fading effects.
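
One way to see why vehicle speed matters is the maximum Doppler shift, f_d = v · f_c / c, which sets how quickly the small-scale fading changes as the receiver moves. The sketch below evaluates it at the 30 km/h limit. The carrier frequency is not stated in this review, so `ASSUMED_CARRIER_HZ` (3.5 GHz) is a hypothetical value chosen only for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

# Hypothetical carrier frequency; the review does not state the band used.
ASSUMED_CARRIER_HZ = 3.5e9


def max_doppler_shift_hz(speed_kmh: float, carrier_hz: float) -> float:
    """Maximum Doppler shift f_d = v * f_c / c for a receiver moving at speed v."""
    speed_m_s = speed_kmh / 3.6  # convert km/h to m/s
    return speed_m_s * carrier_hz / SPEED_OF_LIGHT_M_S


doppler_at_limit_hz = max_doppler_shift_hz(30.0, ASSUMED_CARRIER_HZ)
print(f"Max Doppler shift at 30 km/h: {doppler_at_limit_hz:.1f} Hz")
```

Under this assumed carrier the shift is a little under 100 Hz, and it scales linearly with speed, so halving the speed halves the rate at which the fading pattern changes.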

Input Features (10 Total)

2D Distance between Transmitter and Receiver
Frequency
Transmitter Tilt Angle
Transmitter Azimuth Angle
Altitude
Elevation Angle
Azimuth Offset Angle
Tilting Offset Angle
Horizontal Distance of Receiver from Transmitter Antenna Boresight
Vertical Distance of Receiver from Transmitter Antenna Boresight
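
Several of these features can be derived from site geometry alone. The following is a minimal sketch of one plausible derivation, not the paper's exact definitions: the review does not give formulas, so the sign conventions, the local east/north coordinate frame, and the function name `geometry_features` are all assumptions made for illustration.

```python
import math


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def geometry_features(tx_xy, tx_height_m, tx_azimuth_deg, tx_tilt_deg,
                      rx_xy, rx_height_m):
    """Derive geometric features from a local (east, north) frame in metres.

    Assumed conventions: azimuth is clockwise from north, and tilt and
    elevation are positive when pointing downward from the antenna.
    """
    dx = rx_xy[0] - tx_xy[0]  # east offset
    dy = rx_xy[1] - tx_xy[1]  # north offset
    dz = tx_height_m - rx_height_m
    distance_2d = math.hypot(dx, dy)
    distance_3d = math.hypot(distance_2d, dz)

    # Bearing from transmitter to receiver, clockwise from north.
    bearing_deg = math.degrees(math.atan2(dx, dy)) % 360.0
    # Angle below the horizontal from the antenna down to the receiver.
    elevation_deg = math.degrees(math.atan2(dz, distance_2d))

    azimuth_offset_deg = wrap_deg(bearing_deg - tx_azimuth_deg)
    tilt_offset_deg = elevation_deg - tx_tilt_deg

    return {
        "distance_2d_m": distance_2d,
        "elevation_deg": elevation_deg,
        "azimuth_offset_deg": azimuth_offset_deg,
        "tilt_offset_deg": tilt_offset_deg,
        # Perpendicular distances from the boresight line (assumed definition).
        "horizontal_boresight_m": distance_2d * math.sin(math.radians(azimuth_offset_deg)),
        "vertical_boresight_m": distance_3d * math.sin(math.radians(tilt_offset_deg)),
    }


# Antenna pointing east (90 degrees) with no tilt; receiver 100 m due north.
features = geometry_features((0.0, 0.0), 30.0, 90.0, 0.0, (0.0, 100.0), 30.0)
```

In this example the receiver sits 90 degrees off boresight, so its horizontal distance from the boresight line equals the full 100 m separation.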

Algorithms

Machine Learning (Classification-based):

Logistic Regression
K-Nearest Neighbors (KNN)
Naive Bayes
Random Forest
Support Vector Machine (SVM)
XGBoost
LightGBM
AdaBoost
Bayesian Network Classifier

Deep Learning:

Multi-Layer Perceptron (MLP)
Long Short-Term Memory (LSTM)
Convolutional Neural Network (CNN)

Training and Validation

Experiments conducted using Google Colab.
10-fold cross-validation applied for all models.
Hyperparameter optimization performed only on the best-performing models.
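
To make the validation scheme concrete, here is a minimal sketch of how 10-fold cross-validation partitions a dataset of about 1,500 samples: each sample is held out exactly once, and every model is trained on the remaining nine folds. The helper `k_fold_indices` is a hypothetical illustration written with the standard library only; in practice a library splitter such as scikit-learn's `KFold` would typically be used, and the review does not say which tool the authors relied on.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold_id in range(k):
        size = base + (1 if fold_id < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    for fold_id in range(k):
        test_idx = folds[fold_id]
        train_idx = [i for j, fold in enumerate(folds) if j != fold_id for i in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(1500, k=10))
for fold_id, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold_id}: train={len(train_idx)} test={len(test_idx)}")
```

With 1,500 samples each fold holds out 150 samples and trains on 1,350, and the reported score is the average over the ten held-out folds.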

Evaluation Metrics

Regression Metrics: RMSE, MAE, R²
Classification Metrics: Accuracy, Precision, Recall, F1-score
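
For reference, the three regression metrics can be computed as below. This is a small standard-library sketch using the usual definitions; the RSRP values in the example are made up for illustration and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted SS-RSRP values in dBm.
measured = [-80.0, -90.0, -100.0, -110.0]
predicted = [-82.0, -89.0, -101.0, -108.0]

print(rmse(measured, predicted), mae(measured, predicted), r2_score(measured, predicted))
```

RMSE penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE on the same predictions.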

Results

Machine Learning

Random Forest:

RMSE = 1.14 dB
MAE = 0.12
R² = 0.97
Accuracy / Precision / Recall / F1-score ≈ 98.4%

Deep Learning

Convolutional Neural Network (CNN):

RMSE = 0.289
MAE = 0.289
R² = 0.78
Accuracy = 75%
Precision = 85.6%
Recall = 87.8%
F1-score = 89.9%

MLP and LSTM perform worse than CNN.
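
When reading classification figures like these, it helps to recall that F1 is the harmonic mean of precision and recall, so it always lies between the two. The sketch below applies the standard formula to the CNN precision and recall listed above; the review does not state how the metrics were averaged across classes, so this is only a consistency check under the usual definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed for the CNN in this review.
cnn_precision = 0.856
cnn_recall = 0.878

f1_from_listed = f1_score(cnn_precision, cnn_recall)
print(f"F1 implied by listed precision/recall: {f1_from_listed:.3f}")
```

The implied value is roughly 0.867, which differs from the 89.9% listed, so the CNN figures are worth checking against the original paper.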

Feature Importance

The 2D Transmitter–Receiver Distance is identified as the dominant feature across all algorithms.
Incorporating horizontal and vertical distances from the antenna boresight significantly improves prediction accuracy.
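
The review does not say which importance method the authors used. One model-agnostic option is permutation importance: shuffle a single feature column and measure how much the error grows. The sketch below illustrates the idea on synthetic data with a hypothetical stand-in predictor (`toy_model`, a log-distance trend plus a small tilt term); none of the numbers come from the study.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return the RMSE increase caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        score = rmse(targets, [predict(r) for r in permuted])
        importances.append(score - baseline)
    return importances


def toy_model(row):
    """Hypothetical predictor: ignores the third (irrelevant) feature."""
    distance_m, tilt_deg, _unused = row
    return -60.0 - 30.0 * math.log10(distance_m) - 0.2 * tilt_deg


data_rng = random.Random(1)
rows = [[data_rng.uniform(50, 1000), data_rng.uniform(0, 10), data_rng.uniform(0, 1)]
        for _ in range(200)]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets, n_features=3)
for name, value in zip(["distance_2d", "tilt", "irrelevant"], importances):
    print(f"{name}: +{value:.2f} dB RMSE when shuffled")
```

In this toy setup, shuffling distance degrades the error far more than shuffling tilt, and shuffling the ignored feature changes nothing, which mirrors the kind of ranking reported above.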

Comparison with Previous Studies

Both Random Forest and CNN achieve lower RMSE values than those reported in prior studies.
Random Forest, in particular, demonstrates state-of-the-art performance relative to existing 4G and 5G coverage prediction research.

Discussion

Random Forest
- Highly effective for small-to-medium-sized datasets with numerical features.
- Offers strong interpretability and robust performance stability.
Convolutional Neural Network
- Well-suited for grid-based or spatial data representations.
- Shows greater potential when image-based or satellite-derived features are incorporated.
- In this study, CNN was applied by transforming numerical features into a matrix-like structure.

The results empirically demonstrate that feature design and selection can be more critical than the choice of learning algorithm itself.
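
The review does not describe how the numerical features were arranged into a matrix for the CNN. As a rough illustration of what such a transformation could look like, the sketch below reshapes a 10-element feature vector into a 2x5 grid, or zero-pads it into a 4x4 grid. The function name `features_to_matrix` and both layouts are assumptions, not the authors' method.

```python
def features_to_matrix(features, n_rows, n_cols, pad_value=0.0):
    """Arrange a flat feature vector row by row into an n_rows x n_cols grid.

    Unused trailing cells are filled with pad_value.
    """
    capacity = n_rows * n_cols
    if len(features) > capacity:
        raise ValueError(f"{len(features)} features do not fit in {n_rows}x{n_cols}")
    padded = list(features) + [pad_value] * (capacity - len(features))
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Placeholder values standing in for the 10 input features of one sample.
sample = [float(i) for i in range(1, 11)]

matrix_2x5 = features_to_matrix(sample, 2, 5)
matrix_4x4 = features_to_matrix(sample, 4, 4)
print(matrix_2x5)
print(matrix_4x4)
```

Because tabular features have no natural spatial ordering, neighbouring cells in such a grid are arbitrary, which is one reason a CNN may gain less from this representation than from genuinely grid-based inputs.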
