5G RAN Capacity Monitoring Guide (V100R017C10 - 01) (PDF) - en
V100R017C10
Issue 01
Date 2021-03-05
HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: https://www.huawei.com
Email: support@huawei.com
Overview
Growing traffic in mobile networks requires more and more resources. Lack of
resources will affect user experience. This document provides guidelines on 5G
capacity monitoring, including how to identify resource bottlenecks and how to
monitor network resource usage. Capacity monitoring serves as a basis for
network optimization and capacity expansion and enables maintenance personnel
to take measures before resource insufficiency affects network quality and user
experience.
NOTE
This document does not apply to heavy traffic scenarios. For guidance in these scenarios,
contact Huawei technical support.
Product Version
The following table lists the product versions related to this document.
Intended Audience
This document is intended for:
● Field engineers
● Network planning engineers
Organization
01 (2021-03-05)
This is the first commercial release.
Compared with Draft A (2020-12-30), this issue does not include any new topics
or changes, or exclude any topics.
Draft A (2020-12-30)
This is a draft.
Compared with Issue 01 (2020-04-07) of V100R016C10, this issue does not include
any new topics or changes, or exclude any topics.
The following table describes the meaning and impact of each type of resource.
● Issue-driven analysis
In-depth analysis is made to check whether an abnormal KPI is caused by
resource congestion. In this way, issues can be precisely located and proper
network optimization and capacity expansion solutions can be worked out.
For details on this capacity monitoring method, see 1.4 Resource Congestion
Diagnosis.
NOTE
1. Thresholds defined for resource monitoring are generally lower than alarm generation
thresholds so that resource insufficiency risks can be identified as early as possible.
2. Thresholds for capacity expansion provided in this document apply to networks experiencing
steady traffic growth. These thresholds are determined based on product specifications and
live network experience. For example, the CPU usage threshold 60% is determined based on
the CPU flow control threshold 80%. The RRC connected user license usage threshold 60% is
determined based on the peak-to-average ratio (about 1.5:1). When the average license
usage reaches 60%, the peak license usage approaches 100%. Threshold determination
considers both average and peak values. Operators can define these thresholds based on
actual situations.
3. If the network load increases abruptly or even exceeds product specifications, whether and
how to perform capacity expansion can be determined using the methods
applicable to networks experiencing steady traffic growth. Alternatively, operators may
decide to perform capacity expansion according to their requirements on network quality,
for example, as soon as network congestion occurs.
4. Operators are encouraged to formulate resource capacity optimization solutions based on
prediction and analysis of networks that are experiencing rapid growth, networks that are
about to be deployed with new services, and networks that will apply new charging plans. If
resource capacity optimization services, such as prediction, evaluation, optimization,
reconfiguration, and capacity expansion, are required, contact Huawei technical support.
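The arithmetic behind the license usage threshold in item 2 can be checked with a short calculation. The following is a minimal Python sketch, not part of the product. The function names are illustrative, and the sketch assumes that peak usage is simply the average usage multiplied by the peak-to-average ratio.

```python
def peak_usage(average_usage: float, peak_to_average_ratio: float) -> float:
    """Estimate peak usage (%) from average usage (%) and a peak-to-average ratio."""
    return average_usage * peak_to_average_ratio


def average_at_peak_limit(peak_limit: float, peak_to_average_ratio: float) -> float:
    """Average usage (%) at which the estimated peak reaches peak_limit (%)."""
    return peak_limit / peak_to_average_ratio


# Values taken from the note: 60% average usage, ratio of about 1.5:1.
estimated_peak = peak_usage(60.0, 1.5)
limit_average = average_at_peak_limit(100.0, 1.5)
print(f"Estimated peak at 60% average: {estimated_peak:.1f}%")
print(f"Average at which peak reaches 100%: {limit_average:.1f}%")
```

Under this assumption, a 60% average corresponds to a peak of about 90%, so the threshold is reached before the peak hits the license limit.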
1.3.1 Overview
This section describes monitoring principles, monitoring methods, and related
counters of all types of resources. It also describes how to locate and handle
resource bottlenecks. Resource insufficiency may be indicated by more than one
monitoring item. For example, a resource bottleneck can be claimed only when
both RRC connected user capacity license usage and main control board CPU
usage exceed thresholds.
NOTE
For the purpose of accurate monitoring, all resources must be monitored during busy hours. It is
recommended that busy hours be defined as a period when the system or a cell is undergoing
the maximum resource consumption of a day.
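As an illustration of the busy-hour definition in the preceding note, the following Python sketch picks the hour with the highest resource usage from one day of hourly samples. The data layout (a mapping from hour of day to usage in percent) and the function name are assumptions made for this example. The sample values are made up.

```python
from typing import Mapping


def busy_hour(hourly_usage: Mapping[int, float]) -> int:
    """Return the hour (0-23) whose resource usage is the highest of the day."""
    if not hourly_usage:
        raise ValueError("at least one hourly sample is required")
    # max() over the keys, ranked by the usage value of each hour.
    return max(hourly_usage, key=lambda hour: hourly_usage[hour])


# Hypothetical hourly usage samples (hour of day -> usage in percent).
samples = {9: 41.0, 12: 55.5, 20: 72.3, 23: 30.1}
print("Busy hour:", busy_hour(samples))
```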
1.3.7 BBP CPU Usage
● Capacity expansion condition: The average BBP CPU usage is ≥ 60%, or the
percentage of times the BBP CPU usage reaches or exceeds 85% is ≥ 5%.
● Suggested measures: Add boards, replace old boards with boards of higher
specifications, or balance the load among BBPs.
Monitoring Principles
The PRB usage increases with the number of users. If the resource requirements of
users are not fulfilled, user rates will decrease and user experience will also
degrade. Therefore, the PRB usage is used to determine the possible resource
bottleneck and the corresponding PRB usage threshold is taken as the cell capacity
expansion threshold.
Monitoring Methods
The PRB usage is calculated using the following formulas:
Downlink PRB usage = N.PRB.DL.Used.Avg/N.PRB.DL.Avail.Avg x 100%
Uplink PRB usage = N.PRB.UL.Used.Avg/N.PRB.UL.Avail.Avg x 100%
where
● N.PRB.DL.Used.Avg indicates the average number of PRBs used in the
downlink of a cell.
● N.PRB.DL.Avail.Avg indicates the average number of PRBs available in the
downlink of a cell.
● N.PRB.UL.Used.Avg indicates the average number of PRBs used in the uplink
of a cell.
● N.PRB.UL.Avail.Avg indicates the average number of PRBs available in the
uplink of a cell.
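The PRB usage formulas above can be sketched in Python as follows. This is an illustrative calculation only. The function name and the sample counter values are assumptions, and how the counters are exported from the network management system is outside the scope of the sketch.

```python
def prb_usage(used_avg: float, avail_avg: float) -> float:
    """PRB usage in percent: used PRBs divided by available PRBs, times 100."""
    if avail_avg <= 0:
        raise ValueError("the number of available PRBs must be positive")
    return used_avg / avail_avg * 100.0


# Hypothetical busy-hour counter values for one cell.
counters = {
    "N.PRB.DL.Used.Avg": 204.75,
    "N.PRB.DL.Avail.Avg": 273.0,
    "N.PRB.UL.Used.Avg": 136.5,
    "N.PRB.UL.Avail.Avg": 273.0,
}

dl_usage = prb_usage(counters["N.PRB.DL.Used.Avg"], counters["N.PRB.DL.Avail.Avg"])
ul_usage = prb_usage(counters["N.PRB.UL.Used.Avg"], counters["N.PRB.UL.Avail.Avg"])
print(f"Downlink PRB usage: {dl_usage:.1f}%")
print(f"Uplink PRB usage: {ul_usage:.1f}%")
```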
Suggested Measures
If the uplink or downlink PRB usage in a cell reaches or exceeds 70% for X days
(three days by default) in a week, you are advised to take the following measures
accordingly:
Monitoring Principles
The user capacity usage can be evaluated by the RRC connected user capacity
usage of a cell. An RRC connected user in a 5G network is the user in
RRC_CONNECTED mode. When the number of users processed within a cell or by
a board exceeds the maximum number defined in the product specifications,
network KPIs deteriorate.
NOTE
When the number of users reaches the capacity expansion threshold, the user-perceived rate has
already decreased to an unacceptable level. Therefore, the user-perceived rate must be
considered first. The number of users should be considered when operators are more concerned
with user capacity than user experience.
Monitoring Methods
In NSA networking, the RRC connected user capacity usage of a cell is calculated
using the following formula:
RRC connected user capacity usage of a cell = N.User.NsaDc.PSCell.Avg/Maximum
number of RRC connected users in a cell x 100%
where
● N.User.NsaDc.PSCell.Avg indicates the average number of LTE-NR NSA DC
users using the current cell as the primary secondary cell (PSCell).
● For details on the maximum number of RRC connected users supported by a
BBP, see 3900 & 5900 Series Base Station Technical Description.
In SA networking, the RRC connected user capacity usage of a cell is calculated
using the following formula:
RRC connected user capacity usage of a cell = N.User.RRCConn.Avg/Maximum number
of RRC connected users in a cell x 100%
where
● N.User.RRCConn.Avg indicates the average number of RRC connected users in
a cell.
● For details on the maximum number of RRC connected users supported by a
BBP, see 3900 & 5900 Series Base Station Technical Description.
In SA_NSA hybrid networking, the RRC connected user capacity usage of a cell is
calculated using the following formula:
RRC connected user capacity usage of a cell = N.User.RRCConn.Avg/Maximum number
of RRC connected users in a cell x 100%
where
● N.User.RRCConn.Avg indicates the average number of RRC connected users in
a cell.
● For details on the maximum number of RRC connected users supported by a
BBP, see 3900 & 5900 Series Base Station Technical Description.
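The three networking cases above differ only in the counter used as the numerator. The following Python sketch shows one way to express this. It assumes that the denominator is the maximum number of RRC connected users taken from the product specifications. The value 400 used below is a made-up placeholder, not a product specification, and the function and table names are illustrative.

```python
# Counter used as the numerator in each networking mode, as listed in the text.
COUNTER_BY_MODE = {
    "NSA": "N.User.NsaDc.PSCell.Avg",
    "SA": "N.User.RRCConn.Avg",
    "SA_NSA": "N.User.RRCConn.Avg",
}


def rrc_user_capacity_usage(mode: str, counters: dict, max_rrc_users: float) -> float:
    """RRC connected user capacity usage of a cell, in percent."""
    if mode not in COUNTER_BY_MODE:
        raise ValueError(f"unknown networking mode: {mode}")
    if max_rrc_users <= 0:
        raise ValueError("max_rrc_users must be positive")
    average_users = counters[COUNTER_BY_MODE[mode]]
    return average_users / max_rrc_users * 100.0


# Hypothetical values: 300 average users against an assumed maximum of 400.
usage = rrc_user_capacity_usage("SA", {"N.User.RRCConn.Avg": 300.0}, 400.0)
print(f"RRC connected user capacity usage: {usage:.1f}%")
```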
Suggested Measures
● If the RRC connected user capacity usage of a cell reaches or exceeds 60% for
X days (three days by default) in a week, take measures as suggested in Main
Control Board CPU Usage.
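The "X days in a week" rule used for this and other thresholds can be sketched as a simple count over seven daily busy-hour values. The function name and the sample values are illustrative assumptions. The defaults reflect the 60% threshold and the three-day default stated above.

```python
from typing import Sequence


def threshold_reached(daily_usage: Sequence[float],
                      threshold: float = 60.0,
                      min_days: int = 3) -> bool:
    """Return True if usage reaches or exceeds threshold on at least min_days of 7 days."""
    if len(daily_usage) != 7:
        raise ValueError("exactly seven daily values are expected")
    days_over = sum(1 for usage in daily_usage if usage >= threshold)
    return days_over >= min_days


# Hypothetical busy-hour usage values (percent) for one week.
week = [55.0, 61.0, 60.0, 58.0, 40.0, 62.0, 59.0]
print("Capacity expansion threshold reached:", threshold_reached(week))
```

The same check applies to the other resources by passing a different threshold, for example 70.0 for PRB usage.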
Monitoring Principles
PDCCH resources consist of CCEs. PDCCH usage is evaluated using the CCE usage.
If the CCE usage is excessively high, CCEs may fail to be allocated to the new users
to be scheduled, which will result in a long scheduling delay and affect user
experience.
Monitoring Methods
The CCE usage is calculated using the following formula:
CCE usage = N.CCE.Used.Avg/N.CCE.Avail.Avg x 100%
where
● N.CCE.Used.Avg indicates the average number of used PDCCH CCEs.
● N.CCE.Avail.Avg indicates the average number of available PDCCH CCEs.
Suggested Measures
If the CCE usage during busy hours reaches or exceeds 50% for X days (three days
by default) in a week, perform the following operations.
Monitoring Principles
● NSA networking
In NSA networking, paging messages are sent from the eNodeB over the S1
interface. Therefore, the paging resource usage can be evaluated by the
percentage of paging messages received over the S1 interface. If the number
of paging times exceeds the maximum limit, the paging messages sent from
the eNodeB to UEs may be discarded, decreasing the call completion rate.
On the base station side, paging messages received by the main control board
over the S1 interface will be finally sent from the BBP over the air interface. If
all the cells served by a BBU belong to the same tracking area identified by
the tracking area code (TAC), all the paging messages received by the main
control board need to be sent out through each BBP. Whether the paging
messages can be sent out depends on the overall BBU paging capability.
The overall paging capability of the BBU is determined by the lower of the
main control board and BBP paging capabilities. For details
about the signaling specifications of the main control board and BBP, see
section "eNodeB Technical Specifications" in 3900 & 5900 Series Base Station
Technical Description.
● SA networking
Paging messages are sent over the NG interface. Therefore, paging resource
usage can be evaluated by the percentage of paging messages received on
the NG interface. If the number of paging times exceeds the maximum limit,
the paging messages sent from the gNodeB to UEs may be discarded,
decreasing the call completion rate.
On the base station side, paging messages received by the main control board
over the NG interface will be finally sent from the BBP over the air interface.
If all the cells served by a BBU belong to the same tracking area identified by
the TAC, all the paging messages received by the main control board need to
be sent out through each BBP. Whether the paging messages can be sent out
depends on the overall BBU paging capability.
The overall paging capability of the BBU is determined by the lower of the
main control board and BBP paging capabilities. The paging
times specification refers to the average paging times of a board instead of
the maximum paging times. A 3900 series base station and a 5900 series base
station support a maximum of 360 paging times per second. For details about
the signaling specifications of the main control board and BBP, see section
Monitoring Methods
● NSA networking
Paging messages are delivered through the LTE network. Therefore, the
paging resource usage of the LTE network is used. The paging resource usage
is evaluated by the percentage of paging messages received over the S1
interface, which is calculated using the following formula:
Percentage of paging messages received over the S1 interface =
L.Paging.S1.Rx/Measurement period (in the unit of second)/Number of paging
messages that can be processed per second x 100%
In this formula, L.Paging.S1.Rx indicates the number of paging messages
received over the S1 interface.
● SA networking
The paging resource usage is evaluated by the percentage of paging
messages received over the NG interface, which is calculated using the
following formula:
Percentage of paging messages received over the NG interface =
N.Paging.NG.Rx/Measurement period (in the unit of second)/Number of
paging messages that can be processed per second x 100%
In this formula, N.Paging.NG.Rx indicates the number of paging messages
received over the NG interface.
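The paging formulas above have the same form for the S1 and NG interfaces. The following Python sketch assumes that the paging counter (L.Paging.S1.Rx or N.Paging.NG.Rx) has been read for a known measurement period. The function names and the per-board capability values are illustrative placeholders. The 360 paging messages per second used as the capacity is the figure quoted in Monitoring Principles.

```python
def bbu_paging_capacity(main_control_per_s: float, bbp_per_s: float) -> float:
    """Overall BBU paging capability: the lower of the two board capabilities."""
    return min(main_control_per_s, bbp_per_s)


def paging_usage(paging_rx: int, period_s: float, capacity_per_s: float) -> float:
    """Percentage of paging messages received, relative to processing capacity."""
    if period_s <= 0 or capacity_per_s <= 0:
        raise ValueError("period_s and capacity_per_s must be positive")
    return paging_rx / period_s / capacity_per_s * 100.0


# Hypothetical board capabilities (messages per second); only 360 comes from the text.
capacity = bbu_paging_capacity(main_control_per_s=360.0, bbp_per_s=400.0)

# Hypothetical counter value: 648000 paging messages received in one hour.
usage = paging_usage(paging_rx=648000, period_s=3600.0, capacity_per_s=capacity)
print(f"Paging resource usage: {usage:.1f}%")
```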
Suggested Measures
If the percentage of paging messages received reaches or exceeds 60% for X days
(three days by default) in a week, you are advised to take any of the following
measures:
● Decrease the number of cells in the TAL that contains the congested cell.
● Modify the paging policy on the core network side. That is, reduce the
number of paging messages sent after the first or second paging failure to
reduce signaling load.
● Enable the precise paging function if the core network is provided by Huawei.
Monitoring Principles
The CPU usage of the main control board may occasionally become high.
Occasional high CPU usage alone does not justify
capacity expansion. Therefore, the main control board CPU usage is evaluated
based on both the average main control board CPU usage and the percentage of
the time during which the main control board CPU usage reaches or exceeds a
preconfigured threshold.
The main control board CPU usage reflects how busy a gNodeB is. If the
main control board CPUs are busy processing control plane or user plane data,
signaling-related KPIs may deteriorate. For example, the access success rate may
decrease and the service drop rate may increase.
Monitoring Methods
The main control board CPU usage is evaluated based on the average CPU usage
and the percentage of times the main control board CPU usage exceeds a
preconfigured threshold (85%).
Suggested Measures
The main control board CPU of a gNodeB is considered overloaded if either of the
following conditions is met for X days (three days by default) in a week:
1. Transfer UEs from the local gNodeB. If a neighboring base station is lightly
loaded, adjust the antenna downtilt angles or decrease the transmit power of
the local gNodeB to shrink the coverage area and reduce the CPU load of the
local gNodeB. In addition, expand the coverage area of the neighboring base
station for load balancing.
2. Replace the main control board with a higher-capacity one. If the main
control board is a UMPTe, replace it with a UMPTg.
3. Add a main control board. If the base station has vacant slots, add a second
main control board.
4. Add a base station.
Monitoring Principles
The BBP CPU usage reflects the load of BBP CPUs. If a gNodeB has too much
traffic data to process, the CPUs responsible for processing user plane data of
BBPs will be heavily loaded. As a result, the gNodeB will experience a low RRC
setup success rate, a low handover success rate, and a high service drop rate.
Monitoring Methods
Based on the type of data processed by the BBP, the BBP CPU usage is classified
into control-plane CPU usage and user-plane CPU usage. The BBP CPU usage is
evaluated based on the average BBP CPU usage and the percentage of the period
during which the BBP CPU usage exceeds a preconfigured threshold. The involved
indicators are described as follows:
● Control-plane CPU usage
– Average control-plane CPU usage: VS.BBUBoard.CPULoad.Mean
– Percentage of the period during which the control-plane CPU usage
exceeds a preconfigured threshold: VS.BBUBoard.CPULoad.Over
VS.BBUBoard.CPULoad.Over indicates the percentage of the period during
which the CPU usage of a board exceeds the preconfigured threshold (85%)
to the entire measurement period.
● User-plane CPU usage
– Average user-plane CPU usage: VS.NRBoard.UPlane.CPULoad.Avg
– Percentage of times the user-plane CPU usage exceeds a preconfigured
threshold = VS.NRBoard.UPlane.CumulativeHighloadCount/Measurement
period (unit: second) x 100%
In this formula, VS.NRBoard.UPlane.CumulativeHighloadCount indicates the
number of times the user-plane CPU usage exceeds the preconfigured
threshold (85%).
Suggested Measures
The BBP CPU of a gNodeB is considered overloaded if either of the following
conditions is met for X days (three days by default) in a week:
● The average BBP control-plane CPU usage (VS.BBUBoard.CPULoad.Mean) or
the average BBP user-plane CPU usage (VS.NRBoard.UPlane.CPULoad.Avg)
reaches or exceeds 60%.
● The percentage of times the BBP control- or user-plane CPU usage reaches or
exceeds the preconfigured threshold is greater than or equal to 5%.
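The two overload conditions above can be evaluated for a single day as in the following Python sketch. It is illustrative only. The function names and sample values are assumptions, and the per-day result still needs to be checked against the "X days in a week" rule. The control-plane percentage is taken directly from VS.BBUBoard.CPULoad.Over, while the user-plane percentage is derived from VS.NRBoard.UPlane.CumulativeHighloadCount as described in Monitoring Methods.

```python
def up_high_load_percentage(high_load_count: int, period_s: float) -> float:
    """Percentage of times the user-plane CPU usage exceeds the threshold."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return high_load_count / period_s * 100.0


def bbp_overloaded(cp_avg: float, up_avg: float,
                   cp_over_pct: float, up_over_pct: float,
                   avg_threshold: float = 60.0,
                   over_threshold: float = 5.0) -> bool:
    """True if either the average-usage or the high-load condition is met."""
    average_high = cp_avg >= avg_threshold or up_avg >= avg_threshold
    often_high = cp_over_pct >= over_threshold or up_over_pct >= over_threshold
    return average_high or often_high


# Hypothetical values for a one-hour measurement period.
up_over = up_high_load_percentage(high_load_count=450, period_s=3600.0)
overloaded = bbp_overloaded(cp_avg=42.0, up_avg=55.0,
                            cp_over_pct=1.0, up_over_pct=up_over)
print(f"User-plane high-load percentage: {up_over:.1f}%")
print("BBP overloaded on this day:", overloaded)
```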
When the BBP CPU usage is high, capacity expansion is recommended. Take the
following measures:
1. Transfer cells from the local gNodeB. If the local gNodeB has multiple BBPs
and one of them is overloaded, move cells from the overloaded BBP to a BBP
with lighter load.
The BBP load can be indicated by the average CPU usage, the percentage of
times the CPU usage reaches or exceeds a preconfigured threshold, or the
number of cells established on a BBP. For details, see Cell Data
Reconfigurations in 5G RAN Reconfiguration Guide.
2. Add a BBP. If the base station has vacant slots, add a BBP and migrate
existing cells to the new BBP for load sharing.
3. Add a base station. Add a base station for capacity expansion if the number
of configured BBP boards has reached the maximum value.
● NSA networking
In NSA networking, signaling is carried on the LTE network. Therefore, the
formula for calculating the RRC connection congestion rate on the LTE
network is used. For details, see eRAN Capacity Monitoring Guide. The RRC
connection congestion rate is calculated using the following formula:
RRC connection congestion rate = L.RRC.SetupFail.ResFail/L.RRC.ConnReq.Att x
100%
where
– L.RRC.SetupFail.ResFail indicates the number of RRC connection setup
failures due to resource allocation failures.
– L.RRC.ConnReq.Att indicates the number of RRC connection setup
requests.
If a KPI deteriorates, analyze the RRC connection congestion rate. If the RRC
connection congestion rate is higher than 0.2%, the KPI deterioration is
caused by limited capacity.
● SA networking
RRC connection congestion rate = N.RRC.SetupReqFail.Rej.NoRsrc/
N.RRC.SetupReq.Att x 100%
where
– N.RRC.SetupReqFail.Rej.NoRsrc indicates the number of RRC setup
rejections due to resource allocation failures in a cell.
– N.RRC.SetupReq.Att indicates the number of RRC connection setup
requests.
● SA_NSA hybrid networking
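The NSA and SA formulas above share the same structure: resource-related setup failures divided by setup attempts. The following Python sketch is illustrative only. The function names and sample values are assumptions. The 0.2% criterion is stated in the text for NSA networking, so applying it to SA networking here is also an assumption. Returning 0 when there are no setup attempts is a choice made for this example.

```python
def rrc_congestion_rate(resource_failures: int, setup_attempts: int) -> float:
    """RRC connection congestion rate in percent."""
    if setup_attempts == 0:
        # No attempts in the period: treat as no congestion (example choice).
        return 0.0
    return resource_failures / setup_attempts * 100.0


def capacity_limited(congestion_rate_pct: float, threshold_pct: float = 0.2) -> bool:
    """True if the congestion rate is higher than the threshold."""
    return congestion_rate_pct > threshold_pct


# NSA: L.RRC.SetupFail.ResFail / L.RRC.ConnReq.Att (hypothetical values).
nsa_rate = rrc_congestion_rate(resource_failures=30, setup_attempts=10000)
# SA: N.RRC.SetupReqFail.Rej.NoRsrc / N.RRC.SetupReq.Att (hypothetical values).
sa_rate = rrc_congestion_rate(resource_failures=10, setup_attempts=10000)

print(f"NSA congestion rate: {nsa_rate:.2f}%, limited: {capacity_limited(nsa_rate)}")
print(f"SA congestion rate: {sa_rate:.2f}%, limited: {capacity_limited(sa_rate)}")
```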
The diagnosis procedure typically begins with the detection of abnormal KPIs,
followed by selecting the top N cells and performing a KPI analysis on these cells.
Cell congestion mainly results from insufficient system resources. Bottlenecks can
be identified by analyzing access-related KPIs (RRC connection congestion rate
and E-RAB congestion rate).