SGSN_Healthcheck_Procedure
OPERATION DIRECTIONS
Disclaimer
The contents of this document are subject to revision without notice due to
continued progress in methodology, design and manufacturing. Ericsson shall
have no liability for any error or damage of any kind resulting from the use
of this document.
Trademark List
All trademarks mentioned herein are the property of their respective owners.
These are shown in the document Trademark Information.
Contents
1 Introduction
1.1 Scope
1.2 Target Group
2 Prerequisites
2.1 User
2.2 Planning
3 Preparation
7 Parameter Description
7.1 alarm_event_history_time
7.2 isp_history_time
7.3 mm_ip_addresses
7.4 mobility_event_history_time
7.5 oss_ip_addresses
7.6 session_event_history_time
8 Option Descriptions
8.1 show_gsm
8.2 show_wcdma
8.3 show_lte
8.4 check_ip_connectivity
8.5 check_cdr_generation
8.6 check_mobility_event
8.7 check_session_event
8.8 check_gras
8.9 check_cells
8.10 check_nses
8.11 check_diameter_peers
8.12 check_rncs
8.13 check_tas
8.14 check_enodebs
8.15 check_ss7
9 Support
1 Introduction
The health check data collection does not cause any traffic disturbances.
1.1 Scope
This document covers the following:
2 Prerequisites
This section describes the prerequisites to perform the health check procedure.
2.1 User
The person performing the health check procedures is required to have a solid
knowledge of and training in the following areas:
2.2 Planning
The automatic health check data collection is always enabled. To make full use
of the health check functionality, perform the following before you start using it:
• The operator must be a user with O&M access and configuration rights.
For more information, see Operator Access Handling.
• Consider value ranges for Data Collection components within Disk Usage,
CPU Load, Subscriber Statistics, Failure Ratio Key Performance Indicators,
and Payload.
• If CDR check is desired, the password for the user account administrator
om_admin is needed.
3 Preparation
The automatic health check data collection is always enabled. To make full use
of the health check functionality, perform the following before you start using it:
Instructions
1. If the SGSN-MME is configured for LTE access only, disable the CDR
generation check by running the health_check.pl -o configure
command and setting the check_cdr_generation parameter to No.
Example
Value ranges are available for all components that require configuration;
see Section 4 on page 4.
3. Configure the allowed ranges individually for each SGSN-MME using the
health_check.pl -a configure command. Consider the following
when choosing ranges:
• Subscriber Statistics
Set the minimum value for allowed ranges to a value below the lowest
expected value during normal operation of the SGSN-MME. For
example, if the SGSN-MME is handling between 200,000 and 300,000
attached subscribers during one week around-the-clock, the minimum
value can be set to 150,000.
Set the maximum value in allowed ranges to a value that is higher than
the highest value expected during normal operation of the SGSN-MME.
One approach is to use the default allowed range for one week, audit the
results after one week of operation, and then choose the allowed value
range. For example, if the failure ratio for PDP Context activation is
0.5%, the allowed range can be set to 0.6%.
• Payload
Set the minimum value for allowed ranges to a value below the lowest
expected value during normal operation of the SGSN-MME. For
example, if the SGSN-MME is handling between 100 and 500 Mbps
for downlink on Iu during one week around-the-clock, the minimum
value can be set to 50.
4. The data collections contain information for all RAT types, but the
information for a specific RAT type can optionally be hidden in the output.
Use the health_check.pl -o configure command to hide the
information for a specific RAT type in the output.
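The guidance on choosing ranges in step 3 can be sketched as a small helper. This is an illustrative sketch only, not part of the health check tool: the function name suggest_allowed_range and the percentage margins are assumptions, and the intermediate sample value is invented for the example. The 25% lower margin simply reproduces the 150,000 figure from the subscriber example.

```python
def suggest_allowed_range(observed, margin_below_pct, margin_above_pct):
    """Suggest (minimum, maximum) bounds around observed integer values.

    Hypothetical helper: the margins are an assumption, not tool defaults.
    """
    if not observed:
        raise ValueError("at least one observed value is required")
    lowest, highest = min(observed), max(observed)
    # Integer arithmetic avoids floating-point rounding surprises.
    minimum = lowest * (100 - margin_below_pct) // 100
    # Ceiling division, so the maximum is never rounded down.
    maximum = -(-highest * (100 + margin_above_pct) // 100)
    return minimum, maximum


# Attached subscribers seen during one week (200,000 to 300,000 as in the
# text; the middle value is illustrative).
weekly_subscribers = [200_000, 260_000, 300_000]
print(suggest_allowed_range(weekly_subscribers, 25, 20))
```

The resulting values would still be entered manually with the health_check.pl -a configure command.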
This section describes the information stored and displayed for each health
check data collection component.
To minimize the CPU load impact and the size of the data collections, the
number of analyzed cleared alarms, events, mobility events, and session events
is limited. The logs contain a clear statement when limits have been reached.
Printout Example
-----------------------------------------
### CPU Load
-----------------------------------------
• Name
• Type
• Hardware
• TMO
The following ISP events are counted during the configured time interval:
• Large restarts
• Small restarts
• NCB failover
• Capsule failures
Printout Example
-----------------------------------------
### ISP information
-----------------------------------------
uptime :39.24days
large_restart :4
small_restart :0
small_local_restart :0
ncb_failover :0
pm_failure :0
capsule_failure :0
-----------------------------------------
• /Core
• /charging
• /logs
• /tmp
• /var
Printout Example
-----------------------------------------
### Disk usage
-----------------------------------------
Note: It is not recommended to increase the values above the default allowed
ranges.
Printout Example
-----------------------------------------
### CPU Load
-----------------------------------------
4.5 Alarms
Alarms show the number of cleared and active alarms per severity. The number
of active and cleared alarms is always checked against the configured allowed
range. The alarms that have been active during a configured time interval are
checked. If the allowed range is set to zero, no alarms are allowed.
Printout Example
-----------------------------------------
### Alarms
-----------------------------------------
4.6 Events
Events show the number of restarts of external nodes.
Restarts are counted during the configured time interval for the following nodes:
• HLR
• HSS
• MSC
• RNC
• eNodeB
Printout Example
-----------------------------------------
### Events
-----------------------------------------
hlr_restart :5
hss_restart :0
gtp_node_restart :0
gs_msc_restart :0
sgs_msc_restart :0
rnc_restart :28
enodeb_reset :5
-----------------------------------------
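When a printout such as the one above is saved for later analysis, its "name :value" lines can be read into a dictionary. The following is a minimal sketch under the assumption that the printout has been captured as text; parse_printout is a hypothetical helper, not a function of the health check tool.

```python
def parse_printout(text):
    """Return {counter_name: value_string} for every 'name :value' line."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, separators, and section headings.
        if not line or line.startswith(("-", "#", "*", "+")):
            continue
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = value.strip()
    return counters


EVENTS_PRINTOUT = """\
-----------------------------------------
### Events
-----------------------------------------
hlr_restart :5
hss_restart :0
gtp_node_restart :0
gs_msc_restart :0
sgs_msc_restart :0
rnc_restart :28
enodeb_reset :5
-----------------------------------------
"""

events = parse_printout(EVENTS_PRINTOUT)
# Keep only the external nodes that restarted at least once.
restarted = {name: int(value) for name, value in events.items() if int(value) > 0}
print(restarted)
```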
Printout Example
-----------------------------------------
### Subscriber statistics
-----------------------------------------
subs_with_ongoing_signalling :5140
subs_no_ongoing_signalling :0
Printout Example
-----------------------------------------
### Failure Ratio Key Performance Indicators (KPIs)
-----------------------------------------
4.9 Payload
Payload shows the payload statistics, in Mbps, for Gb, Iu and Gn interfaces.
The measured payload is always checked against the configured allowed
range. The latest available values from PDC KPI are used as input.
Printout Example
-----------------------------------------
### Payload
-----------------------------------------
4.10 IP Connectivity
IP connectivity performs connectivity checks using ICMP towards external
nodes. The option check_ip_connectivity specifies whether the IP connectivity
check is performed. The check is performed towards the configured DNS and
NTP servers. It can also be performed towards OSS and MM nodes, provided
that their IP addresses are configured with the parameters oss_ip_addresses
and mm_ip_addresses. The IP connectivity check is enabled by default, but
can be disabled.
Printout Example
-----------------------------------------
### IP connectivity check
-----------------------------------------
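Because the OSS and MM addresses must be configured by the operator, it can help to validate candidate addresses before entering them. The sketch below uses the Python standard library ipaddress module; the helper name and the sample addresses (taken from the documentation address ranges) are assumptions, and the exact input format expected by oss_ip_addresses and mm_ip_addresses is not stated in this document.

```python
import ipaddress


def split_valid_addresses(candidates):
    """Separate syntactically valid IPv4/IPv6 addresses from invalid strings."""
    valid, invalid = [], []
    for candidate in candidates:
        try:
            valid.append(str(ipaddress.ip_address(candidate.strip())))
        except ValueError:
            invalid.append(candidate)
    return valid, invalid


# Hypothetical candidate addresses for OSS nodes.
valid, invalid = split_valid_addresses(["192.0.2.10", "2001:db8::1", "192.0.2.300"])
print("valid:", valid)
print("invalid:", invalid)
```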
4.11 Services
The CDR generation check verifies that CDRs are generated and written to a
file in the SGSN-MME. The option check_cdr_generation specifies if CDR
generation check is to be performed or not. The CDR generation check is
enabled by default, but can be disabled.
Printout Example
-----------------------------------------
### Services check
-----------------------------------------
Printout Example
-----------------------------------------
### Mobility Events
-----------------------------------------
gmm_cause_network_failure_cc17 :738
Printout Example
-----------------------------------------
### Session Events
-----------------------------------------
sm_cause_missing_or_unknown_apn_cc27 :1000
Printout Example
-----------------------------------------
### Interface check
-----------------------------------------
****************************************
*** Gb
****************************************
nbr_ras_gsm :28
nbr_operational_cells_gsm :84
nbr_not_operational_cells_gsm :0
nbr_gbip_nses :7
nbr_gbip_operational_remote_ip_eps :21
nbr_gbip_not_operational_remote_ip_eps :0
****************************************
*** Iu-C
****************************************
nbr_operational_rncs :1
nbr_not_operational_rncs :0
****************************************
*** S1-MME
****************************************
nbr_tas_lte :12
nbr_operational_enodebs :15
nbr_not_operational_enodebs :0
****************************************
*** Diameter Peers
****************************************
nbr_operational_diameter_peers :7
nbr_not_operational_diameter_peers :0
****************************************
*** SS7
****************************************
nbr_operational_mtpl3_links :0
nbr_not_operational_mtpl3_links :0
nbr_operational_m3ua_assoc :2
nbr_not_operational_m3ua_assoc :0
****************************************
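The interface check printout groups its counters under "*** <interface>" headings. A minimal sketch of how a saved printout could be scanned for non-zero nbr_not_operational_* counters follows. The helper name is hypothetical, and the sample deliberately contains one non-zero counter (an invented value) to show a finding.

```python
def find_not_operational(text):
    """Return {section: {counter: count}} for non-zero nbr_not_operational_* counters."""
    problems = {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        # Section headings look like '*** Gb'; rows of asterisks are separators.
        if line.startswith("*** "):
            section = line[4:].strip()
            continue
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.startswith("nbr_not_operational_") and int(value) > 0:
            problems.setdefault(section, {})[name] = int(value)
    return problems


# Illustrative printout; the value 3 is invented for this example.
SAMPLE = """\
****************************************
*** Gb
****************************************
nbr_operational_cells_gsm :81
nbr_not_operational_cells_gsm :3
****************************************
*** S1-MME
****************************************
nbr_operational_enodebs :15
nbr_not_operational_enodebs :0
****************************************
"""

print(find_not_operational(SAMPLE))
```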
4.16 Verdict
Verdict presents the number of passed and failed checks.
Printout Example
+++++++++++++++++++++++++++++++++++++++++
+++ Verdict
+++++++++++++++++++++++++++++++++++++++++
nbr_passed 39
nbr_failed 5
+++++++++++++++++++++++++++++++++++++++++
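The verdict lines use a space-separated "name value" layout. As a rough sketch, a saved verdict could be summarized as follows; read_verdict is a hypothetical helper and the summary wording is an assumption.

```python
def read_verdict(text):
    """Return {'nbr_passed': int, 'nbr_failed': int} from a verdict printout."""
    verdict = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("nbr_passed", "nbr_failed"):
            verdict[parts[0]] = int(parts[1])
    return verdict


VERDICT_PRINTOUT = """\
+++++++++++++++++++++++++++++++++++++++++
+++ Verdict
+++++++++++++++++++++++++++++++++++++++++
nbr_passed 39
nbr_failed 5
+++++++++++++++++++++++++++++++++++++++++
"""

verdict = read_verdict(VERDICT_PRINTOUT)
total = verdict["nbr_passed"] + verdict["nbr_failed"]
summary = f"{verdict['nbr_failed']} of {total} checks failed"
print(summary)
```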
Note: Create a reference health check data collection when the SGSN-MME
and network are working perfectly.
Instructions
Example
health_check.pl -c post_check
Example
health_check.pl -s 60
If the disk usage is above the allowed range, free disk space by deleting
all unnecessary data on the SGSN-MME, such as software configurations
and expired logs. Backups take longer if the core partition has a high
disk usage level.
If the CPU load is above the allowed range, deploy additional capacity in
the network. For more information, see Characteristics.
If there are cleared alarms, check the following files to find out which alarms
have been activated and cleared since the last collection occasion:
• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/alarms_history.txt
• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/alarms_intensity_log.txt
• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/events.txt
• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/events_intensity_log.txt
If the expected subscriber statistics values are not met, analyze the
following logs. The logs are found in
/tmp/DPE_COMMONLOG/health_check/data/<data_collection>.
• mobility_event_details_log.txt
• mobility_event_intensity_log.txt
• session_event_details_log.txt
• session_event_intensity_log.txt
• olp_log.txt
• mm_log.txt
• sm_log.txt
• ss7_log.txt
• not_operational_cells_gsm.txt
• not_operational_remote_ip_eps.txt
• not_operational_rncs.txt
• not_operational_mtpl3_links.txt
• not_operational_m3ua_assoc.txt
• not_operational_enodebs.txt
If the expected KPI values are not met, analyze the following logs. The
logs are found in
/tmp/DPE_COMMONLOG/health_check/data/<data_collection>.
• mobility_event_details_log.txt
• mobility_event_intensity_log.txt
• session_event_details_log.txt
• session_event_intensity_log.txt
• olp_log.txt
• mm_log.txt
• sm_log.txt
• ss7_log.txt
If the expected payload values are not met, analyze the following logs. The
logs are found in
/tmp/DPE_COMMONLOG/health_check/data/<data_collection>.
• packet_loss_log.txt
• not_operational_cells_gsm.txt
• not_operational_remote_ip_eps.txt
• not_operational_rncs.txt
• not_operational_mtpl3_links.txt
• not_operational_m3ua_assoc.txt
Contact your local Ericsson support if CDR generation does not work, see
Section 9 on page 34.
Note: The verdict from the CDR generation check is Failed if there is
no traffic causing CDR closure.
• not_operational_cells_gsm.txt
• not_operational_remote_ip_eps.txt
• not_operational_rncs.txt
• not_operational_mtpl3_links.txt
• not_operational_m3ua_assoc.txt
• not_operational_enodebs.txt
Note: The health check only checks SCTP associations used by M3UA. If
needed, use the sctp_status command to check other SCTP
associations.
The automatic data collections are given numbers from 1 through 60. After 10
days, the oldest data collection is overwritten.
To list the stored health check data collections, use the command
health_check.pl -l. A list of available health check data collections is
displayed on the screen.
Example
health_check.pl -c post_check
To calculate differences between two health check data collections, use the
command health_check.pl -d <data_collection_number1,data_collection_number2>.
The result of the comparison is displayed on the screen, showing the data for
each collection in the first two columns and the difference between the two
collections in the third column.
Example
health_check.pl -d 79,78
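The three-column comparison described above can be illustrated with a small sketch. It does not reproduce the actual output of health_check.pl -d: the helper name, the counter values, and the direction of the difference (first collection minus second) are all assumptions made for illustration.

```python
def diff_collections(first, second):
    """Return (counter, first_value, second_value, difference) rows.

    Only counters present in both collections are compared; the difference
    is computed as first minus second (an assumption, not tool behavior).
    """
    rows = []
    for name in sorted(first.keys() & second.keys()):
        rows.append((name, first[name], second[name], first[name] - second[name]))
    return rows


# Hypothetical counter values for two data collections.
collection_a = {"hlr_restart": 5, "rnc_restart": 28}
collection_b = {"hlr_restart": 3, "rnc_restart": 28}

rows = diff_collections(collection_a, collection_b)
for name, a, b, delta in rows:
    print(f"{name:<16}{a:>6}{b:>6}{delta:>+6}")
```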
Example
health_check.pl -s 57
health_check.pl -s 60,59,68,57,56
Example
health_check.pl -r 61,62,63
To reset the configured allowed ranges to default values, use the command
health_check.pl -a reset.
The following values are accepted when configuring the allowed ranges:
x2_handover_lte 0.1
s1_handover_lte 0.1
paging_lte 0.1
bearer_establishment_lte 0.1
service_request_lte 0.1
intra_mme_tau_lte 0.1
inter_mme_tau_lte 0.1
intra_isc_tau_lte 0.1
inter_isc_tau_lte 0.1
irat_ho_mme_source_lte 0.1
irat_ho_mme_target_lte 0.1
### Payload
uplink_mbps_gb 0-10000
downlink_mbps_gb 0-10000
uplink_mbps_iu 0-10000
downlink_mbps_iu 0-10000
uplink_mbps_gn 0-10000
downlink_mbps_gn 0-10000
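To make the "name low-high" range notation concrete, the sketch below parses the payload ranges listed above and tests a measured value against them. The helper names are hypothetical, the measured values are invented, and treating both bounds as inclusive is an assumption; the document does not state how the tool handles boundary values.

```python
def parse_allowed_ranges(text):
    """Return {name: (low, high)} for every 'name low-high' line."""
    ranges = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and "-" in parts[1]:
            low, _, high = parts[1].partition("-")
            ranges[parts[0]] = (float(low), float(high))
    return ranges


def in_allowed_range(ranges, name, measured):
    """Check a measured value against its range (bounds assumed inclusive)."""
    low, high = ranges[name]
    return low <= measured <= high


PAYLOAD_RANGES = """\
uplink_mbps_gb 0-10000
downlink_mbps_gb 0-10000
uplink_mbps_iu 0-10000
downlink_mbps_iu 0-10000
uplink_mbps_gn 0-10000
downlink_mbps_gn 0-10000
"""

ranges = parse_allowed_ranges(PAYLOAD_RANGES)
print(in_allowed_range(ranges, "downlink_mbps_iu", 350))    # measured value is illustrative
print(in_allowed_range(ranges, "downlink_mbps_iu", 12000))  # above the listed maximum
```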
Example 1
The following values are accepted when configuring the options: yes|no.
6.8.1 Options
The option show_gsm specifies if GSM-related information is to be included in
the health check printout or not.
6.9.1 Parameters
The parameter alarm_event_history_time specifies the measurement
time interval for alarms and events.
7 Parameter Description
This section describes the parameters used by the health check tool.
Valid for Specifies the radio network access type for which the
parameter is valid.
Related Commands
Lists the UNIX commands that can be used to display
and manage the parameter.
7.1 alarm_event_history_time
The parameter alarm_event_history_time specifies the measurement
time interval for alarms and events. At automatic execution, the parameter is
always set to 5 hours.
Default Value 24
Activation Runtime
Related Commands
health_check.pl -p configure|list|reset
7.2 isp_history_time
The parameter isp_history_time specifies the measurement time interval
for ISP events. At automatic execution, the parameter is always set to 720
hours.
Activation Runtime
Related Commands
health_check.pl -p configure|list|reset
7.3 mm_ip_addresses
The parameter mm_ip_addresses specifies the IP addresses of the
Multi-Mediation nodes.
Activation Runtime
Related Commands
health_check.pl -p configure|list|reset
7.4 mobility_event_history_time
The parameter mobility_event_history_time specifies the measurement
time interval for mobility events. At automatic execution, the parameter is
always set to 5 hours.
Default Value 24
Activation Runtime
Related Commands
health_check.pl -p configure|list|reset
7.5 oss_ip_addresses
The parameter oss_ip_addresses specifies the IP addresses of the OSS
nodes.
Activation Runtime
Related Commands
health_check.pl -p configure|list|reset
7.6 session_event_history_time
The parameter session_event_history_time specifies the measurement
time interval for session events. At automatic execution, the parameter is
always set to 5 hours.
Default Value 24
Activation Runtime
Related Commands
health_check.pl -p configure|list|reset
8 Option Descriptions
This section describes the options used by the health check tool.
A general description is presented for each option, and in addition, the following
attributes are described:
Valid for Specifies the radio network access type for which the
option is valid.
Related Commands
Lists the UNIX commands that can be used to display
and manage the option.
8.1 show_gsm
The option show_gsm specifies if GSM-related information is to be included in
the health check printout or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.2 show_wcdma
The option show_wcdma specifies if WCDMA-related information is to be
included in the health check printout or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.3 show_lte
The option show_lte specifies if LTE-related information is to be included in
the health check printout or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.4 check_ip_connectivity
The option check_ip_connectivity specifies if IP connectivity check is to
be performed or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.5 check_cdr_generation
The option check_cdr_generation specifies if CDR generation check is to
be performed or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.6 check_mobility_event
The option check_mobility_event specifies if analysis of the mobility event
logs is to be performed or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.7 check_session_event
The option check_session_event specifies if analysis of session event
logs is to be performed or not.
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.8 check_gras
The option check_gras specifies if the GSM RA status check is to be
performed or not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.9 check_cells
The option check_cells specifies if the GSM cell status check is to be
performed or not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.10 check_nses
The option check_nses specifies if the GSM NSE status check is to be
performed or not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.11 check_diameter_peers
The option check_diameter_peers specifies if the diameter peer status
check is to be performed or not. At automatic execution, the check is never
included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.12 check_rncs
The option check_rncs specifies if the RNC status check is to be performed
or not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.13 check_tas
The option check_tas specifies if the TA status check is to be performed or
not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.14 check_enodebs
The option check_enodebs specifies if the eNodeB status check is to be
performed or not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
8.15 check_ss7
The option check_ss7 specifies if the SS7 status check is to be performed or
not. At automatic execution, the check is never included.
Default Value no
Activation Runtime
Related Commands
health_check.pl -o configure|list|reset
9 Support