
SGSN_Healthcheck_Procedure


Health Check

OPERATION DIRECTIONS

10/1543-AXB 250 05/8-V2 Uen BB


Copyright

© Ericsson AB 2008-2017. All rights reserved. No part of this document may be
reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to
continued progress in methodology, design and manufacturing. Ericsson shall
have no liability for any error or damage of any kind resulting from the use
of this document.

Trademark List

All trademarks mentioned herein are the property of their respective owners.
These are shown in the document Trademark Information.

10/1543-AXB 250 05/8-V2 Uen BB | 2017-03-30


Contents

1 Introduction
1.1 Scope
1.2 Target Group

2 Prerequisites
2.1 User
2.2 Planning

3 Preparation

4 Health Check Data Collection Components
4.1 Node Information
4.2 ISP Information
4.3 Disk Usage
4.4 CPU Load
4.5 Alarms
4.6 Events
4.7 Subscriber Statistics
4.8 Failure Ratio Key Performance Indicators
4.9 Payload
4.10 IP Connectivity
4.11 Services
4.12 Mobility Events
4.13 Session Events
4.14 Interface Status
4.15 Additional Logs
4.16 Verdict

5 Health Check Procedure

6 Health Check Operations
6.1 View Information on Command Use
6.2 List Available Health Check Data Collections
6.3 Trigger Manual Health Check Data Collection
6.4 Calculate Differences between Two Health Check Data Collections
6.5 Display Single or Multiple Health Check Data Collections
6.6 Remove Health Check Data Collections
6.7 Configure Allowed Ranges
6.8 Configure Options
6.8.1 Options
6.9 Configure Parameters
6.9.1 Parameters

7 Parameter Description
7.1 alarm_event_history_time
7.2 isp_history_time
7.3 mm_ip_addresses
7.4 mobility_event_history_time
7.5 oss_ip_addresses
7.6 session_event_history_time

8 Option Descriptions
8.1 show_gsm
8.2 show_wcdma
8.3 show_lte
8.4 check_ip_connectivity
8.5 check_cdr_generation
8.6 check_mobility_event
8.7 check_session_event
8.8 check_gras
8.9 check_cells
8.10 check_nses
8.11 check_diameter_peers
8.12 check_rncs
8.13 check_tas
8.14 check_enodebs
8.15 check_ss7

9 Support

1 Introduction

This document describes the procedure for performing a health check of an
SGSN-MME.

A health check is performed to verify that no degradations have been
introduced into the network after procedures such as reconfiguration, software
updates, and software upgrades. A health check can also be performed during
emergencies to quantify the problems in the network. When changes are made
in the network, data used for verification must be collected manually, both
before and after the change.

To support a health check, the SGSN-MME automatically stores a collection
of relevant data every four hours. This configurable data collection gives
information on several components, such as CPU load, subscriber statistics,
and mobility events, that can be used to determine the health status of
the SGSN-MME. To facilitate the health check further, the SGSN-MME
also compares the collected data with configured allowed ranges and gives
automatic verdicts.

The SGSN-MME provides several commands for manually triggering data
collections, for managing and comparing collections, and for configuring
the contents of the data collections and the allowed value ranges.

The health check data collection does not cause any traffic disturbances.

1.1 Scope
This document covers the following:

• Prerequisites for the health check procedure

• Instructions regarding use of the available health check commands

• Description of the health check data collection components

• Description of the procedure used when performing a health check

1.2 Target Group


This document is intended for personnel performing health checks on the
SGSN-MME.


2 Prerequisites

This section describes the prerequisites for performing the health check
procedure.

2.1 User
The person performing the health check procedures is required to have a solid
knowledge of and training in the following areas:

• Operation of the SGSN-MME

• The current configuration of the SGSN-MME and the network.

2.2 Planning
Before performing a health check, make sure that the following conditions
are met:

• The operator must be a user with O&M access and configuration rights.
For more information, see Operator Access Handling.

• The O&M IP address, or the corresponding DNS name of the SGSN-MME
for CLI access is available.

Consider the following when planning the configuration:

• Decide on allowed value ranges for the data collection components Disk Usage,
CPU Load, Subscriber Statistics, Failure Ratio Key Performance Indicators,
and Payload.

• If the CDR generation check is to be used, the password for the administrator
user account om_admin is needed.

3 Preparation

The automatic health check data collection is always enabled. To make full use
of the health check functionality, perform the following before you start using it:


Instructions

1. If the SGSN-MME is configured for LTE access only, disable the CDR
generation check by running the health_check.pl -o configure
command and setting the check_cdr_generation option to No.

2. If the CDR generation check is enabled, assign the user executing
the health check data collection to the charging group using the
set_sm_group CLI command. This also applies to the pdc_user user
that executes the automatic data collections.

Example

gsh set_sm_group -ui user -g charging

Value ranges are available for all components that require configuration,
see Section 4.

3. Configure the allowed ranges individually for each SGSN-MME using the
health_check.pl -a configure command. Consider the following
when choosing ranges:

• Subscriber Statistics

Set the minimum value for allowed ranges to a value below the lowest
expected value during normal operation of the SGSN-MME. For
example, if the SGSN-MME handles between 200,000 and 300,000
attached subscribers around the clock during one week, the minimum
value can be set to 150,000.

• Failure Ratio KPIs

Set the maximum value in allowed ranges to a value that is higher than
the highest value expected during normal operation of the SGSN-MME. One
approach is to use the default allowed range for one week, audit the
results after that week of operation, and then choose the allowed value
range. For example, if the failure ratio for PDP Context activation is
0.5%, the maximum value can be set to 0.6%.

• Payload

Set the minimum value for allowed ranges to a value below the lowest
expected value during normal operation of the SGSN-MME. For
example, if the SGSN-MME handles between 100 and 500 Mbps
for downlink on Iu around the clock during one week, the minimum
value can be set to 50 Mbps.

4. The data collections contain information for all RAT types, but the
information for a specific RAT type can optionally be hidden in the output.
Use the health_check.pl -o configure command to hide the
information for a specific RAT type in the output.


5. The IP connectivity component is optional and requires further configuration.
Use the health_check.pl -o configure command to configure the
IP connectivity component.

6. The interface component is disabled by default, because it is
time-consuming for large configurations. Enable the interface component
for manual data collections by using the health_check.pl -o
configure command.

7. Configuring measurement time intervals is normally not needed. If
necessary, configure them using the health_check.pl -p configure
command.
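
The range-selection guidance in step 3 can be illustrated with a short Python
sketch. It is only an illustration: the function names and the relative
margins are assumptions, with the margins chosen here so that the results
match the examples given in step 3. The document itself does not prescribe
a margin.

```python
def suggest_minimum(observed, margin):
    """Suggest a lower limit below the lowest observed value.

    'margin' is a relative safety margin (an assumption, not a value
    taken from the SGSN-MME documentation).
    """
    return min(observed) * (1 - margin)


def suggest_maximum(observed, margin):
    """Suggest an upper limit above the highest observed value."""
    return max(observed) * (1 + margin)


# Attached subscribers observed during one week: 200,000 to 300,000.
subscriber_minimum = suggest_minimum([200_000, 300_000], margin=0.25)

# Downlink payload on Iu observed during one week: 100 to 500 Mbps.
payload_minimum = suggest_minimum([100, 500], margin=0.5)

# PDP Context activation failure ratio observed during one week: 0.5%.
failure_ratio_maximum = suggest_maximum([0.5], margin=0.2)
```

The resulting values still have to be entered manually with the
health_check.pl -a configure command.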

4 Health Check Data Collection Components

This section describes the information stored and displayed for each health
check data collection component.

To minimize the CPU load impact and the size of the data collections, the
number of analyzed cleared alarms, events, mobility events, and session events
is limited. The logs contain a clear statement when limits have been reached.

An automatic verdict is given for parts of the collected data. The collected
value is compared to the configured allowed value range, resulting in the
verdict Failed if the value is outside the allowed range. This is also stated
clearly in the printout.

The automatic verdicts do not trigger any alarms in the SGSN-MME.

Printout Example

The following is an example of a printout containing a value outside the allowed
range, that is, with the verdict Failed:

-----------------------------------------
### CPU Load
-----------------------------------------

load_ncb_1.19 :0 (Allowed range : <= 80)
load_ncb_1.20 :3 (Allowed range : <= 80)
load_ap :80.50 (Allowed range : <= 80) Failed
load_dp :51.00 (Allowed range : <= 80)
load_ss7_dp :32.00 (Allowed range : <= 80)
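
For post-processing of saved printouts, the comparison against the allowed
range can be reproduced with a short Python sketch. This is not the
implementation used by health_check.pl. The line pattern and the supported
range notations (<= N, x = N, LOW - HIGH, and ok) are assumptions derived
only from the printout examples in this document, and all function names are
hypothetical.

```python
import re

# One printout line, for example:
#   load_ap :80.50 (Allowed range : <= 80) Failed
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s*:(?P<value>\S+)"
    r"(?:\s+\(Allowed range\s*:\s*(?P<range>[^)]*)\))?"
    r"(?:\s+(?P<verdict>Failed))?\s*$"
)


def to_number(text):
    """Convert values such as '80.50', '50%' or '0.0%' to a float."""
    match = re.match(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"not numeric: {text!r}")
    return float(match.group())


def within_allowed(value, spec):
    """Return True if the value satisfies the allowed-range string."""
    spec = spec.strip()
    if spec.startswith("<="):
        return to_number(value) <= float(spec[2:])
    if spec.startswith("x ="):
        return to_number(value) == float(spec[3:])
    bounds = re.fullmatch(r"(\S+)\s+-\s+(\S+)", spec)
    if bounds:
        low, high = float(bounds.group(1)), float(bounds.group(2))
        return low <= to_number(value) <= high
    # Non-numeric ranges such as "ok" are compared as plain text.
    return value == spec


def parse_line(line):
    """Parse one printout line; return None for separators and headings."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    value = match.group("value")
    allowed = match.group("range")
    in_range = None if allowed is None else within_allowed(value, allowed)
    return {
        "name": match.group("name"),
        "value": value,
        "allowed": None if allowed is None else allowed.strip(),
        "printed_failed": match.group("verdict") == "Failed",
        "in_range": in_range,
    }


# The failing line from the printout example above.
result = parse_line("load_ap :80.50 (Allowed range : <= 80) Failed")
```

A line whose in_range value disagrees with printed_failed indicates either an
unsupported range notation or a printout that was edited after collection.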


4.1 Node Information


The following node information is always collected:

• Name

• Type

• Hardware

• TMO

• Software level (SWL)

4.2 ISP Information


The SGSN-MME uptime is always calculated and stated in the ISP information.

The following ISP events are counted during the configured time interval:

• Large restarts

• Small restarts

• Small local restarts

• NCB failover

• Processing Module failures

• Capsule failures

The parameter isp_history_time specifies the measurement time interval for
ISP events.

Printout Example

The following is an example of an ISP information printout:

-----------------------------------------
### ISP information
-----------------------------------------

uptime :39.24days
large_restart :4
small_restart :0
small_local_restart :0
ncb_failover :0
pm_failure :0
capsule_failure :0
-----------------------------------------
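
When printouts are saved to a file, the ISP counters can be screened with a
few lines of Python. The sketch below is an illustration only. It assumes the
name :value layout shown in the printout example above, and the function
names are hypothetical.

```python
def parse_counters(printout):
    """Parse 'name :value' lines of a printout section into a dict of strings.

    Separator and heading lines contain no colon and are skipped.
    """
    counters = {}
    for line in printout.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            counters[name.strip()] = value.strip()
    return counters


def isp_events(counters):
    """Return the ISP event counters that are non-zero.

    The uptime entry is a duration, not an event counter, so it is skipped.
    """
    return {
        name: int(value)
        for name, value in counters.items()
        if name != "uptime" and int(value) != 0
    }


sample = """\
### ISP information
uptime :39.24days
large_restart :4
small_restart :0
small_local_restart :0
ncb_failover :0
pm_failure :0
capsule_failure :0
"""

events = isp_events(parse_counters(sample))
```

Any counter returned by isp_events corresponds to a restart or failure that
calls for the follow-up described in step 2 of the health check procedure.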


4.3 Disk Usage


Disk usage shows the current storage level for different partitions.

The disk usage is always checked for the following partitions:

• /Core

• /charging

• /logs

• /tmp

• /var

Disk usage is checked against the configured allowed range.

Note: It is not recommended to increase the values above the default allowed
ranges.

Printout Example

The following is an example of a disk usage printout:

-----------------------------------------
### Disk usage
-----------------------------------------

core :50% (Allowed range : <= 60)
logs :8% (Allowed range : <= 80)
tmp :7% (Allowed range : <= 10)
-----------------------------------------

4.4 CPU Load


CPU load shows the current CPU usage for different types of processing
roles. The CPU load is always checked against the configured allowed
range. It is presented as the average CPU load over all instances executing
a specific processing role, except for the NCBs. The current values of the
gauge SYS.gsnCpuUsage are used for the calculation, together with
SYS.gsnSs7SctpDpCpuUsage, SYS.gsnPayloadDpCpuUsage, and SYS.gsnApCpuUsage.

Note: It is not recommended to increase the values above the default allowed
ranges.

Printout Example

The following is an example of a CPU load printout:

-----------------------------------------
### CPU Load
-----------------------------------------

load_ncb_1.19 :0 (Allowed range : <= 70)
load_ncb_1.20 :3 (Allowed range : <= 70)
load_ap :76.50 (Allowed range : <= 70)
load_dp :52.80 (Allowed range : <= 70)
load_ss7_dp :25.00 (Allowed range : <= 70)
-----------------------------------------

4.5 Alarms
Alarms show the number of cleared and active alarms per severity. The number
of active and cleared alarms is always checked against the configured allowed
range. The alarms that have been active during a configured time interval are
checked. If the allowed range is set to zero, no alarms are allowed.

The parameter alarm_event_history_time specifies the measurement time
interval for alarms.

Printout Example

The following is an example of an alarm printout:

-----------------------------------------
### Alarms
-----------------------------------------

alarms_active_critical :6 (Allowed range : x = 0) Failed
alarms_active_major :97 (Allowed range : x = 0) Failed
alarms_active_minor :0
alarms_active_warning :0
alarms_cleared_critical :0 (Allowed range : x = 0)
alarms_cleared_major :1 (Allowed range : x = 0) Failed
alarms_cleared_minor :1
alarms_cleared_warning :0

4.6 Events
Events show the number of restarts of external nodes.

Restarts are counted during the configured time interval for the following nodes:

• HLR

• HSS

• Node handling GTP

• MSC

• RNC

• eNodeB

The parameter alarm_event_history_time specifies the measurement time
interval for events. The restart and its corresponding event are shown in
Table 1.

Table 1 Restarts and Corresponding Events

Restart             Event
hlr_restart         mapHLRrestarted
hss_restart         hssRestarted
gtp_node_restart    gtpGSNrestarted
gs_msc_restart      gsMscVlrRestarted
sgs_msc_restart     sgsMscVlrRestarted
rnc_restart         ranRncRestarted
enodeb_reset        eNodeBUeConnectionsRemoved

Printout Example

The following is an example of an event information printout:

-----------------------------------------
### Events
-----------------------------------------

hlr_restart :5
hss_restart :0
gtp_node_restart :0
gs_msc_restart :0
sgs_msc_restart :0
rnc_restart :28
enodeb_reset :5
-----------------------------------------

4.7 Subscriber Statistics


Subscriber Statistics show, for example, the number of attached subscribers,
the number of active PDP contexts, and the number of active bearers. The
subscriber statistic gauges are always checked against the configured allowed
range. The current values of the performance monitoring gauges are used
as input. Each subscriber statistic name and its corresponding performance
monitoring gauge are shown in Table 2.


Table 2 Subscriber Statistics Name and Performance Monitoring Gauge

Subscriber Statistic Name       Performance Monitoring Gauge
sau_gsm                         MM.NbrActAttachedSub.G
sds_gsm                         SM.NbrActivePdpPerSgsn.G
pdp_gsm                         SM.NbrActPdpContext.G
ready_state_gsm                 nbrOfSubReady
sau_wcdma                       MM.NbrActAttachedSub.U
sds_wcdma                       SM.NbrActivePdpPerSgsn.U
pdp_wcdma                       SM.NbrActPdpContext.U
pmm_connected_wcdma             MM.NbrSubPmmConnected
sau_lte                         VS.MM.NbrActAttachedSub.E
sau_sgs_lte                     VS.MM.NbrCsAttachedSub.E
sau_s102_lte                    VS.CDMA.NbrRegisteredUE.E
active_bearer_lte               VS.SM.NbrActBearer.E
active_pdn_lte                  VS.SM.NbrActDefaultBearer.E
ecm_connected_lte               VS.MM.NbrEcmConnectedSub.E
subs_with_ongoing_signalling    subscribersInTransitionalState
subs_no_ongoing_signalling      gsnSessionResilienceNotInitializedConnections

Printout Example

The following is an example of a subscriber statistics printout:

-----------------------------------------
### Subscriber statistics
-----------------------------------------

sau_gsm :229477 (Allowed range : 0 - 3000000)
sds_gsm :138623
pdp_gsm :167042 (Allowed range : 0 - 3000000)
ready_state_gsm :23935
sau_wcdma :241039 (Allowed range : 0 - 3000000)
sds_wcdma :157360
pdp_wcdma :193648 (Allowed range : 0 - 3000000)
pmm_connected_wcdma :27911
sau_lte :268554 (Allowed range : 0 - 3000000)
sau_sgs_lte :110412 (Allowed range : 0 - 3000000)
sau_s102_lte :0 (Allowed range : 0 - 3000000)
active_bearer_lte :320843 (Allowed range : 0 - 5000000)
active_pdn_lte :277737 (Allowed range : 0 - 5000000)
ecm_connected_lte :162012
subs_with_ongoing_signalling :5140
subs_no_ongoing_signalling :0

4.8 Failure Ratio Key Performance Indicators


Failure Ratio KPIs show the failure ratios for signalling procedures. For
definitions, see GSM KPIs, WCDMA KPIs, and LTE KPIs. The Failure Ratio
KPIs are always checked against the configured allowed range. The latest
available values from PDC KPI are used as input.

Printout Example

The following is an example of a Failure Ratio KPI information printout:

-----------------------------------------
### Failure Ratio Key Performance Indicators (KPIs)
-----------------------------------------

attach_gsm :0.0% (Allowed range : <= 0.1)
pdp_activation_gsm :0.0% (Allowed range : <= 0.1)
intra_rau_gsm :0.0% (Allowed range : <= 0.1)
israu_gsm :0.0% (Allowed range : <= 0.1)
paging_gsm :0.0% (Allowed range : <= 0.1)
pdp_cut_off_gsm :0.0% (Allowed range : <= 0.1)
attach_wcdma :0.0% (Allowed range : <= 0.1)
pdp_activation_wcdma :0.0% (Allowed range : <= 0.1)
intra_rau_wcdma :0.0% (Allowed range : <= 0.1)
israu_wcdma :0.1% (Allowed range : <= 0.1)
paging_wcdma :0.0% (Allowed range : <= 0.1)
pdp_cut_off_wcdma :0.0% (Allowed range : <= 0.1)
rab_establishment_wcdma :0.0% (Allowed range : <= 0.1)
service_request_wcdma :0.0% (Allowed range : <= 0.1)
attach_lte :0.0% (Allowed range : <= 0.1)
intra_tau_lte :0.0% (Allowed range : <= 0.1)
inter_mme_tau_lte :0.0% (Allowed range : <= 0.1)
x2_handover_lte :0.0% (Allowed range : <= 0.1)
s1_handover_lte :0.0% (Allowed range : <= 0.1)
paging_lte :0.0% (Allowed range : <= 0.1)
bearer_establishment_lte :0.0% (Allowed range : <= 0.1)
service_request_lte :0.0% (Allowed range : <= 0.1)

4.9 Payload
Payload shows the payload statistics, in Mbps, for Gb, Iu and Gn interfaces.
The measured payload is always checked against the configured allowed
range. The latest available values from PDC KPI are used as input.

Printout Example

The following is an example of a payload information printout:

-----------------------------------------
### Payload
-----------------------------------------

uplink_mbps_gb :0.978 (Allowed range : 0 - 10000)
downlink_mbps_gb :3.623 (Allowed range : 0 - 10000)
uplink_mbps_iu :3.762 (Allowed range : 0 - 10000)
downlink_mbps_iu :22.688 (Allowed range : 0 - 10000)
uplink_mbps_gn :4.740 (Allowed range : 0 - 10000)
downlink_mbps_gn :26.311 (Allowed range : 0 - 10000)

4.10 IP Connectivity
IP connectivity performs connectivity checks using ICMP towards external
nodes. The option check_ip_connectivity specifies if the IP connectivity check
is to be performed or not. The check is performed towards the configured
DNS and NTP servers. It can also be performed towards OSS and MM nodes,
provided that their IP addresses are configured with the parameters
oss_ip_addresses and mm_ip_addresses. The IP connectivity check is enabled
by default, but can be disabled.

Note: The IP address connected to the IP service CDR-FTP is used when
communicating with MM nodes.

Printout Example

The following is an example of an IP connectivity check printout:

-----------------------------------------
### IP connectivity check
-----------------------------------------

dns_server_10.10.104.49 :ok (Allowed range : ok)
dns_server_10.10.104.51 :ok (Allowed range : ok)
ntp_server_10.35.250.184 :ok (Allowed range : ok)
oss_10.35.250.184 :ok (Allowed range : ok)
mm_10.10.104.49 :ok (Allowed range : ok)
mm_10.10.104.51 :ok (Allowed range : ok)

4.11 Services
The CDR generation check verifies that CDRs are generated and written to a
file in the SGSN-MME. The option check_cdr_generation specifies if CDR
generation check is to be performed or not. The CDR generation check is
enabled by default, but can be disabled.


Note: The check fails if there is no traffic causing CDR closure.

Printout Example

The following is an example of a services check printout:

-----------------------------------------
### Services check
-----------------------------------------

cdr_generation :ok (Allowed range : ok)

4.12 Mobility Events


The mobility event log files are analyzed during a configured time interval.
The option check_mobility_event specifies if analysis of the mobility event
logs is to be performed or not. The mobility event check is enabled by default,
but can be disabled.

The parameter mobility_event_history_time specifies the measurement time
interval for mobility events.

Printout Example

The following is an example of a mobility event printout:

-----------------------------------------
### Mobility Events
-----------------------------------------

gmm_cause_network_failure_cc17 :738

4.13 Session Events


The session event log files are analyzed during a configured time interval.
The option check_session_event specifies if analysis of session event logs is
to be performed or not. The session event check is enabled by default, but
can be disabled.

The parameter session_event_history_time specifies the measurement time
interval for session events.

Printout Example

The following is an example of a session event printout:

-----------------------------------------
### Session Events
-----------------------------------------

sm_cause_missing_or_unknown_apn_cc27 :1000

4.14 Interface Status


The status of different interfaces is checked if the interface status check is
enabled. The following options are available for each interface status check:

• The option check_gras specifies if the GSM RA status check is to be
performed or not.

• The option check_cells specifies if the GSM cell status check is to be
performed or not.

• The option check_nses specifies if the GSM NSE status check is to be
performed or not.

• The option check_rncs specifies if the RNC status check is to be performed
or not.

• The option check_tas specifies if the TA status check is to be performed or
not.

• The option check_enodebs specifies if the eNodeB status check is to be
performed or not.

• The option check_ss7 specifies if the SS7 status check is to be performed
or not.

• The option check_diameter_peers specifies if the diameter peer status
check is to be performed or not.

The check is disabled by default because the procedure is time-consuming for
large configurations. The interface status check is always disabled when the
health check is executed automatically.

Printout Example

The following is an example of an interface status printout:

-----------------------------------------
### Interface check
-----------------------------------------

****************************************
*** Gb
****************************************
nbr_ras_gsm :28
nbr_operational_cells_gsm :84
nbr_not_operational_cells_gsm :0
nbr_gbip_nses :7
nbr_gbip_operational_remote_ip_eps :21
nbr_gbip_not_operational_remote_ip_eps :0

****************************************
*** Iu-C
****************************************
nbr_operational_rncs :1
nbr_not_operational_rncs :0
****************************************
*** S1-MME
****************************************
nbr_tas_lte :12
nbr_operational_enodebs :15
nbr_not_operational_enodebs :0
****************************************
*** Diameter Peers
****************************************
nbr_operational_diameter_peers :7
nbr_not_operational_diameter_peers :0
****************************************
*** SS7
****************************************
nbr_operational_mtpl3_links :0
nbr_not_operational_mtpl3_links :0
nbr_operational_m3ua_assoc :2
nbr_not_operational_m3ua_assoc :0
****************************************

4.15 Additional Logs


Additional logs with detailed information are created. The logs are stored per
data collection in one of the following directories:

• Automatic health check:
/tmp/DPE_COMMONLOG/health_check/data_automatic/<data_collection_name>

• Manual health check:
/tmp/DPE_COMMONLOG/health_check/data/<data_collection_name>

The following filenames are used:

• Active alarms: active_alarms.txt

• History of raised and cleared alarms: alarms_history.txt

• Alarm intensity: alarms_intensity_log.txt

• Triggered events: events.txt

• Event intensity: events_intensity_log.txt

• Mobility event details: mobility_event_details_log.txt


• Mobility event intensity: mobility_event_intensity_log.txt

• Session event details: session_event_details_log.txt

• Session event intensity: session_event_intensity_log.txt

• Non-operational GSM cells: not_operational_cells_gsm.txt

• Non-operational remote IP end points for Gb over IP:
not_operational_remote_ip_eps.txt

• Non-operational RNCs: not_operational_rncs.txt

• Non-operational MTP-L3 links: not_operational_mtpl3_links.txt

• Non-operational M3UA associations: not_operational_m3ua_assoc.txt

• Non-operational eNodeBs: not_operational_enodebs.txt

• Packet loss PM counters: packet_loss_log.txt

• Overload protection PM counters: olp_log.txt

• Mobility Management PM counters and gauges: mm_log.txt

• Session Management PM counters and gauges: sm_log.txt

• SS7 PM counters and gauges: ss7_log.txt

4.16 Verdict
Verdict presents the number of passed and failed checks.

Printout Example

The following is an example of a verdict printout:

+++++++++++++++++++++++++++++++++++++++++
+++ Verdict
+++++++++++++++++++++++++++++++++++++++++

nbr_passed 39
nbr_failed 5
+++++++++++++++++++++++++++++++++++++++++
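
The relation between per-line verdicts and the verdict totals can be sketched
in Python as follows. This is only an illustration. It assumes that every
line carrying an allowed range counts as one check and that a failed check is
marked with the word Failed at the end of the line, as in the printout
examples in Section 4. How health_check.pl itself derives nbr_passed and
nbr_failed is not specified in this document, and the function name tally is
hypothetical.

```python
def tally(printout):
    """Count passed and failed checks in a printout.

    Only lines containing an allowed range are counted as checks.
    Returns a (passed, failed) tuple.
    """
    passed = failed = 0
    for line in printout.splitlines():
        if "(Allowed range" not in line:
            continue
        if line.rstrip().endswith("Failed"):
            failed += 1
        else:
            passed += 1
    return passed, failed


# The alarm printout example from Section 4.5.
alarms = """\
alarms_active_critical :6 (Allowed range : x = 0) Failed
alarms_active_major :97 (Allowed range : x = 0) Failed
alarms_active_minor :0
alarms_active_warning :0
alarms_cleared_critical :0 (Allowed range : x = 0)
alarms_cleared_major :1 (Allowed range : x = 0) Failed
alarms_cleared_minor :1
alarms_cleared_warning :0
"""

passed, failed = tally(alarms)
```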


5 Health Check Procedure


The health check procedure covers the following:

• Creating a health check data collection, or viewing an existing one

• Required checks of the health check data collection components

• Required actions depending on the result of the checks

For more information on available commands, see Section 6.
For more information on the components in the data collection, see the
corresponding subsection in Section 4.

Note: Create a reference health check data collection when the SGSN-MME
and network are working perfectly.

Perform a health check according to the following instructions.

Instructions

1. Display the contents of a health check data collection by creating a new
manual health check data collection, or viewing an existing health check
data collection.

• Use the health_check.pl -l command to display the existing data
collections.

• Use the health_check.pl -c <data_collection_name> command to
trigger a manual health check data collection.

Example

health_check.pl -c post_check

• Use the health_check.pl -s <data_collection_number> command to
view an existing health check data collection.

Example

health_check.pl -s 60

Data collection numbers are used as input to display an existing data
collection, for both manually and automatically created health check data
collections. Data collection names are used when triggering a manual health
check data collection.

2. Check the ISP information.

If any restarts or failures have occurred, perform a data collection. For
more information, see Data Collection Guideline.


3. Check the disk usage.

If the disk usage is above the allowed range, free disk space by deleting
all unnecessary data on the SGSN-MME, such as software configurations
and expired logs. Backups take longer if the core partition has a high
disk usage level.

4. Check the CPU load.

If the CPU load is above the allowed range, deploy additional capacity in
the network. For more information, see Characteristics.

5. Check the active, and the cleared, alarms.

If there are active alarms, check the file
/tmp/DPE_COMMONLOG/health_check/data/<data_collection>/active_alarms.txt.
Act on all active alarms as described in the corresponding alarm description.

If there are cleared alarms, check the following files to find out which alarms
have been activated and cleared since the last collection occasion:

• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/alarms_history.txt

• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/alarms_intensity_log.txt

6. Check the events.

Check the following files:

• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/events.txt

• /tmp/DPE_COMMONLOG/health_check/data/<data_collection>/events_intensity_log.txt

Act on the events as described in the corresponding event description.

7. Check the subscriber statistics.

If the expected subscriber statistics values are not met, analyze the
following logs. The logs are found in
/tmp/DPE_COMMONLOG/health_check/data/<data_collection>.

• mobility_event_details_log.txt

• mobility_event_intensity_log.txt

• session_event_details_log.txt

• session_event_intensity_log.txt

• olp_log.txt


• mm_log.txt

• sm_log.txt

• ss7_log.txt

Analyze the following logs, if available:

• not_operational_cells_gsm.txt

• not_operational_remote_ip_eps.txt

• not_operational_rncs.txt

• not_operational_mtpl3_links.txt

• not_operational_m3ua_assoc.txt

• not_operational_enodebs.txt

For further information, see the corresponding section in Troubleshooting.

8. Check the Failure Ratio KPIs.

If the expected KPI values are not met, analyze the following logs. The
logs are found in /tmp/DPE_COMMONLOG/health_check/data/<data_collection>.

• mobility_event_details_log.txt

• mobility_event_intensity_log.txt

• session_event_details_log.txt

• session_event_intensity_log.txt

• olp_log.txt

• mm_log.txt

• sm_log.txt

• ss7_log.txt

For further information, see the corresponding section in Troubleshooting.

9. Check the payload.

If the expected payload values are not met, analyze the following logs. The
logs are found in /tmp/DPE_COMMONLOG/health_check/data/<data_collection>.

• packet_loss_log.txt

• not_operational_cells_gsm.txt


• not_operational_remote_ip_eps.txt

• not_operational_rncs.txt

• not_operational_mtpl3_links.txt

• not_operational_m3ua_assoc.txt

For further information, see the corresponding section in Troubleshooting.

10. Check the IP connectivity.

If the IP connectivity has failed, see Troubleshooting.

11. Check the CDR generation.

If CDR generation does not work, contact your local Ericsson support, see
Section 9.

Note: The verdict from the CDR generation check is Failed if there is
no traffic causing CDR closure.

12. Check the interface status.

If non-operational interfaces are found, analyze the following logs. The
logs are found in /tmp/DPE_COMMONLOG/health_check/data/<data_collection>.

• not_operational_cells_gsm.txt

• not_operational_remote_ip_eps.txt

• not_operational_rncs.txt

• not_operational_mtpl3_links.txt

• not_operational_m3ua_assoc.txt

• not_operational_enodebs.txt

For further information, see the corresponding section in Troubleshooting.

Note: The health check only checks SCTP associations used by M3UA. If
needed, use the sctp_status command to check other SCTP
associations.
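
The log files named in steps 5 through 9 and step 12 can be summarized as a
lookup table. The Python sketch below is an illustration only. The file names
and the directory for manual data collections are taken from this document,
whereas the component keys and the function name logs_for are hypothetical.

```python
from pathlib import PurePosixPath

# Directory for manual data collections, as given in Section 4.15.
MANUAL_DIR = PurePosixPath("/tmp/DPE_COMMONLOG/health_check/data")

EVENT_AND_PM_LOGS = [
    "mobility_event_details_log.txt",
    "mobility_event_intensity_log.txt",
    "session_event_details_log.txt",
    "session_event_intensity_log.txt",
    "olp_log.txt",
    "mm_log.txt",
    "sm_log.txt",
    "ss7_log.txt",
]

# Only present when the corresponding interface status checks are enabled.
NOT_OPERATIONAL_LOGS = [
    "not_operational_cells_gsm.txt",
    "not_operational_remote_ip_eps.txt",
    "not_operational_rncs.txt",
    "not_operational_mtpl3_links.txt",
    "not_operational_m3ua_assoc.txt",
    "not_operational_enodebs.txt",
]

LOGS_BY_COMPONENT = {
    "alarms": [
        "active_alarms.txt",
        "alarms_history.txt",
        "alarms_intensity_log.txt",
    ],
    "events": ["events.txt", "events_intensity_log.txt"],
    "subscriber_statistics": EVENT_AND_PM_LOGS + NOT_OPERATIONAL_LOGS,
    "failure_ratio_kpis": EVENT_AND_PM_LOGS,
    # Step 9 lists every not-operational log except the eNodeB one.
    "payload": ["packet_loss_log.txt"] + [
        name
        for name in NOT_OPERATIONAL_LOGS
        if name != "not_operational_enodebs.txt"
    ],
    "interface_status": NOT_OPERATIONAL_LOGS,
}


def logs_for(component, data_collection):
    """Return the full paths of the logs to analyze for one component."""
    directory = MANUAL_DIR / data_collection
    return [str(directory / name) for name in LOGS_BY_COMPONENT[component]]


event_logs = logs_for("events", "post_check")
```

For automatic data collections, the directory data_automatic replaces data in
the path, as described in Section 4.15.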


6 Health Check Operations

The health_check.pl command is a complex tool that can be used to perform
health check operations in the UNIX™ shell. The following subsections give
detailed descriptions of the operations.

6.1 View Information on Command Use


To view information regarding use of the different health check commands, use
the command health_check.pl -h. Information regarding the available
commands is displayed on the screen.

6.2 List Available Health Check Data Collections


Both manually and automatically created health check data collections can be
displayed. The data collections are accessed using the data collection number,
which is displayed within parentheses for each data collection.

The automatic data collections are given numbers from 1 through 60. After 10
days, the oldest data collection is overwritten.

To list the stored health check data collections, use the command
health_check.pl -l. A list of available health check data collections is
displayed on the screen.

6.3 Trigger Manual Health Check Data Collection


To trigger a manual health check data collection, use the command
health_check.pl -c <data_collection_name>. The data collected
during the manual health check is displayed on the screen.

Up to 20 manually triggered data collections can be stored.

Example

health_check.pl -c post_check

6.4 Calculate Differences between Two Health Check Data Collections

Two data collections can be compared by calculating the difference between
them, for example, before and after an upgrade.

To calculate differences between two health check data collections, use the
command health_check.pl -d <data_collection_number1,data_collection_number2>.
The result of the comparison is displayed on the screen, showing the data for
each collection in the first two columns and the difference between the two
collections in the third column.

Example

health_check.pl -d 79,78
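
The three-column layout described above can be illustrated with a short Python sketch. It is not the tool's implementation. The function name diff_rows, the counter values, and the sign convention (second collection minus first) are assumptions made for illustration only.

```python
def diff_rows(first, second):
    """Build (name, first_value, second_value, difference) rows.

    Only counters present in both collections are compared.
    Assumption: the difference is second minus first.
    """
    rows = []
    for name in sorted(first.keys() & second.keys()):
        rows.append((name, first[name], second[name], second[name] - first[name]))
    return rows


# Hypothetical counter values, for illustration only.
before = {"sau_gsm": 1200, "pdp_gsm": 900}
after = {"sau_gsm": 1250, "pdp_gsm": 880}

for row in diff_rows(before, after):
    # Two data columns followed by a signed difference column.
    print("%-10s %8d %8d %+8d" % row)
```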

6.5 Display Single or Multiple Health Check Data Collections

One or several health check data collections can be displayed simultaneously
to compare multiple data collections and analyze trends.

To display health check data collections, use the command health_check.pl
-s <data_collection_number,data_collection_number,...>. The
selected health check data collection or collections are displayed on the screen.
Each column contains information for one health check.

Example

health_check.pl -s 57

health_check.pl -s 60,59,68,57,56

6.6 Remove Health Check Data Collections


A maximum of 20 manual health check data collections can be stored. To
create new manual health check data collections, existing data collections must
be removed when the limit has been reached. Automatic health check data
collections cannot be removed manually.

To remove manual health check data collections, use the command
health_check.pl -r <data_collection_number,data_collection_number,...>.

Example

health_check.pl -r 61,62,63
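
The -d, -s, and -r operations all take data collection numbers as one comma-separated argument, written without spaces in the examples in this document. The following Python sketch builds such an argument list for use in a wrapper script. The helper name health_check_args is hypothetical, and the sketch only assembles the arguments; it does not run the command.

```python
def health_check_args(flag, numbers):
    """Return an argument list such as ['health_check.pl', '-r', '61,62,63']."""
    if flag not in ("-d", "-s", "-r"):
        raise ValueError("unsupported flag: %r" % (flag,))
    numbers = [int(n) for n in numbers]
    if not numbers:
        raise ValueError("at least one data collection number is required")
    if flag == "-d" and len(numbers) != 2:
        # The difference operation compares exactly two data collections.
        raise ValueError("-d takes exactly two data collection numbers")
    # Join with commas and no spaces so the shell passes one argument.
    return ["health_check.pl", flag, ",".join(str(n) for n in numbers)]
```

For example, health_check_args("-r", [61, 62, 63]) yields the arguments of the removal example above.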

6.7 Configure Allowed Ranges


The allowed ranges for the Data Collection components can be configured.

To configure the allowed ranges, use the command health_check.pl -a
configure.

To view the allowed ranges, use the command health_check.pl -a list.

To reset the configured allowed ranges to default values, use the command
health_check.pl -a reset.

The following values are accepted when configuring the allowed ranges:

• dc, which is used to exclude the parameter from verdict

• Single number, which is used to enter decimal values, such as 0.1

• Integer range (min_integer-max_integer), which is used to enter a
range between two integers, such as 1000-2000
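
The three accepted value forms can be told apart mechanically. The following Python sketch classifies a value before it is entered. It is an illustration under assumptions, not the tool's own validation: the function name parse_allowed_range is hypothetical, and rejecting a range whose minimum exceeds its maximum is an assumption.

```python
import re


def parse_allowed_range(value):
    """Classify an allowed-range value.

    Returns ("dc",), ("number", x), or ("range", low, high).
    """
    value = value.strip()
    if value == "dc":
        # dc excludes the parameter from the verdict.
        return ("dc",)
    match = re.fullmatch(r"(\d+)-(\d+)", value)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        if low > high:
            # Assumption: a reversed range is treated as invalid.
            raise ValueError("range minimum exceeds maximum: %r" % (value,))
        return ("range", low, high)
    if re.fullmatch(r"\d+(\.\d+)?", value):
        return ("number", float(value))
    raise ValueError("not an accepted allowed-range value: %r" % (value,))
```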

health_check.pl -a configure
The following allowed_ranges are configured:
### Disk Usage
### CPU Load MKVIII specific
load_ncb_1.9 80
load_ncb_1.15 80
load_ss7_sctp 80
### Alarms
alarms_active_critical 0
alarms_active_major 0
alarms_active_minor dc
alarms_active_warning dc
alarms_cleared_critical 0
alarms_cleared_major 0
alarms_cleared_minor dc
alarms_cleared_warning dc
### Subscriber Statistics
sau_gsm 0-3000000
pdp_gsm 0-3000000
sau_wcdma 0-3000000
pdp_wcdma 0-3000000
sau_lte 0-3000000
active_bearer_lte 0-5000000
### Failure Ratio Key Performance Indicators
attach_gsm 0.1
pdp_activation_gsm 0.1
intra_rau_gsm 0.1
israu_gsm 0.1
paging_gsm 0.1
pdp_cut_off_gsm 0.1
attach_wcdma 0.1
pdp_activation_wcdma 0.1
intra_rau_wcdma 0.1
israu_wcdma 0.1
paging_wcdma 0.1
pdp_cut_off_wcdma 0.1
rab_establishment_wcdma 0.1
service_request_wcdma 0.1
attach_lte 0.1
x2_handover_lte 0.1
s1_handover_lte 0.1
paging_lte 0.1
bearer_establishment_lte 0.1
service_request_lte 0.1
intra_mme_tau_lte 0.1
inter_mme_tau_lte 0.1
intra_isc_tau_lte 0.1
inter_isc_tau_lte 0.1
irat_ho_mme_source_lte 0.1
irat_ho_mme_target_lte 0.1
### Payload
uplink_mbps_gb 0-10000
downlink_mbps_gb 0-10000
uplink_mbps_iu 0-10000
downlink_mbps_iu 0-10000
uplink_mbps_gn 0-10000
downlink_mbps_gn 0-10000
Example 1
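
A listing such as the one in Example 1 can be read into a script for review or comparison between nodes. The following Python sketch assumes that each line of the listing is either a heading starting with ### or a single name followed by a single value; the function name parse_listing is hypothetical.

```python
def parse_listing(text):
    """Group allowed-range rows under their ### headings.

    Returns {heading: {parameter_name: raw_value_string}}.
    """
    groups = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("###"):
            current = line.lstrip("#").strip()
            groups.setdefault(current, {})
            continue
        parts = line.split()
        if current is None or len(parts) != 2:
            # Skip anything that is not a "name value" row under a heading,
            # such as the command line and the introductory sentence.
            continue
        groups[current][parts[0]] = parts[1]
    return groups
```

The values are kept as raw strings, so that dc, single numbers, and integer ranges can be interpreted separately.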

6.8 Configure Options


To configure the options, use the command health_check.pl -o
configure.

To view the configured options, use the command health_check.pl -o
list.

To reset the configured options to default values, use the command
health_check.pl -o reset.

The following values are accepted when configuring the options: yes|no.

6.8.1 Options
The option show_gsm specifies if GSM-related information is to be included in
the health check printout or not.

The option show_wcdma specifies if WCDMA-related information is to be
included in the health check printout or not.

The option show_lte specifies if LTE-related information is to be included in
the health check printout or not.

The option check_ip_connectivity specifies if IP connectivity check is to
be performed or not.

The option check_cdr_generation specifies if CDR generation check is to
be performed or not.

The option check_mobility_event specifies if analysis of the mobility event
logs is to be performed or not.

The option check_session_event specifies if analysis of session event
logs is to be performed or not.

The option check_gras specifies if the GSM RA status check is to be
performed or not. At automatic execution, the check is never included.

The option check_cells specifies if the GSM cell status check is to be
performed or not. At automatic execution, the check is never included.

The option check_nses specifies if the GSM NSE status check is to be
performed or not. At automatic execution, the check is never included.

The option check_rncs specifies if the RNC status check is to be performed
or not. At automatic execution, the check is never included.

The option check_tas specifies if the TA status check is to be performed or
not. At automatic execution, the check is never included.

The option check_enodebs specifies if the eNodeB status check is to be
performed or not. At automatic execution, the check is never included.

The option check_ss7 specifies if the SS7 status check is to be performed or
not. At automatic execution, the check is never included.

The option check_diameter_peers specifies if the diameter peer status
check is to be performed or not. At automatic execution, the check is never
included.
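
The interaction between configured options, default values, and automatic execution can be summarized in a small Python sketch. The default values are taken from Section 8. The function name effective_options and the way the rules are combined are assumptions for illustration; the tool's internal logic is not described in this document.

```python
# Default values as listed in Section 8.
DEFAULTS = {
    "show_gsm": "yes", "show_wcdma": "yes", "show_lte": "yes",
    "check_ip_connectivity": "yes", "check_cdr_generation": "yes",
    "check_mobility_event": "yes", "check_session_event": "yes",
    "check_gras": "no", "check_cells": "no", "check_nses": "no",
    "check_rncs": "no", "check_tas": "no", "check_enodebs": "no",
    "check_ss7": "no", "check_diameter_peers": "no",
}

# Checks that are never included at automatic execution.
MANUAL_ONLY = {
    "check_gras", "check_cells", "check_nses", "check_rncs",
    "check_tas", "check_enodebs", "check_ss7", "check_diameter_peers",
}


def effective_options(configured, automatic):
    """Return the options that would apply to one health check run."""
    options = dict(DEFAULTS)
    for name, value in configured.items():
        if name not in DEFAULTS:
            raise KeyError("unknown option: %r" % (name,))
        if value not in ("yes", "no"):
            raise ValueError("option values must be yes or no: %r" % (value,))
        options[name] = value
    if automatic:
        # Status checks are skipped regardless of the configured value.
        for name in MANUAL_ONLY:
            options[name] = "no"
    return options
```

With check_ss7 configured to yes, the sketch returns yes for a manual run and no for an automatic run.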

6.9 Configure Parameters


To configure the parameters, use the command health_check.pl -p
configure.

To view the configured parameters, use the command health_check.pl
-p list.

To reset the configured parameters to default values, use the command
health_check.pl -p reset.

The following values are accepted when configuring the parameters:

• List of IP addresses, specified in dot-decimal format with a comma between
the IP addresses, for example, 10.10.10.10,20.20.20.20

• Time, specified in number of hours, such as 24
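
Both value forms can be checked before they are entered. The following Python sketch uses the standard ipaddress module for the address list. The function names parse_ip_list and parse_hours are hypothetical, and the sketch only validates the format; the per-parameter value ranges are given in Section 7.

```python
import ipaddress
import re


def parse_ip_list(value):
    """Split a comma-separated list of dot-decimal IPv4 addresses."""
    addresses = []
    for item in value.split(","):
        # IPv4Address raises a ValueError subclass for malformed input,
        # including empty items and octets above 255.
        addresses.append(str(ipaddress.IPv4Address(item)))
    return addresses


def parse_hours(value):
    """Parse a time given as a whole number of hours."""
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError("hours must be a non-negative integer: %r" % (value,))
    return int(value)
```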

6.9.1 Parameters
The parameter alarm_event_history_time specifies the measurement
time interval for alarms and events.

The parameter isp_history_time specifies the measurement time interval
for ISP events.

The parameter mm_ip_addresses specifies the IP addresses of the
Multi-Mediation nodes.

The parameter mobility_event_history_time specifies the measurement
time interval for mobility events.

The parameter oss_ip_addresses specifies the IP addresses of the OSS
nodes.

The parameter session_event_history_time specifies the measurement
time interval for session events.

7 Parameter Description

This section describes the parameters used by the health check tool.

A general description is presented for each parameter, and in addition, the
following attributes are described:

Valid for Specifies the radio network access type for which the
parameter is valid.

Data Type Specifies the data type of the parameter.

Syntax Specifies the format in which the value of the parameter
is specified.

Value Range Specifies the valid parameter values.

Default Value Specifies the parameter value set if no value is explicitly
given.

Activation Specifies the activation method required for the
parameter change to take effect.

Related Commands
Lists the UNIX commands that can be used to display
and manage the parameter.

7.1 alarm_event_history_time
The parameter alarm_event_history_time specifies the measurement
time interval for alarms and events. At automatic execution, the parameter is
always set to 5 hours.

Valid for GSM, WCDMA, LTE

Data Type Integer

Value Range 0-24

Default Value 24

Activation Runtime

Related Commands
health_check.pl -p configure|list|reset

7.2 isp_history_time
The parameter isp_history_time specifies the measurement time interval
for ISP events. At automatic execution, the parameter is always set to 720
hours.

Valid for GSM, WCDMA, LTE

Data Type Integer

Value Range 0-2300

Default Value 720

Activation Runtime

Related Commands
health_check.pl -p configure|list|reset

7.3 mm_ip_addresses
The parameter mm_ip_addresses specifies the IP addresses of the
Multi-Mediation nodes.

Valid for GSM, WCDMA

Data Type String

Syntax Comma-separated list of IP addresses:
[0-255].[0-255].[0-255].[0-255],[0-255].[0-255].[0-255].[0-255],...

Value Range No defined value range

Default Value No default value

Activation Runtime

Related Commands
health_check.pl -p configure|list|reset

7.4 mobility_event_history_time
The parameter mobility_event_history_time specifies the measurement
time interval for mobility events. At automatic execution, the parameter is
always set to 5 hours.

Valid for GSM, WCDMA, LTE

Data Type Integer

Value Range 0-24

Default Value 24

Activation Runtime

Related Commands
health_check.pl -p configure|list|reset

7.5 oss_ip_addresses
The parameter oss_ip_addresses specifies the IP addresses of the OSS
nodes.

Valid for GSM, WCDMA, LTE

Data Type String

Syntax Comma-separated list of IP addresses:
[0-255].[0-255].[0-255].[0-255],[0-255].[0-255].[0-255].[0-255],...

Value Range No defined value range

Default Value No default value

Activation Runtime

Related Commands
health_check.pl -p configure|list|reset

7.6 session_event_history_time
The parameter session_event_history_time specifies the measurement
time interval for session events. At automatic execution, the parameter is
always set to 5 hours.

Valid for GSM, WCDMA, LTE

Data Type Integer

Value Range 0-24

Default Value 24

Activation Runtime

Related Commands
health_check.pl -p configure|list|reset
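
The value ranges and default values of the four history time parameters can be collected in one table for use in a pre-check script. The following Python sketch takes its numbers from Sections 7.1, 7.2, 7.4, and 7.6. The function name check_history_time is hypothetical, and the sketch does not model the fixed values used at automatic execution.

```python
# (minimum, maximum, default) in hours, from Sections 7.1, 7.2, 7.4, and 7.6.
HISTORY_PARAMETERS = {
    "alarm_event_history_time": (0, 24, 24),
    "isp_history_time": (0, 2300, 720),
    "mobility_event_history_time": (0, 24, 24),
    "session_event_history_time": (0, 24, 24),
}


def check_history_time(name, hours=None):
    """Return the value to use, or raise if it is outside the value range."""
    low, high, default = HISTORY_PARAMETERS[name]
    if hours is None:
        # No value given explicitly, so the default value applies.
        return default
    if not low <= hours <= high:
        raise ValueError("%s must be in %d-%d, got %r" % (name, low, high, hours))
    return hours
```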

8 Option Descriptions

This section describes the options used by the health check tool.

A general description is presented for each option, and in addition, the following
attributes are described:

Valid for Specifies the radio network access type for which the
option is valid.

Data Type Specifies the data type of the option.

Syntax Specifies the format in which the value of the option
is specified.

Value Range Specifies the valid option values.

Default Value Specifies the option value set if no value is explicitly
given.

Activation Specifies the activation method required for the
option change to take effect.

Related Commands
Lists the UNIX commands that can be used to display
and manage the option.

8.1 show_gsm
The option show_gsm specifies if GSM-related information is to be included in
the health check printout or not.

Valid for GSM

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.2 show_wcdma
The option show_wcdma specifies if WCDMA-related information is to be
included in the health check printout or not.

Valid for WCDMA

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.3 show_lte
The option show_lte specifies if LTE-related information is to be included in
the health check printout or not.

Valid for LTE

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.4 check_ip_connectivity
The option check_ip_connectivity specifies if IP connectivity check is to
be performed or not.

Valid for GSM, WCDMA, LTE

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.5 check_cdr_generation
The option check_cdr_generation specifies if CDR generation check is to
be performed or not.

Valid for GSM, WCDMA

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.6 check_mobility_event
The option check_mobility_event specifies if analysis of the mobility event
logs is to be performed or not.

Valid for GSM, WCDMA, LTE

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.7 check_session_event
The option check_session_event specifies if analysis of session event
logs is to be performed or not.

Valid for GSM, WCDMA, LTE

Data Type String

Value Range yes, no

Default Value yes

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.8 check_gras
The option check_gras specifies if the GSM RA status check is to be
performed or not. At automatic execution, the check is never included.

Valid for GSM

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.9 check_cells
The option check_cells specifies if the GSM cell status check is to be
performed or not. At automatic execution, the check is never included.

Valid for GSM

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.10 check_nses
The option check_nses specifies if the GSM NSE status check is to be
performed or not. At automatic execution, the check is never included.

Valid for GSM

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.11 check_diameter_peers
The option check_diameter_peers specifies if the diameter peer status
check is to be performed or not. At automatic execution, the check is never
included.

Valid for GSM

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.12 check_rncs
The option check_rncs specifies if the RNC status check is to be performed
or not. At automatic execution, the check is never included.

Valid for WCDMA

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.13 check_tas
The option check_tas specifies if the TA status check is to be performed or
not. At automatic execution, the check is never included.

Valid for LTE

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.14 check_enodebs
The option check_enodebs specifies if the eNodeB status check is to be
performed or not. At automatic execution, the check is never included.

Valid for LTE

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

8.15 check_ss7
The option check_ss7 specifies if the SS7 status check is to be performed or
not. At automatic execution, the check is never included.

Valid for GSM, WCDMA

Data Type String

Value Range yes, no

Default Value no

Activation Runtime

Related Commands
health_check.pl -o configure|list|reset

9 Support

If errors detected during the health check cannot be resolved, refer to
Troubleshooting or contact your local Ericsson support.
