IPL1A RC35 Upgrade Artifacts


17.b. Pre-Maintenance Check Manual (Non-Automated Requirements)

Step 17.b.1: Check Nodes Status

Time Line: 5 minutes.

Action: Execute the commands below on the Fuel node


Log in to the Fuel node with ssh {fuel IP} using your ATT UAM user and become root by typing toor.
Run fuel node and confirm every node shows "ready", no nodes are in "discover" or "error", and the Online field is 1.
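A quick way to spot problem nodes (a minimal sketch; adjust the grep patterns if your Fuel CLI output format differs):

fuel node | grep -Ei 'discover|error'   # should print nothing
fuel node | grep -c ready               # should match the total node count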

Step 17.b.2: Check Fuel Environment

Time Line: 5 minutes.

Action: Execute the commands below on the Fuel node


Log in to the Fuel node with ssh {fuel IP} using your ATT UAM user and become root by typing toor.
Run fuel env and confirm the environment shows "operational" (dummy_env can be ignored).
Step 17.b.3: Request storage operations to back up all vLCP LUNs (Large Sites Only)

Time Line: 72 hours prior to CW.

Action: Submit a request to back up all vLCP LUNs in the site.

An SRTS ticket needs to be created 72 hours ahead of the CW with AIC STORAGE OPERATIONS SUPPORT: http://ushportal.it.att.com/step2.cfm?app=3962&home=ush

Select AIC SNAPSHOT BACKUP & RESTORE and then fill out the information requested by the form.

Step 17.b.4: Backup contrail controller and config database.

Time Line: 20 minutes.

Action: On the Contrail controller VM, as fuel admin execute -


ssh <contrail-controller-vm>

create backup folder -


mkdir -p /var/tmp/contrailbackuppre/$(date +"%Y%m%d%H%M")
cd /var/tmp/contrailbackuppre/<foldercreatedabove>

check if there is sufficient space under /var/tmp for backup


df -h

Note -

For Large sites - make sure there is 60 GB of free space on the /var partition.

For Medium sites - make sure there is 25 GB of free space on the /var partition.

initiate backup -
sudo python /usr/lib/python2.7/dist-packages/cfgm_common/db_json_exim.py --export-to backup.json
chmod -Rf 640 /var/tmp/contrailbackuppre/
Results/Descriptions:

Backup tar files per contrail node are created in the backup folder.

Step 17.b.5: Backup xml of vms on the kvms

Time Line: 30 minutes.

Action: On OpsSimpleVM execute -

ssh <attuid>@<opscvm>
sudo su - m96722 OR sudo -iu m96722
cd /home/m96722/aic/

Check if there is sufficient disk space in /var/tmp for backup

ansible kvm_host -i inventory -m shell -sa "df -h"

Take the dump of the xml files of all the VMs and copy it to the jump_host

ansible-playbook -i inventory/ playbooks/infraxml_dump.yaml

Step 17.b.6 Check for Contrail Zookeeper Cassandra Discrepancies

From the Fuel VM, log in to the Contrail controller node:


ssh <contrail-controller-node-IP>
sudo /usr/localcw/bin/eksh
python /usr/lib/python2.7/dist-packages/vnc_cfg_api_server/db_manage.py --api-conf /etc/contrail/contrail-api.conf --verbose check

If any of the checkers report failures, escalate to Contrail Ops. Do NOT proceed with MOP execution until data inconsistencies are remediated.
Step 17.b.7 Pre-upgrade checks

https://codecloud.web.att.com/projects/ST_CCP/repos/aic-docs/browse/docs/dg/preupgrade.md

Step 17.b.8 Contrail checks

Contrail Checks - https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/


AIC-MOP-195_Contrail_Control_Plane_Generic_checks.docx?
d=w6469387ef976478ab2ea94cbdc1652dd&csf=1&web=1&e=rhhr6W

Step 17.b.9: Backup Fuel Nailgun database.

Time Line: 5 minutes.

Action: Execute the commands below on the Fuel node


Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor

Check if there is sufficient disk space in /var/log for backup


df -h

For mediums, recommend a minimum of 10GB free space for backup.

For large sites, recommend a minimum of 20GB free space for backup.

create backup folder -


mkdir /var/log/fuelbackuppre
cd /var/log/fuelbackuppre
sudo -u postgres pg_dump nailgun > nailgun.dump
gzip nailgun.dump
chmod -f 640 /var/log/fuelbackuppre/*
chmod -f 740 /var/log/fuelbackuppre

Results/Descriptions: Nailgun dump file is created in the backup folder.


Step 17.b.10: Backup Fuel settings

Time Line: 5 minutes

Action: Execute the commands below on the Fuel node


Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor
mkdir /var/log/fuelbackuppre
cd /var/log/fuelbackuppre

identify fuel env id and set variable


fuel env
export envid=<envid>
fuel settings download --env $envid
fuel network download --env $envid

Results/Descriptions:

Step 17.b.11: LCM status

Time Line: 15 minutes

Action: Login to Foreman UI


 Only LDAP accounts added to the groups (AP-AIC_ForemanViewerGroup or AP-AIC_ForemanAdminGroup) can be used to access the Foreman UI.
 Verify that the nodes are green. If they are not in sync, review node reports to assess severity; based on severity, raise a Jira - https://jira.web.labs.att.com/issues
 Attach the node report to the ticket.

In the hosts tab, check that all the nodes except the non-openstack nodes are pointing at the previous release_candidate.

 The only nodes which can be pointing at production are the non-openstack nodes.

Results/Description:
 Remove LCM nodes from Foreman:

In Hosts -> All hosts find and select all aic-influxdb,aic-elasticsearch and aic-alerting nodes

Remove them: Select Action -> Delete hosts

Step 17.b.12: Update OpsSimple packages

Time Line: 5 minutes

Action: Execute the commands below on the Opssimple node


Connect to Opssimple node with your personal attuid ssh {opsc IP} and become root using toor
vi /etc/apt/sources.list

verify /etc/apt/sources.list contents is as per SIL/Production or IST sites from links below:

For all SIL and Production Sites - prod.repo

For IST sites dev.repo


sudo apt-get update
sudo apt-get install aic-opssimple-addons-large
sudo apt-get install aic-opssimple-backend
sudo apt-get install aic-opssimple-core-lcp
sudo apt-get install aic-opssimple-plugins-foreman
sudo apt-get install aic-opssimple-plugins-gstools
sudo apt-get install aic-opssimple-plugins-knownstate
sudo apt-get install aic-opssimple-plugins-mosaudit
sudo apt-get install aic-opssimple-plugins-nagios
sudo apt-get install aic-opssimple-plugins-releaseupgrade
sudo apt-get install aic-opssimple-plugins-ro
sudo apt-get install aic-opssimple-plugins-security
sudo apt-get install aic-opssimple-plugins-vipr (Only for Medium sites)
sudo apt-get install aic-opssimple-plugins-astra
sudo apt-get install aic-opssimple-plugins-contrail

Results/Descriptions:

Make sure the plugins below are updated from the latest release repo by checking them with the apt-cache policy command.

Here is plugins list - Opssimple plugins list
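As an illustration, the installed and candidate versions can be compared in one pass (a sketch only; extend the loop with the full package names from the plugins list above):

for p in aic-opssimple-backend aic-opssimple-core-lcp aic-opssimple-plugins-foreman; do
  echo "== $p"; apt-cache policy "$p" | grep -E 'Installed|Candidate'
done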

Note: aic-lcm packages may be at a higher version; they can be updated separately to the latest versions via MOP-355 (aic-opssimple-plugins-aiclcm, aic-lcm).

Results/Descriptions: Verify OpsSimple Packages are updated.


#####Install latest aiclcm package:

Execute MOP-355 Update aiclcm packages

https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-355-
Update_aic_lcm_packages.docx?
d=w787973068da2491288f93d6b4a0ea17e&csf=1&web=1&e=QKCsYL

#####Install certain version of packages on apollo node:


Execute AIC-MOP-563-MOP to pin debian packages to certain versions (See the link on the MOP in #16
Materials requirement)

Step 17.b.13: Create the Fuel dummy_env prior to export/import Opssimple_site if it does not
already exist

Time Line: 5 minutes

Action: Execute the commands below on the Fuel node


Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor
fuel env | grep dummy_env

If dummy_env was not present in the output of the above command then execute the
commands below to create the Fuel dummy_env.
fuel release
fuel env create --name dummy_env --release <Release id from above command>

Verify that the dummy_env was created in Fuel


fuel env | grep dummy_env
Results/Descriptions: Commands ran successfully without errors and dummy_env exists in Fuel.

Step 17.b.14: Check Foreman SSL certificates.

Time Line: 10 minutes

NOTE: from RC24 - A new certificate for the Foreman UI should exist (foreman-aic/foreman-aic.includesprivatekey.pem) with Common Name: puppet.<site>.cci.att.com, and Alternative Names: lcma01.<site>.cci.att.com, lcma02.<site>.cci.att.com, lcma03.<site>.cci.att.com (see the MTN11 sample below).
cd /home/m96722/aic/files/certificates

openssl x509 -in foreman-aic/foreman-aic.includesprivatekey.pem -text -noout

Sample output for MTN11:


.....

Subject: C=US, ST=Michigan, L=Southfield, O=AT&T Services, Inc., CN=puppet.mtn11.cci.att.com

.....

X509v3 Subject Alternative Name:

DNS:zmtn11lcma01.mtn11.cci.att.com, DNS:zmtn11lcma02.mtn11.cci.att.com,

DNS:zmtn11lcma03.mtn11.cci.att.com, DNS:puppet.mtn11.cci.att.com
Results/Descriptions: for MTN11 - Common Name: puppet.mtn11.cci.att.com and Alternative Names: DNS:zmtn11lcma01.mtn11.cci.att.com, DNS:zmtn11lcma02.mtn11.cci.att.com, DNS:zmtn11lcma03.mtn11.cci.att.com, DNS:puppet.mtn11.cci.att.com

Step 17.b.15: Verify Astra logging (Only required for Large sites with Astra.)

Time Line: 5 minutes

Action: On Opssimple VM, check the golden configuration -


Set use_syslog to 1 in Large sites which have an Astra node, and to 0 in sites which do not have Astra installed.

Results/Descriptions: Verify use_syslog is set to 1 for Large sites which have an Astra node and to 0 for sites which do not have Astra installed.
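A minimal way to confirm the value, assuming use_syslog is carried in the exported OpsSimple site yaml (the temporary filename below is only an example):

python3 /var/www/aic_opssimple/backend/manage.py export -f /tmp/ops_syslog_check.yaml <fuelenvname> yaml
grep -n 'use_syslog' /tmp/ops_syslog_check.yaml   # expect 1 on Astra sites, 0 otherwise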

Step 17.b.16: Merge RC specific content into OpsSimple_site_yaml.

Time Line: 20 minutes

Action: On Opssimple VM, as m96722 user export current site yaml


ssh <opscvm>
cd /home/m96722/install
python3 /var/www/aic_opssimple/backend/manage.py export -f /home/m96722/before_rc.yaml <fuelenvname> yaml

Check /home/m96722/before_rc.yaml in case any errors occurred. If any errors are found then stop
deployment and escalate to the appropriate release management chat specified in section 19.
./resetdb.sh
python3 /var/www/aic_opssimple/backend/manage.py import /home/m96722/before_rc.yaml yaml
cd /var/www/aic_opssimple/backend/
./manage.py export -f /home/m96722/latest.yaml <siteid> yaml

where siteid represents the site short code, obtained from the environment details wiki.

Merge the site yaml prepared in step 6 with the above file and verify the following:

 To ensure the right list of repos: replace the Repo section in the opssimple_site.yaml with the Repo section from the Repo golden configuration.
 To ensure the right list of packages: the list of packages for a release candidate will be provided by the CI/CD team as a yaml file. The *: present entries in the at&t plugin must be replaced by *: latest.
 Ensure that any new parameter has been added to the att-plugin and lcm-plugin sections.
 Ensure that node_env in the lcm_plugin section is pointing at 3_0_3_RC<35>_<stable|prod>.
 If node_env is pointing at version 3.0.3_RC15_<stable|prod> or older, ensure that r10k_do_init is set to true in the lcm_plugin section.
 If node_env is pointing at version 3.0.3_RC16_<stable|prod> or newer, ensure that r10k_do_init is set to false in the lcm_plugin section:

   r10k_do_init:
     description: If r10k should create puppet env after deployment
     label: Create puppet environment (R10k)
     restrictions:
       - action: hide
         condition: settings:fuel-plugin-lcm.metadata.enabled != true
     type: checkbox
     value: false
     weight: 135

 Ensure that for each compute the ignore flag is set to false.
 Ensure that all passwords have been decrypted. If you see encrypted passwords, you can decrypt them with the aiclcm tool (aic-lcm repository), as shown below:

aiclcm opssimplesite decrypt_passwd --mechid --path path/to/your/opssimple_site.yaml

Update the Environment section to kernel version ubuntu4Q21:

Environment:
  domain_name: {{ localaliasnvp }}.cci.att.com
  ubuntu_repo: http://ubuntumirror.it.att.com/ubuntu4Q21/ubuntu/
  kernel_version: 3.13.0-187.generic
  enable_root_access: true
  enable_automation_mechid: false
  verify_image_checksum: true
  image_checksum_file_url: http://mirrors.it.att.com/images/infra/MD5SUMS
  automation_uam_role: AIC Automation Prod
  is_local_storage: false
  name: {{ opsenvname }}
  site_code: {{ site_code }}
  template: production
  type: idc_edc

Please, check the output of the commands above. If you see any issues, please resolve them before
you proceed further. You can find an example of the output below (truncated):
cd /var/www/aic_opssimple/backend/
./manage.py import /home/m96722/latest.yaml yaml

Results/Descriptions: Verify the command ran successfully without errors.


Step 17.b.16.1: Generate a random Redis Password and update OpsSimple_site_yaml.

Time Line: 10 minutes

Action: On Opssimple VM, as m96722 user export current site yaml


ssh <opscvm>

Export the ops file that was imported (RC35) earlier using the command below.
python3 /var/www/aic_opssimple/backend/manage.py export -f /home/m96722/ops_redis.yaml <fuelenvname> yaml

Note: Make sure the above file is latest RC35 file.

Generate a random password and capture the output.

date +%s%N | md5sum | sha256sum | base64 | head -c 14 ; echo

Update the redis_password in the "aic-fuel-plugin" section of the exported /home/m96722/ops_redis.yaml file and import it using the command below.

python3 /var/www/aic_opssimple/backend/manage.py import /home/m96722/ops_redis.yaml yaml
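For the redis_password edit itself (the step just above the import), a minimal sketch using sed (it assumes redis_password appears exactly once, in the aic-fuel-plugin block; review the diff before importing):

cp /home/m96722/ops_redis.yaml /home/m96722/ops_redis.yaml.bak
sed -i 's/redis_password:.*/redis_password: <generated_password>/' /home/m96722/ops_redis.yaml
diff /home/m96722/ops_redis.yaml.bak /home/m96722/ops_redis.yaml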

Results/Descriptions: Verify the command ran successfully without errors.

Step 17.b.16.2: Check if db_password for cmha in Fuelpluginconfig block is default

Time Line: 10 minutes

Action: On Opssimple VM, as m96722 user export current site yaml


ssh <opscvm>

Export the ops file that was imported (RC35) earlier using the command below.
python3 /var/www/aic_opssimple/backend/manage.py export -f /home/m96722/ops_cmha.yaml <fuelenvname> yaml

Only update the cmha db_password if the password is the default.
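A quick way to inspect the current value before deciding whether to change it (a sketch; the exact key nesting in the exported yaml may differ):

grep -n -A3 'cmha' /home/m96722/ops_cmha.yaml | grep db_password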


Generate a random password and capture the output.

date +%s%N | md5sum | sha256sum | base64 | head -c 14 ; echo

Update the db_password for cmha in the "Fuelpluginconfig block" of the exported /home/m96722/ops_cmha.yaml file and import it using the command below.

python3 /var/www/aic_opssimple/backend/manage.py import /home/m96722/ops_cmha.yaml yaml

Results/Descriptions: Verify the command ran successfully without errors.

Step 17.b.16.3: Refresh aic-lcm Configs

Time Line: 20 minutes

Action: On OpsSimple VM, Login using UAM and switch as m96722


ssh <attuid>@<opscvm>
sudo su - m96722 OR sudo -iu m96722

Execute the following:

1. /var/www/aic_opssimple/backend/manage.py export <ENV_NAME> -f /tmp/.yml yaml  ## Make sure the Foreman section in the .yaml has proper Credentials.
2. aiclcm setup factory_reset
3. aiclcm setup check
4. aiclcm setup create --opssimplefile=/tmp/.yml  ## aiclcm.setup.yaml will be created
5. aiclcm setup install --setupfile aiclcm.setup.yaml  ## Select y, when prompted to confirm the config replace
6. opssimple-backend restart
7. sudo service nginx restart
8. aiclcm opscops export_site --env <ENV_NAME> --path /tmp/_exported_useing_aiclcm.yml
Step 17.b.17: Generate Scripts

Time Line: 5 minutes

Action: On Opssimple VM, as m96722 execute the commands -


ssh <opscvm>
cd /var/www/aic_opssimple/backend
./manage.py generate_scripts <siteid>

Results/Descriptions: Verify there are no differences listed in allchecks.txt file.

Step 17.b.18: Restart the haproxy service on fuel

Time Line: 2 mins

Action: On opssimple VM, as m96722 execute following,


cd /home/m96722/aic
ansible fuel_host -i inventory -m shell -sa "service haproxy restart"

Step 17.b.19: Update Repo Paths

Time Line: 2 mins

Action: On opssimple VM, as m96722 execute following,


./setup.sh configure_repo kvm_host

./setup.sh configure_repo jump_host


./setup.sh configure_repo nagios_host
./setup.sh configure_repo astra_host
./setup.sh configure_repo maas_host

Results/Descriptions: Steps completed without errors

Step 17.b.20: Update Apollo Package

Time Line: 10 hours prior to CW.

Action: On seed node, as ubuntu user execute following,


toor
eksh -l -o emacs
su - ubuntu
sudo apt-get update
sudo apt-get install --upgrade aic-opssimple-apollo
sudo apt-get install python-prettytable
cd apollo/
sudo apollo maas configure_bind

Results/Descriptions: Steps completed without errors


Step 17.b.21: Update the foreman.ini file (for OAuth authentication)

Time Line: 2 minutes

Action: On opssimple VM, as m96722 execute following to backup, delete and regenerate the
foreman.ini file:
sudo apt-get install python-requests-oauthlib
cp ~/aic/foreman_api/foreman.ini ~/aic/foreman_api/foreman.ini.bak.$(date +"%Y-%m-%dT%H%M")
rm ~/aic/foreman_api/foreman.ini
ansible-playbook -i inventory/ playbooks/deploy_puppet_agent_hosts.yml --tags foremanini

Results/Descriptions: Steps completed without errors


18. ATS Bulletin

1. Access the ATS Bulletin site at the following URL: ATS Bulletin
2. Click on the List/Search button
3. Click on the "Search" Menu
4. Determine a key word associated with the activity that you are performing and type it into the Keyword: (Hint) field
5. Click on the Submit Query button
6. Review each search result bulletin by clicking on the bulletin number to determine if any of the bulletins have warnings/actions associated with the activity that you are implementing
7. Associated bulletins are listed in the following table:

Bulletins Table

Layer 3 - AIC

General AIC

19. Emergency Contacts

Contact (Team or Person) Contact Information

AIC Deployment Upgrades - Shashank Gupta (sg944h) 972-415-3973
AIC Platform Engineering - Hari Om Singh (hs571j) 469-731-6556
Release Management Chat - Large: qto://meeting/q_rooms_cb12191468270764278/AIC+RC26.2%2C+RC29%2C+RC31+Large+Deployments
Release Management Chat - Medium: qto://meeting/q_rooms_cb12191534195381761/AIC+3.0.3+RC18.1+Medium+Deployments+%28see+Meeting+Attibutes+for+current+zones+%26+CWs%29

20. Preliminary Implementation

Permissions for Openstack configuration file backups should be set to 640, and other files 640 or
600. The rule of least privilege should apply, but we need to make sure the DE has access to the files
for restore / backout purposes. This should not be a group that non-SA and non-DE users would be
in.
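For example, a minimal sketch of applying this policy to one of the backup directories created later in this MOP (the group name is a placeholder; use whatever group holds your SAs/DEs):

chgrp -R <sa_de_group> /var/log/fuelbackuppre
chmod 740 /var/log/fuelbackuppre
chmod 640 /var/log/fuelbackuppre/*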
NOTE: All terminal input/output must be logged during the change.

Step 20.1.a: Schedule Downtime to Disable Nagios Alerts

Time Line: 5 minutes

For work covering the whole zone: (Please run below commands on Both Nagios servers
ngos01,ngos02 for Large Environments)

Action: Connect to Opssimple node using your personal attuid, switch to m96722, and then connect
to one of the Nagios hosts and become root
ssh {opsc IP}
sudo -iu m96722
ssh {nagios IP}
toor
/usr/local/bin/nagios-DE-sched-downtime-zone.sh [zone NAME] [HOURS] [Comment]

Example: Say I have an approved CR for PDK5 starting shortly expecting it to take 8 hrs: log onto
zpdk5ngos01.pdk5.cci.att.com
sudo /usr/localcw/bin/eksh v
/usr/local/bin/nagios-DE-sched-downtime-zone.sh pdk5 8 CR 4342837 // RM Notification would have CR
details

and on nagios GUI (downtime link, lower left under ‘System’ section)

Once the Schedule downtime is run on both Nagios servers, verify all servers in the zone have
expected downtime using the script zone-downtime-validation.sh and also login to Nagios UI and
validate downtime as per below screenshot.

Example of script zone-downtime-validation.sh usage for medium and large sites : This script is in
/usr/local/bin , should be run as root, and requires 2 inputs from command line – the first being the
‘zone’ (make sure it’s what Nagios recognizes - for example if you used ‘IPL1b’ it would fail to find
that since nagios only knows ipbin1b). the second value is the comment you expect to see from
setting the downtime.
root@dsvtxvcngos03.infra.aic.att.net:/home/sc3998# ./zone-downtime-validation.sh ipbin1b "IPBIN1B RC35 Upgrade"

ipbinrsv114.ipbin1b.infra.aic.att.net - all checks in downtime

ipbinrsv115.ipbin1b.infra.aic.att.net - all checks in downtime

ipbinrsv116.ipbin1b.infra.aic.att.net - all checks in downtime

ipbinrsv117.ipbin1b.infra.aic.att.net - all checks in downtime

ipbinvmopsc01.ipbin1b.infra.aic.att.net - all checks in downtime

ALL servers in the zone have expected downtime


here’s an example from large chg1b (ngos02)

root@zchg1bngos02:/home/sc3998# ./zone-downtime-validation.sh chg1b "bgscollect does not stay running"

chg1r03a004.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

chg1r22o003.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

chg1r22o004.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

chg1r22o005.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

chg1r22o006.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

zchg1bopsc01.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

zchg1bngos01.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

zchg1bngos02.chg1b.cci.att.com - WARNING - service checks greater than number of downtimes posted

zchg1brosv01.chg1b.cci.att.com - all checks in downtime

zchg1brosv02.chg1b.cci.att.com - all checks in downtime

NOT all the zone is in downtime. Examine output above for WARNING messages

NOTE: If all the zone is not in downtime , re run the nagios-DE-sched-downtime-zone.sh again

NOTE: hostname MUST match exactly as defined in nagios /etc/nagios3/conf.d/.cfg file for nagios3
version and /usr/local/nagios/etc/conf.d/host.* for nagios4

NOTE: Release Management Chat to be communicated via Q chat for duration Nagios alert would
be disabled

Release Management Chat - Large


- https://teams.microsoft.com/l/channel/19%3a660c634c071749efab96f2c58b853103%40thread.tacv
2/AIC%2520Large%2520Prod%2520Deployments?groupId=9f716cdc-d9de-45cf-8171-
f74c1e6f3bc5&tenantId=e741d71c-c6b6-47b0-803c-0f3b32b07556

Release Management Chat - Medium


- https://teams.microsoft.com/l/channel/19%3a26525ea7678c4642ab3aea18a373990f
%40thread.tacv2/AIC%2520Medium%2520Prod%2520Deployments?groupId=9f716cdc-d9de-45cf-
8171-f74c1e6f3bc5&tenantId=e741d71c-c6b6-47b0-803c-0f3b32b07556

Results/Descriptions: Nagios dashboard disabled and release management notified

Step 20.1.a.1: The following additional validation needs to be done using the Nagios GUI by the DE.

Time Line: 5 minutes

Action: Log onto the Nagios web GUI, select hosts, and change the server count from '100' in the dropdown to 'all'. Verify downtime is added for all hosts; if downtime does not exist on any nodes, fix it by rerunning the script in Step 20.1.a to schedule downtime on those nodes. Validate downtime as per the screenshot below.

Results/Descriptions: All hosts on Nagios dashboard downtime is scheduled.

Step 20.1.b: Create Tower Ticket to supress TOA Alarm

Time Line: 10 minutes

Action: During the upgrade, if a compute reboot is required (as part of the Change window), the DE needs to create a Tower Ticket using the steps below for the complete CR duration.

 Use TOWER to open a request to O&T: http://tower.web.att.com/#/


 Select the relevant Issue type: Maintenance Request - General alerting/ticketing problems and eSet rule changes (eDART/routing etc.).

In the Maintenance Request form:

 Input EMOS in the Filter Application List field



 Then click in the Select Application field and select the area that best fits your issue.

o For eDart requests select EMOS EDART / Universal Work Flow


o For eSet alarm/ticket routing select EMOS ESET Rule Assistance
(Infrastructure, DBA, Storage)
 Skip the Select Assignee field

 Select appropriate Request Type. If unsure, choose Information Request.

 Provide a Brief Summary that describes your issue

 Provide a Detailed Description. Be sure to include example AOTS ticket numbers, server names, alert info, etc., as appropriate. This helps speed up the investigation.


 Confirm the I am the Primary Contact statement. Default is Yes.

 Answer the Is there Associated Capital Labor? question. Default is No.

 Add Attachment if needed (optional).

 Submit.

Feel free to Q or email me (ln8367), to confirm I received your request, or for assistance in
completing the form.
Inputting a request in TOWER will result in an iTrack Issue being created. TOWER is simply the Front-
door issue entry tool.

Step 20.1: Fix ssh config for admin user on Fuel node

Time Line: 3 minutes


Action: Connect to Opssimple node with your personal attuid ssh {opsc IP} , become m96722
using sudo -iu m96722 and execute the following:
cd aic
ansible-playbook -i inventory -s playbooks/deploy_ssh_keys.yml --tags=install --limit=fuel_host

Results/Descriptions: It will switch User from fuel to m96722 in ssh config (needed for MOP
automation)

Step 20.2: Execute AIC-MOP-586 : MOP to regenerate LCM certificates

https://codecloud.web.att.com/projects/ST_CCP/repos/aic-docs/browse/docs/mops/
MOP_LCM_certs.md?until=cb27d032aae46bf4aaa9249a7d03605b1af96816&untilPath=docs
%2Fmops%2FMOP_LCM_certs.md

Step 20.3: Suspend Puppet Agent


Time Line: 10 minutes
Action: Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor

Retrieve Fuel env id and set


fuel env
export envid=<envid>

upload fuel stop puppet graph


fuel2 graph upload --env $envid --type stop_puppet --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/stop_puppet.yaml

Execute stop puppet graph


fuel2 graph execute --env $envid --type stop_puppet

check graph execution is complete


fuel2 task list

Results/Descriptions: Check puppet agent is stopped on all nodes.

On Fuel VM, logged in using ATTUID, execute following to check status of puppet agents on all
nodes:
for i in $(sudo /usr/localcw/bin/eksh -c "fuel nodes" | grep ready | awk '{print $5}')
do
  echo -n "$i "
  ssh -qt -o StrictHostKeyChecking=no $i 'sudo /usr/localcw/bin/eksh -c "sudo service puppet status"'
done

Results/Descriptions: Commands executed without errors and puppet agents stopped


Step 20.4: Disable weak ciphers on OpsSimple

Time Line: 2 minutes

Action: On OpsSimple VM, Login using UAM and switch as m96722 execute following to update
weak ciphers (security requirement).
cd aic
./setup.sh disable_ops_weakciphers
Step 20.5: Update Fuel repo

Time Line: 2 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor ,
update /etc/yum.repos.d/update_repo_RCXY.repo as per

prod.repo stable.repo

Results/Descriptions:

Step 20.6: Upgrade Fuel components

Time Line: 2 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} , become root using toor and
execute:
sudo yum --disablerepo=* --enablerepo=update_repo_RC35_1_dep --enablerepo=update_repo_RC35_1_rpm --enablerepo=update_repo_RC35_1_plugins update fuel-nailgun -y

nailgun_syncdb
nailgun_fixtures
systemctl daemon-reload && service nailgun reload

Check and remove duplicate designate packages:
rpm -qa| grep designate
rpm -ev <older version of designate> --noscripts

NOTE: In the case you run into issues with below command, kill the process and rerun to remove any
duplicate designate packages and continue.
$ sudo yum --disablerepo=* --enablerepo=update_repo_RC35_1_dep --enablerepo=update_repo_RC35_1_rpm --enablerepo=update_repo_RC35_1_plugins update -y --exclude fuel-octane
Results/Descriptions:

Step 20.7: Check for duplicate yum packages.

Time Line: 2 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} , become root using toor and
execute:
rpm -qa --qf "%{NAME} %{ARCH}\n" | sort | uniq -c | grep -v '1 '
If there are duplicates, remove the unnecessary packages and re-run above command again until it
shows no duplicates. To remove duplicates execute -
rpm -e <package_name>
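If yum-utils is installed on the Fuel node, an alternative sketch that removes the older copy of each duplicate in one pass (review the proposed removals before confirming):

package-cleanup --cleandupes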

Results/Descriptions:

Step 20.8: Update Fuel Services configuration.

Time Line: 10 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} , become root using toor and
execute:

#####a) hiera data preparation deployment graph:


We have to keep this graph execution until the next prod RC coming after RC35.

fuel env

export envid=<envid>

fuel2 graph upload --env $envid --type hiera_prep --file

/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/fuel/hiera-preparation.yaml

fuel2 graph execute --env $envid --type hiera_prep

/etc/puppet/kilo-9.0/modules/fuel/examples/settings.sh $envid RC35_before_upgrade

#####b) Apply changes on Fuel node by execution update.sh :


/etc/puppet/kilo-9.0/modules/fuel/examples/update.sh
Note: There should be tasks:
hiera,sshkeygen,keystone,ldap,keystone_token_disable_nginx_services,nailgun,ostf,client,logindefs,fue
l_tasks_cleaner,security, backup.

If all these are not present then please re-execute command above.
The exit code could be either 0 or 2.

If there are any other exit codes then it may require troubleshooting.

Results/Descriptions: Check /var/log/remote/127.0.0.1/puppet-apply.log in case any errors occurred.

Step 20.9: Restart Fuel Services.

Time Line: 5 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} , become root using toor and
execute:
service mcollective restart
service nailgun restart
service astute restart
fuel plugins --sync
fuel plugins --list

Results/Descriptions: Verify the latest packages are installed. The list of fuel packages is listed here: fuelpluginlist.yaml

Step 20.10: Update Fuel meta data

Time Line: 5 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} , become root using toor and
execute:
fuel env
export envid=<envid>
python /var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/update_plugins_metadata.py -e $envid -u
Note - if there are issues with python command, execute fuel plugins --sync one more time and re-run
the python command to update metadata.

Check for Contrail Alarm Metadata by downloading the settings file and seeing if it was updated. If the Contrail alarm metadata was not updated, re-run the python command to update the metadata.
fuel env
export envid=<envid>
fuel settings --env ${envid} --download
fgrep alarm_list $(pwd)/settings_${envid}.yaml

Results/Descriptions: Verify alarm and other plugin metadata is in sync by examining settings_${envid}.yaml.

Remove the settings files once verified


rm $(pwd)/settings_${envid}.yaml

Step 20.11: Propagate changes from Opssimple to Fuel.

Time Line: 5 minutes

Action: On Opssimple VM, as m96722 export current site yaml -


ssh <opscvm>
cd /var/www/aic_opssimple/backend
./manage.py generate_scripts <site_name>
cd ~/aic/files/env_<siteid>/fuel-client
python ./script.py update_repository
python ./script.py update_plugin

Results/Descriptions: Verify update repository and update plugin returns a result code of 200.
cd /home/m96722/aic
ansible-playbook -i inventory/ playbooks/prepare_authorized_keys.yml
Verify that a file named authorized_keys got created within /home/m96722/aic/files/env_<siteid>/fuel-client/. It will be a copy of the id_rsa.pub from the /home/m96722/.ssh folder with a "from" restriction for the Fuel, Opssimple & Seed nodes added.
cd ~/aic/files/env_<sitename>/fuel-client
python ./script.py restrict_os_user
The above command will push the authorized_keys of operator_user ( m96722 ) to Fuel.

Verify that proper configuration was uploaded:

Action: On Fuel VM, execute -


fuel env
export envid=<envid>
python /var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/update_plugins_metadata.py -e $envid -u

The command output should not contain any "key is missing" lines.
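A simple scripted check (a sketch; it just counts occurrences in the command output):

python /var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/update_plugins_metadata.py -e $envid -u 2>&1 | grep -ci 'key is missing'   # expect 0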
Step 20.12: Install opsfix0064 and copy diff_struct.py to fuel node.

Time Line: 10 minutes

Action: On OpsSimpleVM execute -

Install opsfix0064:

sudo su - m96722 OR sudo -iu m96722
sudo apt-get update
sudo apt-get install aic-opsfix-cmc-0064

Copy diff_struct.py script to fuel node:

cd ~/aic
ansible -i inventory/ fuel_host -m copy -a "src=~/rotatemechid/diff_struct.py dest=/tmp/"
Step 20.13: Validate fuel settings.

Time Line: 10 minutes


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} , become root using toor and
execute:
ssh <fuelvm>
mkdir /var/log/new_RC35
cd /var/log/new_RC35
fuel env
export envid=<envid>
fuel settings --download --env $envid
fuel network --download --env $envid
/etc/puppet/kilo-9.0/modules/fuel/examples/settings.sh $envid RC35_new_config

Results/Descriptions: verify following,

No differences in network yaml.


python /tmp/diff_struct.py network_<envid>.yaml ../fuelbackuppre/network_<envid>.yaml

RC specific updates should show up differences in settings yaml.


python /tmp/diff_struct.py settings_<envid>.yaml ../fuelbackuppre/settings_<envid>.yaml

Step 20.14: Touch nova-common file in computes and controllers


Time Line: 5 minutes

Action: On Opssimple VM, as m96722 execute the commands


ssh <opscvm>
cd aic
ansible compute_host:controller_host -i inventory/openstack -m shell -sa "cp
/usr/localcw/opt/sudo/sudoers.d/aic_nova_sudoers /usr/localcw/opt/sudo/sudoers.d/nova-common"
ansible compute_host:controller_host -i inventory/openstack -m shell -sa "cp /etc/sudoers.d/aic_nova_sudoers
/etc/sudoers.d/nova-common"

21. Implementation

NOTE: All terminal input/output must be logged during the change.

Step 21.1: Remove the StackLight Collector Plugin. (Only required for Large sites, not needed for
medium sites.)

Time Line: 10 minutes

Action: On Fuel VM, as fuel admin execute -

fuel plugins | grep lma_collector

for example:


[root@nailgun ~]# fuel plugins | grep lma_collector

4 | lma_collector | 0.10.1188 | 4.0.0 | ubuntu (kilo-9.0, liberty-8.0, liberty-9.0, mitaka-9.0)

fuel plugins --remove lma_collector==<version>

for example:

fuel plugins --remove lma_collector==0.10.1188

Step 21.2: Execute Fuel release candidate graph.

Time Line: 10 minutes

Action: On Fuel VM, as fuel admin execute -


fuel env
export envid=<envid>
fuel2 graph delete --env $envid --type release_candidate
fuel2 graph upload --env $envid --type release_candidate --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/release_candidate.yaml

fuel2 graph execute --env $envid --type release_candidate


fuel2 graph delete --env $envid --type custom_step_1
fuel2 graph delete --env $envid --type custom_step_2
fuel2 graph delete --env $envid --type custom_step_3
fuel2 graph delete --env $envid --type custom_step_4
fuel2 graph upload --env $envid --type custom_step_1 --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/from_rc27/01-remove-
lma_art.yaml
fuel2 graph upload --env $envid --type custom_step_2 --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/from_rc32/01-
backup_mysql.yaml
fuel2 graph upload --env $envid --type custom_step_4 --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/from_rc32/03-
clean_lcm_and_dbng_nodes.yaml
fuel2 graph execute --env $envid --type custom_step_1
fuel2 graph execute --env $envid --type custom_step_2

Check that backup was successful on all nodes:


fuel2 task list

The Percona upgrade is a very disruptive procedure. To prevent mysql cluster failures, take these additional steps:
Login to all 3 dbng (lcm) nodes, and update mysql configuration:
add "innodb_fast_shutdown = 0" to the '[mysqld]' section of the /etc/mysql/my.cnf

From any dbng (lcm) node restart the mysql cluster:


crm resource restart clone_p_mysqld

Wait until all three nodes are in the 'started' state:


crm resource list

Execute mysqlcheck on all 3 dbng (lcm) nodes:


mysqlcheck --auto-repair --optimize --all-databases

You should not have any errors.

If you see any errors - stop the upgrade and create a ticket for the tiger team for investigation.

Update pacemaker timeout. Login to all 3 dbng (lcm) nodes, and update pacemaker configuration:

Open the '/usr/lib/ocf/resource.d/fuel/mysql-wss' file and find this line in the mysql_stop() function:

proc_stop "${OCF_RESKEY_pid}" "mysqld.*${OCF_RESKEY_datadir}" SIGTERM 5 $(( $shutdown_timeout/5 ))

Add '1205' between "mysqld.*${OCF_RESKEY_datadir}" and SIGTERM to have:


proc_stop "${OCF_RESKEY_pid}" "mysqld.*${OCF_RESKEY_datadir}" 1205 SIGTERM 5 $
(( $shutdown_timeout/5 ))

Save changes. Please note that 1205 is 2 hours of trying to gracefully terminate the mysql process. Be
patient.

Stop the mysql cluster:


crm resource stop clone_p_mysqld

Wait until all three nodes are in the 'stopped' state:


crm resource list

Check that there are no mysql and mysql-related processes on the node (on all 3 nodes):
ps aux | grep mysql

Login to all 3 nodes and check the state of DB one by one:

Check that .my.cnf is pointing to the /root/.my.localhost.cnf:


ls -la /root/ | grep .my.cnf
lrwxrwxrwx 1 root root 23 Jan 12 19:41 .my.cnf -> /root/.my.localhost.cnf

If not, execute:
ln -sf /root/.my.localhost.cnf /root/.my.cnf

Ban p_mysqld resource in pacemaker one by one:


crm_resource --resource clone_p_mysqld -B --host $(hostname)

Start the Mysql locally:


mysqld --log-error=/var/log/mysql/mysqld_custom.log --user=mysql --wsrep-provider='none' --innodb-read-only=OFF &

Check that you can login to mysql:


mysql>

If you are not able to log in, stop the upgrade and ask the tiger team to investigate.

Check the DB status:


mysqlcheck --auto-repair --optimize --all-databases
mysqladmin shutdown
ps aux | grep mysql

Remove innodb_fast_shutdown and add 'show_compatibility_56 = on' to all 3 nodes:


remove "innodb_fast_shutdown = 0" from the '[mysqld]' section of the /etc/mysql/my.cnf
add "show_compatibility_56 = on" to the '[mysqld]' section of the /etc/mysql/my.cnf

Remove '1205' from the '/usr/lib/ocf/resource.d/fuel/mysql-wss' file on all 3 dbng nodes to have:


proc_stop "${OCF_RESKEY_pid}" "mysqld.*${OCF_RESKEY_datadir}" SIGTERM 5 $(( $shutdown_timeout/5 ))

Repeat the same for LCM nodes.

Percona Upgrade procedure (for DBNG and LCM nodes)

On the Fuel node:

Create upgrade.yaml file with content:

- id: stop_mysql_cluster
  type: shell
  role: ['primary-lcm']
  version: 2.1.0
  required_for: [check-clusters-status]
  parameters:
    cmd: crm resource stop clone_p_mysqld
    retries: 3
    interval: 20
    timeout: 300

- id: stop_rabbit_cluster
  type: shell
  role: ['primary-aic-dbng']
  version: 2.1.0
  requires: [stop_mysql_cluster]
  parameters:
    cmd: crm resource stop master_p_rabbitmq-server
    retries: 3
    interval: 20
    timeout: 300

- id: check-clusters-status
  version: 2.1.0
  type: shell
  role: ['primary-aic-dbng', 'aic-dbng', 'primary-lcm', 'lcm']
  requires: [stop_rabbit_cluster, stop_mysql_cluster]
  cross-depends:
    - name: stop_mysql_cluster
  parameters:
    cmd: sleep 30; crm resource list | grep Stopped:| grep $(hostname)
    retries: 12
    interval: 30
    timeout: 1800
  strategy:
    type: one_by_one

- id: update_config
  type: shell
  role: ['primary-aic-dbng', 'aic-dbng', 'primary-lcm', 'lcm']
  version: 2.1.0
  requires: [check-clusters-status]
  parameters:
    cmd: |
      sed -i '/myisam_recover/d' /etc/mysql/my.cnf
      puppet resource file_line home-dir path=/etc/mysql/my.cnf line='innodb-data-home-dir = /var/lib/mysql/' match=innodb-data-home-dir
      puppet resource file_line databases-exclude path=/etc/mysql/my.cnf line='databases-exclude = lost+found' after=parallel
    retries: 3
    interval: 20
    timeout: 180

- id: update_packages
  type: shell
  role: ['primary-aic-dbng', 'aic-dbng', 'primary-lcm', 'lcm']
  version: 2.1.0
  requires: [update_config]
  parameters:
    cmd: |
      apt-get remove percona-xtradb-cluster-server-5.6 percona-xtradb-cluster-client-5.6 percona-xtrabackup percona-xtradb-cluster-common-5.6 -y
      puppet resource package percona-xtradb-cluster-server-5.7 ensure=present
      puppet resource package percona-xtradb-cluster-client-5.7 ensure=present
    retries: 3
    interval: 20
    timeout: 3600

Upload and execute this graph:


fuel2 graph upload --env $envid --type package_update --file ./upgrade.yaml

fuel2 graph execute --env $envid --type package_update


On all 3 DBNG (LCM) nodes one by one

Login to all 3 dbng nodes and execute:


eval "/usr/bin/mysqld_safe --log-error=/var/log/mysql/mysqld_custom.log --skip-grant-tables --skip-networking --
user=mysql --wsrep-provider='none' 2>&1 > /dev/null" & disown

Wait at least 20-30 seconds and execute:


mysql_upgrade

If successful, stop MySQL on this node and execute the same for the next node:
service mysql stop

If not successful, try:


mv /var/lib/mysql/ib_logfile0 /root/
mv /var/lib/mysql/ib_logfile1 /root/
eval "/usr/bin/mysqld_safe --log-error=/var/log/mysql/mysqld_custom.log --skip-grant-tables --skip-networking --
user=mysql --wsrep-provider='none' 2>&1 > /dev/null" & disown

And execute mysql_upgrade again:


mysql_upgrade

If successful, stop MySQL on this node and execute the same for the next node:
service mysql stop

If you see any errors - stop the upgrade and create a ticket for the tiger team for investigation.

Repeat the same steps for LCM nodes

If all 3 nodes have been successfully upgraded, start the Mysql cluster:

Starting from the primary-dbng (primary-lcm) node:


crm_resource --resource clone_p_mysqld -U --host $(hostname)
crm resource start clone_p_mysqld
Try to login to mysql and remove constraints for other nodes:

From the primary-dbng (primary-lcm) node login to the mysql:


mysql

If you are not able to log in, stop the upgrade and ask the tiger team to investigate.

Start mysql on other 2 nodes:


crm_resource --resource clone_p_mysqld -U --host $(hostname)

The Percona upgrade is a very disruptive procedure. After this procedure we have to log in to the dbng and lcm nodes and check the cluster status.

From the fuel node as fuel admin:


fuel node | grep dbng

Login to any dbng and lcm node and login to the mysql. If mysql is not working on this node use
another dbng (lcm) node.

Check cluster status and uninstall validate_password plugin


(https://itrack.web.att.com/browse/AICDEFECT-1990):
mysql> uninstall plugin validate_password;
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

wsrep_cluster_size should be 3. If not, create a ticket for the tiger team for investigation; otherwise, repeat the same steps for the lcm nodes.

Start the rabbitmq cluster on any dbng node:
crm resource start master_p_rabbitmq-server

Proceed with upgrade procedure. From the fuel node:


fuel2 graph execute --env $envid --type custom_step_4

Results/Descriptions:
Step 21.2.1: Run glance db sync

Time Line: 10 minutes

Action: on mosc, as fuel user run


ssh <moscvm>
sudo glance-manage -d db_sync

Results/Descriptions:

Step 21.2.2: Update LCM hosts

Time Line: 10 minutes


Action: on any LCM node, as foreman admin execute:

ssh <lcmvm>
lcm_update_hosts -f

Step 21.2.3: Disable THP on all GV nodes

Time Line: 10 minutes

Action:

On OpsSimple VM, as m96722 execute -


cd aic
ansible controller_host[0] -i inventory/openstack -m shell -a ". /home/m96722/openrc_v2; openstack aggregate list |
grep 'gv'"
ansible controller_host[0] -i inventory/openstack -m shell -a ". /home/m96722/openrc_v2; openstack aggregate
show <previous_command_output> -f json | python -mjson.tool"
ansible <host_list_from_previous_command (host1:host2:...)> -i inventory/openstack -m shell -s -a "puppet apply
/etc/puppet/modules/fuel/examples/grubupdate.pp"

Check if 'transparent_hugepage=never' is present on all needed nodes:


ansible <host_list (host1:host2:...)> -i inventory/openstack -m shell -s -a "cat /etc/default/grub | grep
GRUB_CMDLINE_LINUX"

Reboot all GV nodes:


ansible <host_list (host1:host2:...)> -i inventory/openstack -m shell -s -a "reboot"

Step 21.3: Designate DDNS service removal and cron update. - for (Large Sites only)

Time Line: 5 minutes


Action: On each of the TMDG nodes, log in using UAM, execute toor, and run:
apt-get remove aic-designate-ddns

Also update the crontab, commenting out the entries below to stop them:

#*/1 * * * * python /usr/lib/python2.7/dist-packages/aicddns/ddnsupdate.py

#30 4 * * * python /usr/lib/python2.7/dist-packages/aicddns/axfrupdate.py


crontab -e
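If you prefer to comment the entries non-interactively rather than via crontab -e, a sketch (it assumes the two jobs are present and not yet commented; verify with crontab -l afterwards):

crontab -l | sed -e '\|aicddns/ddnsupdate.py| s|^[^#]|#&|' -e '\|aicddns/axfrupdate.py| s|^[^#]|#&|' | crontab -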

[DEFECT-17627]: Also make sure the designate server is up and running

designate server-list

+--------------------------------------+-----------------------+

| id | name |

+--------------------------------------+-----------------------+

| 7abc8cc1-031f-4c66-bab2-8bb0bd3ae8f6 | ns1.zone.tci.att.com. |

+--------------------------------------+-----------------------+

Check the designate server existence (only for Large sites):

ansible-playbook -i inventory/ playbooks/openrc_automation/get_designate_server_list.yml

If the above command results in a null output, please create a designate server.
Run the below script


/etc/designate/create_nameserver.sh

Step 21.4: Check LCM actions on limited set of nodes.

Time Line: 20 minutes

Action: on primary LCM nodes, as fuel admin execute following -

facter -p environment
puppet agent --test --verbose --debug --trace --evaltrace --summarize --detailed-exitcodes --noop
cat /var/log/puppet/puppet.log

Check the result in the console, in /var/log/puppet/puppet.log and Foreman; verify that this is not an "empty catalog", check that Foreman is green, and check that the last report for that node in Foreman is around 1 minute old. If something wrong is detected (unwanted mechid changes, unwanted timer change, etc.), the DE needs to go back and adjust the opssimple_site.yaml.

The following command will force the puppet agent to run immediately.

service puppet start ; sleep 5 && kill -USR1 $(ps aux | grep 'puppet agent' | grep -v configuration | grep -v grep | awk '{ print $2 }')
cat /var/log/puppet/puppet.log

Check /var/log/puppet/puppet.log

Results/Descriptions: Commands executed without errors and that nothing wrong detected in log
or Foreman
Step 21.5: Re-enable Puppet Agents.
Time Line: 20 minutes

The puppet agents are currently in a suspended state. The following graph will re-enable them. The
"actual" run of the puppet agents is spread over a 30 min window.
Action: Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor

Re-enable the Puppet Agents


fuel2 graph upload --env $envid --type start_puppet --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/start_puppet.yaml

fuel2 graph execute --env $envid --type start_puppet

check the graph execution is complete.

Results/Descriptions: On Fuel VM, logged in using ATTUID, execute following to to verify puppet
agents are started on all nodes:
for i in $(sudo /usr/localcw/bin/eksh -c "fuel nodes" | grep ready | awk '{print $5}')
do
  echo -n "$i "
  ssh -qt -o StrictHostKeyChecking=no $i 'sudo /usr/localcw/bin/eksh -c "sudo service puppet status"'
done
Step 21.6: GSTools Update

Time Line: 120 minutes

Action:

On OpsSimple VM, as m96722 execute -


cd aic
ansible all -i inventory -s -m shell -a "killall bgsagent "
ansible all -i inventory -s -m shell -a "killall bgssd.exe "

Execute GSTools update procedure per - GSTools MOP

Results/Descriptions: Commands executed without errors


Re-run on failed nodes:
Health check:
Health check at :
/opt/aic-health-check/cache/hc_ipbinvlopsc01.ipbin1a.infra.aic.att.net_20220427_030732.csv

Step 21.6.1: Steps to restart bgssd and bgscollect

Time Line: 20 minutes

Action:

On OpsSimple VM, as m96722 execute -


cd aic
ansible all -i inventory/ -m shell -sa "pkill -9 -u m95031"
ansible all -i inventory/ -m shell -sa "/etc/init.d/bgssd start"
ansible all -i inventory/ -m shell -sa "yes | /opt/tools/bpa/b1config10700.sh &"

NOTE: Please wait for 10 mins before executing the next steps
ansible all -i inventory/ -m shell -sa "/usr/adm/best1_default/bgs/bin/best1collect.exe -I noInstance -B
/usr/adm/best1_10.7.00"

Verify if all 4 bgs process are running


ansible all -i inventory/ -m shell -sa "ps -ef |grep bgs"

Verify nagios alert using cli


ansible all -i inventory/ -m shell -sa "/usr/lib/nagios/plugins/check_procs_multi -p bgsagent,bgscollect,bgssd.exe -a
bgsioconfigcollect"
Results/Descriptions: Commands executed without errors

Step 21.7: Steps to disable McAfee agent

Time Line: 20 minutes

Action: On OpsSimple VM, as m96722 execute -

sudo apt-get update

sudo apt-get install aic-opssimple-plugins-gstools

cd aic

ansible-playbook -i inventory/ playbooks/mcafee_disable.yml


Validate the changes
ansible-playbook -i inventory/ playbooks/mcafee_disable_validate.yml

Results/Descriptions: Commands executed without errors

Step 21.8: Revert Astra config on openstack nodes (Only required for Large sites, not needed for
medium sites.)

Time Line: 30 mins

Action: On OpsSimple VM, as m96722 execute -

Execute AIC-MOP-611

https://att.sharepoint.com/:w:/r/sites/NCOMOPS/_layouts/15/Doc.aspx?sourcedoc=%7B91A7FE22-
C104-45E9-BE3F-F2853478AF9E%7D&file=AIC-MOP-
611_MOP_FOR_OPSFIX_0211.docx&action=default&mobileredirect=true&cid=f7fc2067-2fc9-417d-
be18-c4bfe646467f

Results/Descriptions:

Step 21.9: Update Nagios.

Time Line: 20 minutes

Action: On OpsSimple VM, as m96722 execute -

Execute AIC-MOP-601:

https://att.sharepoint.com/:w:/r/sites/NCOMOPS/_layouts/15/Doc.aspx?sourcedoc=%7B0BA3044B-
0FA8-4A4C-A54A-51FC62FA80F1%7D&file=AIC-MOP-
601_MOP_FOR_OPSFIX_0212.docx&action=default&mobileredirect=true
ansible-playbook -i inventory playbooks/deploy_nagios_server.yml
ansible-playbook -i inventory playbooks/deploy_nagios_agent.yml

To upgrade nagios to the Nagios core 4.4.6 version (applicable only for the RC 29 release), execute
ansible nagios_host -i inventory/ -m shell -s -a "sudo apt-get install --upgrade nagios4"

To add startup links to init scripts for nagios, execute


ansible nagios_host -i inventory/ -m shell -s -a "update-rc.d nagios defaults"

Results/Descriptions:
Removal of duplicate pacemaker log rotation config file - AICDEFECT-901:

ansible -i inventory --limit tmdg_hosts -s -m shell -a "rm /etc/logrotate.d/pacemaker"
Step 21.10: Restore Fuel config to normal state

Time Line: 20 minutes

Action: On Opssimple VM, as m96722 user execute the following:


cd /var/www/aic_opssimple/backend/
./manage.py export -f /home/m96722/latest.yaml <siteid> yaml
a) Change *: latest to *: present
To avoid unwanted changes outside a maintenance window, the packages list needs to be changed from *: latest to *: present in /home/m96722/latest.yaml.
b) Add / change enable_puppet_agent: true in the fuel-plugin-lcm section of /home/m96722/latest.yaml

c) Then execute the following to send data to a Fuel API:


cd /var/www/aic_opssimple/backend
./manage.py import /home/m96722/latest.yaml yaml
./manage.py generate_scripts <site_name>
cd ~/aic/files/env-<siteid>/fuel-client
python ./script.py update_plugin

d) Propagate changes through Fuel, Connect to Fuel node with your personal attuid ssh {fuel
IP} and become root using toor
fuel env
export envid={envid}

where envid is obtained from results for fuel env. above.


fuel2 graph execute --env $envid --type stop_puppet

/etc/puppet/kilo-9.0/modules/fuel/examples/settings.sh $envid RC35_before_config_db


fuel2 graph delete --env $envid --type update_config_db
fuel2 graph upload --env $envid --type update_config_db --file
/var/www/nailgun/plugins/aic-fuel-plugin-3.0/utils/custom_graphs/standard_operation/update_config_db.yaml
fuel2 graph execute --env $envid --type update_config_db

On OpsSimple VM, as m96722, please execute the following to check that *: latest has changed to *: present :
ssh <opscvm>
cd aic/

Execute the ansible command:


ansible all -i inventory/openstack -m shell -a "sudo hiera packages|grep -q present"

Expected result: command finished with SUCCESS status.
e) Start puppet agent on all nodes:
fuel2 graph execute --env $envid --type start_puppet

Results/Descriptions: Once graph execution is complete, on Fuel VM, logged in using ATTUID,
execute following to check status of puppet agents on all nodes:
for i in $(sudo /usr/localcw/bin/eksh -c "fuel nodes" | grep ready | awk '{print $5}')
do
  echo -n "$i "
  ssh -qt -o StrictHostKeyChecking=no $i 'sudo /usr/localcw/bin/eksh -c "sudo service puppet status"'
done

Results/Descriptions: Commands executed without errors

Step 21.11: Remove python3 from fuel node.

Time Line: 30 minutes

https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-
549_MOP_to_remove_python3_from_fuel.docx?
d=w910975145346489093628711b7c70f14&csf=1&web=1&e=N9Jwot
Step 21.12: Update MongoDB auth-scheme version to 5

Time Line: 20 minutes

Action: On opssimpleVM, as m96722 execute following,


Validate current version is 3:

ansible-playbook -i inventory/ playbooks/mongodb_auth_scheme_enabling.yml --tags validate

Update AUTH-schema version to 5:

ansible-playbook -i inventory/ playbooks/mongodb_auth_scheme_enabling.yml --tags deploy

Validate current version is now 5:

ansible-playbook -i inventory/ playbooks/mongodb_auth_scheme_enabling.yml --tags validate

Results/Descriptions: Commands executed without errors


Step 21.13: Remove DHCP client from RO, Astra VMs

Time Line: 5 minutes

Action: On opssimpleVM, as m96722 execute following,


ansible-playbook -i inventory/ playbooks/remove_dhcp.yml

Results/Descriptions: Commands executed without errors

Step 21.14: Restart the nova-novncproxy service if the VNC console is not accessible

Time Line: 5 minutes

Action: On each MOS Controller, as fuel admin execute following,


service nova-novncproxy restart

Results/Descriptions: Commands executed without errors


Step 21.15: Update puppet config on non Openstack nodes (Non-Fuel managed hosts with tags
deployagtastra,deployagtfuel,deployagtjump,deployagtkvm,deployagtmaas,deployagtnagios,deploy
agtro)

Time Line: 30 minutes

Action: On Opssimple VM, as m96722 user execute following,


ssh <opscvm>
cd aic

Stop puppet on the Non Openstack hosts:


ansible all -i inventory/hosts -m shell -s -a 'service puppet stop'

Remove the puppet certificates for the Non Openstack hosts:


ansible all -i inventory/hosts -m shell -s -a 'find /var/lib/puppet/ssl/ -name "*.pem" -delete'

Remove the Non Openstack host certificates from Fuel:


ssh <fuelvm>
toor
cd /var/lib/fuel/keys/<fuel_env_id>/puppet_ssl_certs
Note: Replace the string "<non_openstack_fqdn>" in the statements below with the fqdn name of the certificate.

Example (provided for illustration only): find astra-aic.mtn16b.cci.att.com.pem

find -name <non_openstack_fqdn>.pem
find -name <non_openstack_fqdn>.pem -delete

On Opssimple VM, as m96722 user run the following playbooks to generate and deploy the new puppet certificates for the Non Openstack hosts:

ssh <opscvm>
cd aic

Generate and deliver non-openstack nodes certificates to LCM nodes


ansible-playbook -i inventory/ playbooks/update_fuel_puppetcerts_hosts.yml --tags "deployagtastra,deployagtfuel,deployagtjump,deployagtkvm,deployagtmaas,deployagtnagios,deployagtro" -v --sudo

ansible lcm_host -i inventory/ -m shell -s -a 'chown -R puppet:puppet /var/lib/puppet/ssl/*'

Generate hiera data for non-openstack nodes:


ansible-playbook -i inventory/ playbooks/deploy_puppet_agent_hosts.yml --tags "addhierarole" -v –sudo
Install puppet agent configuration for non-openstack nodes:
ansible-playbook -i inventory/ playbooks/deploy_puppet_agent_hosts.yml --tags "deployagtastra,deployagtfuel,deployagtjump,deployagtkvm,deployagtmaas,deployagtnagios,deployagtro" -v --sudo

Results/Descriptions: Commands executed without errors

Step 21.16: Enable RBAC for Load Balancer As A Service (LBaaS) objects by migrating Contrail
internal object called service_appliance_set
Time Line: 20 minutes

Action: On primary contrail controller, as root execute following,


python /opt/contrail/utils/chmod2.py --os-username *** --os-password *** --os-tenant-name *** --server
localhost:9100 --type service-appliance-set --name default-global-system-config:opencontrail --global-access 5

Reference defect - https://sdp.web.att.com/ccm/web/projects/Defect


%20Management#action=com.ibm.team.workitem.viewWorkItem&id=462503

Step 21.17: Disabling VNC Server Unauthenticated Access on all sites

Time Line: 30 minutes

IMPORTANT!

First check the already installed version of opsfix_200:


sudo apt-cache policy aic-opsfix-cmc-0200

If the installed version is lower than 40343, perform the rollback procedure. Only after that upgrade
the opsfix and apply the deploy procedure.

Action: For update firewall rules on compute node on all sites execute the following MOP:
https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-
510_MOP_for_opsfix_0200.docx?
d=w527ee9a050494ea8a6121d82e7b00423&csf=1&web=1&e=v9m9nF
Step 21.18: Execute MOP-064 MVMD Security Scanning

Time Line: 60 minutes

Action: Execute MOP-064 MVMD Security Scanning

https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-
064_FOR_OPSFIX_0110.docx?d=w286a593bcfd9406391fbf81980534bc1&csf=1&web=1&e=GbCMJB
Revert:

Applying the MOP:


Test Plan:

If errors are encountered while running:

ansible-playbook -i inventory/ playbooks/aic_opsfix_deploy_0110.yml --tags deploy

stating errors such as:

checkdir error: cannot create /var/mvmd/ .....
File exists

please ignore them, as the files are being re-copied or recreated.

Results/Descriptions: Commands executed without errors

Step 21.19: Create database user account to support MVMD security scanning of AIC OpenStack
MySQL databases on lcm and fuel nodes

Time Line: 20 minutes.

Action: Create database user account to support MVMD security scanning:
https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-598_MOP_FOR_OPSFIX-0210.docx?d=w473a1bd65c264759823af01a00a22e5b&csf=1&web=1&e=AUkOmq&isSPOFile=1
Step 21.20: Execute Step 21.4 Install DHCP from MOP-194 MaaS Update to update DHCP packages

Time Line: 20 minutes.

Action: Execute Step 21.4 Install DHCP from MOP-194 MaaS Update to update DHCP
packages https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-
MOP-194%20MaaS%20Update%201.docx?
d=wc3306ea8e0dd4b44a5fdea78e84b375e&csf=1&web=1&e=fKu9av
Step 21.sec.1: Disable libvirt daemon's listen mode on operational KVM

Time Line: 10 minutes

Action: On Opssimple VM, as m96722 user execute following,


ssh <opscvm>
cd aic
ansible-playbook -i inventory/ playbooks/disable_webvirt.yaml -e "variable_host=kvm_host"

Results/Descriptions: Commands executed without errors

Step 21.sec.2: Ubuntu password Update on KVM and MaaS Node

Time Line: 10 minutes

Action: Execute MoP https://codecloud.web.att.com/projects/ST_CCP/repos/aic-docs/browse/docs/mops/password_rotate.md?until=c60880424fa671dc9e2f27cd9ec9faa8d0afca6c&untilPath=docs%2Fmops%2Fpassword_rotate.md&at=refs%2Fheads%2Fmaster

Results/Descriptions: Commands executed without errors


Step 21.sec.3: Disable ssh-access to Fuel using passwords (only ssh-access using public/private keys
will be possible). For all Large and medium sites

Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute


ssh <opscvm>
cd aic
The playbook below sets the sshd_config parameter PasswordAuthentication to no and restarts the sshd service on the Fuel node to apply the change. The full FQDN of the Fuel node can be obtained with:
grep '\[fuel_host\]' inventory/hosts -A 1 | sed -n 2p
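
For illustration only, a hypothetical run of that command against the example site used below might return:

$ grep '\[fuel_host\]' inventory/hosts -A 1 | sed -n 2p
zmtn12fuel01.zmtn12.datacenter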
ansible-playbook -i inventory/hosts playbooks/disable_password_login.yml --limit <full fqdn name of Fuel node>

Example:
$ ansible-playbook -i inventory/hosts playbooks/disable_password_login.yml --limit zmtn12fuel01.zmtn12.datacenter
PLAY [Disable password authentication] ****************************************

GATHERING FACTS ***************************************************************

ok: [zmtn12fuel01.zmtn12.datacenter]

TASK: [Changing ssh configuration and restarting ssh service] *****************

changed: [zmtn12fuel01.zmtn12.datacenter]

TASK: [restart sshd redhat] ***************************************************

changed: [zmtn12fuel01.zmtn12.datacenter]

TASK: [restart sshd ubuntu] ***************************************************

skipping: [zmtn12fuel01.zmtn12.datacenter]

TASK: [restart sshd CentOS] ***************************************************

changed: [zmtn12fuel01.zmtn12.datacenter]

PLAY RECAP ********************************************************************

zmtn12fuel01.zmtn12.datacenter : ok=4 changed=3 unreachable=0 failed=0

Results/Descriptions: Commands executed without errors

Step 21.sec.4: Block MAAS UI access from outside of LCP

Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic

To block MAAS API for the outside-of-LCP world, use the following playbook
ansible-playbook -i inventory/ playbooks/iptables_maas.yml

To block MAAS proxy


ssh <uamid>@<MAASVM>
execute toor
su - ubuntu
Check MAAS proxy is running.
ps -ef | grep -i squid

To disable MAAS proxy


sudo squid3 -k shutdown

Within a couple of minutes the proxy will be disabled.

Verify the MAAS proxy is no longer running:


ps -ef | grep -i squid

If, for whatever reason, the old iptables rules need to be restored, perform the following on the
MAAS node:
sudo iptables-save | grep -v '--dport 5240' | sudo iptables-restore
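
To confirm whether the MAAS API block is currently in place, the port 5240 rules can be listed first (a minimal sketch; the exact rules will vary by site):

sudo iptables -S | grep 5240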

To enable MAAS proxy


ssh <uamid>@<MAASVM>
execute toor
su - ubuntu
sudo /usr/sbin/squid3 -N -f /etc/maas/maas-proxy.conf &

Verify the MAAS proxy is running again:


ps -ef | grep -i squid

Results/Descriptions: Commands executed without errors

Step 21.sec.5: Fix the opssimple log file permission

Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
ansible-playbook -i inventory/ playbooks/fix_opssimple_log_permission.yml
Results/Descriptions: Commands executed without errors

Step 21.sec.6: Cleanup unwanted directories and files on opssimple, fuel and seed node

Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
ansible-playbook -i inventory/ playbooks/preventative_maintenance.yml

After this step, all the unwanted directories and files will be moved to trash on the respective nodes
1. opssimple => /home/m96722/trash

2. fuel => /root/trash

3. seed => /home/ubuntu/trash

Results/Descriptions: Commands executed without errors

Step 21.sec.7: Fix the opssimple nginx conf permission


Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
ansible-playbook -i inventory/ playbooks/fix_opssimple_nginx_conf_permission.yml

Results/Descriptions: Commands executed without errors

Step 21.sec.8: Remove devops user from the KVM hosts.

Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
ansible-playbook -i inventory/ playbooks/disable_passwd_login_ubuntu_user.yml --tags devops_removal

Results/Descriptions: Commands executed without errors

Step 21.sec.9: Rename sudoers

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/rename_sudoers.yml

Results/Descriptions: Commands executed without errors

Step 21.sec.10: Execute opsfix 0132 for openstack sudoers file integrity

Time Line: 30 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get install aic-opsfix-cmc-0132

Use only inventory/openstack while running the playbook


ansible-playbook -i inventory/openstack playbooks/aic_opsfix_backup_0132.yml
ansible-playbook -i inventory/openstack playbooks/aic_opsfix_deploy_0132.yml

Results/Descriptions: Commands executed without errors

Step 21.sec.11: Disable Ubuntu and lock the password.

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/disable_ubuntu_login.yml

Results/Descriptions: Verify if the playbook is executed successfully

Step 21.sec.12: Remove unauthorized public keys present on the nodes

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/remove_unauthorized_pubkey.yml

Results/Descriptions: Verify if the playbook is executed successfully

Step 21.sec.13: Remove m96722 90-cloud-init-user from sudoers

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/remove_unwanted_sudoers.yaml
Results/Descriptions: Commands executed without errors

Step 21.sec.14: Update Octane


Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


ssh <fuelvm> sudo yum --disablerepo=* --enablerepo=update_repo_RC35_1_dep --enablerepo=update_repo_RC35_1_rpm --enablerepo=update_repo_RC35_1_plugins update fuel-octane -y

Results/Descriptions: Commands executed without errors

Step 21.sec.15: Update world readable file permissions

Time Line: 5 minutes

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/world_readable_file_permission.yml
Results/Descriptions: Commands executed without errors

Step 21.sec.16: Remove Multiple ssh keys present on the non-openstack nodes

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/remove_multiple_keys.yml

Step 21.sec.17: Lock Nagios user on Nagios host

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/lock_nagios_user.yml

Results/Descriptions: Commands executed without errors

Step 21.sec.18: Remove nova-common if present

Time Line: 5 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
ansible compute_host:controller_host -i inventory/openstack -m shell -sa "rm -rf /usr/localcw/opt/sudo/sudoers.d/nova-common"
ansible compute_host:controller_host -i inventory/openstack -m shell -sa "rm -rf /etc/sudoers.d/nova-common"

Step 21.sec.19: Remove HP upgrade manager, which will remove DISCAGNT daemon

Time Line: 20 minutes.

Action: On OpsSimpleVM, as m96722 user execute following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/uninstall_hped.yml

Results/Descriptions: Commands executed without errors

Step 21.sec.20: Check for LCM VM's available RAM

Time Line: 3 minutes.

Action: On OpsSimpleVM, as m96722 user execute following

cd ~/aic

ansible lcm_host -i inventory/ -s -m shell -a "free -g"

If the output is less than 16GB, like the example below:


m96722@zmtn16aopsc01:~/aic$ ansible lcm_host -i inventory/openstack -s -m shell -a "free -g"

zmtn16alcma03.mtn16a.cci.att.com | success | rc=0 >>

total used free shared buffers cached

Mem: 15 15 0 0 1 5

-/+ buffers/cache: 8 7

Swap: 7 0 7

zmtn16alcma01.mtn16a.cci.att.com | success | rc=0 >>

total used free shared buffers cached

Mem: 15 14 0 0 1 5

-/+ buffers/cache: 7 7

Swap: 7 0 7

zmtn16alcma02.mtn16a.cci.att.com | success | rc=0 >>

total used free shared buffers cached

Mem: 15 14 0 0 1 5

-/+ buffers/cache: 7 8

Swap: 7 0 7

Please follow MOP-475 (https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-475_MOP_FOR_MEM_CPU_UPDATE_LCM.docx?d=wc404662ddf684f38a94cba5871f96697&csf=1&web=1&e=cul6zT)

Results/Descriptions: Commands executed without errors

Step 21.sec.21: Check if any SSH weak algorithms are supported in Non Openstack nodes

Log in to the OpsSimple VM with your AT&T UID (UAM). Refer to the Jump host for the site as per the wiki link below:
https://wiki.web.att.com/display/AICP/AIC+Production+Environments

sudo su - m96722 or sudo -iu m96722

cd /home/m96722/aic/
Check if the MAC algorithms were updated in sshd_config:

ansible all -i inventory/hosts -m shell -sa "cat /etc/ssh/sshd_config | grep -i 'MACs hmac-sha2-512,hmac-sha2-256,hmac-ripemd160'"

Check if the ciphers were updated in sshd_config:

ansible all -i inventory/hosts -m shell -sa "cat /etc/ssh/sshd_config | grep -i 'Ciphers aes256-ctr,aes192-ctr,aes128-ctr'"

Check if the KexAlgorithms were updated in sshd_config:

ansible all -i inventory/hosts -m shell -sa "cat /etc/ssh/sshd_config | grep -i 'KexAlgorithms ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256'"
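
As an additional spot check on any single node, the effective runtime configuration can also be dumped directly from sshd (a minimal sketch; requires root, and the sshd path may vary by distribution):

sudo /usr/sbin/sshd -T | grep -iE '^(macs|ciphers|kexalgorithms)'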
Note: Please ignore the Fuel node, as it is managed by Puppet and will have a different md5sum. If the md5sum does not match, please execute the MOP below.
Please follow MOP-561: (https://att.sharepoint.com/:w:/r/sites/NCOMOPS/_layouts/15/Doc.aspx?sourcedoc=%7BB677C89A-DAC8-476D-B58A-48A62D680275%7D&file=AIC-MOP-561%20MOP%20for%20OPSFIX_0206.docx&action=default&mobileredirect=true)
22. Test Plan

NOTE: All terminal input/output must be logged during the change.

Inform the CPVT team via email and the appropriate Release Management Chat to perform full PVT once
the uplift is complete. PVT should clean up artifacts.

Release Management Chat -


Large qto://meeting/q_rooms_cb12191468270764278/AIC+RC26.2%2C+RC29%2C+RC32+Large+Deployments

Release Management Chat -


Medium qto://meeting/q_rooms_cb12191534195381761/AIC+3.0.3+RC18.1+Medium+Deployments+%28see+Meeting+Attibutes+for+current+zones+%26+CWs%29

23. Backout Procedure

NOTE: All terminal input/output must be logged during the change.


To restore vLCP control plane from backup execute steps 23.1, 23.2 and 23.3.

Step 23.1: Shutdown the LCP VMs which need to be restored

Time Line: 20 minutes.

Action: Access the control plane KVM where the VM is running and shut down the VM:
ssh <attuid>@<kvm>
virsh list --all
virsh shutdown <vmhostname>
virsh undefine <vmhostname>
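
Optionally, confirm the VM is no longer defined on that KVM host (a minimal sketch):

virsh list --all | grep -i <vmhostname> || echo "VM no longer defined"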

Results/Descriptions: Control plane VM is shutdown and undefined.

Step 23.2: Storage team to restore control plane LUNs from backup (Large Sites Only)

Time Line: 60 minutes.

Action: Submit a request to restore all vLCP LUNs from backup.

For Production: AIC STORAGE OPERATIONS SUPPORT: http://ushportal.it.att.com/step2.cfm?app=3962&home=ush

Select AIC SNAPSHOT BACKUP & RESTORE and then fill out the information requested by the
form.

For Labs and SIL: http://labticketing.web.att.com/LabTicketing/(S(mmrztkbhhyshix45mbp0tmqk))/AicTrouble.aspx?Lep=0

Callout Storage team qto://meeting/q_rooms_tg79241332875301882/EMOC+-+AIC+Tier+2+Team+Chat

Also notify storage team through email DL-STORAGEOPS-REPLICATION@att.com

Step 23.3: Restore vms on the kvms from dump xml backup (Medium Sites Only)

Time Line: 30 minutes.

Action: On OpsSimpleVM execute -

ssh <attuid>@<opscvm>

sudo su - m96722 OR sudo -iu m96722

cd /home/m96722/aic/

Copy the xml files and define the vms using xml files.
ansible-playbook -i inventory/ playbooks/restore_vms.yml

Results/Descriptions: Commands executed without errors


Step 23.4: Startup all LCP VMs

Time Line: 20 minutes.

Action: Access each control plane KVM and start up the VMs. Start the DBNG nodes first, then the MOSC
controller nodes, and then the rest of the nodes.
ssh <attuid>@<kvm>
virsh list
virsh start <vmhostname>

Results/Descriptions: All control plane VMs are running.

Step 23.5: To rollback Nagios, rerun the playbooks below once steps 23.1, 23.2 and 23.3 have been completed. These
playbooks will re-install the previous RC packages.

Time Line: 20 minutes

Action: On OpsSimple VM, as m96722 execute -


ssh <opssimplevm>
ansible-playbook -i inventory playbooks/deploy_lma.yml
ansible-playbook -i inventory playbooks/deploy_nagios_server.yml
ansible-playbook -i inventory playbooks/deploy_nagios_agent.yml

Results/Descriptions: Commands executed without errors

Step 23.6: To rollback Astra, rerun the playbooks below once steps 23.1, 23.2 and 23.3 have been completed. These
playbooks will re-install the previous RC packages.

Time Line: 20 minutes

Action: On OpsSimple VM, as m96722 execute -


ansible-playbook -i inventory playbooks/deploy_astra_identity.yml
ansible-playbook -i inventory playbooks/deploy_astra_controller.yml
ansible-playbook -i inventory playbooks/deploy_astra_compute.yml

Results/Descriptions: Commands executed without errors


Step 23.7: Restore OpsSimple Configuration (Medium Sites Only)

Time Line: 20 minutes

Action: On Opssimple VM, as m96722 user import the original site yaml created in Step 17.b.15.
ssh <opscvm>
cd /home/m96722/install
./resetdb.sh
python3 /var/www/aic_opssimple/backend/manage.py import /home/m96722/before_rc.yaml yaml

Results/Descriptions: Commands executed without errors

Step 23.8: Restore Fuel from backup (Medium Sites Only)

Follow this step only if fuel needs to be restored from backups taken during preupgrade checks. This
step is not necessary if the control plane is restored from backups.

Time Line: 20 minutes

Action: From the Fuel VM, execute the following as the m96722 user -


octane --debug -v fuel-repo-restore --from repos_and_images.tar.gz
octane --debug -v fuel-restore --from master_node_state.tar.gz --admin-password admin
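
Before running the restore, it may help to confirm that the backup archives taken during the pre-upgrade checks are present in the working directory (a minimal sketch; adjust the path if the backups were stored elsewhere):

ls -lh repos_and_images.tar.gz master_node_state.tar.gz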

Results/Descriptions: Fuel is restored from backup.

Step 23.9: Enable Ubuntu and unlock the password.

Time Line: 3 minutes

Action: On Opssimple VM, as m96722 user execute the following


cd ~/aic
sudo apt-get update
sudo apt-get install aic-opssimple-plugins-security
ansible-playbook -i inventory/ playbooks/revert_disable_ubuntu_login.yml

Results/Descriptions: Verify if the playbook is executed successfully

24. POST Activities (during maintenance window)

To comply with RIM and audit findings, every MOP must include steps to remove backups and
artifacts created during the deployment of that MOP. The removal may take place at a later date to
allow for potential backouts. Any such artifacts must be listed in the Post Change Activities section,
together with instructions for scheduling their future removal.

Any backups and artifacts created during MOP execution that will not be needed for backout
should be removed in the POST implementation activities section, executed before the end of the
change window.

Permissions for OpenStack configuration file backups should be set to 640, and other files to 640 or
600. The rule of least privilege applies, but the DE must retain access to the files for restore/backout
purposes. The owning group should not be one that non-SA and non-DE users belong to.
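
As an illustration of the intent above (hypothetical path and group names; adapt to the actual backup artifacts and to a group containing only SA/DE users):

chmod 640 /var/tmp/<backup_artifact>
chown root:<sa_de_group> /var/tmp/<backup_artifact>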

NOTE: All terminal input/output must be logged during the change.

Check 24.1: Perform post upgrade verification

Time Line: 1 hour.

Action: Execute the post-upgrade checklist - https://codecloud.web.att.com/projects/ST_CCP/repos/aic-docs/browse/docs/dg/common/postupgradeverification.md?at=refs%2Fheads%2F3.0.3_RC35_stable

Results/Descriptions: All post-upgrade checks are successful.

Check 24.2: Check Nodes Status

Time Line: 5 minutes.


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor
fuel node shows "ready". No nodes in discover or error. Online field should be 1.

Results/Descriptions: Commands executed without errors and status is ready

Check 24.3: Check Fuel Environment

Time Line: 5 minutes.


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor
fuel env shows the environment "operational" (dummy_env can be ignored)

Results/Descriptions: Commands executed without errors and environment operational

Check 24.4: Perform Contrail verification


Time Line: 1 hour.

Action: Execute contrail checklist - https://att.sharepoint.com/:w:/r/sites/NCOMOPS/MOP%20Document%20Library/AIC-MOP-195_Contrail_Control_Plane_Generic_checks.docx?d=w6469387ef976478ab2ea94cbdc1652dd&csf=1&web=1&e=rhhr6W

Results/Descriptions: All post-upgrade checks are successful.

Check 24.5: Check if the ruby package is upgraded

Time Line: 5 minutes.


Action: Validate that the ruby package is upgraded to 2.0.0.484-1ubuntu2.13.

On OpsSimple VM:
cd /home/m96722/aic
ansible all -i inventory/openstack -m shell -sa " dpkg -l | grep ruby2"
Results/Descriptions: All the OpenStack nodes should be upgraded to version 2.0.0.484-1ubuntu2.13

Check 24.6: Cleanup upgrade related artifacts

Time Line: 5 minutes.

Action: Remove upgrade related artifacts created on OpsSimple, Fuel and Contrail Controller VMs.
Connect to OpsSimple node with your personal attuid ssh {opsc IP} and become m96722 using sudo -iu m96722, then execute:
rm -rf /var/tmp/xml_dump/
rm -rf /var/tmp/aic_opsfix_*/
Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor , then
execute:
rm -rf /var/log/fuelbackuppre/nailgun.dump.gz
From OpsSimple host as m96722, Connect to one of the Contrail Controller VM and become root
using toor , then execute:
rm -rf /var/tmp/contrailbackuppre

Results/Descriptions: Commands executed without errors

Check 24.7: Request storage operations to backup of all vLCP LUNs (Large Sites Only)

Time Line: 1 hour.

Action: Submit request to backup of all vLCP LUNs in site.


For Production: AIC STORAGE OPERATIONS SUPPORT: http://ushportal.it.att.com/step2.cfm?
app=3962&home=ush

Select AIC SNAPSHOT BACKUP & RESTORE and then fill out the information requested by the
form.

For Labs and SIL: http://labticketing.web.att.com/LabTicketing/(S(mmrztkbhhyshix45mbp0tmqk))/AicTrouble.aspx?Lep=0

Check 24.8: Backup xml of vms on the kvms

Time Line: 30 minutes.

Action: On OpsSimpleVM execute -

ssh <attuid>@<opscvm>

sudo su - m96722 OR sudo -iu m96722

cd /home/m96722/aic/

ansible-playbook -i inventory/ playbooks/infraxml_dump.yml

Results/Descriptions: Commands executed without errors

Check 24.9: Delete Scheduled Downtime to Enable Nagios alerts

NOTE: Repeat steps for both nagios01 and nagios02.

Time Line: 5 minutes.

Action: Connect to the Opssimple node using your personal attuid, switch to m96722, and then connect
to one of the Nagios hosts and become root
ssh {opsc IP}
sudo -iu m96722
ssh {nagios IP}
toor

Time Line: 1 hour.

Action: Re-enable the Nagios dashboard

nagios-del-downtime-host.sh [HOST exactly as in the cfg file for nagios]

Example -

/usr/local/bin/nagios-del-downtime-zone.sh pdk5

Note: After enabling Nagios alerts, notify the Release Management Chat via the Q chat specified in
Section 19.
Results/Descriptions: Commands executed without errors

25. Post Maintenance Work

NOTE: All terminal input/output must be logged during the change.

Check 25.1: Remove update graphs.

Time Line: 5 minutes.


Action: Connect to Fuel node with your personal attuid ssh {fuel IP} and become root using toor and
execute:
fuel2 graph delete --env $envid --type release_candidate
fuel2 graph delete --env $envid --type start_puppet
fuel2 graph delete --env $envid --type stop_puppet
fuel2 graph delete --env $envid --type update_config_db
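
In the commands above, $envid is the Fuel environment ID. If it is not already set in the shell, it can be taken from the environment listing first (a minimal sketch, assuming a single operational environment):

fuel env          # note the id of the operational environment
envid=<id from the output above>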

Results/Descriptions: Commands executed without errors
