
AQS deployment guide
Closed, ResolvedPublic

Description

The goal here is to create a step-by-step deployment guide for all AQS services. The guide(s) should include steps that are applicable to each of the AQS services.

Acceptance Criteria
Hold an internal review session with engineers and managers
Request SRE to review the guide

Details

Other Assignee
SGupta-WMF

Event Timeline

I have a few comments/suggestions for the deployment guide. I'm sorry that I explained things poorly in my first review; in some cases I'm repeating similar points here:

  • I was wondering whether https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/AQS#Deployment should be the place for this guide. At the moment that section holds a guide for deploying AQS 1.
  • About the version formats:
    • In practice we always use the production variant for both environments, staging and production, for all AQS services that reside on Gerrit. As the guide currently reads, the variant looks like a parameter that depends on the target environment, but it should always be production. The only parameter to keep in mind is the date, which must match the image that was built for the merged MR.
    • Instead of encouraging readers to look for the Docker image in Wikimedia's Docker Registry, I would point to the pipeline output (both the Gerrit and GitLab pipelines) as the place to find the image you want to deploy. The Docker Registry takes 4 hours to update its information about available images but, in reality, the image is available as soon as it is built. There is no need to wait until it is listed in the Docker Registry.
  • I would link a few sample MRs to point out the file that needs to be updated with the new version being deployed. When someone does this for the first time, finding the right path and file is not trivial. That file differs depending on the environment you want to deploy to. If you want to deploy only to staging to try something, you have to change the image version in the value-staging.yaml file, and you should remove that change after your test. In other words, the process differs slightly depending on whether your goal is to deploy to production or only to staging to test something. Even if you have some experience deploying things, the path to the values file can vary a bit and is not trivial to find. Apparently, different Kubernetes clusters organize things in different ways.
  • I have recently learned there are a couple of aliases (deployment.eqiad.wmnet and deployment.codfw.wmnet) for connecting to the deployment servers over ssh. We could use them to refer to the deployment servers.
  • I would add something about how to see the pod logs after the Check Pods phase. It is useful to be able to confirm that your service has started properly, and to debug when you need to see something specific after a sample request (especially in the staging environment).
  • I would also add a quick start section that lists only the commands you need to deploy the service. In my case, I tend to check them almost every time I have to deploy something. For those cases the full guide is not needed, because you already understand the process but want to be sure you are running the right command before deploying anything. Someone created something similar in the ops week documentation for deploying AQS before the automation was done: https://wikitech.wikimedia.org/wiki/Data_Platform_Engineering/Ops_week#Is_there_a_new_Mediawiki_History_snapshot_ready?_(beginning_of_the_month). That page has a quick reference that is really useful for remembering the right commands.
  • We should keep in mind that the staging URLs for device-analytics are different from those of the rest of the services. The base URL and port are not the same. We should explain that where the guide mentions that sample requests can be performed against staging before deploying to production.
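
To illustrate the point above about version formats, here is a minimal sketch of the rule "always the production variant, only the build date changes". It assumes image tags take the form `<build timestamp>-<variant>`; that format, the timestamp layout, and the function names `image_tag` and `tag_for` are all hypothetical, so the real tag should be copied from the pipeline output rather than generated.

```python
from datetime import datetime

# Assumption: the variant suffix is the same for every environment.
VARIANT = "production"


def image_tag(build_time: datetime, variant: str = VARIANT) -> str:
    """Compose a tag as '<timestamp>-<variant>' (assumed, illustrative format)."""
    return f"{build_time:%Y-%m-%d-%H%M%S}-{variant}"


def tag_for(environment: str, build_time: datetime) -> str:
    """Return the tag to deploy to the given environment.

    The environment is validated but deliberately does not influence the
    variant: staging and production both receive the production variant.
    """
    if environment not in {"staging", "production"}:
        raise ValueError(f"unknown environment: {environment}")
    return image_tag(build_time)


if __name__ == "__main__":
    built = datetime(2023, 8, 15, 12, 34, 56)  # made-up build time
    print(tag_for("staging", built))
    print(tag_for("production", built))
```

Both calls return the same tag, which is the behaviour the guide should describe: the environment selects which values file is edited, not which image variant is used.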

Nice work! I've reviewed https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/AQS/Deployment and made a few formatting changes to improve readability. +1 to Sfaci's comments above. I agree that this content should be moved to https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/AQS#Deployment once complete, replacing the deployment instructions for AQS 1.0.

Thanks @apaskulin and @Sfaci for the review. I am looking at these comments and will post another version for your review soon.

Hi @Sfaci, I made some changes based on my knowledge and understanding. There might be some minor things you want to add; please feel free to add them and publish. Once that is done I will move this to https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/AQS#Deployment.
Thanks!

I think my review and contributions are already done. @SGupta-WMF can you take a look at it?

I have been working with https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/AQS/Deployment

Thanks @Sfaci for your valuable additions. I just made some changes to the language and published the page. Tagging @apaskulin for a final review so I can move the page to https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/AQS#Deployment.

Milimetric updated Other Assignee, added: SGupta-WMF.

Great work! I especially like the quick start section. I've edited the page to apply some language and formatting improvements. This is now ready to go! Keep in mind that you'll need to increase all the heading levels by one when moving the content to the main AQS page.