Eqiad: 1 VM request for Peek (PM service in use by Security Team)
Closed, Resolved, Public

Description

Cloud VPS Project Tested: security-tools

https://wikitech.wikimedia.org/wiki/Security/Peek
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594993/
https://gerrit.wikimedia.org/r/#/projects/wikimedia/security/tooling/peek,dashboards/default

Site/Location: EQIAD
Number of systems: 1
Service: peek (project management)
Networking Requirements: internal IP
Processor Requirements: 1
Memory: 2G
Disks: 20G+ (10G would also work, but 20G would be nice)
Other Requirements:

Event Timeline

Marostegui triaged this task as Medium priority. May 12 2020, 5:23 AM
Marostegui added a project: serviceops.

@chasemp Did you want peek1001 or should we rather use something a bit more generic like sectool1001?

Will it possibly host other tools in the future, or is it really just dedicated to Peek?

Why does this need a dedicated VM, though? If it simply sends some notifications triggered by cron jobs, running them from mwmaint1002 seems fine. (Despite the MediaWiki-specific name, mwmaint1002 also hosts various other maintenance profiles, such as profile::mariadb::maintenance and profile::openldap::management.)

Reasons I would put this on its own VM:

  • Right now there are two scheduled jobs, but the core of the tool is Jinja2 templates, since we are planning a web service. More components are still to come.
  • The secrets here give access to a wide variety of sensitive content on a couple of backends. People deciding on mwmaint sudo access are unlikely to have these backends in mind, and leaking sensitive information from one of these backends in particular would be damaging.
  • Arguably, mwmaint shouldn't be running those LDAP and MariaDB jobs either, for the same reason.
  • My understanding is that we have been moving away from overloaded, generic-role hosts, given how little effort VMs take to deploy and manage. I'm not sure whether that paradigm still stands.
  • Until the web service is more fleshed out, the best option would be k8s cron jobs, but I don't believe those are ready in production yet.

My preference is peek1001, with the resource request kept tight, but whatever works. If overloading mwmaint is the better option, let me know.

Dzahn changed the task status from Open to Stalled. May 21 2020, 8:04 AM
Dzahn removed Dzahn as the assignee of this task.
Dzahn subscribed.

Giving it back to the pool and setting it to Stalled because of the ongoing discussion about whether this should be on a dedicated VM or on mwmaint.

@MoritzMuehlenhoff could you revisit this when you have a minute? I'd like to get this off my plate; @wiki_willy and I have been waiting on it to do some coordination.

@chasemp I think it's a little overblown, but if it helps unblock existing tests, feel free to go ahead. Our Ganeti capacity in eqiad is exhausted, though: you'll have to wait until the new servers are fully set up, or you can set up the instance as peek2001 (the DC shouldn't matter for your use case).

Fair enough, thanks for the wiggle room.

@Dzahn do you feel comfortable picking this back up on the codfw Ganeti cluster?

@chasemp Yes, I can create the VM, but please add your new cluster name, description and contact to https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Servers

Change 598966 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] introduce peek2001.codfw.wmnet

https://gerrit.wikimedia.org/r/598966

Change 598966 merged by Dzahn:
[operations/dns@master] introduce peek2001.codfw.wmnet

https://gerrit.wikimedia.org/r/598966

Ah, wait: I was about to create it and had already added peek2001.codfw.wmnet to DNS, but then noticed the request asks for an external IP. So does this really need to be peek2001.wikimedia.org? (Why, though?)

Internal is fine; we can always expose it through the standard layers. Apologies for the mix-up.

Creating VM peek2001.codfw.wmnet in cluster ganeti01.svc.codfw.wmnet with row=B vcpus=1 memory=2GB disk=20GB link=private. This may take a few minutes.

Dzahn changed the task status from Stalled to Open. May 27 2020, 1:07 PM

cheers

Change 599016 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/DHCP/partman: add peek2001.codfw.wmnet

https://gerrit.wikimedia.org/r/599016

Change 599016 merged by Dzahn:
[operations/puppet@production] site/DHCP/partman: add peek2001.codfw.wmnet

https://gerrit.wikimedia.org/r/599016

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: peek2001.codfw.wmnet

  • peek2001.codfw.wmnet (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed

ERROR: some step on some host failed, check the bolded items above

Change 599035 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: update MAC for peek2001

https://gerrit.wikimedia.org/r/599035

Change 599035 merged by Dzahn:
[operations/puppet@production] DHCP: update MAC for peek2001

https://gerrit.wikimedia.org/r/599035

@chasemp The VM has been created, and I installed the OS and signed the Puppet cert request. It is in site.pp with the "insetup" role. The initial Puppet run is in progress and I've run out of time, but you should be able to SSH to it in a couple of minutes. You can start creating your Puppet role and applying it in site.pp.

chasemp moved this task from Waiting to Our Part Is Done on the Security-Team board.

Sweet, thanks man. I'll take it from here in T251784, no worries.

Alright, done. Your SSH user exists now.

Change 602598 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for peek2001

https://gerrit.wikimedia.org/r/602598

Change 602598 merged by Dzahn:
[operations/dns@master] add IPv6 records for peek2001

https://gerrit.wikimedia.org/r/602598