[tor-project] reminder: bullseye upgrades batch coming next week
Antoine Beaupré
anarcat at torproject.org
Wed Apr 27 15:25:16 UTC 2022
Reminder: the bullseye upgrade run is continuing in May.
We are therefore probably going to resume upgrades of the rest of the
cluster *next week*. The machines in this batch are:
> bacula-director-01
> bungei
> carinatum
> check-01
> crm-ext-01
> crm-int-01
> fallax
> gettor-01
> gitlab-02
> henryi
> majus
> mandos-01
> materculae
> meronense
> neriniflorum
> nevii
> onionbalance-01
> onionbalance-02
> onionoo-backend-01
> onionoo-backend-02
> onionoo-frontend-01
> onionoo-frontend-02
> polyanthum
> rude
> staticiforme
> subnotabile
If you have any concern about those servers being upgraded, do let us
know.
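As a rough sanity check on the size of this batch, the RFC quoted below budgets a worst case of 45 minutes per machine, with three people doing the work. Here is a minimal sketch of that arithmetic, assuming the work splits evenly between people. The function name and the even split are illustrative assumptions, not part of the RFC.

```python
def estimate_hours(machines, minutes_per_machine=45, people=3):
    """Return (total person-hours, hours per person) for one batch.

    Assumes every machine takes the worst-case time and that the
    work divides evenly between people (an illustrative simplification).
    """
    total_hours = machines * minutes_per_machine / 60
    return total_hours, total_hours / people


# Batch 2 (May) lists 26 machines.
total, per_person = estimate_hours(26)
print(f"{total:.1f} person-hours, about {per_person:.1f} hours each")
```

For 26 machines this gives 19.5 person-hours, or about 6.5 hours each, consistent with the RFC's "another day of work, at three people".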
A copy of the original RFC follows.
Thanks!
a.
--
Antoine Beaupré
torproject.org system administration
On 2022-03-24 16:35:34, Antoine Beaupré wrote:
> Note: this proposal is also visible in:
>
> https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-20-bullseye-upgrades
>
> Summary: bullseye upgrades will roll out starting the first weeks of
> April and May, and should complete before the end of August 2022. Let
> us know if your service requires special handling.
>
> # Background
>
> Debian 11 [bullseye][] was [released on August 14 2021][]. Tor
> started the upgrade to bullseye shortly after and hopes to complete
> the process before the [buster][] EOL, [one year after the stable
> release][], so normally around August 2022.
>
> In other words, we have until this summer to upgrade *all* of TPA's
> machines to the new release.
>
> New machines that were set up recently have already been installed with
> bullseye, as the installers were changed shortly after the release. A
> few machines were upgraded manually without any ill effects and we do
> not consider this upgrade to be risky or dangerous, in general.
>
> This work is part of the [%Debian 11 bullseye upgrade milestone][],
> itself part of the [OKR 2022 Q1/Q2 plan][].
>
> # Proposal
>
> The proposal, broadly speaking, is to upgrade all servers in three
> batches. The first two are somewhat equally sized and spread over
> April and May, and the rest will happen at some time that will be
> announced later, individually, per server.
>
> ## Affected users
>
> All service admins are affected by this change. If you have shell
> access on any TPA server, you want to read this announcement.
>
> ## Upgrade schedule
>
> The upgrade is split into multiple batches:
>
> * low complexity (mostly TPA): April
> * moderate complexity (service admins): May
> * high complexity (hard stuff): to be announced separately
> * to be retired or rebuilt servers: not upgraded
> * already completed upgrades
>
> The free time between the first two will also allow us to cover for
> unplanned contingencies: upgrades that could drag on and other work
> that will inevitably need to be performed.
>
> The objective is to do the batches in collective "upgrade parties"
> that should be "fun" for the team (and work parties *have* generally
> been fun in the past).
>
> ### Low complexity, batch 1: April
>
> A first batch of servers will be upgraded in the first week of April.
>
> Those machines are considered to be somewhat trivial to upgrade,
> either because they are mostly managed by TPA or because we expect the
> upgrade to have minimal impact on the service's users.
>
> ```
> archive-01
> build-x86-05
> build-x86-06
> chi-node-12
> chi-node-13
> chives
> ci-runner-01
> ci-runner-arm64-02
> dangerzone-01
> hetzner-hel1-02
> hetzner-hel1-03
> hetzner-nbg1-01
> hetzner-nbg1-02
> loghost01
> media-01
> metrics-store-01
> perdulce
> static-master-fsn
> submit-01
> tb-build-01
> tb-build-03
> tb-tester-01
> tbb-nightlies-master
> web-chi-03
> web-cymru-01
> web-fsn-01
> web-fsn-02
> ```
>
> 27 machines. At a worst case of 45 minutes per machine, that is about
> 20 hours of work. With three people, this might be doable in a day.
>
> Feedback and coordination of this batch happens in issue
> [tpo/tpa/team#40690][].
>
> ### Moderate complexity, batch 2: May
>
> The second batch of "moderate complexity servers" happens in the first
> week of May. The main difference with the first batch is that the second
> batch regroups services mostly managed by service admins, who are given
> a longer heads up before the upgrades are done.
>
> ```
> bacula-director-01
> bungei
> carinatum
> check-01
> crm-ext-01
> crm-int-01
> fallax
> gettor-01
> gitlab-02
> henryi
> majus
> mandos-01
> materculae
> meronense
> neriniflorum
> nevii
> onionbalance-01
> onionbalance-02
> onionoo-backend-01
> onionoo-backend-02
> onionoo-frontend-01
> onionoo-frontend-02
> polyanthum
> rude
> staticiforme
> subnotabile
> ```
>
> 26 machines. If the worst case scenario holds, this is another day of
> work, with three people.
>
> Not mentioned here is the `gnt-fsn` Ganeti cluster upgrade, which is
> covered by ticket [tpo/tpa/team#40689][]. That alone could be a few
> person-days of work.
>
> Feedback and coordination of this batch happens in issue [tpo/tpa/team#40692][].
>
> ### High complexity, individually done
>
> Those machines are harder to upgrade, due to some major upgrades of
> their core components, and will require individual attention, if not
> major work to upgrade.
>
> ```
> alberti
> eugeni
> hetzner-hel1-01
> pauli
> ```
>
> Each machine could take a week or two to upgrade, depending on the
> situation and severity. To detail each server:
>
> * `alberti`: `userdir-ldap` is, in general, risky and needs special
> attention, but should be moderately safe to upgrade, see ticket
> [tpo/tpa/team#40693][]
> * `eugeni`: messy server, with lots of moving parts (e.g. Schleuder,
> Mailman), Mailman 2 EOL, needs to decide whether to migrate to
> Mailman 3 or replace with Discourse (and self-host), see
> [tpo/tpa/team#40471][], followup in [tpo/tpa/team#40694][]
> * `hetzner-hel1-01`: Nagios AKA Icinga 1 is end-of-life and needs to
> be migrated to Icinga 2, which involves fixing our git hooks to
> generate Icinga 2 configuration (unlikely), or rebuilding an Icinga
> 2 server, or replacing with Prometheus (see
> [tpo/tpa/team#29864][]), followup in [tpo/tpa/team#40695][]
> * `pauli`: Puppet packages are severely out of date in Debian, and
> Puppet 5 is EOL (with Puppet 6 soon to be). This doesn't necessarily
> block the upgrade, but we should deal with this problem sooner rather
> than later, see [tpo/tpa/team#33588][], followup in [tpo/tpa/team#40696][]
>
> All of those require individual decisions and design, and specific
> announcements will be made for upgrades once a decision has been made
> for each service.
>
> ### To retire
>
> Those servers are possibly scheduled for removal and may not be
> upgraded to bullseye at all. If we miss the summer deadline, they
> might be upgraded as a last resort.
>
> ```
> cupani
> gayi
> moly
> peninsulare
> vineale
> ```
>
> Specifically:
>
> * cupani/vineale is covered by [tpo/tpa/team#40472][]
> * gayi is [TPA-RFC-11: SVN retirement][], [tpo/tpa/team#17202][]
> * moly/peninsulare is [tpo/tpa/team#29974][]
>
> ### To rebuild
>
> Those machines are planned to be rebuilt and should therefore not be
> upgraded either:
>
> ```
> cdn-backend-sunet-01
> colchicifolium
> corsicum
> nutans
> ```
>
> Some of those machines are hosted at Sunet and need to be migrated
> elsewhere, see [tpo/tpa/team#40684][] for details. `colchicifolium` is
> planned to be rebuilt in the `gnt-chi` cluster, no ticket created
> yet.
>
> They will be rebuilt as new bullseye machines, which should allow for a
> safer transition that shouldn't require specific coordination or
> planning.
>
> ### Completed upgrades
>
> Those machines have already been upgraded to (or installed as) Debian
> 11 bullseye:
>
> ```
> btcpayserver-02
> chi-node-01
> chi-node-02
> chi-node-03
> chi-node-04
> chi-node-05
> chi-node-06
> chi-node-07
> chi-node-08
> chi-node-09
> chi-node-10
> chi-node-11
> chi-node-14
> ci-runner-x86-05
> palmeri
> relay-01
> static-gitlab-shim
> tb-pkgstage-01
> ```
>
> ### Other related work
>
> There is other work related to the bullseye upgrade that is mentioned
> in the [%Debian 11 bullseye upgrade milestone][].
>
> # Alternatives considered
>
> We have not set aside time to automate the upgrade procedure any
> further at this stage, as this is considered to be too risky a
> development project, and the current procedure is fast enough for
> now.
>
> We could also move to the cloud, Kubernetes, serverless, and Ethereum
> and pretend none of those things exist, but so far we stay in the real
> world of operating systems.
>
> Also note that this doesn't cover Docker container image
> upgrades. Each team is responsible for upgrading their image tags in
> GitLab CI appropriately and is *strongly* encouraged to keep a close
> eye on those in general. We may eventually consider enforcing stricter
> control over container images if this proves to be too chaotic to
> self-manage.
>
> # Costs
>
> It is estimated this will take one or two person-months to complete, full
> time.
>
> # Approvals required
>
> This proposal needs approval from TPA team members, but service admins
> can request additional delay if they are worried about their service
> being affected by the upgrade.
>
> Comments or feedback can be provided in issues linked above.
>
> # Deadline
>
> Upgrades will start in the first week of April 2022 (2022-04-04)
> unless an objection is raised.
>
> This proposal will be considered adopted by then unless an objection
> is raised within TPA.
>
> # Status
>
> This proposal is currently in the `proposed` state.
>
> # References
>
> * [TPA bullseye upgrade procedure][]
> * [%Debian 11 bullseye upgrade milestone][]
>
> [TPA bullseye upgrade procedure]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/upgrades/bullseye/
> [%Debian 11 bullseye upgrade milestone]: https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/5
> [bullseye]: https://wiki.debian.org/DebianBullseye
> [released on August 14 2021]: https://www.debian.org/News/2021/20210814
> [buster]: howto/upgrades/buster
> [one year after the stable release]: https://www.debian.org/security/faq#lifespan
> [OKR 2022 Q1/Q2 plan]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2022
> [tpo/tpa/team#40690]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40690
> [tpo/tpa/team#40692]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40692
> [tpo/tpa/team#40693]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40693
> [tpo/tpa/team#40471]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40471
> [tpo/tpa/team#29864]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864
> [tpo/tpa/team#33588]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/33588
> [tpo/tpa/team#40684]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40684
> [tpo/tpa/team#40694]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40694
> [tpo/tpa/team#40695]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40695
> [tpo/tpa/team#40696]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40696
> [tpo/tpa/team#40472]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40472
> [tpo/tpa/team#17202]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/17202
> [TPA-RFC-11: SVN retirement]: policy/tpa-rfc-11-svn-retirement
> [tpo/tpa/team#29974]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/29974
> [tpo/tpa/team#40689]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40689
>
> --
> Antoine Beaupré
> torproject.org system administration