[tor-project] TPA-RFC-71: Emergency email deployments, phase B
Antoine Beaupré
anarcat at torproject.org
Wed Oct 2 15:36:52 UTC 2024
Hi again,
It looks like some Thunderbird users couldn't read the attachment, so
here's a resend that flattens the email and should be more readable.
The proposal is also visible at:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-71-emergency-email-deployments-round-2
with the milestone tracking actual work issues in:
https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/16
HTH,
A.
--
Antoine Beaupré
torproject.org system administration
On 2024-10-02 11:27:35, Antoine Beaupré wrote:
> Hi,
>
> So TPA has adopted this proposal, internally, to make yet another set of
> emergency changes to our mail system, to respond to critical issues
> affecting delivery and sustainability of our infrastructure.
>
> I encourage you to read the "Affected users" section and "Timeline"
> below. In particular, we will be experimenting with "sender rewriting"
> soon, which will involve mangling emails we forward around to try and
> fix deliverability on those.
>
> The schleuder mailing list will also move servers.
>
> Maintenance windows for those changes will be communicated separately.
>
> Thank you for your attention!
>
> PS: and no, we didn't submit this for adoption to everyone, because it
> was felt it was mostly technical changes that didn't warrant outside
> approval, let me know if that doesn't make sense, of course.
>
> --
> Antoine Beaupré
> torproject.org system administration
>
> From: Antoine Beaupré via tpa-team <tpa-team at lists.torproject.org>
> Subject: [tpa-team] TPA-RFC-71: Emergency email deployments, phase B
> To: tpa-team at lists.torproject.org
> Cc: micah anderson <micah at torproject.org>
> Date: Thu, 26 Sep 2024 16:09:20 -0400
>
> ---
> title: TPA-RFC-71: Emergency email deployments, phase B
> costs: staff
> approval: TPA
> affected users: all torproject.org email users
> deadline: 5 days, 2024-10-01
> status: draft
> discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41778
> ---
>
> Summary: deploy a new sender-rewriting mail forwarder ASAP, migrate
> mailing lists off the legacy server to a new machine, migrate the
> remaining Schleuder list to the Tails server, upgrade `eugeni`.
>
> Table of contents:
>
> - Background
> - Proposal
>   - Actual changes
>     - Mailman 3 upgrade
>     - New sender-rewriting mail exchanger
>     - Schleuder migration
>     - Upgrade legacy mail server
>   - Goals
>     - Must have
>     - Nice to have
>     - Non-Goals
>   - Scope
>   - Affected users
>     - Personas
>   - Timeline
>     - Optimistic timeline
>     - Worst case scenario
> - Alternatives considered
> - References
>   - History
>   - Personas descriptions
>     - Ariel, the fundraiser
>     - Blipblop, the bot
>     - Gary, the support guy
>     - John, the contractor
>     - Mallory, the director
>     - Nancy, the fancy sysadmin
>     - Orpheus, the developer
>
> # Background
>
> In [#41773][], we had yet another report of issues with mail delivery,
> particularly with email forwards, that are plaguing Gmail-backed
> aliases like grants@ and travel@.
>
> [#41773]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41773
>
> This is becoming critical. It has been impeding people's ability to
> use their email at work for a while, but it has become more acute since
> Google's recent changes in email validation (see [#41399][]), as
> hosts that have adopted the SPF/DKIM rules are now bouncing.
>
> [#41399]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41399
>
> On top of that, we're way behind on our buster upgrade schedule. We
> still have to upgrade our primary mail server, `eugeni`. The plan for
> that ([TPA-RFC-45][], [#41009][]) was basically to re-architect
> everything. That won't happen fast enough for the LTS retirement, which
> we already passed two months ago (in July 2024).
>
> [TPA-RFC-45]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-45-mail-architecture
> [#41009]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41009
>
> So, in essence, our main mail server is unsupported now, and we need
> to fix this as soon as possible.
>
> Finally, we also have problems with certain servers (e.g. `state.gov`)
> that seem to dislike our bespoke certificate authority (CA) which
> makes *receiving* mails difficult for us.
>
> # Proposal
>
> So those are the main problems to fix:
>
> - Email forwarding is broken
> - Email reception is unreliable over TLS for some servers
> - Mail server is out of date and hard to upgrade (mostly because of
> Mailman)
>
> ## Actual changes
>
> The proposed solution is:
>
> - **Mailman 3 upgrade** ([#40471][])
>
> - **New sender-rewriting mail exchanger** ([#40987][])
>
> - **Schleuder migration**
>
> - **Upgrade legacy mail server** ([#40694][])
>
> [#40471]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40471
> [#40987]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40987
> [#40694]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40694
>
> ### Mailman 3 upgrade
>
> Build a new mailing list server to host the upgraded Mailman 3
> service. Move old lists over and convert them while retaining the old
> archives available for posterity.
>
> This includes lots of URL changes and user-visible disruption; little
> can be done to work around that necessary change. We'll do our best to
> come up with redirections and rewrite rules, but ultimately this is a
> disruptive change.
>
> This involves yet another authentication system being rolled out, as
> Mailman 3 has its own user database, just like Mailman 2. At least
> it's one user per site, instead of per list, so it's a slight
> improvement.
>
> This is issue [#40471][].
>
> ### New sender-rewriting mail exchanger
>
> This step is carried over from [TPA-RFC-45][], mostly unchanged.
>
> [Sender Rewriting Scheme]: https://en.wikipedia.org/wiki/Sender_Rewriting_Scheme
> [postsrsd]: https://github.com/roehling/postsrsd
> [postforward]: https://github.com/zoni/postforward
>
> Configure a new "mail exchanger" (MX) server with TLS certificates
> signed by our normal public CA (Let's Encrypt). This replaces that
> part of `eugeni` and will hopefully resolve issues with `state.gov` and
> others ([#41073][], [#41287][], [#40202][], [#33413][]).
>
> [#33413]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/33413
> [#40202]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40202
> [#41287]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41287
> [#41073]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41073
>
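One way to check whether a mail exchanger presents a certificate that validates against the public CA bundle is sketched below. This is only an illustration: the function names are made up for this example, and the host name in the usage note is a placeholder, not a real server from this proposal.

```python
import smtplib
import ssl


def public_ca_context() -> ssl.SSLContext:
    """Return a TLS context that trusts only the system's public CA bundle
    and checks that the certificate matches the host name."""
    return ssl.create_default_context()


def mx_cert_is_publicly_trusted(host: str, port: int = 25,
                                timeout: float = 10.0) -> bool:
    """Connect over SMTP, issue STARTTLS, and report whether the certificate
    validates. Other failures (no STARTTLS support, network errors) are
    deliberately left to propagate to the caller."""
    smtp = smtplib.SMTP(host, port, timeout=timeout)
    try:
        smtp.starttls(context=public_ca_context())
        return True
    except ssl.SSLCertVerificationError:
        return False
    finally:
        smtp.close()
```

Calling `mx_cert_is_publicly_trusted("mx.example.org")` (placeholder host) would return `False` for a server using a private CA that is not in the system trust store, which is roughly what a strict remote sender would see.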
> This would handle forwarding mail to other services (e.g. mailing
> lists) but also end-users.
>
> To work around reputation problems with forwards ([#40632][],
> [#41524][], [#41773][]), deploy a [Sender Rewriting Scheme][] (SRS)
> with [postsrsd][] (packaged in Debian, but [not in the best shape][])
> and [postforward][] (not packaged in Debian, but zero-dependency
> Golang program).
>
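To make the idea of sender rewriting concrete, here is a minimal sketch of an SRS0-style rewrite of the envelope sender and its reversal for bounces. It is an assumption-laden illustration, not what postsrsd does byte for byte: the secret, the function names, and the simplified tag computation are all hypothetical, and timestamp expiry checks are left out.

```python
import base64
import hashlib
import hmac
import time

# Assumptions for illustration only: the secret and helper names are
# hypothetical, and the tag format is simplified compared to postsrsd.
SECRET = b"example-secret"
FORWARDER_DOMAIN = "torproject.org"
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def _tag(stamp, domain, local):
    """Short keyed hash so only addresses we generated are accepted back."""
    msg = f"{stamp}={domain}={local}".lower().encode()
    digest = hmac.new(SECRET, msg, hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).decode()[:4]


def srs_forward(sender, now=None):
    """Rewrite an envelope sender so bounces return to the forwarder."""
    local, domain = sender.rsplit("@", 1)
    # Two base32 characters encode the day number modulo 1024.
    days = int((time.time() if now is None else now) // 86400) % 1024
    stamp = BASE32[days >> 5] + BASE32[days & 31]
    tag = _tag(stamp, domain, local)
    return f"SRS0={tag}={stamp}={domain}={local}@{FORWARDER_DOMAIN}"


def srs_reverse(address):
    """Recover the original sender from a rewritten address, or raise
    ValueError if the address is malformed or the tag does not match."""
    local_part, _forwarder = address.rsplit("@", 1)
    if not local_part.startswith("SRS0="):
        raise ValueError("not an SRS0 address")
    tag, stamp, domain, local = local_part[5:].split("=", 3)
    if not hmac.compare_digest(tag, _tag(stamp, domain, local)):
        raise ValueError("bad SRS tag")
    return f"{local}@{domain}"
```

With this scheme, a message from `alice@example.com` forwarded through the new exchanger would leave with an envelope sender of the form `SRS0=<tag>=<stamp>=example.com=alice@torproject.org`, so the receiving side evaluates SPF against the forwarder's domain rather than the original sender's.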
> It's possible deploying [ARC][] headers with [OpenARC][], Fastmail's
> [authentication milter][] (which [apparently works better][]), or
> [rspamd's arc module][] might be sufficient as well, to be tested.
>
> [OpenARC]: https://tracker.debian.org/pkg/openarc
>
> [not in the best shape]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017361
>
> Having it on a separate mail exchanger will make it easier to swap in
> and out of the infrastructure if problems occur.
>
> The mail exchangers should also sign outgoing mail with DKIM, and
> *may* start doing better validation of incoming mail.
>
> [authentication milter]: https://github.com/fastmail/authentication_milter
> [apparently works better]: https://old.reddit.com/r/postfix/comments/17bbhd2/about_arc/k5iluvn/
> [rspamd's arc module]: https://rspamd.com/doc/modules/arc.html
> [#41524]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41524
> [#40632]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40632
> [ARC]: http://arc-spec.org/
>
> ### Schleuder migration
>
> Migrate the one remaining mailing list (the Community Council) to the
> Tails Schleuder server, retiring our Schleuder server entirely.
>
> This requires configuring the Tails server to accept mail for
> `@torproject.org`.
>
> Note that this may require changing the addresses of the existing
> Tails list to `@torproject.org` if Schleuder doesn't support virtual
> hosting (which is likely).
>
> ### Upgrade legacy mail server
>
> Once Mailman has been safely moved aside and is shown to be working
> correctly, upgrade `eugeni` using the normal procedures. This should be
> a less disruptive upgrade, but is still risky because it's such an old
> box with lots of legacy.
>
> One key idea of this proposal is to keep the legacy mail server,
> `eugeni`, in place. It will continue handling the "MTA" (Mail Transfer
> Agent) work, which is to relay mail for other hosts, as a legacy
> system.
>
> The full `eugeni` replacement is seen as too complicated and unnecessary
> at this stage. The legacy server will be isolated from the rewriting
> forwarder so that outgoing mail is mostly unaffected by the forwarding
> changes.
>
> ## Goals
>
> This is not an exhaustive solution to all our email problems;
> [TPA-RFC-45][] is that longer-term project.
>
> ### Must have
>
> - Up to date, supported infrastructure.
>
> - Functional legacy email forwarding.
>
> ### Nice to have
>
> - Improve email forward deliverability to Gmail.
>
> ### Non-Goals
>
> - **Clean email forwarding**: email forwards *may* be mangled and
> rewritten to appear as coming from `@torproject.org` instead of the
> original address. This will be figured out at the implementation
> stage.
>
> - **Mailbox storage**: out of scope, see [TPA-RFC-45][]. It is hoped,
> however, that we *eventually* are able to provide such a service, as
> the sender-rewriting stuff might be too disruptive in the long run.
>
> - **Technical debt**: we keep the legacy mail server, `eugeni`.
>
> - **Improved monitoring**: we won't have a better view into how well we
> can deliver email.
>
> - **High availability**: the new servers will not add additional
> "single points of failure", but will not improve our availability
> situation (issue [#40604][])
>
> [#40604]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40604
>
> ## Scope
>
> This proposal affects all inbound and outbound email services
> hosted under `torproject.org`. Services hosted under `torproject.net`
> are *not* affected.
>
> It also does *not* directly address phishing and scamming attacks
> ([#40596][]), but it is hoped the new mail exchanger will provide
> a place where it is easier to make such improvements in the future.
>
> [#40596]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40596
>
> ## Affected users
>
> This affects all users who interact with `torproject.org` and its
> subdomains over email. It particularly affects all "tor-internal"
> users, users with LDAP accounts, or forwards under `@torproject.org`,
> as their mails will get rewritten on the way out.
>
> ### Personas
>
> Here we collect a few "personas" and try to see how the changes will
> affect them, largely derived from [TPA-RFC-45][], but without the
> alpha/beta/prod test groups.
>
> For *all* users, a common impact is that emails will be rewritten by
> the sender rewriting system. As mentioned above, the impact of this
> still remains to be clarified, but at least the hidden `Return-Path`
> header will be changed for bounces to go to our servers.
>
> Actual personas are in the References section, see [Personas
> descriptions][].
>
> | Persona | Task | Impact |
> |---------|-------------|--------------------------------------------------------------------------|
> | Ariel | Fundraising | Improved incoming delivery |
> | Blipblop | Bot | No change |
> | Gary | Support | Improved incoming delivery, new moderator account on mailing list server |
> | John | Contractor | Improved incoming delivery |
> | Mallory | Director | Same as Ariel |
> | Nancy | Sysadmin | No change in delivery, new moderator account on mailing list server |
> | Orpheus | Developer | No change in delivery |
>
> [Personas descriptions]: #personas-descriptions
>
> ## Timeline
>
> ### Optimistic timeline
>
> - Late September (W39): issue raised again, proposal drafted (now)
> - October:
>   - W40: proposal approved, installing new rewriting server
>   - W41: rewriting server deployment, new mailman 3 server
>   - W42: mailman 3 mailing list conversion tests, users required for testing
>   - W43: mailman 2 retirement, mailman 3 in production
>   - W44: Schleuder mailing list migration
> - November:
>   - W45: `eugeni` upgrade
>
> ### Worst case scenario
>
> - Late September (W39): issue raised again, proposal drafted (now)
> - October:
>   - W40: proposal approved, installing new rewriting server
>   - W41-W44: difficult rewriting server deployment
> - November:
>   - W44-W48: difficult mailman 3 mailing list conversion and testing
> - December:
>   - W49: Schleuder mailing list migration vetoed, Schleuder stays on
>     `eugeni`
>   - W50-W51: `eugeni` upgrade postponed to 2025
> - January 2025:
>   - W3: `eugeni` upgrade
>
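The week numbers in the timelines above read naturally as ISO 8601 weeks (an assumption, since the proposal does not say so explicitly). Under that assumption, a small helper such as the following, with a made-up name, converts a week number into calendar dates:

```python
from datetime import date, timedelta
from typing import Tuple


def iso_week_span(year: int, week: int) -> Tuple[date, date]:
    """Return the Monday and Sunday of the given ISO 8601 week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


if __name__ == "__main__":
    for year, week in [(2024, 39), (2024, 40), (2024, 45), (2025, 3)]:
        start, end = iso_week_span(year, week)
        print(f"{year}-W{week:02d}: {start} to {end}")
```

Read this way, W39 is the week of 2024-09-23 (when the proposal was drafted), W45 starts on 2024-11-04, and W3 of 2025 starts on 2025-01-13.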
> # Alternatives considered
>
> We decided to not just run the sender-rewriting on the legacy mail
> server because too many things are tangled up in that server. It is
> just too risky.
>
> We have also decided to not upgrade Mailman in place for the same
> reason: it's seen as too risky as well, because we'd first need to
> upgrade the Debian base system and if that fails, rolling back is too
> hard.
>
> # References
>
> - [discussion issue][]
>
> [discussion issue]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41778
>
> ## History
>
> This is the *fifth* proposal about our email services. Here are
> the previous ones:
>
> * [TPA-RFC-15: Email services][] (rejected, replaced with TPA-RFC-31)
> * [TPA-RFC-31: outsource email services][] (rejected, in favor of
> TPA-RFC-44 and following)
> * [TPA-RFC-44: Email emergency recovery, phase A][] (standard, and
> mostly implemented except the sender-rewriting)
> * [TPA-RFC-45: Mail architecture][] (still draft)
>
> [TPA-RFC-15: Email services]: policy/tpa-rfc-15-email-services
> [TPA-RFC-31: outsource email services]: policy/tpa-rfc-31-outsource-email
> [TPA-RFC-44: Email emergency recovery, phase A]: policy/tpa-rfc-44-email-emergency-recovery
> [TPA-RFC-45: Mail architecture]: policy/tpa-rfc-45-mail-architecture
>
> ## Personas descriptions
>
> ### Ariel, the fundraiser
>
> Ariel does a lot of mailing. From talking to fundraisers through
> their normal inbox to doing mass newsletters to thousands of people on
> CiviCRM, they get a lot done and make sure we have bread on the table
> at the end of the month. They're awesome and we want to make them
> happy.
>
> Email is absolutely mission critical for them. Sometimes email gets
> lost and that's a major problem. They frequently tell partners their
> personal Gmail account address to work around those problems. Sometimes
> they send individual emails through CiviCRM because it doesn't work
> through Gmail!
>
> Their email forwards to Google Mail and they now have an LDAP account
> to do email delivery.
>
> ### Blipblop, the bot
>
> Blipblop is not a real human being, it's a program that receives
> mails and acts on them. It can send you a list of bridges (bridgedb),
> or a copy of the Tor program (gettor), when requested. It has a
> brother bot called Nagios/Icinga who also sends unsolicited mail when
> things fail.
>
> There are also bots that send email when commits get pushed to some
> secret git repositories.
>
> ### Gary, the support guy
>
> Gary is the ticket overlord. He eats tickets for breakfast, then
> files 10 more before coffee. A hundred tickets is just a normal day at
> the office. Tickets come in through email, RT, Discourse, Telegram,
> Snapchat and soon, TikTok dances.
>
> Email is absolutely mission critical, but some days he wishes there
> could be slightly less of it. He deals with a lot of spam, and surely
> something could be done about that.
>
> His mail forwards to Riseup and he reads his mail over Thunderbird
> and sometimes webmail. Some time after TPA-RFC-44, Gary finally
> managed to get an OpenPGP key set up and TPA made him an LDAP account so
> he can use the submission server. He has already abandoned the Riseup
> webmail for TPO-related email, since it cannot relay mail through the
> submission server.
>
> ### John, the contractor
>
> John is a freelance contractor who's really into privacy. He runs his
> own relays with some cool hacks on Amazon, automatically deployed
> with Terraform. He typically runs his own infra in the cloud, but
> for email he just got tired of fighting and moved his stuff to
> Microsoft's Office 365 and Outlook.
>
> Email is important, but not absolutely mission critical. The
> submission server doesn't currently work because Outlook doesn't allow
> you to add just an SMTP server. John does have an LDAP account,
> however.
>
> ### Mallory, the director
>
> Mallory also does a lot of mailing. She's on about a dozen aliases
> and mailing lists from accounting to HR and other unfathomable
> things. She also deals with funders, job applicants, contractors,
> volunteers, and staff.
>
> Email is absolutely mission critical for her. She often fails to
> contact funders and critical partners because `state.gov` blocks our
> email -- or we block theirs! Sometimes, she gets told through LinkedIn
> that a job application failed, because mail bounced at Gmail.
>
> She has an LDAP account and it forwards to Gmail. She uses Apple Mail
> to read her mail.
>
> ### Nancy, the fancy sysadmin
>
> Nancy has all the elite skills in the world. She can configure a
> Postfix server with her left hand while her right hand writes the
> Puppet manifest for the Dovecot authentication backend. She browses
> her mail through a UUCP over SSH tunnel using mutt. She has run her own
> mail server in her basement since 1996.
>
> Email is a pain in the back and she kind of hates it, but she still
> believes everyone is entitled to run their own mail server.
>
> Her email is, of course, hosted on her own mail server, and she has
> an LDAP account. She has already reconfigured her Postfix server to
> relay mail through the submission servers.
>
> ### Orpheus, the developer
>
> Orpheus doesn't particularly like or dislike email, but sometimes has
> to use it to talk to people instead of compilers. They sometimes have
> to talk to funders (`#grantlyfe`), external researchers, teammates or
> other teams, and that often happens over email. Sometimes email is
> used to get important things like ticket updates from GitLab or
> security disclosures from third parties.
>
> They have an LDAP account and it forwards to their self-hosted mail
> server on an OVH virtual machine. They have already reconfigured their
> mail server to relay mail over SSH through the jump host, to the
> surprise of the TPA team.
>
> Email is not mission critical, and it's kind of nice when it goes
> down because they can get in the zone, but it should really be working
> eventually.
>
> --
> Antoine Beaupré
> torproject.org system administration
> --
> tpa-team mailing list
> tpa-team at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tpa-team
> _______________________________________________
> tor-project mailing list
> tor-project at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project