[tor-project] minutes from the sysadmin meeting

Antoine Beaupré anarcat at torproject.org
Thu Sep 9 17:35:24 UTC 2021


Hi!

I'm back from vacation, so we're meeting again!

Here are the minutes from the meeting held yesterday.

# Roll call: who's there and emergencies

anarcat, kez, lavamind, gaba

No emergencies.

# Milestones for TPA projects

Question: we're going to use the milestones functionality to sort large
projects in the roadmap. Which projects should go in there?

We're going to review the roadmap before finishing off the other items
on the checklist, if anything. Many of those are a little too vague to
have clear deadlines and objective tasks. But we agree that we want to
use milestones to track progress in the roadmap.

Milestones may be created outside of the TPA namespace if we believe
they will affect other projects (e.g. Jenkins). Milestones will be
linked from the Wiki page for tracking.

# Roadmap review

Quarterly roadmap review: review priorities of the [2021 roadmap][] to
establish *everything* that we will do this year. Hint: this will
require making hard choices and postponing a certain number of things
to 2022.

[2021 roadmap]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021

We did this in three stages:

 * Q3: what we did (or did not) do last quarter (and what we need to
   bring to Q4)
 * Q4: what we'll do in the final quarter
 * Must have: what we really need to do by the end of the year (really
   the same as Q4 at this point)

## Q3

We're reviewing Q3 first. Vacations and onboarding happened, and so
did making a plan for the blog.

Removed the "improve communications/monitoring" item: it's too vague
and we're not going to finish it off in Q4.

We kept the RT stuff, but moved it to Q4.

## Q4 review

 * blog migration is going well, we added the discourse forum as an item
   in the roadmap
 * the gitolite/gitweb retirement plan was removed from Q4, we're
   postponing to 2022
 * jenkins migration is going well. websites are the main blocker.
   anarcat is bottomlining it, jerome will help with the webhook
   stuff, migrating status.tpo and then blog.tpo
 * moving the [email submission server ticket][] to the end of the
   list, as it is less of a priority than the other things
 * we're not going to fix [btcpayserver hosting][] yet, but we'll need
   to [pay for it][]
 * kez' projects were not listed in the roadmap so we've added them:
   * donate react.js rewrite
   * rewrite bridges.torproject.org templates as part of Sponsor 30's
     project

[email submission server ticket]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/30608
[btcpayserver hosting]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/33750
[pay for it]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40303

## Must have review

 * **email delivery improvements**: postponed to 2022, in general, and
   will need a tighter/clearer plan, including [mail standards][]
   * we keep that at the top of the list, "continued email
     improvements", next year
 * **service retirements**: SVN/fpcentral will be retired!
 * **scale GitLab with ongoing and surely expanding usage**. this
   happened:
    * we resized the VM (twice?) and provided more runners, including
      the huge shadow runner
    * we can deploy runners with very specific docker configurations
    * we discussed implementing a better system for caching (shared
      caching) and artifacts (an object storage system with minio/s3,
      which could be reused by gitlab pages)
    * scaling the runners and CI infrastructure will be a priority in
      2022
 * **provide reliable and simple continuous integration services**:
   working well! jenkins will be retired!
 * **fixing the blog**: happening
 * **improve communications and monitoring**
   * moving root@ and noise to RT is still planned
   * Nagios is going to require a redesign in 2022, even if just for
     upgrading it, because it is a breaking upgrade. maybe rebuild a
     new server with puppet or consider replacing with Prometheus +
     alert manager

[mail standards]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40363

# Triage

Go through the [web][] and [TPA team board][] and:

 1. reduce the size of the Backlog
 2. establish correctly what will be done next

[web]: https://gitlab.torproject.org/groups/tpo/web/-/boards
[TPA team board]: https://gitlab.torproject.org/tpo/tpa/team/-/boards/117

*Discussion postponed to next weekly check-in.*

# Routine tasks review

A number of routine tasks have fallen by the wayside during my
vacations. Do we want to keep doing them? I'm thinking of:

 1. monthly reports: super useful
 2. weekly office hours: also useful, maybe do a reminder?
 3. "star of the week" and regular triage, also provides an
    interruption shield: does not work so well because two people are
    part-time. other teams do triage with gaba once a week, half an
    hour. important to rotate to share the knowledge. a triage-howto
    page would be helpful to have on the wiki to make rotation as
    seamless as possible (see [ticket 40382][])

[ticket 40382]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40382

# Other discussions

No other discussion came up during the meeting.

# Next meeting

In one month, usual time, to be scheduled.

# Metrics of the month

 * hosts in Puppet: 88, LDAP: 91, Prometheus exporters: 142
 * number of Apache servers monitored: 28, hits per second: 145
 * number of Nginx servers: 2, hits per second: 2, hit ratio: 0.82
 * number of self-hosted nameservers: 6, mail servers: 7
 * pending upgrades: 15, reboots: 0
 * average load: 0.33, memory available: 3.39 TiB/4.26 TiB, running
   processes: 647
 * bytes sent: 277.79 MB/s, received: 166.01 MB/s
 * [GitLab tickets][]: ? tickets including...
   * open: 0
   * icebox: 119
   * backlog: 17
   * next: 6
   * doing: 5
   * needs information: 3
   * needs review: 0
   * (closed: 2397)

 [Gitlab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

# Ticket analysis

| date       | open | icebox | backlog | next | doing | closed | delta | sum  | new  | spill |
|------------|------|--------|---------|------|-------|--------|-------|------|------|-------|
| 2020-11-18 | 1    | 84     | 32      | 5    | 4     | 2119   | NA    | 2245 | NA   | NA    |
| 2020-12-02 | 0    | 92     | 20      | 9    | 8     | 2130   | 11    | 2259 | 14   | -3    |
| 2021-01-19 | 0    | 91     | 20      | 12   | 10    | 2165   | 35    | 2298 | 39   | -4    |
| 2021-02-02 | 0    | 96     | 18      | 10   | 7     | 2182   | 17    | 2313 | 15   | 2     |
| 2021-03-02 | 0    | 107    | 15      | 9    | 7     | 2213   | 31    | 2351 | 38   | -7    |
| 2021-04-07 | 0    | 106    | 22      | 7    | 4     | 2225   | 12    | 2364 | 13   | -1    |
| 2021-05-03 | 0    | 109    | 15      | 2    | 2     | 2266   | 41    | 2394 | 30   | 11    |
| 2021-06-02 | 0    | 114    | 14      | 2    | 1     | 2297   | 31    | 2428 | 34   | -3    |
| 2021-09-07 | 0    | 119    | 17      | 6    | 5     | 2397   | 100   | 2544 | 116  | -16   |
|------------|------|--------|---------|------|-------|--------|-------|------|------|-------|
| mean       | 0.1  | 102.0  | 19.2    | 6.9  | 5.3   | NA     | 30.9  | NA   | 33.2 | -2.3  |
<!-- #+TBLFM:$9=vsum($2..$7)::$10=@0$-1-@-1$-1::$11=$8-$10::@2$10=NA::@2$11=NA::$8=@0$-1-@-1$-1::@2$8=NA::@>$2..$6=vmean(@I..@II);%.1fEN::@>$8=vmean(@I..@II);%.1fEN::@>$9=NA::@>$10..$11=vmean(@I..@II);%.1fEN -->
<!-- the above is an org-mode table and can be recalculated by -->
<!-- uncommenting the above formula and hitting "C-c C-c" -->

We have knocked out an average of 33 tickets per month during the
vacations, which is pretty amazing. Still not enough to keep up with
the tide, so the icebox is still filling up.

Also note that there are 3 tickets ("Needs information") that are not
counted in the last row of the table.

Legend:

 * date: date of the report
 * open: untriaged tickets
 * icebox: tickets triaged in the "icebox" ("stalled")
 * backlog: triaged, planned work for the "next" iteration (e.g. "next
   month")
 * next: work to be done in the current iteration or "sprint"
   (e.g. currently a month, so "this month")
 * doing: work being done right now (generally during the day or
   week)
 * closed: completed work
 * delta: number of tickets closed since the last report
 * sum: total number of tickets
 * new: tickets created since the last report
 * spill: difference between "delta" and "new", whether we closed more
   or fewer tickets than were created
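
As a cross-check of the legend, here is a minimal Python sketch of how
the derived columns (sum, delta, new, spill) follow from the raw
counts. It is not the org-mode formula the table is actually computed
with, and the `derive` function and its field names are made up for
illustration. The numbers are copied from the last two rows of the
table.

```python
# Raw counts from the last two rows of the table:
# (date, open, icebox, backlog, next, doing, closed)
ROWS = [
    ("2021-06-02", 0, 114, 14, 2, 1, 2297),
    ("2021-09-07", 0, 119, 17, 6, 5, 2397),
]


def derive(rows):
    """Compute the sum, delta, new and spill columns for each row.

    The first row has no previous report to compare against, so its
    delta, new and spill are None (shown as "NA" in the table).
    """
    derived = []
    prev_closed = None
    prev_total = None
    for date, *counts in rows:
        closed = counts[-1]
        total = sum(counts)  # sum: total number of tickets
        if prev_closed is None:
            delta = new = spill = None
        else:
            delta = closed - prev_closed  # closed since the last report
            new = total - prev_total      # created since the last report
            spill = delta - new           # negative: more created than closed
        derived.append(
            {"date": date, "sum": total, "delta": delta, "new": new, "spill": spill}
        )
        prev_closed, prev_total = closed, total
    return derived


result = derive(ROWS)
for row in result:
    print(row)
```

A negative spill, as in the last row, means more tickets were created
than closed over the period.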

-- 
Antoine Beaupré
torproject.org system administration