[tor-project] minutes from the sysadmin meeting
Antoine Beaupré
anarcat at torproject.org
Wed Jan 20 19:02:06 UTC 2021
Hi!
It feels so strange to say it this year, but... happy new year
everyone! Let's hope we can do better this time around. ;)
Here's your first sysadmin report for 2021. Hopefully we'll keep you
informed of our progress steadily in the coming year. Right now we're
working on the roadmap and, even though we asked you for feedback in the
user survey, there's still time to steer us in the right direction. We
have a meeting coming up where we're likely to set it in stone, so now
is a good time to speak up if you forgot to respond to the survey...
Now onto the minutes.
Agenda:
- Roll call: who's there and emergencies
- Dashboard review
- Roadmap 2021 proposal
  - 2020 retrospective
  - Services survey
  - Goals for 2021
- Other discussions
- Next meeting
- Metrics of the month
# Roll call: who's there and emergencies
present: hiro, gaba, anarcat
[GitLab backups are broken][]: they might need more disk space than we
have. In the short term, just bump the disk space; in the long term,
consider changing the backup system.
[GitLab backups are broken]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40143
# Dashboard review
We [reviewed the dashboard][]: there is too much stuff on it for
January, but we'll review it again in February.
[reviewed the dashboard]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
# Roadmap 2021 proposal
We discussed the [roadmap project][] anarcat worked on. We reviewed
the 2020 retrospective, talked about the services survey, and
discussed goals for 2021.
[roadmap project]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021
## 2020 retrospective
We reviewed and discussed the [2020 roadmap evaluation][] that anarcat
prepared:
[2020 roadmap evaluation]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021#2020-roadmap-evaluation
* **what worked?** we delivered the "need to have" items even through
  the apocalypse, staff reduction and all the craziness of 2020! success!
* **what was a challenge?**
  * monthly tracking was not practical and hard to do in
    Trac. things are a lot easier with GitLab's dashboard.
  * it was hard to work through the pandemic.
* **what can we change?**
  * do quarterly-based planning
  * estimates were off because so many things happened that we did
    not expect. reserve time for the unexpected, reduce expectations.
  * ticket triage is rotated now.
## Services survey
We discussed the [survey results analysis][] briefly, and how it is
used as a basis for the roadmap brainstorm. The two major services
people use are GitLab and email, and those will be the focus of the
roadmap for the coming year.
[survey results analysis]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021#survey-results
## Goals for 2021
* email services stabilisation ("submission server", "my email ends up
  in spam", CiviCRM bounce handling, etc.) - consider [outsourcing
  email services][]
* GitLab migration continues (Jenkins, gitolite)
* simplify / improve Puppet code base
* stabilise services (e.g. GitLab, schleuder)
[outsourcing email services]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/submission#cost
Next steps for the roadmap:
* try to make estimates
* split items into "need to have" and "nice to have"
* anarcat will work on a draft based on the brainstorm
* we meet again in one week to discuss it
# Other discussions
Postponed: which metrics services to maintain until we hire a new person.
# Next meeting
Same time, next week.
# Metrics of the month
Fun fact: we crossed the 2TiB total available memory mark back in
November 2020, almost double the figure from the previous report (in
July), even though the number of hosts in Puppet remained mostly
constant (78 vs 72). This is due (among other things) to the new Cymru
Ganeti cluster that added a whopping 1.2TiB of memory to our
infrastructure!
* hosts in Puppet: 82, LDAP: 85, Prometheus exporters: 134
* number of Apache servers monitored: 27, hits per second: 198
* number of Nginx servers: 2, hits per second: 3, hit ratio: 0.86
* number of self-hosted nameservers: 6, mail servers: 12
* pending upgrades: 3, reboots: 0
* average load: 0.29, memory available: 2.00 TiB/2.61 TiB, running
processes: 512
* bytes sent: 265.07 MB/s, received: 155.20 MB/s
* [GitLab tickets][]: 113 tickets including...
  * open: 0
  * icebox: 91
  * backlog: 20
  * next: 12
  * doing: 10
  * (closed: 2165)
[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
Now also available as the main Grafana dashboard. Head to
<https://grafana.torproject.org/>, change the time period to 30 days,
and wait a while for results to render.
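For those who want to put the memory figures above in perspective, here
is a small back-of-the-envelope sketch. It assumes that the first number
in "memory available: 2.00 TiB/2.61 TiB" is memory not in use and the
second is total installed memory; the function name `memory_summary` is
purely illustrative and not part of any of our tooling.

```python
# Figures copied from the metrics above, in TiB.
# Assumption: "available" means memory not in use, out of the total.
MEMORY_AVAILABLE_TIB = 2.00
MEMORY_TOTAL_TIB = 2.61
CYMRU_CLUSTER_TIB = 1.2  # memory added by the new Cymru Ganeti cluster


def memory_summary(available, total, cymru):
    """Hypothetical helper: derive usage and the Cymru cluster's share."""
    used = total - available
    return {
        "used_tib": round(used, 2),                        # memory in use
        "used_pct": round(100 * used / total, 1),          # % of total in use
        "cymru_share_pct": round(100 * cymru / total, 1),  # % of total from Cymru
    }


print(memory_summary(MEMORY_AVAILABLE_TIB, MEMORY_TOTAL_TIB, CYMRU_CLUSTER_TIB))
```

Under that reading, only about a quarter of our memory is in use, and the
Cymru cluster alone accounts for close to half of the total.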
--
Antoine Beaupré
torproject.org system administration