[tor-project] GitLab Runner updates

Tue Jun 21 15:33:52 UTC 2022

On 6/20/22 09:20, Antoine Beaupré wrote:
>> While
>> it's fairly straightforward to install a gitlab-runner and execute
>> locally, as far as I can tell a malicious GitLab installation could
>> still send a modified "script" (post-processed .gitlab-ci.yml) or repo
>> checkout down to the runner. Maybe there's some way to audit this, but I
>> couldn't find an obvious one. Maybe configuring the runner to log at
>> debug level would record enough?
>> https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
> That's not what I mean. I don't mean installing your own runner locally
> and hooking it up with GitLab. I mean installing the gitlab-runner
> package (only!) and *not* hooking it up in GitLab.
>
> Instead, you run the job completely locally, without involving GitLab at
> all. That's done with the `gitlab-runner exec` command:
>
> https://docs.gitlab.com/runner/commands/#gitlab-runner-exec
>
> We have docs about this here:
>
> https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/ci#running-a-job-locally
>
> This removes a large part of the attack surface because GitLab is taken
> out of the equation. It reduces the stack to:
>
>   * your local computer and operating system
>   * your git repository
>   * git
>   * gitlab-runner
>   * the executor (e.g. Docker) and its image
>
> It's still pretty darn large, but it's better than before. :)

Ahhh right, I'd forgotten about `gitlab-runner`'s `exec` feature. 
Unfortunately the current implementation of the feature is a bit hacky 
and not super well-documented. IIUC they took it from a 3rd party pull 
request, tried to rip it back out, but too many people screamed so it's 
still there in a semi-zombie state. It looks like they're working on 
designing a new implementation that they'll be happier with. 
https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797.

The current version only runs a single job, not a whole pipeline, so you 
still need some wrapper logic for multi-job pipelines to run the jobs in the 
right order, copy artifacts between them, initialize 
pipeline-level variables, etc.
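A minimal sketch of the ordering part of that wrapper logic, assuming the .gitlab-ci.yml has already been parsed into a dict (e.g. with PyYAML's `yaml.safe_load`). It groups jobs by `stage:` and shells out to `gitlab-runner exec docker <job>` once per job. `jobs_by_stage`, `run_pipeline` and the partial `RESERVED_KEYS` set are hypothetical helpers; `needs:`, `rules:` and artifact passing are deliberately not handled.

```python
import subprocess
from typing import Any, Dict, List

# GitLab's default stages, used when the file has no top-level `stages:` key.
DEFAULT_STAGES = ["build", "test", "deploy"]
# Top-level keys that configure the pipeline rather than define a job.
# (Partial list: an assumption for this sketch.)
RESERVED_KEYS = {"stages", "variables", "default", "include", "workflow",
                 "image", "services", "before_script", "after_script", "cache"}


def jobs_by_stage(config: Dict[str, Any]) -> List[List[str]]:
    """Group runnable job names by stage, in pipeline stage order."""
    # .pre and .post always exist and run first and last.
    stages = [".pre"] + list(config.get("stages", DEFAULT_STAGES)) + [".post"]
    grouped: Dict[str, List[str]] = {stage: [] for stage in stages}
    for name, job in config.items():
        # Hidden jobs (leading ".") are templates, not runnable jobs.
        if name in RESERVED_KEYS or name.startswith(".") or not isinstance(job, dict):
            continue
        stage = job.get("stage", "test")  # a job without `stage:` lands in "test"
        if stage not in grouped:
            raise ValueError(f"job {name!r} uses undeclared stage {stage!r}")
        grouped[stage].append(name)
    return [grouped[stage] for stage in stages if grouped[stage]]


def run_pipeline(config: Dict[str, Any], repo_dir: str,
                 dry_run: bool = True) -> List[List[str]]:
    """Run (or just list) one `gitlab-runner exec` call per job, stage by stage."""
    commands = []
    for stage_jobs in jobs_by_stage(config):
        for name in stage_jobs:
            argv = ["gitlab-runner", "exec", "docker", name]
            commands.append(argv)
            if not dry_run:
                # `exec` is run from the repository checkout.
                subprocess.run(argv, cwd=repo_dir, check=True)
    return commands
```

With `dry_run=True` it only returns the commands, which makes it easy to check the order before actually running anything.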

For the Debian package build I got it partly working, but couldn't find 
a way to run a single job out of a parameterized matrix (which it uses 
to build for multiple platforms and architectures). Given the other 
headaches and the lack of documentation, I shelved this approach for the 
moment 
(https://gitlab.torproject.org/tpo/core/tor/-/issues/40615#note_2808336).
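One way around the matrix problem is for the wrapper to expand `parallel: matrix:` itself and pick out a single combination. A sketch, assuming each matrix entry maps variable names to a value or a list of values, expanded as a cartesian product. The `name: [v1, v2]` naming is meant to resemble GitLab's generated job names but should be treated as an assumption, and `expand_matrix`, `pick` and the `ARCH`/`SUITE` variables are hypothetical.

```python
import itertools
from typing import Any, Dict, List, Tuple

Job = Dict[str, Any]


def expand_matrix(name: str, job: Job) -> List[Tuple[str, Job]]:
    """Expand a job's `parallel: matrix:` into one concrete job per combination."""
    parallel = job.get("parallel")
    if not isinstance(parallel, dict) or "matrix" not in parallel:
        return [(name, job)]  # no matrix (or plain `parallel: N`): leave as-is
    expanded = []
    for entry in parallel["matrix"]:
        keys = list(entry)
        # A scalar value behaves like a one-element list.
        value_lists = [v if isinstance(v, list) else [v] for v in entry.values()]
        for combo in itertools.product(*value_lists):
            child = {k: v for k, v in job.items() if k != "parallel"}
            # Matrix values are layered on top of the job's own variables.
            child["variables"] = {**job.get("variables", {}), **dict(zip(keys, combo))}
            child_name = f"{name}: [{', '.join(str(v) for v in combo)}]"
            expanded.append((child_name, child))
    return expanded


def pick(expanded: List[Tuple[str, Job]], **wanted: str) -> Tuple[str, Job]:
    """Return the one expanded job whose variables match all of `wanted`."""
    matches = [(n, j) for n, j in expanded
               if all(j["variables"].get(k) == v for k, v in wanted.items())]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, got {len(matches)}")
    return matches[0]
```

For example, `pick(expand_matrix("build", job), ARCH="arm64")` would select a single combination, which could then be handed to a Docker-based runner instead of `gitlab-runner exec`.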

I agree that this feature is potentially very useful. The "v2" proposal 
for the feature would run a whole pipeline, but it communicates with GitLab 
to do so, which may again defeat the purpose from our perspective 
(at least without some careful auditing of the communication between 
GitLab and the runner). 
https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797#proposal

>> For that issue I ended up hacking together a small python script that
>> processes the .gitlab-ci.yml into something to feed directly through
>> Docker. It's currently a bit hacky and specialized for the Debian tor
>> package build. I think it could be generalized further to be reusable if
>> that's of interest (maybe using Docker Compose to orchestrate jobs
>> within a pipeline), but am still thinking about whether there's a better
>> way...
>> https://gitlab.torproject.org/jnewsome/reproduce-tor-debian-build/-/blob/main/reproduce_pipeline.py
> Note that @eighthave has done a similar thing for F-Droid, you might
> want to collaborate.

Thanks, good to know!
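The core of such a script is turning one job into a `docker run` invocation. A rough sketch (not the linked reproduce_pipeline.py), assuming the job has no `extends:` or matrix left to resolve. It merges top-level `variables:` with job-level ones (the job's value wins), mounts the checkout at a hypothetical `/builds/project`, and ignores `default:`, `services:` and artifacts. `docker_argv` and `_as_list` are hypothetical helpers.

```python
from typing import Any, Dict, List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    # Scripts may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def docker_argv(config: Dict[str, Any], job_name: str, repo_dir: str,
                workdir: str = "/builds/project") -> List[str]:
    """Build a `docker run` command line that runs one job's script."""
    job = config[job_name]
    image = job.get("image", config.get("image"))
    if isinstance(image, dict):  # long form: image: {name: ..., entrypoint: ...}
        image = image["name"]
    if not image:
        raise ValueError(f"job {job_name!r} has no image")
    # Job-level variables override top-level (pipeline-wide) ones.
    variables = {**config.get("variables", {}), **job.get("variables", {})}
    # A job's own before_script replaces the top-level one.
    lines = _as_list(job.get("before_script", config.get("before_script", [])))
    lines += _as_list(job["script"])
    argv = ["docker", "run", "--rm",
            "-v", f"{repo_dir}:{workdir}",  # bind-mount the local checkout
            "-w", workdir]
    for key, value in variables.items():
        argv += ["-e", f"{key}={value}"]
    # `sh -e` aborts at the first failing command, like a failing script line.
    argv += [image, "sh", "-ec", "\n".join(lines)]
    return argv
```

The result can be passed straight to `subprocess.run(..., check=True)`; building an argv list rather than a shell string avoids quoting problems with the script contents.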

> I think the improvement of that over the above is that you remove the
> "gitlab-runner" part of the attack surface. It's a pretty large attack
> surface because the runners are a surprisingly large amount of code, but
> I wonder if it's worth the trouble...
>
> What's the threat model here specifically? Backdoored gitlab-runner code?

Right, I agree there's not much security benefit over the 
`gitlab-runner exec` approach. I just found I ultimately wasn't getting 
much benefit out of `exec`, since I was already having to write all the 
pipeline orchestration myself, and I got tired of wrestling with the lack of 
documentation etc. :)

>> Right now my top candidate we haven't tried yet is to install a full
>> local GitLab in addition to a local gitlab-runner, maybe using their
>> published Docker images: https://docs.gitlab.com/ee/install/docker.html.
>> This seems like the least engineering effort (~none) but a bit more work
>> for every individual wanting to do such a local build.
> Other organisations run *two* GitLab instances for that purpose, by the
> way. GitLab.com included, from what I understand.
Interesting.
>> Keeping as much logic out of the .gitlab-ci.yml as possible so that the
>> gitlab yml is trivial to manually reproduce outside of gitlab (e.g. run
>> `./build.sh`) is probably ideal, though gives up some gitlab
>> functionality.
> What functionality are you thinking of here?

For example, the Debian package build in particular makes heavy use of 
yml templating. The same thing could be achieved other ways - e.g. 
moving the yml snippets out to shell files/functions that can be invoked 
by the other "job scripts", but it adds more indirection and 
fragmentation vs having everything in one place in the yml file.
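If that templating is done with GitLab's `extends:` keyword, a driver script has to resolve it itself (plain YAML anchors and `<<` merge keys are already resolved by a YAML parser such as PyYAML). A sketch assuming GitLab's documented merge behaviour: hashes are deep-merged, other values (including lists) are replaced, and with several parents the later ones win. `deep_merge` and `resolve_extends` are hypothetical helpers, and `!reference` tags are not handled.

```python
import copy
from typing import Any, Dict, Tuple


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` merged in.

    Nested dicts are merged recursively; anything else, including lists,
    is replaced wholesale by the value from `override`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_extends(config: Dict[str, Any], name: str,
                    _seen: Tuple[str, ...] = ()) -> Dict[str, Any]:
    """Flatten a job's `extends:` chain into a single job dict."""
    if name in _seen:
        raise ValueError(f"circular extends via {name!r}")
    job = config[name]
    parents = job.get("extends", [])
    if isinstance(parents, str):
        parents = [parents]
    resolved: Dict[str, Any] = {}
    for parent in parents:  # later parents take precedence over earlier ones
        resolved = deep_merge(resolved, resolve_extends(config, parent, _seen + (name,)))
    # The job's own keys take precedence over everything it extends.
    own = {k: v for k, v in job.items() if k != "extends"}
    return deep_merge(resolved, own)
```

Because everything is copied, resolving one job never mutates the parsed config, so the same templates can be reused for every job in the file.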

For multi-job pipelines, you also still end up duplicating the outer 
orchestration of the pipeline's jobs: once in the yml and again in some 
other driver script. You can mitigate this by using fewer jobs (maybe 
just 1), but that again gives up some GitLab functionality.
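Artifact passing is a good example of orchestration that a driver script has to re-implement. A sketch, assuming each job runs in its own working directory on the host and that `artifacts: paths:` entries are glob patterns relative to it. `collect_artifacts` and `restore_artifacts` are hypothetical helpers, and `exclude`, `expire_in` and `dependencies:` are ignored.

```python
import glob
import os
import shutil
from typing import Any, Dict, List


def collect_artifacts(job: Dict[str, Any], workdir: str, dest: str) -> List[str]:
    """Copy files matching the job's `artifacts: paths:` from workdir into dest.

    Relative paths are preserved; returns the copied paths relative to workdir.
    """
    copied = []
    for pattern in job.get("artifacts", {}).get("paths", []):
        # Escape workdir so only the artifact pattern is treated as a glob.
        full_pattern = os.path.join(glob.escape(workdir), pattern)
        for match in sorted(glob.glob(full_pattern, recursive=True)):
            rel = os.path.relpath(match, workdir)
            target = os.path.join(dest, rel)
            if os.path.isdir(match):
                shutil.copytree(match, target, dirs_exist_ok=True)
            else:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(match, target)
            copied.append(rel)
    return copied


def restore_artifacts(artifact_dir: str, workdir: str) -> None:
    """Copy previously collected artifacts into a later job's working directory."""
    if os.path.isdir(artifact_dir):
        shutil.copytree(artifact_dir, workdir, dirs_exist_ok=True)
```

A driver would call `collect_artifacts` after each job and `restore_artifacts` before each job of the next stage, roughly mirroring GitLab's default of making earlier stages' artifacts available to later jobs. `dirs_exist_ok` requires Python 3.8 or newer.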

> Thanks for the input! :) 
:)