[tor-bugs] #14744 [GetTor]: Automate upload of latest Tor Browser to cloud services
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sat Apr 4 04:54:56 UTC 2015
#14744: Automate upload of latest Tor Browser to cloud services
------------------------+----------------------
Reporter: ilv | Owner: ilv
Type: defect | Status: reopened
Priority: major | Milestone:
Component: GetTor | Version:
Resolution: | Keywords:
Actual Points: | Parent ID:
Points: |
------------------------+----------------------
Changes (by ilv):
* status: closed => reopened
* resolution: implemented =>
Comment:
Replying to [comment:6 isis]:
>
> Hey ilv! Great work! I see that
[https://github.com/ilv/gettor/blob/develop/upload/fetch_latest_torbrowser.py
your current script] still uses `os.system(cmd)`… were you still planning
to use Twisted? Using `os.system()` is really not recommended in the
Python world.
>
\\
hey isis, thanks! and thanks for taking the time to review this! tbh, I
dropped the idea of using Twisted (for SSL verification) because wget
already fails (and with it the whole script) if the certificate is invalid.
\\
> Some issues I see with the current implementation are:
>
> 1. If the `os.system("wget […]")` command fails entirely, or only
downloads a portion of a bundle, you'll never know because you're not
checking the returned exit status code.
>
> 2. There is no mechanism for resuming downloads, if !#1 happens.
>
\\
Correct, thanks for pointing this out.
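
A minimal sketch of how points 1 and 2 could be handled while still shelling out to wget: check the exit status and let wget resume partial files. The function names, the retry count and the injectable `run` parameter are assumptions made for illustration, not part of the current script; `--continue` and `--directory-prefix` are standard wget options.

```python
import subprocess


def build_wget_cmd(url, dest_dir):
    """Return the argument list for a resumable wget download."""
    # --continue makes wget pick up a partially downloaded file
    # instead of starting over.
    return ["wget", "--continue", "--directory-prefix", dest_dir, url]


def download(url, dest_dir, retries=3, run=subprocess.call):
    """Run wget up to `retries` times; return True once it exits with 0."""
    cmd = build_wget_cmd(url, dest_dir)
    for _ in range(retries):
        # subprocess.call returns the exit status of the command,
        # which os.system(cmd) was silently discarding.
        if run(cmd) == 0:
            return True
    return False
```

Passing the command as a list also avoids building a shell string by hand, which is one of the reasons `os.system()` is discouraged.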
\\
> 3. Doing
> {{{
> for provider in UPLOAD_SCRIPTS:
>     os.system("python2.7 %s" % UPLOAD_SCRIPTS[provider])
> }}}
> doesn't scale to more provider scripts than the Gettor machine has
CPU cores, since most Python scripts will stupidly hog an entire core. It
also doesn't take into account memory limitations (and thus, the more
providers Gettor has, the more likely for this code to OOM the Gettor
machine), nor network bandwidth limitations (nor the effect that any
network bandwidth limitations might have on other upload scripts being
executed).
>
\\
Correct me if I'm wrong, but `os.system()` blocks until the command
finishes, so the scripts for each provider should be executed sequentially
and I'm not sure about the scalability problems related to the CPU cores.
But you are right again that I haven't taken into account either the memory
limitations or the network bandwidth limitations. I guess Twisted should be
helpful for these points.
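
To make the sequential behaviour explicit, here is a rough sketch that runs one provider script at a time and records which ones failed. The `run_uploads` name and the use of `sys.executable` (instead of a hard-coded `python2.7`) are assumptions for illustration; the `upload_scripts` argument is expected to look like the existing `UPLOAD_SCRIPTS` dict.

```python
import subprocess
import sys


def run_uploads(upload_scripts, python=sys.executable):
    """Run each provider's upload script in turn.

    Returns a dict mapping provider name to non-zero exit status
    for every script that failed.
    """
    failed = {}
    for provider, script in sorted(upload_scripts.items()):
        # subprocess.call blocks until the script exits, so only one
        # upload is running (and using bandwidth/memory) at a time.
        status = subprocess.call([python, script])
        if status != 0:
            failed[provider] = status
    return failed
```

Since only one script runs at any moment, CPU cores are not the limit here; the memory and bandwidth used by each individual script still are.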
\\
> Second, which doesn't matter, but the syntax is a bit odd; normally
one might do
> {{{
> for provider, script in UPLOAD_SCRIPTS.items():
>     os.system("python2.7 %s" % script)
> }}}
> or, if nothing is using `provider`, then the for loop should more
optimally look like:
> {{{
> for script in UPLOAD_SCRIPTS.values():
>     […]
> }}}
>
\\
/me is still a python noob :P
\\
> By using Twisted instead, particularly if you have the
[https://pypi.python.org/pypi/service_identity service_identity] module
installed, and then with a trivially implementable amount of extra code,
having leaf or root certificate pinning is possible. Not to mention the
speed increases and parallelisation that become possible using Twisted.
If you want an example of a standalone script for downloading something
over TLS with Twisted,
[https://gitweb.torproject.org/user/isis/bridgedb.git/tree/scripts/get-tor-exits?h=develop
BridgeDB's script for downloading the list of Tor Exit
relays] (into memory or a file, in this case) might be helpful, as well as
[https://gitweb.torproject.org/user/isis/bridgedb.git/tree/lib/bridgedb/proxy.py?h=develop#n358
the way BridgeDB uses this script as a Protocol]
(`twisted.internet.protocol.Protocol`) and
[https://gitweb.torproject.org/user/isis/bridgedb.git/tree/lib/bridgedb/proxy.py?h=develop#n32
manages that Protocol within a Twisted program] (so that the list in this
case is loaded directly into memory for the servers in the cluster without
wasting a bunch of time doing disk I/O. This latter part is less
applicable to your case, but it does demonstrate how tasks such as these
can be running parallel to the rest of your program. Oh, and they can also
be
[https://gitweb.torproject.org/user/isis/bridgedb.git/tree/lib/bridgedb/Main.py?h=develop#n525
easily scheduled], because f!@# cron too.)
\\
Thanks a lot for this info! Now I'm convinced again that I should use
Twisted :)
\\
> You could also quite easily check the `*.asc` files on the downloaded
bundles to ensure that the whole thing downloaded properly. If you were to
use [https://pypi.python.org/pypi/gnupg python-gnupg] to do it, it would
look something like:
>
> {{{
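> import glob
> import gnupg
>
> # GNUPG_HOME_DIR should have the correct signing keys in its pubring.gpg
> # file (so geko's and mikeperry's keys, and the Tor Browser signing key,
> # at the minimum).
> gpg = gnupg.GPG(homedir=GNUPG_HOME_DIR)
> signatures = glob.glob("%s/*.asc" % latest_version)
> verified = []
> unverified = []
> for sig in signatures:
>     # Slice off the suffix; rstrip(".asc") would strip any trailing
>     # '.', 'a', 's' or 'c' characters, not just the extension.
>     bundle = sig[:-len(".asc")]
>     with open(bundle, 'rb') as fh:
>         # Detached-signature check; this assumes the
>         # verify_file(file, sig_file=...) signature of the `gnupg`
>         # module linked above.
>         result = gpg.verify_file(fh, sig_file=sig)
>     if result.valid:
>         verified.append(bundle)
>     else:
>         unverified.append(bundle)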
> }}}
\\
'''Awesome''', thanks again!
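
One detail worth noting when mapping a signature file back to its bundle: `str.rstrip()` treats its argument as a set of characters, not as a suffix, so it can eat into the file name. A small sketch of a safer helper follows; the `bundle_path` name is made up for illustration.

```python
def bundle_path(sig_path, suffix=".asc"):
    """Return the path of the bundle that a detached signature belongs to."""
    if not sig_path.endswith(suffix):
        raise ValueError("not a %s file: %r" % (suffix, sig_path))
    # Slice off exactly the suffix and nothing more.
    return sig_path[:-len(suffix)]


# rstrip removes every trailing '.', 'a', 's' or 'c', which goes too far
# when the name itself ends in one of those characters:
print("sha256sums.asc".rstrip(".asc"))  # sha256sum
print(bundle_path("sha256sums.asc"))    # sha256sums
```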
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/14744#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online