[tor-dev] Dealing with frequent suspends on Android

Tue Nov 27 13:04:35 UTC 2018

Nick Mathewson:
> On Mon, Nov 5, 2018 at 12:38 PM Michael Rogers <michael at briarproject.org> wrote:
>>
>> Hi all,
>>
>> It's great to see that some children of #25500 have already been
>> released in the 0.3.4 series. Can I ask about the longer-term plan for
>> this work, and whether #23289 (or something similar) is part of it?
>>
>> The context for my question is that we're trying to reduce Briar's power
>> consumption. Until now we've held a wake lock to keep the CPU awake all
>> the time, but normally an Android device would go into "deep sleep"
>> (which corresponds to suspend on other platforms) whenever the screen's
>> turned off, apart from brief wakeups for alarms and incoming network
>> traffic. Holding a permanent wake lock has a big impact on battery life.
>>
>> Most of our background work can be handled with alarms, but we still
>> need to hold a wake lock whenever Tor's running because libevent timers
>> don't fire when the CPU's asleep, and Tor gets a nasty surprise when it
>> wakes up and all its timers are late.
>>
>> It looks like most of the work has been moved off the one-second
>> periodic timer, which is great, but I assume that work's now being
>> scheduled by other means and still needs to be done punctually, which we
>> can't currently guarantee on Android without a wake lock.
>>
>> As far as I can tell, getting rid of the wake lock requires one of the
>> following:
>>
>> 1. Tor becomes extremely tolerant of unannounced CPU sleeps. I don't
>> know enough about Tor's software architecture to know how feasible this
>> is, but my starting assumption would be that adapting a network-oriented
>> codebase that's been written for a world where time passes at a steady
>> rate and timers fire punctually, to a world where time passes in fits
>> and starts and timers fire eventually, would be a nightmare.
>>
>> 2. Tor tolerates unannounced CPU sleeps within some limits. This is
>> similar to the previous scenario, except the controller sets a regular
>> alarm to ensure the CPU never sleeps for too long, and libevent ensures
>> that when the CPU wakes up, any overdue timers fire immediately (maybe
>> this happens already?). Again, I'd assume that adapting Tor to this
>> environment would be a huge task, but at least there'd be limits on the
>> insanity.
>>
>> One of the difficulties with this option is that under some conditions,
>> the controller can only schedule one alarm every 15 minutes. Traffic
>> from the guard would also wake the CPU, so if we could ask the guard for
>> regular keepalives, we might be able to promise that the CPU will wake
>> once every keepalive interval, unless the guard connection's lost, in
>> which case it will wake once every 15 minutes. But keepalives from the
>> guard would require a protocol change, which would take time to roll
>> out, and would let the guard know (if it doesn't already) that the
>> client's running on Android.
>>
>> 3. Tor knows when it next needs to wake up, and relies on the controller
>> to wake it. This requires a way for the controller to query Tor, and Tor
>> to query libevent, for the next timer that needs to fire (perhaps from
>> some subset of timers that must fire punctually even if the CPU's
>> asleep). Libevent doesn't need to detect overdue timers by itself, but
>> it needs to provide a hook for re-checking whether timers are overdue.
>> The delay until the next timer needs to be at least a few seconds long,
>> at least some of the time, for sleeping to be worthwhile. And finally,
>> even if all those conditions are met, we run up against the 15-minute
>> limit on alarms again.
>>
>> None of these options are good. I'm not even sure if the first and third
>> are feasible. But I don't know enough about Tor or libevent to be sure.
>> If you do, I'd be really grateful for your help in understanding the
>> constraints here.
> 
> Hi!  I don't know if this will be useful or not, but I'm wondering if
> you've seen this ticket:
>   https://trac.torproject.org/projects/tor/ticket/28335
> 
> The goal of this branch is to create a "dormant mode" where Tor does
> not run any but the most delay- and rescheduling-tolerant of its
> periodic events.  Tor enters this mode if a controller tells it to, or
> if (as a client) it passes long enough without user activity.  When in
> dormant mode, it doesn't disconnect from the network, and it will wake
> up again if the controller tells it to, or it receives a new client
> connection.
> 
> Would this be at all helpful for any of this?

Dormant mode sounds like it goes a long way towards making Tor
operate the way that Android and iOS apps are expected to.  The last
missing piece towards making the tor daemon behave like a native service
on those platforms would be a way for dormant mode to survive the
process being killed and restarted.

In Android and iOS, there isn't really a "dormant" or "sleeping" state
for processes like there is for desktop processes.  The idea in mobile
is that the process serializes any required state out, and is then
killed entirely.  That process might then be restarted within a minute,
five minutes, a day, or a week, depending on what the user does.  iOS is
especially strict with this.

So if this dormant mode could survive being killed and restarted, then
we'd see big gains in battery life.  Usually, the best way to achieve
this is to always write required state out to disk as it changes, e.g.
to the built-in sqlite.  That is because Android/iOS will try to give a
process a warning before killing it, but they do not _guarantee_ that any
warning will be given.  It is fully valid and indeed common for a
process to be killed without any warning whatsoever.
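
To make the "write state out as it changes" idea concrete, here is a
rough sketch in Python using the standard sqlite3 module.  The
DurableState class, the table layout and the "mode" key are all made up
for illustration; none of this is Tor's actual code or state format.
The point is only that every change is committed immediately, so there
is nothing left to flush when the process is killed.

```python
import sqlite3


class DurableState:
    """Hypothetical write-through key/value store (not part of Tor)."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        # Create the table on first run; a restarted process reuses it.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.db.commit()

    def set(self, key, value):
        # Commit on every change instead of waiting for a shutdown
        # hook, since the OS may kill the process without warning.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key, default=None):
        row = self.db.execute(
            "SELECT value FROM state WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default
```

On startup the process would open the same file and call something
like get("mode") to decide whether to resume in dormant mode, rather
than relying on anything held in memory.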

.hc

-- 
PGP fingerprint: EE66 20C7 136B 0D2C 456C  0A4D E9E2 8DEA 00AA 5556
https://pgp.mit.edu/pks/lookup?op=vindex&search=0xE9E28DEA00AA5556