instrumenting client downloads

Peter Palfrader peter at palfrader.org
Mon Jun 16 22:01:31 UTC 2008


Hi,

in order to confirm our idea of how much a Tor client needs to download
to work, and what it spends those bytes on, we have added code to the
Tor client that lets us measure real numbers.

To enable this code you have to build the current 0.2.1.x tree in svn
with --enable-instrument-downloads passed to the configure script.

This makes the Tor process keep track of, among other things, how many
download requests it completed for each of certs, server descriptors and
consensuses and how large they were in total.  It also keeps track of
how many EXTEND and how many CREATE (FAST) cells it sends.

If your Tor is built with this support enabled, you can get the numbers
via Tor's control port by issuing "GETINFO dir-usage", or, if your Tor
process has an open DirPort, by fetching /tor/bytes.txt from it.
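For the DirPort route, a minimal sketch using Ruby's standard Net::HTTP
could look like this.  The host and port (127.0.0.1:9030) are only
assumptions; use whatever DirPort your torrc actually configures.

```ruby
require 'net/http'

# Minimal sketch: fetch the instrumentation counters from a Tor process's
# DirPort.  The default host and port are assumptions, not Tor defaults
# you can rely on; pass your own DirPort.  Returns the body as a String.
def fetch_bytes_txt(host = '127.0.0.1', port = 9030)
  response = Net::HTTP.get_response(host, '/tor/bytes.txt', port)
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET /tor/bytes.txt failed: #{response.code} #{response.message}"
  end
  response.body
end
```

The control port route needs authentication first, which is what the
script attached below does.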

The data you'll get there looks like this:
| cell: create fast  271  271
| cell: extend  347  347
| dl/cert.z  9804  1
| dl/consensus  1865179  13
| dl/server  4285272  84

This Tor client sent, up to this point, 271 CREATE_FAST cells and 347
EXTEND cells, and it downloaded 9804 bytes of certificates in one
request, 1865179 bytes of consensus documents in 13 requests, and
4285272 bytes of server descriptors in 84 requests.  The number of bytes
includes only the size of the actual document bodies; it does not
include any protocol overhead at any level (tor/http/tcp/ip).
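If you want to post-process these numbers, a small parser along these
lines should do.  It assumes (from the sample above, not from any spec)
that each line is a label followed by two integers, separated by two or
more spaces, and that the "| " prefixes above are just mail quoting.

```ruby
# Parse "label  number  number" lines as printed by GETINFO dir-usage.
# Assumes fields are separated by two or more spaces, as in the sample.
def parse_dir_usage(text)
  text.each_line.with_object({}) do |line, stats|
    label, first, second = line.strip.split(/\s{2,}/)
    next if second.nil?   # skip blank or malformed lines
    stats[label] = [Integer(first), Integer(second)]
  end
end

sample = <<~USAGE
  cell: create fast  271  271
  cell: extend  347  347
  dl/cert.z  9804  1
  dl/consensus  1865179  13
  dl/server  4285272  84
USAGE

stats = parse_dir_usage(sample)
bytes, requests = stats.fetch('dl/server')
puts "server descriptors: #{bytes} bytes in #{requests} requests"
# prints "server descriptors: 4285272 bytes in 84 requests"
```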


http://asteria.noreply.org/~weasel/Tor/louise-stats-1.gz has some
numbers for a 0.2.1.x trunk Tor client (as of around June 9th) on a
fast network.  The Tor client was kept busy fetching a website every 5
minutes.

The following graphs visualize them:

http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-longterm-cells.jpg
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-startup-cells.jpg
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-longterm-dl.jpg
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-startup-dl.jpg

The data shows that in the 19 hours the Tor client ran, it fetched a
consensus 13 times, and that each consensus fetch was followed by a
download of all the referenced descriptors.  Each consensus was around
140 kB in size (proposal 138, already implemented, shrinks them to
~90 kB, and proposal 140 will, once implemented, bring updates to the
consensus down to 13 kB per hour).  Very roughly 150 kilobytes of
server descriptors had to be downloaded per hour, on average, to keep
up with changes.
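The consensus figures can be double-checked from the raw totals quoted
earlier.  The descriptor number below is just a crude per-cycle average
over the whole run; it lumps in whatever was fetched at startup, so it
is not the same thing as the steady-state ~150 kB/hour estimate.

```ruby
# Back-of-the-envelope check using the totals from the sample output
# (19 hours of runtime, 13 consensus fetches).
hours = 19
consensus_bytes, consensus_fetches = 1_865_179, 13
descriptor_bytes = 4_285_272

avg_consensus = consensus_bytes / consensus_fetches     # integer bytes
avg_consensus_kib = (consensus_bytes.fdiv(consensus_fetches) / 1024).round(1)
consensus_per_hour = consensus_bytes / hours             # compare: 13 kB/hour with proposal 140
descriptors_per_cycle = descriptor_bytes / consensus_fetches

puts "average consensus: #{avg_consensus} bytes (#{avg_consensus_kib} KiB)"
puts "consensus bytes per hour: #{consensus_per_hour}"
puts "descriptor bytes per consensus fetch: #{descriptors_per_cycle}"
# prints 143475 bytes (140.1 KiB), 98167, and 329636
```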

In its 19-hour lifetime this particular client sent 271 CREATE_FAST
cells and 347 EXTEND cells.  An average server descriptor currently
appears to be a bit over 2 kilobytes in size.  Even if we had downloaded
a 4 kB descriptor for every single EXTEND cell we sent, we would only
have used about 1.4 megabytes, roughly the capacity of one of those
ancient 3.5" high-density floppy disks.
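The worst-case arithmetic, spelled out.  Taking 4 kB as 4096 bytes is
my assumption here; with 4000 bytes the total only gets smaller.

```ruby
# Worst case from the paragraph above: one 4 kB (here 4096-byte)
# descriptor per EXTEND cell, compared with a 3.5" HD floppy.
extend_cells         = 347
worst_case_desc_size = 4 * 1024      # generous upper bound; the average is ~2 kB
floppy_bytes         = 1440 * 1024   # the "1.44 MB" floppy holds 1,474,560 bytes

worst_case_total = extend_cells * worst_case_desc_size
puts "worst case: #{worst_case_total} bytes (#{(worst_case_total / 1e6).round(2)} MB)"
puts "fits on one floppy: #{worst_case_total <= floppy_bytes}"
# prints "worst case: 1421312 bytes (1.42 MB)" and "fits on one floppy: true"
```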

(CREATE_FAST cells are most often used for encrypted directory
requests, and in that case we wouldn't need to download a server
descriptor for the directory server (also, we would probably not make
that many requests with proposal 141 anyway).  They are also used for
the first hop of a Tor circuit, i.e. when connecting to our Guard node.
In those cases we'd have to request the descriptor once but could then
cache it.)
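To illustrate the "request once, then cache" point, here is a
hypothetical sketch, not Tor's code: DescriptorCache and the downloader
block are made-up names, and the block stands in for a real directory
request.

```ruby
# Hypothetical sketch: download a relay's descriptor only on the first
# request and serve the cached copy afterwards, counting the bytes that
# actually had to be fetched.
class DescriptorCache
  attr_reader :bytes_downloaded, :requests

  def initialize(&downloader)
    @downloader = downloader   # stand-in for a real directory fetch
    @cache = {}
    @bytes_downloaded = 0
    @requests = 0
  end

  # Returns the descriptor for +fingerprint+, downloading it only on a miss.
  def fetch(fingerprint)
    @cache.fetch(fingerprint) do
      descriptor = @downloader.call(fingerprint)
      @requests += 1
      @bytes_downloaded += descriptor.bytesize
      @cache[fingerprint] = descriptor
    end
  end
end

# Usage: 271 first hops through the same (made-up) guard cost one download.
cache = DescriptorCache.new { |fp| "router #{fp}\n" + ("x" * 2048) }
271.times { cache.fetch("GUARD_FINGERPRINT") }
puts "#{cache.requests} request(s), #{cache.bytes_downloaded} bytes"
```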


Yours,
weasel

PS: Attached is the script I used for dumping the data every 5 seconds
(every second for the first 100 polls).  It expects to read the
controller password from a file called "controller-password" in its
working directory, and appends the stats to the file given as its first
and only argument.  The control port number is hardcoded to 19051, but
that is easy to change in the source.
-------------- next part --------------
#!/usr/bin/ruby

# Copyright (c) 2005, 2006, 2007, 2008 Peter Palfrader

require 'socket'
require 'yaml'
require 'thread'

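Thread.abort_on_exception = true

# chomp (unlike chop) only strips a trailing newline, so a password file
# without one does not lose its last character.
PASSWORD = File.read("controller-password").chomp
outfile = File.open(ARGV[0], "a")
tor = TCPSocket.new('localhost', 19051)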

# Read one reply line (or data block) from the control connection.
# Handles EndReplyLines ("250 OK"), MidReplyLines ("250-key=value") and
# DataReplyLines ("250+key=" followed by a dot-terminated data block).
def ctrl_read(fd)
	line = fd.readline.chomp
	raise "short reply" if line.length <= 3

	code = line[0..2]
	spec = line[3..3]
	rest = line[4..-1]

	case spec
	when " "
		return { 'code' => code, 'msg' => rest }
	when "-"
		(key, value) = rest.split('=', 2)
		return { 'code' => code, 'key' => key, 'value' => value }
	when "+"
		raise "In line #{line} I expected the last char to be a =" unless rest.end_with?('=')
		key = rest[0..-2]
		value = []
		while (line = fd.readline.chomp)
			break if line == "."
			# Undo dot-stuffing: a data line starting with "." is sent with an extra ".".
			line = line[1..-1] if line.start_with?(".")
			value << line
		end
		line = fd.readline.chomp
		endcode = line[0..2]
		endspec = line[3..3]
		endmsg = line[4..-1]
		raise "Expected line #{line} to be an EndReplyLine with status=#{code} after DataReplyLine(s)" unless endcode == code
		raise "Expected line #{line} to be an EndReplyLine with spec=' ' after DataReplyLine(s)" unless endspec == ' '
		return { 'code' => code, 'key' => key, 'value' => value, 'msg' => endmsg }
	else
		raise "Unknown reply line type in #{line}"
	end
end

tor.print "AUTHENTICATE \"#{PASSWORD}\"\r\n"
reply = ctrl_read(tor)
raise "Unexpected reply #{reply.to_yaml}" unless reply['code'] == '250'

Thread.new do
	detailed = 100
	while true do
		tor.print "GETINFO dir-usage\r\n"
		if detailed > 0
			sleep 1
			detailed = detailed - 1
		else
			sleep 5
		end
	end
end

while true do
	r = ctrl_read(tor)
	if r['code'] == '250' and r['key'] == 'dir-usage'
		now = Time.now
		v = r['value'].kind_of?(Array) ? r['value'].join("\n") : r['value'].to_s
		s = now.utc.strftime("%s  %Y-%m-%d %H:%M:%S\n") + v + "\n\n"
		puts s
		outfile.puts s
		outfile.flush
	else
		STDERR.puts("Cannot handle XX\n" + r.to_yaml + "\nXXXX\n")
	end
end


# Not reached as written: the loop above runs until the process is killed.
tor.print "QUIT\r\n"
reply = ctrl_read(tor)
raise "Unexpected reply #{reply.to_yaml}" unless reply['code'] == '250'
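For plotting, a dump written by the script above can be read back with
something like the following sketch.  It assumes exactly the format the
script writes: a header line "<unix time>  <YYYY-MM-DD HH:MM:SS>", the
dir-usage lines, then a blank line; read_dump is a made-up helper name.

```ruby
# Minimal sketch: parse the script's dump file into
# [[Time, {label => [n1, n2]}], ...] in file order.
def read_dump(io)
  io.read.split(/\n{2,}/).map do |record|
    header, *body = record.lines(chomp: true)
    next if header.nil? || header.strip.empty?
    stamp = Time.at(Integer(header.split.first)).utc
    stats = body.each_with_object({}) do |line, h|
      label, a, b = line.strip.split(/\s{2,}/)
      h[label] = [Integer(a), Integer(b)] if b
    end
    [stamp, stats]
  end.compact
end

# Usage: print "unix-time consensus-bytes" pairs, e.g. for gnuplot.
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  File.open(ARGV[0]) do |f|
    read_dump(f).each { |t, s| puts "#{t.to_i} #{s.dig('dl/consensus', 0)}" }
  end
end
```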

