instrumenting client downloads
Peter Palfrader
peter at palfrader.org
Mon Jun 16 22:01:31 UTC 2008
Hi,
In order to confirm our idea of how much a Tor client needs to download
to work, and what it spends those bytes on, we have added code to the
Tor client that lets us collect real numbers.
To enable this code you have to build the current 0.2.1.x tree in svn
with --enable-instrument-downloads passed to the configure script.
This makes the Tor process keep track of, among other things, how many
download requests it completed for each of certs, server descriptors and
consensuses and how large they were in total. It also keeps track of
how many EXTEND and how many CREATE (FAST) cells it sends.
If your Tor is built with this support enabled, you can get the numbers
via Tor's control port by issuing "GETINFO dir-usage", or, if your
Tor process has an open DirPort, at /tor/bytes.txt.
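For the DirPort route, here is a minimal Ruby sketch that uses only the standard library. It is a plain HTTP GET of /tor/bytes.txt and nothing more. The helper names and the 127.0.0.1:9030 address used below are assumptions, so substitute whatever address your DirPort is configured on.

```ruby
require 'net/http'
require 'uri'

# Build the URL of the download statistics served on a DirPort.
def bytes_txt_uri(host, dir_port)
  URI::HTTP.build(host: host, port: dir_port, path: '/tor/bytes.txt')
end

# Fetch the statistics with a plain HTTP GET and return the body as a String.
# Raises (e.g. Errno::ECONNREFUSED) if nothing is listening on that port.
def fetch_bytes_txt(host, dir_port)
  Net::HTTP.get(bytes_txt_uri(host, dir_port))
end
```

Usage, assuming a DirPort on 9030: puts fetch_bytes_txt('127.0.0.1', 9030)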
The data you'll get there looks like this:
| cell: create fast 271 271
| cell: extend 347 347
| dl/cert.z 9804 1
| dl/consensus 1865179 13
| dl/server 4285272 84
So far, this Tor client has sent 271 CREATE_FAST cells and 347 EXTEND
cells. It has also downloaded 9804 bytes of certificates in one request,
1865179 bytes of consensus documents in 13 requests, and 4285272 bytes of
server descriptors in 84 requests. The byte counts include only the size
of the actual document bodies. They do not include protocol overhead at
any level (Tor/HTTP/TCP/IP).
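A minimal Ruby sketch for reading this output into a Hash follows. It assumes, as in the sample above, that every line ends in two integer columns and that everything before them is the counter's name. parse_dir_usage is a hypothetical helper and is not part of Tor. For the dl/* lines the two columns are total body bytes and number of requests, so the sketch also prints the average size per request. It does not interpret the second column of the cell lines, because the text does not explain it.

```ruby
# Parse "GETINFO dir-usage" (or /tor/bytes.txt) text into
# { name => [first_column, second_column] }.
def parse_dir_usage(text)
  text.each_line.each_with_object({}) do |line, stats|
    fields = line.split
    # Skip anything that does not end in two integer columns.
    next unless fields.length >= 3 && fields[-2..-1].all? { |f| f.match?(/\A\d+\z/) }
    stats[fields[0..-3].join(' ')] = fields[-2..-1].map(&:to_i)
  end
end

sample = <<~EOS
  cell: create fast 271 271
  cell: extend 347 347
  dl/cert.z 9804 1
  dl/consensus 1865179 13
  dl/server 4285272 84
EOS

stats = parse_dir_usage(sample)

# For dl/* entries: total body bytes, number of requests, bytes per request.
stats.each do |name, (bytes, requests)|
  next unless name.start_with?('dl/')
  printf("%-14s %9d bytes in %3d requests, %8.0f bytes/request\n",
         name, bytes, requests, bytes.to_f / requests)
end
```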
http://asteria.noreply.org/~weasel/Tor/louise-stats-1.gz has some
numbers for a 0.1.2.1-trunk (around June 9th) Tor client, on a fast
network. The Tor client was busy fetching a website every 5 minutes.
These graphs visualize the data:
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-longterm-cells.jpg
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-startup-cells.jpg
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-longterm-dl.jpg
http://asteria.noreply.org/~weasel/Tor/tor-client-download-stats-startup-dl.jpg
The data shows that in the 19 hours the Tor client ran, it fetched a
consensus 13 times, and each consensus fetch was followed by a
download of all the referenced descriptors. Each consensus was around
140 kB in size. The already implemented proposal 138 shrinks them to
about 90 kB, and proposal 140 will, once implemented, bring updates to
the consensus down to 13 kB per hour. Very roughly 150 kilobytes of
server descriptors had to be downloaded per hour on average to keep up
with changes.
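The figure of about 140 kB is the consensus byte count divided by the number of fetches. The quick Ruby check below uses only the counters from the sample output and the 19-hour run time stated above. The per-hour number is total consensus bytes over the whole run, and it is included only for comparison with the 13 kB per hour that proposal 140 is expected to bring.

```ruby
# Counters from the sample dir-usage output quoted above.
consensus_bytes    = 1_865_179
consensus_requests = 13
hours              = 19

avg_consensus      = consensus_bytes / consensus_requests # bytes per fetch (integer division)
consensus_per_hour = consensus_bytes / hours              # bytes per hour over the run

puts "average consensus: #{avg_consensus} bytes (~#{(avg_consensus / 1024.0).round} KiB)"
puts "consensus traffic: #{consensus_per_hour} bytes/hour"
```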
In its 19-hour lifetime, this particular client sent 271 CREATE_FAST
cells and 347 EXTEND cells. An average server descriptor currently
appears to be a bit over 2 kilobytes in size. Even if we had downloaded
a 4 kB descriptor for every single EXTEND cell we sent, we would only
have used about 1.4 megabytes, roughly the capacity of one of those
ancient 3.5" high-density floppy disks.
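Here is the worst-case estimate above spelled out in Ruby. It takes 1 kB = 1024 bytes and treats a "1.44 MB" floppy as 1440 KiB.

```ruby
extend_cells    = 347
descriptor_size = 4 * 1024      # pessimistic 4 kB per descriptor
floppy_bytes    = 1440 * 1024   # a "1.44 MB" 3.5" HD floppy holds 1440 KiB

# One full descriptor download per EXTEND cell sent.
worst_case = extend_cells * descriptor_size
puts "worst case: #{worst_case} bytes (~#{(worst_case / 1e6).round(1)} MB)"
puts "that is #{(100.0 * worst_case / floppy_bytes).round}% of one floppy"
```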
(CREATE_FAST cells are most often used for encrypted directory
requests, and in that case we wouldn't need to download a server
descriptor for the directory server (also, we would probably not make
that many requests with proposal 141 anyway). They are also used for the
first hop of a Tor circuit, i.e. when connecting to our Guard node. In
those cases we'd have to request the descriptor once but could then
cache it.)
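The Guard case amounts to a simple cache keyed by the router's identity. The hypothetical Ruby sketch below requests the descriptor on the first lookup and reuses it after that. DescriptorCache, the fetcher block and 'GUARD_FINGERPRINT' are illustrative names only and are not Tor's code, which is written in C. The sketch deliberately ignores descriptor expiry and guard changes, both of which a real client would have to handle.

```ruby
# Hypothetical per-router descriptor cache. The fetcher block stands in for
# a real directory request and is called only on a cache miss.
class DescriptorCache
  attr_reader :fetches

  def initialize(&fetcher)
    @fetcher = fetcher
    @cache = {}
    @fetches = 0
  end

  # Return the descriptor for +fingerprint+, fetching it only the first time.
  def get(fingerprint)
    @cache.fetch(fingerprint) do
      @fetches += 1
      @cache[fingerprint] = @fetcher.call(fingerprint)
    end
  end
end

cache = DescriptorCache.new { |fp| "router descriptor for #{fp}" }
10.times { cache.get('GUARD_FINGERPRINT') }   # first hop of ten circuits
puts "descriptor fetches: #{cache.fetches}"
```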
Yours,
weasel
PS: Attached is the script I used for dumping the data every 5 seconds
(every second for the first 100 polls). It expects to read the controller
password from a file called "controller-password" in its working
directory, and it appends the stats to the file given as its first and
only argument. The control port number is hard-coded to 19051, which is
easy to change in the source.
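The script interpolates the password directly into AUTHENTICATE "...", so it only works when the password contains no double quote or backslash. Below is a minimal Ruby sketch that escapes the password as a control-protocol quoted string. It also shows one way to set the port without editing the source. The helper name control_quote and the TOR_CONTROL_PORT environment variable are hypothetical and are not part of Tor or the script. The sketch does not handle CR or LF characters in the password.

```ruby
# Quote a string for the control protocol: wrap it in double quotes and
# backslash-escape any double quote or backslash inside it.
def control_quote(str)
  '"' + str.gsub(/["\\]/) { |c| "\\" + c } + '"'
end

# Hypothetical environment variable, falling back to the script's 19051.
CONTROL_PORT = Integer(ENV.fetch('TOR_CONTROL_PORT', '19051'))
```

In the script, these would be used as tor.print "AUTHENTICATE #{control_quote(PASSWORD)}\r\n" and TCPSocket.new('localhost', CONTROL_PORT).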
-------------- next part --------------
#!/usr/bin/ruby
# Copyright (c) 2005, 2006, 2007, 2008 Peter Palfrader
require 'socket'
require 'yaml'
require 'thread'
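Thread.abort_on_exception = true
# Controller password, without its trailing newline.
PASSWORD = File.read("controller-password").chomp
# Stats are appended to the file named by the first argument.
outfile = File.open(ARGV[0], "a")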
tor = TCPSocket.new('localhost', 19051)
# Read one reply line (or one data block) from the control port:
#   "250 msg"       -> { 'code', 'msg' }
#   "250-key=value" -> { 'code', 'key', 'value' }
#   "250+key="      -> { 'code', 'key', 'value' => [lines], 'msg' }
def ctrl_read(fd)
  line = fd.readline.chomp
  raise "short reply: #{line.inspect}" if line.length <= 3
  code = line[0..2]
  spec = line[3..3]
  rest = line[4..-1]
  if spec == " "
    return { 'code' => code, 'msg' => rest }
  elsif spec == "-"
    key, value = rest.split('=', 2)
    return { 'code' => code, 'key' => key, 'value' => value }
  elsif spec == "+"
    raise "In line #{line} I expected the last char to be a =" unless rest.end_with?('=')
    key = rest[0..-2]
    value = []
    # Data lines end at a lone "."; a leading "." on a data line is dot-stuffing.
    while (line = fd.readline.chomp) != "."
      line = line[1..-1] if line.start_with?(".")
      value << line
    end
    # The data block is followed by an EndReplyLine with the same status code.
    line = fd.readline.chomp
    endcode = line[0..2]
    endspec = line[3..3]
    endmsg = line[4..-1]
    raise "Expected line #{line} to be an EndReplyLine with status=#{code} after DataReplyLine(s)" unless endcode == code
    raise "Expected line #{line} to be an EndReplyLine with spec=' ' after DataReplyLine(s)" unless endspec == ' '
    return { 'code' => code, 'key' => key, 'value' => value, 'msg' => endmsg }
  else
    raise "Unknown reply line: #{line}"
  end
end
# Authenticate with the password read above.
tor.print "AUTHENTICATE \"#{PASSWORD}\"\r\n"
reply = ctrl_read(tor)
raise "Unexpected reply #{reply.to_yaml}" unless reply['code'] == '250'
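# Poller thread: request the counters once a second for the first 100
# polls, then every 5 seconds. The replies are read by the main loop.
Thread.new do
  detailed = 100
  loop do
    tor.print "GETINFO dir-usage\r\n"
    if detailed > 0
      sleep 1
      detailed -= 1
    else
      sleep 5
    end
  end
end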
# Main loop: print each dir-usage reply, prefixed with the current time
# (epoch seconds and UTC date), and append it to the output file.
loop do
  r = ctrl_read(tor)
  if r['code'] == '250' && r['key'] == 'dir-usage'
    v = r['value'].kind_of?(Array) ? r['value'].join("\n") : r['value'].to_s
    s = Time.now.utc.strftime("%s %Y-%m-%d %H:%M:%S\n") + v + "\n\n"
    puts s
    outfile.puts s
    outfile.flush
  else
    STDERR.puts("Cannot handle reply:\n" + r.to_yaml)
  end
end
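# Never reached as written: the loop above runs until the script is killed
# or an exception aborts it. Kept for a clean QUIT if the loop gets an exit.
tor.print "QUIT\r\n"
reply = ctrl_read(tor)
raise "Unexpected reply #{reply.to_yaml}" unless reply['code'] == '250'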
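The file this script writes can be turned into a time series, for example for graphs like the ones linked above. Each record is a header line with the epoch seconds and the UTC date, followed by the dir-usage lines, and records are separated by a blank line. A minimal Ruby sketch follows. read_dump and series are hypothetical helpers, and the timestamp in the sample record is arbitrary.

```ruby
# Parse the script's output into [epoch, { name => [col1, col2] }] pairs.
def read_dump(text)
  text.split(/\n[ \t]*\n/).map(&:strip).reject(&:empty?).map do |record|
    header, *lines = record.lines.map(&:chomp)
    stats = {}
    lines.each do |line|
      fields = line.split
      next unless fields.length >= 3 && fields[-2..-1].all? { |f| f.match?(/\A\d+\z/) }
      stats[fields[0..-3].join(' ')] = fields[-2..-1].map(&:to_i)
    end
    [Integer(header.split.first), stats]
  end
end

# One counter as [epoch, first column] pairs, e.g. descriptor bytes so far.
def series(records, name)
  records.select { |_, stats| stats.key?(name) }
         .map { |epoch, stats| [epoch, stats[name][0]] }
end

# One record in the script's own output format (arbitrary timestamp).
sample = Time.at(1_213_653_691).utc.strftime("%s %Y-%m-%d %H:%M:%S\n") +
         "dl/consensus 1865179 13\ndl/server 4285272 84\n\n"
p series(read_dump(sample), 'dl/server')
```

A gzipped dump could be read with require 'zlib' and read_dump(Zlib::GzipReader.open('louise-stats-1.gz', &:read)). This assumes the published file uses the same format, which the text does not confirm.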
More information about the tor-dev mailing list