The WindowsBufferProblems

Mike Chiussi chiussi at gmail.com
Thu Jun 15 08:21:06 UTC 2006


On second thought we should have this discussion over or-dev, just so
anyone else can learn from it or share their ideas.

Sorry about the delayed response, I'm on a vampire-ish sleep schedule right now.

On 6/14/06, Ge van Geldorp <ge at gse.nl> wrote:
> Hello Mike,
>
> Thanks for your reply!
> First of all, I've seen the time skips too. I don't remember if it was
> during my experimentation or on one of my "production" Tor nodes though.
> I've seen it only two or three times.

Good! It's nice to know I'm not crazy.

Roger has upgraded this message to a warning and lowered its
reporting threshold (the current release doesn't report anything under
100 seconds), so we'll get a chance to see how widespread this
phenomenon is. Maybe it's only exaggerated on my systems because of low
memory or processing power.

> Before I can even start to think about solutions to the sockets problem, I'd
> like to be able to reproduce the problem, which as I said I can't at the
> moment. So I hope you can give me some extra information allowing me to
> recreate the problem.
> First of all, which Tor and Windows versions (Home/Prof, SP2?) are you
> using? How much physical memory? When the WSAENOBUFS problems occur, how
> much NPPool are you using? How are you connected? Roughly how many Tor
> connections do you have? Are you running other network related apps at the
> same time? Would you be willing to send the output of "netstat -n" and your
> torrc file to me?

You probably haven't been online long enough. For reasons that I'm
still not clear on, Tor clients don't "trust" servers that have a
short uptime.

A trick is to open up a DirPort; this will draw a lot of connections.
My torrc is here:
http://www.cdf.toronto.edu/~g4mike/torrc

Here is the netstat -n output from my system, taken shortly after the
first WSAENOBUFS incident I noticed:
http://www.cdf.toronto.edu/~g4mike/netstat


> The only problem I've been able to create is almost exhausting nppool and
> then starting Tor. This will totally exhaust nppool and then some circuits
> are closed. When I close my test app and Tor, all nppool is released (after
> the socket close timeout), while the Wiki makes it sound like the nppool mem
> is gone forever.

> I've been playing around with HKLM\SYSTEM\CurrentControlSet\Control\Session
> Manager\Memory Management\NonPagedPoolSize, which various sources (including
> Microsoft Resource Kit) claim controls the maximum size of the nppool. If
> the value is 0 (default), the system will compute a suitable max nppool
> size. Otherwise, it is the size of the nppool in bytes. However, I have been
> unable to verify that this actually works. I'd change the value, reboot and
> find the maximum nppool size unchanged.
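
For reference, the configured value can be read back after a reboot with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it reads the registry setting described above, not the effective pool maximum the kernel actually chose. The winreg module exists only on Windows, so the reader function returns None elsewhere.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

# Key path and value name as given in the quoted message.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "NonPagedPoolSize"


def describe_pool_setting(value):
    """Interpret the setting as described above (hypothetical helper)."""
    if value == 0:
        return "0: system computes the maximum nonpaged pool size"
    mib = value / (1024 * 1024)
    return f"{value} bytes ({mib:.1f} MiB) requested as maximum"


def read_pool_setting():
    """Return the configured value, or None if it cannot be read here."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None
    return value


if __name__ == "__main__":
    setting = read_pool_setting()
    if setting is None:
        print("NonPagedPoolSize not readable on this system")
    else:
        print(describe_pool_setting(setting))
```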

I haven't experimented with the registry yet, but I don't think that
is going to help. My understanding (I might be wrong here) is that
there is a fairly large amount of space available in the NPP, but
Windows puts a limit per process, with the exception of localhost
traffic. For example, when first getting into this I tried writing a
server which accepted connections and a client which did nothing but
connect and write (all traffic was local). Note that the server never
read from the clients; the goal was to fill up the NPP. I didn't start
getting WSAENOBUFS errors until NPP usage was around 40 megabytes.
However, Tor would generate WSAENOBUFS at around 4-5 megabytes of
usage.
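
For anyone who wants to repeat this kind of test, here is a rough Python sketch of the idea (a local server that accepts but never reads, and clients that write until the stack refuses more). It is not the original test program; the function name, the connection count, and the per-socket cap are assumptions for illustration. Depending on the platform, the writes may stop with a would-block error rather than WSAENOBUFS, so this shows only the shape of the experiment.

```python
import socket


def fill_unread_sockets(connections=4, chunk_size=64 * 1024,
                        cap_per_socket=64 * 1024 * 1024):
    """Queue data on loopback connections whose peer never reads.

    Returns (total bytes accepted by send(), list of errno values that
    stopped the writing on each connection).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(connections)
    payload = b"\0" * chunk_size
    pairs = []
    total = 0
    errors = []
    try:
        for _ in range(connections):
            client = socket.create_connection(listener.getsockname())
            accepted, _addr = listener.accept()   # accepted, never read from
            pairs.append((client, accepted))
            client.setblocking(False)             # send() fails instead of hanging
            sent = 0
            try:
                while sent < cap_per_socket:      # cap guarantees termination
                    sent += client.send(payload)
            except OSError as exc:
                errors.append(exc.errno)          # buffers full or refused
            total += sent
    finally:
        for client, accepted in pairs:
            client.close()
            accepted.close()
        listener.close()
    return total, errors


if __name__ == "__main__":
    queued, stopped_by = fill_unread_sockets()
    print("bytes queued:", queued, "errno values:", stopped_by)
```

Watching nonpaged pool usage in a separate monitoring tool while this runs is what would make the numbers comparable to the ones above.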

-Mike

> Best regards, Gé van Geldorp.
>
> > -----Original Message-----
> > From: owner-or-dev at freehaven.net
> > [mailto:owner-or-dev at freehaven.net] On Behalf Of Mike Chiussi
> > Sent: Wednesday, June 14, 2006 19:40
> > To: or-dev at freehaven.net
> > Subject: Re: The WindowsBufferProblems
> >
> > Yes, it definitely still exists. You don't need to "reboot"
> > because Tor is able to cope with failed read/write/connects.
> > But you might as well need to, because when the NPP is full,
> > socket operations are basically useless. Something magical
> > happened in 0.1.1.x that causes wsaenobufs to not occur on
> > select(), I still haven't figured out exactly why.
> > Programming for Windows is very strange, sort of like playing
> > Jenga blindfolded.
> >
> > The solution, as mentioned on the wiki, is to implement
> > overlapped I/O in libevent. Although changing the socket
> > paradigm would have serious repercussions in the other
> > libraries, so I see it as a last resort.
> >
> > My current working solution is to hack around with libevent so
> > that it uses the built in socket notification routines (see
> > WSAEventSelect() and WSAWaitForMultipleEvents()), although
> > this is proving quite difficult since these functions are
> > only intended for use with a small number of sockets, and if
> > not implemented properly become just as inefficient as select().
> >
> > An issue which seems to be seriously hampering my progress
> > is the mysterious "time skips". I seem to be the only one
> > encountering them, but they occur on two different test
> > machines on two different ISPs in two different cities.  They
> > are of the form "[notice] Your clock just jumped 176 seconds
> > forward; assuming established circuits no longer work."
> > Have you noticed anything like this in your logs?
> >
> > I had done some tracing and determined that connect() was
> > blocking in the Tor win32 socketpair implementation for an
> > unknown reason (even early in execution, when the NPP wasn't
> > yet being strained). I implemented my own socketpair over the
> > weekend using non-blocking sockets hoping it would resolve
> > the issue, but alas it is still occurring; exactly where it is
> > blocking I have yet to find.
> >
> > Thanks for looking into this, feel free to get in touch with
> > me personally if you want to compare notes.
> >
> > -Mike
>
>
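
The socketpair replacement mentioned in the quoted message can be pictured roughly as follows. This is a Python sketch of the general technique (emulating socketpair over a loopback TCP connection with a non-blocking connect), not Tor's code and not the implementation described above; the function name and the timeout are assumptions.

```python
import select
import socket


def loopback_socketpair(timeout=5.0):
    """Return two connected non-blocking TCP sockets on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    connector = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    acceptor = None
    try:
        listener.bind(("127.0.0.1", 0))   # port 0: OS picks a free port
        listener.listen(1)
        connector.setblocking(False)
        # connect_ex returns an error code instead of raising; an
        # "in progress" / "would block" code is expected here.
        connector.connect_ex(listener.getsockname())
        readable, _, _ = select.select([listener], [], [], timeout)
        if not readable:
            raise TimeoutError("no incoming connection on the loopback listener")
        acceptor, peer = listener.accept()
        # The connecting socket becomes writable once connect() has finished.
        _, writable, _ = select.select([], [connector], [], timeout)
        if not writable:
            raise TimeoutError("connect() did not complete")
        error = connector.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if error:
            raise OSError(error, "connect() failed")
        # Guard against some other local process connecting to our port.
        if peer != connector.getsockname():
            raise ConnectionError("accepted a connection from an unexpected peer")
        acceptor.setblocking(False)
        return connector, acceptor
    except BaseException:
        connector.close()
        if acceptor is not None:
            acceptor.close()
        raise
    finally:
        listener.close()


if __name__ == "__main__":
    a, b = loopback_socketpair()
    a.send(b"ping")
    select.select([b], [], [], 5.0)
    print(b.recv(4))
    a.close()
    b.close()
```

Every wait goes through select() with a timeout, so a stall shows up as a TimeoutError at a known step instead of a silent block.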


