[tor-bugs] #2687 [Torperf]: Update filter.R to parse Torperf's new .mergedata format
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Tue Apr 26 16:11:47 UTC 2011
#2687: Update filter.R to parse Torperf's new .mergedata format
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_review
Priority: major | Milestone:
Component: Torperf | Version:
Keywords: | Parent:
Points: 4 | Actualpoints:
-------------------------+--------------------------------------------------
Comment(by rransom):
Replying to [comment:14 tomb]:
> W/r/t adding all the data into a big vector, this may be the problem,
and it is what I meant when I said above that "I have some ideas" about
where to look for the problem.
>
> If R makes any kind of sense at all this could not possibly be O(n^2^)
since it is tail recursion. If list insertion at the tail of a vector is
any more than adding a pointer to the already allocated memory for the
line, then R is not the right choice. This is not impossible. My
experience with R like languages makes me intuit that insertion at the end
of a vector is a pointer operation, but I have not yet gotten the chance
to learn the details of R that are not listed in the language
specification which doesn't say. I know that it is constant time in other
list oriented languages such as LISP, Haskel, and SML.
'''NO.'''
A vector is an array. By using the vector concatenation operation, you
were explicitly creating a new array in each iteration, containing a copy
of the preceding array and one additional element.
It is possible to build a vector from beginning to end in amortized linear
time: Store a pointer to a mutable vector and the number of elements
already stored in the vector in a mutable data structure, add each new
element to the vector by mutating the vector and element count (known in
Common LISP as a vector's ‘tail pointer’), and double the length of the
vector (its ‘capacity’ in Python; I don't know what Common LISP calls
this) every time you run out of room and need to copy all preceding
elements to a new vector.
It is also possible to build a linked list of processed lines in reverse
order: In LISP terms, replace your list with `(CONS NEW-ELEMENT OLD-
LIST)`.
Or you can build a linked list of processed lines in the same order in
which you received them, by mutating the ‘CDR’s of list cells. `lapply`
(known in Scheme as `map` and in Common LISP as `MAPCAR`) most likely does
this.
> I plan to try i/o code more in a c style with data output in small
buffer flushes rather than being in memory in a single large data
structure, _however_ I used the data structure approach because of my
general impression that this is more in the spirit of R. If I understand
correctly, R was designed to handle really large matrices, which is the
structure I chose. I think this is almost a defining feature of R.
Use the R standard library. Library functions are more likely to be
properly designed, implemented, and optimized than anything you can write
in the R interpreted language without digging into the guts of how R is
implemented.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2687#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list