[tor-bugs] #2687 [Torperf]: Update filter.R to parse Torperf's new .mergedata format

Tue Apr 26 16:11:47 UTC 2011

#2687: Update filter.R to parse Torperf's new .mergedata format
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  karsten     
     Type:  enhancement  |         Status:  needs_review
 Priority:  major        |      Milestone:              
Component:  Torperf      |        Version:              
 Keywords:               |         Parent:              
   Points:  4            |   Actualpoints:              
-------------------------+--------------------------------------------------

Comment(by rransom):

 Replying to [comment:14 tomb]:

 > W/r/t adding all the data into a big vector, this may be the problem,
 and it is what I meant when I said above that "I have some ideas" about
 where to look for the problem.
 >
 > If R makes any kind of sense at all this could not possibly be O(n^2^)
 since it is tail recursion.  If list insertion at the tail of a vector is
 any more than adding a pointer to the already allocated memory for the
 line, then R is not the right choice.  This is not impossible.  My
 experience with R like languages makes me intuit that insertion at the end
 of a vector is a pointer operation, but I have not yet gotten the chance
 to learn the details of R that are not listed in the language
 specification which doesn't say.  I know that it is constant time in other
 list oriented languages such as LISP, Haskel, and SML.

 '''NO.'''

 A vector is an array.  By using the vector concatenation operation, you
 were explicitly creating a new array in each iteration, containing a copy
 of the preceding array and one additional element.

 It is possible to build a vector from beginning to end in amortized linear
 time: Store a pointer to a mutable vector and the number of elements
 already stored in the vector in a mutable data structure, add each new
 element to the vector by mutating the vector and element count (known in
 Common LISP as a vector's ‘tail pointer’), and double the length of the
 vector (its ‘capacity’ in Python; I don't know what Common LISP calls
 this) every time you run out of room and need to copy all preceding
 elements to a new vector.

 It is also possible to build a linked list of processed lines in reverse
 order: In LISP terms, replace your list with `(CONS NEW-ELEMENT OLD-
 LIST)`.

 Or you can build a linked list of processed lines in the same order in
 which you received them, by mutating the ‘CDR’s of list cells.  `lapply`
 (known in Scheme as `map` and in Common LISP as `MAPCAR`) most likely does
 this.

 > I plan to try i/o code more in a c style with data output in small
 buffer flushes rather than being in memory in a single large data
 structure, _however_ I used the data structure approach because of my
 general impression that this is more in the spirit of R.  If I understand
 correctly, R was designed to handle really large matrices, which is the
 structure I chose.  I think this is almost a defining feature of R.

 Use the R standard library.  Library functions are more likely to be
 properly designed, implemented, and optimized than anything you can write
 in the R interpreted language without digging into the guts of how R is
 implemented.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2687#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online