[tor-dev] [GSoC 2013] Status report - Searchable metrics archive
Karsten Loesing
karsten at torproject.org
Mon Aug 19 12:49:50 UTC 2013
Hi Kostas,
On 8/15/13 9:50 PM, Kostas Jakeliunas wrote:
> On Wed, Aug 14, 2013 at 1:33 PM, Karsten Loesing <karsten at torproject.org> wrote:
>
>>
>> Looks like pg_trgm is contained in postgresql-contrib-9.1, so it's more
>> likely that we can run something requiring this extension on a
>> torproject.org machine. Still, requiring extensions should be the last
>> resort if no other solution can be found. Leaving out searches for
>> nickname substrings is a valid solution for now.
>
>
> Got it.
>
>>>> Do you have a list of searches you're planning to support?
>>>
>>>
>>> These are the ones that should *really* be supported:
>>>
>>> - ?search=nickname
>>> - ?search=fingerprint
>>> - ?lookup=fingerprint
>>> - ?search=address [done some limited testing, currently not focusing on
>>> this]
>>
>> The lookup parameter is basically the same as search=fingerprint with
>> the additional requirement that fingerprint must be 40 characters long.
>> So, this is the current search parameter.
>>
>> I agree, these would be good to support.
>>
>> You might also add another parameter ?address=address for ExoneraTor.
>> That should, in theory, be just a subset of the search parameter.
>>
>
> Oh yes, makes a lot of sense, OK.
>
> By the way: I considered having the last consensus (all the data for at
> least the /summary document, or /details as well) be stored in memory (this
> is possible) (probably as a hashtable where key = fingerprint, value = all
> the fields we'd need to return) so that when the backend is queried without
> any search criteria, it would be possible to avoid hitting the database
> (which is always nice), and just dump the last consensus. (There's also
> caching of course, which we could discuss at a (probably quite a bit) later
> point.)
Okay.
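
To make sure we mean the same thing, here is a minimal sketch of the
in-memory idea as I understand it. The names (build_cache, summary,
query_db) and the row layout are made up for illustration and are not
taken from your code; the only parts from your description are the
fingerprint key and the "no search criteria means no database hit" rule.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory copy of the latest consensus:
# key = fingerprint, value = all fields needed for the response.
LatestConsensus = Dict[str, dict]


def build_cache(rows: List[dict]) -> LatestConsensus:
    """Index the status entries of the newest consensus by fingerprint.

    Fingerprints are upper-cased so that lookups are case-insensitive
    (an assumption; the real data may already be normalized).
    """
    return {row["fingerprint"].upper(): row for row in rows}


def summary(cache: LatestConsensus,
            search: Optional[str] = None,
            query_db: Optional[Callable[[str], List[dict]]] = None) -> List[dict]:
    """Answer a request, avoiding the database when there is no filter."""
    if search is None:
        # No search criteria: dump the last consensus straight from memory.
        return [cache[fp] for fp in sorted(cache)]
    if query_db is None:
        raise ValueError("a database callback is required for filtered queries")
    # Anything with search criteria still goes to the database.
    return query_db(search)
```

The cache would have to be rebuilt whenever a new consensus is imported,
which is a detail this sketch leaves out.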
>>> - ?running=<boolean>
>>
>> This one is tricky. So far, Onionoo looks only at the very latest
>> consensus or bridge status to decide if a relay or bridge is running or
>> not.
>>
>> But now you're adding archives to Onionoo, so that people can search for
>> a certain consensus or certain bridge status in the past, or they can
>> search for a time interval of consensuses or bridge statuses. How do
>> you define that a relay or bridge is running, or more importantly
>> included as not running?
>>
>
> Agree, this is not clear. (And whatever ends up being done, this should be
> well documented and clearly articulated (of course.))
>
> For me at least, 'running' implies the clause whether a given relay/bridge
> is running *right now*, i.e. whether it is present in the very last
> consensus. (Here's where that hashtable (with fingerprints as keys) in
> memory might be able to help: no need to run a separate query / do an inner
> join / whatnot; it would depend on whether there's a LIMIT involved though,
> etc.)
>
> I'm not sure which one is more useful (intuitively for me, the "whether it
> is running *right now*" is more useful.) Do you mean that it might make
> sense to have a field (or have "running" be it) indicating whether a given
> relay/bridge was present in the last consensus in the specified date range?
> If this is what you meant, then the "return all that are/were not running"
> clause would indeed be kind of... peculiar (semantically, it wouldn't be
> very obvious what it's doing.)
>
> Maybe it'd be simpler to first answer, what would be the most useful case?
>
>> How do you define that a relay or bridge [should be] included as not running?
>
> Could you rephrase maybe? Do you mean that it might be difficult to
> construct sane queries to check for this condition? Or that the situation
> where
>
> - a "from..to" date range is specified
> - ?running=false is specified
>
> would be rather confusing ('exclude those nodes which are running *right
> now*' ('now' possibly having nothing to do with the date range))?
I was referring to the situation you describe. But yes, I agree that
your definition of whether a relay or bridge is running *right now* can
work here. So, never mind my question/concern, this looks fine!
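
Just to spell out the definition we agreed on, here is a rough sketch.
The function name and the entry layout are hypothetical; the rule itself
is yours: "running" means "listed in the very last consensus", regardless
of any date range in the request.

```python
from typing import Iterable, List, Optional, Set


def filter_running(entries: Iterable[dict],
                   latest_fingerprints: Set[str],
                   running: Optional[bool] = None) -> List[dict]:
    """Apply the ?running=<boolean> parameter to already-selected entries.

    entries: results of the (possibly date-restricted) search.
    latest_fingerprints: fingerprints present in the newest consensus.
    running: None means the parameter was not given, so nothing is filtered.
    """
    if running is None:
        return list(entries)
    # An entry is "running" iff its fingerprint is in the latest consensus,
    # no matter which valid-after time the entry itself comes from.
    return [e for e in entries
            if (e["fingerprint"] in latest_fingerprints) == running]
```

With this reading, ?running=false combined with a date range returns
relays that were seen in that range but are not in the latest consensus.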
>>> - ?flag=flag [every kind of clause which further narrows down the query
>>> is not bad; the current db model supports all the flags that Stem does, and
>>> each flag has its own column]
>>
>> I'd say leave this one out until there's an actual use case.
>>
>
> Ok, I won't focus on these now; just wanted to say that these should be
> possible without much ado/problems.
Okay.
>>> - ?first_seen_days=range
>>> - ?last_seen_days=range
>>>
>>> As per the plan, the db should be able to return a list of status entries /
>>> validafter ranges (which can be used in {first,last}_seen_days) given some
>>> fingerprint.
>>
>> Oh, I think there's a misunderstanding of these two fields. These
>> fields are only there to search for relays or bridges that have first
>> appeared or were last seen on a given day.
>>
>> You'll need two new parameters, say, from=datetime and to=datetime (or
>> start=datetime and end=datetime) to define a valid-after range for your
>> search.
>>
>
> Ah! I wasn't paying attention here. :) Ok, all good.
Okay.
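
For the valid-after range, something along these lines might work. This
is only a sketch: the parameter names "from" and "to" are one of the two
options I suggested above, while the timestamp format and the function
names are assumptions that the API document would have to pin down.

```python
from datetime import datetime
from typing import Mapping, Optional, Tuple

# Assumed timestamp format for from=/to= values; not decided anywhere yet.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_range(params: Mapping[str, str]
                ) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Turn optional from=/to= parameters into a valid-after range."""
    start = (datetime.strptime(params["from"], DATETIME_FORMAT)
             if "from" in params else None)
    end = (datetime.strptime(params["to"], DATETIME_FORMAT)
           if "to" in params else None)
    if start is not None and end is not None and start > end:
        raise ValueError("'from' must not be later than 'to'")
    return start, end


def in_range(valid_after: datetime,
             start: Optional[datetime],
             end: Optional[datetime]) -> bool:
    """Check whether a status entry's valid-after time lies in the range.

    A missing bound leaves that side of the range open.
    """
    return ((start is None or valid_after >= start)
            and (end is None or valid_after <= end))
```

The first_seen_days and last_seen_days parameters would stay separate
from this, since they select relays or bridges by when they first
appeared or were last seen, not by which statuses are searched.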
I wonder, is there a document describing the new API somewhere? If not,
do you mind creating one?
All the best,
Karsten