[metrics-bugs] #24384 [Metrics/Onionoo]: Decode percent-encoded characters in qualified search terms
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Nov 27 21:49:19 UTC 2017
#24384: Decode percent-encoded characters in qualified search terms
-----------------------------+--------------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: needs_revision
Priority: High | Milestone: Onionoo-1.8.0
Component: Metrics/Onionoo | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+--------------------------------
Changes (by karsten):
* status: merge_ready => needs_revision
Comment:
Okay, I looked deeper into this issue. I'm writing down here what I found,
though I admit that some facts are only based on my memory, as I have a
few dozen browser tabs open here and have totally lost control over my
~~mind~~desktop. Anyway, here's what I found, to be treated with care:
- Relay Search encodes the space character as `%20`. So far so good, we
handled that before, and we handle that with the patch.
- Relay Search encodes `"` as `%22`, `'` as `%27`, `<` as `%3C`, and `>`
as `%3E`. We'll have to decode these in Onionoo, which we don't do yet.
- Relay Search does not encode `#` as `%23` but instead cuts off anything
starting at that character. This means we cannot search for `#` in contact
lines or other, future qualified search terms. I believe it should
percent-encode the `#` character, but I could be wrong.
- Relay Search does not encode `%` as `%25`. This leads to an
`IllegalArgumentException` in the newly patched (and not yet merged)
Onionoo code. I'm not entirely certain, but I'd think that this is illegal
on the Relay Search side. And we'll have to catch this in Onionoo and send
back a 400 status code.
- Relay Search does not encode `&` as `%26`. I think it'll have to do
this, because we're looking for the `&` character as delimiter to the next
parameter. I think the current Onionoo patch doesn't handle this very
well, by first decoding the entire query string and then splitting it up
at `&` characters which may have been `%26`'s that we decoded ourselves.
- Relay Search does not encode `+` as `%2B`. And to be clear, this is not
strictly required by the Onionoo specification. On the Onionoo side we
should treat the unencoded `+` the same regardless of whether it appears
in, say, `search=contact:bs+tor` or `contact=bs+tor`. But we don't:
- In the first case, `search=contact:bs+tor`, we treat the `+` as `+`
and not as space character and only return relays with `bs+tor` in the
contact line. This seems reasonable to me, even though we specify in the
"contact" parameter that `+` must be encoded as `%2B`. But this is the
"search" parameter.
- In the second case, `contact=bs+tor`, Jetty converts the `+` to a
space character, so that we return all relays with `bs` ''and'' with `tor`
in the contact line, which clearly outnumber those with `bs+tor` in the
contact line. I think we should fix this second case by doing the
conversion of all parameter values ourselves and not leaving that to
Jetty. If we do, we can remove the requirement to encode `+` in the
"contact" parameter. But let's focus on the Relay Search side first.
- Relay Search does not encode `/` as `%2F`. Random example: The search
for `AJQ2YvMOznqi0SYrNk/AQx94+Aw` does not even trigger a query to
Onionoo. Only if I truncate at the `/` by searching for
`AJQ2YvMOznqi0SYrNk`, I get a result. This is a bug in Relay Search that
we need to fix.
- Our unit tests in `ResponseServletTest` treat the `+` character
different than Jetty by not decoding it to a space character.
- Last but not least, when we attempted to support space characters in
the qualified search term, like `search=contact:"a b c"`, we overlooked
that this does not match only relays with `"a b c"` as contact line part
but all contact lines containing `"a"`, `"b"`, and `"c"` ''somewhere'' in
the contact line. Potentially useful, but not the use case we had in mind.
Hilarious!
How about we take this out of 1.8.0 and resolve all related issues with
more time on our hands? And I don't mean to ignore this ticket until a few
days before the next planned release. But let's take the time we need to
fix everything that needs fixing, and let's include it in the next release
when it's ready. Sound okay?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24384#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list