[tor-bugs] #26420 [Core Tor/Stem]: Testing - specify literal patterns instead of regex patterns (was: Discuss: Testing - specify literal patterns instead of regex patterns)

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Jun 20 20:29:12 UTC 2018


#26420: Testing - specify literal patterns instead of regex patterns
------------------------------------------+---------------------
 Reporter:  dmr                           |          Owner:  dmr
     Type:  enhancement                   |         Status:  new
 Priority:  Very Low                      |      Milestone:
Component:  Core Tor/Stem                 |        Version:
 Severity:  Minor                         |     Resolution:
 Keywords:  dev testing code-improvement  |  Actual Points:
Parent ID:                                |         Points:
 Reviewer:  atagar                        |        Sponsor:
------------------------------------------+---------------------
Changes (by dmr):

 * status:  needs_review => new
 * type:  task => enhancement


Old description:

> w.r.t.
> [[https://gitweb.torproject.org/stem.git/commit/?id=7711050619af1a2f8ecf4aa87f774baa5f367b3b|7711050619af1a2f8ecf4aa87f774baa5f367b3b]],
> I was planning to file this ticket anyways, so might as well now for the
> discussion.
>
> atagar linked [[https://stackoverflow.com/questions/8672754/how-to-show-
> the-error-messages-caught-by-assertraises-in-unittest-in-
> python2-7/8673096#8673096|this StackOverflow answer]] in the commit
> message.
>
> (I was a bit behind on filing this ticket, but already started doing the
> literal `re.escape()` bit in my test cases. Hence atagar's comment in the
> commit.)
>
> Anyway, here's the ticket text I had started to prep - now //slightly//
> tweaked:
> ===
> The testing codebase makes pretty extensive use of
> `unittest.TestCase.assertRaisesRegexp()`.
> An example is
> [[https://gitweb.torproject.org/stem.git/tree/test/integ/client/connection.py?id=0192b29a4784465e5f69f11ced584a54644e4a90#n36|here]]:
> {{{
>   def test_no_common_link_protocol(self):
>     """
>     Connection without a commonly accepted link protocol version.
>     """
>
>     for link_protocol in (1, 2, 6, 20):
>       self.assertRaisesRegexp(stem.SocketError, 'Unable to establish a
> common link protocol with 127.0.0.1:1113', Relay.connect, '127.0.0.1',
> test.runner.ORPORT, [link_protocol])
> }}}
> The second argument is treated as a regex pattern, so it will actually
> match more than intended - possibly leading to some false negatives
> (although unlikely in this example).
>
> The use of `unittest.TestCase.assertRaisesRegexp()` without `re.escape()`
> for a literal expression is decently common - the use of it intended for
> a regex is fairly rare (I haven't come across a test yet that I can
> recall).
>
> Having a "literal" check is possible in (at least) two ways:
> {{{
> with self.assertRaises(SomeException) as cm:
>   do_something(some_param)
> self.assertEqual(str(cm.exception), expected_literal)
> }}}
> {{{
> self.assertRaisesRegexp(SomeException, '^%s$' %
> re.escape(expected_literal), do_something, some_param)
> }}}
>
> Importantly, both of these check for //exact// string.
> Much of the codebase doesn't use `re.escape()`, and where it does, I
> didn't see any `^` or `$` (although I didn't search exhaustively), so
> that could match on substrings, also allowing for subtle bugs.
>
> So we may consider a helper class `StemTestCase(unittest.TestCase)` which
> adds an `assertRaisesLiteral()` method, to make it easier to do this. (Or
> some other means of adding that in.)
>
> We could of course take the second option with `re.escape()`, but since a
> lot of the codebase already doesn't seem to do that, it's probably quite
> easy to forget, especially the `'^%s$' % ` part.
>
> atagar: thoughts on these options? or leave things as-is / `wontfix`?
>
> (Filing this as a //task// ticket, as it's probably a discussion point
> more than anything else. I'd expect from the edge cases, there //could//
> be some defects, some enhancements.)

New description:

 ~~w.r.t.
 [[https://gitweb.torproject.org/stem.git/commit/?id=7711050619af1a2f8ecf4aa87f774baa5f367b3b|7711050619af1a2f8ecf4aa87f774baa5f367b3b]],
 I was planning to file this ticket anyways, so might as well now for the
 discussion.~~

 atagar linked [[https://stackoverflow.com/questions/8672754/how-to-show-
 the-error-messages-caught-by-assertraises-in-unittest-in-
 python2-7/8673096#8673096|this StackOverflow answer]] in the commit
 message.

 (I was a bit behind on filing this ticket, but already started doing the
 literal `re.escape()` bit in my test cases. Hence atagar's comment in the
 commit.)

 Anyway, here's the ticket text I had started to prep - now //slightly//
 tweaked:
 ===
 The testing codebase makes pretty extensive use of
 `unittest.TestCase.assertRaisesRegexp()`.
 An example is
 [[https://gitweb.torproject.org/stem.git/tree/test/integ/client/connection.py?id=0192b29a4784465e5f69f11ced584a54644e4a90#n36|here]]:
 {{{
   def test_no_common_link_protocol(self):
     """
     Connection without a commonly accepted link protocol version.
     """

     for link_protocol in (1, 2, 6, 20):
       self.assertRaisesRegexp(stem.SocketError, 'Unable to establish a
 common link protocol with 127.0.0.1:1113', Relay.connect, '127.0.0.1',
 test.runner.ORPORT, [link_protocol])
 }}}
 The second argument is treated as a regex pattern, so it will actually
 match more than intended - possibly leading to some false negatives
 (although unlikely in this example).

 The use of `unittest.TestCase.assertRaisesRegexp()` without `re.escape()`
 for a literal expression is decently common - the use of it intended for a
 regex is fairly rare (I haven't come across a test yet that I can recall).

 Having a "literal" check is possible in (at least) two ways:
 {{{
 with self.assertRaises(SomeException) as cm:
   do_something(some_param)
 self.assertEqual(str(cm.exception), expected_literal)
 }}}
 {{{
 self.assertRaisesRegexp(SomeException, '^%s$' %
 re.escape(expected_literal), do_something, some_param)
 }}}

 Importantly, both of these check for //exact// string.
 Much of the codebase doesn't use `re.escape()`, and where it does, I
 didn't see any `^` or `$` (although I didn't search exhaustively), so that
 could match on substrings, also allowing for subtle bugs.

 So we may consider a helper class `StemTestCase(unittest.TestCase)` which
 adds an `assertRaisesLiteral()` method, to make it easier to do this. (Or
 some other means of adding that in.)

 We could of course take the second option with `re.escape()`, but since a
 lot of the codebase already doesn't seem to do that, it's probably quite
 easy to forget, especially the `'^%s$' % ` part.

 ~~atagar: thoughts on these options? or leave things as-is / `wontfix`?~~

 ~~(Filing this as a //task// ticket, as it's probably a discussion point
 more than anything else. I'd expect from the edge cases, there //could//
 be some defects, some enhancements.)~~

 See especially [[comment:2|atagar's comment about implementation
 thoughts]].

--

Comment:

 Since this is no longer really a discussion, swapping some aspects of the
 ticket.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26420#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list