[tor-bugs] #6450 [Metrics Utilities]: Task #6329 Python script can't encode unicode characters
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Mon Jul 23 12:52:36 UTC 2012
#6450: Task #6329 Python script can't encode unicode characters
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:
     Type:  defect             |         Status:  new
 Priority:  minor              |      Milestone:
Component:  Metrics Utilities  |        Version:
 Keywords:                     |         Parent:
   Points:                     |   Actualpoints:
-------------------------------+--------------------------------------------
Today I found that the task #6329 script fails when its output contains
non-ASCII characters and is piped into `tail` or `less`: the script exits
with a traceback. When writing directly to a terminal, Python is happy.
Here's how to reproduce the problem:
- Clone the metrics-tasks repository.
- Navigate to the #6329 script and make it download required data:
  `cd task-6329/; ./tor-relays-stats.py -d`
- Find a unicode character in an AS name:
  `grep -B1 "as_name.*\\\\u" details.json`
- Display relays in that AS, e.g. AS28548:
  `./tor-relays-stats.py -i -a 28548 | tail`
Python should print out the following traceback:
{{{
Traceback (most recent call last):
  File "./tor-relays-stats.py", line 197, in <module>
    short=70 if options.short else None)
  File "./tor-relays-stats.py", line 110, in print_groups
    print formatted_group[:short]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 144: ordinal not in range(128)
}}}
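The same error can be triggered without the script. When stdout is a pipe, Python 2 has no terminal encoding to consult and falls back to ASCII when printing a unicode string. The sketch below performs that encoding step explicitly; the string `text` is a made-up example, not data from details.json.

```python
# Minimal reproduction of the encoding failure, independent of the script.
# `text` is a hypothetical AS name containing U+00F3, the character
# reported in the traceback above.
text = u'Comunicaci\xf3n'

try:
    # Equivalent to what Python 2's print does when stdout is a pipe.
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(repr(error))
```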
I found that a possible solution is to replace all non-ASCII characters
with '?'s, but that doesn't seem very elegant:
{{{
- exit, guard, country, as_number, as_name)
+ exit, guard, country, as_number, as_name.encode('ascii', 'replace'))
}}}
Are there better solutions?
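
One alternative that may be worth considering is to encode output as UTF-8 instead of ASCII, so that no characters are lost. The sketch below contrasts the two approaches. The helper `utf8_writer`, the sample `as_name` and the `io.BytesIO` stand-in for a piped stdout are all assumptions made for illustration; none of them come from tor-relays-stats.py.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so unicode text written to it is UTF-8 encoded.

    Hypothetical helper; in a Python 2 script the same idea is often
    applied to stdout itself:
        sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    """
    return codecs.getwriter('utf-8')(byte_stream)


# Hypothetical AS name containing U+00F3.
as_name = u'Comunicaci\xf3n'

# The workaround from the ticket: lossy, the accented letter becomes '?'.
lossy = as_name.encode('ascii', 'replace')

# The alternative: write through a UTF-8 writer, keeping the character.
pipe = io.BytesIO()  # stands in for a piped stdout
out = utf8_writer(pipe)
out.write(as_name + u'\n')
written = pipe.getvalue()

print(repr(lossy))
print(repr(written))
```

Without touching the code, setting the `PYTHONIOENCODING` environment variable should have a similar effect, e.g. `PYTHONIOENCODING=utf-8 ./tor-relays-stats.py -i -a 28548 | tail`. Both variants assume that whatever reads the pipe expects UTF-8.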
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6450>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online