[tor-commits] [metrics-lib/master] Fix a bug in recognizing bandwidth files.
karsten at torproject.org
Fri May 3 06:40:58 UTC 2019
commit 016d49f5142561476185105ef770006d9635f91e
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date: Thu May 2 20:54:53 2019 +0200
Fix a bug in recognizing bandwidth files.
We're using a regular expression on the first 100 characters of a
descriptor to recognize bandwidth files. More specifically, if a
descriptor starts with ten digits followed by a newline, we parse it
as a bandwidth file. (This is ugly, but the legacy bandwidth file
format doesn't give us much of a choice.)
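This regular expression is broken. `String.matches()` only succeeds
if the pattern matches the entire string, so the regular expression
we want is one that matches all of the first 100 characters of a
descriptor. Ours didn't do that: it only matched a string consisting
of exactly ten digits and a newline.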
More detailed explanation of the code change:
- We don't need to start the pattern with `^`, because the regular
expression needs to match the whole string anyway.
- The `(?s)` part enables the dotall mode: "In dotall mode, the
expression . matches any character, including a line terminator. By
default this expression does not match line terminators. Dotall
mode can also be enabled via the embedded flag expression (?s).
(The s is a mnemonic for "single-line" mode, which is what this is
called in Perl.)"
- We need to end the pattern with `.*` to match any characters
following the first newline, which also includes newlines due to
the previously enabled dotall mode.
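
The three points above can be checked with a small standalone sketch.
The class name, method names, and the sample header lines below are
made up for illustration and are not part of metrics-lib; only the two
pattern strings are taken from the diff.

```java
/* Hypothetical demo class, not part of metrics-lib. */
class BandwidthFilePatternDemo {

  /* Old pattern from the diff. String.matches() requires the whole input
   * to match, so this only accepts exactly ten digits plus one newline. */
  static boolean oldCheck(String firstLines) {
    return firstLines.matches("^[0-9]{10}\\n");
  }

  /* New pattern from the diff: dotall mode lets the trailing .* consume
   * everything after the first newline, including further newlines. */
  static boolean newCheck(String firstLines) {
    return firstLines.matches("(?s)[0-9]{10}\\n.*");
  }

  /* Variant without (?s), for comparison only: the dot stops at the next
   * line terminator, so multi-line input is not matched in full. */
  static boolean withoutDotall(String firstLines) {
    return firstLines.matches("[0-9]{10}\\n.*");
  }

  public static void main(String[] args) {
    /* Made-up sample input: a 10-digit timestamp line followed by
     * further lines, standing in for the first 100 characters. */
    String firstLines = "1556830493\nversion=1.4.0\nsoftware=sbws\n";
    System.out.println(oldCheck(firstLines));       // false
    System.out.println(withoutDotall(firstLines));  // false
    System.out.println(newCheck(firstLines));       // true
    System.out.println(oldCheck("1556830493\n"));   // true
  }
}
```

The last call shows why the old pattern appeared plausible: it does
accept a string that contains nothing but the timestamp line.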
Fixes #30369.
---
CHANGELOG.md | 6 ++++++
.../java/org/torproject/descriptor/impl/DescriptorParserImpl.java | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6a62528..aee65ea 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,9 @@
+# Changes in version 2.6.1 - 2019-05-??
+
+ * Medium changes
+ - Fix a bug in recognizing descriptors as bandwidth files.
+
+
# Changes in version 2.6.0 - 2019-04-29
* Medium changes
diff --git a/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java b/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
index 119fe09..08ac909 100644
--- a/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
+++ b/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
@@ -132,7 +132,7 @@ public class DescriptorParserImpl implements DescriptorParser {
sourceFile);
} else if (fileName.contains(LogDescriptorImpl.MARKER)) {
return LogDescriptorImpl.parse(rawDescriptorBytes, sourceFile, fileName);
- } else if (firstLines.matches("^[0-9]{10}\\n")) {
+ } else if (firstLines.matches("(?s)[0-9]{10}\\n.*")) {
/* Identifying bandwidth files by a 10-digit timestamp in the first line
* breaks with files generated before 2002 or after 2286 and when the next
* descriptor identifier starts with just a timestamp in the first line