[metrics-bugs] #21354 [Metrics/Onionoo]: Make Onionoo more memory-efficient
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Jan 30 21:32:13 UTC 2017
#21354: Make Onionoo more memory-efficient
---------------------------------+--------------------------
Reporter: karsten | Owner: metrics-team
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Onionoo | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
---------------------------------+--------------------------
When I re-imported 2 years of bridge network statuses to fix #20994, I
noticed that this is only possible when importing descriptors in batches
of a few months. This is pretty bad, because it also means that we
wouldn't be able to start a new Onionoo instance from scratch. And a new
operator might not even figure out what's wrong and assume that Onionoo
contains a serious bug. (Maybe not a totally unreasonable assumption.)
Today I found the reason for this problem: `UptimeStatusUpdater` stores
sets of `Long` values of bridge network status publication times, and that
simply does not scale for more than a few months of statuses.
Unfortunately, we need to store all these publication times in memory,
because opening and updating status files for each fingerprint contained
in a status also does not scale, for a different reason.
But we can do better here. We can apply the same trick that we're using
for storing relay flag strings, that is, resolve strings to indexes using
a single, static map, and store indexes in a bitset rather than strings in
a set. Here's a possible patch:
{{{
diff --git a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
index 2b5f5fc..6280902 100644
--- a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
+++ b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
@@ -86,14 +86,18 @@ public class UptimeStatusUpdater implements DescriptorListener,
}
}
+ private static Map<Long, Integer> dateHourMillisToIndex = new HashMap<>();
+
+ private static Map<Integer, Long> dateHourIndexToMillis = new HashMap<>();
+
private SortedMap<Long, Flags> newRelayStatuses = new TreeMap<>();
private SortedMap<String, SortedMap<Long, Flags>>
newRunningRelays = new TreeMap<>();
- private SortedSet<Long> newBridgeStatuses = new TreeSet<>();
+ private BitSet newBridgeStatuses = new BitSet();
- private SortedMap<String, SortedSet<Long>>
+ private SortedMap<String, BitSet>
newRunningBridges = new TreeMap<>();
private void processRelayNetworkStatusConsensus(
@@ -123,13 +127,18 @@ public class UptimeStatusUpdater implements DescriptorListener,
if (!fingerprints.isEmpty()) {
long dateHourMillis = (status.getPublishedMillis()
/ DateTimeHelper.ONE_HOUR) * DateTimeHelper.ONE_HOUR;
+ if (!dateHourMillisToIndex.containsKey(dateHourMillis)) {
+ dateHourIndexToMillis.put(dateHourIndexToMillis.size(), dateHourMillis);
+ dateHourMillisToIndex.put(dateHourMillis, dateHourMillisToIndex.size());
+ }
+ int dateHourIndex = dateHourMillisToIndex.get(dateHourMillis);
for (String fingerprint : fingerprints) {
if (!this.newRunningBridges.containsKey(fingerprint)) {
- this.newRunningBridges.put(fingerprint, new TreeSet<Long>());
+ this.newRunningBridges.put(fingerprint, new BitSet());
}
- this.newRunningBridges.get(fingerprint).add(dateHourMillis);
+ this.newRunningBridges.get(fingerprint).set(dateHourIndex);
}
- this.newBridgeStatuses.add(dateHourMillis);
+ this.newBridgeStatuses.set(dateHourIndex);
}
}
@@ -140,17 +149,19 @@ public class UptimeStatusUpdater implements DescriptorListener,
this.updateStatus(true, e.getKey(), e.getValue());
}
this.updateStatus(true, null, this.newRelayStatuses);
- for (Map.Entry<String, SortedSet<Long>> e :
- this.newRunningBridges.entrySet()) {
+ for (Map.Entry<String, BitSet> e : this.newRunningBridges.entrySet()) {
+ BitSet dateHourIndexes = e.getValue();
SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
- for (long dateHourMillis : e.getValue()) {
- dateHourMillisNoFlags.put(dateHourMillis, null);
+ for (int i = dateHourIndexes.nextSetBit(0); i >= 0;
+ i = dateHourIndexes.nextSetBit(i + 1)) {
+ dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
}
this.updateStatus(false, e.getKey(), dateHourMillisNoFlags);
}
SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
- for (long dateHourMillis : this.newBridgeStatuses) {
- dateHourMillisNoFlags.put(dateHourMillis, null);
+ for (int i = this.newBridgeStatuses.nextSetBit(0); i >= 0;
+ i = this.newBridgeStatuses.nextSetBit(i + 1)) {
+ dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
}
this.updateStatus(false, null, dateHourMillisNoFlags);
}
}}}
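To make the idea easier to discuss outside of `UptimeStatusUpdater`, here is a minimal standalone sketch of the same trick: hour timestamps are resolved to small indexes through a shared lookup, and each bridge only keeps a `BitSet` of indexes. The class name `DateHourIndex`, its methods, and the local `ONE_HOUR` constant are hypothetical and only for illustration; they are not part of Onionoo or of the patch above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not Onionoo code: resolves hour timestamps to indexes. */
class DateHourIndex {

  /** Assumed stand-in for DateTimeHelper.ONE_HOUR, in milliseconds. */
  private static final long ONE_HOUR = 60L * 60L * 1000L;

  private final Map<Long, Integer> millisToIndex = new HashMap<>();

  private final List<Long> indexToMillis = new ArrayList<>();

  /** Returns the index of the hour containing the given time, assigning
   * the next free index if this hour has not been seen before. */
  int indexOf(long publishedMillis) {
    long dateHourMillis = (publishedMillis / ONE_HOUR) * ONE_HOUR;
    Integer index = millisToIndex.get(dateHourMillis);
    if (index == null) {
      index = indexToMillis.size();
      indexToMillis.add(dateHourMillis);
      millisToIndex.put(dateHourMillis, index);
    }
    return index;
  }

  /** Resolves all set bits back to hour timestamps, in index order. */
  List<Long> resolve(BitSet dateHourIndexes) {
    List<Long> result = new ArrayList<>();
    for (int i = dateHourIndexes.nextSetBit(0); i >= 0;
        i = dateHourIndexes.nextSetBit(i + 1)) {
      result.add(indexToMillis.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    DateHourIndex index = new DateHourIndex();
    BitSet runningHours = new BitSet();
    // Two statuses published in the same hour, one in the following hour.
    runningHours.set(index.indexOf(1485810000000L));
    runningHours.set(index.indexOf(1485810060000L));
    runningHours.set(index.indexOf(1485813600000L));
    System.out.println(runningHours.cardinality() + " distinct hours: "
        + index.resolve(runningHours));
  }
}
```

The point of the design is that each hour's `Long` is stored once in the shared lookup, while every fingerprint pays roughly one bit per known hour instead of one boxed `Long` plus one tree node per hour it was seen in.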
But let's not merge this yet, as it only solves this particular problem of
importing lots of bridge network statuses, and maybe there's an even
better way to fix this.
Let's rather look at this (possible) improvement together with earlier
improvements towards memory efficiency, and let's think about what remains
to be done.
For example, should we avoid certain data structures (like
`TreeSet`/`TreeMap`) and replace them with more memory-efficient data
structures?
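As one illustration of what such a replacement could look like (purely a sketch for discussion, not something Onionoo does today), a sorted set of timestamps can be kept in a primitive `long[]` with binary search instead of a `TreeSet<Long>`. The class `SortedLongArraySet` below is hypothetical. It stores 8 bytes per value plus some spare capacity and avoids the boxed `Long` and tree node per element, at the cost of insertions that shift array elements.

```java
import java.util.Arrays;

/** Hypothetical illustration, not Onionoo code: sorted set of primitive longs. */
class SortedLongArraySet {

  private long[] values = new long[8];

  private int size = 0;

  /** Adds the value, keeping the array sorted; returns false if present. */
  boolean add(long value) {
    int pos = Arrays.binarySearch(values, 0, size, value);
    if (pos >= 0) {
      return false;
    }
    // binarySearch returns -(insertion point) - 1 for missing keys.
    int insertAt = -(pos + 1);
    if (size == values.length) {
      values = Arrays.copyOf(values, size * 2);
    }
    // Shift the tail one slot to the right to make room.
    System.arraycopy(values, insertAt, values, insertAt + 1,
        size - insertAt);
    values[insertAt] = value;
    size++;
    return true;
  }

  /** Returns whether the value is contained in this set. */
  boolean contains(long value) {
    return Arrays.binarySearch(values, 0, size, value) >= 0;
  }

  /** Returns a sorted copy of the contained values. */
  long[] toArray() {
    return Arrays.copyOf(values, size);
  }
}
```

Whether this is a win depends on the access pattern: appending mostly increasing timestamps is cheap, but inserting many values into the middle of a large array is not.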
Are there other tricks we should use more consistently?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21354>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online