[tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits
tevador
tevador at gmail.com
Sat May 9 19:38:52 UTC 2020
On 08 May, 21:53, tevador <tevador at gmail.com> wrote:
> In particular, the following parameters should be set differently from
> Monero:
>
> RANDOMX_ARGON_SALT = "RandomX-TOR-v1"
>
> The unique RandomX salt means we do not need to use a separate salt as PoW
> input as specified in § 3.2.
>
> RANDOMX_ARGON_ITERATIONS = 1
> RANDOMX_CACHE_ACCESSES = 4
> RANDOMX_DATASET_BASE_SIZE = 1073741824
> RANDOMX_DATASET_EXTRA_SIZE = 16777216
>
> These 4 changes reduce the RandomX Dataset size to ~1 GiB, which allows
> the number of iterations to be reduced from 8 to 4. The combined effect of
> this is that Dataset initialization becomes 4 times faster, which is needed
> due to more frequent updates of the seed (Monero updates once per ~3 days).
>
> RANDOMX_PROGRAM_COUNT = 2
> RANDOMX_SCRATCHPAD_L3 = 1048576
>
> Additionally, reducing the number of programs from 8 to 2 makes the hash
> calculation about 4 times faster, while still providing resistance against
> program filtering strategies (see [REF_RANDOMX_PROGRAMS]). Since there are
> 4 times fewer writes, we also have to reduce the scratchpad size. I suggest
> using a 1 MiB scratchpad size as a compromise between scratchpad write
> density and memory hardness. Most x86 CPUs will perform roughly the same
> with a 512 KiB and 1024 KiB scratchpad, while the larger size provides
> higher resistance against specialized hardware, at the cost of possible
> time-memory tradeoffs (see [REF_RANDOMX_TMTO] for details).
>
> Lastly, we reduce the output of RandomX to just 8 bytes:
>
> RANDOMX_HASH_SIZE = 8
>
> 64-bit preimage security is more than sufficient for proof-of-work and it
> allows the result to be treated as a little-endian encoded unsigned integer
> for easy effort calculation.
I have implemented this in the tor-pow branch of the RandomX repository:
https://github.com/tevador/RandomX/tree/tor-pow
Namely, I have changed the API to return the hash value as a uint64_t and
made corresponding changes in the benchmark.
Benchmark example:
./randomx-benchmark --mine \
--avx2 \
--jit \
--largePages \
--nonces 10000 \
--seed 1234 \
--init 1 \
--threads 1 \
--batch
RandomX-TOR-v1 benchmark
- Argon2 implementation: AVX2
- full memory mode (1040 MiB)
- JIT compiled mode
- hardware AES mode
- large pages mode
- batch mode
Initializing (1 thread) ...
Memory initialized in 5.32855 s
Initializing 1 virtual machine(s) ...
Running benchmark (10000 nonces) ...
Performance: 2535.43 hashes per second
Best result:
Nonce: 8bc3ded34d2dcdeed9000000f95cd20c
Result: d947ceff08750300
Effort: 18956
Valid: 1
At the end, it prints out the nonce that gives the highest effort value and
validates it.
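The Effort value in the sample output is consistent with reading the 8-byte
result as a little-endian unsigned integer and dividing 2^64 by it. The Python
sketch below reproduces that number. The formula floor(2^64 / result) and the
function name effort_from_result are assumptions inferred from the output
above, not taken from the benchmark source.

```python
def effort_from_result(result_hex: str) -> int:
    """Estimate the effort of an 8-byte RandomX-TOR result given as hex.

    Assumption: effort = floor(2^64 / result), where result is the
    little-endian unsigned integer encoded by the 8 bytes.
    """
    raw = bytes.fromhex(result_hex)
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes (RANDOMX_HASH_SIZE = 8)")
    value = int.from_bytes(raw, "little")
    if value == 0:
        raise ValueError("a zero result has unbounded effort")
    return (1 << 64) // value


if __name__ == "__main__":
    # Result taken from the benchmark output above.
    print(effort_from_result("d947ceff08750300"))  # prints 18956
```

Under this assumption, a smaller result value means a higher effort, so a
service could compare the computed effort against whatever threshold it
currently requires.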
For the actual implementation in Tor, the RandomX validator should run in
a separate thread that doesn't do anything else apart from validation and
moving valid requests into the Intro Queue. This way we can reach the maximum
performance of ~2000 processed requests per second.
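The threading layout described above can be sketched in Python as follows.
This only illustrates the structure: verify_pow, the two queues and the
helper names are hypothetical stand-ins, not part of Tor or RandomX, and a
real implementation would be written in C inside Tor.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the validator thread to exit


def validator_loop(incoming, intro_queue, verify_pow):
    """Validate requests one by one and forward only the valid ones."""
    while True:
        request = incoming.get()
        if request is _STOP:
            break
        if verify_pow(request):
            intro_queue.put(request)
        # Invalid requests are dropped here and never reach the Intro Queue.


def start_validator(incoming, intro_queue, verify_pow):
    """Start the dedicated validation thread and return it."""
    thread = threading.Thread(
        target=validator_loop,
        args=(incoming, intro_queue, verify_pow),
        daemon=True,
    )
    thread.start()
    return thread


def stop_validator(incoming, thread):
    """Ask the validator to finish the queued work and wait for it."""
    incoming.put(_STOP)
    thread.join()
```

The point of the design is that the thread does nothing except verification
and the hand-off, so request parsing and circuit handling elsewhere never
stall the PoW check.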
Finally, here are some disadvantages of RandomX-TOR:
1) Fast verification requires ~1 GiB of memory. If we decide to use two
overlapping seed epochs, each service will need to allocate >2 GiB of RAM
just to verify the PoW. Alternatively, it is possible to use the slow
mode, which requires only 256 MiB per seed, but runs 4x slower.
2) The fast mode needs about 5 seconds to initialize every time the seed is
changed (can be reduced to under 1 second using multiple threads). The slow
mode needs about 0.1 seconds to initialize.
3) RandomX includes a JIT compiler for maximum performance. The iOS operating
system doesn't support JIT compilation, so RandomX runs about 10x slower
there.
4) The JIT compiler in RandomX is currently implemented only for x86-64 and
ARM64 CPU architectures. Other architectures will run very slowly
(especially 32-bit systems). However, the two supported architectures
cover the vast majority of devices, so this should not be an issue.
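The memory figures in point 1 follow from the Dataset parameters quoted
earlier. Here is a short Python check, assuming that "two overlapping seed
epochs" means two full Datasets are resident at once, and ignoring the
scratchpads and other per-VM memory. The function names are illustrative
only.

```python
MIB = 1024 * 1024

# Values from the proposed RandomX-TOR configuration.
RANDOMX_DATASET_BASE_SIZE = 1073741824
RANDOMX_DATASET_EXTRA_SIZE = 16777216

# Slow (light) mode figure per seed, as stated in point 1.
SLOW_MODE_MIB_PER_SEED = 256


def dataset_mib() -> int:
    """Size of one full Dataset in MiB."""
    return (RANDOMX_DATASET_BASE_SIZE + RANDOMX_DATASET_EXTRA_SIZE) // MIB


def fast_mode_mib(epochs: int) -> int:
    """Memory needed for fast verification with `epochs` live seeds."""
    return epochs * dataset_mib()


def slow_mode_mib(epochs: int) -> int:
    """Memory needed for slow verification with `epochs` live seeds."""
    return epochs * SLOW_MODE_MIB_PER_SEED


if __name__ == "__main__":
    print(dataset_mib())     # 1040, matching "full memory mode (1040 MiB)"
    print(fast_mode_mib(2))  # 2080, i.e. slightly more than 2 GiB
    print(slow_mode_mib(2))  # 512
```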