While stress testing the write scalability of the proxy distribution with a long duration test, we noticed some strange patterns in the resulting throughput.
We have 5 separate machines: one for the SDK (and the pyforge controller), one for the proxy, and one for each DJ.
The test runs an addrate against the proxy for 4 hours with only one shard set up, then 4 hours with 2 shards, and 4 hours with 3.
With one shard, the proxy sustains a pretty stable throughput of around 1.25k ops/s.
With two shards, the throughput was higher at first, at 3.2k ops/s through the proxy (1.6k ops/s for each DJ), but then dropped a little after ~2h to 2.8k ops/s through the proxy (1.4k ops/s for each DJ).
With three shards, though, the throughput is much less stable and much lower at first, then stabilizes at around 2.7k ops/s through the proxy (~900 ops/s per DJ), and then drops even lower to 2.1k ops/s through the proxy (~700 ops/s per DJ).
We expected the throughput to scale linearly with added shards, so the throughput with 3 DJs behind the proxy should be ~3x the throughput with one shard, i.e. ~3.75k ops/s. Instead we observed 2.1k to 2.7k ops/s, which is lower than the throughput reached with two shards.
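As a rough sanity check on the figures above, the following Python sketch compares each observed proxy throughput with the ideal linear value (number of shards times the one-shard baseline). The function and variable names are only illustrative, and the inputs are the approximate numbers quoted in this report, not exact measurements.

```python
# Approximate proxy throughput (ops/s) quoted in this report.
BASELINE_OPS = 1250  # sustained throughput with one shard

# shards -> observed proxy throughput, in the order it was seen during the run
observed = {
    1: [1250],
    2: [3200, 2800],
    3: [2700, 2100],
}


def scaling_efficiency(shards, ops, baseline=BASELINE_OPS):
    """Return observed throughput as a fraction of ideal linear scaling."""
    return ops / (shards * baseline)


for shards, samples in observed.items():
    for ops in samples:
        ideal = shards * BASELINE_OPS
        eff = scaling_efficiency(shards, ops)
        print(f"{shards} shard(s): {ops} ops/s observed, "
              f"{ideal} ops/s ideal, {eff:.0%} of linear")
```

With these inputs, the two-shard phase stays above the linear expectation, while the three-shard phase falls to between roughly 56% and 72% of it.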
I can provide graphs for both the servers and the machines they were running on, covering the whole test duration.