While replication can replay updates at around 25k ops/sec, change number indexing currently maxes out at around 12-13k changes/sec.
This means that under sustained high throughput, the indexer continuously falls behind the replayed updates. This prevents purging (purging only removes indexed changes) and contributes to filling the disks.
That being said, using the change number indexer is an invalid use case for AM CTS: the CTS does not need its token updates indexed.
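The gap between the two rates above determines how fast the unindexed backlog grows. A minimal sketch of that arithmetic (the rates are the approximate peaks observed in this report, not guaranteed constants):

```python
# Back-of-the-envelope backlog estimate using the observed rates from this
# report (approximate observations, not guaranteed constants).
replay_rate = 25_000  # replayed updates per second (observed peak)
index_rate = 13_000   # change number indexing throughput (observed max)

# While load is sustained, unindexed changes accumulate at the difference:
backlog_per_sec = replay_rate - index_rate
backlog_per_hour = backlog_per_sec * 3600

print(backlog_per_sec)   # 12000 unindexed changes/sec
print(backlog_per_hour)  # 43200000 unindexed changes/hour
```

At that rate, a few hours of sustained peak load leaves tens of millions of changes that cannot yet be purged.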
Problem seen by running the four-replica system test (scenario3):
- Test ran for 12h with instance stops and starts
- 4 DJ instances running on AWS VM instances
- DS2 stopped for more than 3 hours
- revision : c0eaa20c9df
- build id : 2018-10-18 16:12:47
- test starts: 2018-10-22 16:43:26
- load stops : 2018-10-23 05:34:00
- test stops : 2018-10-23 07:35:39
- During the run, the replayed-updates Grafana graphs show throughput up to 25k ops/sec
- There is some activity between 05:34 and 07:00 (after the LDAP load stopped)
- No more activity after 07:00
- shows activity up to 6:50
- no more activity after 6:50
- DS1 : host1 1389 : lastchangenumber: 369950257
- DS2 : host2 1389 : lastchangenumber: 307872848
- DS3 : host3 1389 : lastchangenumber: 369950257
- DS4 : host4 1389 : lastchangenumber: 369950257
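As a quick sanity check (assuming lastchangenumber values are directly comparable across replicas that share the same changelog), DS2's indexing lag relative to the other replicas can be computed from the numbers above; its lower value presumably reflects both its 3+ hours of downtime and the indexing backlog:

```python
# lastchangenumber values recorded at the end of the test (from this report).
last_cn = {
    "DS1": 369_950_257,
    "DS2": 307_872_848,
    "DS3": 369_950_257,
    "DS4": 369_950_257,
}

# How far DS2's change number index is behind the most advanced replica:
lag = max(last_cn.values()) - last_cn["DS2"]
print(lag)  # 62077409 changes behind
```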
Monitoring this manually, indexing seems to manage about 12k ops/sec:
- replayed updates can reach 25k ops/sec while indexing manages only 12k ops/sec
- indexing activity is not exposed in the Grafana dashboard