[OPENDJ-7169] Apparent issue with failover functionality in rest2ldap Created: 04/May/20 Updated: 15/Jul/20 Resolved: 02/Jul/20
|Reporter:||Dirk Hogan||Assignee:||Yannick Lecaillez|
|Resolution:||Not a defect||Votes:||0|
|Epic Link:||Bugs 7.0|
Jake Feasel and I did testing on a GKE-deployed IDM instance with two DS instances, each configured as both a replication and a directory server, in an active-passive setup (one specified in primaryLdapServers, one in secondaryLdapServers). Load was generated with JMeter at ~60 user creates per second. Then the pod hosting the primary DS instance was killed. JMeter recorded ~10 failures following the initial pod kill, and another ~10 failures when Kubernetes restored the pod. These failures were reproducible.
JMeter reported a NoHttpResponseException (stack trace below). Note that no logs were recorded in IDM corresponding to the failures. This means that either the request failed with an unexpected exception that IDM did not log, or the repo-layer/rest2ldap call simply never returned.
If it is important to make this distinction, I could surround the invocation of the repo layer for managed user creations with a try-finally, incrementing an AtomicInteger prior to the repo layer invocation and decrementing it in the finally block, and then reproduce the issue with Jake. A zero-valued AtomicInteger would indicate an unexpected exception, and a greater-than-zero-valued AtomicInteger would indicate a repo-layer/rest2ldap call which simply did not return.
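The instrumentation described above could be sketched as follows. This is a hypothetical illustration, not actual IDM code: the class, method, and `Supplier`-based call wrapper are assumptions introduced for the sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed diagnostic: count in-flight repo-layer
// calls so that, when a client-side failure is observed, the counter tells us
// whether the call threw an exception (counter back at 0, finally block ran)
// or simply never returned (counter still > 0).
public class InFlightTracker {
    private static final AtomicInteger inFlight = new AtomicInteger(0);

    // Wraps a repo-layer invocation (e.g. a managed user create).
    static <T> T trackRepoCall(Supplier<T> repoInvocation) {
        inFlight.incrementAndGet();          // before the repo layer is invoked
        try {
            return repoInvocation.get();
        } finally {
            inFlight.decrementAndGet();      // runs on both return and throw
        }
    }

    public static int inFlightCount() {
        return inFlight.get();
    }

    public static void main(String[] args) {
        // Simulated managed-user create that returns normally: the counter
        // is back to zero once the call completes.
        String result = trackRepoCall(() -> "created");
        System.out.println(result + " inFlight=" + inFlightCount());
        // prints "created inFlight=0"
    }
}
```

A counter that stays above zero while JMeter reports a failure would point at a hung rest2ldap call rather than a swallowed exception.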
|Comment by Matthew Swift [ 05/May/20 ]|
Flagging as critical for 7.0 since this bug could leave client applications hanging as well as triggering resource leaks.
|Comment by Yannick Lecaillez [ 02/Jul/20 ]|
It appears that IDM, rest2ldap and the DS SDK all operate correctly.
The NoHttpResponseException seems to be caused by the eng-shared nginx-ingress controller. Indeed, once IDM returns a 503 to the ingress, the ingress automatically closes the connection, which may still contain more pending requests due to keep-alive/pipelining.
The failing requests reported by JMeter are not even received by IDM.
When the test is run directly against IDM, bypassing the ingress via port-forwarding, we can see that JMeter receives a few 503 responses during the failover, as expected.
Note that I did not find any evidence on the Internet of this odd behavior of the nginx ingress controller. It is a deduction from purely empirical testing.
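The ingress bypass could look like the following. This is a sketch only: the service name and ports are assumptions, not taken from the actual deployment.

```
# Forward a local port directly to the IDM service, skipping the ingress.
# "svc/idm" and the ports are placeholders for the real deployment's values.
kubectl port-forward svc/idm 8080:8080

# Then point JMeter at http://localhost:8080 instead of the ingress hostname.
```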
|Comment by Dirk Hogan [ 13/Jul/20 ]|
Yannick Lecaillez Just for my understanding: I was under the impression, when I filed this JIRA, that you and the DS team expected zero failures from DS during the cutover from active->passive and back. It sounds like this impression was incorrect: we should expect a certain, relatively small number of 503 responses. You were concerned about the lack of responses in JMeter and the lack of errors in IDM, and you are hypothesizing that this can be attributed to the way the nginx-ingress handles 503 responses. Am I understanding this correctly?
|Comment by Jean-Noël Rouvignac [ 15/Jul/20 ]|
Exactly, and this is what Yannick investigated.
The impression was correct: there is no bug in DS according to Yannick's investigations.
It looks like a bug/feature/unexpected behaviour in the nginx ingress controller.
There is nothing more that can be done on the DS side, which is why this issue is resolved as "Not a defect".
|Comment by Dirk Hogan [ 15/Jul/20 ]|
Seeing as IDM simply dispatches requests to rest2ldap, where do you see the source of these 503 responses?
|Comment by Yannick Lecaillez [ 15/Jul/20 ]|
It is expected that rest2ldap returns a few 503 responses on failover for non-idempotent requests like add (which is the case here).
The issue was created because there was a suspicion that rest2ldap was leaving some requests unanswered during failover.
Some requests are indeed not answered, but this is due to the nginx controller. When IDM returns a 503 (effectively forwarding the 503 returned by rest2ldap), nginx surprisingly closes all connections to that host. That is why everything appears to work fine from the rest2ldap/IDM side while, on the client side, some requests are never answered: nginx closed the underlying connection as a result of the 503 returned by IDM.
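If retrying 503s at the ingress rather than surfacing them to clients were desirable, nginx-ingress exposes per-Ingress annotations for that. The fragment below is a hedged sketch under the assumption that the eng-shared controller honours these standard annotations; whether it would also avoid the observed connection closes was not tested.

```
# Hypothetical Ingress annotations: retry a request on the next upstream
# when one backend answers 503, instead of returning the 503 to the client.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_503"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
```

Note that retrying non-idempotent requests such as an add is risky in itself, which may be why this is not the default.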