[OPENDJ-7291] Provide a debugging tool for connecting a proxy to backend servers Created: 17/Jun/20  Updated: 23/Jun/20

Status: Dev backlog
Project: OpenDJ
Component/s: ease of use, proxy
Affects Version/s: 7.0.0
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Mark Craig Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
is related to OPENDJ-7207 Retest the distribution example Done

 Description   

I'm not really sure what to ask for, so here's what the problem was this time.

When I end up with something like this:

{
  "eventName": "DJ-LDAP",
  "client": {
    "ip": "127.0.0.1",
    "port": 40560
  },
  "server": {
    "ip": "127.0.0.1",
    "port": 1636
  },
  "request": {
    "protocol": "LDAPS",
    "operation": "BIND",
    "connId": 0,
    "msgId": 1,
    "version": "3",
    "dn": "uid=openam_cts,ou=admins,ou=famrecords,ou=openam-session,ou=tokens",
    "authType": "SIMPLE"
  },
  "transactionId": "39a69afe-1405-4d2a-9243-7c2bd4248e2d-2",
  "response": {
    "status": "FAILED",
    "statusCode": "49",
    "elapsedTime": 11,
    "elapsedTimeUnits": "MILLISECONDS",
    "failureReason": "Unable to bind to the Directory Server because no such user exists in the server"
  },
  "userId": "uid=openam_cts,ou=admins,ou=famrecords,ou=openam-session,ou=tokens",
  "timestamp": "2020-06-17T10:06:28.053Z",
  "_id": "39a69afe-1405-4d2a-9243-7c2bd4248e2d-4"
}

The "failureReason" never seems to be any help at all.
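
For what it's worth, jq makes it a little easier to pull just the failures out of the JSON access log (the path below is simply where my proxy writes its filtered audit log; adjust to your setup):

$ jq 'select(.response.status == "FAILED") | .response' proxy/logs/filtered-ldap-access.audit.json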

It's especially disappointing when you find that the proxy server's account can use proxied authorization against the backend DSs with no problem at all:

$ /path/to/proxy/bin/ldapsearch \
>  --hostname localhost \
>  --port 5636 \
>  --useSSL \
>  --saslOption mech="EXTERNAL" \
>  --certNickName ssl-key-pair \
>  --keyStorePath /path/to/proxy/config/keystore \
>  --keyStorePasswordFile /path/to/proxy/config/keystore.pin \
>  --trustStorePath /path/to/proxy/config/keystore \
>  --trustStorePasswordFile /path/to/proxy/config/keystore.pin \
>  --baseDn ou=tokens \
>  --searchScope base \
>  --proxyAs "dn:uid=openam_cts,ou=admins,ou=famrecords,ou=openam-session,ou=tokens" \
>  "(&)"
dn: ou=tokens
objectClass: top
objectClass: untypedObject
ou: tokens 

It's hard enough getting that far.

What next? Try to find the proxy's failed request in the (non-human-readable) logs.

$ grep -Ri fail ds-rs-*/logs/*
$ 
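
The same kind of jq filter at least works on the backends' JSON access logs, assuming the JSON file-based access logger is enabled there (the file name depends on which log publisher you enabled), though in this setup it turns up nothing either:

$ jq 'select(.response.status == "FAILED")' ds-rs-*/logs/filtered-ldap-access.audit.json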

The trouble is, the only indication of the problem is the error on the proxy, and the log message is not helpful. My next guess is access control, but one can spend hours in blind alleys... and that's just to get a demo set up.



 Comments   
Comment by Mark Craig [ 17/Jun/20 ]

Debug logging is not a useful answer. I tried setting up debug logging just for the code where the message seems to come from, org.opends.server.core.BindOperation. That got me an empty logs/debug file. When I widened the target to org.opends.server.core, I got 26527 lines.
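
For the record, this is roughly how I scoped the debug logger; just a sketch assuming default admin connector settings (port 4444, uid=admin, a throwaway password), so adjust the connection options to your setup:

$ dsconfig set-log-publisher-prop \
>  --hostname localhost --port 4444 \
>  --bindDn uid=admin --bindPassword password \
>  --publisher-name "File-Based Debug Logger" \
>  --set enabled:true \
>  --trustAll --no-prompt
$ dsconfig create-debug-target \
>  --hostname localhost --port 4444 \
>  --bindDn uid=admin --bindPassword password \
>  --publisher-name "File-Based Debug Logger" \
>  --type generic \
>  --target-name org.opends.server.core.BindOperation \
>  --set enabled:true \
>  --trustAll --no-prompt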

Looking at the code, which people outside ForgeRock can't do, is cheating anyway. Still, I am wondering why I seem to be getting a message from a local backend when the request should be going to a proxy backend. Maybe something went wrong in the routing (another part of the configuration that I don't understand very well).
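
Something that would at least confirm where the routing stands is to dump the proxy backend's configuration (a sketch with the same default connection options, pointed at the proxy's admin port; distributeCts is what my proxy backend is called):

$ dsconfig get-backend-prop \
>  --hostname localhost --port 4444 \
>  --bindDn uid=admin --bindPassword password \
>  --backend-name distributeCts \
>  --trustAll --no-prompt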

Comment by Mark Craig [ 23/Jun/20 ]

Back at the drawing board and going in circles again this morning. This time I get result code 49 (invalid credentials) and no failureReason hint at all:

$ jq . proxy/logs/filtered-ldap-access.audit.json 
{
  "eventName": "DJ-LDAP",
  "client": {
    "ip": "127.0.0.1",
    "port": 33372
  },
  "server": {
    "ip": "127.0.0.1",
    "port": 1636
  },
  "request": {
    "protocol": "LDAPS",
    "operation": "BIND",
    "connId": 0,
    "msgId": 1,
    "version": "3",
    "dn": "uid=openam_cts,ou=tokens",
    "authType": "SIMPLE"
  },
  "transactionId": "ee9079a9-c6f4-4aef-934e-2eb2aebb092c-2",
  "response": {
    "status": "FAILED",
    "statusCode": "49",
    "elapsedTime": 32,
    "elapsedTimeUnits": "MILLISECONDS",
    "failureReason": ""
  },
  "userId": "uid=openam_cts,ou=tokens",
  "timestamp": "2020-06-23T06:36:32.589Z",
  "_id": "ee9079a9-c6f4-4aef-934e-2eb2aebb092c-4"
}

It does look like the proxy is managing to discover the DS/RSs:

[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=NOTICE msgID=620 msg=Remote servers changed for the proxy backend 'distributeCts'. The proxy was using: primary servers=[], secondary servers=[]; and it will now be using: primary servers=[localhost:6636, localhost:16636], secondary servers=[]
[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=WARNING msgID=613 msg=Proxy backend 'distributeCts' cannot failover: only primary servers have been discovered via the service discovery mechanism '[CTS Shard 1, CTS Shard 2]'. Primary servers are [localhost:6636, localhost:16636]
[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=NOTICE msgID=620 msg=Remote servers changed for the proxy backend 'distributeCts'. The proxy was using: primary servers=[], secondary servers=[]; and it will now be using: primary servers=[localhost:5636, localhost:15636], secondary servers=[]
[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=WARNING msgID=613 msg=Proxy backend 'distributeCts' cannot failover: only primary servers have been discovered via the service discovery mechanism '[CTS Shard 1, CTS Shard 2]'. Primary servers are [localhost:6636, localhost:16636]
[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=WARNING msgID=613 msg=Proxy backend 'distributeCts' cannot failover: only primary servers have been discovered via the service discovery mechanism '[CTS Shard 1, CTS Shard 2]'. Primary servers are [localhost:5636, localhost:15636]
[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=WARNING msgID=624 msg=Proxy backend 'distributeCts' cannot register itself against base DN uid=admin because this base DN is already registered against backend 'rootUser'
[23/Jun/2020:08:36:23 +0200] category=BACKEND severity=NOTICE msgID=621 msg=Proxy backend 'distributeCts' automatically registered itself against the base DNs [uid=admin, ou=tokens, uid=proxy]. It was previously registered against the base DNs [] 
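
To see what the discovery mechanisms themselves hold, something like this should work (a sketch; --mechanism-name and its value are my assumption based on the names in the warnings above, and the connection options are the usual ones for the proxy):

$ dsconfig get-service-discovery-mechanism-prop \
>  --hostname localhost --port 4444 \
>  --bindDn uid=admin --bindPassword password \
>  --mechanism-name "CTS Shard 1" \
>  --trustAll --no-prompt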

Connecting directly to a DS/RS and using --proxyAs works fine, but only for half of the servers, so there must be something wrong with my script, or I've broken replication somehow. Once I manage to figure that out myself and shut the failing servers down, the proxy fails over and I get the expected result.
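
For reference, the direct check is the same ldapsearch as above, pointed at a backend LDAPS port taken from the discovery messages (key store, trust store, and nickname are just what this demo uses, so the backend has to accept the proxy's certificate):

$ /path/to/proxy/bin/ldapsearch \
>  --hostname localhost \
>  --port 6636 \
>  --useSSL \
>  --saslOption mech="EXTERNAL" \
>  --certNickName ssl-key-pair \
>  --keyStorePath /path/to/proxy/config/keystore \
>  --keyStorePasswordFile /path/to/proxy/config/keystore.pin \
>  --trustStorePath /path/to/proxy/config/keystore \
>  --trustStorePasswordFile /path/to/proxy/config/keystore.pin \
>  --baseDn ou=tokens \
>  --searchScope base \
>  --proxyAs "dn:uid=openam_cts,ou=tokens" \
>  "(&)"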

This issue is not to say setting up a proxy is technically impossible. But debugging any problems is very frustrating.
