[OPENIDM-6656] Random error during startup causes startup fail or longer time to get ready Created: 15/Sep/16  Updated: 19/Oct/17  Resolved: 03/Oct/16

Status: Closed
Project: OpenIDM
Component/s: Module - OSGi Container / Framework integration, Module - Repository JDBC
Affects Version/s: OpenIDM 5.0.0
Fix Version/s: OpenIDM 5.0.0

Type: Bug Priority: Critical
Reporter: Tinghua Xu Assignee: Dirk Hogan
Resolution: Fixed Votes: 0
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

IDM 5.0.0 trunk build runs on a CentOS with JDBC repos(see it with Postgres/DB2 and etc).


Attachments: Text File log_openidm_5.0.0_revision_232bb2e_launched_on_windows_with_mssql.txt     Text File log_openidm_5.0.0_revision_e40f5f5_launched_on_linux_with_mysql.txt     File openidm0.log.0    
Issue Links:
Backport
is backported by OPENIDM-9526 Backport OPENIDM-6656: Random error d... Closed
Verified Version/s:

 Description   

From time to time, see the following errors in IDM startup. The startup is either failed or took longer time for IDM to get ready(e.g. 4 minutes).

SEVERE: Bundle: org.forgerock.openidm.audit [138] [org.forgerock.openidm.audit.filter(21)] Circular reference detected, getService returning null
Sep 15, 2016 6:04:44 PM org.forgerock.openidm.logging.LogServiceTracker logEntry
SEVERE: Bundle: org.forgerock.openidm.api-servlet [44] FrameworkEvent ERROR
org.apache.felix.log.LogException: org.osgi.framework.ServiceException: Service factory returned null. (Component: org.forgerock.openidm.audit.filter (21))
        at org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:380)
...

Sep 15, 2016 6:07:01 PM org.forgerock.openidm.quartz.impl.RepoJobStore getTriggerGroupNames
WARNING: Error getting trigger group names
org.forgerock.json.resource.ServiceUnavailableException: Service is starting
        at org.forgerock.openidm.filter.ServiceUnavailableFilter.<init>(ServiceUnavailableFilter.java:43)
        at org.forgerock.openidm.servlet.internal.ServletConnectionFactory.<clinit>(ServletConnectionFactory.java:96)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
...
Sep 15, 2016 6:07:08 PM org.forgerock.openidm.info.impl.HealthService setState
INFO: OpenIDM ready

Note: The full IDM log is attached, it's a run with Kerberized DB2.
I first saw this symptom was with master build 1785 used Postgres.



 Comments   
Comment by Laurent Bristiel [X] (Inactive) [ 23/Sep/16 ]

Brendan Miller Nicolas Philippe [X] : I have this error regularly in the automated functional tests. This is both a bit frightening for the product quality (there seems to be a real problem on startup) and making our life as QA complicated (parsing the errors for a long time and then discovering we are hitting this issue again).

I added another LOG file hoping this could help finding out what could go wrong.
See file: log_openidm_5.0.0_revision_232bb2e_launched_on_windows_with_mssql.txt

Comment by Dirk Hogan [ 23/Sep/16 ]

Looking at this briefly, I suspect this may be a startup sequencing error. As soon as we initialize quartz, it will ask the RepoJobStore for available jobs, which results in a repo call, which seems to fail because IDM is still initializing. I will be looking at OPENIDM-1202 this Mezzo, and while I am in there, I will keep this bug in mind.

Note that this addresses the RepoJobStore elements of the defect-filing/stack-trace, not the 'circular-reference' bit with audit.

Comment by Laurent Bristiel [X] (Inactive) [ 23/Sep/16 ]

Thanks Dirk Hogan, that would be very helpful!

Comment by Laurent Bristiel [X] (Inactive) [ 26/Sep/16 ]

I had the problem once more today, this time on Linux with MySQL.
I am attaching the LOG file. Maybe that will help narrow down the problem.

Comment by Dirk Hogan [ 29/Sep/16 ]

This bug is very much like OPENIDM-4388: there the issue was a race-condition between repo startup and scheduler startup. Scheduler activation will cause Quartz to ask the RepoJobStore to pull triggers from the repo. If the repo is not ready, the error results.

The current bug is a bit different, as the fix for OPENIDM-4388 ensured that the repo will be up. The failure in this case is ultimately caused by:
Caused by: org.apache.felix.log.LogException: org.forgerock.json.resource.ServiceUnavailableException: Service is starting
at org.forgerock.openidm.filter.ServiceUnavailableFilter.<init>(ServiceUnavailableFilter.java:43)

This occurs because ServletConnectionFactory initially comes up with the ServiceUnavailableFilter. This filter blocks all access until the RouterConfig component comes up, which will set the appropriate Filter list in the ServletConnectionFactory. Thus a possible fix would be to add a dependency in the SchedulerService upon the RouterConfig.

However, I'm a bit confused by the fact that the SchedulerService obtains a connection to the database which routes through the ServletConnectionFactory, which, according to the javadocs, is 'responsible for providing Connections to routing requests initiated from an external request on the api servlet'. The IDMConnectionFactory obtained by the RepoJobStore would seem to be the JsonResourceRouterService, as it looks up a IDMConnectionFactory with a openidm.router.prefix value of '/', which correlates to the JsonResourceRouterService. But it looks like the internal connection factory exposed by the JsonResourceRouterService simply wraps the ServletConnectionFactory ConnectionFactory. This explains why the 'external' filters must be traversed by an 'internal' call.

Comment by Laurent Bristiel [X] (Inactive) [ 04/Oct/16 ]

Checked OK in OpenIDM version "5.0.0-SNAPSHOT" (revision: 73fade4
No regression observed. We will see if the problem occurs again in the future (it was random)

Generated at Mon Mar 01 23:19:36 UTC 2021 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.