Affects Version/s: OpenIDM 5.0.0
Fix Version/s: OpenIDM 5.0.0
So far we have identified the following problems:
- the startup.sh script does not honor the environment variables or Java system properties set in OPENIDM_OPTS: under the current precedence order, any value defined in boot.properties is used first. This makes it very difficult to override the product behavior during startup, because most, if not all, properties already have values in boot.properties. We may also need to think about separating Java options from OpenIDM-specific options, since they are currently mingled together (OPENIDM_OPTS="-Xmx1024m -Xms1024m", for example).
- the openidm.node.id variable in boot.properties is currently set to "node1" and we expect customers to update it when they deploy new instances; there is really no reason to define this variable statically in a properties file for auto-scaling deployments. Instead, if the variable has not been explicitly set in the properties file, a random id should be generated automatically when the new instance starts. Note: the current ability to set the node id explicitly should remain available.
- the settings that decide whether to use the configuration on disk or in the repo are currently confusing, as those properties get changed in multiple places; we should make it easier to choose which strategy is used and clearly document which one we recommend for automated deployments.
- there is no way to separate the location of the configuration from the user repo, or to version the configuration itself: this is a potential limitation for rolling out new configuration.
- there is no way to use a different provider for the configuration store (e.g. etcd, zookeeper, consul, doozerd, etc.)
- we're tip-toeing between on-disk configuration and repository-based configuration: while the repository configuration seems to be the most relied upon, we've never really provided a good CLI tool to support remote management of this central configuration repository. In recent months we've also seen the emergence of immutable servers and on-disk configuration which, coupled with a proper CD / CI strategy, seem preferable to storing configuration in a central repository.
Please refer to the 12 Factor App guidelines in order to understand certain concepts prevalent in the CD / CI world, especially when it comes to using Docker / Kubernetes for deployments.
We need to understand how our customers deploy the product and what their current pain points are. While using Docker / Kubernetes isn't necessarily the right choice for all customers (not that there is one right choice!), it forces us to think in terms of what the product's limitations are and how to resolve them. Alleviating those issues should benefit any CD / CI tool regardless of the technology chosen.
While the processes to manage application deployment vary widely between companies, there are a few high-level categories:
- Blue-Green deployments or Canary Releases : those are deployments that allow 2 (or more!) versions of the application to co-exist in the production environment. There is no perceived down-time usually, unless a database migration is necessary which might introduce a service degradation.
- Legacy deployments with planned downtime : those are deployments which go through the usual flow of dev / staging / production but require a planned downtime to migrate the existing servers over to the new version of the application. There's a change process involved, with roll-backs, etc. But end-users experience a service outage.
- Wild West deployments : the name should be sufficient !
The ultimate goal for the DevOps effort is to make it easy to create immutable OpenIDM servers. Scaling, upgrading and phasing out servers involved in typical applications like registration or reconciliation can then be fully automated.
Immutable servers are created from a fully deployable artifact that's been tested and that is identical for all instances. As such the configuration from which the server operates is traditionally version controlled (to guarantee that all changes are tracked and validated).
This can be achieved with either local or remote storage of the configuration, though when you mix in Blue-Green or Canary deployments, versioning becomes critical. Versioning comes for free when using file-based configuration from a VCS such as git, but has to be implemented when storing configuration in a remote repository.
For example, leveraging a Canary deployment would require either versioning the configuration in the remote repository or pointing the instances at 2 different remote repositories, neither of which we can do today.
As such the aim for this Epic is to favor file based deployments and maximize what we can do with today's features around multi-instance deployments.
While the end result here is not to create Docker images of OpenIDM (not just yet), we want to make sure we enable that effort. In that regard, we want to be able to configure the most important variables needed during startup via environment variables. For example: port number, instance type, location of the keystore, keystore password, host and port number of the remote repository, etc. Pretty much all the variables found in the boot.properties file must be configurable via environment variables.
Currently, if a property is set in boot.properties, an environment variable cannot be used to influence the boot behavior: the values found in the properties file take precedence over the environment variables.
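The precedence we are asking for (environment first, boot.properties as the fallback) can be sketched as follows. This is only an illustration of the intended resolution order, not the actual launcher code: the env_name mapping (openidm.node.id becomes OPENIDM_NODE_ID) and the function names are assumptions made for the example, and the parser handles only simple key=value lines.

```python
import os


def parse_properties(text):
    """Parse a minimal subset of the .properties format: key=value lines,
    with blank lines and lines starting with '#' or '!' ignored."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def env_name(key):
    """Hypothetical naming convention: openidm.node.id -> OPENIDM_NODE_ID."""
    return key.upper().replace(".", "_").replace("-", "_")


def resolve_boot_properties(file_props, environ=None):
    """Return the effective properties: a value found in the environment
    overrides the one from boot.properties; otherwise the file value is kept."""
    environ = os.environ if environ is None else environ
    resolved = dict(file_props)
    for key in file_props:
        override = environ.get(env_name(key))
        if override is not None:
            resolved[key] = override
    return resolved
```

With this order, boot.properties can keep shipping default values while the infrastructure overrides only what differs per instance.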
We currently only have the following flags for the instance type :
Those properties ONLY affect the strategy for configuring the keys in the keystore and whether or not to populate the repository with it. They have no bearing on the cluster behavior or scheduler. In order to rely on an "infrastructure-provided" keystore, for a multi-instance deployment, we currently have to set the instance type to standalone. This is at best confusing.
There are also limitations that came with the keystore population of the repo: specifically, the fact that we can't encrypt the database password. This entire area has to be re-architected to provide a more fluid behavior.
As mentioned earlier, relying on a serial instance ID (node1, node2, etc.) doesn't really make sense for deployments where instances come and go (or belong to different versions). Using randomized unique ids (see RFC 4122) greatly helps to avoid collisions at boot time: the openidm.node.id property can be defined through an environment variable and set by the infrastructure when the instance is created. But this imposes a new constraint on the infrastructure, as it then has to generate that identifier.
As an alternative, the instance should be able to generate its own UUID if none was provided and store that information locally. Reboots are usually not allowed for immutable servers (they are just replaced by a new one if the process fails), so persisting that value might only be beneficial to legacy deployments.
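The node id selection described above could look roughly like this. It is a minimal sketch under stated assumptions: the OPENIDM_NODE_ID variable name, the resolve_node_id function and the optional state file are hypothetical and only illustrate the order (infrastructure-provided value, then configured value, then a previously persisted value, then a freshly generated RFC 4122 version 4 UUID).

```python
import os
import uuid
from pathlib import Path


def resolve_node_id(configured=None, environ=None, state_file=None):
    """Pick the node id for this instance.

    configured: value of openidm.node.id from the properties file, if any.
    environ:    environment mapping (defaults to os.environ).
    state_file: optional Path where a generated id is persisted, which is
                mostly useful for legacy deployments that reboot in place.
    """
    environ = os.environ if environ is None else environ

    # 1. An id injected by the infrastructure wins (hypothetical variable name).
    from_env = environ.get("OPENIDM_NODE_ID")
    if from_env:
        return from_env

    # 2. An id explicitly set in the properties file is still honored.
    if configured:
        return configured

    # 3. Reuse a previously generated id if one was stored locally.
    if state_file is not None and state_file.exists():
        saved = state_file.read_text().strip()
        if saved:
            return saved

    # 4. Otherwise generate a random UUID and persist it when asked to.
    node_id = str(uuid.uuid4())
    if state_file is not None:
        state_file.write_text(node_id)
    return node_id
```

For immutable servers the state file can simply be omitted, in which case every new instance gets a fresh id.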
In a Canary deployment, relying on a centralized repo-based configuration is almost impossible unless we're able to use versioning or to point to a different repo (only for config). Since this isn't possible today, we should be able to configure a Docker image via the --project-location (or -p) option, and each instance should simply rely on its local file configuration rather than the repository.
This is possible today but requires a little more documentation on the different flags to use. However, the configuration part of the Admin UI completely loses its value-add in this context. This is mostly fine, since Canary deployments should rely on immutable servers and therefore not rely on the Admin UI to make configuration changes at run-time.
However, there are other parts of the Admin UI which are extremely useful for run-time management (such as managing users, roles, triggering reconciliation, etc.). In this context, we should still be able to use the Admin UI, but purely for management of runtime data.