This story lets us focus on the first part of a fairly significant project within the performance improvement epic.
The current agent logging mechanism is seen as a significant barrier to performance and stability in the agent. It consumes shared memory and is somewhat conflated with the shared data structures that the agent needs for more general purposes (e.g. signalling configuration updates to worker processes).
This is an investigation task because we don’t yet know the appropriate solutions, and we want a deliverable that does not cross sprint boundaries. This task is specifically not concerned with refactoring the agent codebase, only with investigating alternative approaches.
We also want this to be something that can be worked on as a collaborative test/development task.
Logging is currently used for fault diagnosis as well as for alerting admins to agent error events. The error events are expected to be few and far between, and should not impact performance. But when full logging is used for fault diagnosis, it places heavy demands on the agent's logging mechanism. These diagnostic problems would be alleviated to a large extent only by providing performance monitoring capabilities in the agent.
We think that the web agent logging mechanism is a major source of complexity and of support problems, and that this investigation is a vital step towards a faster and more robust agent.
- brief comparison of ways forward, and analysis of complexity
- integration test framework to verify the ideas discussed
- ability to compare an alternative solution with the existing one
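
As a starting point for the comparison deliverable, here is a minimal sketch of a timing harness. Everything in it is an assumption made for illustration: the names `time_sink` and `compare_sinks` are hypothetical, and the two in-memory buffers merely stand in for the existing logging mechanism and a candidate replacement. Neither reflects the agent's real code or its shared-memory layout.

```python
import time
from collections import deque
from typing import Callable, Dict


def time_sink(write: Callable[[str], None], n_messages: int) -> float:
    """Return the seconds taken to push n_messages through one log sink."""
    start = time.perf_counter()
    for i in range(n_messages):
        write(f"diagnostic message {i}")
    return time.perf_counter() - start


def compare_sinks(
    sinks: Dict[str, Callable[[str], None]], n_messages: int = 10_000
) -> Dict[str, float]:
    """Time every sink with the same workload and return seconds per sink."""
    return {name: time_sink(write, n_messages) for name, write in sinks.items()}


# Hypothetical stand-ins: an unbounded list for the "existing" mechanism
# and a bounded ring buffer for a "candidate" one.
existing_buffer: list = []
candidate_buffer: deque = deque(maxlen=1000)

results = compare_sinks(
    {"existing": existing_buffer.append, "candidate": candidate_buffer.append},
    n_messages=1000,
)
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f}s for 1000 messages")
```

In a real investigation the two callables would wrap the agent's current logger and whichever alternative is under evaluation, and the workload would be driven from the integration test framework, ideally across multiple worker processes rather than in a single loop.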