Now that the load-balancing infrastructure in the SDK performs balancing on a per-request basis, we open a connection for each request and close it once the request is done.
To optimize this, we already have a CachedConnectionPool, which keeps a set of connections open so that requests can be sent without creating a new connection each time.
Still, because of the connect -> request -> close programming model, we end up sending only one request per connection. Sending multiple concurrent requests to the same server therefore requires the same number of connections to that server.
This has some significant drawbacks:
- A significant number of connections is required to take advantage of the parallel processing done by the server, and this large number of connections creates processing overhead on both the client and the server side.
- Because each connection carries only one request at a time, only small TCP packets are exchanged on the network.
To solve these problems we propose to create a SharedConnectionPool, which acts like a CachedConnectionPool with one major difference: connect() does not remove the connection from the pool. That is, subsequent invocations of connect() may return the same connection instance multiple times.
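As a rough illustration of that difference, here is a minimal sketch of what a SharedConnectionPool could look like. The Connection stand-in, the round-robin selection, and all names here are illustrative assumptions, not the SDK's actual API:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the SDK's connection type.
class Connection {
    final String id;
    final AtomicInteger inFlightRequests = new AtomicInteger();
    Connection(String id) { this.id = id; }
}

// Sketch of a SharedConnectionPool: connect() hands out a pooled
// connection WITHOUT removing it from the pool, so the same instance
// can be returned to multiple concurrent callers.
class SharedConnectionPool {
    private final List<Connection> pool = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    SharedConnectionPool(int size) {
        for (int i = 0; i < size; i++) {
            pool.add(new Connection("conn-" + i));
        }
    }

    // Simple round-robin selection; the connection stays in the pool,
    // so concurrent requests can share it.
    Connection connect() {
        int idx = Math.floorMod(next.getAndIncrement(), pool.size());
        return pool.get(idx);
    }
}
```

With a pool of two connections, three consecutive connect() calls would return conn-0, conn-1, and then conn-0 again, showing that connections are reused rather than checked out.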
It might be interesting to test different algorithms for how connections are balanced and measure the impact on latency and throughput:
- Least-request: the connection returned is the one with the fewest concurrent requests.
- Leaky bucket: maximize connection usage by returning the same connection until it reaches a pre-configured number of concurrent requests.
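The two selection strategies above could be sketched as follows; the Conn type and the strategy signatures are assumptions for illustration, and a real implementation would need to handle the case where every connection is saturated:

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical connection stand-in tracking in-flight requests.
class Conn {
    final String id;
    final AtomicInteger inFlight = new AtomicInteger();
    Conn(String id) { this.id = id; }
}

class SelectionStrategies {
    // Least-request: pick the connection with the fewest concurrent requests.
    static Conn leastRequest(List<Conn> pool) {
        return pool.stream()
                .min(Comparator.comparingInt(c -> c.inFlight.get()))
                .orElseThrow();
    }

    // Leaky bucket: keep returning the same connection until it reaches
    // the configured concurrency limit, then move on to the next one.
    static Conn leakyBucket(List<Conn> pool, int maxConcurrent) {
        for (Conn c : pool) {
            if (c.inFlight.get() < maxConcurrent) {
                return c;
            }
        }
        // All connections are saturated; falling back to the first is an
        // arbitrary choice for this sketch.
        return pool.get(0);
    }
}
```

Least-request spreads load evenly across connections, while leaky bucket concentrates requests on fewer connections, which should produce larger TCP packets per connection; measuring both would show which trade-off wins for this workload.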