Replication code logs socket-related problems as errors in most cases. However, some of them are transient, such as the replication server (RS) being temporarily too busy to process a new connection, or a spurious timeout.
The replication handshake in particular tries to distinguish many cases, but most of the time a reported problem is only a real error when it occurs many times in a row, not when it happens once in a while.
One way to resolve this could be to implement a service that counts how many times a given message has been logged in the last 5 minutes or so: if it occurs too frequently (threshold TBD), log it as ERROR; otherwise log it as WARN or INFO.
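A minimal sketch of such a service, assuming a sliding-window counter keyed by a message identifier. The class name `FrequencyBasedLogLevel`, the message ID `RS_HANDSHAKE_TIMEOUT`, the threshold of 3 and the two-level WARN/ERROR choice are hypothetical placeholders, since the real window and threshold are still TBD. Timestamps are passed in explicitly to keep it testable, and it does not hook into any real logging framework.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical service: picks a log level from how often a message was seen recently.
public class FrequencyBasedLogLevel {

    enum LogLevel { WARN, ERROR }

    private final long windowMillis;   // e.g. 5 minutes (assumed)
    private final int errorThreshold;  // occurrences in the window that escalate to ERROR (TBD)
    private final Map<String, Deque<Long>> occurrences = new HashMap<>();

    public FrequencyBasedLogLevel(long windowMillis, int errorThreshold) {
        this.windowMillis = windowMillis;
        this.errorThreshold = errorThreshold;
    }

    // Records one occurrence of messageId at nowMillis and returns the level to log it at.
    public synchronized LogLevel levelFor(String messageId, long nowMillis) {
        Deque<Long> times = occurrences.computeIfAbsent(messageId, k -> new ArrayDeque<>());
        // Drop occurrences that have fallen out of the sliding window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        times.addLast(nowMillis);
        return times.size() >= errorThreshold ? LogLevel.ERROR : LogLevel.WARN;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        FrequencyBasedLogLevel levels = new FrequencyBasedLogLevel(fiveMinutes, 3);
        System.out.println(levels.levelFor("RS_HANDSHAKE_TIMEOUT", 0));                  // WARN
        System.out.println(levels.levelFor("RS_HANDSHAKE_TIMEOUT", 1000));               // WARN
        System.out.println(levels.levelFor("RS_HANDSHAKE_TIMEOUT", 2000));               // ERROR
        System.out.println(levels.levelFor("RS_HANDSHAKE_TIMEOUT", 2000 + fiveMinutes)); // WARN
    }
}
```

A real service would also need to evict identifiers that have not been seen for a while, otherwise the map grows with every distinct message.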
Other errors, such as invalid certificates, are permanent, so correctly classifying the original problem is both important and tricky.
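One possible starting point for that classification, sketched under assumptions: walk the exception's cause chain, treat any `java.security.cert.CertificateException` as permanent, and treat `SocketTimeoutException` or `ConnectException` as transient. The `SocketErrorClassifier` class and the `Kind` enum are hypothetical, and the mapping itself is an assumption (a refused connection, for example, could also be a misconfigured host). Anything unrecognised is reported as UNKNOWN rather than guessed.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;

// Hypothetical classifier: separates permanent socket/SSL failures from likely transient ones.
public class SocketErrorClassifier {

    enum Kind { PERMANENT, TRANSIENT, UNKNOWN }

    static Kind classify(Throwable error) {
        // Check for certificate problems first: SSL failures often wrap them as a cause,
        // and a permanent cause should win over a transient-looking wrapper.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof CertificateException) {
                return Kind.PERMANENT;
            }
        }
        // Assumed transient: timeouts and refused connections (e.g. RS busy or restarting).
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketTimeoutException || t instanceof ConnectException) {
                return Kind.TRANSIENT;
            }
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(new SocketTimeoutException("read timed out")));  // TRANSIENT
        System.out.println(classify(
                new IOException("handshake failed", new CertificateException("expired")))); // PERMANENT
        System.out.println(classify(new IOException("stream closed")));              // UNKNOWN
    }
}
```

Errors classified as UNKNOWN or TRANSIENT could then fall back to a frequency-based level as suggested above, while PERMANENT ones are always logged as ERROR.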