Hi all-
I’ve been doing some experiments with SNI and have noticed a potential problem: if SSL.Context->find_cert_domain() doesn’t return a matching certificate, the server handshake doesn’t complete, and the client hangs indefinitely. SSL.Context->add_cert() allows (but does not require) a fallback certificate to be specified. Because the server does not control the server name the client sends, an improperly configured SSL Context could cause a denial of service.
I propose to change SSL.ServerConnection so that an alert message is sent if a certificate cannot be chosen. Does that seem like a reasonable approach? If I don’t hear any objections in the next day or so, I’ll add this to master and 8.0 (where the problem definitely exists).
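To make the proposal concrete, here is a minimal sketch of the intended behaviour, written in Python rather than Pike. It is not the Pike implementation: the function names, the (glob, certificate) list and the choice of handshake_failure (40) as the alert are all assumptions for illustration. RFC 6066 also defines unrecognized_name (112), which might be the better fit.

```python
from fnmatch import fnmatchcase

ALERT_LEVEL_FATAL = 2
ALERT_HANDSHAKE_FAILURE = 40  # TLS alert description code


def find_cert(cert_globs, server_name):
    """Return the first certificate whose glob matches server_name, else None.

    cert_globs is a hypothetical list of (glob, certificate) pairs, loosely
    modelled on registering certificates with name globs.
    """
    name = server_name.lower()
    for glob, cert in cert_globs:
        if fnmatchcase(name, glob):
            return cert
    return None


def choose_cert_or_alert(cert_globs, server_name):
    """Pick a certificate, or return a fatal alert instead of stalling."""
    cert = find_cert(cert_globs, server_name)
    if cert is None:
        # No match and no fallback: answer with a fatal alert payload
        # (level, description) so the client is not left hanging.
        return ("alert", bytes([ALERT_LEVEL_FATAL, ALERT_HANDSHAKE_FAILURE]))
    return ("cert", cert)


no_fallback = [("*.example.com", "example-cert")]
with_fallback = no_fallback + [("*", "default-cert")]

print(choose_cert_or_alert(no_fallback, "unknown.test"))
print(choose_cert_or_alert(with_fallback, "unknown.test"))
```

With a "*" fallback entry the unknown name still gets a certificate. Without one, the sketch returns an alert payload rather than leaving the handshake unfinished.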
Bill
Hi all-
Just wanted to provide an update on this:
I've worked around the issue in my own code, but there definitely seems to be a problem. When find_cert_domain doesn't find a matching CertificatePair, the cipher suite selection process basically results in there being no options, and a fatal alert is sent (and one would expect that the connection would then be closed). However, that alert message never makes it out on the wire and the connection is left open despite SSL.File thinking that things are closed. I've attached the SSL3 debug output on the server (an up-to-date illumos system running Pike 8.0.702) as well as the Wireshark capture from the client (a Darwin system running 10.13.something) for a sample connection that demonstrates the problem.
It seems that the write is being correctly queued, and the write callback is installed but never called before the connection is torn down. Anyone have any thoughts before I dig deeper?
No.  Time       Source         Destination    Protocol  Length  Info
1    0.000000   192.168.1.143  1.2.3.4        TCP       78      52000 → 443 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=32 TSval=2183885393 TSecr=0 SACK_PERM=1
2    0.038286   1.2.3.4        192.168.1.143  TCP       74      443 → 52000 [SYN, ACK] Seq=0 Ack=1 Win=32806 Len=0 SACK_PERM=1 TSval=1575895158 TSecr=2183885393 MSS=1460 WS=32
3    0.038362   192.168.1.143  1.2.3.4        TCP       66      52000 → 443 [ACK] Seq=1 Ack=1 Win=131744 Len=0 TSval=2183885430 TSecr=1575895158
4    0.042588   192.168.1.143  1.2.3.4        TLSv1     583     Client Hello
5    0.080364   1.2.3.4        192.168.1.143  TCP       66      443 → 52000 [ACK] Seq=1 Ack=518 Win=1049792 Len=0 TSval=1575895200 TSecr=2183885434
6    61.314005  192.168.1.143  1.2.3.4        TCP       54      [TCP Keep-Alive] 52000 → 443 [ACK] Seq=517 Ack=1 Win=131744 Len=0
7    61.352194  1.2.3.4        192.168.1.143  TCP       66      [TCP Keep-Alive ACK] 443 → 52000 [ACK] Seq=1 Ack=518 Win=1049792 Len=0 TSval=1575956474 TSecr=2183885434
...
No certificates.
No common suites!
SSL.Connection->send_packet: type 21, pri 1, "\2("
[thr:1,fd:45] ssl_read_callback: calling queue_write
[thr:1,fd:45] queue_write: conn: SSL.ServerConnection(handshaking|local_failing), write_buffer: 0
SSL.Connection: writing packet of type 21, "\2("
SSL.Context->purge_session: "\210\236\342\35\17J\M0-S\216r\316\211\374T\342\267\301\204\264\311\252h\267""1\\325\5U\5"
[thr:1,fd:45] queue_write: Got 7 bytes to write (7 bytes buffered)
[thr:1,fd:45] queue_write: Connection closed abruptly
[thr:1,fd:45] queue_write: Install the write callback.
[thr:1,fd:45] queue_write: Returning -1 (7 bytes buffered)
[thr:1,fd:45] ssl_read_callback: Got abrupt remote close - simulating System.EPIPE
SSL.Context->purge_session: ""
[thr:1,fd:45] ssl_read_callback: Got close packet
[thr:1,fd:45] SSL.File->direct_write: Removing read/close_callback.
[thr:1,fd:45] poll: handshaking|local_fatal|peer_closed
[thr:1,fd:45] SSL.File->poll: Removing user close_callback.
[thr:1,fd:45] SSL.File->set_close_callback (0)
[thr:1,fd:45] ssl_read_callback: Calling close callback /opt/caudium/server/protocols/ssl3()->end (error Software caused connection abort)
[thr:1,fd:45] SSL.File->set_close_callback (0)
[thr:1,fd:45] SSL.File->set_read_callback (0)
[thr:1,fd:45] SSL.File->set_blocking()
[thr:1,fd:45] SSL.File->write (object[50])
[thr:1,fd:45] SSL.File->internal_write(object[50])
[thr:1,fd:45] SSL.File->write: Propagating old callback error: Software caused connection abort
[thr:1,fd:45] SSL.File->close (0, 0, 0)
[thr:1,fd:45] SSL.File->close: Already closed (0)
SSL3:destroy()
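For anyone reading the debug output, the queued packet can be decoded by hand. Type 21 is the TLS alert content type, and the payload "\2(" is two bytes: level 2 (fatal) and description 40 (handshake_failure, since "(" is ASCII 40). The short Python sketch below rebuilds the record to show why 7 bytes end up buffered: a 5-byte record header plus the 2-byte alert. The record version (3, 1) is an assumption based on the TLSv1 Client Hello in the capture, and encode_record is a hypothetical helper, not a Pike function.

```python
import struct

CONTENT_TYPE_ALERT = 21  # TLS record content type for alerts


def encode_record(content_type, version, payload):
    """Build an unencrypted TLS record: type, version, length, payload."""
    major, minor = version
    header = struct.pack("!BBBH", content_type, major, minor, len(payload))
    return header + payload


alert_payload = b"\x02("  # the "\2(" from the debug output
record = encode_record(CONTENT_TYPE_ALERT, (3, 1), alert_payload)

level, description = alert_payload  # iterating bytes yields integers
print(len(record), level, description)  # 7 2 40
```

This is consistent with the "Got 7 bytes to write (7 bytes buffered)" line: the alert is fully formed and queued, it just never leaves the buffer.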
On 2020-10-06 11:43, H William Welliver wrote:
Hi all-
I’ve been doing some experiments with SNI and have noticed a potential problem: if SSL.Context->find_cert_domain() doesn’t return a matching certificate, the server handshake doesn’t complete, and the client hangs indefinitely. SSL.Context->add_cert() allows (but does not require) a fallback certificate to be specified. Because the server does not control the server name the client sends, an improperly configured SSL Context could cause a denial of service.
I propose to change SSL.ServerConnection so that an alert message is sent if a certificate cannot be chosen. Does that seem like a reasonable approach? If I don’t hear any objections in the next day or so, I’ll add this to master and 8.0 (where the problem definitely exists).
Bill
On 2020-11-11 18:12, william@welliver.org wrote:
Hi all-
Just wanted to provide an update on this:
I've tracked down the root of the problem, and it's more concerning than I originally realized, so I'll restate the problem in terms of what I now know to be happening (verified in 8.0.702 and 8.0/HEAD):
If you create an SSLFile object and set the object to non-blocking mode before handshaking completes, any alert messages will cause the following to occur:
1. The state of the connection is set to local_fail.
2. The close callbacks on the user side are called.
3. The outbound connection is left open.
4. No alert messages are actually sent (though they are queued internally).
The Pike end user will not be aware that the network connection is still open, and further, the connection cannot be closed through the SSLFile API, as the internal state indicates the connection is already closed. The connected peer will remain connected until they (hopefully) time out. I have mostly been concerned with server mode SSLFile, but I suspect the client behavior would be similar.
Critically, everything works happily with the SSLFile set to non-blocking mode before accept() is called, so long as no alerts are generated during the handshake. It's possible that a race condition could trigger the problem even if the order is switched. I am working around the problem by waiting until the accept callback to set the connection non-blocking.
My initial statement of the problem suggested that the cause was limited to configuration error, but I now think the reality is that a denial-of-service situation could be caused by a client that advertises cipher preferences unsupported by the Pike SSL peer.
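As a rough illustration of that client-triggered case, the Python sketch below models cipher suite selection as a simple intersection. It is not Pike's selection code: select_suite, server_response and the server-preference ordering are assumptions. The suite numbers are the standard registry values (0xC02F is TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, 0x002F is TLS_RSA_WITH_AES_128_CBC_SHA, 0x000A is TLS_RSA_WITH_3DES_EDE_CBC_SHA, 0x0004 is TLS_RSA_WITH_RC4_128_MD5).

```python
ALERT_LEVEL_FATAL = 2
ALERT_HANDSHAKE_FAILURE = 40


def select_suite(server_suites, client_suites):
    """Return the first server-preferred suite the client also offered."""
    offered = set(client_suites)
    for suite in server_suites:
        if suite in offered:
            return suite
    return None


def server_response(server_suites, client_suites):
    """Model the server's answer to a Client Hello's suite list."""
    suite = select_suite(server_suites, client_suites)
    if suite is None:
        # "No common suites!": the only sensible reply is a fatal alert.
        return ("alert", bytes([ALERT_LEVEL_FATAL, ALERT_HANDSHAKE_FAILURE]))
    return ("server_hello", suite)


server = [0xC02F, 0x002F]
print(server_response(server, [0x002F, 0x000A]))
print(server_response(server, [0x0004, 0x000A]))
```

The second call is the interesting one: a client only has to offer suites the server does not support to force the alert path, with no misconfiguration on the server side.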
Ultimately, I think the proper solution is above my pay grade, but I have verified that the alert message makes it into the alert queue, and that the write callback on the stream is set, but never gets called. I believe this is the source of the problem.
Any thoughts or suggestions are welcome!
Bill
pike-devel@lists.lysator.liu.se