Update:
I've got most of the critical problems sorted out in preparation for a new 8.0 build. I've noticed that there are a number of SSL test suite failures, particularly on my NetBSD systems, and a number of pikefarm clients also report failures. I spent the afternoon digging around, and it looks like they're all timing-related.
Does anyone have an objection to me increasing the amount of time the tests in question are allowed to run? Based on my reading of the code, there is only one spot where doing so would make the tests run longer on (faster) systems that are unaffected by the problem.
That's basically a spot where an SSL server thread writes some data and immediately closes the connection. The client thread first checks whether the SSL.File is open before reading, and that check occasionally fails. Adding a short sleep() between the write() and close() on the server thread gives the client time to get past that check before the server closes the connection. I recognize that this is not the best way to solve the problem, but I'd like to get a release candidate prepared in the next few days. If I don't hear any objections, I will make the change and prepare an RC tomorrow afternoon.
Thanks in advance!
Bill
I think the client should be able to handle the server not sleeping. In the wild it's impossible for the server to know how much to sleep since it will depend on both the network connection and which client it is. Also, the client should work with non-pike servers, which might not sleep.
Is the problem that not all the data is submitted to TCP/IP, or that the client somehow aborts before receiving it from the IP stack?
Well, I think the problem is (possibly) more a quirk of how the test is written than an indication of an actual problem:
The test in question uses SSL.File in blocking mode, and the data does not go through the TCP stack.
The client and server SSL.File objects are created, connected by a pipe, and then connect()ed/accept()ed. A thread is started for each. The server thread write()s its data, followed immediately by close(). The client thread first checks is_open() before reading any data, and this is where the tests are failing. I am reasonably sure (but have not verified beyond scanning the SSL debug output) that there is data sitting in the SSL.File to be read, regardless of whether the “network side” is open or not.
What the client side of the test seems to be trying to check is that the handshake worked and that the SSL.File is able to communicate with the server. It does appear to be working; it’s just all happening too fast for the client side to detect that. A secondary part of the problem is that the facade of SSL.File being just like a regular Stdio.File breaks down a bit here: is_open() returns false, which really means “not open for writing”, while the file is still effectively open for reading.
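A minimal sketch of the race described above, written as a single-threaded Python toy model rather than Pike so that both interleavings can be replayed deterministically. ToySSLFile, peer_write(), peer_close() and client_check() are hypothetical names, not Pike's SSL.File API; the only behaviour carried over from the thread is that is_open() reflects the connection state and ignores data already buffered for reading.

```python
# Hypothetical toy model, NOT Pike's real SSL.File: is_open() here looks
# only at the connection state, as described for the failing test.
STREAM_OPEN = 0
STREAM_CLOSED = 1


class ToySSLFile:
    def __init__(self):
        self.close_state = STREAM_OPEN
        self.read_buffer = b""  # data received but not yet read()

    def peer_write(self, data: bytes) -> None:
        # The server's write() lands in the client's buffer (no TCP involved).
        self.read_buffer += data

    def peer_close(self) -> None:
        # The server's close() arrives immediately after its data.
        self.close_state = STREAM_CLOSED

    def is_open(self) -> bool:
        return self.close_state == STREAM_OPEN


def client_check(f: ToySSLFile) -> str:
    # The client thread's pattern: check is_open() before reading anything.
    if not f.is_open():
        return "FAIL: not open"
    return "ok"


# Slow scheduling: the client thread runs only after the server is done.
late = ToySSLFile()
late.peer_write(b"hello")
late.peer_close()

# Fast scheduling: the client thread gets past the check before close().
early = ToySSLFile()
early.peer_write(b"hello")
early_result = client_check(early)
early.peer_close()
```

In this model client_check(late) reports failure even though late.read_buffer still holds b"hello", while early_result is "ok". A sleep() on the server side only makes the "early" interleaving more likely; it does not rule out the "late" one.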
Thoughts?
On Dec 3, 2020, at 2:18 AM, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum 10353@lyskom.lysator.liu.se wrote:
> The client thread first checks is_open() before reading any data, and this is where the tests are failing.
What is "is_open()" expected to return in this case? Should it return nonzero until all pending data has been fetched by the application, or is the application expected to fetch the data before checking is_open()? But how can you then know if you can fetch more data or not?
It seems reasonable that you would call is_open() first, and if it returns nonzero then you call read() (which will throw an exception if close_state > STREAM_OPEN, hence the need to check first). The pikedoc mentions is_open() being able to return 2 sometimes in the nonblocking case. I'm not sure if it's a completely equivalent scenario, but the same mechanism could be used. Another option would of course be to call read() anyway but catch the exception.
If the testsuite can't do this in a correct and non-klugey way, then real code will not be able to either...
Also, if read() throws the exception without first returning the pending data, that would be a bug as well.
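Both reading patterns discussed above can be sketched with the same kind of Python toy model (hypothetical names, not Pike's SSL.File API). It assumes the semantics floated in the thread: is_open() returns 2 while the peer has closed but decrypted data is still pending, and read() hands out that pending data before raising. The model only covers the test's scenario, where the peer has already written and closed, and it does not reproduce the exact meaning Pike's documentation gives to a return value of 2.

```python
# Hypothetical toy model, NOT Pike's real SSL.File: it only covers the
# test's scenario, where the peer has written some data and then closed.
STREAM_OPEN = 0
STREAM_CLOSED = 1


class ToySSLFile:
    def __init__(self, chunks):
        self.read_buffer = list(chunks)   # decrypted data not yet read()
        self.close_state = STREAM_CLOSED  # the peer's close() has arrived

    def is_open(self) -> int:
        if self.close_state == STREAM_OPEN:
            return 1
        # Closed by the peer but data still pending: return 2 so the
        # caller knows a read() will still succeed (assumed semantics).
        return 2 if self.read_buffer else 0

    def read(self) -> bytes:
        # Pending data is handed out before the close is reported, so no
        # data is lost when the peer closes right after writing.
        if self.read_buffer:
            return self.read_buffer.pop(0)
        # Buffer drained and close_state > STREAM_OPEN: report the close.
        raise OSError("stream is closed")


def read_all_checking(f: ToySSLFile) -> bytes:
    # Pattern 1: call is_open() first and read() only while it is nonzero.
    out = b""
    while f.is_open():
        out += f.read()
    return out


def read_all_catching(f: ToySSLFile) -> bytes:
    # Pattern 2: call read() unconditionally and treat the exception as EOF.
    out = b""
    while True:
        try:
            out += f.read()
        except OSError:
            return out
```

With these assumed semantics, both patterns return b"hello" for a stream whose peer wrote b"hel" and b"lo" and then closed, with no sleep() needed on the writing side.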
pike-devel@lists.lysator.liu.se