Commit 20d1fa961d0059657d471d700ce63bcb3ab9d3ca mentions that there is a bug in the Stdio.Buffer support in Stdio.File()->write() (I assume). Stephen, can you explain how it is broken? Do you have a test case? Does it have something to do with the fact that you are using a subclass of Stdio.Buffer?
Arne
Arne Goedeke wrote:
Commit 20d1fa961d0059657d471d700ce63bcb3ab9d3ca mentions that there is a bug in the Stdio.Buffer support in Stdio.File()->write() (I assume). Stephen, can you explain how it is broken? Do you have a test case? Does it have something to do with the fact that you are using a subclass of Stdio.Buffer?
Yes, I subclass Stdio.Buffer, but I don't think it's an inherit-related bug.
Inside that reverted commit you can see what I did. I tried to eliminate usage of output_to() in favour of using Stdio.File()->write(Stdio.Buffer).
Literally here:
consume(socket->write(this));
Here this points to an object that inherits Stdio.Buffer.
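In rough outline, the pattern looks like this (a minimal sketch only; the class and method names are illustrative, not the actual pgsql module code, and error handling is elided):

class OutputBuffer
{
  inherit Stdio.Buffer;

  // Flush whatever has been buffered so far to the socket, then drop
  // the number of bytes that write() reports as written (mirroring
  // the consume(socket->write(this)) line above).
  void flush(Stdio.File socket)
  {
    consume(socket->write(this));
  }
}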
As to what exactly goes wrong... What I can report is that using that commit, it mostly works, until the scheduling gets tight. I.e. under intense pgsql query load, things go wrong. So that suggests that there might be issues that occur when write(Stdio.Buffer) runs in one thread while, simultaneously, bytes are added to that very same Stdio.Buffer in a different thread.
It's hard to give you a concise broken case. Maybe a case can be synthesised using the above speculation.
As to what exactly goes wrong... What I can report is that using that commit, it mostly works, until the scheduling gets tight. I.e. under intense pgsql query load, things go wrong. So that suggests that there might be issues that occur when
Hmm, curious. If we tried to reproduce this without pgsql though, what would we be looking for? Data going out on the Stdio.File not being what one would expect? I.e. not a crash or anything like that?
Tobias S. Josefowitz @ Pike developers forum wrote:
As to what exactly goes wrong... What I can report is that using that commit, it mostly works, until the scheduling gets tight. I.e. under intense pgsql query load, things go wrong. So that suggests that there might be issues that occur when
Hmm, curious. If we tried to reproduce this without pgsql though, what would we be looking for? Data going out on the Stdio.File not being what one would expect? I.e. not a crash or anything like that?
It did not crash. What happened was that the communication with the database got stuck. I.e. the most likely cause would be that some of the add()s got lost.
Theoretically, something like this could reproduce it:
Stdio.File thesocket; Stdio.Buffer thebuf;
Thread A: Tight loop with: thesocket->write(thebuf);
Thread B: Tight loop with: thebuf->add()
Some of the add()s of thread B might get lost (race condition).
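Spelled out as a runnable sketch (writing to /dev/null just as a stand-in sink; the chunk size and iteration count are arbitrary assumptions, not the real setup):

Stdio.File thesocket = Stdio.File("/dev/null", "wc");
Stdio.Buffer thebuf = Stdio.Buffer();
int added, written, done;

// Thread B: keeps appending to the shared buffer.
void adder()
{
  for (int i = 0; i < 1000000; i++) {
    thebuf->add("0123456789abcdef");
    added += 16;
  }
  done = 1;
}

// Thread A: keeps flushing the shared buffer to the file.
void writer()
{
  while (!done || sizeof(thebuf)) {
    int n = thesocket->write(thebuf);
    if (n > 0)
      written += n;
  }
}

int main()
{
  Thread.Thread b = Thread.Thread(adder);
  Thread.Thread a = Thread.Thread(writer);
  b->wait();
  a->wait();
  // If add()s get lost, written ends up smaller than added.
  write("added %d, written %d, left in buffer %d\n",
        added, written, sizeof(thebuf));
  return 0;
}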
This sounds like something which could be connected to the fact that Stdio.File()->write(Stdio.Buffer()) does not release the interpreter lock. This makes it less useful for situations where write is not called from within one backend thread. This makes me think that in your situation using a temporary string would be better. I think this version is thread-safe, allows other threads to add to the buffer in parallel and avoids output_to:
string tmp = buffer->peek(SIZE);
int bytes = fd->write(tmp);
if (bytes < 0) {
  // handle error
}
buffer->consume(bytes);
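Wrapped up as a helper that drains the buffer, reusing the peek()/consume() calls from the snippet above (the helper name and the 64KB chunk size are arbitrary choices):

void flush_buffer(Stdio.File fd, Stdio.Buffer buffer)
{
  // Loop to cope with partial writes; if other threads keep add()ing,
  // this runs until the buffer momentarily drains.
  while (sizeof(buffer)) {
    string tmp = buffer->peek(min(sizeof(buffer), 65536));
    int bytes = fd->write(tmp);
    if (bytes < 0) {
      // handle error, e.g. inspect fd->errno()
      break;
    }
    buffer->consume(bytes);
  }
}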
It would be possible to make Stdio.File()->write(Stdio.Buffer()) release the interpreter lock at the additional cost of adding synchronization to Stdio.Buffer(). However, the main objective of this API (for me at least) was to make the common use-case more efficient: the one in which a single backend does all the writing on non-blocking sockets. In that situation I think it often makes more sense to have the IO thread complete one loop and release the interpreter lock when calling epoll. Other threads can then run while the backend thread is waiting for new IO events.
Arne
Arne Goedeke wrote:
This sounds like something which could be connected to the fact that Stdio.File()->write(Stdio.Buffer()) does not release the interpreter lock. This makes it less useful for situations where write is not called from within one backend thread. This makes me think that in your situation using a temporary string would be better. I think this version is thread-safe, allows other threads to add to the buffer in parallel and avoids output_to:
string tmp = buffer->peek(SIZE);
int bytes = fd->write(tmp);
if (bytes < 0) {
  // handle error
}
buffer->consume(bytes);
I understand the workaround, but having to do this violates the principle of least surprise (IMO). The Pike API should shield me from this.
It would be possible to make Stdio.File()->write(Stdio.Buffer()) release the interpreter lock at the additional cost of adding synchronization to Stdio.Buffer(). However, the main objective of this API (for me at least) was to make the common use-case more efficient: the one in which a single backend does all the writing on non-blocking sockets. In that situation I think it often makes more sense to have the IO thread complete one loop and release the interpreter lock when calling epoll. Other threads can then run while the backend thread is waiting for new IO events.
I agree with the reasoning, but isn't there some middle-ground solution which might work?
We're talking about merely allowing appending to the buffer while a write is in progress. That would mean the write "locks" (using Stdio.Buffer internals) the range of the buffer that it currently can grab and try to write. Other threads *are* allowed to add() to the end of the buffer, but nothing else. Releasing the interpreter lock for just doing that should not be necessary?
Other threads *are* allowed to add() to the end of the buffer, but nothing else. Releasing the interpreter lock for just doing that should not be necessary?
Without releasing the interpreter lock, no other pike threads can run, which means they can't call add() (even if they are allowed to).
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Other threads *are* allowed to add() to the end of the buffer, but nothing else. Releasing the interpreter lock for just doing that should not be necessary?
Without releasing the interpreter lock, no other pike threads can run, which means they can't call add() (even if they are allowed to).
But then, in its current form, the other threads should stall and the add() should not get lost. Yet that is exactly what appears to be happening.
On Thu, 12 Mar 2020, Stephen R. van den Berg wrote:
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Other threads *are* allowed to add() to the end of the buffer, but nothing else. Releasing the interpreter lock for just doing that should not be necessary?
Without releasing the interpreter lock, no other pike threads can run, which means they can't call add() (even if they are allowed to).
But then, in its current form, the other threads should stall and the add() should not get lost. Yet that is exactly what appears to be happening.
That is correct: currently no thread should be able to call add() in parallel with write(). If that happens, it is clearly a bug; no data should get lost here.