Is it allowed/supported to put a string created with make_static_string() into an svalue and then give it to say the write() method of an arbitrary Stdio object?
No that is not allowed. make_static_string() is an internal API to create string with static storage (e.g. compile time constant strings). If you want to pass this string to pike code you need to use make_shared_static_string() which creates a "normal" Pike string.
What use-case do you have?
Arne
On Mon, 15 Mar 2021, Stephen R. van den Berg wrote:
Is it allowed/supported to put a string created with make_static_string() into an svalue and then give it to say the write() method of an arbitrary Stdio object? -- Stephen.
No that is not allowed. make_static_string() is an internal API to create string with static storage (e.g. compile time constant strings). If you want to pass this string to pike code you need to use make_shared_static_string() which creates a "normal" Pike string.
What use-case do you have?
Simple, actually. I'm doing some Shuffler fixing again, and noticed that I have to create very temporary strings out of the various source objects and then pass them to a write() method. Those strings are not hashed in most cases and it would be a waste to make them a shared string only to remove them again from the shared string pool a few microseconds later.
I.e. this is the code I am contemplating:
struct pike_string *s = make_static_string(iov[i].iov_base, iov[i].iov_len, 0); push_string(s); apply (t->file_obj, "write", 1); pop_stack();
It seems to be a valid use case, since I provide the iov array and the source of the strings. The source will stay put while the write method runs, and I'll take care of freeing the source myself after the pop_stack(). So, yes, the only thing I need is some window dressing so that the string can be accessed from the Pike stack, and supposedly can be freed there, which should not touch the strings itself.
On Mon, Mar 15, 2021 at 5:05 PM Stephen R. van den Berg srb@cuci.nl wrote:
It seems to be a valid use case, since I provide the iov array and the source of the strings. The source will stay put while the write method runs, and I'll take care of freeing the source myself after the pop_stack(). So, yes, the only thing I need is some window dressing so that the string can be accessed from the Pike stack, and supposedly can be freed there, which should not touch the strings itself.
This would need so much more than just window-dressing. You're asking for trouble. The GC might decide to run, I wouldn't necessarily expect that to work out - but then maybe it does, another thread might request a backtrace of your thread and non-hashed strings might leak into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.
Tobias S. Josefowitz wrote:
for trouble. The GC might decide to run, I wouldn't necessarily expect that to work out - but then maybe it does, another thread might
I'd think that wouldn't be a problem, since all refs are accounted for.
request a backtrace of your thread and non-hashed strings might leak
I'm not quite sure what happens if someone copies that string for a backtrace. For this to work, the string should then be copied into a new shared string.
into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.
That has been taken care of. My non-hashed strings are accounted for and will be cleaned up properly, even in the midst of exceptions.
On Mon, Mar 15, 2021 at 5:26 PM Stephen R. van den Berg srb@cuci.nl wrote:
I'd think that wouldn't be a problem, since all refs are accounted for.
I'd love to share your optimism, but the GC also detects leaks and such stuffs, calls Pike level code (destroy()/_destruct()) which would all sit on top of your write() in a backtrace requested by those LFUNs).
request a backtrace of your thread and non-hashed strings might leak
I'm not quite sure what happens if someone copies that string for a backtrace. For this to work, the string should then be copied into a new shared string.
Yeah except absolutely no part of Pike or user code is aware of this requirement, and so naturally nothing does it, and that's why your non-hashed string would leak into the system this way and cause all sorts of mayhem later.
into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.
That has been taken care of. My non-hashed strings are accounted for and will be cleaned up properly, even in the midst of exceptions.
Nothing is taken care of, they will leak to whoever catches the exception and cause all sorts of mayhem as Pike is not equipped to encounter non-hashed+non-linked strings in the wild in arbitrary, pike-code-reachable svalues.
On Mon, Mar 15, 2021 at 5:31 PM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:
Nothing is taken care of, they will leak to whoever catches the exception and cause all sorts of mayhem as Pike is not equipped to encounter non-hashed+non-linked strings in the wild in arbitrary, pike-code-reachable svalues.
Actually, maybe you're the one catching. In which case, nevermind, it would then indeed be taken care of in that very case, given that you make sure to not "leak" the original error/backtrace.
Maybe a better strategy would be to introduce a new type of buffer object (supported by the get_memory_object_* APIs) which can be used to (cheaply) carry some arbitrary binary data. That is in principle already supported by Stdio.File()->write().
You could also use Stdio.Buffer() for that today, but you would want to make sure to keep the buffer objects for longer than just one read and write pair.
Arne
On Mon, 15 Mar 2021, Stephen R. van den Berg wrote:
Tobias S. Josefowitz wrote:
for trouble. The GC might decide to run, I wouldn't necessarily expect that to work out - but then maybe it does, another thread might
I'd think that wouldn't be a problem, since all refs are accounted for.
request a backtrace of your thread and non-hashed strings might leak
I'm not quite sure what happens if someone copies that string for a backtrace. For this to work, the string should then be copied into a new shared string.
into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.
That has been taken care of. My non-hashed strings are accounted for and will be cleaned up properly, even in the midst of exceptions. -- Stephen.
Arne Goedeke wrote:
Maybe a better strategy would be to introduce a new type of buffer object (supported by the get_memory_object_* APIs) which can be used to (cheaply) carry some arbitrary binary data. That is in principle already supported by Stdio.File()->write().
Well, three things come to mind:
a. The Shuffler was broken so much that any practical use of it was impossible until I fixed it in 8.1 around a year ago, so that means that any backward compatibility with its API could be disregarded.
b. Using a Stdio.Buffer or memory object with the write method on the shuffler target will break the shuffler on targets that only support writing strings.
c. I don't think there is a clever way to detect if the target supports Stdio.Buffer/memory objects without triggering an exception (which probably is too much of a performance hit). So it would require changing the API to require support to write Stdio.Buffer/memory objects for shuffler targets.
Thoughts?
Rather than change the API you could extend it to allow the capability to be queried more cheaply perhaps? If the target does not implement the extended API you'd just fall back to using strings.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Rather than change the API you could extend it to allow the capability to be queried more cheaply perhaps? If the target does not implement the extended API you'd just fall back to using strings.
Well, refactoring shuffler code is complicated enough as it is.
So, I took the easy way out, and simply reverted to plain shared strings again for the degenerate case. The highest gain for the shuffler is by ending up on as-few-as-possible writev() calls anyway, so if it has to fallback to the write method of an object, using the shuffler is questionable in the first place.
The Stdio.Buffer output callback is required even for buffers which are not malloced. Now, the following test case
Stdio.Buffer b = Stdio.Buffer("foo"); b->__set_on_write(lambda(mixed ... args) { werror("New Data added.\n") }); b->add("bar");
does not work anymore. I believe this is requird by the Stdio.File buffer mode.
You also removed the __output variable which means that the GC cannot discover cycles anymore.
I think it would be best to simply revert that change.
Arne
Arne Goedeke wrote:
The Stdio.Buffer output callback is required even for buffers which are not malloced. Now, the following test case
Stdio.Buffer b = Stdio.Buffer("foo"); b->__set_on_write(lambda(mixed ... args) { werror("New Data added.\n") }); b->add("bar");
does not work anymore. I believe this is requird by the Stdio.File buffer mode.
Indeed. Forgot about this one.
You also removed the __output variable which means that the GC cannot discover cycles anymore.
Can you (or anyone) explain how the GC uses this variable to detect those? I'm guessing that the GC only works with svalues, not with plain objects?
On Sat, Mar 20, 2021 at 7:40 PM Stephen R. van den Berg srb@cuci.nl wrote:
Can you (or anyone) explain how the GC uses this variable to detect those? I'm guessing that the GC only works with svalues, not with plain objects?
That is precisely why it needs to be mapped, otherwise the GC cycle checks can't see it. If it is mapped, it will (presumably) be picked up by real_gc_cycle_check_object(), among possibly other places. You removed the map, the GC won't see it now, and that's a regression.
By the way, do you have any examples for when
commit 7def00d27ac7a31be123aaf7b63299d612d5802c Author: Stephen R. van den Berg srb@cuci.nl Date: Mon Mar 15 00:27:52 2021 +0100
Stdio.Buffer: Do not optimise the buffer to empty if there are subbuffers.
would cause any
"generic data corruption bugs"? As far as I could see having a quick glance, all changes to the data and/or allocation should be guarded by io_add_space() (and thus result in io_add_space_do_something() throwing an error via io_ensure_unlocked()), thus even if ->len and ->offset are set to zero, no changes to data or allocation should occur. Maybe you actually did notice corruption in Shuffler instead?
Tobias S. Josefowitz wrote:
By the way, do you have any examples for when
commit 7def00d27ac7a31be123aaf7b63299d612d5802c Stdio.Buffer: Do not optimise the buffer to empty if there are subbuffers.
"generic data corruption bugs"? As far as I could see having a quick glance, all changes to the data and/or allocation should be guarded by io_add_space() (and thus result in io_add_space_do_something() throwing an error via io_ensure_unlocked()), thus even if ->len and ->offset are set to zero, no changes to data or allocation should occur. Maybe you actually did notice corruption in Shuffler instead?
Well, retracing my steps I can probably say the following: - Yes, you are correct, that would prevent most issues. - The reason I actually noticed corruption was because I extended the code first to allow appending (only) to buffers that are large enough already, even if they have subbuffers.
The code was/is not robust to begin with though. In io_add_space() the code was:
if( io->len == io->offset ) io->offset = io->len = 0; if( !force && io->malloced && !io->locked && io->len+bytes < io->allocated && (!bytes || io->len+bytes > io->len)) return io->buffer+io->len;
Checking for locked in the second if, and not in the first seems sloppy at best; defensive programming prescribes more robust checks here.
On Sat, Mar 20, 2021 at 10:00 PM Stephen R. van den Berg srb@cuci.nl wrote:
The code was/is not robust to begin with though. In io_add_space() the code was:
if( io->len == io->offset ) io->offset = io->len = 0; if( !force && io->malloced && !io->locked && io->len+bytes < io->allocated && (!bytes || io->len+bytes > io->len)) return io->buffer+io->len;
Checking for locked in the second if, and not in the first seems sloppy at best; defensive programming prescribes more robust checks here.
I don't mind that change, I just wanted to be sure there was no corruption observed that could not actually be explained with this code.
pike-devel@lists.lysator.liu.se