make_static_string()

List overview All Threads
Download

newer

older

Destructed objects inside a...

Shuffler revisited (2)

Stephen R. van den Berg

15 Mar 2021 15 Mar '21

3:11 p.m.

Is it allowed/supported to put a string created with make_static_string() into an svalue and then give it to say the write() method of an arbitrary Stdio object?

-- Stephen.

Show replies by date

Arne Goedeke

15 Mar 15 Mar

3:45 p.m.

No that is not allowed. make_static_string() is an internal API to create string with static storage (e.g. compile time constant strings). If you want to pass this string to pike code you need to use make_shared_static_string() which creates a "normal" Pike string.

What use-case do you have?

Arne

On Mon, 15 Mar 2021, Stephen R. van den Berg wrote:

...

Is it allowed/supported to put a string created with make_static_string() into an svalue and then give it to say the write() method of an arbitrary Stdio object? -- Stephen.

Stephen R. van den Berg

4:05 p.m.

...

No that is not allowed. make_static_string() is an internal API to create string with static storage (e.g. compile time constant strings). If you want to pass this string to pike code you need to use make_shared_static_string() which creates a "normal" Pike string.

...

What use-case do you have?

Simple, actually. I'm doing some Shuffler fixing again, and noticed that I have to create very temporary strings out of the various source objects and then pass them to a write() method. Those strings are not hashed in most cases and it would be a waste to make them a shared string only to remove them again from the shared string pool a few microseconds later.

I.e. this is the code I am contemplating:

struct pike_string *s = make_static_string(iov[i].iov_base, iov[i].iov_len, 0); push_string(s); apply (t->file_obj, "write", 1); pop_stack();

It seems to be a valid use case, since I provide the iov array and the source of the strings. The source will stay put while the write method runs, and I'll take care of freeing the source myself after the pop_stack(). So, yes, the only thing I need is some window dressing so that the string can be accessed from the Pike stack, and supposedly can be freed there, which should not touch the strings itself.

-- Stephen.

Tobias S. Josefowitz

4:15 p.m.

On Mon, Mar 15, 2021 at 5:05 PM Stephen R. van den Berg srb@cuci.nl wrote:

...

It seems to be a valid use case, since I provide the iov array and the source of the strings. The source will stay put while the write method runs, and I'll take care of freeing the source myself after the pop_stack(). So, yes, the only thing I need is some window dressing so that the string can be accessed from the Pike stack, and supposedly can be freed there, which should not touch the strings itself.

This would need so much more than just window-dressing. You're asking for trouble. The GC might decide to run, I wouldn't necessarily expect that to work out - but then maybe it does, another thread might request a backtrace of your thread and non-hashed strings might leak into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.

Stephen R. van den Berg

4:26 p.m.

Tobias S. Josefowitz wrote:

...

for trouble. The GC might decide to run, I wouldn't necessarily expect that to work out - but then maybe it does, another thread might

I'd think that wouldn't be a problem, since all refs are accounted for.

...

request a backtrace of your thread and non-hashed strings might leak

I'm not quite sure what happens if someone copies that string for a backtrace. For this to work, the string should then be copied into a new shared string.

...

into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.

That has been taken care of. My non-hashed strings are accounted for and will be cleaned up properly, even in the midst of exceptions.

-- Stephen.

Tobias S. Josefowitz

4:31 p.m.

On Mon, Mar 15, 2021 at 5:26 PM Stephen R. van den Berg srb@cuci.nl wrote:

...

I'd think that wouldn't be a problem, since all refs are accounted for.

I'd love to share your optimism, but the GC also detects leaks and such stuffs, calls Pike level code (destroy()/_destruct()) which would all sit on top of your write() in a backtrace requested by those LFUNs).

...

...
request a backtrace of your thread and non-hashed strings might leak

I'm not quite sure what happens if someone copies that string for a backtrace. For this to work, the string should then be copied into a new shared string.

Yeah except absolutely no part of Pike or user code is aware of this requirement, and so naturally nothing does it, and that's why your non-hashed string would leak into the system this way and cause all sorts of mayhem later.

...

...
into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.

That has been taken care of. My non-hashed strings are accounted for and will be cleaned up properly, even in the midst of exceptions.

Nothing is taken care of, they will leak to whoever catches the exception and cause all sorts of mayhem as Pike is not equipped to encounter non-hashed+non-linked strings in the wild in arbitrary, pike-code-reachable svalues.

Tobias S. Josefowitz

4:46 p.m.

On Mon, Mar 15, 2021 at 5:31 PM Tobias S. Josefowitz t.josefowitz@gmail.com wrote:

...

Nothing is taken care of, they will leak to whoever catches the exception and cause all sorts of mayhem as Pike is not equipped to encounter non-hashed+non-linked strings in the wild in arbitrary, pike-code-reachable svalues.

Actually, maybe you're the one catching. In which case, nevermind, it would then indeed be taken care of in that very case, given that you make sure to not "leak" the original error/backtrace.

Arne Goedeke

6:02 p.m.

Maybe a better strategy would be to introduce a new type of buffer object (supported by the get_memory_object_* APIs) which can be used to (cheaply) carry some arbitrary binary data. That is in principle already supported by Stdio.File()->write().

You could also use Stdio.Buffer() for that today, but you would want to make sure to keep the buffer objects for longer than just one read and write pair.

Arne

On Mon, 15 Mar 2021, Stephen R. van den Berg wrote:

...

Tobias S. Josefowitz wrote:

...
for trouble. The GC might decide to run, I wouldn't necessarily expect that to work out - but then maybe it does, another thread might

I'd think that wouldn't be a problem, since all refs are accounted for.

...
request a backtrace of your thread and non-hashed strings might leak

I'm not quite sure what happens if someone copies that string for a backtrace. For this to work, the string should then be copied into a new shared string.

...
into the system like that, or your write call might fail (say OOM), leaving your very own thread with a backtrace leaking the non-hashed string into the system.

That has been taken care of. My non-hashed strings are accounted for and will be cleaned up properly, even in the midst of exceptions. -- Stephen.

Stephen R. van den Berg

7:19 p.m.

Arne Goedeke wrote:

...

Maybe a better strategy would be to introduce a new type of buffer object (supported by the get_memory_object_* APIs) which can be used to (cheaply) carry some arbitrary binary data. That is in principle already supported by Stdio.File()->write().

Well, three things come to mind:

a. The Shuffler was broken so much that any practical use of it was impossible until I fixed it in 8.1 around a year ago, so that means that any backward compatibility with its API could be disregarded.

b. Using a Stdio.Buffer or memory object with the write method on the shuffler target will break the shuffler on targets that only support writing strings.

c. I don't think there is a clever way to detect if the target supports Stdio.Buffer/memory objects without triggering an exception (which probably is too much of a performance hit). So it would require changing the API to require support to write Stdio.Buffer/memory objects for shuffler targets.

Thoughts?

-- Stephen.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

9:22 p.m.

Rather than change the API you could extend it to allow the capability to be queried more cheaply perhaps? If the target does not implement the extended API you'd just fall back to using strings.

Stephen R. van den Berg

16 Mar 16 Mar

3:48 p.m.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:

...

Rather than change the API you could extend it to allow the capability to be queried more cheaply perhaps? If the target does not implement the extended API you'd just fall back to using strings.

Well, refactoring shuffler code is complicated enough as it is.

So, I took the easy way out, and simply reverted to plain shared strings again for the degenerate case. The highest gain for the shuffler is by ending up on as-few-as-possible writev() calls anyway, so if it has to fallback to the write method of an object, using the shuffler is questionable in the first place.

-- Stephen.

Arne Goedeke

20 Mar 20 Mar

4:05 p.m.

New subject: Stdio.Buffer()->__set_on_write() broken

The Stdio.Buffer output callback is required even for buffers which are not malloced. Now, the following test case

Stdio.Buffer b = Stdio.Buffer("foo"); b->__set_on_write(lambda(mixed ... args) { werror("New Data added.\n") }); b->add("bar");

does not work anymore. I believe this is requird by the Stdio.File buffer mode.

You also removed the __output variable which means that the GC cannot discover cycles anymore.

I think it would be best to simply revert that change.

Arne

Stephen R. van den Berg

6:40 p.m.

New subject: Stdio.Buffer()->__set_on_write() broken

Arne Goedeke wrote:

...

The Stdio.Buffer output callback is required even for buffers which are not malloced. Now, the following test case

...

Stdio.Buffer b = Stdio.Buffer("foo"); b->__set_on_write(lambda(mixed ... args) { werror("New Data added.\n") }); b->add("bar");

...

does not work anymore. I believe this is requird by the Stdio.File buffer mode.

Indeed. Forgot about this one.

...

You also removed the __output variable which means that the GC cannot discover cycles anymore.

Can you (or anyone) explain how the GC uses this variable to detect those? I'm guessing that the GC only works with svalues, not with plain objects?

-- Stephen.

Tobias S. Josefowitz

7:11 p.m.

New subject: Stdio.Buffer()->__set_on_write() broken

On Sat, Mar 20, 2021 at 7:40 PM Stephen R. van den Berg srb@cuci.nl wrote:

...

Can you (or anyone) explain how the GC uses this variable to detect those? I'm guessing that the GC only works with svalues, not with plain objects?

That is precisely why it needs to be mapped, otherwise the GC cycle checks can't see it. If it is mapped, it will (presumably) be picked up by real_gc_cycle_check_object(), among possibly other places. You removed the map, the GC won't see it now, and that's a regression.

By the way, do you have any examples for when

commit 7def00d27ac7a31be123aaf7b63299d612d5802c Author: Stephen R. van den Berg srb@cuci.nl Date: Mon Mar 15 00:27:52 2021 +0100

Stdio.Buffer: Do not optimise the buffer to empty if there are subbuffers.

would cause any

"generic data corruption bugs"? As far as I could see having a quick glance, all changes to the data and/or allocation should be guarded by io_add_space() (and thus result in io_add_space_do_something() throwing an error via io_ensure_unlocked()), thus even if ->len and ->offset are set to zero, no changes to data or allocation should occur. Maybe you actually did notice corruption in Shuffler instead?

Stephen R. van den Berg

9 p.m.

New subject: Stdio.Buffer()->__set_on_write() broken

Tobias S. Josefowitz wrote:

...

By the way, do you have any examples for when

...

commit 7def00d27ac7a31be123aaf7b63299d612d5802c Stdio.Buffer: Do not optimise the buffer to empty if there are subbuffers.

...

"generic data corruption bugs"? As far as I could see having a quick glance, all changes to the data and/or allocation should be guarded by io_add_space() (and thus result in io_add_space_do_something() throwing an error via io_ensure_unlocked()), thus even if ->len and ->offset are set to zero, no changes to data or allocation should occur. Maybe you actually did notice corruption in Shuffler instead?

Well, retracing my steps I can probably say the following: - Yes, you are correct, that would prevent most issues. - The reason I actually noticed corruption was because I extended the code first to allow appending (only) to buffers that are large enough already, even if they have subbuffers.

The code was/is not robust to begin with though. In io_add_space() the code was:

if( io->len == io->offset ) io->offset = io->len = 0; if( !force && io->malloced && !io->locked && io->len+bytes < io->allocated && (!bytes || io->len+bytes > io->len)) return io->buffer+io->len;

Checking for locked in the second if, and not in the first seems sloppy at best; defensive programming prescribes more robust checks here.

-- Stephen.

Tobias S. Josefowitz

9:27 p.m.

New subject: Stdio.Buffer()->__set_on_write() broken

On Sat, Mar 20, 2021 at 10:00 PM Stephen R. van den Berg srb@cuci.nl wrote:

...

The code was/is not robust to begin with though. In io_add_space() the code was:
if( io->len == io->offset )
  io->offset = io->len = 0;
if( !force && io->malloced && !io->locked && io->len+bytes < io->allocated &&
    (!bytes || io->len+bytes > io->len))
  return io->buffer+io->len;
Checking for locked in the second if, and not in the first seems sloppy at best; defensive programming prescribes more robust checks here.

I don't mind that change, I just wanted to be sure there was no corruption observed that could not actually be explained with this code.

1582

Age (days ago)

1587

Last active (days ago)

pike-devel@lists.lysator.liu.se

15 comments

4 participants

tags (0)

participants (4)

Arne Goedeke
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum
Stephen R. van den Berg
Tobias S. Josefowitz