hi,
the JSON module can only encode objects if they provide an encode_json function. i find this a bit limiting, because it is hardly possible to add encode_json to all classes that would need it.
it would be better to allow the caller of JSON.encode to provide a callback for objects that the module can't handle by itself.
it seems to me the simplest approach would be to add an argument to JSON.encode for passing a function, and then call that function if the object has none of its own.
i am unfamiliar with CMODs, but i think in src/post_modules/JSON/json.cmod i need to add the function to the encode_context and then call it in the PIKE_T_OBJECT case.
also, what i am wondering: should i store the whole svalue for the function, or just the function pointer itself? (is that u.efun->function?)
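for comparison, python's json module handles the same problem with a `default` callback that is consulted only for values the encoder can't handle by itself. just an illustration of the proposed behavior, not pike code:

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# invoked only for values json.dumps can't serialize itself,
# analogous to the proposed fallback for objects without encode_json
def fallback(obj):
    if isinstance(obj, Point):
        return {"x": obj.x, "y": obj.y}
    raise TypeError("cannot encode %r" % (obj,))

print(json.dumps({"p": Point(1, 2)}, default=fallback))
# -> {"p": {"x": 1, "y": 2}}
```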
greetings, martin.
Hi martin,
I agree, it's probably a good thing to have. I would store the callback as an svalue in the context, as you propose, and call it using apply_svalue. That way one can also pass an object with a `() lfun.
arne
On Thu, 9 May 2013, Martin Bähr wrote:
i started working on that, but i am stuck missing something: http://paste.lisp.org/display/137237
this patch compiles but causes a segfault. it is not even getting to apply_svalue but fails already when testing if there is a function argument.
greetings, martin.
In your code, callback can be NULL. You therefore need to check ctx->callback before you dereference it. Something like

    if (ctx->callback && TYPEOF(*ctx->callback) == PIKE_T_FUNCTION)

should take care of the segfault.
arne
On Wed, 22 May 2013, Martin Bähr wrote:
Also, testing on PIKE_T_FUNCTION might not be such a good idea, since then using an object with a `() operator won't work, neither will using a program as a callback. The builtin callablep() recognizes PIKE_T_FUNCTION, PIKE_T_PROGRAM, PIKE_T_OBJECT, and PIKE_T_ARRAY, but a better approach might be to just try apply_svalue if the svalue is nonzero and not a string (the case you want to handle differently).
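A rough python illustration of the point (the names here are invented, just to show the shape of the idea): skip the type test and simply call the thing, so callable objects work too, and non-callables fail on their own.

```python
class Stringify:
    # stands in for a pike object with a `() lfun
    def __call__(self, obj):
        return str(obj)

def run_callback(cb, obj):
    # no explicit type check on cb: anything callable is accepted,
    # much like calling apply_svalue instead of testing for
    # PIKE_T_FUNCTION; calling a non-callable raises by itself
    return cb(obj)

print(run_callback(Stringify(), 42))   # -> 42
print(run_callback(len, [1, 2, 3]))    # -> 3
```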
On Wed, May 22, 2013 at 09:05:02AM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> Also, testing on PIKE_T_FUNCTION might not be such a good idea, since then using an object with a `() operator won't work, neither will using a program as a callback. The builtin callablep() recognizes PIKE_T_FUNCTION, PIKE_T_PROGRAM, PIKE_T_OBJECT, and PIKE_T_ARRAY, but a better approach might be to just try apply_svalue if the svalue is nonzero and not a string (the case you want to handle differently).
thanks, that is very helpful! but why would i want to handle a string differently? (ok, i can imagine just using the string... did you mean that?)
greetings, martin.
I don't know why you want to have special handling of strings, I just noticed that you already do, and assumed you wanted to keep it that way. ;-)
Maybe it's helpful to be able to pass a string there as well. That way one could cheaply encode all objects without encode_json() as 'undefined'.
Just a thought.
arne
On Wed, 22 May 2013, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
On Wed, May 22, 2013 at 04:04:38PM +0200, Arne Goedeke wrote:
> Maybe it's helpful to be able to pass a string there as well. That way one could cheaply encode all objects without encode_json() as 'undefined'.
indeed, i added that.
since this is my first venture into CMODs, i am attaching the patch here for review. (i also need to check or update my commit access, so i'd appreciate it if someone else could apply this patch if it is ok)
greetings, martin.
hi,
dealing with objects is pretty straightforward, but it gets more interesting with mapping keys.
json only allows strings as keys, and the current json module hence fails when converting keys that are not strings.
the problem i need to solve involves converting user data into json without failing, and short of walking the input data to weed out bad values, catching them in the encode step is the only way.
python http://stackoverflow.com/questions/1450957/pythons-json-module-converts-int-... simply converts ints to strings, which is one option, but i think inserting the callback here too, to let the caller deal with the problem, is still better.
now i am wondering: what arguments should be sent to the caller? just the key? the key and the value? or the whole mapping?
one reason to send the whole mapping is to help check for cases like ([ 1:"a", "1":"a" ]), which could result in duplicate keys if the callback simply does { return (string)key; }
passing the whole mapping to the callback would allow the caller to analyze the situation, but somehow i feel it would also make things complicated.
one option would be to accept the value from the callback, but check for and throw on duplicate keys in the module. that should deal with the most common cases and only throw in unusual ones.
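(as an aside, python's own json module demonstrates exactly this hazard: int keys are stringified, so a mapping containing both 1 and "1" encodes to duplicate keys. a quick illustration:)

```python
import json

# 1 and "1" are distinct dict keys, but both stringify to "1"
out = json.dumps({1: "a", "1": "b"})
print(out)  # -> {"1": "a", "1": "b"}  (duplicate keys in the output)
```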
greetings, martin.
If performance does not matter, you could add a pike function to the JSON module called encode_anyway(), which recursively filters the data and passes it on to encode. I personally prefer having a somewhat strict encoder/decoder, since that also prevents programmer mistakes which could otherwise go unnoticed.
The duplicate key problem you mention is not easy to solve. It would also make the encoded mappings non-deterministic, since mappings have no order.
arne
ps. i would commit a slightly modified version of your patch, unless you think we need to solve the mapping problem, too
On Thu, 23 May 2013, Martin Bähr wrote:
On Thu, May 23, 2013 at 02:24:02PM +0200, Arne Goedeke wrote:
> If performance does not matter, you could add a pike function to the JSON module called encode_anyway(), which recursively filters the data and passes it on to encode. I personally prefer having a somewhat strict encoder/decoder, since that also prevents programmer mistakes which could otherwise go unnoticed.
well, the encoder would remain strict if you don't use a callback that breaks that. (assuming we allow the callback to do that)
> The duplicate key problem you mention is not easy to solve. It would also make the encoded mappings non-deterministic, since mappings have no order.
i don't see how a callback would change anything. the order in which keys are encoded is determined by the original mapping; a callback on keys would not influence that order.
one idea: while encoding the mapping, take the return value of the callback and check it against the original mapping (which we should still have, even if we are already halfway through encoding). if the original mapping already has a key that matches the returned string, throw an error; otherwise allow the key.
this would prevent duplicates and only throw an error with weird data, or if the callback returns the same string for more than one value.
> ps. i would commit a slightly modified version of your patch, unless you think we need to solve the mapping problem, too
please go ahead; it's going to take a while until i get to continue working on this and solve the mapping problem. until then, the current solution is already useful.
greetings, martin.
On Thu, 23 May 2013, Martin Bähr wrote:
> one idea is, while encoding the mapping, to take the return value of the callback and check it against the original mapping (which we should still have, even if we are already halfway done encoding). if the original mapping already has a key that matches the returned string, then throw an error, otherwise allow the key.
Unfortunately, it's not that simple. Assume
([ 1 : 0, 2 : 0, 3 : 0 ])
and a callback that always returns "foo". "foo" would not be found in the mapping and would therefore lead to invalid json (the rfc says that duplicate keys should not occur). Most parsers will probably use either the first encountered key or the last one. Some will error. The pike one currently uses the last one, since that is easy to implement. The json rfc does not specify _what_ a parser is supposed to do.
Maybe my argument is overly pedantic: currently, using encode_json() one can already create invalid json. In fact, I have been using that regularly to produce javascript, for instance to encode objects as (new SomeClass()).
I guess my opposition to more hooks is that somewhere down the road the callback solution will not be flexible enough. In particular, your callback might want to encode objects as some string when they are mapping keys, but treat them differently when they are not. How do you tell the difference? Of course that could be solved by either having two callbacks or passing some kind of flag...
It seems more reasonable to me to have a limited set of tweaks, in particular since the mapping from pike data types to json is very well defined. Some things can be mapped, others cannot. Everything else has to be transformed first, in the same way as it has to be transformed when decoding from json.
arne
On Thu, May 23, 2013 at 07:35:51PM +0200, Arne Goedeke wrote:
> Unfortunately, it's not that simple. Assume ([ 1 : 0, 2 : 0, 3 : 0 ]) and a callback that always returns "foo". "foo" would not be found in the mapping and would therefore lead to invalid json
ah, right, i'd have to keep track of the new keys as well.
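something like this sketch, perhaps (python stand-in code, the names are invented): track every key string emitted so far, whether it came from the original mapping or from the callback, and throw on any collision.

```python
import json

def encode_mapping(m, key_cb):
    # every key string emitted so far, original or callback-produced
    seen = set()
    parts = []
    for k, v in m.items():
        # non-string keys go through the callback
        key = k if isinstance(k, str) else key_cb(k)
        if key in seen:
            raise ValueError("duplicate key after conversion: %r" % key)
        seen.add(key)
        parts.append("%s: %s" % (json.dumps(key), json.dumps(v)))
    return "{" + ", ".join(parts) + "}"

print(encode_mapping({1: "a", "b": 2}, str))  # -> {"1": "a", "b": 2}
# encode_mapping({1: 0, 2: 0, 3: 0}, lambda k: "foo") raises ValueError
```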
> Maybe my argument is overly pedantic: Currently, using encode_json() one can already create invalid json.
mind you, i am arguing on your side :-) but yes. also, there is a difference between being able to control what the callback returns and being able to control the original data. getting an error when the callback result conflicts with the original data is more needed than an error about returning the same value multiple times, because presumably i can figure out the latter myself more easily.
more so, for truly safe output we'd need to parse the callback result to verify that it is valid json, or let the callback return a pike data structure. that would be interesting, because it could mean that for an object the callback could return a mapping, which would result in a valid JSON object.
(this brings up some other ideas: without an encode function, what about casting objects to a mapping or an array? presumably, if an object supports such a cast, then the cast result is equivalent to the original object?
for keys, an automatic cast to string could be applied. (though this would be ugly if the key is an array or mapping, but that's another edge case))
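a sketch of the cast idea (python stand-in; the method names are hypothetical, pike would use the object's cast function instead):

```python
import json

class Pair:
    def __init__(self, a, b):
        self.a, self.b = a, b
    # stands in for a pike object supporting a cast to mapping
    def to_mapping(self):
        return {"a": self.a, "b": self.b}

def fallback(obj):
    # try a cast to mapping first, then to array, else give up
    if hasattr(obj, "to_mapping"):
        return obj.to_mapping()
    if hasattr(obj, "to_array"):
        return obj.to_array()
    raise TypeError("cannot encode %r" % (obj,))

print(json.dumps(Pair(1, 2), default=fallback))  # -> {"a": 1, "b": 2}
```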
> I guess my opposition to more hooks is that somewhere down the road the callback solution will not be flexible enough. In particular, your callback might want to encode objects as some string when they are mapping keys, but treat them differently when they are not. How do you tell the difference? Of course that could be solved by either having two callbacks or passing some kind of flag...
i have been thinking about that too. it partly inspired the question of whether the key:value pair should be passed, or the whole mapping.
either way, i'd prefer to start with a simple solution. not knowing the difference between keys and values is still better than not handling keys at all, which is currently the case.
we can always improve on that later if it is needed.
> It seems more reasonable to me to have a limited set of tweaks;
what do you mean by a limited set of tweaks? as opposed to one callback argument?
the main thing i am concerned about is being able to convert arbitrary data structures to json without throwing an error or needing an expensive walk through the data structure to weed out bad values.
the latter is what i currently need to do if i want to prevent my application from breaking because a user somewhere decided to add a mapping with problematic keys. (it's not urgent, because at the moment the only way to add such a mapping is by uploading it via json in the first place. but that will change later when we expose other APIs.)
greetings, martin.
On Fri, 24 May 2013, Martin Bähr wrote:
> the main thing i am concerned about is to be able to convert arbitrary datastructures to json without throwing an error or needing an expensive walk through the datastructure to weed out bad values.
Yes, I see your point. However, I don't think that walk-through is necessarily more expensive than calling some pike function for each element. Clearly, that depends on the ratio of 'good' to 'bad' data, but I don't think having it would be a huge win, anyway.
arne
On Fri, May 24, 2013 at 08:57:54AM +0200, Arne Goedeke wrote:
>> the main thing i am concerned about is to be able to convert arbitrary datastructures to json without throwing an error or needing an expensive walk through the datastructure to weed out bad values.
> Yes, I see your point. However, I don't think that walk-through is necessarily more expensive than calling some pike function for each element. Clearly, that depends on the ratio of 'good' to 'bad' data, but I don't think having it would be a huge win, anyway.
one problem with doing the walkthrough before passing things to JSON.encode() is that it involves modifying the data structure, and this may not always be possible without a full deep copy. when that involves objects, it may be difficult or even impossible.
greetings, martin.
On Wed, May 22, 2013 at 01:10:01PM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> I don't know why you want to have special handling of strings, I just noticed that you already do, and assumed you wanted to keep it that way. ;-)
the string handling that's there is to pick up the return value of the encode_json function (and, with a working version of my patch, also of the callback) from the stack.
greetings, martin.
Yeah, but the special handling is to use the string without calling it first (or throwing an exception).
Old code:

    if (function) call it
    if (string) do nothing  // special case, don't call or throw
    else throw

New code:

    if (string) do nothing  // special case, don't call or throw
    else call it  // will throw if not a callable
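In python terms the new logic would look roughly like this (illustration only, not code from the patch):

```python
def resolve(handler, obj):
    # a string is used as-is, without calling it
    if isinstance(handler, str):
        return handler
    # everything else is simply called; a non-callable raises
    # on its own, replacing the old explicit type check
    return handler(obj)

print(resolve("undefined", object()))  # -> undefined
print(resolve(repr, 42))               # -> 42
```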
On Wed, May 22, 2013 at 09:05:02AM +0000, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> Also, testing on PIKE_T_FUNCTION might not be such a good idea, since then using an object with a `() operator won't work, neither will using a program as a callback. The builtin callablep() recognizes PIKE_T_FUNCTION, PIKE_T_PROGRAM, PIKE_T_OBJECT, and PIKE_T_ARRAY, but a better approach might be to just try apply_svalue if the svalue is nonzero
in order to use one of those methods, what pike type do i need? program|object|function? or should i use mixed?
greetings, martin.
On Wed, May 22, 2013 at 10:48:04AM +0200, Arne Goedeke wrote:
> In your code, callback can be NULL. You therefore need to check ctx->callback before you dereference it. Something like if (ctx->callback && TYPEOF(*ctx->callback) == PIKE_T_FUNCTION) should take care of the segfault.
pebkac. i know i tried testing for NULL, but probably when i did that i missed the * and drew the wrong conclusion from the error.
thanks. it does indeed work now!
greetings, martin.
pike-devel@lists.lysator.liu.se