possible sscanf misfeatures

H. William Welliver III

22 Jun 2011

6:43 a.m.

I don't recall whether anyone else has reported these problems with sscanf(). I'm using 7.8.352 and seeing the following:

1. The + and - modifiers to %c don't seem to work in combination, even though it seems reasonable that one might want to decode a signed value in big endian order. Separately they work fine.

2. The - modifier does not work with %F. I'm currently using a workaround of reading in the bytes as an array, reversing them and then feeding them back into sscanf().

Bill

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

22 Jun 2011

7 a.m.

> The + and - modifiers to %c don't seem to work in combination, even though it seems reasonable that one might want to decode a signed value in big endian order. Separately they work fine.

(ITYM "little endian"...)

Seems working to me:

Pike v7.8 release 350 running Hilfe v3.5 (Incremental Pike Frontend)

array_sscanf("\xff\xf0","%+-2c");

(1) Result: ({ /* 1 element */ -3841 })

array_sscanf("\xff\xf0","%-+2c");

(2) Result: ({ /* 1 element */ -3841 })

Bill Welliver

6:12 p.m.

Hmm...

I've changed the original source of the data to not use signed ints, so I can't re-verify the original problem at the moment. I have run your tests and they appear to be correct here as well.

The problem with %F still stands, though, I think: the first line below doesn't work, but the second does:

array_sscanf(sprintf("%-F", 4.3), "%-F");

(4) Result: ({ /* 1 element */ -6.35011e-23 })

array_sscanf(reverse(sprintf("%-F", 4.3)), "%-F");

(5) Result: ({ /* 1 element */ 4.3 })

Which is definitely not correct behavior, right?
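The byte-order mismatch behind results (4) and (5) can be reproduced outside Pike. The following is a minimal Python sketch, not Pike code; it assumes that sprintf("%-F", 4.3) emits a 4-byte little-endian IEEE 754 single and that sscanf's %-F reads the bytes as big-endian, which is what the results above suggest.

```python
import struct

value = 4.3

# Assumption: this is what sprintf("%-F", 4.3) produces (little-endian single).
little = struct.pack("<f", value)

# A decoder that ignores the "-" modifier reads the same bytes as big-endian.
misread = struct.unpack(">f", little)[0]

# The workaround from the thread: reverse the bytes, then decode big-endian.
recovered = struct.unpack(">f", little[::-1])[0]

print(misread)    # a tiny negative number, about -6.35e-23
print(recovered)  # 4.3 to single precision
```

The misread value agrees with result (4), which is consistent with the "-" modifier being ignored on the decoding side only.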

Additionally, I did run into a problem using the + modifier in sprintf(): while the resulting string is encoded as signed, it prepends a +, which seems incorrect to me, such that it's not possible to decode without first removing the +:

array_sscanf(sprintf("%-+4c", -65335), "%-+4c");

(16) Result: ({ /* 1 element */ -16725717 })

array_sscanf(sprintf("%-+4c", -65335)[1..], "%-+4c");

(18) Result: ({ /* 1 element */ -65335 })

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

6:20 p.m.

> Additionally, I did run into a problem using the + modifier in sprintf(): while the resulting string is encoded as signed, it prepends a +, which seems incorrect to me, such that it's not possible to decode without first removing the +:
>
> array_sscanf(sprintf("%-+4c", -65335), "%-+4c");
> (16) Result: ({ /* 1 element */ -16725717 })

While prepending the + seems a bit pointless, that's what + does in sprintf. You don't need to add it to get a "signed" binary number:

array_sscanf(sprintf("%-4c", -65335), "%-+4c");

(2) Result: ({ /* 1 element */ -65335 })
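To see where -16725717 comes from, here is a small Python sketch (an analogue, not Pike). It assumes, as reported in this thread, that the + modifier in sprintf puts a literal '+' byte in front of the four encoded bytes; the helper name decode_le_signed is made up for illustration.

```python
value = -65335

# Four little-endian two's complement bytes, standing in for sprintf("%-4c", value).
payload = value.to_bytes(4, "little", signed=True)

# Assumption from the thread: sprintf's "+" prepends a literal '+' byte.
with_plus = b"+" + payload


def decode_le_signed(data: bytes) -> int:
    """Read the first four bytes as a signed little-endian integer."""
    return int.from_bytes(data[:4], "little", signed=True)


shifted = decode_le_signed(with_plus)       # the '+' byte is consumed as the low octet
stripped = decode_le_signed(with_plus[1:])  # like the [1..] workaround
plain = decode_le_signed(payload)           # no "+" on the encoding side

print(shifted, stripped, plain)  # -16725717 -65335 -65335
```

The stray '+' shifts every payload byte up by one position, so the decoder sees a different number rather than failing.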

Bill Welliver

7:13 p.m.

> While prepending the + seems a bit pointless, that's what + does in sprintf. You don't need to add it to get a "signed" binary number:

sprintf() seems to switch to signed mode automatically when a negative number is presented, which is clever but to me seems fundamentally flawed, if you're trying to interchange data. You certainly don't want a field to be encoded one way for certain situations but another way the rest of the time. There doesn't appear to be a way to force unsigned encoding, which makes things difficult when trying to pack data, right?

Mirar ＠ Pike developers forum

7:45 p.m.

A binary number is a binary number regardless, so I don't think it should do anything in sprintf. Did you want it to throw an error if the number doesn't fit?

These integers will have the same lowest bits, regardless:

sprintf("%1c",-2);

(13) Result: "\376"

sprintf("%1c",254);

(14) Result: "\376"

sprintf("%1c",254+65536);

(15) Result: "\376"

It seems '+' to sscanf means to read it as a signed integer, so for me it works as expected:

array_sscanf("\xff\xfe"*2,"%2c%-2c");

(23) Result: ({ /* 2 elements */ 65534, 65279 })

array_sscanf("\xff\xfe"*2,"%+2c%+-2c");

(24) Result: ({ /* 2 elements */ -2, -257 })
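The same arithmetic can be checked in Python (an analogue for illustration, not Pike): masking stands in for sprintf's %1c, and int.from_bytes stands in for sscanf's %2c with the - (little-endian) and + (signed) modifiers.

```python
# sprintf("%1c", n) analogue: only the lowest 8 bits survive.
low_bytes = [bytes([n & 0xFF]) for n in (-2, 254, 254 + 65536)]

# Same input as above: "\xff\xfe" repeated twice, split into two 16-bit fields.
data = b"\xff\xfe" * 2
first, second = data[:2], data[2:]

# "%2c%-2c": big-endian unsigned, then little-endian unsigned.
unsigned = [int.from_bytes(first, "big"), int.from_bytes(second, "little")]

# "%+2c%+-2c": the same bytes, read as signed.
signed = [
    int.from_bytes(first, "big", signed=True),
    int.from_bytes(second, "little", signed=True),
]

print(low_bytes)  # [b'\xfe', b'\xfe', b'\xfe']
print(unsigned)   # [65534, 65279]
print(signed)     # [-2, -257]
```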

Bill Welliver

8:35 p.m.

I don't think that an error needs to be thrown, as it will just overflow and wrap around. My concern is that using %c may or may not throw a sign bit into the encoded value. That requires the recipient of that value to know whether the value was negative ahead of time.

Or, put another way, if you encode a 32 bit number using %4c, how do you decode it reliably?

Bill

Mirar ＠ Pike developers forum

8:45 p.m.

If you encode a signed number or an unsigned number, the reliable region will be the same (0<=x<2^(bits-1)). You need to know if it's stored unsigned or signed when you decode, but not when you encode, unless you want to throw an error for numbers that don't fit.

Compare with casting in C: cast -17 to (unsigned). The result is not going to be 0; you have just told the compiler that the value stored there is actually unsigned.
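A rough Python sketch of both points, for illustration only: the helper names as_unsigned and as_signed are made up here, the 8-bit width is an arbitrary choice, and a 32-bit unsigned is assumed for the C cast.

```python
BITS = 8  # arbitrary width chosen for the illustration


def as_unsigned(x: int, bits: int = BITS) -> int:
    """Keep the low `bits` bits and read them as an unsigned number."""
    return x & ((1 << bits) - 1)


def as_signed(x: int, bits: int = BITS) -> int:
    """Keep the low `bits` bits and read them as two's complement."""
    u = as_unsigned(x, bits)
    return u - (1 << bits) if u >= 1 << (bits - 1) else u


# Values that come back unchanged under BOTH interpretations.
agree = [
    x
    for x in range(-(1 << BITS), 1 << BITS)
    if as_unsigned(x) == x and as_signed(x) == x
]

# The C cast (unsigned)-17, assuming a 32-bit unsigned.
cast = as_unsigned(-17, 32)

print(agree[0], agree[-1])  # 0 127
print(cast)                 # 4294967279
```

For 8 bits, the values that survive either way are exactly 0 through 127, which is the region 0<=x<2^(bits-1).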

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

8:50 p.m.

> I don't think that an error needs to be thrown, as it will just overflow and wrap around. My concern is that using %c may or may not throw a sign bit into the encoded value. That requires the recipient of that value to know whether the value was negative ahead of time.

No, but he needs to know whether to interpret the MSB as a sign bit or not (or, expressed differently, whether to sign-extend or zero-extend the value after decoding it). But he also needs to know how many octets to decode, so how is this any different? %1c can be used to encode either the range 0..255 or the range -128..127, and you need to decide which one to use when you specify the binary format you are encoding into, just like you need to decide whether to use 1, 2 or 17 octets to encode the number.

If you use a number outside of the selected range, sscanf will truncate it. If you choose the range 0..255 (%1c), 270 will be truncated to 14 and -3 will be truncated to 253. If you choose the range -128..127 (%1c for encode, and %+1c for decode), 130 will be truncated to -126 and -200 to 56. sprintf will not check that the value is in the chosen range; that's up to you.

sprintf %Nc will never "throw a sign bit in"; it will simply encode the N*8 least significant bits of the two's complement binary representation of the integer. This is true whether or not the number is negative.
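The four truncation examples can be verified with a short Python sketch. This is an analogue of the described behaviour, not Pike itself; encode and decode are hypothetical helpers standing in for sprintf's %Nc and sscanf's %Nc / %+Nc.

```python
def encode(n: int, octets: int) -> bytes:
    """Keep the octets*8 least significant bits of n, big-endian."""
    mask = (1 << (8 * octets)) - 1
    return (n & mask).to_bytes(octets, "big")


def decode(data: bytes, signed: bool) -> int:
    """Zero-extend (signed=False) or sign-extend (signed=True) the octets."""
    return int.from_bytes(data, "big", signed=signed)


# Range 0..255: encode with one octet, decode unsigned.
unsigned_view = [decode(encode(n, 1), signed=False) for n in (270, -3)]

# Range -128..127: same encoder, decode signed.
signed_view = [decode(encode(n, 1), signed=True) for n in (130, -200)]

print(unsigned_view)  # [14, 253]
print(signed_view)    # [-126, 56]
```

The encoder is identical in both cases; only the decoder decides whether the top bit is a sign bit.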

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

8:50 p.m.

> If you use a number outside of the selected range, sscanf will truncate it.

This was supposed to read "sprintf will truncate it". sscanf will always faithfully return the number which is encoded with the specified encoding.

Mirar ＠ Pike developers forum

6:35 p.m.

I didn't even know %c had a '+' modifier. What's that supposed to do? It's normally used to add a '+' character to formatted positive numbers; I'm not sure what it should accomplish for binary numbers.

The sscanf %-4F and %-8F have been non-working as far as I can remember. The '-' does nothing for %F.

Bill Welliver

7:17 p.m.

%c can take a + modifier in sscanf but not sprintf. There doesn't appear to be a way to force unsigned encoding using sprintf, it just automatically happens when there's a negative number presented. My concern about this is that the operation isn't symmetric when used with sscanf.

Am I wrong in saying that the fact that %-F doesn't work is a bug?

Bill

Henrik Grubbström (Lysator) ＠ Pike (-) developers forum

8:25 p.m.

> %c can take a + modifier in sscanf but not sprintf. There doesn't appear to be a way to force unsigned encoding using sprintf, it just automatically happens when there's a negative number presented. My concern about this is that the operation isn't symmetric when used with sscanf.

I agree that the behaviour for the + modifier in the sprintf case is a bit confusing.

Note that there's no such thing as unsigned encoding when it comes to fixed-size binary numbers.

> Am I wrong in saying that the fact that %-F doesn't work is a bug?

No, I agree that %-F ought to work.

Bill Welliver

8:53 p.m.

> I agree that the behaviour for the + modifier in the sprintf case is a bit confusing.
>
> Note that there's no such thing as unsigned encoding when it comes to fixed-size binary numbers.

But there's certainly a reasonably well accepted way to represent, say, a 16 bit unsigned integer, right?

The "problem" with sprintf wasn't my original problem, I only noticed it when I was trying to troubleshoot my decoding difficulties with sscanf (which I'm using to decode a struct delivered to me from a little-endian microcontroller). I thought that I wasn't able to decode little-endian unsigned integers but I can't verify what was going wrong now, as I've switched to uints. For that, I'll have to change the firmware when I get some more time.

> > Am I wrong in saying that the fact that %-F doesn't work is a bug?
>
> No, I agree that %-F ought to work.

Ok, at least I'm not completely crazy on one of the points.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

9 p.m.

> But there's certainly a reasonably well accepted way to represent, say, a 16 bit unsigned integer, right?

Certainly.

16 bit unsigned integer representation:

Use %2c to encode and %2c to decode. If you try to encode a value that cannot be represented as a 16 bit unsigned integer, you'll get a truncated value.

16 bit signed integer representation:

Use %2c to encode and %+2c to decode. If you try to encode a value that cannot be represented as a 16 bit signed integer, you'll get a truncated value.
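As an illustration in Python (not Pike), the struct module makes the same point: the 16-bit signed and unsigned formats produce identical octets, and only the decoder picks the interpretation. One difference to keep in mind is that Python's struct.pack raises an error for out-of-range values, whereas sprintf is described here as truncating.

```python
import struct

# -2 as signed 16-bit and 65534 as unsigned 16-bit are the same two octets.
same_bytes = struct.pack(">h", -2) == struct.pack(">H", 65534)

wire = struct.pack(">H", 65534)

as_u16 = struct.unpack(">H", wire)[0]  # analogue of decoding with %2c
as_s16 = struct.unpack(">h", wire)[0]  # analogue of decoding with %+2c

print(same_bytes, wire)  # True b'\xff\xfe'
print(as_u16, as_s16)    # 65534 -2
```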

Mirar ＠ Pike developers forum

9 p.m.

Can't we just have the + in "%+2c" do nothing? Then you can have it symmetric, which is nice for readability.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

9 p.m.

In sprintf, you mean? Yes, that would at least make more sense than the current behaviour, which is to _always_ add a +, regardless of whether the number was positive or not... :-/

pike-devel@lists.lysator.liu.se
