I’ve noticed an odd problem a few years ago with using parameter bindings with SQLite and never really took the time to figure out the cause until now. I think it’ a pretty major problem that can be easily fixed, though would cause problems for anyone relying on the (imo) currently broken behavior.
Basically, the problem is that when using parameter bindings with strings, the pike glue determines the type of the data based on the width of the string: if it’s a wide string, the parameter is bound as a text, but if it’s 8 bit, it’s bound as a blob. The problem is that text and blob values aren’t evaluated the same, so even though a text value and blob value may be identical byte-wise, they are not equal when doing comparisons and so queries that would seem to match actually don’t.
This is compounded by the fact that SQLite has no restrictions on what may be stored in a field (type definitions on a table are merely default suggestions) and SQLite will happily store whatever you want (such as a blob value in a text field).
For example, imagine a table m with an integer field id and a char field value:
s->query(“INSERT INTO m VALUES(1, 'value a’)); s->query(“INSERT INTO m VALUES(2, :value)”, ([“:value”, “value a”])); s->query(“SELECT * FROM m where value=‘value a’”);
The third query will return 1 record instead of the expected 2 because the field value in the second row is stored as a blob and thus does not match a text query. In order to retrieve both records, you must use LIKE or know to cast the value to text using “AS TEXT” in the query, which is obviously a lot less ideal in most cases.
If all of your data is stored using the pike glue, and you’re always using parameters, it’s possible never to see the problem, but if not, this causes all kinds of problems because of the need for constantly casting values and possibly using blob literals, which are not human readable when you’re really trying to query for a non-wide string.
My suggestion is to change the binding type for 8 bit string parameters to text in order to match the rest of the string handling. It would be good to have a datatype that caused a parameter to be bound as a blob, either a native byte array or some object that wraps an 8 bit string (Sql.BinaryString?) in order to cause the binding to act accordingly.
Of course, this causes a compatibility problem for anyone who was using pike binding with bindings previously (is there anyone who us doing this and hasn’t run into this problem?), as any 8-bit strings that weren’t binary data would be stored as blobs, even if they were in a text typed field. These records would need to be re-stored with the proper text type, which could be done with a query to update the table. My sense is that this is the proper thing to do, as blob fields should be reserved for data that’s actually binary data (as opposed to text).
Thoughts?
H. William Welliver III wrote:
My suggestion is to change the binding type for 8 bit string parameters to text in order to match the rest of the string handling. It would be good to have a datatype that caused a parameter to be bound as a blob, either a native byte array or some object that wraps an 8 bit string (Sql.BinaryString?) in order to cause the binding to act accordingly.
The "datatype" that comes closest to being a suitable native byte array these days is Stdio.Buffer.
Of course, this causes a compatibility problem for anyone who was using pike binding with bindings previously (is there anyone who us
I presume you mean SQLite with typed queries here?
doing this and hasn???t run into this problem?), as any 8-bit strings that weren???t binary data would be stored as blobs, even if they were in a text typed field. These records would need to be re-stored with the proper text type, which could be done with a query to update the table. My sense is that this is the proper thing to do, as blob fields should be reserved for data that???s actually binary data (as opposed to text).
Two things: - The contortions you describe to get the queries right with the current behaviour would indicate that anyone who would have tried to do the same would likely have ended up complaining here while trying to get it right. - In general I've noticed that very few, if any, people are actually using *typed* queries from Sql.Sql.
So that would suggest that your change would be beneficial, and would (for safety) require a bit of compat code to get it right for anyone unfortunate enough to rely on older behaviour.
> The "datatype" that comes closest to being a suitable native byte array > these days is Stdio.Buffer.
Is this true? If the binding were to interpret any Buffer object as binary data, would that cause surprise? My reading suggests that it is a grey area and I'd rather solve the problem unambiguously if it's possible.
> I presume you mean SQLite with typed queries here?
I'm not sure what a typed query is; this has to do with using binding parameters and the way the pike glue to SQLite binds a given parameter.
> doing this and hasn???t run into this problem?), as any 8-bit strings > that weren???t binary data would be stored as blobs, even if they were > in a text typed field. These records would need to be re-stored with > the proper text type, which could be done with a query to update the > table. My sense is that this is the proper thing to do, as blob fields > should be reserved for data that???s actually binary data (as opposed > to text).
> - The contortions you describe to get the queries right with the current > behaviour would indicate that anyone who would have tried to do the same > would likely have ended up complaining here while trying to get it right.
I have a hard time imagining that anyone has tried it. Not only does it cause all kinds of complications within pike, it also means that queries executed directly against the sqlite command would fail in similar ways, as there could be mixed types of data within a given column and writing some string as a blob literal means converting it to hex data first, which isn't something many people can do on the fly.
> - In general I've noticed that very few, if any, people are actually using > *typed* queries from Sql.Sql.
> So that would suggest that your change would be beneficial, and would > (for safety) require a bit of compat code to get it right for anyone > unfortunate enough to rely on older behaviour.
I would argue the opposite: the existing behavior is so broken that trying to come up with compat for it would just make existing code more brittle because the effect of the current code is to effectively corrupt the data in a column.
Example:
a column of type text may have individual elements that are text or blob (or number even) values depending on whether the data was inserted using a text literal 'some text', a blob literal X'14EC24', a binding parameter from pike that happened to be a wide string (stored as text) or a narrow string (stored as binary). So any query run against such a column would likely return incorrect results unless the values and the query values were always cast to one or the other.
I can't think of a scenario where this is desirable. Even if you ignore applications written for ASCII code only, I imagine there are lots of narrow values in languages that also have wide characters.
My proposal would be to fix this so that all strings are stored as text and that all values of some other object type (Stdio.Buffer or preferably some Sql datatype wrapper for bytestrings) are stored as blobs. A reasonable workaround for anyone crazy enough to use the existing broken functionality can just wrap their narrow strings with the object mentioned above and have that value bound as a blob... the perfect use of a release note.
I'd also argue that this ought to be fixed in 8.0 as well... the behavior is so bad that I honestly can't imagine anyone has used it successfully.
Bill
Just following up on this, after a little more research:
This has come up in the past, ref a conversation with the subject “SQLite and pike strings in bindings”. It seems that the SQLite binding is unique in this behavior and works as expected when you a) only use the pike module to access a database and b) always use bindings.
The binding emulation code which is used by mysql (at least) doesn’t work his way. Grubba pointed out that the Oracle binding code considers a string set within a multiset as binary data. I’m not sure how I feel about that… it’s probably lower overhead than something like Sql.Binary(“somedata”) but it’s certainly not intuitive (though it doesn’t seem like the binding behavior is really documented anywhere so I could be convinced that writing documentation might be an ok solution.
Does anyone have any objections to me removing the problematic code in the Sqlite module in 8.1? In 8.0?
I will let this simmer for a week or so before I do anything.
Bill
On May 24, 2017, at 4:09 AM, Stephen R. van den Berg srb@cuci.nl wrote:
H. William Welliver III wrote:
My suggestion is to change the binding type for 8 bit string parameters to text in order to match the rest of the string handling. It would be good to have a datatype that caused a parameter to be bound as a blob, either a native byte array or some object that wraps an 8 bit string (Sql.BinaryString?) in order to cause the binding to act accordingly.
The "datatype" that comes closest to being a suitable native byte array these days is Stdio.Buffer.
Of course, this causes a compatibility problem for anyone who was using pike binding with bindings previously (is there anyone who us
I presume you mean SQLite with typed queries here?
doing this and hasn???t run into this problem?), as any 8-bit strings that weren???t binary data would be stored as blobs, even if they were in a text typed field. These records would need to be re-stored with the proper text type, which could be done with a query to update the table. My sense is that this is the proper thing to do, as blob fields should be reserved for data that???s actually binary data (as opposed to text).
Two things:
- The contortions you describe to get the queries right with the current
behaviour would indicate that anyone who would have tried to do the same would likely have ended up complaining here while trying to get it right.
- In general I've noticed that very few, if any, people are actually using
*typed* queries from Sql.Sql.
So that would suggest that your change would be beneficial, and would (for safety) require a bit of compat code to get it right for anyone unfortunate enough to rely on older behaviour. -- Stephen.
pike-devel@lists.lysator.liu.se