I observed that the Mysql module doesn't seem to reset the connection charset after a reconnect. This occurs when using something like set_charset("unicode"), which triggers a "SET NAMES utf8". However, if the connection is lost and the sql object reconnects automatically, it does not send the SET NAMES again. The data then comes back in latin1, and the Sql module still tries to decode it in fetch_rows(). I am using a recent pike, 7.8.365.
Arne
Thanks for noticing that. Afaics it only happens if the mysql lib is too old to support mysql_set_character_set, which means older than 5.0.7. Does that sound correct?
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Thanks for noticing that. Afaics it only happens if the mysql lib is too old to support mysql_set_character_set, which means older than 5.0.7. Does that sound correct?
I have 5.0.70 installed right now. I will update to 5.0.84, rebuild pike and see what happens.
Hmm, ok. Could you please try this patch without upgrading instead? That way I can verify that it fixes the problem.
Index: src/modules/Mysql/mysql.c
===================================================================
RCS file: /pike/data/cvsroot/Pike/7.8/src/modules/Mysql/mysql.c,v
retrieving revision 1.119
diff -u -r1.119 mysql.c
--- src/modules/Mysql/mysql.c	5 Nov 2009 14:15:15 -0000	1.119
+++ src/modules/Mysql/mysql.c	14 Nov 2009 18:26:17 -0000
@@ -157,6 +157,9 @@
 #define MYSQL_DISALLOW()
 #endif /* _REENTRANT */
 
+#define PIKE_MYSQL_FLAG_STORE_RESULT 1
+#define PIKE_MYSQL_FLAG_TYPED_RESULT 2
+
 #define CHECK_8BIT_NONBINARY_STRING(FUNC, ARG) do {			\
     if (sp[ARG-1-args].type != T_STRING ||				\
 	sp[ARG-1-args].u.string->size_shift ||				\
@@ -357,6 +360,8 @@
 #endif /* HAVE_MYSQL_OPTIONS */
 }
 
+static void low_query(INT32 args, char *name, int flags);
+
 static void pike_mysql_reconnect (int reconnect)
 {
   MYSQL *mysql = PIKE_MYSQL->mysql;
@@ -488,6 +493,17 @@
       }
     }
   }
+
+#ifndef HAVE_MYSQL_SET_CHARACTER_SET
+  if (PIKE_MYSQL->conn_charset) {
+    push_constant_text ("SET NAMES '");
+    ref_push_string (PIKE_MYSQL->conn_charset);
+    push_constant_text ("'");
+    f_add (3);
+    low_query (1, "reconnect", PIKE_MYSQL_FLAG_STORE_RESULT);
+    pop_stack();
+  }
+#endif
 }
 
 /*
@@ -861,9 +877,6 @@
   pop_n_elems(args);
 }
 
-#define PIKE_MYSQL_FLAG_STORE_RESULT 1
-#define PIKE_MYSQL_FLAG_TYPED_RESULT 2
-
 static void low_query(INT32 args, char *name, int flags)
 {
   MYSQL *mysql = PIKE_MYSQL->mysql;
Will do.
The patch doesn't change anything for me right now. The "old" mysql I was running was 5.0.70, not 5.0.7 ;-). So I do have mysql_set_character_set and pike is using it. The library sends a SET NAMES utf8 when using set_charset("unicode"). After the reconnect it sends a SET character_set_client=latin1.
I think the reason for this is that set_charset() does not change PIKE_MYSQL->conn_charset accordingly. I guess it's different when I create the sql object with the appropriate options without changing the charset later on.
arne
Hm... confusion. PIKE_MYSQL->conn_charset is used only in case mysql_set_character_set is not available. So maybe add the charset to the mysql options instead, as if it had been given to create? Just speculating here, I don't really know the code. Tell me if there is anything I can do to help you debug.
arne
Aha, I somehow thought the setting would survive the reconnect if mysql_set_character_set is used. Of course it doesn't. Then the approach is a bit different. Please try this patch instead.
Martin Stjernholm wrote:
Aha, I somehow thought the setting would survive the reconnect if mysql_set_character_set is used. Of course it doesn't. Then the approach is a bit different. Please try this patch instead.
Great, seems to work. For me the charset is going into the connect options (i.e. RECONNECT_CHARSET_IS_SET), I guess, so I am not seeing any SET NAMES magic after reconnect.
There seems to be another problem with mysql reconnects and charsets. I was not able to track it down completely, but it happens in the following scenario. We are running the Sql.Sql object connected to mysql with charset "unicode". After a reconnect, when an update is sent that fits into latin1 but contains some characters around \345, the server seems to discard all data after the first of these chars. To me it seems like the server is actually expecting utf8 and encounters a malformed sequence, even though pike is sending 'SET character_set_client = latin1'. This can be checked by using something like "\345\202\254", which is ok for _can_send_as_latin1 and is a valid utf8 sequence at the same time (it becomes \u50ac, which looks chinese).
We are currently fixing this by disabling the fallback to latin1 in lib/modules/Sql.pmod/mysql.pike:627.
Any hints appreciated.
best
arne
There seems to be another problem with mysql reconnects and charsets.
I was wondering a bit about these reconnects: when do they occur in practice? Is it when the mysql server restarts?
/.../ To me it seems like the server is actually expecting utf8 and encounters a malformed sequence. Even though pike is sending 'SET character_set_client = latin1'. /.../
And that bug only starts to occur after a reconnect?
To me it certainly looks like a mysql bug. What mysql versions have you seen it with? Do the mysql lists and bug database say anything about it?
pike-devel@lists.lysator.liu.se