They are not quite as large as you make them seem: String.width() returns the number of bits per character, so the size in bytes is the length times String.width()/8, not the length times String.width(). :-)
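To make the arithmetic concrete, here is a rough sketch in Python (not Pike) that models the calculation. The helpers `width_bits` and `payload_bytes` are names made up for this illustration; `width_bits` only imitates what String.width() reports (8, 16 or 32), on the assumption that a string is stored at the narrowest width that fits its largest character, and the result ignores any per-string header overhead.

```python
def width_bits(s: str) -> int:
    """Hypothetical stand-in for Pike's String.width(): bits per character."""
    top = max(map(ord, s), default=0)  # largest code point in the string
    if top < 0x100:
        return 8    # everything fits in latin1
    if top < 0x10000:
        return 16
    return 32


def payload_bytes(s: str) -> int:
    """Character data size in bytes: length * width / 8 (headers not counted)."""
    return len(s) * width_bits(s) // 8


# One character outside the 16-bit range makes the whole string 32 bits wide.
sample = "a" * 6000 + "\U0001F600"
print(width_bits(sample), payload_bytes(sample))
```

Multiplying the length by the width alone would give a figure eight times too large, since that result is in bits rather than bytes.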
There seems to be a single 32-bit string with about 6000 characters in it. Maybe it comes from the Unicode module?
I'm not sure what you mean by "getting them to not be wide strings"; that would imply removing all non-latin1 characters from the strings, which would of course alter the semantics...
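As a small illustration of why that changes the semantics, the following Python sketch (again only a model, not Pike code) drops every character above the latin1 range from a string. The function name `strip_non_latin1` and the sample text are invented for the example.

```python
def strip_non_latin1(s: str) -> str:
    """Keep only characters that fit in 8 bits (code points below 0x100)."""
    return "".join(ch for ch in s if ord(ch) < 0x100)


original = "price: 10 \u20ac"        # contains EURO SIGN, U+20AC
narrowed = strip_non_latin1(original)

# The narrowed string fits in 8 bits per character, but it is a different text.
print(repr(original), repr(narrowed), original == narrowed)
```

The narrowed string is smaller, but the currency sign is gone, so whatever consumes it sees different content.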