grok UTF8 directly
The input is not necessarily UTF8, and the output is definitely not. So your proposal is to make two conversions instead of one. Not necessarily a problem, but it seems a bit convoluted, especially since the results of parsing would need to be converted individually (i.e., each string literal, each symbol, etc.).
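
To make the comparison concrete, here is a rough Python sketch of the two pipelines. Everything in it is an assumption for illustration: the tokenizer regexes, the function names, and the use of Python's `str` as a stand-in for the internal (non-UTF8) representation are all hypothetical, not anything from the actual reader.

```python
import re

# Hypothetical tokenizers: a double-quoted string literal or a bare symbol.
# One works on UTF-8 bytes, the other on already-decoded text.
BYTE_TOKEN = re.compile(rb'"(?:[^"\\]|\\.)*"|[^\s()"]+')
TEXT_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_via_utf8(raw: bytes, input_encoding: str) -> list[str]:
    """Proposed route: grok UTF-8 directly, which costs two conversions."""
    # Conversion 1: external encoding -> UTF-8 (needed unless input is UTF-8).
    utf8 = raw.decode(input_encoding).encode("utf-8")
    # Parse on the UTF-8 bytes.
    tokens = BYTE_TOKEN.findall(utf8)
    # Conversion 2: every parse result (each string literal, each symbol)
    # is converted individually to the internal representation.
    return [tok.decode("utf-8") for tok in tokens]


def parse_single_conversion(raw: bytes, input_encoding: str) -> list[str]:
    """Alternative route: one conversion up front, then parse internal text."""
    text = raw.decode(input_encoding)
    return TEXT_TOKEN.findall(text)


if __name__ == "__main__":
    source = '(défaut "héllo wörld")'.encode("latin-1")
    print(parse_via_utf8(source, "latin-1"))
    print(parse_single_conversion(source, "latin-1"))
```

Both routes yield the same tokens; the difference is only where the conversions happen and how many of them there are.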