Good! I thought it would be higher, though. Maybe I'll Shark it myself to see where the bottleneck is.
Note that my suggestion does not save memory: there is already a pre-flight phase in which the code estimates the output string size and exits early for 7-bit content. However, the size calculation reads every byte of the string data and probably evicts other data from the L1/L2 caches, so the flag may still be worthwhile.
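To make the trade-off concrete, here is a rough sketch of what I mean. It is not the actual code: I'm assuming a Latin-1 to UTF-8 conversion purely for illustration, and `preflight`, `latin1ToUtf8`, and the `knownSevenBit` flag are hypothetical names. The point is that the pre-flight scan already avoids the allocation for 7-bit input, while a cached flag would also avoid touching the bytes at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result of the pre-flight scan.
struct PreflightResult {
    std::size_t outputSize; // exact UTF-8 size for Latin-1 input
    bool isSevenBit;        // true if no byte has its high bit set
};

// Pre-flight pass. Without the flag it reads every input byte, which is
// what may push other data out of the L1/L2 caches. With the
// (hypothetical) knownSevenBit flag the scan is skipped entirely.
PreflightResult preflight(const std::uint8_t* data, std::size_t length,
                          bool knownSevenBit = false)
{
    if (knownSevenBit)
        return { length, true };

    std::size_t highBytes = 0;
    for (std::size_t i = 0; i < length; ++i)
        highBytes += data[i] >> 7; // 1 for bytes >= 0x80, else 0

    // Each high byte expands to two bytes in UTF-8.
    return { length + highBytes, highBytes == 0 };
}

// Illustrative conversion, assuming the input is Latin-1.
std::string latin1ToUtf8(const std::string& in, bool knownSevenBit = false)
{
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(in.data());
    PreflightResult r = preflight(bytes, in.size(), knownSevenBit);

    // Early exit: 7-bit content is already valid UTF-8, so no new
    // buffer is needed. This is why the flag saves no memory.
    if (r.isSevenBit)
        return in;

    std::string out;
    out.reserve(r.outputSize);
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint8_t c = bytes[i];
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    assert(out.size() == r.outputSize); // the estimate should be exact
    return out;
}
```

With the flag set, `latin1ToUtf8(s, true)` returns without reading the string data, so the only saving over the existing early exit is the scan itself and whatever cache pollution it causes.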