I was going to say that reverse() probably did 32-bit or 64-bit memory operations, but that wasn't true either. Anyway, there are plenty of vectorization possibilities in reverse(), search() and string hashing in general.
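To make the "64-bit memory operations" idea concrete, here is a rough sketch of what a word-at-a-time reverse could look like. This is not the reverse() under discussion: the name `reverse_bytes_wide` is made up for illustration, and I'm assuming GCC or Clang for `__builtin_bswap64`. It moves 8 bytes from each end per iteration and falls back to byte swaps for the middle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the original reverse()): reverses buf in place,
 * handling 8 bytes from each end per iteration.
 * Assumes GCC/Clang for __builtin_bswap64. */
static void reverse_bytes_wide(unsigned char *buf, size_t len)
{
    size_t lo = 0, hi = len;   /* unreversed region is [lo, hi) */

    /* While two non-overlapping 8-byte chunks remain, swap them. */
    while (hi - lo >= 16) {
        uint64_t a, b;
        /* memcpy avoids alignment and aliasing problems; compilers
         * normally turn each call into a single 64-bit load or store. */
        memcpy(&a, buf + lo, 8);
        memcpy(&b, buf + hi - 8, 8);
        /* Byte-swapping a word reverses its 8 bytes in memory. */
        a = __builtin_bswap64(a);
        b = __builtin_bswap64(b);
        memcpy(buf + lo, &b, 8);
        memcpy(buf + hi - 8, &a, 8);
        lo += 8;
        hi -= 8;
    }

    /* Fewer than 16 bytes left in the middle: plain byte swaps. */
    while (hi - lo >= 2) {
        unsigned char t = buf[lo];
        buf[lo] = buf[hi - 1];
        buf[hi - 1] = t;
        lo++;
        hi--;
    }
}
```

A real SIMD version would do the same thing with 16- or 32-byte registers and a byte shuffle, but the structure stays the same. Whether it beats the simple loop depends on the compiler and the input sizes, so measure before believing it.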
By the way, what CPU did you test on? On older hardware a program could benefit immensely from cache hints (explicit prefetch instructions) inserted in loops like this, but I believe modern CPUs detect streaming access patterns and prefetch automatically nowadays.
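For reference, this is roughly what I mean by a cache hint in a streaming loop. It's only a sketch under my own assumptions: the function name `hash_with_prefetch` and the 256-byte lookahead are made up and untuned, the hash is plain 32-bit FNV-1a rather than whatever the original code uses, and `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lookahead distance in bytes; untuned, for illustration only. */
#define PREFETCH_AHEAD 256

/* Hypothetical example: 32-bit FNV-1a over a buffer, with an explicit
 * prefetch hint issued once per 64 bytes of input.
 * Assumes GCC/Clang for __builtin_prefetch. */
static uint32_t hash_with_prefetch(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;              /* FNV-1a offset basis */

    for (size_t i = 0; i < len; i++) {
        /* Hint that data a few cache lines ahead will be read soon.
         * Arguments: address, 0 = read access, 0 = low temporal locality. */
        if ((i & 63) == 0 && i + PREFETCH_AHEAD < len)
            __builtin_prefetch(data + i + PREFETCH_AHEAD, 0, 0);

        h ^= data[i];
        h *= 16777619u;                    /* FNV-1a 32-bit prime */
    }
    return h;
}
```

The hint does not change the result, only (possibly) the timing. On a CPU whose hardware prefetcher already recognizes the sequential pattern, I would expect it to make little or no difference, and the extra branch could even cost a little.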