A good hash function is of course desireable, and a prime size is surely unnecessary then. But it's nice if the implementation behaves reasonably even for lousy hash functions, especially if the table is used for the ordinary mappings where it can be user supplied.
Here's a trick someone mentioned in the comments to calculate the modulo a bit quicker:
The thing is ... with a hash table, the table stays the same size for quite a whle, so what you're taking the modulus with respect to stays the same for quite a while, so you can computer an integer reciprocal of the size when you resize the hash table, and then use the same reciprocal for a long time. So for a hash code of H and a table size of N (with reciprocal R = 2^32/N) you're looking at:
bucket = H - ((R*H)>>32)*N bucket -= N if bucket >= N
OK, this is about ten times the cost of a simple AND, assuming you can grab just the upper half of a multiply, and cheap conditional execution (both of which I have on ARM). /.../
But I'm not sure. It could be a non-issue in practice.