On 6/04/20 12:57 pm, Stephen R. van den Berg wrote:
> On Sun, Apr 5, 2020 at 7:57 PM Niels Möller wrote:
>> I'm not entirely sure halving the table size is a good tradeoff. If we want to do it, that should be a separate change.
> You mean because it will cost an extra clock cycle on decode per character? That will only be relevant for very large sequences of base64 characters.

How are you defining "large" here?

Base64 data inputs can be any size. Megabytes are not uncommon in network messaging.

> And when the sequences are that large, the speed to decode becomes RAM-access bound; in which case there will be plenty of spare CPU cycles. This essentially makes the extra "if" cost-free.

That assumes the CPU is waiting for data to load, which is typically not the case on modern CPUs. Prefetching is normally applied when processing sequential data such as long strings, so for this type of workload we should expect the L1/L2 cache lines to be loaded by the time they are needed. Delays will most likely come from interruptions to the decode thread or, less likely, from code or table size causing evictions from the L1/L2 caches.
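For concreteness, the tradeoff being debated might be sketched roughly like this (hypothetical tables, not the library's actual code): a full 256-entry decode table is indexed directly by the input byte, while a halved 128-entry table needs one extra range check per character to reject bytes above 0x7F.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the table-size tradeoff under
   discussion; names and layout are assumptions for the sketch. */

static int8_t full_table[256];   /* indexed directly by the input byte */
static int8_t half_table[128];   /* covers only 0..127; needs a check  */

static void init_tables(void)
{
    const char *alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    memset(full_table, -1, sizeof full_table);  /* -1 = invalid byte */
    memset(half_table, -1, sizeof half_table);
    for (int i = 0; i < 64; i++) {
        full_table[(uint8_t)alphabet[i]] = (int8_t)i;
        half_table[(uint8_t)alphabet[i]] = (int8_t)i;
    }
}

/* Full table: one lookup, twice the memory. */
static int decode_full(uint8_t c)
{
    return full_table[c];
}

/* Half table: half the memory, plus the extra "if" per character. */
static int decode_half(uint8_t c)
{
    if (c >= 128)
        return -1;   /* the extra branch debated above */
    return half_table[c];
}
```

Whether that branch costs anything in practice is exactly the point of contention: it is essentially free only if the decode loop is genuinely stalled on memory rather than fed by the prefetcher.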
AYJ