The testsuite turned up one problem with this optimization: when the duplicate-tracking mapping is allocated, it isn't propagated upward, so the same instance occurring in another part of the tree goes undetected.
I can see two ways to fix it: either add a double indirection for the mapping pointer, or go back to allocating the mapping at the top. The latter isn't as silly as it first appears, since only a single fixed-size mapping struct is allocated as long as no inserts are done during the process. It might beat the overhead of the extra indirection and the additional checks that the first approach requires.
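
To make the double-indirection option concrete, here is a minimal sketch in C. It is not the actual code: struct seen_set, struct node, the refs field and all function names are hypothetical stand-ins, and a plain pointer array takes the place of the real mapping. The point is only that the lazily allocated set is stored through a pointer-to-pointer, so an allocation made deep in one subtree is visible when a sibling subtree is walked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the duplicate-tracking mapping. */
struct seen_set {
    const void **items;
    size_t count, cap;
};

/* Hypothetical tree node; refs > 1 means it may be reachable twice. */
struct node {
    struct node *left, *right;
    int refs;
};

/* Returns 1 if p was already recorded, otherwise records it and returns 0.
 * The set is allocated on first use and stored through setp, so every
 * caller further up the stack sees the same instance afterwards. */
static int seen_check_and_add(struct seen_set **setp, const void *p)
{
    struct seen_set *s;
    size_t i;

    assert(setp != NULL);
    s = *setp;
    if (!s) {
        s = calloc(1, sizeof *s);
        if (!s) abort();
        *setp = s;              /* the propagation step */
    }
    for (i = 0; i < s->count; i++)
        if (s->items[i] == p) return 1;
    if (s->count == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 8;
        const void **grown = realloc(s->items, ncap * sizeof *grown);
        if (!grown) abort();
        s->items = grown;
        s->cap = ncap;
    }
    s->items[s->count++] = p;
    return 0;
}

/* Counts how many times an already visited node is reached again.
 * Only nodes with refs > 1 are tracked, so the set may first be
 * allocated deep inside one subtree. */
static int count_duplicates(const struct node *n, struct seen_set **setp)
{
    if (!n) return 0;
    if (n->refs > 1 && seen_check_and_add(setp, n)) return 1;
    return count_duplicates(n->left, setp) + count_duplicates(n->right, setp);
}

static void seen_free(struct seen_set *s)
{
    if (s) {
        free(s->items);
        free(s);
    }
}
```

With a single struct seen_set * passed by value instead, the allocation made while walking the left subtree would be lost before the right subtree is visited, which is the failure described above. The other option, under the same assumptions, would be for the top-level caller to set up the set before the walk and pass a plain pointer down; that trades the NULL check and extra dereference for one up-front allocation.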