 
            In fact, why is there a lock in on_success and on_failure at all?
To make sure that callbacks are called exactly once per failure/success.
Can you explain that in more detail? The only issue I can see is this sequence: 1) the state check fails, 2) a different thread changes the state and consumes the array, 3) cb is added to the array and never executed. This can't happen, as there is no function call between the state variable check and the array append. Even if it could happen, a mutex in on_success/on_failure wouldn't help, as it is access to the state variable that needs to be guarded.
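
To make the three-step sequence concrete, here is a minimal sketch in Python. It is not the code under discussion: the `Promise` class, `fulfill`, and the `_state`, `_value`, and `_callbacks` fields are hypothetical names picked for illustration, and only `on_success` and `cb` come from the thread. The sketch assumes one lock guards both the state variable and the callback array, so the check-then-append in `on_success` and the state change plus drain in `fulfill` cannot interleave.

```python
import threading


class Promise:
    """Hypothetical promise; all names except on_success/cb are assumptions."""

    PENDING = "pending"
    FULFILLED = "fulfilled"

    def __init__(self):
        self._lock = threading.Lock()
        self._state = self.PENDING
        self._value = None
        self._callbacks = []

    def on_success(self, cb):
        with self._lock:
            # State check and append happen under the same lock that
            # fulfill() takes, so the array cannot be consumed in between.
            if self._state == self.PENDING:
                self._callbacks.append(cb)
                return
            value = self._value
        # Already fulfilled: run cb now, outside the lock.
        cb(value)

    def fulfill(self, value):
        with self._lock:
            if self._state != self.PENDING:
                return  # a second fulfill is ignored
            self._state = self.FULFILLED
            self._value = value
            # Take ownership of the pending callbacks while holding the lock.
            callbacks, self._callbacks = self._callbacks, []
        # Run the callbacks outside the lock; each one runs exactly once.
        for cb in callbacks:
            cb(value)
```

In this sketch the lock in `on_success` is only useful because `fulfill` takes the same lock around the state change. A lock held in `on_success` alone would not close the window.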