Keep in mind that the glue code can also be buggy. Correctness of GMP's C API does not magically imply correctness of the corresponding Pike functionality.
To me, it makes a lot of sense for the testsuite to check the returned data as strictly as is practical (say, less than 20 lines of code and less than 100 ms execution time on a reasonable development machine).
Whenever a problem is detected, it should be fairly easy to determine whether the bug is in the testsuite, in the glue code, or in GMP itself.
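
As a rough illustration of what a strict but cheap check on returned data could look like, here is a minimal sketch in Python (not Pike, and not taken from the actual testsuite). The helper name check_divmod is hypothetical, and the floor-division convention for the remainder is an assumption. Rather than comparing against a single hard-coded value, it validates a claimed quotient and remainder against the defining identity, which tends to make it easier to see what kind of result came back wrong.

```python
def check_divmod(n, d, q, r):
    """Return a list of violated invariants for a claimed result (q, r) of n divided by d.

    Hypothetical helper; assumes floor-division semantics, where the
    remainder has the same sign as the divisor.
    """
    problems = []
    # A type error usually points at the glue layer rather than the arithmetic.
    if not isinstance(q, int) or not isinstance(r, int):
        problems.append("result is not a pair of integers")
        return problems
    # Defining identity of integer division.
    if q * d + r != n:
        problems.append("q*d + r != n")
    # Remainder range under the assumed floor convention.
    if d > 0 and not (0 <= r < d):
        problems.append("remainder out of range")
    if d < 0 and not (d < r <= 0):
        problems.append("remainder out of range")
    return problems


if __name__ == "__main__":
    n, d = 2**200 + 7, 2**100
    q, r = divmod(n, d)  # stand-in for the call under test
    print(check_divmod(n, d, q, r))
```

An empty list means every invariant held. A non-empty list names the invariant that failed, which is a first hint about where to look.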