Change arc4random_uniform() to calculate ``2**32 % upper_bound'' as
``-upper_bound % upper_bound''. Simplifies the code and makes it the
same on both ILP32 and LP64 architectures, and also slightly faster on
LP64 architectures by using a 32-bit remainder instead of a 64-bit
remainder.
Pointed out by Jorden Verwer on tech@
ok deraadt; no objections from djm or otto