I wonder if doing softmax in base 2 instead of base e could be a useful optimization on some fixed-point hardware. To approximate 2^x for a fixed-point x, prepend a 1 to the fractional part and shift by the integral part (left if it's positive, right if it's negative), which gives (1 + frac) * 2^int as a close-ish approximation of 2^x.
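
Here's a rough sketch of that idea in Python, using plain integers to stand in for fixed-point registers. The Q-format with 8 fractional bits and the names `FRAC_BITS`, `exp2_approx`, and `softmax2_fixed` are my own assumptions for illustration, not anything from real hardware. Also note that 2^x only matches e^x if the inputs are pre-scaled by log2(e); I'm assuming that scaling gets folded in somewhere upstream, or that a different effective temperature is acceptable.

```python
FRAC_BITS = 8            # assumed Q-format: value = raw / 2**FRAC_BITS
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def exp2_approx(x: int) -> int:
    """Approximate 2**x for fixed-point x as (1 + frac) * 2**int_part."""
    int_part = x >> FRAC_BITS      # floor of x (arithmetic shift)
    frac = x & (ONE - 1)           # fractional bits, always in [0, ONE)
    mantissa = ONE | frac          # "prepend a 1": 1.frac in fixed point
    if int_part >= 0:
        return mantissa << int_part
    return mantissa >> -int_part   # negative exponent: shift right instead


def softmax2_fixed(logits: list[int]) -> list[int]:
    """Base-2 softmax over fixed-point logits; outputs are fixed-point too."""
    m = max(logits)
    # After subtracting the max every exponent is <= 0, so only right
    # shifts occur and each weight fits in FRAC_BITS + 1 bits.
    weights = [exp2_approx(x - m) for x in logits]
    total = sum(weights)
    # Hypothetical normalization step: one integer division per output.
    return [(w << FRAC_BITS) // total for w in weights]


if __name__ == "__main__":
    # Logits 1.0 and 0.0 in fixed point; exact base-2 softmax is 2/3 and 1/3.
    probs = softmax2_fixed([ONE, 0])
    print([p / ONE for p in probs])
```

The approximation is exact whenever the fractional part is zero and overestimates by at most about 6% in between (worst case near frac = 0.44), since 1 + f is just a straight line through the endpoints of 2^f on [0, 1). Whether that error matters after normalization probably depends on the model.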