NVIDIA just dropped a banger paper on how they compressed a model from 16-bit to 4-bit and were able to maintain 99.4% accuracy, which is basically lossless. This is a must read. Link below.