the "experts" in blockchain BFT consensus will tell you that testing BFT algorithms is very challenging. they are wrong. you just need buggy software that acts byzantine all on its own, say by equivocating: sending both a yes and a no vote for the same round.
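for anyone who hasn't met the term: equivocation means one validator casting two conflicting votes for the same height and round. below is a minimal sketch of how an honest node might catch it. it assumes a made-up vote record keyed on (validator, height, round). `Vote`, `EquivocationDetector` and `buggy_node_votes` are hypothetical names for illustration, not from any real consensus library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    validator: str
    height: int
    round: int
    value: str  # "yes" or "no" (hypothetical encoding)


class EquivocationDetector:
    """remembers the first vote each validator cast per (height, round)."""

    def __init__(self):
        self.seen = {}

    def observe(self, vote):
        """return (first, conflicting) if this vote contradicts an earlier one, else None."""
        key = (vote.validator, vote.height, vote.round)
        first = self.seen.setdefault(key, vote)
        if first.value != vote.value:
            return (first, vote)  # two different votes for the same slot: evidence
        return None  # first vote, or a harmless re-broadcast of the same vote


def buggy_node_votes(validator, height, round_):
    # hypothetical bug: the node emits both a yes and a no for the same round
    return [
        Vote(validator, height, round_, "yes"),
        Vote(validator, height, round_, "no"),
    ]


detector = EquivocationDetector()
evidence = [e for v in buggy_node_votes("node-3", 10, 0) if (e := detector.observe(v))]
```

note that an identical vote seen twice is not flagged; only a conflicting value for the same (validator, height, round) counts.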