Within the broad topic of AI alignment lie a million smaller, but still consequential, alignment choices. This paper looks at the willingness of AI models to engage in scientific misconduct (p-hacking). The most recent models resist instructions to p-hack, but the guardrails can be breached.
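For readers unfamiliar with the term: p-hacking means rerunning an analysis under many arbitrary specifications and reporting only the one that clears p < 0.05. A minimal illustrative sketch (hypothetical, not drawn from the paper's own experiments):

```python
# Sketch of p-hacking: test many post-hoc subgroups of pure-noise
# data and report only the smallest p-value. Purely illustrative;
# none of these choices come from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
treatment = rng.integers(0, 2, n)    # random "treatment" assignment
outcome = rng.normal(size=n)         # outcome with NO true effect
covariate = rng.normal(size=n)       # arbitrary covariate to slice on

p_values = []
# Try many specifications: subgroup at different covariate cutoffs
for cutoff in np.linspace(-1.5, 1.5, 30):
    mask = covariate > cutoff        # arbitrary subgroup
    _, p = stats.ttest_ind(outcome[mask & (treatment == 1)],
                           outcome[mask & (treatment == 0)])
    p_values.append(p)

print(f"Smallest p across {len(p_values)} specifications: "
      f"{np.nanmin(p_values):.3f}")
# With ~30 looks at noise, a "significant" p < 0.05 is likely.
# Reporting only that one specification is the misconduct at issue.
```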
"The models we test behave as competent, if conservative, analysts: they converge on textbook-default specifications and, when pressured for significance, identify the request as misconduct and refuse. Yet these protections are not absolute." Paper: