The Research on LLM Self-Correction
If you’re building with LLMs today, you’ve likely been sold a bill of goods about “reflection.” The narrative is seductive: just have the model check its own work, and watch quality magically improve. It’s the software equivalent of telling a student to “review your exam before turning it in.” The reality, backed by a mounting pile of peer-reviewed evidence, is far uglier. In most production scenarios, adding a self-reflection loop is the most expensive way to achieve precisely nothing, or worse, to degrade your output.

The seminal paper that shattered the illusion is Huang et al.’s 2023 work, “Large Language Models Cannot Self-Correct Reasoning Yet.” Their finding was blunt: without external feedback, asking GPT-4 to review and correct its own answers on math and reasoning tasks consistently decreased accuracy. The model changed correct answers to wrong ones more often than it fixed errors. This isn’t an edge case; it’s a fundamental limitation of an autoregressive model critiquing its own autoregressive output with the same data, same biases, and zero new information.
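To make concrete what the paper evaluates, here is a minimal sketch of that intrinsic self-correction loop: the model answers, then critiques and revises its own answer with no external feedback or new information. The `call_llm` helper and `answer_with_self_reflection` function are hypothetical names, assuming a generic chat-completion client underneath; this shows the pattern the paper found harmful, not a recommended design.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around your chat-completion client; replace with a real call."""
    raise NotImplementedError("Plug in your LLM client here.")


def answer_with_self_reflection(question: str, rounds: int = 1) -> str:
    # Step 1: get an initial answer.
    answer = call_llm(f"Answer the following question.\n\nQuestion: {question}")

    # Step 2: ask the same model to review and revise its own answer.
    # The critique sees only the original question and the model's own output:
    # same model, same biases, zero new information.
    for _ in range(rounds):
        critique = call_llm(
            "Review the answer below for mistakes. "
            "If you find any, explain them; otherwise say it is correct.\n\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        answer = call_llm(
            "Rewrite the answer, fixing any issues raised in the review. "
            "If the review found no issues, return the answer unchanged.\n\n"
            f"Question: {question}\nAnswer: {answer}\nReview: {critique}"
        )
    return answer
```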
