The researchers started with the GSM8K's standardized set of 8,000 grade-school level mathematics word problems, a common benchmark for testing LLMs. Then they slightly altered the wording without ...
Some results have been hidden because they may be inaccessible to you