AI Can Solve Math Course Problems at 81% Automatic Accuracy

MIT researchers showed that a neural network pre-trained on text and fine-tuned on code can answer questions from mathematics courses, explain the solutions, and generate new questions at a human level. Using few-shot learning and OpenAI's Codex Transformer, they automatically synthesized programs and ran them to solve course problems with an automatic accuracy of 81%.
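The workflow described above (prompt with a few solved examples, get a program back, run it, check the answer) can be sketched roughly as follows. This is a minimal illustration and not the researchers' code: the example questions, the `answer` variable convention, and the helper names `build_prompt`, `run_program`, and `automatic_accuracy` are all assumptions made for this sketch, and the model call itself is left out.

```python
import math

# Hypothetical few-shot examples as (question, program) pairs.
# They are illustrative only and are not taken from the paper.
FEW_SHOT_EXAMPLES = [
    ("What is the sum of the integers from 1 to 10?",
     "answer = sum(range(1, 11))"),
    ("What is the area of a circle with radius 2?",
     "import math\nanswer = math.pi * 2 ** 2"),
]


def build_prompt(examples, new_question):
    """Concatenate solved examples, then the new question, into one prompt."""
    parts = [f"# Question: {q}\n{program}\n" for q, program in examples]
    parts.append(f"# Question: {new_question}\n")
    return "\n".join(parts)


def run_program(program):
    """Execute a generated program and return its `answer` variable.

    Assumption: programs store their result in a variable named `answer`.
    exec() is only acceptable here for trusted or sandboxed code.
    """
    namespace = {}
    exec(program, namespace)
    return namespace.get("answer")


def automatic_accuracy(programs, expected_answers):
    """Fraction of programs whose numeric output matches the expected answer."""
    correct = 0
    for program, expected in zip(programs, expected_answers):
        try:
            result = run_program(program)
            if math.isclose(result, expected, rel_tol=1e-6):
                correct += 1
        except Exception:
            # A crashing program or a non-numeric result counts as unsolved.
            continue
    return correct / len(programs)


# Stand-ins for model output: two working programs and one that crashes.
generated = ["answer = sum(range(1, 11))", "answer = 2 ** 10", "answer = 1 / 0"]
expected = [55, 1024, 3]
print(build_prompt(FEW_SHOT_EXAMPLES, "What is 2 to the power of 10?"))
print(automatic_accuracy(generated, expected))
```

In this sketch "automatic accuracy" simply means the share of questions solved with no human intervention; how the paper scores answers in detail is not covered here.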

Besides solving problems that are difficult for ordinary machine learning models, the research shows that the approach can be scaled up to other courses and similar curricula. It is also the first time that a single machine learning model has solved mathematical problems at this scale while also explaining solutions, plotting, and generating new problems.

Milestones in Higher Education

Before this paper, most researchers believed that neural networks could not solve advanced mathematics problems and could handle only simple ones. Although Transformer models surpass human performance in a wide variety of NLP tasks, they are still not good at solving mathematical problems, mainly because large models such as GPT-3 are pre-trained only on text data.

Researchers have since found that chain-of-thought prompting can guide language models to answer simple math questions, but higher-level math problems are not so easy.

When the target is advanced mathematics problems, a set of training data must be collected first. The MATH dataset, the current gold standard of advanced mathematics problems intended to test mathematical reasoning, contains questions in Prealgebra, Algebra, Counting and Probability, Intermediate Algebra, Number Theory, and Precalculus. The neural network is tested on these questions.

The next step is to use Codex to generate new questions for each course.

The researchers created a numbered list of human-written questions from each course, cut the list off after a random number of questions, and used the result to prompt Codex to generate the next question.

This process is repeated until enough new questions are created for each course.
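The question-generation loop just described can be sketched as below. This is a rough illustration under stated assumptions, not the paper's implementation: `complete` is a hypothetical callable standing in for the model, ending the prompt with the next list number is an assumed formatting choice, and the duplicate check and `max_attempts` guard are additions for this sketch.

```python
import random


def build_generation_prompt(questions, rng):
    """Numbered list of questions, cut off after a random number of items.

    The prompt ends with the next list number so that a text-completion
    model would continue with a new question (an assumed prompt format).
    """
    cutoff = rng.randint(1, len(questions))
    lines = [f"{i}. {q}" for i, q in enumerate(questions[:cutoff], start=1)]
    lines.append(f"{cutoff + 1}. ")
    return "\n".join(lines)


def generate_questions(questions, complete, target, seed=0, max_attempts=None):
    """Repeat the prompt-and-complete step until `target` new questions exist.

    `complete` is a hypothetical function mapping a prompt string to the
    model's continuation; it stands in for a real model call.
    """
    rng = random.Random(seed)
    if max_attempts is None:
        max_attempts = 10 * target  # guard against an endless loop
    generated = []
    for _ in range(max_attempts):
        if len(generated) >= target:
            break
        prompt = build_generation_prompt(questions, rng)
        candidate = complete(prompt).strip()
        # Keep only non-empty questions that are not already known.
        if candidate and candidate not in questions and candidate not in generated:
            generated.append(candidate)
    return generated


# Usage with a fake model that returns a placeholder question each time.
course_questions = [
    "Find the derivative of x^2.",
    "Integrate sin(x) from 0 to pi.",
    "Find the limit of (1 + 1/n)^n as n goes to infinity.",
]
counter = iter(range(1, 100))
fake_model = lambda prompt: f"Placeholder question {next(counter)}"
print(generate_questions(course_questions, fake_model, target=3))
```

Randomizing the cut-off point varies how much context the model sees on each call, which is one plausible reason for that design choice.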

To assess the generated questions, the researchers surveyed MIT students who had taken these courses or their equivalents, asking them to compare the quality and difficulty of the machine-generated questions with those of the original course questions.


The results of the student survey show:

  • Machine-generated questions were rated on par with human-written questions in quality;
  • In terms of difficulty, human-written questions are more suitable as course questions, while machine-generated questions are slightly more difficult;
  • Students recognized more than half of the course questions as model-generated, with those for 18.01 coming closest to human-written questions.

For more details, see the paper from MIT.
