While learning high-level mathematics is no easy feat, teaching math concepts can often be just as tricky. That may be why many teachers are turning to ChatGPT for help. According to a recent Forbes article, 51 percent of surveyed teachers reported having used ChatGPT to help teach, with 10 percent using it daily. ChatGPT can help translate technical information into more basic terms, but it may not always provide the correct solution, especially for upper-level math.
An international team of researchers tested what the software could manage by posing a set of challenging graduate-level mathematics questions to it. While ChatGPT failed on a significant number of them, its correct answers suggested that it could be useful for math researchers and teachers as a type of specialized search engine.
Portraying ChatGPT’s math muscles
The media tends to portray ChatGPT’s mathematical intelligence as either brilliant or incompetent. “Only the extremes have been emphasized,” explained Frieder Simon, a University of Oxford PhD candidate and the study’s lead author. For example, ChatGPT aced Psychology Today’s Verbal-Linguistic Intelligence IQ Test, scoring 147 points, but failed miserably on Accounting Today’s CPA exam. “There’s a middle [road] for some use cases; ChatGPT is performing pretty well [for some students and educators], but for others, not so much,” Simon elaborated.
On tests at the high school and undergraduate level, ChatGPT performs well, ranking in the 89th percentile on the SAT math test. It even earned a B on computer scientist Scott Aaronson's quantum computing final exam.
But different tests may be needed to reveal the limits of ChatGPT's capabilities. "One thing media have focused on is ChatGPT's ability to pass various popular standardized tests," stated Leah Henrickson, a lecturer in digital media at the University of Leeds. "These are tests that students spend literally years preparing for. We're often led to believe that these tests evaluate our intelligence, but more often than not, they evaluate our ability to recall facts. ChatGPT can pass these tests because it can recall facts that it has picked up in its training."
Simon and his research team proposed a unique set of upper-level math questions to assess whether ChatGPT's test-taking success reflected genuine problem-solving skill rather than recall. "[Previous studies looked at] if the output has been correct or incorrect," Simon added. "And we wanted to go beyond this and have implemented a much more fine-grained methodology where we can really assess how ChatGPT fails, if it does fail, and in what way it fails." To build a more demanding test, the researchers compiled prompts from several mathematical fields into a larger problem set they called GHOSTS.