UCLA researchers have found that the AI language model GPT-3 can solve reasoning problems about as well as college students, though not without constraints. Further gains shown by GPT-4 suggest that AI may be on a path toward human-like reasoning, raising key questions about how the technology will develop.
The team at UCLA has established that GPT-3 can handle reasoning problems roughly on par with college students.
Humans naturally solve unfamiliar problems by relating them to problems they already know and extending a known solution to the new case, a process called analogical reasoning. That ability has long been considered uniquely human.
Interestingly, GPT-3’s performance matched that of the human participants, and it even made comparable errors, according to Hongjing Lu. The finding challenges the long-held assumption that analogical reasoning is an exclusively human skill.
The study, conducted by psychologists at UCLA, shows that GPT-3 performs about as well as college students on reasoning problems of the kind that appear on intelligence tests and standardized exams such as the SAT. The findings are published today (July 31) in Nature Human Behaviour.
Probing AI’s Cognitive Operations
The paper’s authors, however, question whether GPT-3 is mimicking human reasoning as a byproduct of its massive language training dataset or is drawing on a fundamentally new kind of cognitive process.
Because GPT-3’s inner workings are kept private by its creator, OpenAI, the UCLA researchers cannot say for certain how its reasoning abilities work. They also note that while GPT-3 performs far better than expected at some tasks, it fails spectacularly at others.
Significant Limitations of AI in Reasoning Tasks
Despite the impressive results, the system has significant limitations, says Taylor Webb, a UCLA postdoctoral researcher in psychology and lead author of the study. It’s capable of analogical reasoning, but it struggles with tasks that humans find simple, like using tools to solve a physical task. The AI often proposed nonsensical solutions when presented with such challenges.
In their experiment, the researchers tested GPT-3 on problems inspired by Raven’s Progressive Matrices, a test that asks the taker to predict the next image in a complicated arrangement of shapes. Because GPT-3 processes only text, they converted the images into a text format; 40 UCLA undergraduate students were asked to solve the same problems.
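For readers curious how an image-based puzzle can be posed to a text-only model, the short Python sketch below shows one plausible encoding. It is a minimal illustration only: the study’s actual item encoding and prompt wording are not described in this article, and the digit matrix, prompt text, model name, and OpenAI client usage here are all assumptions for demonstration.

# Hypothetical sketch: posing a Raven's-style matrix problem as text
# via the OpenAI Python SDK (openai>=1.0). Everything here is
# illustrative; it is not the study's actual method.
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# A 3x3 digit matrix with the final cell missing, standing in for the
# shape-based images of a Raven's Progressive Matrices item.
prompt = (
    "Each row of this matrix follows the same rule.\n"
    "[1 2 3]\n"
    "[4 5 6]\n"
    "[7 8 ?]\n"
    "What number replaces the question mark? Reply with the number only."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice; the study evaluated GPT-3
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)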
Surprising Results and Future Implications
UCLA psychology professor Hongjing Lu, the study’s senior author, said that GPT-3 not only performed about as well as humans but also made similar mistakes. GPT-3 solved 80% of the problems correctly, well above the human participants’ average of just under 60%, though still within the range of the highest human scores.
The team also had GPT-3 solve SAT analogy questions that, presumably, it had never encountered before. When its results were compared with the average SAT scores of college applicants, GPT-3 performed better than the human average.
Progressing from GPT-3 to GPT-4
When asked to identify analogies between short stories, however, the AI performed worse than the students did, although GPT-4, the latest version of the technology, performed better than GPT-3.
The UCLA team, which has been comparing the abilities of its own cognition-inspired computer model with those of commercial AI systems, was surprised by GPT-3’s performance, according to psychology professor Keith Holyoak, a co-author of the study.
So far, however, GPT-3 has failed at problems that require an understanding of physical space: when given descriptions of tools it could use to transfer gumballs from one bowl to another, it proposed peculiar solutions.
The UCLA scientists aim to find out whether language models are truly beginning to “think” like humans or are doing something altogether different that merely looks like human thought.
Similar to Human Thinking?
GPT-3 might be thinking like a human, Holyoak said, but that needs further verification, because humans did not learn by ingesting the entire internet; the training method is completely different. The team wants to determine whether GPT-3 is using the same methods humans use or whether it represents a genuinely new form of AI.
To find out, they would need access to the software and to the data used to train it, and they would then run tests the software has not already been exposed to. This, they say, would help determine the future direction of AI.
Backend access to GPT models would be highly valuable for AI and cognitive researchers, Webb added. At present, they can only supply inputs and observe outputs, which is not as decisive as they would like.
Reference: “Emergent analogical reasoning in large language models” by Taylor Webb, Keith J. Holyoak and Hongjing Lu, 31 July 2023, Nature Human Behaviour.
DOI: 10.1038/s41562-023-01659-w
Frequently Asked Questions (FAQs) About GPT-3’s Reasoning Abilities
How well does the AI model GPT-3 perform in reasoning tasks compared to humans?
The AI model GPT-3, developed by OpenAI, can solve reasoning problems with proficiency comparable to that of college students. It’s able to engage in analogical reasoning – drawing parallels between familiar and unfamiliar problems to solve the latter, an ability previously considered unique to humans.
What types of reasoning problems were used in the study?
The study used problems based on Raven’s Progressive Matrices and SAT analogy questions. The former is a test used to gauge a subject’s ability to predict the next image in a complex arrangement of shapes, while the latter involves selecting pairs of words that share the same relationship.
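To make the two formats concrete, here are invented examples of each, rendered as the kind of plain-text prompts a language model can process. Neither item is drawn from the study itself.

# Invented examples of the two problem types described above; neither
# comes from the study itself.

matrix_item = (
    "[circle   square   triangle]\n"
    "[square   triangle circle  ]\n"
    "[triangle circle   ?       ]\n"
    "Which shape completes the third row?"
)

analogy_item = (
    "hungry : eat :: tired : ?\n"
    "(a) sleep  (b) run  (c) cook  (d) read"
)

print(matrix_item)
print(analogy_item)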
What were the surprising results of the study?
GPT-3 not only performed at a level similar to humans in solving reasoning problems but also made errors similar to humans. In some instances, GPT-3 even outperformed the human subjects, scoring 80% compared to the human average of below 60%.
What are the limitations of GPT-3 in reasoning tasks?
Despite its impressive reasoning capabilities, GPT-3 has significant limitations. For example, it struggles with tasks that humans find easy, such as using tools to accomplish a physical goal. When confronted with such problems, GPT-3 often proposed nonsensical solutions.
How does GPT-4 compare to GPT-3 in terms of reasoning abilities?
The latest iteration of the model, GPT-4, has been shown to perform better than GPT-3 on certain tasks. In the study, when asked to solve analogies based on short stories, GPT-4 outperformed GPT-3, hinting at continued improvement in AI reasoning abilities.
Does GPT-3 actually “think” like humans?
The UCLA researchers posit that while GPT-3 shows signs of thinking similarly to humans, it is uncertain whether this stems from its extensive language training dataset or whether it signifies a fundamentally new cognitive process. To answer that question, researchers would need access to the AI’s software and training data.
5 comments
this AI stuff is just mind blowing, it’s like we’re living in a scifi movie. Can’t wait to see what the future holds!
Fascinating study. But, I’d love to see how this tech evolves. GPT-4 is already showing promise, but can it get even better? What about GPT-5, 6, 7…?
This really gets into the heart of what AI is all about. It’s not just about mimicking human behavior, it’s about creating new kinds of cognitive processes. That’s the really exciting part.
honestly this is a little scary. We’re creating machines that can think like us, what if they start to outsmart us?!
These tests seem so simple but the implications are enormous! Looking forward to seeing where this research leads.