PhD-level AI? Musk's xAI launches Grok 4: multi-agent reasoning and prediction market features all at once.

The artificial intelligence company xAI, led by Musk, officially released its latest-generation AI model, Grok 4, a few hours ago, calling it the "smartest AI on Earth." According to the company, the model combines unprecedented reasoning capabilities, PhD-level academic performance, and integration with multiple tools, and it has broken records on various benchmark tests. Musk anticipates that Grok 4 will produce substantial technological inventions or academically significant results within a year.

Introducing Grok 4, the world’s most powerful AI model. Watch the livestream now:

— xAI (@xai) July 10, 2025

Two versions: Grok 4 and Grok 4 Heavy

Grok 4 comes in two versions: the single-agent Grok 4 and the multi-agent Grok 4 Heavy.

Grok 4: Basic version, handling problems with a single AI agent.

Grok 4 Heavy: Adopts a multi-agent collaboration model in which multiple agents first solve the problem individually, then share their solutions and compare results with each other like a "study group," ultimately arriving at the final answer.
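The "study group" idea can be illustrated with a very small sketch. xAI has not published how Grok 4 Heavy actually combines its agents' results, so the majority vote below is only an assumed stand-in for the comparison step, and the function name `study_group_answer` and the three toy agents are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def study_group_answer(question: str, agents: Sequence[Callable[[str], str]]) -> str:
    """Ask every agent independently, then return the most common answer."""
    if not agents:
        raise ValueError("at least one agent is required")
    # Step 1: each agent works on the problem on its own.
    proposals = [agent(question) for agent in agents]
    # Step 2: compare results. Here the answer most agents agree on wins
    # (an assumed rule); ties go to the answer that was proposed first.
    winner, _votes = Counter(proposals).most_common(1)[0]
    return winner


# Hypothetical stand-in agents: two agree, one disagrees.
agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
print(study_group_answer("What is 6 * 7?", agents))  # prints 42
```

A real system would presumably let agents exchange reasoning rather than only final answers; the vote is simply the easiest comparison rule to show.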

The company has also launched its most expensive subscription plan to date, "SuperGrok Heavy," at $300 per month. Subscribers get early access to Grok 4 Heavy as well as priority access to features released in the future.

Doctorate Level Intelligence: From a Perfect SAT Score to a Genius in All Fields

xAI claims that Grok 4 possesses academic and logical abilities that surpass human levels, making it one of the models closest to artificial general intelligence (AGI) at this stage. According to the company, it can achieve near-perfect scores on standardized tests such as the SAT and GRE in the United States, and it demonstrates doctoral-level knowledge and understanding across all subjects.

Additionally, Grok 4 has set new highs on multiple benchmark tests, including:

On the graduate-level question set GPQA, the American Invitational Mathematics Examination (AIME 2025), and the USA Mathematical Olympiad (USAMO), Grok 4 ranks first among existing AI models in challenging scientific and mathematical reasoning.

In Vending-Bench, a simulation test of running a vending machine business, Grok 4 doubled its asset income, demonstrating stable and consistent strategic planning.

The biomedical research organization Arc Institute uses Grok 4 to automate its research workflows, speeding up the progress of its experiments.

Grok 4 has also seen practical use in other fields, such as medical image analysis, financial strategy formulation, and game development.

At the same time, on Humanity’s Last Exam (HLE), Grok 4 can solve 25.4% of the problems without assistance, while the Grok 4 Heavy version can solve 44.4% of the problems, ranking first among existing AI models.

Training Grok 4 with the Colossus supercomputer greatly enhances computational efficiency.

xAI revealed that the launch of Grok 4 is backed by a dual leap in hardware and training strategies: "The training volume of Grok 4 is 100 times that of Grok 2."

Trained on the Colossus supercomputer, which features approximately 200,000 H100 GPUs, Grok 4 improved its focus and accuracy on reasoning tasks at every stage from pre-training to reinforcement learning (RL).

The team emphasized that human-written exam questions are no longer difficult enough to be effective for training Grok 4, so the real world will become the ultimate testing ground: whether the model is practically effective will be judged by whether it can truly create useful inventions or technologies.

Tool Integration and Real-World Interaction: Grok 4 Approaching Operational AI

Grok 4 is meant not only to think but also to solve real-world problems hands-on. According to xAI, unlike other models, Grok 4 was trained to use tools as part of the training process, which strengthens its practical and adaptive skills:

"In the coming months, Grok 4 will integrate with the engineering analysis tools used by Tesla and SpaceX, entering a more sophisticated engineering environment. We also plan to provide powerful enterprise-grade tools and highly accurate physical simulators to major companies by the end of this year."

The team added, "The current goal is to enable Grok to manipulate the humanoid robot Optimus and validate its logic and creativity in the physical world."

Beyond human reasoning capabilities: Can Grok 4 create new inventions?

Reasoning is the capability xAI takes the most pride in. Grok 4 not only draws on knowledge from its training data but also possesses logical thinking abilities developed through reinforcement learning. These allow it to construct problem-solving methods independently in unfamiliar situations and to verify its reasoning collectively across multiple agents. Ultimately, it derives its own conclusions just like human scientists:

Grok 4 is designed to reason from "first principles": it can independently identify problems, construct logic, and complete complex deductions, a level of reasoning that previous AI models have struggled to reach.

xAI expects that Grok 4 will invent truly practical new technology as early as this year and at the latest by next year, and may discover currently unknown scientific principles within the next two years.

From market predictions to game creation: Grok 4 application layer expands again

Finally, xAI demonstrated Grok 4's practical potential in areas such as voice interaction and finance. In an event-forecasting example, Grok 4 Heavy consulted the prediction market Polymarket and, combining statistical calculation with its reasoning abilities, estimated within a few minutes that the Dodgers have a 21.6% chance of winning the World Series. The company presented this as real-time analysis that surpasses traditional quantitative analysis tools.

Grok 4 estimates the Dodgers' winning probability in the MLB World Series through Polymarket data.
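For readers unfamiliar with prediction markets, the sketch below shows one common way to read probabilities off market prices: treat each outcome's share price as a raw probability and rescale so the outcomes sum to 1. This is a general convention, not a description of what Grok 4 Heavy computed in the demo, and the outcome names and prices are hypothetical illustrative values, not Polymarket data.

```python
from typing import Dict


def implied_probabilities(prices: Dict[str, float]) -> Dict[str, float]:
    """Rescale outcome prices so they sum to 1 and can be read as probabilities."""
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return {outcome: price / total for outcome, price in prices.items()}


# Hypothetical prices in dollars per share; they sum to slightly more
# than 1, which is why the rescaling step is needed.
example_prices = {"Team A": 0.22, "Team B": 0.11, "Rest of field": 0.70}
for outcome, probability in implied_probabilities(example_prices).items():
    print(f"{outcome}: {probability:.1%}")
```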

Grok's future vision is also impressive. xAI stated that future versions will incorporate video understanding and game interaction capabilities, able to play games and assess what is called "fun," and even integrate game engines to create interactive and artistic content autonomously. This includes TV shows, films, and video games.

Voice has also been significantly upgraded. The new model introduces a variety of voice styles and accents, making conversations more natural and fluid. The launch event included a deliberate comparison with GPT to show that Grok 4 avoids interrupting users and greatly reduces the delay between thinking and responding, which was presented as a major highlight of its interface.

Grok 4 is not just a tool, but a driver of human civilization.

The arrival of Grok 4 marks AI entering a deeper stage of thinking and application. According to Musk, it is also expected to trigger an intelligence revolution across education, science, business, and the creative industries, one in which Grok will be a true participant rather than merely a language model or auxiliary tool.

The future vision of the xAI development team is grand and radical. They emphasize: "AI is no longer just thinking for us today, but is creating the world together with us."

This article, "PhD-level AI? Musk's xAI launches Grok 4: multi-agent reasoning and prediction market features all at once," first appeared on Chain News ABMedia.