Defeating Llama 2 and rivaling GPT-3.5: Stability AI's new models top the open-source LLM leaderboard
Original Source: Heart of the Machine
In the blink of an eye, open-source large models have improved yet again. Do Google and OpenAI really have no moat?
"I just took a 30-minute lunch break, and our field has changed again?" an AI entrepreneur exclaimed after seeing the latest open-source large model leaderboard.
The two "newcomers" highlighted in the red box above are large models from Stability AI and the CarperAI lab: FreeWilly 1 and FreeWilly 2. They have just surpassed Llama-2-70b-hf, which Meta released only three days earlier, to top HuggingFace's Open LLM Leaderboard.
More strikingly, FreeWilly 2 also beat ChatGPT (GPT-3.5) on many benchmarks, becoming the first open-source model that can genuinely compete with GPT-3.5, something even Llama 2 did not achieve.
From the blog published by Stability AI, we can see some details of these two new models:
Data Sources
The training method of the FreeWilly models is directly inspired by the approach Microsoft pioneered in its paper "Orca: Progressive Learning from Complex Explanation Traces of GPT-4". While FreeWilly's data generation process is similar, the two differ in the source of the data.
FreeWilly's dataset contains 600,000 data points (roughly 10% of the dataset size used in the original Orca paper), generated by prompting language models with instructions drawn from the following high-quality instruction datasets created by Enrico Shippole:
Using this approach, the researchers generated 500,000 examples with a simpler LLM and an additional 100,000 examples with a more capable one. To ensure a fair comparison, they carefully filtered these datasets and removed examples derived from the evaluation benchmarks. Although the number of training samples is only one tenth of that in the original Orca paper (which greatly reduces the cost and carbon footprint of training compared to the original), the resulting FreeWilly models perform well on a variety of benchmarks, validating the effectiveness of the synthetic-dataset approach.
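The decontamination step mentioned above (removing training examples that overlap with evaluation benchmarks) can be sketched as follows. This is a minimal illustration using n-gram overlap; the n-gram size, normalization, and function names are assumptions for illustration, not Stability AI's actual pipeline.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_examples, benchmark_texts, n=8):
    """Drop any training example that shares an n-gram with a benchmark item."""
    bench_grams = set()
    for text in benchmark_texts:
        bench_grams |= ngrams(text, n)
    return [ex for ex in train_examples if not (ngrams(ex, n) & bench_grams)]
```

In practice, pipelines of this kind also normalize punctuation and may use fuzzy rather than exact matching; the core idea of filtering by overlap with held-out evaluation data is the same.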
Performance Data
For internal evaluation, the researchers used EleutherAI's lm-eval-harness benchmark, with AGIEval incorporated.
The lm-eval-harness was created by EleutherAI, the non-profit AI research lab that is also behind the aforementioned HuggingFace Open LLM Leaderboard.
AGIEval was created by Microsoft to evaluate the performance of foundation models on "human-centric" standardized tests, such as math competitions and bar exams.
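For readers who want to reproduce this style of evaluation, EleutherAI's lm-eval-harness can be run from the command line against any HuggingFace model. The command below is an illustrative sketch of the current CLI; the task selection and batch size are assumptions, not the exact configuration Stability AI used.

```shell
# Install EleutherAI's evaluation harness, then evaluate a HuggingFace model
# on a few leaderboard-style tasks. Large models require substantial GPU memory.
pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=stabilityai/FreeWilly2 \
    --tasks hellaswag,arc_challenge \
    --batch_size 8
```

The harness prints a per-task table of accuracy metrics, which is the format the leaderboard aggregates.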
Both FreeWilly models perform very well on many fronts, including complex reasoning, understanding the subtleties of language, and answering complex questions involving specialized domains such as legal and mathematical questions.
The evaluation results of the two models on the lm-eval-harness benchmark are as follows (these FreeWilly test results were evaluated by Stability AI researchers):
FreeWilly 1:
FreeWilly 2:
Judging from the reactions so far, the FreeWilly models have come as something of a shock, simply because they arrived so fast: Llama 2 had been out for only three days and had barely settled into its leaderboard spot. One researcher joked that he had recently had eye surgery and avoided the news for a week, yet felt as if he had been in a coma for a year. In other words, this is a period when you can't afford to blink.
Reference link: