Why does Baidu start with "operating systems" to create an AI that is "omnipotent and ubiquitous"?
Author: Pumping Geek
Large models can summarize five thousand years of Chinese history, yet they cannot tell you the current time; they can explain quantum mechanics, yet they struggle to produce a professional-grade PPT that is both rich in content and visually appealing.
Why do large models seem omnipotent, yet always fall a little short in practice?
The reason is simple: being smart and knowledgeable is not the same as being able to get the job done.
Being smart requires training a large model on vast amounts of knowledge, so that it develops a sophisticated brain that can answer questions well.
Being both smart and capable requires something more: equipping that brain with agile limbs, so that it achieves "deep thinking + deep delivery."
For this reason, how to move large models from merely thinking well to being "smart and capable" has become the key factor in whether this wave of large-model enthusiasm proves a passing fad or a historical turning point.
Baidu has offered one example.
On April 25, at the Create 2025 Baidu AI Developer Conference, Baidu founder Robin Li introduced Cangzhou OS, billed as the world's first operating system for the content field and launched jointly by Baidu Wenku and Baidu Wangpan.
By fully integrating the underlying technology, capabilities, and data that Baidu Wenku and Baidu Wangpan have accumulated, Cangzhou OS is meant to flow like water, adapting seamlessly to different scenarios and delivering high-quality, low-threshold, end-to-end results in the most suitable form and through the most convenient interface.
With Cangzhou OS, Baidu Wenku and Baidu Wangpan aim to provide truly one-stop, end-to-end delivery at any time, in any place, and on any device, making AI "omnipotent and ubiquitous."
01
Cangzhou OS, enabling AI to evolve to the operating system level.
The technology industry broadly agrees that any technology must travel the long road of the Gartner hype cycle before it moves from the laboratory into ordinary households.
On this curve, growth in the first phase depends mainly on the feverish market expectations that a technical breakthrough creates. When the technology's practical results fail to meet those expectations, growth quickly turns into decline. The decline lasts until the conditions needed for real-world adoption mature and solidify into infrastructure that is nearly zero-threshold, broadly capable, and ubiquitous, which sets off the ecosystem explosion of the second phase.
In the software industry, one hallmark of the second phase is usually the emergence of a mature operating system, such as Windows for personal computers and iOS for mobile devices.
So what defines a mature operating system? About 15 years ago, the global technology industry debated a question: feature phones could also offer touchscreens, large displays, calls, photos, music, and text messages, so why were Apple's phones and other smartphones regarded as a different species?
One core reason is that iOS inherited kernel-level stability and multitasking from macOS and turned them into an open ecosystem in which developers could freely build on Apple's underlying capabilities to create their own innovative applications. Defining what a mobile phone is stopped being the business of a few giants such as Motorola and Nokia and became a vast industry of open-ended possibilities involving an entire ecosystem, which opened the door to more than a decade of mobile internet.
Technology rolls forward, but business stories tend to rhyme, and the underlying logic proven by mobile operating systems still applies to building an OS for the era of large models.
That logic comes down to three things: complete underlying capabilities, flexible central scheduling, and a thriving application service ecosystem. These correspond exactly to the three-tier architecture of Cangzhou OS: foundational infrastructure, central system, and application services. The only difference is that the bridge between applications, the central system, and the foundation has changed from traditional APIs to MCP (Model Context Protocol), which is more standardized and easier to connect to.
In the infrastructure layer, the core component of the MCP Server section is Chatfile plus. Through a knowledge-based framework, it decomposes and analyzes content of different modalities, forms, and formats at the element level. Alongside it sits a series of tool components for multimodal understanding, multimodal retrieval, and file transcoding and parsing.
Baidu Wenku and Baidu Wangpan have also built three major libraries: a public knowledge base, a private knowledge base, and a memory base. The public knowledge base holds the public knowledge data that Baidu Wenku has accumulated over the years; the private knowledge base holds the knowledge data that Wangpan users have authorized for use; and the memory base holds the instructions, usage habits, and generation history from users' past activity in Wenku or Wangpan.
This data often comes in different modalities, forms, and formats. The public knowledge base supplies general knowledge, while the private knowledge base and the memory base store personalized user data.
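The division of labor among the three libraries can be pictured with a minimal Python sketch. Every class, field, and file name below is a hypothetical illustration rather than Cangzhou OS's actual data model; the sketch only shows how shared public knowledge might be combined with one user's private files and memory while other users' data stays out of reach.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeStores:
    """Hypothetical container for the three libraries (not Baidu's real schema)."""
    public: list[str] = field(default_factory=list)               # shared general knowledge
    private: dict[str, list[str]] = field(default_factory=dict)   # user_id -> authorized files
    memory: dict[str, list[str]] = field(default_factory=dict)    # user_id -> past instructions

    def context_for(self, user_id: str) -> dict[str, list[str]]:
        """Gather what one request may draw on: public data plus that user's own data only."""
        return {
            "public": list(self.public),
            "private": list(self.private.get(user_id, [])),
            "memory": list(self.memory.get(user_id, [])),
        }


# Made-up sample entries, purely for illustration.
stores = KnowledgeStores(
    public=["wedding planning template"],
    private={"alice": ["alice_budget.xlsx"], "bob": ["bob_thesis.pdf"]},
    memory={"alice": ["prefers minimalist slide styles"]},
)
ctx = stores.context_for("alice")
```

Here `ctx` contains the public template plus only Alice's own file and preference; a user with no stored data simply gets empty private and memory lists.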
Within the knowledge-based framework, Cangzhou OS vectorizes and labels the multimodal content in the three major libraries. In other words, specialized models convert unstructured data such as images, text, video, audio, and documents into multi-dimensional vector data that computers can read, that is, a set of tokens.
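As a rough illustration of what vectorizing and labeling makes possible, the sketch below stores each item as a vector plus labels and retrieves the closest match for a query. It is a toy under loud assumptions: the word-count `embed` function stands in for the specialized models the article mentions, the vocabulary and items are invented, and a real system would embed a video through its transcript or frames rather than a text description.

```python
import math
from collections import Counter


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy stand-in for an embedding model: count vocabulary words in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocab = ["wedding", "beach", "exam", "english", "budget"]

# Each item keeps its vector next to human-readable labels (modality, source library).
index = [
    {"id": "doc1", "labels": {"modality": "text", "library": "public"},
     "vector": embed("beach wedding budget checklist", vocab)},
    {"id": "vid7", "labels": {"modality": "video", "library": "private"},
     "vector": embed("english exam review lecture", vocab)},
]

query = embed("outdoor beach wedding", vocab)
best = max(index, key=lambda item: cosine(query, item["vector"]))
```

Because the query shares "beach" and "wedding" with `doc1` and nothing with `vid7`, `best` is the `doc1` entry; the labels let a caller filter by modality or library before or after the similarity search.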
In the central system, Baidu Wenku and Baidu Wangpan have developed three major tools: an editor (for editing documents, PPTs, and so on), a reader (for reading documents, PPTs, and so on), and a player (for audio and video playback).
Cangzhou OS also uses a "Scheduling Hub" that draws on interactive components, intent models, and transmission infrastructure, together with user memory and profile data, to understand user intent accurately and dispatch the right Agents efficiently.
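The idea of a scheduling hub, recognizing intent and then dispatching Agents with the user's profile attached, can be sketched in a few lines of Python. Nothing here reflects Baidu's implementation: the agent functions, the `AGENTS` table, and the keyword matching (a placeholder for a learned intent model) are all assumptions made for illustration.

```python
from typing import Callable


# Hypothetical agents; each only returns a description of what it would produce.
def ppt_agent(request: str, profile: dict) -> str:
    return f"PPT in {profile.get('style', 'default')} style for: {request}"


def poster_agent(request: str, profile: dict) -> str:
    return f"Poster in {profile.get('style', 'default')} style for: {request}"


# Intent name -> agent. Real intent names and agents are unknown; these are made up.
AGENTS: dict[str, Callable[[str, dict], str]] = {
    "proposal": ppt_agent,
    "invitation": poster_agent,
}


def detect_intents(request: str) -> list[str]:
    """Keyword matching as a placeholder for a learned intent model."""
    text = request.lower()
    return [intent for intent in AGENTS if intent in text]


def dispatch(request: str, profile: dict) -> list[str]:
    """Route one request to every agent whose intent was detected."""
    return [AGENTS[intent](request, profile) for intent in detect_intents(request)]


results = dispatch("Create a planning proposal and invitation", {"style": "minimalist"})
```

A single sentence triggers both agents here, and the stored style preference reaches each of them without the user restating it, which is the role the article assigns to memory and profile data.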
At the top level sits a series of AI Agents. Cangzhou OS integrates hundreds of them, including Wenku and Wangpan PPT, AI picture books, AI mind maps, AI posters, AI notes, AI scanning, AI listening notes, and more. Their output spans images, text, video, and audio, covering study, work, and entertainment scenarios. The integrated editor's editing, revision, and fine-tuning capabilities further improve the quality of retrieval and generated content so that results better fit each user's actual task.
02
On Cangzhou OS, creating more "smart and capable" Agents
Around the top-level application services, Baidu Wenku and Baidu Wangpan have launched hundreds of AI Agents that have been validated by hundreds of millions of users, and they have also integrated a large number of professional third-party Agents to expand the application ecosystem.
As a "one-stop AI content acquisition and creation platform," Baidu Wenku has more than 40 million paying users and 97 million monthly active AI users. Baidu Wangpan has been upgraded into a "one-stop content service platform" that serves more than 1 billion users, stores more than 100 billion GB in total, and has more than 80 million monthly active AI users. Baidu Wenku and Baidu Wangpan have become genuine "super productivity" tools in the era of large models.
At the conference, Baidu Wenku and Baidu Wangpan also showcased two new capabilities built on Cangzhou OS: "GenFlow Super Partner" and "AI Notes."
GenFlow Super Partner is a multi-agent collaboration capability launched in the Baidu Wenku app. Backed by Cangzhou OS, it can run multiple content generation tasks in parallel and complete them using comprehensive, professional information from across the web together with the user's own habits and preferences.
For example, suppose a user wants to plan a wedding, but the initial input is a single sentence: "I want to have an outdoor wedding in Hainan on May Day; help me create a planning proposal and an invitation."
The request looks simple, as if filling in the blanks of an existing template would do. To satisfy the user, however, the system has to understand the user's aesthetic preferences, budget expectations, and preferred order of events. It also has to know the weather, crowd levels, and available venues in Hainan during the May Day holiday. It then has to combine the resulting images and knowledge with PPT tools to generate a complete plan. Finally, based on that plan and the user's aesthetic preferences, it has to create a finished wedding invitation poster.
Achieving this means separately calling on the user's chat history and browsing history, as well as intent recognition, web-wide search, and PPT tools. The system analyzes the user's intent, learns the user's preferences, combines tools freely, and ultimately delivers a specific, complete plan covering the schedule, dates, venue, budget, theme, execution details, style, and staffing.
The proposal and the poster also complement each other, so all the information in them must be consistent, and both must be produced in parallel by the same operating system.
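One simple way to get outputs that are both parallel and consistent is to resolve the shared facts once and hand the same record to every generator. The Python sketch below illustrates that design choice with the standard library's thread pool; the fact values and the two generator functions are invented placeholders, and the article does not say how GenFlow Super Partner actually coordinates its Agents.

```python
from concurrent.futures import ThreadPoolExecutor

# Facts resolved once, then shared read-only by every generator (values are made up).
facts = {"date": "May 1", "venue": "a beach in Hainan", "theme": "outdoor"}


def make_proposal(f: dict) -> str:
    """Placeholder for the agent that would build the planning proposal."""
    return f"Proposal: {f['theme']} wedding on {f['date']} at {f['venue']}"


def make_invitation(f: dict) -> str:
    """Placeholder for the agent that would build the invitation poster."""
    return f"Invitation: join us on {f['date']} at {f['venue']}"


# Both deliverables are generated concurrently from the same facts.
with ThreadPoolExecutor(max_workers=2) as pool:
    proposal_future = pool.submit(make_proposal, facts)
    invitation_future = pool.submit(make_invitation, facts)
    proposal = proposal_future.result()
    invitation = invitation_future.result()
```

Because neither generator invents its own date or venue, the proposal and the invitation cannot drift apart, whichever one finishes first.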
Of course, AI cannot produce results that satisfy everyone on the first try, so both the wedding proposal and the poster need to be editable. That capability comes from the integrated editor in Cangzhou OS.
From deep thinking to deep delivery, GenFlow Super Partner is almost the only true "multi-agent collaboration" product currently on the market. It addresses the common problems of multi-agent collaboration products: high cost, long generation times, low efficiency, unstable delivery, and the inability to fine-tune results through multi-turn dialogue. It also connects directly to mature products and user-authorized private data, giving AI a real chance of becoming "omnipotent and ubiquitous."
Baidu Wangpan's AI Notes, meanwhile, is a powerful tool for office workers and exam candidates alike.
AI Notes is presented as the industry's first multimodal AI note-taking application. It places the study videos that users store in Baidu Wangpan and their note pages in the same interface so that the two interact smoothly. Video content and notes are tightly linked, covering the full learning cycle: watching videos, generating AI notes, summarizing them into AI mind maps, and finally answering AI-generated questions to test what has been learned.
For example, the difficulty of the English section of the graduate entrance examination has recently become a hot topic, and a user wants to review the key points. AI Notes first searches the relevant materials stored in the user's Wangpan, checks publicly available information online for key points, and organizes the results. The process does not stop there: AI Notes also checks the generated key points against past exam questions as a final verification. Only verified key points go on to produce mind maps and predicted exam questions, helping users learn faster.
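The verification gate in this flow, where only key points confirmed by past papers move on, can be sketched as a simple filter. The Python below is an assumption-laden toy: substring matching stands in for whatever checking AI Notes really performs, and the candidate points and past questions are made-up sample data.

```python
def verify_key_points(candidates: list[str], past_questions: list[str]) -> list[str]:
    """Keep a candidate key point only if some past exam question mentions it."""
    verified = []
    for point in candidates:
        if any(point.lower() in question.lower() for question in past_questions):
            verified.append(point)
    return verified


# Made-up sample data, not taken from any real exam.
candidates = ["reading comprehension", "translation", "poetry recitation"]
past_questions = [
    "Part A: Reading Comprehension (four passages)",
    "Part C: Translation of underlined sentences",
]

verified = verify_key_points(candidates, past_questions)
# Only the verified points would be passed on to mind-map and question generation.
```

In this toy run, "poetry recitation" is dropped because no past question supports it, so it never reaches the later generation steps.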
The tools involved here are no fewer than those needed for wedding planning. Finding exam key points and past papers requires web-wide search, and past papers often come as PDFs or even images. Expert explanations come as video, which calls for the ability to analyze multimodal content. Finally, generating the mind map and predicting exam questions require the reasoning ability of large models, the ability to generate multimodal content, and the ability to map and associate different pieces of content, all while holding the generated content to a very high standard of accuracy.
Behind all of this is Cangzhou OS.
Baidu also supports developers in fully embracing MCP, so Cangzhou OS does not serve only Baidu's internal ecosystem. What matters most for an operating system's growth is openness, which unleashes the creativity of a broad developer community.
To maximize the value of the ecosystem and its applications, Baidu Wenku and Baidu Wangpan, building on Cangzhou OS, have taken the lead in fully adopting MCP to connect products with the ecosystem, constructing a three-layer MCP Server-Client-Host system. The capabilities of Wenku and Wangpan are exposed as MCP Servers, and an MCP Client SDK makes it easier for enterprise users, developers, intelligent applications, and other MCP Hosts to connect.
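The Server-Client-Host split can be illustrated with a small in-process Python sketch. It is deliberately not the real MCP SDK and not Baidu's Server: `ToyServer`, `ToyClient`, and the `search_files` tool are hypothetical, and the result payloads are simplified. Only the general shape, JSON-RPC 2.0 messages with `tools/list` and `tools/call` methods, is borrowed from MCP.

```python
import json
from typing import Optional


class ToyServer:
    """Stands in for an MCP Server exposing named tools (tool and file names are made up)."""

    def __init__(self) -> None:
        self.files = ["trip.jpg", "notes.pdf"]
        self.tools = {"search_files": self.search_files}

    def search_files(self, arguments: dict) -> list:
        return [name for name in self.files if arguments["keyword"] in name]

    def handle(self, raw: str) -> str:
        request = json.loads(raw)
        if request["method"] == "tools/list":
            result = sorted(self.tools)  # simplified: real servers return richer tool metadata
        elif request["method"] == "tools/call":
            params = request["params"]
            result = self.tools[params["name"]](params["arguments"])
        else:
            error = {"code": -32601, "message": "Method not found"}
            return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


class ToyClient:
    """Stands in for an MCP Client: one connection from a host to one server."""

    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.next_id = 0

    def request(self, method: str, params: Optional[dict] = None):
        self.next_id += 1
        message = {"jsonrpc": "2.0", "id": self.next_id, "method": method, "params": params or {}}
        reply = json.loads(self.server.handle(json.dumps(message)))
        if "error" in reply:
            raise RuntimeError(reply["error"]["message"])
        return reply["result"]


# The host (for example a phone assistant) owns the client and decides which tool to call.
client = ToyClient(ToyServer())
tool_names = client.request("tools/list")
hits = client.request("tools/call", {"name": "search_files", "arguments": {"keyword": "notes"}})
```

The host never touches the server's internals: it discovers tools by name and calls them through the client, which is what lets a third party plug in without a bespoke integration.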
The most representative case is Samsung phones, which are connecting to multiple Baidu Wenku and Baidu Wangpan MCP Servers for file upload, download, search, sharing, and content understanding.
On the one hand, users can upload files to the cloud, share them from the cloud, summarize documents, and ask questions about content simply by speaking to the voice assistant on their phones.
On the other hand, these Servers strengthen the cloud storage capabilities of Samsung's mobile system, easing the difficulty of backing up and sharing large files or many files at once from the phone itself.
For example, a user in the phone's photo album can invoke the voice assistant and say: "Back up the photos taken yesterday at Aosen to Baidu Wangpan, and send Xiaoming's photos to him." The relevant photos are uploaded to the user's authorized cloud account, and a share link is generated. The phone assistant then looks up the contact list and sends the link by SMS to the recipient's phone. By tapping the link, the recipient can go straight to Baidu Wangpan to view the photos or save them to their own account.
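The steps in this example (select yesterday's photos from one place, back them up, share the subset that includes one person, and text the link) can be written out as a short Python sketch. Everything in it is hypothetical: the `Photo` fields, the function name, the placeholder link on example.com, and the phone number are illustrative only, and the upload and SMS steps are represented by plain data rather than real service calls.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Photo:
    name: str
    taken_on: date
    place: str
    people: tuple[str, ...]


def run_backup_and_share(photos, today, place, person, contacts):
    """Hypothetical orchestration: select, 'upload', build a link, and queue an SMS."""
    yesterday = today - timedelta(days=1)
    selected = [p for p in photos if p.taken_on == yesterday and p.place == place]
    uploaded = [p.name for p in selected]                   # stand-in for a cloud upload call
    shared = [p.name for p in selected if person in p.people]
    link = "https://example.com/s/" + "-".join(shared)      # placeholder, not a real share URL
    sms = {"to": contacts[person], "body": f"Photos for you: {link}"}
    return uploaded, sms


# Made-up album and contact data.
today = date(2025, 4, 26)
photos = [
    Photo("a.jpg", date(2025, 4, 25), "Aosen", ("Xiaoming",)),
    Photo("b.jpg", date(2025, 4, 25), "Aosen", ()),
    Photo("c.jpg", date(2025, 4, 20), "Aosen", ("Xiaoming",)),
]
uploaded, sms = run_backup_and_share(photos, today, "Aosen", "Xiaoming", {"Xiaoming": "+00-000-0000"})
```

Both of yesterday's photos are backed up, but only the one that includes Xiaoming ends up behind the shared link; the older photo is ignored entirely.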
The reliability of an OS's underlying capabilities is not determined by how many tools it piles up or how much advanced technology it contains. The usability, maturity, and richness of the application service ecosystem on top are the best test of what the OS can do.
03
The story of OS has no end.
In capital markets, the kind of company investors value most is called a "friend of time."
A friend of time is a business that, once it has got something right, only needs to keep doing it for its performance to keep growing, which lets ecosystem developers keep benefiting as well.
Operating systems are a classic perpetual-motion market. As long as the markets for computers and smartphones exist, the operating system stories of Microsoft, Apple, and Google will have no end.
The same applies to large models. When "deep thinking + deep delivery + public and private data + MCP ecosystem" come together and AI becomes omnipotent and ubiquitous, new species will keep emerging, much as they did in the Cambrian explosion.
Looking down from this vantage point, Baidu Wenku, Baidu Wangpan, and others are opening up their own capabilities. By actively embracing the ecosystem, they become creators of new large-model species and makers of new rules.
Looking up, countless new Agents are being created and discovered on top of Cangzhou OS, forming a vast and surging new application service ecosystem.
And now, all the stories have just begun.