Algorithms, computing power, and data are the three core elements of AI. China has become a global data powerhouse, with annual data output exceeding 40 zettabytes (ZB). However, China still lacks usable data in three respects. First, the national data retention rate is only 2.8%, with massive amounts of data discarded at the source. Second, high-quality industry data is scarce: for medical models, for example, the volume of training data available in China is only about 10% of that in leading Western countries. Third, data sharing is limited, with large amounts of data locked in isolated “islands” and a sharing rate between companies of less than 25%. Zhou Yuefeng stated that AI is moving from generative AI to agentic AI, and that data is being upgraded to a “corpus + knowledge base.” Data infrastructure centered on data acquisition, storage, management, and utilization will become critical infrastructure in the AI era, and building it will advance on three fronts: cities, industries, and enterprises.
At the city level, build advanced storage centers to achieve the transition from isolated “island” governance to global data aggregation and trusted circulation
Through a comprehensive management process covering “global data aggregation, efficient data governance, and trusted data circulation,” an advanced storage center serves as a trusted center for data hosting, governance, development, and circulation, closing the loop from data resources to data assets. Guizhou, one of the eight national computing hubs in the “East Data, West Computing” project, uses its storage center to aggregate data from key industries across the province, establishing itself as a leading computing hub and a center for data value innovation.
At the industry level, build a high-quality industry corpus to achieve the transition from decentralized utilization to intelligent fusion of multi-dimensional data
High-quality corpora are the prerequisite and the “teaching material” for large AI models that are effective and easy to use. It is necessary to keep promoting the aggregation and preservation of data resources across all industries, and to encourage leading enterprises to take the lead in building industry-wide data sharing and collaboration platforms. A national breeding institution in China has integrated multiple datasets, including a basic corpus and thematic databases for soybeans, corn, and pigs. Relying on data lake storage and a unified data view, it has overcome three bottlenecks (scattered data, inconsistent data quality, and difficult cross-institutional sharing), creating a precision breeding technology system and shortening the breeding cycle.
At the enterprise level, build an enterprise AI data lake to achieve the transition from “single-agent intelligence” to “multi-agent collaboration”
Collaboration among multiple intelligent agents requires connecting private enterprise data and sharing knowledge bases, which improves the accuracy of enterprise applications and keeps their knowledge up to date. Enterprise data foundations must therefore evolve from siloed, stovepipe-style infrastructure to AI data lakes. In autonomous driving, for example, an AI data lake aggregates and integrates diverse data from road tests, simulations, and high-precision maps to support collaboration among multiple intelligent agents, including environmental perception agents, pedestrian avoidance agents, and voice assistant agents, ultimately improving traffic safety.
Building advanced data infrastructure requires continuous technological innovation. Technologies such as AI storage and all-flash storage provide high-speed, reliable data access. Unified data views and trusted data management make data visible, manageable, and usable across the whole organization. AI tool chains enable low-code development and rapid application rollout, providing one-stop tool support for data processing, model training, and application deployment, and fully unlocking the value of data.
Finally, Zhou Yuefeng said that Huawei remains committed to open source and openness. Working with industry, university, and research partners such as the “Computing Power Industry Development Array Advanced Storage AI Inference Working Group” and the “China Electricity Standardization Association Data Storage Professional Committee,” it will continue to enrich China’s data infrastructure technology ecosystem, jointly build a thriving industrial ecosystem, and promote AI applications and value creation.
Source: Huawei