
DEVELOPMENT OF LARGE LANGUAGE MODELS

In the current wave of artificial intelligence, large language models (LLMs) are undoubtedly one of the most closely watched areas. They are profoundly changing people's lives and are seen as "a key step towards general artificial intelligence." The global AI innovation landscape is largely dominated by the U.S. and China. So, how do the two countries compare in their development of large language models?

1. What Are Large Language Models? Understanding AI from Its Origins

Today, the powerful capabilities of large language models are often astonishing. Understanding their basic principles not only helps us use these tools more efficiently but also deepens our comprehension of their developmental logic. To explore this, we start with the well-known concept of "computation" and trace the development of artificial intelligence.

In ancient times, humans invented the abacus, marking the beginning of mechanical computation. Mechanical computation evolved over time, and by the 19th century the concepts of programs and programming had emerged, allowing mathematical procedures to be stored as sequences of computational instructions.

In the 20th century, the birth of electronic computers marked a monumental step in the evolution of computing. However, scientists did not stop there; they aimed to "simulate the process of human brain knowledge processing and invent machines that could think like the human brain." In 1950, Alan Turing posed the question, "Can machines think?" in his seminal AI paper, "Computing Machinery and Intelligence." This sparked the first wave of AI enthusiasm. In 1956, several scholars gathered at Dartmouth College in the United States to discuss various issues related to simulating intelligence in machines. Although they did not reach a consensus after prolonged discussions, they coined a term for the subject: Artificial Intelligence (AI).

At the time, AI research was dominated by two primary approaches: Symbolism and Connectionism. The former, embraced by most scientists at the Dartmouth conference, was the mainstream method. It proposed encoding human prior knowledge into computers as symbols, allowing the computer to reason and make judgments based on predefined rules and patterns. Connectionism, on the other hand, advocated simulating the human learning process by mimicking the way neurons connect. Over time, the inherent limitations of Symbolism increasingly restricted its applications, and it was gradually supplanted by Connectionism.

By the late 20th and early 21st centuries, Connectionism achieved significant breakthroughs in fields like neural networks and deep learning. Simultaneously, the rise of the internet and big data addressed the hardware and data requirements for Connectionist research, enabling AI to realize increasingly effective commercial applications.

Progress has never ceased. In 2020, AI transitioned from "small models + discriminative approaches" to "large models + generative approaches," reaching new heights. Large models are characterized by their extensive parameters, vast training datasets, and substantial computational power requirements, all contributing to higher model accuracy. Generative AI, on the other hand, refers to the ability to proactively generate content, greatly expanding AI's application scenarios. Large language models emerged in this wave of technological advancement, combining these two characteristics and using vast amounts of human language data for training. According to IBM, large language models "capture complex patterns in language through billions, or even more, parameters, performing various language-related tasks... and playing a significant role in bringing generative AI to the forefront of public interest." However, it is essential to note that the logic behind these models is not to "understand" and respond as humans do. Instead, they predict which words are most likely to appear together in a sentence or paragraph, producing content that sounds human-written. 
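To make the idea of "predicting the next word" concrete, here is a minimal, purely illustrative Python sketch. It uses simple bigram counts over a toy corpus rather than a neural network, so it is nothing like a production LLM in scale or architecture; but the core task it performs, choosing the most probable next word given what came before, is the same one transformer-based models are trained on.

```python
from collections import Counter, defaultdict

# A toy "language model": count which word tends to follow which, then
# generate text by repeatedly picking the most likely next word.
# Real LLMs do this with transformer networks over billions of parameters,
# but the underlying task -- next-word prediction -- is the same.
corpus = (
    "large language models predict the next word . "
    "large language models generate fluent text . "
    "large models require large training data ."
).split()

next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start: str, length: int = 7) -> str:
    """Greedily extend `start` by always choosing the most frequent successor."""
    words = [start]
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("large"))  # -> "large language models predict the next word ."
```

The output sounds plausible only because the toy corpus is tiny and repetitive; scaling the same prediction objective up to trillions of words and billions of parameters is what produces the fluency that makes LLM output read as human-written.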


Additionally, three concepts need clarification: industry-specific large models, general-purpose large models, and multimodal models. The first two are defined relative to each other. Industry-specific large models are trained for particular domains; Bloomberg's BloombergGPT, for example, is trained on years of accumulated financial data, making it more efficient and accurate at understanding financial information. General-purpose large models, like ChatGPT, are applicable across a wide range of domains. Multimodal models, unlike ordinary text-only models, can process multiple types of information, such as documents and audio files; GPT-4 is one of the leading examples in this field.

2. Overview of the Industry in China and the U.S.

There is no doubt that China and the United States are leading the world in the development of large language models, far ahead of other countries. Their progress, however, is not symmetrical: the two nations differ significantly in many respects.

The United States has long been the center of global scientific research, and large language models are no exception. American research institutions and major companies invested in language modeling and related research for decades, achieving breakthrough results around 2017. Early modern large language models (from 2018 to 2019) were predominantly American in origin, such as Google's BERT and T5 and OpenAI's GPT-1 and GPT-2. By comparison, China's development of large language models got off to a much quieter start. Surprisingly, however, China's first large model, Baidu's ERNIE 1.0 (Wenxin Yiyan 1.0), was launched in early 2019, almost simultaneously with OpenAI's GPT-2. Yet in the year following its release, no comparable products were introduced by other Chinese companies, and Baidu itself did not go on to become a leading LLM player in the way OpenAI did.

We analyzed the release trends of major global LLMs in recent years and observed some clear patterns. After 2019, the number of LLM releases worldwide increased rapidly. In terms of sheer numbers, the U.S. has maintained its leading position, while China has experienced rapid growth. In fact, in the first 11 months of this year, the number of LLM releases in China has already surpassed that of the United States. (Note: The table below only includes major foundational models, with different parameter versions of the same model counted as a single release.)


The number of releases is, of course, not the sole standard for evaluating development progress; "quality"—the performance of the models—may be even more critical. Numerous research institutions and organizations around the world frequently publish performance rankings for mainstream models, but the criteria vary significantly.


Focusing on 2024, we analyzed 18 rankings from both Chinese and American institutions to examine the share of Chinese and American products among the top ten large language models. Overall, Chinese models accounted for an average of 30% of the top-ten positions, while American models held 70%. This indicates that, although the number of Chinese LLM releases approached or even surpassed that of the U.S. in 2024, the U.S. remains the undisputed leader in cutting-edge technology.

Interestingly, when the organizations producing the rankings (including universities and companies) are categorized by their home country, a pattern emerges: rankings from Chinese institutions show a significantly higher average proportion of Chinese models in the top ten compared to rankings from American institutions.
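As a rough illustration of how such a comparison can be computed, the sketch below averages the share of Chinese models in each ranking's top ten, grouped by the home country of the institution that published the ranking. The figures in it are hypothetical placeholders, not the 18 actual rankings analyzed above; only the aggregation method is the point.

```python
# Hypothetical example: each entry records which country the ranking institution
# is from and how many Chinese models appeared in that ranking's top ten.
rankings = [
    ("US", 2), ("US", 3), ("US", 2),
    ("CN", 4), ("CN", 5), ("CN", 3),
]

def average_chinese_share(rows, country):
    """Average share of Chinese models in the top ten, for rankings from one country."""
    shares = [chinese / 10 for inst_country, chinese in rows if inst_country == country]
    return sum(shares) / len(shares)

for country in ("US", "CN"):
    print(f"Rankings by {country} institutions: "
          f"{average_chinese_share(rankings, country):.0%} Chinese models in the top ten on average")
```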

Large language models that consistently rank in the top ten can be considered the most advanced and high-performing overall. In the U.S., these include the GPT-4, Claude, Gemini, and Llama series; in China, the leading models are Qwen (Tongyi Qianwen), Yi (Zero-One Everything), and ERNIE (Wenxin Yiyan).

Additionally, comparisons can also be made from qualitative perspectives. One major focal point in LLM development is the debate between open-source and closed-source approaches. Among the top-performing models mentioned above, Chinese models are predominantly open-source, whereas American models are largely closed-source. According to The New York Times, compared to many American companies, Chinese enterprises are more willing to share their technologies with users, even collaborating with other companies and software developers by releasing their underlying software code. Why do Chinese companies "bet on open-source AI"? According to the MIT Technology Review, "For Alibaba and other Chinese AI companies, open-source AI represents an opportunity for rapid commercialization and global recognition.” This may be one of the reasons why China's large language models have been able to develop so rapidly.

Commenting on the development of large language models, The New York Times concluded, "While the United States has a lead in artificial intelligence development, China is quickly catching up.” Kai-Fu Lee, founder of Zero-One Everything (Yi) and former head of Google China, believes that China's top-tier large language models lag behind the United States by 6 to 9 months, while the less advanced ones trail by about 15 months. He asserts that while the U.S. leads the world in groundbreaking scientific research, "the intelligence, diligence, and hard work of the Chinese people cannot be overlooked." He also notes that China surpasses the U.S. in areas like business models and user experience. Next, we will conduct a broader comparative analysis of large language models from both countries.

3. Large Language Models as Technology and Products

Large language models, as a technology, are characterized by their "scale" and "language," as mentioned in the first section. This highlights two key elements: computational power and data, particularly data based on human language. At the same time, large language models are also a product, meaning that their development is closely tied to commercial operating models and application scenarios.

The "scale" in large language models partly refers to the vast amount of data required for training. However, obtaining such large datasets can raise issues such as excessive collection of personal data and concerns over national data security. As a result, administrative oversight and legal regulations are indispensable. Moreover, how a nation manages data significantly impacts the development of large language models. The approaches to data management differ between the two countries. According to the China Forward Industry Research Institute, China's regulation is primarily government-led, focusing on centralized management and serving national strategies, while the United States places greater emphasis on individual privacy and data protection. Although the focus differs, both countries strictly regulate data related to artificial intelligence. Additionally, due to the widespread user base of American companies around the world, data from various countries could potentially flow into the United States, raising concerns among some scholars.

In the context of large language models, "data" primarily means human language and is often referred to as a "corpus." Training models effectively requires a high-quality corpus, characterized by attributes such as "diversity, large scale, legality, authenticity, coherence, impartiality, and harmlessness." However, as things stand, the entire world may face a shortage of high-quality corpora. According to an analysis by the AI research organization Epoch, tech companies are likely to exhaust all high-quality data available on the internet by 2026. The situation is even less promising for Chinese-language corpora. Gao Wen, an academician of the Chinese Academy of Engineering, has publicly stated that in the 500-billion-scale training datasets that serve as the global standard for large models, Chinese corpus accounts for only 1.3%.
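For a sense of what "high-quality corpus" means in practice, the sketch below applies a few of the simplest possible filters: exact-duplicate removal, a minimum-length check, and a crude test for Chinese-language content. It is an illustrative toy pass under assumed thresholds, not the preprocessing pipeline of any particular lab; real corpus curation involves far more sophisticated deduplication, quality, and safety filtering.

```python
import re

# Illustrative corpus-cleaning pass touching on a few of the quality criteria above:
# deduplication, a minimum length as a rough proxy for coherence, and a count of how
# much Chinese-language text survives (relevant to the corpus-shortage point above).
raw_documents = [
    "Large language models are trained on text gathered from the web.",
    "Large language models are trained on text gathered from the web.",  # exact duplicate
    "点击这里",  # too short to be useful
    "大语言模型需要大量高质量的中文语料来进行训练，否则中文能力会明显落后。",
]

def contains_chinese(text: str) -> bool:
    """Rough heuristic: does the text contain CJK characters?"""
    return re.search(r"[\u4e00-\u9fff]", text) is not None

seen = set()
cleaned, chinese_count = [], 0
for doc in raw_documents:
    if doc in seen or len(doc) < 20:  # drop exact duplicates and very short fragments
        continue
    seen.add(doc)
    cleaned.append(doc)
    chinese_count += contains_chinese(doc)

print(f"kept {len(cleaned)} of {len(raw_documents)} documents, "
      f"{chinese_count} containing Chinese text")
```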

The "scale" of large language models also refers to their massive computational power requirements. In recent years, China's central and local governments have introduced numerous policies to increase investments in computing power infrastructure, achieving rapid annual growth in computational capacity. However, a gap remains compared to the United States. Xu Bing, co-founder of China's SenseTime, stated in May this year, "Among the three key elements of AI—computational power, data, and talent—the biggest gap between China and the U.S. lies in computational power, which is approximately tenfold." An article in the American Affairs Journal emphasized that "America's leading position in AI relies on its advantages in computational power and infrastructure.” Computational power is closely tied to advanced chips, and in recent years, the U.S. government has repeatedly imposed chip sanctions on China. With Trump's re-election, the U.S.-China chip war faces even greater uncertainty.

From a commercial perspective, as a product, large language models show significant differences in pricing and applications between China and the U.S. In May of this year, Zhipu AI, incubated by Tsinghua University, launched the first salvo in a price war by reducing the cost of using GLM-3 Turbo by 80%. Following this, numerous Chinese large model providers significantly slashed their prices, with companies like Baidu and Tencent even offering their products for free. Although major U.S. tech giants such as OpenAI and Google also implemented price cuts in May, the reductions were far less dramatic than those in China.

The substantial price reductions by Chinese companies may look chaotic, but they follow an inherent logic. Industry experts have noted that U.S. companies excel at "hard technology" innovation, advancing from 0 to 1, while Chinese companies specialize in "soft application" innovation, advancing from 1 to n. Price competition is a necessary step in driving adoption, as "without lowering prices, accelerating adoption is impossible." In fact, according to a report by TMTPost International Think Tank, China does have an edge over the United States in large model applications. Chinese companies "are more inclined to leverage open-source models from leading enterprises to drive application-focused innovation and entrepreneurship."

4. Cooperation, Competition, and Conflict Between the Two Countries

Large language models have become a focal point in the current development of artificial intelligence and the global technological race, with intensifying competition between the U.S. and Chinese governments. Between 2022 and 2023, the United States implemented two rounds of export bans on computing chips to China. On October 24, 2024, President Biden signed a memorandum aimed at consolidating the U.S.'s leadership in artificial intelligence, highlighting AI's crucial role in international competition. Voice of America commented that the memorandum's goal is to "prevent the U.S. from becoming a victim of adversaries like China exploiting AI technology," while Politico described it as "establishing new policies to compete with China.” On the 28th of the same month, the U.S. Department of the Treasury issued new regulations further restricting investments in China's semiconductor, quantum computing, and artificial intelligence sectors. The intensity of this technological competition is self-evident.

Beyond competition, the U.S. and China also demonstrate the potential for cooperation in the field of artificial intelligence. This is not only because AI has introduced numerous global challenges that require joint efforts but also because the unique characteristics of the two countries are inherently complementary. On November 15, 2023, the leaders of China and the U.S. met in San Francisco, reaching a significant consensus to establish a government-to-government dialogue mechanism on AI. In May of this year, the first U.S.-China AI government dialogue meeting was held in Geneva, Switzerland, where both sides discussed topics such as the "global AI governance framework and standards." Beyond dialogue, the two countries each bring distinctive advantages to artificial intelligence: the U.S. boasts a mature investment environment and world-class research teams, while China has immense potential in data acquisition and market scale. These strengths provide both the motivation and foundation for collaboration. How to seek cooperation amid competition and address competition within cooperation will be a key issue for the future.
