From the first generation to later generations: how does early model collapse quietly erode performance on minority data?

With the rapid development of machine learning, many researchers have begun to notice an emerging phenomenon: model collapse. The term describes how machine learning models gradually lose information and performance when trained on uncurated synthetic data. According to the definition proposed by Shumailov et al., model collapse proceeds in two stages: early model collapse and late model collapse.

In early model collapse, the model begins to lose information about the tails of the data distribution, which mainly hurts accuracy on minority data: the rare, low-probability examples.

When we train models on synthetic data, three potential problems surface: functional approximation error, sampling error, and learning error. These errors can arise even in the simplest models, and in complex models they are even more likely to compound, accelerating collapse. This makes early collapse difficult to detect: overall performance may appear to improve even as performance on minority data quietly declines.
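A toy computation (with hypothetical accuracy numbers, not figures from any study) illustrates why early collapse is easy to miss: an aggregate metric can rise while accuracy on the rare "tail" group falls sharply.

```python
# Hypothetical evaluation set: 95% common "head" examples, 5% rare "tail" examples.
head_n, tail_n = 950, 50

# Before synthetic-data training: 90% head accuracy, 80% tail accuracy.
before = (0.90 * head_n + 0.80 * tail_n) / (head_n + tail_n)

# After a generation of synthetic-data training: head improves slightly,
# but tail accuracy halves as the distribution's tails are forgotten.
after = (0.93 * head_n + 0.40 * tail_n) / (head_n + tail_n)

print(f"overall before: {before:.4f}")  # 0.8950
print(f"overall after:  {after:.4f}")   # 0.9035 -- looks like an improvement
```

The overall score improves from 89.5% to 90.4%, yet tail accuracy has dropped from 80% to 40%. Only a per-group breakdown reveals the early-stage damage.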

In late model collapse, the damage becomes unmistakable: the model suffers significant performance losses, confuses distinct concepts, and loses much of the original distribution's variability.

Root causes of model collapse

The root causes of model collapse fall into three categories: functional approximation error, sampling error, and learning error. As these errors accumulate across generations, the overall performance of the model declines. The problem is amplified by how data is shared on the Internet: AI-generated content flows back into future training sets, creating a self-reinforcing feedback loop.

Many researchers are concerned about this phenomenon and believe that model collapse poses a fundamental threat to the future development of generative AI. However, some researchers have recently offered a different perspective. They argue that model collapse can be avoided as long as synthetic data accumulates alongside human-generated data. Their study points out that accumulating data over time is the more realistic scenario, rather than discarding all previous data with each new generation.

The real-world impact may not be as pessimistic as one might think.

Exploration of alternative solutions

In addition to the above discussion, another line of research studies the use of machine-learning detectors and watermarking techniques to identify model-generated data and filter it out. These methods provide new approaches for dealing with model collapse.

Preliminary exploration of mathematical models

In 2024, researchers made the first attempt to demonstrate the collapse phenomenon with a simple one-dimensional Gaussian model. The model fits the mean and variance of the previous generation's data using standard unbiased estimators, then draws the next generation's data from the fitted Gaussian. Although the results of this simple model do not fully reflect the complexity of reality, it provides a basis for further research.
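The one-step behavior of such a toy model can be written down with standard sampling theory (a sketch, assuming each generation draws $N$ samples from a Gaussian fitted to the previous one): if generation $i$ has fitted parameters $(\mu_i, \sigma_i^2)$, then the next generation's estimates satisfy

```latex
\mu_{i+1} \mid \mu_i, \sigma_i \sim \mathcal{N}\!\left(\mu_i, \frac{\sigma_i^2}{N}\right),
\qquad
(N-1)\,\frac{\sigma_{i+1}^2}{\sigma_i^2} \sim \chi^2_{N-1}.
```

The mean thus performs a random walk whose step size is set by the current variance, while the variance is multiplied each generation by an independent $\chi^2_{N-1}/(N-1)$ factor. Because $\mathbb{E}\left[\log\left(\chi^2_{N-1}/(N-1)\right)\right] < 0$, the log-variance drifts downward over generations, so the fitted distribution narrows even though each individual variance estimate is unbiased.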

As the generations proceed, even after the first generation the complete distribution of the samples is no longer normal; once the randomness of the fitted parameters is accounted for, it becomes a variance-gamma distribution.

Although this exploration may seem purely theoretical, its significance lies in providing a tool for understanding and evaluating the changes between generations. With these models, researchers can calculate the expected mean and variance at each generation and thereby track the dynamics of model collapse.
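The recursive Gaussian setup described above can be sketched as a minimal simulation (the sample size, generation count, and seed below are arbitrary choices for illustration, not values from the original study):

```python
import numpy as np

def simulate_collapse(n_samples=10, n_generations=500, seed=0):
    """Recursively fit a 1D Gaussian to its own samples.

    Each generation estimates the mean and variance (with the unbiased,
    ddof=1 estimator) of the previous generation's samples, then draws
    fresh samples from the fitted Gaussian -- the toy model of collapse.
    Returns the sample standard deviation observed at each generation.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, n_samples)      # generation 0: "real" data
    stds = [data.std(ddof=1)]
    for _ in range(n_generations):
        mu = data.mean()                        # fitted mean
        sigma = data.std(ddof=1)                # fitted std (unbiased variance)
        data = rng.normal(mu, sigma, n_samples) # next generation: synthetic data
        stds.append(data.std(ddof=1))
    return stds

stds = simulate_collapse()
print(f"initial std: {stds[0]:.3f}, final std: {stds[-1]:.3f}")
```

Running this with a small sample size shows the variance shrinking dramatically across generations: the distribution drifts and narrows until almost all of the original variability is gone, which is exactly the late-collapse behavior described above.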

As machine learning models become ever more widespread, one question deserves further thought: will future generative AI successfully cope with the challenge of model collapse, or will it unknowingly slide into a deeper dilemma?
