The IBM success story: How did statistical machine translation regain attention in the 1980s?

Statistical machine translation (SMT) is a machine translation method that generates translations using statistical models whose parameters are learned from the analysis of bilingual text corpora. The basic ideas behind SMT have continued to evolve since Warren Weaver first proposed them in 1949. In the late 1980s, researchers at IBM's Thomas J. Watson Research Center brought the approach back into the spotlight and developed it further. This resurgence came from combining concepts from information theory with advances in computing power, which made SMT practical for a wider range of languages.
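The information-theoretic framing the IBM group adopted is commonly summarized as the noisy-channel model: a foreign sentence f is treated as an English sentence e that passed through a noisy channel, and translation becomes decoding. As a sketch (the notation here is the standard textbook formulation, not quoted from this article):

```latex
\hat{e} = \operatorname*{arg\,max}_{e} P(e \mid f)
        = \operatorname*{arg\,max}_{e} P(e)\, P(f \mid e)
```

where P(e) is the language model, responsible for fluency, and P(f | e) is the translation model, estimated from the bilingual corpus and responsible for adequacy.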

Statistical machine translation can utilize large amounts of bilingual and monolingual data to improve the fluency and accuracy of translation.

The advantage of SMT is that the translation model is not based on explicit linguistic rules; instead, it automatically learns the mapping between languages through statistical analysis of large corpora. This method therefore makes more efficient use of human and data resources than traditional rule-based translation systems. In addition, because SMT systems are generally not hand-tuned for a specific language pair, they are more flexible and scalable in application.

The fluency of statistical machine translation largely comes from the underlying language model.

However, statistical machine translation is not perfect. Corpora are expensive to build, specific errors are hard to predict and correct, and output can read fluently while concealing underlying translation errors. Between language pairs with very different structures, SMT may fall short of expectations, which is especially evident for language pairs beyond the Western European languages.

The earliest word-based translation models took the single word as the basic unit of translation. Because languages differ in structure, translated sentences often differ in length from their sources, which makes the "fertility" of each word (how many target words it produces) difficult to handle flexibly. Word-based translation copes poorly with fertility mismatches between languages: it cannot map two English words onto a single French word, even when that would be the natural translation.
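A toy sketch can make the limitation concrete. The lexicon below is hypothetical and this is not an IBM model; it only illustrates that when every source word is translated independently and in place, multiword mappings and reordering are lost:

```python
# Toy word-for-word translation (illustrative only, not an IBM model).
# Each source word maps independently to exactly one target string,
# so many-to-one mappings and reordering cannot be expressed.
lexicon = {  # hypothetical English -> French entries
    "the": "le",
    "green": "vert",
    "house": "maison",
}

def word_for_word(sentence):
    # Translate each word in isolation, keeping the source order.
    return " ".join(lexicon.get(w, w) for w in sentence.split())

print(word_for_word("the green house"))
# -> "le vert maison", whereas natural French is "la maison verte"
```

The output preserves English word order and English-style agreement, which is exactly the class of error fertility- and alignment-based refinements were introduced to address.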

Phrase-based translation attempts to overcome the limitations of word-based translation by translating entire word sequences, allowing more flexible mappings between languages.

Phrase-based translation introduces a different framework: it translates "phrases" extracted from the corpus using statistical methods. This approach is more flexible and eases the constraints on individual words and word order. Phrases are mapped directly through a translation table and may be reordered during decoding, thereby improving the quality of the output.
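A minimal sketch of the idea, assuming a hypothetical hand-written phrase table (a real system learns the table with probabilities from aligned corpora and adds a reordering model, none of which is shown here):

```python
# Toy phrase-based decoder sketch: greedily match the longest known
# source phrase at each position and emit its target side.
phrase_table = {  # hypothetical English -> French phrase pairs
    ("the", "green", "house"): "la maison verte",
    ("green", "house"): "maison verte",
    ("the",): "la",
}

def translate(sentence):
    words, out, i = sentence.split(), [], 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in phrase_table:
                out.append(phrase_table[key])
                i = j
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)

print(translate("the green house"))  # -> "la maison verte"
```

Because the whole sequence "the green house" matches one phrase pair, the reordering and agreement ("la maison verte") come for free from the table entry, which word-for-word translation cannot achieve.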

In the 1980s and 1990s, IBM's research continued to develop, taking syntactic structure into account and integrating context into translation. The statistical machine translation models of this period gradually established multi-level language understanding, marking a qualitative change in translation technology.

The language model is an indispensable component of a statistical machine translation system; it helps improve the fluency of the translation.
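As a sketch of how a language model contributes fluency, here is a minimal bigram model with add-alpha smoothing trained on a tiny invented monolingual corpus (real systems use far larger corpora and higher-order smoothed models):

```python
# Minimal bigram language model: count bigrams in a tiny monolingual
# corpus, then score candidate sentences so fluent word orders win.
from collections import Counter

corpus = ["la maison verte", "la maison bleue", "la porte verte"]

bigrams, contexts = Counter(), Counter()
for line in corpus:
    toks = ["<s>"] + line.split() + ["</s>"]
    contexts.update(toks[:-1])          # count each context token
    bigrams.update(zip(toks, toks[1:])) # count adjacent pairs

def score(sentence, alpha=0.1, vocab=20):
    """Add-alpha-smoothed bigram probability of a sentence."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for a, b in zip(toks, toks[1:]):
        p *= (bigrams[(a, b)] + alpha) / (contexts[a] + alpha * vocab)
    return p

# A fluent ordering scores higher than a scrambled one.
assert score("la maison verte") > score("verte maison la")
```

In the noisy-channel setup, this P(e) term is what steers the decoder toward candidates that read naturally, independently of the translation table.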

Over time, well-known translation systems such as Google Translate and Microsoft Translator upgraded their underlying technology and transitioned to deep-learning-based neural machine translation, marking the gradual obsolescence of statistical machine translation. Yet the historical significance of SMT remains: it laid the foundation for subsequent technological advances and drove a leap forward in the field of translation.

Now, as we look back at the history of this technology, we can’t help but wonder, with the rapid development of artificial intelligence, how will machine translation technology evolve further in the future?
