
Google AI Introduces Translatotron 2 For Robust Direct Speech-To-Speech Translation

by DeepTech Central
September 30, 2021
in Artificial Intelligence

The Natural Language Processing (NLP) domain is growing rapidly across many applications, including search engines, machine translation, chatbots, and home assistants. One such application, speech-to-speech translation (S2ST), breaks down language barriers by allowing speakers of different languages to communicate directly, making it extremely valuable for science and cross-cultural exchange.

Automatic S2ST systems are typically made up of a series of subsystems for speech recognition, machine translation, and speech synthesis. However, such cascade systems may experience longer latency, information loss (particularly paralinguistic and non-linguistic information), and compounding errors between subsystems.
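The cascade pipeline described above can be sketched as follows. The three subsystem functions are illustrative stand-ins rather than real ASR, MT, and TTS models; the sketch only shows where the text bottleneck sits and why errors compound across stages.

```python
# A minimal sketch of a cascade S2ST pipeline. All three subsystems are
# hypothetical placeholders standing in for real ASR, MT, and TTS models.

def recognize_speech(audio: bytes) -> str:
    # Placeholder ASR: would transcribe source-language audio to text.
    return "hola mundo"

def translate_text(text: str) -> str:
    # Placeholder MT: would translate the transcript into the target language.
    return {"hola mundo": "hello world"}.get(text, text)

def synthesize_speech(text: str) -> bytes:
    # Placeholder TTS: would render target-language audio from text.
    return text.encode("utf-8")

def cascade_s2st(audio: bytes) -> bytes:
    # Each stage consumes only the previous stage's text output, so a
    # misrecognized word propagates through MT and TTS, and paralinguistic
    # information (voice, emotion, prosody) is lost at the text bottleneck.
    transcript = recognize_speech(audio)
    translation = translate_text(transcript)
    return synthesize_speech(translation)

print(cascade_s2st(b"..."))  # b'hello world'
```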


In 2019, Google AI introduced Translatotron, the first model to translate speech directly between two languages. This direct S2ST model could be trained end-to-end and had the unique ability to retain the source speaker’s voice (non-linguistic information) in the translated speech. Despite its ability to produce natural-sounding, high-fidelity translated speech, it still underperformed a strong baseline cascade S2ST system.

Google’s recent study presents an improved version of Translatotron that significantly enhances performance. Translatotron 2 employs a new method for transferring the source speakers’ voices to the translated speech. The updated approach to voice transference succeeds even when the input speech involves multiple speakers speaking in turn, while also reducing the potential for misuse and better aligning with Google’s AI Principles.

Translatotron 2 Architecture

The main components of this new model are:

  • A speech encoder.
  • A target phoneme decoder.
  • A target speech synthesizer.
  • An attention module that connects them all.

Together, the encoder, attention module, and decoder are comparable to a traditional direct speech-to-text translation (ST) model.

Source: https://ai.googleblog.com/2021/09/high-quality-robust-and-responsible.html

The key changes made in Translatotron 2 are listed below:

  • The output of the target phoneme decoder is one of the inputs to the spectrogram synthesizer. This strong conditioning makes Translatotron 2 easier to train and improves its performance.
  • The spectrogram synthesizer is duration-based, which markedly improves the robustness of the synthesized speech.
  • The attention-based connection is driven by the phoneme decoder instead of the spectrogram synthesizer. This aligns the acoustic information the synthesizer sees with the translated content it is synthesizing, allowing each speaker’s voice to be preserved across speaker turns.
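As a rough illustration of how these components connect, the dataflow can be sketched as below. This is only a wiring diagram in code with made-up classes and toy values, not the actual model: the phoneme decoder attends over the encoder states, and the duration-based synthesizer is conditioned on both the decoded phonemes and the attention context.

```python
# Dataflow sketch of Translatotron 2's components (illustrative only).

class SpeechEncoder:
    def encode(self, audio_frames):
        # Would produce one hidden state per input frame, capturing both
        # linguistic content and voice characteristics.
        return [("h", f) for f in audio_frames]

class PhonemeDecoder:
    def decode(self, encoder_states):
        # Would autoregressively emit target-language phonemes while
        # attending over the encoder states; the attention context it
        # computes is reused by the synthesizer downstream.
        phonemes = ["HH", "EH", "L", "OW"]  # toy output
        context = encoder_states            # stand-in for attention context
        return phonemes, context

class SpectrogramSynthesizer:
    def synthesize(self, phonemes, context):
        # Duration-based synthesis: each phoneme expands into a fixed number
        # of frames here (a real model predicts per-phoneme durations), and
        # each frame is conditioned on the attention context.
        frames_per_phoneme = 2
        return [(p, i, len(context))
                for p in phonemes
                for i in range(frames_per_phoneme)]

encoder, decoder, synth = SpeechEncoder(), PhonemeDecoder(), SpectrogramSynthesizer()
states = encoder.encode([0.1, 0.2, 0.3])
phonemes, context = decoder.decode(states)
spectrogram = synth.synthesize(phonemes, context)
print(len(spectrogram))  # 8 frames: 4 phonemes x 2 frames each
```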

Strong Voice Retention

The original Translatotron preserved the source speaker’s voice in the translated speech by conditioning its decoder on a speaker embedding produced by a separately trained speaker encoder. However, if a clip of a different speaker’s recording was supplied as the reference audio to the speaker encoder, or if that speaker’s embedding was available directly, this approach could generate the translated speech in a voice other than the source speaker’s. This had the potential to be abused to spoof audio with arbitrary content.

With this in mind, Translatotron 2 is built with a single speech encoder that handles both linguistic understanding and voice capture, which prevents the trained model from reproducing any voice other than the source speaker’s.

The researchers used a modified version of PnG NAT, a TTS model capable of cross-lingual voice transference. The modified PnG NAT model incorporates a separately trained speaker encoder, enabling zero-shot voice transference.

Furthermore, they propose ConcatAug, a simple concatenation-based data augmentation technique that lets S2ST models preserve each speaker’s voice in the translated speech when the input contains multiple speakers speaking in turn. The method augments the training data on the fly by randomly picking pairs of training examples and concatenating their source speech, target speech, and target phoneme sequences into new examples. Because these new samples contain two speakers’ voices in both the source and target speech, the model can learn from examples with speaker turns.
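A minimal sketch of ConcatAug as described above; the field names and toy values are illustrative, not those used in the paper:

```python
import random

# ConcatAug sketch: randomly pair two training examples and concatenate
# their source speech, target speech, and target phoneme sequences to form
# a new example containing a speaker turn. Field names are hypothetical.

def concat_aug(dataset, rng=random):
    a, b = rng.sample(dataset, 2)
    return {
        "source_speech": a["source_speech"] + b["source_speech"],
        "target_speech": a["target_speech"] + b["target_speech"],
        "target_phonemes": a["target_phonemes"] + b["target_phonemes"],
    }

# Two toy single-speaker examples (speech represented as frame lists).
examples = [
    {"source_speech": [0.1], "target_speech": [0.2], "target_phonemes": ["AH"]},
    {"source_speech": [0.3], "target_speech": [0.4], "target_phonemes": ["B"]},
]
augmented = concat_aug(examples)
print(len(augmented["target_phonemes"]))  # 2: one phoneme from each speaker
```

Applied on the fly during training, this yields a mix of original single-speaker examples and synthetic two-speaker examples.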

Audio samples (the Spanish input, a TTS-synthesized English reference, and Translatotron 2’s English predictions with and without ConcatAug) are available in the original blog post: https://ai.googleblog.com/2021/09/high-quality-robust-and-responsible.html

In tests on three different corpora, Translatotron 2 consistently outperforms the original Translatotron in translation quality, speech naturalness, and speech robustness. It performed especially well on the challenging Fisher corpus.


The researchers also evaluated the model in a multilingual setup, in which it translated speech from four different languages into English. The language of the input speech was not provided, so the model had to identify it on its own. On this task, Translatotron 2 outperformed the original Translatotron by a wide margin, with translation quality comparable to a baseline speech-to-text translation model. These findings demonstrate that Translatotron 2 is highly effective at multilingual S2ST.


Paper: https://arxiv.org/abs/2107.08661

Source: https://ai.googleblog.com/2021/09/high-quality-robust-and-responsible.html

The post Google AI Introduces Translatotron 2 For Robust Direct Speech-To-Speech Translation appeared first on MarkTechPost.
