Meta’s AI-powered audio codec promises 10x compression over MP3

by admin
A diagram of speech waveform data.

Meta AI

Last week, Meta announced an AI-powered audio compression method called “EnCodec” that can reportedly compress audio to one-tenth the size of a 64kbps MP3 with no loss in quality. According to Meta, the technique could dramatically improve the sound quality of speech over low-bandwidth connections, such as phone calls in areas with unreliable service. The technique also works for music.
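To put the headline figure in concrete terms, here is a rough back-of-the-envelope calculation. It assumes that “10x” means one-tenth of the MP3 bitrate, that 1 kbps is 1,000 bits per second, and that both streams run at a constant bitrate. The three-minute clip length and the function name are illustrative and do not come from Meta.

```python
def file_size_bytes(bits_per_second: int, seconds: int) -> int:
    """Size of a constant-bitrate stream in bytes (8 bits per byte)."""
    return bits_per_second * seconds // 8


MP3_BPS = 64_000            # 64 kbps, taking 1 kbps = 1,000 bits per second
TARGET_BPS = MP3_BPS // 10  # assumption: "10x" means one-tenth the bitrate
CLIP_SECONDS = 180          # hypothetical three-minute clip

mp3_size = file_size_bytes(MP3_BPS, CLIP_SECONDS)
target_size = file_size_bytes(TARGET_BPS, CLIP_SECONDS)

print(f"MP3 at 64 kbps:      {mp3_size:,} bytes")
print(f"One-tenth the rate:  {target_size:,} bytes")
```

Under those assumptions, the same clip drops from about 1.44 MB to about 144 KB.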

Meta debuted the technology on October 25 in a paper titled “High Fidelity Neural Audio Compression,” authored by Meta AI researchers Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. Meta also summarized the research on its blog devoted to EnCodec.

Meta claims its new audio encoder/decoder can compress audio to one-tenth the size of an MP3.

Meta AI

Meta describes its method as a three-part system trained to compress audio to a desired target size. First, the encoder transforms uncompressed data into a lower frame rate “latent space” representation. The “quantizer” then compresses the representation to the target size while keeping track of the most important information that will later be used to rebuild the original signal. (This compressed signal is what gets sent across a network or saved to disk.) Finally, the decoder turns the compressed data back into audio in real time using a neural network on a single CPU.
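The three stages can be illustrated with a deliberately simple toy pipeline. This is not Meta’s implementation. Block averaging stands in for the neural encoder, a uniform scalar quantizer stands in for EnCodec’s quantizer, and sample repetition stands in for the neural decoder. All function names and parameters here are hypothetical.

```python
import math


def encode(samples, hop):
    """Toy 'encoder': average each block of `hop` samples into one latent
    value, giving a representation with a lower frame rate."""
    blocks = (samples[i:i + hop] for i in range(0, len(samples), hop))
    return [sum(block) / len(block) for block in blocks]


def quantize(latents, bits):
    """Toy 'quantizer': map each latent in [-1, 1] to an integer code
    that fits in `bits` bits. These codes are what would be stored or sent."""
    levels = 2 ** bits - 1
    clamped = (min(1.0, max(-1.0, z)) for z in latents)
    return [round((z + 1.0) / 2.0 * levels) for z in clamped]


def decode(codes, bits, hop, length):
    """Toy 'decoder': turn codes back into latents, then repeat each
    latent `hop` times to return to the original sample rate."""
    levels = 2 ** bits - 1
    out = []
    for code in codes:
        out.extend([code / levels * 2.0 - 1.0] * hop)
    return out[:length]


# A 5 Hz sine sampled at 1,000 Hz for one second (values in [-1, 1]).
signal = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]

codes = quantize(encode(signal, hop=10), bits=4)
restored = decode(codes, bits=4, hop=10, length=len(signal))

max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"{len(signal)} samples -> {len(codes)} codes of 4 bits each")
print(f"largest reconstruction error: {max_error:.3f}")
```

The toy version is lossy in the same basic way: the decoder only ever sees the small integer codes, so the quality of the result depends on how much useful information the encoder and quantizer manage to pack into them.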

Block diagram showing how Meta’s EnCodec compression works.

Meta AI

Meta’s use of discriminators proves key to compressing audio as much as possible without losing the important elements of the signal.

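According to Meta, the key to lossy compression is to identify changes that humans cannot perceive, because perfect reconstruction is not possible at low bit rates. To that end, the researchers use discriminators to improve the perceptual quality of the generated samples. This creates a cat-and-mouse game in which the discriminator’s job is to distinguish between real and reconstructed samples, while the compression model tries to generate samples that fool the discriminator by making them perceptually similar to the originals.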
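The two opposing objectives can be sketched with a generic GAN-style loss. The article does not say which loss Meta uses, so the binary cross-entropy form below is an assumption chosen for simplicity. The scores are made-up numbers, not measurements.

```python
import math


def discriminator_loss(score_real: float, score_fake: float) -> float:
    """Low when real audio scores near 1 and reconstructed audio near 0."""
    return -math.log(score_real) - math.log(1.0 - score_fake)


def codec_adversarial_loss(score_fake: float) -> float:
    """Low when the discriminator mistakes reconstructed audio for real."""
    return -math.log(score_fake)


# Hypothetical discriminator outputs in (0, 1): "probability this is real".
early = {"real": 0.9, "fake": 0.1}  # reconstruction is easy to spot
later = {"real": 0.6, "fake": 0.5}  # reconstruction is hard to tell apart

for name, s in (("early", early), ("later", later)):
    d_loss = discriminator_loss(s["real"], s["fake"])
    c_loss = codec_adversarial_loss(s["fake"])
    print(f"{name}: discriminator loss {d_loss:.3f}, codec loss {c_loss:.3f}")
```

As the reconstructions become harder to tell apart from real audio, the codec’s loss falls and the discriminator’s loss rises, which is the tug-of-war described above.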

Using neural networks to compress and decompress audio is far from new, especially for speech compression, but Meta’s researchers claim to be the first group to apply the technique to 48 kHz stereo audio (slightly better than CD’s 44.1 kHz sampling rate), which is typical of music files distributed over the Internet.
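For a sense of scale, the raw data rate of uncompressed audio is the sample rate times the bit depth times the number of channels. The sketch below assumes 16-bit samples, which is the CD standard but is not stated in the article for the 48 kHz case.

```python
def pcm_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


BIT_DEPTH = 16  # assumption: 16-bit samples, as used on audio CDs

stereo_48k = pcm_bits_per_second(48_000, BIT_DEPTH, channels=2)
cd_audio = pcm_bits_per_second(44_100, BIT_DEPTH, channels=2)

print(f"48 kHz stereo:   {stereo_48k / 1000:,.1f} kbps")
print(f"44.1 kHz stereo: {cd_audio / 1000:,.1f} kbps")
```

At that bit depth, 48 kHz stereo runs at roughly 1,536 kbps before any compression, compared with about 1,411 kbps for CD audio.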

In terms of applications, Meta says this AI-powered “ultra-compression of voice” can support “faster, higher-quality calls” even in poor network conditions. And of course, being Meta, the researchers also mention EnCodec’s implications for the metaverse, saying the technology could eventually deliver a “rich metaverse experience without the need for significant bandwidth improvements.”

Beyond that, one day we may also get really small music audio files. For now, Meta’s new technology remains in the research stage, but it points to a future where high-quality audio can use less bandwidth. That would be great news for mobile broadband providers whose networks are overloaded by streaming media.
