
Solving streaming’s mumbling problem


Welcome to Lowpass! This week: Machine learning may help us to watch British TV shows without subtitles.

Say what: How Netflix and Sonos are trying to solve streaming’s mumbling problem

You know the feeling: You’re glued to your favorite show, watching a pivotal scene. Two major characters are talking to each other, relaying crucial pieces of information, perhaps foreshadowing what’s going to happen next. Only, one of them mumbles. Then, before you know it, a quick cut to the next scene, and you’re left wondering: What did he just say?

Intelligibility has always been an issue for movie and TV show fans, including the estimated 72 million Americans with some level of hearing loss. But with streaming, the problem has arguably gotten worse. Not only are people watching more movies at home as opposed to in theaters, but we’re also all watching a much wider variety of content: Overseas shows with challenging accents, YouTube clips with terrible audio quality, and whatever Christopher Nolan is trying to achieve with his mixes.

The problem has gotten so bad that half of all people queried in a recent survey said they used subtitles "most of the time." Does that number sound improbably high? I thought so too, but another survey puts the share of 18- to 24-year-olds who watch with subtitles some or all of the time even higher, at 80 percent.

In other words: Streaming has a mumbling problem. Now, a growing number of companies are trying to find a fix.

How Netflix is pushing for better mixes

Among these is Netflix, which developed its own speech intelligibility measurement system to gauge how hard it is to understand the dialogue in a particular movie or TV episode. In a nutshell, the system separates speech from background sounds and music to quantify how distracting the non-speech audio is at any given point.

Dialogue Intelligibility, as measured in the first episode of Netflix's The Diplomat.
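To make the idea more concrete, here is a toy Python sketch of one way such a score could be computed. It assumes the speech and the background have already been separated into two mono sample lists (the separation step itself is not shown), computes a per-frame speech-to-background ratio in decibels, and flags frames where the background comes within a chosen margin of the dialogue. The function names, the 1,024-sample frame size and the 10 dB threshold are hypothetical illustration choices, not Netflix's or Fraunhofer's actual method.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def speech_to_background_db(speech, background, frame_len=1024, floor=1e-9):
    """Per-frame speech-to-background ratio in dB (higher = clearer dialogue).

    Assumes speech and background are already-separated mono sample lists.
    """
    n = min(len(speech), len(background))
    ratios = []
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        s = frame_rms(speech[start:end])
        b = frame_rms(background[start:end])
        # The small floor avoids division by zero and log10(0) on silence.
        ratios.append(20 * math.log10((s + floor) / (b + floor)))
    return ratios

def flag_hard_frames(ratios, threshold_db=10.0):
    """Indices of frames where the background is within threshold_db of speech."""
    return [i for i, r in enumerate(ratios) if r < threshold_db]

# Usage: steady dialogue, with quiet music that swells in the second frame.
speech = [1.0] * 2048
background = [0.1] * 1024 + [1.0] * 1024
ratios = speech_to_background_db(speech, background)
print(flag_hard_frames(ratios))  # prints [1]
```

A real meter would be far more elaborate; among other things, it would skip frames that contain no dialogue at all rather than scoring them.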

Of course, just measuring the problem doesn't really fix it. That's why the streamer teamed up with the Fraunhofer Institute and Nugen Audio to build software that helps audio engineers measure dialogue intelligibility when it matters most: during the actual mixing process.

In addition to integrating Fraunhofer's machine learning-based Dialogue Intelligibility Meter into popular audio workstation software, the three companies also collaborated on a new tool dubbed DialogCheck.

“For an audio engineer who might have been working on a scene for hours or even days, hearing the same dialog hundreds of times, it’s a challenge to predict how a mix will sound to someone hearing it for the first time,” Nugen Audio explains on its website. DialogCheck is supposed to help engineers by providing a second opinion aided by data, according to the company.

Tackling mumbling in the living room

Giving audio engineers the tools for better mixes going forward is great. But what about the thousands of movies and TV shows already available on streaming services, or YouTube clips that will never benefit from professional mixing? Smart speaker maker Sonos has its own answer to the mumbling problem: The company recently updated its new Arc Ultra soundbar with a machine learning-based speech enhancement feature.

Work on that solution began three years ago, I was recently told by Sonos Sound Experience Lead Paul Peace. As a first step, Sonos identified 16 different factors that could impact dialogue intelligibility. “It could be things that happen within the mix itself,” Peace said. “It could be the environment you're in. It could be some sort of impairment, or hearing loss itself.” What’s more, in many cases, there isn’t just a single culprit, but any combination of those 16 factors.

As a next step, Sonos teamed up with the British Royal National Institute for Deaf People to conduct listening studies. “That became invaluable in understanding what we should do,” Peace said. “One of the main things that came out of that was the fact that there's not a one-size-fits-all solution. Each listener has a different situation and multiple choices of correction may be required.”

With that in mind, Sonos developed four different levels for its new speech enhancement feature, including a Max level specifically designed for people with significant hearing loss.

The new speech enhancement options for the Sonos Arc Ultra, as available in the company’s iOS app.

Hip-Hop can be challenging

Sonos then created a machine learning model capable of identifying speech in the center channel of an audio mix – which can be surprisingly difficult. The company trained the model on more than 1,000 hours of material, ranging from TV shows to Nolan movies to YouTube content. "That was a big basis for the two-channel work we did," Peace said of the user-generated content (UGC) in that training set.

Armed with that knowledge, the Arc Ultra can now identify dialogue in real time, and then decide how to best handle it. Sometimes, that means not doing anything at all. “We've figured out a way on center channel content to detect an action scene,” Peace explained. “When we detect an action scene, we dial back all of the processing. [That’s] because [in] typically action scenes, even if there is speech, it isn't story dialogue. It’s usually an expletive of some kind, or something to that nature that doesn't have any story content.”

Peace readily admitted that the system isn’t perfect. Edge cases include music performances that sound like speech. “If you're doing any sort of Hip-Hop, where there is a dialogue element to it, and it's not really vowel heavy, it can get fooled,” he said. “Definitely, that's a struggle.”

However, detection is only one part of the puzzle. Acting on detected dialogue in a smart way is also important, and just cranking up the volume isn’t it. “The trick is to actually pull away masking energy out of all the other content in a very sophisticated manner, just to give the dialogue space to breathe,” Peace said.

“When we see that dialogue is present, there is action taken on all of the other channels, and we can dial that in for each level,” he added.
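A heavily simplified Python sketch of the behavior Peace describes might look like the following. It assumes a speech detector and an action-scene classifier already exist (represented here only as two boolean flags), leaves the dialogue channel alone, and turns down every other channel by an amount that depends on the chosen enhancement level. The `Block` type, the `enhance` function and the decibel values are all hypothetical, and a flat per-channel gain is far cruder than the "very sophisticated" removal of masking energy Peace describes; this illustrates the idea, not Sonos' implementation.

```python
from dataclasses import dataclass

# Hypothetical attenuation (dB) for non-dialogue channels at each enhancement
# level; level 4 stands in for Sonos' "Max" setting. All values are invented.
DUCK_DB = {1: 2.0, 2: 4.0, 3: 6.0, 4: 9.0}

@dataclass
class Block:
    channels: dict[str, list[float]]  # channel name -> samples for this block
    dialogue_detected: bool           # output of an assumed speech detector
    action_scene: bool                # output of an assumed scene classifier

def db_to_gain(db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude multiplier."""
    return 10 ** (-db / 20)

def enhance(block: Block, level: int,
            dialogue_channel: str = "center") -> dict[str, list[float]]:
    # No dialogue, or an action scene: pass the audio through untouched.
    if not block.dialogue_detected or block.action_scene:
        return {name: list(s) for name, s in block.channels.items()}
    gain = db_to_gain(DUCK_DB[level])
    out = {}
    for name, samples in block.channels.items():
        if name == dialogue_channel:
            out[name] = list(samples)  # dialogue itself is not boosted
        else:
            out[name] = [x * gain for x in samples]  # turn down competing content
    return out

# Usage: dialogue detected outside an action scene, enhancement at level 4.
block = Block({"center": [0.5], "left": [0.8], "right": [0.8]}, True, False)
print(enhance(block, level=4))
```

Note that nothing is made louder: in line with Peace's point that cranking up the volume isn't the answer, the sketch only clears space around the dialogue.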

AI to the rescue

Sonos included the new speech enhancement in a recent software update for its Arc Ultra soundbar. The response so far has been positive, with Reddit users calling it "immediately noticeable," "night and day" and "very impress[ive]." Sonos won't be bringing the feature to legacy devices, but the company is looking at bringing it to future hardware.

More broadly, initiatives like Netflix’s dialogue intelligibility measurements and Sonos’ speech enhancement tech show that there’s room for machine learning and AI beyond the flashy headlines. Used correctly, these technologies can make streaming more accessible, improve the experience for everyone – and perhaps, one day, even get us to turn off those subtitles.

Enjoy reading stories like this one? Then please consider upgrading to the $8 a month / $80 a year paid tier to support my reporting, and get access to the full Lowpass newsletter every week.


What else

CoComelon is moving to Disney+. Netflix will lose its most popular kids TV title in 2027, but clips will continue to stream on YouTube as well.

Stellantis and Amazon part ways. A partnership between the two companies that would have included Amazon powering in-car software in Jeep and Chrysler vehicles is winding down; the two companies had previously partnered to bring Fire TV software to select cars.

Spotify is putting an even bigger emphasis on podcasts. The music service is adding new podcast-specific features to its app that are meant to help listeners discover new shows.

Niantic is ditching gaming for AI. Not loving the AI framing, but this feature-length story about Niantic's future following its sale of Pokémon Go is nonetheless worth a read.

Apple plans to launch a new gaming app. The app will reportedly be introduced at WWDC, and launch on iPhones, iPads and Apple TVs later this year.

Reed Hastings joins the board of an AI company. The former CEO of Netflix has been appointed to the board of directors of Anthropic.


That’s it

I can’t believe it’s almost June already … speaking of which: I’ll be at the Stream TV Show in Denver in mid-June, moderating two panels. Feel free to say hi if you’re there as well!

Thanks for reading, have a great weekend!
