This new text-to-speech AI model understands what it's saying – how to try it for free

Text-to-speech AI models are a great tool for work that typically relies on human voice actors, such as audiobooks, dubbing, and commercials. However, because these models are not human and are unaware of what they are saying, they can sound noticeably robotic. Hume’s new AI model seeks to tackle this issue.

Octave

On Wednesday, Hume launched Octave, a text-to-speech large language model (LLM) with contextual awareness. According to the company, the LLM can use this awareness to adjust the tune, rhythm, and timbre of its speech to match the meaning of the words it is reading. For example, an AI-enabled voice can convey a sense of disgust when the sentence it is reading calls for it.

Beyond understanding the context of the text, the model can also take directions. Users can instruct it to be “calm”, “whispering”, “disgustful”, “angry”, and more. Hume says the advantage Octave has over a voice actor is that it can take on any voice or even invent a new one based on the user’s description.

For instance, Hume says a user could provide a prompt as simple as “wise wizard” or as complex as a combination of different accents, demographic groups, occupational roles, and more. Without a description, the model would invent a voice based on the script alone; with one, it would be steered by both the script and the description.

Testing the model

The user interface is easy to navigate, with one text box for Voice, in which you can describe exactly what you want the voice to sound like, and another for Script, in which you enter what you want the model to say. For my first test, I used the detailed pre-made prompts to see how it sounded.

Hume LLM for text-to-speech — Screenshot by Sabrina Ortiz/ZDNET

After I clicked “Generate”, Octave produced three voice results, and upon first listen I was impressed. Although I wasn’t convinced that the generations captured the “valley girl” sound, I was super-impressed with the intonations and inflections.

For my own prompt, I created a scenario in which the primary speaker is out of breath from running and in a hurry. The script read: “YAY I am almost at the finish line. I am so tired but am going to keep pushing because I am almost there. See you later! Byeeee.”

I was equally happy with these results. Octave mostly conveyed what I wanted, placing the right amount of excitement and adding pauses where someone exhausted from running would stop for breath. However, as in the prior example, the voice wasn’t exactly what I described. In this case, the speaker didn’t speak super-fast.

Overall, the model’s strength seems to be reproducing the nuances of human speech in its output. What often gives AI voices away is their monotony, which makes the output boring to listen to. With Octave, you can hear the reader’s emotions, whether frustration, defeat, or tiredness. Words like “ugh” have the length and breathing a human would use, creating an engaging experience.

How to access

There are different tiers for accessing the model. If you want to try it out, the free tier comes with a 10,000-character limit (around 10 minutes) and unlimited character voices. Beyond the free tier, there are six additional tiers, ranging from $3 to $900 per month, depending on access needs.

For example, the Starter tier is $3 per month and includes 30,000 characters (around 30 minutes), while the Business tier is $900 monthly for 10,000,000 characters (around 10,000 minutes). There is also an Enterprise option that can be customized to your needs. You can view all the offerings and get started on the Hume website.
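
As a rough check on those figures, the short Python sketch below works out what each quoted tier costs per minute of generated audio. The rate of 1,000 characters per minute is an assumption inferred from the estimates above (30,000 characters for around 30 minutes), not a number published by Hume, and the function and variable names are illustrative only.

```python
# Tier prices and character allowances as quoted in the article.
# ASSUMPTION: about 1,000 characters per minute of audio, inferred from
# the article's own estimates rather than from any official figure.
CHARS_PER_MINUTE = 1_000

tiers = {
    "Starter": {"usd_per_month": 3, "characters": 30_000},
    "Business": {"usd_per_month": 900, "characters": 10_000_000},
}


def cost_per_minute(usd_per_month, characters):
    """Return the approximate cost in USD of one minute of audio."""
    minutes = characters / CHARS_PER_MINUTE
    return usd_per_month / minutes


for name, tier in tiers.items():
    rate = cost_per_minute(tier["usd_per_month"], tier["characters"])
    print(f"{name}: about ${rate:.2f} per minute of audio")
```

Under that assumption, the Starter tier works out to about $0.10 per minute and the Business tier to about $0.09 per minute, so the higher tier mainly buys volume rather than a much lower unit price.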
