This new text-to-speech AI model understands what it's saying – how to try it for free

Text-to-speech AI models are a useful tool in situations where human voice actors are typically used, such as audiobooks, dubbing, and commercials. However, because these models are not human and are unaware of what they are saying, they can sometimes sound noticeably robotic. Hume’s new AI model seeks to tackle this issue.
Octave
On Wednesday, Hume launched Octave, a text-to-speech large language model (LLM) with contextual awareness. According to the company, the LLM can use this awareness to adjust the tune, rhythm, and timbre of its speech to match the meaning of the words it is reading. For example, an AI-enabled voice can convey a sense of disgust when the sentence it is reading calls for it.
Beyond understanding the context of the text, the model can also take directions. Users can instruct it to be “calm”, “whispering”, “disgustful”, “angry”, and more. Hume says the advantage Octave has over a voice actor is that it can take on any voice or even invent a new one based on the user description.
For instance, Hume says a user could provide a prompt as simple as “wise wizard” or as complex as a combination of different accents, demographic groups, occupational roles, and more. Essentially, without a description the model invents a voice based on the script alone; when prompted, it is steered by both the script and the description.
Testing the model
The user interface is easy to navigate, with one text box for Voice, in which you can describe exactly what you want the voice to sound like, and another for Script, in which you enter what you want the model to say. For my first test, I used the detailed pre-made prompts to see how it sounded.
After I clicked “Generate”, Octave produced three voice results, and on first listen I was impressed. Although I wasn’t convinced that the generations captured the “valley girl” sound, I was super-impressed with the intonations and inflections.
For my own prompt, I created a scenario where the primary speaker is out of breath from running and in a hurry. The script read: “YAY I am almost at the finish line. I am so tired but am going to keep pushing because I am almost there. See you later! Byeeee.”
I was equally happy with these results. Octave mostly conveyed what I wanted, placing the right amount of excitement and pausing where breaths would be taken if you were exhausted from running. However, as in the prior example, the voice wasn’t exactly what I described. In this case, the speaker didn’t speak super-fast.
Overall, the model’s strength seems to be reproducing the nuances of human speech in its output. What often gives AI voices away is their monotony, which makes the output dull to listen to. With Octave, you can hear the reader’s emotions, whether frustration, defeat, or tiredness. Words like “ugh” have the length and breathing a human would use, creating an engaging experience.
How to access
There are different tiers for accessing the model, including a free one with a 10,000-character limit (around 10 minutes) and unlimited character voices if you want to try it out. Beyond the free tier, there are six additional tiers, ranging from $3 to $900 per month, depending on access needs.
For example, the Starter tier is $3 per month and includes 30,000 characters (around 30 minutes), while the Business tier is $900 monthly for 10,000,000 characters (around 10,000 minutes). There is also an Enterprise option that can be customized to your needs. You can view all the offerings and get started on the Hume website.
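To put those tier figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the prices, character counts, and approximate minutes quoted above; the dictionary layout and function names are made up for illustration and have nothing to do with Hume's own tooling or billing logic.

```python
# Figures for the three tiers quoted in the article.
# The structure and names below are illustrative assumptions only.
TIERS = {
    "Free": {"usd_per_month": 0, "characters": 10_000, "approx_minutes": 10},
    "Starter": {"usd_per_month": 3, "characters": 30_000, "approx_minutes": 30},
    "Business": {"usd_per_month": 900, "characters": 10_000_000, "approx_minutes": 10_000},
}


def usd_per_1k_characters(tier_name: str) -> float:
    """Monthly price divided by the monthly allowance, per 1,000 characters."""
    tier = TIERS[tier_name]
    return tier["usd_per_month"] / (tier["characters"] / 1000)


def characters_per_minute(tier_name: str) -> float:
    """Implied speaking rate from the article's 'around N minutes' estimates."""
    tier = TIERS[tier_name]
    return tier["characters"] / tier["approx_minutes"]


if __name__ == "__main__":
    for name in TIERS:
        print(
            f"{name}: ${usd_per_1k_characters(name):.2f} per 1,000 characters, "
            f"about {characters_per_minute(name):.0f} characters per minute"
        )
```

By this arithmetic, the Starter tier works out to roughly $0.10 per 1,000 characters and the Business tier to roughly $0.09, and every quoted estimate implies about 1,000 characters per minute of audio.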