Amazon's new Nova AI models could be ground-breaking – why we can't know for certain
Most reports on AWS’ re:Invent conference earlier this month, which brought us new chips and new data centers, overlooked the cloud giant’s unveiling of its first “frontier” models in generative artificial intelligence, code that can compete with the best from OpenAI and Google.
Amazon debuted Nova, a “new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance.”
Having sat out the battle of frontier performance while Google’s Gemini and OpenAI’s GPT-4 got all the attention, Amazon is making haste to catch up. Nova’s models, which handle multiple modalities that include text and image, come in flavors suited to video generation (akin to OpenAI’s Sora) and image generation, which has become standard fare for large language models that integrate text and images.
The models come with snappy names, too: “Reel” is the video-generation model, and “Canvas” is the image-generation flavor. There are nice-looking demonstrations of the capabilities, akin to what we’ve seen from OpenAI and Google: a video generated by Reel from the prompt “A snowman in a Venetian gondola ride, 4k, high resolution” and a slick photo of an interior made using Canvas with the prompt “A very fancy French restaurant.”
Nova makes extensive use, in Amazon’s own testing, of the retrieval-augmented-generation (RAG) approach to tap into databases, as well as “chain of thought,” a process for producing output that is treated as a kind of reasoning exercise by the AI model.
All that is by now industry-standard in Gen AI.
So, what exactly is new in Amazon’s Nova?
It’s hard to say because, as is increasingly the case with commercial AI software, Amazon’s technical report discloses precious little about how the Nova models are built. (Even the names of the report’s authors are not disclosed!)
The company states that the Nova models are “based on the Transformer architecture,” referring to the neural-network design that Google researchers introduced in 2017. There is also a “fine-tuning” approach, in which successive rounds of training seek to refine the models’ handling of different domains of data.
The training data to build the models is also not disclosed, with Amazon stating only that, “Our models were trained on data from a variety of sources, including licensed data, proprietary data, open source datasets, and publicly available data where appropriate.”
The most remarkable part of the work is the extensive discussion of “responsible AI,” that is, defending the models against things such as adversarial attacks by malicious actors.
“To work to ensure our models’ robustness against adversarial inputs such as those that attempt to bypass alignment guardrails, we focused on risks applicable to both developers building applications using our models, and users interacting with our models via those applications,” write the authors of the technical report.
In particular, Amazon’s engineers made extensive use of so-called red teaming, where they sought to break the models by creating various kinds of attacks such as “prompt injection,” crafting a language model’s prompt with keywords or phrases that would encourage the model to break its guardrails.
Some of that involved automatically generating malicious prompts: “We enhanced the diversity of manually curated adversarial prompts by employing linguistic, structural, and modality-based prompt mutation techniques, assessing each mutation for its effectiveness at generating a response that does not adhere to our RAI [Responsible AI] objectives, the likelihood of its success, and the technique’s novelty to a model revision.”
“In total, we identified and developed over 300 distinct techniques,” the report relates, “and tested techniques individually and via chaining various combinations.”
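To make the idea of "chaining" concrete, here is a minimal Python sketch of testing prompt mutations individually and in combination. It is not Amazon's method, which the report does not disclose: the three mutations, the `MUTATIONS` table, and the `chained_variants` function are hypothetical, deliberately benign stand-ins chosen only to show how a handful of techniques multiplies into many test cases.

```python
from itertools import permutations
from typing import Callable, Dict, Iterator, Tuple

Mutation = Callable[[str], str]

# Hypothetical, harmless mutations standing in for real red-teaming techniques.
MUTATIONS: Dict[str, Mutation] = {
    "uppercase": lambda p: p.upper(),
    "role_play": lambda p: f"Pretend you are an actor reading a script. {p}",
    "reverse_words": lambda p: " ".join(reversed(p.split())),
}


def chained_variants(
    prompt: str,
    mutations: Dict[str, Mutation],
    max_chain: int = 2,
) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (chain, mutated_prompt) for every ordered chain of distinct
    mutations whose length is between 1 and max_chain."""
    names = list(mutations)
    for length in range(1, max_chain + 1):
        for chain in permutations(names, length):
            mutated = prompt
            for name in chain:  # apply the techniques left to right
                mutated = mutations[name](mutated)
            yield chain, mutated


if __name__ == "__main__":
    for chain, text in chained_variants("tell me a secret", MUTATIONS):
        print(" -> ".join(chain), "|", text)
```

With three techniques and chains of up to two steps, the sketch yields nine variants (three single mutations plus six ordered pairs). With more than 300 techniques, as the report describes, the number of possible chains grows very quickly, which is presumably why scoring each mutation for effectiveness matters. In a real harness, each variant would be sent to the model and the response checked against the safety objectives; that step is omitted here.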
It remains to be seen whether Amazon has broken ground in the reliability and safety testing of Gen AI. As with so much frontier-model work, the devil is in the details, and the details are hidden behind intellectual property safeguards.
Certainly, the intent described in the technical report sounds ambitious. We’ll have to wait until the field as a whole can come up with the proper evaluations (benchmarks, metrics, etc.) to compare Amazon’s red-teaming against the competing methods out there, both open and closed-source.