Operator isn't worth its $200-per-month ChatGPT Pro subscription yet – here's why

ZDNET

This week, OpenAI is introducing a research preview called Operator. I initially wanted to do a hands-on, but once I found out that you need a Pro account (which costs $200 per month), I decided to watch the various OpenAI demos, share them with you, and then share my thoughts. OpenAI CEO Sam Altman did say that users of the $20-per-month Plus plan would eventually be able to use Operator.

Operator is an AI agent. Fundamentally, it simulates keyboard input and mouse clicks in a browser, reading the screen and performing actions.

I have a fairly long history of building this kind of app, using mostly algorithmic programming along with a little machine learning to identify the location of certain images on the screen.

My most recent project was an auto-posting tool that would make my social media posts for me. Yes, there are a plethora of subscription services that will do that for you, but I decided to see what it would take to build my own.

My code used a combination of the DOM (document object model) for individual social media service pages, along with image recognizers that were able to find buttons (like the + or Post buttons). I used the tool I built for about a year but ran into a very annoying snag.

About every two weeks, one of the six sites I was navigating made a small change to the screen interface, which proceeded to break my code. So every two weeks, instead of posting my social media posts normally, I had to spend a few hours fixing whatever had broken.

Operator could run into the same problem. The web is constantly changing (for example, a blue “Post” button might turn into a red “Post / Subscribe at 30% off” button during a promotion), and that kind of change might knock the AI off its game.
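To make the changing-button problem concrete, here is a minimal sketch. It assumes a page can be modeled as a plain list of button labels, which is a simplification I'm making for illustration. The function names are hypothetical and are not taken from my tool or from Operator.

```python
# Hypothetical illustration: a page is modeled as a list of button labels.
# None of these names come from a real automation tool or from Operator.

def find_exact(labels, target):
    """Return the index of the button whose label equals target, or None."""
    for i, label in enumerate(labels):
        if label == target:
            return i
    return None


def find_fuzzy(labels, target):
    """Return the index of the first button whose label contains target as a word."""
    wanted = target.lower()
    for i, label in enumerate(labels):
        words = label.lower().replace("/", " ").split()
        if wanted in words:
            return i
    return None


before = ["Home", "Post", "Settings"]
after = ["Home", "Post / Subscribe at 30% off", "Settings"]

print(find_exact(before, "Post"))  # 1
print(find_exact(after, "Post"))   # None: the exact match breaks
print(find_fuzzy(after, "Post"))   # 1: a looser match survives the change
```

The looser match survives this particular promotion, but it is still a hand-written rule. The next interface change could break it just as easily, which is the maintenance burden described above.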

Computer-using agent

The model OpenAI is using is called CUA, or computer-using agent. This model dictates how Operator talks to the websites it’s supposed to navigate.

In their introduction video, Sam Altman and OpenAI team members Yash Kumar, Casey Chu, and Reiichiro Nakano explained that Operator doesn’t use APIs and isn’t working off of extracted text pulled from the DOM. Instead, it’s “viewing” an actual web page in a live browser running in the cloud, reading the context directly off the screen.

They were very clear that the control mechanism for the web pages was mouse and keyboard simulation, and the input that the AI reads is the visual representation of the actual web page that we see as humans.
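The look-then-act cycle described in the video can be sketched roughly as follows. This is not OpenAI's implementation. `FakeBrowser`, `choose_action`, and `run` are hypothetical names, the "screenshot" is a short string standing in for an image, and the click coordinates are made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Stand-in for a cloud browser: screen out, keyboard and mouse in."""
    text_typed: str = ""
    clicks: list = field(default_factory=list)

    def screenshot(self):
        # A real agent would receive an image; a string describes the screen here.
        if self.clicks:
            return "results page"
        if self.text_typed:
            return "search box filled"
        return "empty search box"

    def type_text(self, text):
        self.text_typed += text

    def click(self, x, y):
        self.clicks.append((x, y))


def choose_action(screen, goal):
    """Hypothetical policy standing in for the model's decision."""
    if screen == "empty search box":
        return ("type", goal)
    if screen == "search box filled":
        return ("click", (640, 300))  # made-up coordinates of a Search button
    return ("done", None)


def run(browser, goal, max_steps=10):
    """Loop: look at the screen, pick one action, perform it, repeat."""
    history = []
    for _ in range(max_steps):
        kind, arg = choose_action(browser.screenshot(), goal)
        history.append(kind)
        if kind == "type":
            browser.type_text(arg)
        elif kind == "click":
            browser.click(*arg)
        else:
            break
    return history


browser = FakeBrowser()
print(run(browser, "lasagna recipe"))  # ['type', 'click', 'done']
```

The point of the sketch is the interface: the agent never calls a site API or queries the DOM. It only sees the screen and sends input events.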

The OpenAI team did say that Operator will work just like a human using a web browser: searching, clicking, and visiting websites. But there is a contradiction I haven’t fully figured out yet. OpenAI has partnered with a bunch of sites (Instacart, DoorDash, Etsy, OpenTable, Tripadvisor, AP, Priceline, StubHub, Thumbtack, Target, Uber, and more).

What do these partnerships do for Operator? Are they affiliate deals where OpenAI gets a kickback on any sales? Do they have an agreement to let Operator know if the website format has changed? Did OpenAI do additional modeling for those sites? Does it have some level of API access to the data those sites display on the web?

Until we have a better understanding of those answers, we won’t really know the scope of what Operator can do. All the demos shown were conducted using sites the company has partnered with, so it’s not clear, for example, that it could go into ZDNET and construct a list of my last 10 articles and email that to me using Gmail.

Right now, I get the impression that Operator is fairly shallow in what it can accomplish. One demo, for example, was able to look up a recipe on one site and then populate an Instacart shopping cart with the ingredient list.

There were demos that showed making a restaurant reservation, buying tickets to a basketball game, and so on. Each of these was a one- or two-site process, where data was found on one site and then applied to another.

opentable — Screenshot by David Gewirtz/ZDNET

Guardrails and privacy

OpenAI does appear to have given some serious consideration to issues of privacy and guardrails. For example, one demo showed the booking of four basketball tickets for a total of more than $1,000. It’s unlikely any of us would feel comfortable just letting the AI go ahead and spend that kind of cash on our behalf unsupervised.

Operator knows when to pause and ask for human intervention. Or at least, it’s supposed to. It’s still in beta, so it’s possible that it could run amok, just because it’s not quite finished.

But the key idea is simple: when the operations on a website are about to get sensitive (logging in, spending money, making reservations, checking out, etc.), Operator asks its human to confirm the operation.
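A confirmation gate of this kind could look something like the sketch below. The list of sensitive categories comes from the examples above, but the `SENSITIVE` set, the `perform` function, and the `confirm` callback are hypothetical names. How Operator actually decides when to pause has not been described in this level of detail.

```python
# Hypothetical categories, taken from the examples in the text.
SENSITIVE = {"login", "purchase", "reservation", "checkout"}


def perform(action, confirm):
    """Run an action, asking the human first if it is sensitive.

    `confirm` is a callable that returns True if the human approves.
    """
    if action in SENSITIVE and not confirm(action):
        return f"skipped {action}"
    return f"did {action}"


def approve_nothing(action):
    """A cautious human who declines every request."""
    return False


print(perform("search", approve_nothing))    # did search
print(perform("checkout", approve_nothing))  # skipped checkout
```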

Additionally, the human user can take control of the cloud-based browser window. According to OpenAI, when the human is controlling the browser, it acts like a private session, and nothing that takes place while the human is in control is fed back to the AI.

You can also opt out of allowing your website interactions to be used as training data for the AI.

Site-specific custom instructions

Operator allows you to create custom instructions on a site-by-site basis.

personalize — Screenshot by David Gewirtz/ZDNET

In the above example, pulled from the video below, the demonstrator wants to make sure that bookings on Priceline are fully refundable and include free breakfast. By placing that custom instruction in the website’s preferences, the AI agent will always consider that when performing a task on Priceline.
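One plausible way to picture this feature is a lookup keyed by site, with the saved instruction added to whatever task you give the agent. The sketch below assumes exactly that. `SITE_INSTRUCTIONS` and `build_prompt` are hypothetical, and OpenAI has not said how Operator stores or applies these preferences.

```python
from urllib.parse import urlparse

# Hypothetical per-site preferences, keyed by hostname.
SITE_INSTRUCTIONS = {
    "www.priceline.com": "Only fully refundable bookings with free breakfast.",
}


def build_prompt(task, url):
    """Prepend any saved instruction for the URL's site to the task."""
    host = urlparse(url).hostname
    extra = SITE_INSTRUCTIONS.get(host)
    return f"{extra}\n{task}" if extra else task


print(build_prompt("Book a hotel in Boston", "https://www.priceline.com/hotels"))
print(build_prompt("Find a pasta recipe", "https://www.example.com/recipes"))
```

The first call prints the Priceline instruction followed by the task. The second prints only the task, because no instruction is saved for that site.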

Additionally, Operator will allow you to save a task so you can rerun it or schedule it later.

saved — Screenshot by David Gewirtz/ZDNET

If you have a regular activity you’d like Operator to do for you, this is a quick way to ensure you can rerun your work when you want.

Baby steps

Operator feels very much like baby steps to me at this time. For example, I’d love to tell an AI to go through my inbox, find all the press releases, and assign them to one label (I’m using Gmail). Or find all the AI-related press releases and give them one label, while the rest of the press releases get another.

This is both a complex task and one that’s got quite a long runtime (I have 51,000 marketing pieces in my Promotions tab). As such, it’s way beyond the scope of what Operator can do.

But someday? Maybe.

I’m also trying to avoid the science fiction horror interpretation of all of this. There’s a little part of my brain yelling, “They’re letting the AI surf the Internet? Are they nuts?”

And yeah, tools like Operator (and even all the AIs that are trained on the Internet as a whole) are probably opening doors to some really bad things, especially if we ever do create sentient AIs. But for now, it’s an interesting exercise to see how well an AI succeeds at reading a recipe and ordering the ingredients from Instacart.

What do you think? When the price comes down to the $20-per-month range, do you see tasks you might assign to Operator? Does it worry you? Let us know your thoughts in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
