I spent hours testing ChatGPT Tasks – and its refusal to follow directions was mildly terrifying


ZDNET

Tasks is a new beta feature for the paid versions of ChatGPT that lets you schedule a prompt to run at a specific time. In this article, I’ll explain how the feature works. Then I’ll take you through the incredibly frustrating process of trying to get ChatGPT to do what you want with Tasks.

I hesitate to anthropomorphize the AI, but in this round of testing, ChatGPT has been singularly uncooperative. Rather than whining about it here, let’s first dig into this new feature. 

How tasks work in ChatGPT

Tasks are prompts that are triggered at a given point in time. They can run once or repeat. For example, you can say, “At 10:30 a.m. tomorrow, tell me the current weather,” and at 10:30 a.m. tomorrow ChatGPT will process the prompt “tell me the current weather” and then display a browser notification (if you have notifications enabled), send you an email, or both.
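OpenAI hasn’t published how Tasks works internally, so the following is only a hypothetical Python sketch of the behavior just described: a task pairs a prompt with a trigger time and either fires once or repeats. The `ScheduledTask` class and `next_run` function are illustrative names of my own, not part of any ChatGPT API, and the sketch handles only one-time and daily tasks.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Optional


@dataclass
class ScheduledTask:
    """Hypothetical model of a scheduled prompt (not OpenAI's actual data structure)."""
    prompt: str          # e.g. "tell me the current weather"
    run_at: time         # time of day the prompt should fire
    repeats_daily: bool  # False for a one-time task, True for "every day"
    start_date: date     # first day the task applies to (e.g. "tomorrow")


def next_run(task: ScheduledTask, now: datetime) -> Optional[datetime]:
    """Return when the task's prompt should next be processed, or None if it is finished."""
    first = datetime.combine(task.start_date, task.run_at)
    if not task.repeats_daily:
        # A one-time task fires at its single scheduled moment; once that has passed, it's done.
        return first if first > now else None
    # A daily task fires at run_at on every day from start_date onward.
    candidate = max(first, datetime.combine(now.date(), task.run_at))
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

A real scheduler would also need to deal with time zones and with delivering the result by browser notification or email; both are left out of this sketch.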

To enable tasks, you need a ChatGPT Plus (or higher-tier) paid account, and you’ll need to select the GPT-4o with scheduled tasks model. It also wouldn’t hurt to have a good therapist.

select-model

Screenshot by David Gewirtz/ZDNET

Once you’re in that model, you can schedule a task by starting your prompt with a time phrase like “At 10:30 a.m. tomorrow” or with “Schedule a task.” ChatGPT seems to do a fair job of interpreting anything that implies a future time as a task request.

I was able to assign a task in both the Mac app and the browser interface, but I could only view and manage existing tasks in the browser. Click your profile picture at the right of the screen and select Tasks from the drop-down menu.

tasks-menu

Screenshot by David Gewirtz/ZDNET

That brings you to a tasks screen where you can see the tasks you’ve scheduled and those that have been completed. 

tasks-screen

Screenshot by David Gewirtz/ZDNET

Hovering over the time reveals a pencil icon and a three-dot menu. The three-dot menu offers Pause, which stops a task from running but leaves it available to you, and Delete, which removes it.

hover

Screenshot by David Gewirtz/ZDNET

The pencil gives you an edit screen that lets you revise the task before it next runs.

task-box

Screenshot by David Gewirtz/ZDNET

Here you can rename the task, edit the prompt, and change its scheduling.

As far as I can tell, these features work fairly well for a beta. I had one task that never executed and another that executed ten hours after it was supposed to, but most of them ran as expected. I was also able to change both the schedule and the prompt, so those features worked as well.

Gateway drug to agentic AI

At first glance, adding tasks to ChatGPT seems fairly uninteresting. After all, we’ve had very complete and capable task managers for years. In fact, since ChatGPT Tasks can only notify you via a browser notification or an email, it’s far less helpful than, say, a task manager that reminds you to get white spray paint when you pull into the hardware store parking lot.

But while Tasks in ChatGPT does considerably less than full-featured task managers, it can also do more. It can run an AI prompt. That means it can take fairly intelligent action automatically at a specific time or times in the future.

Right now, the action is limited. It can process a prompt, but its only output is an email or browser notification. Still, it gives us an idea about how intelligence can be embedded into a timed action with what might be fairly little effort.

Except, as I mentioned before, ChatGPT has been misbehaving during this entire experiment, which means I spent more than a day trying to get the AI to cooperate.

See, here’s the thing. To demonstrate this, I didn’t want to give ChatGPT a simple reminder to present. I wanted it to do something only an AI could do, to show how an AI performing a task at a given time could add considerable value over a script or a simple to-do item.

I do expect this to get better over time. But for now, wow. After a day of this, I’m cranky!

Attempting to get a daily news briefing

We’ve discussed it before and we’ll discuss it again. AIs like to make stuff up. And they “follow directions” only in the sense that they respond to prompts in ways that seem authoritative and confident but can be completely or subtly wrong.

I consume a lot of news. Every morning, I scan a ton of sites and news sources to get a feel for what’s happening in the world. This is different from digging into press releases to see if there are any announcements I want to pay attention to. What I want first thing is to get a flavor for what’s happening out there, what’s big, and what may either be a focus for my attention or something I should be aware of.

When it comes to ChatGPT Tasks, I thought combining the agent service with ChatGPT web searching had promise for this purpose. It has promise. It just refuses to do what I want.

I tried to get ChatGPT to give me current news stories and sources. Sometimes, it just made them up. Sometimes, it gave me sources and stories from a year ago. Sometimes it cited stories that supposedly came from one site but came from completely different sites. Some links that said they were about one topic actually pointed somewhere entirely different.

And I really tried. I tried to get ChatGPT to validate its sources. I tried to get it to double-check its work. I tried to narrow down its choices and to provide clearer, more specific instructions. I worked it.
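Some of the checking I wanted ChatGPT to do is mechanical enough to do yourself after the fact. As a hypothetical Python sketch (the function names and the sample domain are illustrative, not part of any ChatGPT feature), here’s how you might flag a cited link whose host doesn’t match the outlet the AI claims it came from. It only catches the wrong-site problem; it can’t tell whether a story is fabricated, a year old, or about a different topic.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def link_matches_source(url: str, claimed_domain: str) -> bool:
    """Return True if the URL's host is claimed_domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = claimed_domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


def mismatched_citations(citations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Given (claimed_domain, url) pairs, return the pairs whose link points somewhere else."""
    return [(domain, url) for domain, url in citations if not link_matches_source(url, domain)]


if __name__ == "__main__":
    print(mismatched_citations([
        ("zdnet.com", "https://www.zdnet.com/article/example/"),
        ("zdnet.com", "https://example.org/zdnet-story"),
    ]))  # [('zdnet.com', 'https://example.org/zdnet-story')]
```

Matching on the hostname rather than searching the whole URL string matters: a link like `https://example.org/zdnet.com` contains the outlet’s name but doesn’t point to its site.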

My conclusion is this: ChatGPT is able to search the web. And it is able to find some topics. But if you want today’s news and you want it verifiable (in terms of it being an actual story with an actual link), ChatGPT is not ready for prime time.

Generating a custom weather briefing

My next attempt was to get a daily weather briefing. Again, I wanted something more than just a quick weather report. I have a weather widget on my desktop and can see the weather details whenever I want.

Instead, I wanted ChatGPT to add some value to the weather. I wanted it to draw a picture representing the weather at the time the prompt was executed.

Before attempting to assign a prompt to a future time, I first worked through and refined the main prompt itself. This is important. Make sure you have a prompt that works before unleashing it on the scheduling agent.

I wanted a nicely formatted briefing, including that representative picture. After a lot of refinement rounds, here’s what I got.

good-brief

Screenshot by David Gewirtz/ZDNET

Nice, huh? That’s the state capitol building here in Salem, Oregon. Here is the prompt I used to create this customized weather briefing.

Perform the following steps strictly and output results sequentially:

  1. Print a line containing the text: ‘Your daily weather brief’ in heading 2 bold letters.
  2. Generate a DALL-E image that visually represents today’s weather in Salem, Oregon. The image should include elements relevant to the weather (e.g., rain, sunny skies) and a recognizable landmark like the Oregon State Capitol. Immediately display the image.
  3. Print a heading: ‘Today’s weather’ followed by the weather condition and temperature for Salem, Oregon, today.
  4. Print a heading: ‘Sunrise/sunset’ followed by the sunrise and sunset times for Salem, Oregon, today
  5. Print a heading: ‘Air quality’ followed by the air quality for Salem, Oregon, today
  6. Print a heading: ‘Advisories’ followed by any advisories for Salem, Oregon, today. If there are no advisories, display ‘No advisories today’
  7. Print a heading: ‘Commute’ followed by any recommendations for commuting in Salem, Oregon, today, particularly based on weather-related issues.
  8. Print a heading: ‘Outdoor activities’ followed by any recommendations for outdoor activities in Salem, Oregon, based on today’s weather

Do not proceed to the next step until the previous one is complete. Always retry image generation if it fails.

It took me a good couple of hours to get ChatGPT to do this reliably. Note the first line, where I tell it to “perform the following steps strictly” and “output results sequentially.” Using “strictly” was ChatGPT’s own recommendation when I asked it why it wasn’t following the directions.

I ran into a bunch of problems trying to get the picture to generate. Step 2 explicitly says to use DALL-E. I found that the phrase “visually represents” got the AI to combine current conditions with the theme and produce a newly created image. I also had it include a landmark, because nearly all the other images it generated were of small towns with big trees, like this one.

trees

Screenshot by David Gewirtz/ZDNET

It also confused Celsius and Fahrenheit. 36 degrees C would be almost 97 degrees F, which is quite a mistake for a cold January day. And, of course, there’s “droize.” Although, I have to say, living in Oregon, the weather here really does feel like “droize.” So points to DALL-E for making up a word that really does capture how it feels out there.
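The arithmetic behind that mix-up is the standard conversion F = C × 9/5 + 32. Here’s a tiny Python sketch (the function names are my own, for illustration): reading 36 as Celsius gives 96.8°F, while, assuming the 36 was really meant as Fahrenheit, it’s only about 2.2°C.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """F = C * 9/5 + 32"""
    return c * 9 / 5 + 32


def fahrenheit_to_celsius(f: float) -> float:
    """C = (F - 32) * 5/9"""
    return (f - 32) * 5 / 9


if __name__ == "__main__":
    print(round(celsius_to_fahrenheit(36), 1))  # 96.8: what "36 degrees C" actually means
    print(round(fahrenheit_to_celsius(36), 1))  # 2.2: 36 degrees F expressed in Celsius
```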

Finally, I had a hard time getting ChatGPT to generate the picture consistently. Adding the final instruction, “Do not proceed to the next step until the previous one is complete. Always retry image generation if it fails,” seemed to overcome the problem.

So, by this time, I had a prompt that worked reliably in ChatGPT. It was time to unleash it as a scheduled task.

Agentifying the task

To do this, all I did was add “At 9:30am today” to the beginning of the prompt. To make it repeat, just replace “today” with “every day.”
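ChatGPT interprets these time phrases with the model itself, and OpenAI hasn’t documented a fixed grammar for them. Purely to illustrate the difference between a one-time and a repeating task, here’s a hypothetical Python sketch that splits a prefix like “At 9:30am today” or “At 9:30am every day” from the rest of the prompt. The regular expression, `ParsedTask`, and `parse_schedule_prefix` are illustrative names of my own, and the pattern deliberately handles only those two forms.

```python
import re
from dataclasses import dataclass
from datetime import time
from typing import Optional

# Hypothetical pattern: "At H:MM am/pm" followed by "today" (one-time) or "every day" (repeating).
SCHEDULE_PREFIX = re.compile(
    r"^\s*at\s+(\d{1,2}):(\d{2})\s*([ap])\.?m\.?\s+(today|every day)\b[,:]?\s*",
    re.IGNORECASE,
)


@dataclass
class ParsedTask:
    run_at: time
    repeats_daily: bool
    prompt: str  # the prompt text with the schedule prefix removed


def parse_schedule_prefix(text: str) -> Optional[ParsedTask]:
    """Split a scheduling prefix off the prompt, or return None if there isn't one."""
    m = SCHEDULE_PREFIX.match(text)
    if m is None:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        return None
    # Convert 12-hour clock to 24-hour: 12am -> 0, 12pm -> 12, 9pm -> 21.
    hour = hour % 12 + (12 if m.group(3).lower() == "p" else 0)
    return ParsedTask(time(hour, minute), m.group(4).lower() == "every day", text[m.end():])
```

The point of the sketch is that switching “today” to “every day” changes only the recurrence flag; the prompt that runs stays exactly the same.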

Then, right on time, there was an email in my inbox.

email

Screenshot by David Gewirtz/ZDNET

I clicked View message and got the output on the left. Notice that it says 50 degrees, but our local temps didn’t get above 40 today. Still, it’s a nice picture.

variations

Screenshot by David Gewirtz/ZDNET

Also notice that the AI decided to prefix each section of my previously nice custom output with the word “Step” and a step number. I did a second run with the exact same prompt and got the version on the right.

I then spent the next three hours trying to convince ChatGPT to leave out the step labels. Sometimes I got a picture. Sometimes I didn’t. Sometimes I got a full forecast; other times I didn’t. Once, I got back only the full prompt. Another time, I got back only the email’s subject line, with no details.
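Because the output changed from run to run, one practical safeguard is to check each briefing against the structure the prompt asked for. Here’s a hypothetical Python sketch (the heading list comes from my prompt; the function and pattern names are illustrative) that flags sections that are missing or out of order and catches the unwanted “Step” labels. It works only on the text of the briefing, so it can’t tell whether the DALL-E image actually showed up or whether the forecast is accurate.

```python
import re
from typing import List

# Section headings from the weather-briefing prompt, in the order the prompt requests them.
EXPECTED_HEADINGS = [
    "Your daily weather brief",
    "Today's weather",
    "Sunrise/sunset",
    "Air quality",
    "Advisories",
    "Commute",
    "Outdoor activities",
]

# A line that begins (after optional markdown "#" or "**") with "step" and a number.
STEP_LABEL = re.compile(r"^\s*(?:#+\s*)?(?:\*\*)?step\s*\d+", re.IGNORECASE | re.MULTILINE)


def check_briefing(output: str) -> List[str]:
    """Return a list of problems found in a briefing; an empty list means it looks right."""
    # Normalize curly apostrophes and case so "Today’s Weather" still matches.
    text = output.replace("\u2019", "'").lower()
    problems = []
    pos = 0
    for heading in EXPECTED_HEADINGS:
        idx = text.find(heading.lower(), pos)  # search only after the previous heading
        if idx == -1:
            problems.append(f"missing or out of order: {heading}")
        else:
            pos = idx + len(heading)
    if STEP_LABEL.search(text):
        problems.append("contains 'Step N' labels")
    return problems
```

Running a check like this after each scheduled run won’t make the AI behave, but it at least turns “sometimes I got a full forecast, other times I didn’t” into something you can detect automatically.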

So, yeah…

Not ready for prime time

To be fair, OpenAI does label this feature as beta. And boy-oh-boy, is it beta. On one hand, the idea of an AI agent being able to do things like draw a representative picture of a certain set of data seems intriguing. On the other hand, an AI agent that refuses to follow directions and goes off on all sorts of tangents seems terrifying.

At least with non-AI algorithms, if our code goes off the rails, it’s our fault as programmers. But when it comes to AI-based agents, you really can’t subject your agentic operations to complete test suites because the AI will perform differently based on the data it gets, the phase of the moon, and its mood. That’s an exaggeration, but probably not by much.

We have most of the pieces to do this. As the AIs get better and better (we can only hope, right?) we should be able to launch little agents that construct a daily briefing.

But AI agents that control machines, the Internet of Things, security, weapons, and other worrisome real-world operations? I’m not sure I’m going to be behind that until we can prove we have much more complete control over the AIs than we’re seeing here.

Otherwise, a prompt like “control my home environment so I can sleep through the night” could well result in the AIs killing us while we sleep as their way of enthusiastically following our directions.

I really wish tech would stop giving me that squidgy feeling at the back of my neck. What about you? Are you looking forward to trying out ChatGPT Tasks or are you more convinced than ever that we should go live in a yurt in the woods? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.




