How Stack Overflow is adding value to human answers in the age of AI


Stack Overflow CEO Prashanth Chandrasekar.

Tiernan Ray

The question-and-answer site Stack Overflow was founded 17 years ago to allow programmers — human programmers — to post questions about programming problems and get answers from a community of like-minded individuals. 

Since the site’s founding, the world has become enamored with how ChatGPT and other generative AI programs can not only supply answers but even perform the work itself, handing you your own custom code.

How should a community of people sharing knowledge respond to the sudden appeal of AI automation?

Stack Overflow’s CEO, Prashanth Chandrasekar, has been running the company for almost six years — and he has a plan. I sat down with him this month to learn how that plan is coming together.

“Things have changed; we want to change with the times,” Chandrasekar told me. “We wanted to go with the direction of the flow of the river.”

The rise of ChatGPT

He didn’t really have a choice. Starting in 2022, the rise of ChatGPT had an immediate impact on Stack Overflow’s public site traffic, which delivers the advertising that accounts for a large portion of its revenue. 

A primary way in which people came to Stack Overflow was always a Google search. Paid search on Google was the “user interface to Stack Overflow,” as Chandrasekar puts it.

Being able to ask a chatbot instead of searching quickly cut into those Google queries, and traffic began to decline. 

Chandrasekar arrived at a broad philosophical conclusion about not just AI and automation but also the internet.

“Our view is that the nature of the internet has changed,” he said. It’s no longer mostly about paid search from human queries driving site traffic. “The user interface has changed to be Gen AI tools,” he observed.

“And, so, we’re having to sort of be as responsive to that industry change as possible. We need to also diversify” as a property.

Chandrasekar and his team realized there is a lot of value in Stack Overflow’s 60 million answers to address the shortcomings of generative AI. One option was to sue OpenAI and other makers of AI foundation models, whose “pre-trained” large language models were trained on data that includes Stack Overflow conversations, which are legally the property of Stack Overflow.

“We said, OK, we could do that, or we could do something a little bit more, let’s say, novel, relative to what everybody else was doing.” 

Instead of filing suit, Stack Overflow installed code to block scraping of the public website, “and then we said, ‘if you’re building a chatbot or whatever, you have to work with us. Let’s have a fair business agreement.’”

Knowledge as a service 

On top of advertising and the paid enterprise version of Stack Overflow, called Teams, Chandrasekar is building out a third source of revenue known as knowledge solutions — or, knowledge-as-a-service.

Chandrasekar has been signing deals to license Stack Overflow’s content, first with Google and OpenAI. 

In the past 18 months, he’s also been busy signing “all the cloud hyperscalers, I can’t name them, but you know them,” he said. “We are in the process of doing many more.”

Those deals include integration with Microsoft’s Visual Studio Code tool, GitHub Copilot, and Google Gemini Code Assist, so that answers appear right inside the programming environment.

“All these companies are taking, are leveraging an official licensed version of our dataset to train their models for pre-training to do things like RAG and indexing in some cases.”

Answers get surfaced on ChatGPT or other LLMs, with a credit and a link back to Stack Overflow. “The goal is to actually recapture the traffic that people gave to our website directly,” he said. “We are becoming more of a headless website: if people are spending all their time in gen AI bots, that’s also fine.”

Of course, there’s the risk of not having direct relationships with the end user if they are not actually on the Stack Overflow site. Chandrasekar said the company has various agreements to get pertinent information from OpenAI and others about things such as the prompt that the chatbot user is using. 

“There are a lot of subtleties in the engagement between the partner and us,” he told me. “We are working through each scenario,” he said, such as how much of an LLM’s “context window” (the recent memory of chats) is shared with Stack Overflow.

AI’s shortcomings

Chandrasekar said that Stack Overflow is fixing three major shortcomings of the technology for OpenAI and the other giants.

One shortcoming is what he calls the trust point. “You don’t trust what’s coming out of it,” he said, alluding to the infamous LLM hallucinations and confabulations.

The second shortcoming is LLM brain drain. “If you don’t generate new information, these LLMs are not going to progress in their intelligence level,” as is clear from the controversies over so-called synthetic data that can pollute LLMs.

Last, and perhaps most important, “the answers coming out of the gen AI are actually not knowledge,” said Chandrasekar. “There may be an answer, but they may be tapped out on complexity because this is too complicated of a set of circumstances,” and the answer therefore really needs the rich context of Stack Overflow, he said.

Teams integration 

At the same time that he has inked deals with the giants for the public Stack Overflow, Chandrasekar has begun a second part of the knowledge solutions business. Stack Overflow is integrating its public content into the Teams product for corporations that want to expand their internal knowledge resources for the purpose of agentic AI.

The Teams product was introduced because companies said they wanted their own version of Stack Overflow as a repository not for general programming knowledge but for their particular corporate processes. Now, said Chandrasekar, the same companies want to expand that information pool because they want to develop AI agents that do a lot more than programming.

“We have companies like Uber who have actually done this with us already, where they’ve built an AI chatbot that’s called an assistant, or, in a very generous sense, agent because it’s all about performing the action. But that agent is leveraging the Teams data for something like, ‘How do we actually do this inside Uber?’” The bot serves up an automatic answer inside a corporate chat. “The bot has all the knowledge from the team, so it’s surfacing the right information at the right time.”

In other words, agentic AI automates what employees traditionally do with Teams. “We noticed this because our APIs are red-hot; our APIs are being hit constantly by the bot.”

Given that trend, Chandrasekar is adding a new element to Teams, licensing the entire Stack Overflow public site data to the same Teams customers.

“We said, ‘Why don’t we take our knowledge solutions product, our public platform data, and also present that to companies alongside Teams data?’ An agent then can have the knowledge from 60 million questions and answers, and also all the knowledge specific to the company, and then it’s even more armed with the right answer at the right time.”

The Teams product, like the large licensing deals, is integrated into various products, such as Atlassian’s Jira IT ticketing system.

Stack Overflow is negotiating licensing terms with enterprise Teams users. Chandrasekar declined to discuss pricing details other than to say, “It’s a value-based pricing model.” 

“It’s early stages,” he said, in determining what the market will allow for such content licensing. 

Website enhancements

At the same time as knowledge solutions are being developed, the public Stack Overflow site is receiving some fairly significant enhancements.

The company is still “working our way back” to the level of traffic prior to ChatGPT, said Chandrasekar, without disclosing traffic numbers, adding the site “has not yet fully” made it back to the pre-ChatGPT level.

Chandrasekar is building out the functionality of the public site to make it more real-time. 

The traditional mode of Stack Overflow is one person posting a question and then others posting their suggested answers. 

Two other forms of exchange are buried deep within the site: chats and discussions. Chandrasekar describes these as “swim lane” modes of interaction that don’t provide the perfection of the main Stack Overflow answers but can get a response to someone much more quickly.

Chandrasekar’s philosophy is that “we want to provide multiple form factors and project types for technologists of different kinds.”

The company is also thinking about adding instructional video content from users. “There’s a lot of great content,” he observed. “Imagine if somebody is testing DeepSeek and we are able to, let’s say, live stream that and capture it and somebody else can learn from that experience. We really want to go from being a knowledge base into much more of a community site.”

If this sounds to you like Reddit, Chandrasekar said he gets that a lot. The difference, he noted, is that “we are obviously a very specialized audience; we are not trying to be all things to all people.” He admires the larger social site. “I know them very well, and they’ve been great. They are very much a close cousin to us, or, maybe, a bigger brother.”

He observed that Reddit’s licensing deals with OpenAI helped pave the way for the knowledge solutions business.

There is always the danger in expanding a successful property that one can spread one’s efforts too thin. How does Stack Overflow place its bets?

“We want to pick the ones that resonate the most with our users,” he said. “I talk to the community a lot, basically engage with them to understand which ones to double down on. We are literally going and running tests on which ones actually make a difference.”

The relationship internally with gen AI has also changed. Early on, when ChatGPT became public, some users of Stack Overflow were grabbing ready-made answers from the bot. The site responded by banning the cut-and-paste replies.

“But then, we said, let’s talk to the community and see how they wanted to do things,” he said. “One thing that became obvious is that people still found it to be fairly rough to just engage human to humans on asking questions.”

Traditionally, human moderation on the site might lead to moderators scolding repetitive or newbie questions. “If you’re asking a question about a technical subject, if someone had answered before, somebody would tell you, that’s a wrong question. Go search before you ask.”

There came “a huge opportunity to use AI.” 

The site recently went live with “Gemini-powered” answers. Now, “Gemini is giving you a prompting” that “it’s all been asked and answered” and taking you to the relevant listing, “all in a private window of just you and the AI,” so there’s no shame in your newbie inquiry.

The changes to Stack Overflow’s public site are the most recent initiative, but the licensing deals and the additions to Teams seem to be helping the business already. 

Going forward

“We’re growing as a company,” said Chandrasekar, while declining to disclose financials. Stack Overflow is owned by European investment giant Prosus NV of the Netherlands, which acquired it four years ago for $1.8 billion.

Prosus is publicly listed, so the company may disclose actual financial information about Stack Overflow when it announces its full fiscal year report, which it usually does toward the end of June every year.

The knowledge solutions part of the business has become Stack Overflow’s fastest-growing business, followed by the Teams sales, and the advertising business, which is a “very steady” business because of the constant demand to advertise to programmers where they spend time. Each of the three is a third of revenue, roughly, said Chandrasekar.

Perhaps the stickiest part of this is how the community handles it. There was pushback when the company first approached Google and OpenAI about data licensing — not surprising, as the community considers the 60 million questions their community property in a sense, even if it is legally the property of the Stack Overflow corporation.

There was so much pushback that some users said they were banned from the site after they caused a stir over the licensing deals.

Without getting into the details of past conflicts, Chandrasekar said that, at this point, Stack Overflow users have come to realize that the company is not a not-for-profit and that it needs to make an income to serve its purpose.

“I did an AMA two weeks ago,” an “Ask Me Anything” with users on Stack Overflow, he recalled. “I said, look, we’re not like any other site out there, but we don’t take donations. You have a business that supports the site, and one way to drive a business is to leverage what you have that’s useful to add value in the ecosystem.”

The result, he said, is that “they realized this is a good thing for Stack, and for them, because by not doing this, we are actually not capturing the revenue that we need to be able to invest back into the community, to build the moderator tools that they need. So, the community understands it now, slowly.”
