Synthetic data’s fine line between reward and disaster

Up to 20% of the data used to train AI is already synthetic, meaning it was generated rather than obtained by observing the real world, and LLMs already train on millions of synthesized samples. Gartner projects that share could reach 80% by 2028, and that by 2030 synthetic data will inform more business decision making than real data. Technically, though, any output you get from an LLM is synthetic data.
AI training is where synthetic data shines, says Gartner principal researcher Vibha Chitkara. “It effectively addresses many inherent challenges associated with real-world data, such as bias, incompleteness, noise, historical limitations, and privacy and regulatory concerns, including personally identifiable information,” she says.
Generating large volumes of training data on demand is appealing when real-world data is slow and expensive to gather, fraught with privacy concerns, or simply unavailable. Synthetic data ought to help preserve privacy, speed up development, and make long-tail scenarios cost effective for enterprises that couldn't otherwise tackle them, she adds. It can even be used for controlled experimentation, assuming you can make it accurate enough.
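To make the idea concrete, here is a minimal sketch (not any vendor's tool or a method from the article) of one simple way synthetic tabular data can be produced: fit per-column statistics to a small "real" sample, then draw new records that mimic those statistics without copying any actual rows. The column names and values are invented for illustration.

```python
import random
import statistics

# Toy stand-in for sensitive real-world records.
real_rows = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": 61000},
    {"age": 29, "income": 48000},
    {"age": 55, "income": 83000},
]

def fit_columns(rows):
    """Compute mean and standard deviation for each numeric column."""
    params = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        params[name] = (statistics.mean(values), statistics.stdev(values))
    return params

def synthesize(rows, n, seed=0):
    """Draw n synthetic records from per-column normal fits."""
    rng = random.Random(seed)
    params = fit_columns(rows)
    return [
        {name: round(rng.gauss(mu, sigma)) for name, (mu, sigma) in params.items()}
        for _ in range(n)
    ]

synthetic = synthesize(real_rows, n=100)
```

Real generators are far more sophisticated (preserving cross-column correlations, adding formal privacy guarantees), but the trade-off the article describes is already visible here: the synthetic rows are cheap and plentiful, yet only as faithful as the model fitted to the original data.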