Synthetic data’s fine line between reward and disaster

Up to 20% of the data used to train AI is already synthetic, meaning it was generated rather than obtained by observing the real world, and LLMs already train on millions of synthesized samples. Gartner projects that share could reach 80% by 2028, and that by 2030 synthetic data will inform more business decision making than real data. Technically, though, any output you get from an LLM is synthetic data.
AI training is where synthetic data shines, says Gartner principal researcher Vibha Chitkara. “It effectively addresses many inherent challenges associated with real-world data, such as bias, incompleteness, noise, historical limitations, and privacy and regulatory concerns, including personally identifiable information,” she says.
Generating large volumes of training data on demand is appealing when real-world data is slow and expensive to gather, fraught with privacy concerns, or simply unavailable. Synthetic data ought to help preserve privacy, speed up development, and make long-tail scenarios cost effective for enterprises that couldn't otherwise tackle them, she adds. It can even be used for controlled experimentation, assuming you can make it accurate enough.
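To make the idea concrete, here is a minimal sketch (not any vendor's tool or a method from the article) of one simple way synthetic tabular data can be produced: fit per-column statistics to a small "real" sample, then draw new records that mimic those statistics without copying any actual rows. The column names and values are invented for illustration.

```python
import random
import statistics

# Toy stand-in for sensitive real-world records.
real_rows = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": 61000},
    {"age": 29, "income": 48000},
    {"age": 55, "income": 83000},
]

def fit_columns(rows):
    """Compute mean and standard deviation for each numeric column."""
    params = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        params[name] = (statistics.mean(values), statistics.stdev(values))
    return params

def synthesize(rows, n, seed=0):
    """Draw n synthetic records from per-column normal fits."""
    rng = random.Random(seed)
    params = fit_columns(rows)
    return [
        {name: round(rng.gauss(mu, sigma)) for name, (mu, sigma) in params.items()}
        for _ in range(n)
    ]

synthetic = synthesize(real_rows, n=100)
```

Real generators are far more sophisticated (preserving cross-column correlations, adding formal privacy guarantees), but the trade-off the article describes is already visible here: the synthetic rows are cheap and plentiful, yet only as faithful as the model fitted to the original data.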