Report Highlights Rising Risks in Sensitive Data Management
The volume of sensitive data that companies are harbouring in non-production environments, like development, testing, analytics, and AI/ML, is rising, according to a new report. Executives are also getting more concerned about protecting it — and feeding it into new AI products is not helping.
The “Delphix 2024 State of Data Compliance and Security Report” found that 74% of organisations that handle sensitive data increased the volume kept in non-production, also known as lower, environments in the last year. What’s more, 91% are worried about the resulting expansion of their exposure footprint, which puts them at greater risk of breaches and non-compliance penalties.
The amount of consumer data that companies hold is rising overall, driven by the growing number of online consumers and companies’ ongoing digital transformation efforts. IDC forecasts that by 2025, the global datasphere will grow to 163 zettabytes, ten times the 16.1 zettabytes of data generated in 2016.
As a result, the amount of sensitive data being stored, such as personally identifiable information, protected health information, and financial details, is also increasing.
Sensitive data is often created and stored in production, or live, environments such as CRM or ERP systems, which have tight controls and limited access. However, standard IT operations often result in data being copied multiple times into non-production environments, giving more personnel access and increasing the risk of a breach.
The report’s findings are based on a survey of 250 senior-level employees at organisations with at least 5,000 employees that handle sensitive consumer data. The survey was conducted by software provider Perforce.
Over half of businesses have already experienced a data breach
Over half of respondents said they had already experienced a breach of sensitive data kept in non-production environments.
Other evidence suggests the issue is worsening: a study by Apple found a 20% increase in data breaches from 2022 to 2023, and 61% of Americans have learned at some point that their personal data had been breached or compromised.
The Perforce report found that 42% of the respondent organisations have experienced ransomware. Ransomware in particular is a growing global threat; a study from Malwarebytes published this month found that global ransomware attacks increased by 33% in the last year.
Part of the problem is that global supply chains are becoming longer and more complex, increasing the number of potential entry points for attackers. A report from the Identity Theft Resource Center found that the number of organisations impacted by supply chain attacks surged by more than 2,600% between 2018 and 2023. Furthermore, ransomware payouts exceeded $1 billion (£790 million) for the first time in 2023, making these attacks increasingly lucrative for attackers.
AI is the biggest culprit when it comes to insecure consumer data
With companies now adopting AI into business processes, it is becoming increasingly difficult to keep control of what data goes where.
AI systems often require the use of sensitive consumer data for training and operation, and the complexity of the algorithms and potential integration with external systems can create new attack vectors that are hard to manage. In fact, the report found that AI and ML are the leading causes of sensitive data growth in non-production environments, as cited by 60% of respondents.
“AI environments may be less governed and protected than production environments,” the report’s authors wrote. “As a result, they can be easier to compromise.”
Business decision-makers are aware of this risk: 85% report concerns about regulatory non-compliance in AI environments. While many AI-specific regulations are in their infancy, GDPR requires personal data used in AI systems to be processed lawfully and transparently, and various state-level laws apply in the U.S.
The E.U. AI Act, which came into force in August, sets strict rules on the use of AI for facial recognition and imposes safeguards for general-purpose AI systems. Companies that fail to comply with the legislation face fines ranging from €35 million ($38 million USD) or 7% of global turnover to €7.5 million ($8.1 million USD) or 1.5% of turnover, depending on the infringement and size of the company. Similar AI-specific regulations are expected to emerge in other regions in the near future.
Other concerns about sensitive data in AI environments, cited by over 80% of the respondents to the Perforce study, include using low-quality data as input into AI models, personal data re-identification, and theft of model training data, which can include IP and trade secrets.
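Several of these concerns come down to not knowing which sensitive values a dataset contains before it is copied into an AI or ML environment. As a rough illustration of the content-inspection side of data loss prevention, the Python sketch below scans free text for values that look like email addresses, U.S. Social Security numbers or payment card numbers, using the Luhn checksum to cut down false positives on card-like digit runs. The patterns and the `find_sensitive` function are hypothetical and deliberately simplistic; commercial DLP and data-discovery tools use far broader, locale-aware detectors.

```python
import re

# Hypothetical, simplified detectors for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for substrings that look like sensitive data."""
    findings = []
    for m in EMAIL_RE.finditer(text):
        findings.append(("email", m.group()))
    for m in US_SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # Only report card-like numbers that pass Luhn, to reduce false positives.
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.append(("card", m.group().strip()))
    return findings
```

For example, `find_sensitive("Contact jane@example.com, card 4111 1111 1111 1111")` flags both the email address and the well-known test card number, while a 16-digit string that fails the Luhn check is ignored.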
Businesses are worried about the financial cost of insecure data
Another major reason large businesses are so concerned about insecure data is the prospect of a hefty non-compliance fine. Consumer data is subject to a widening set of regulations, like GDPR and HIPAA, which can be confusing and change frequently.
Many regulations, like GDPR, set penalties based on annual turnover, so bigger companies face bigger charges. The Perforce report found that 43% of respondents have already had to pay fines or remediate non-compliance, and 52% have experienced audit issues and failures related to non-production data.
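To make the turnover-based approach concrete: under GDPR Article 83(5), the maximum fine for an undertaking is €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher, and Article 83(4) sets a lower tier of €10 million or 2%. The minimal Python sketch below computes only that statutory ceiling; the tier keys are hypothetical labels, and actual fines are set case by case by regulators and are usually far lower.

```python
# GDPR Article 83 fine ceilings for undertakings: (fixed cap in EUR, % of turnover).
# The dictionary keys are illustrative labels, not official identifiers.
GDPR_FINE_TIERS = {
    "art83_4": (10_000_000, 2),  # e.g. controller/processor obligations
    "art83_5": (20_000_000, 4),  # e.g. basic principles, data subjects' rights
}


def gdpr_max_fine_eur(annual_turnover_eur: int, tier: str = "art83_5") -> float:
    """Statutory ceiling for a GDPR fine: the fixed cap or the percentage of
    total worldwide annual turnover, whichever is higher."""
    fixed_cap, pct = GDPR_FINE_TIERS[tier]
    return max(fixed_cap, annual_turnover_eur * pct / 100)
```

For a company with €100 million in turnover the ceiling stays at the fixed €20 million, but at €2 billion it rises to €80 million, which is why bigger companies face bigger potential charges.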
However, the cost of a data breach can extend beyond the fine, because halted operations account for a portion of the lost revenue. A recent Splunk report found that the biggest cause of downtime incidents was cybersecurity-related human error, such as clicking a phishing link.
Unplanned downtime costs the world’s largest companies $400 billion a year, with contributors including direct revenue loss, diminished shareholder value, stagnant productivity, and reputational damage. Indeed, ransomware damage costs are predicted to exceed $265 billion by 2031.
According to IBM, the average cost of a data breach in 2024 is $4.88 million, a 10% increase over 2023. The tech giant’s report added that 40% of breaches involved data stored across multiple environments, like public cloud and on-prem, and these cost more than $5 million on average and took the longest to identify and contain. This shows that business leaders are right to be concerned about data sprawl.
Taking steps to secure data in non-production environments can be resource-intensive
There are ways to secure data stored in non-production environments, such as masking the sensitive data. However, the Perforce report found that businesses are reluctant to do so for several reasons: respondents find it difficult and time-consuming, and worry that it may slow the organisation down.
- Nearly a third are concerned that it may slow down software development, as replicating production databases to non-production environments securely can take weeks.
- 36% say masked data can be unrealistic and therefore impact software quality.
- 38% think the security protocols may inhibit the company’s ability to track and comply with regulations.
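The worry that masked data is unrealistic can be eased by masking that preserves the shape of the original values. The sketch below is one possible approach to static masking, not the method used by any particular product: it uses a keyed hash (HMAC-SHA256) so the same input always maps to the same fake value, keeping joins between masked tables consistent, and it swaps digits while leaving separators in place so a phone number still looks like a phone number. The key, the fake-name list and the field names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical masking key; in practice it would live in a secrets manager,
# never alongside the masked copy.
MASKING_KEY = b"example-masking-key"

# Hypothetical pool of realistic-looking replacement names.
FAKE_FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]


def _digest(value: str, field: str) -> bytes:
    # Keyed hash: the same value in the same field always yields the same digest,
    # so foreign-key joins between masked tables still line up.
    return hmac.new(MASKING_KEY, f"{field}:{value}".encode(), hashlib.sha256).digest()


def mask_name(name: str) -> str:
    """Replace a name with a deterministic pick from the fake-name pool."""
    d = _digest(name, "name")
    return FAKE_FIRST_NAMES[d[0] % len(FAKE_FIRST_NAMES)]


def mask_digits(value: str, field: str) -> str:
    """Replace each digit with a pseudo-random digit, keeping separators in place."""
    d = _digest(value, field)
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(str(d[i % len(d)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_row(row: dict) -> dict:
    """Return a permanently masked copy of one customer record for a lower environment."""
    return {
        "customer_id": row["customer_id"],  # assumed to be a non-sensitive surrogate key
        "name": mask_name(row["name"]),
        "phone": mask_digits(row["phone"], "phone"),
    }
```

Because the output is deterministic, anyone who obtained the key could test guesses against masked values, so in this sketch the key must be kept out of the non-production environment entirely.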
The report also found that 86% of organisations allow data compliance exceptions in non-production environments to avoid the hassle of storing data securely. These exceptions include using a limited data set, data minimisation, or gaining consent from the data subject.
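Data minimisation, one of the exceptions mentioned above, is simple to express in code: copy only the fields a given test or analysis actually needs. The sketch below assumes records arrive as Python dictionaries and uses a hypothetical allow-list of column names.

```python
# Hypothetical allow-list: the only columns a particular test suite needs.
REQUIRED_FIELDS = frozenset({"order_id", "order_total", "country"})


def minimise(records: list[dict], allowed: frozenset = REQUIRED_FIELDS) -> list[dict]:
    """Keep only allow-listed fields in each record, dropping everything else
    (names, emails, card numbers) before the data leaves production."""
    return [{k: v for k, v in r.items() if k in allowed} for r in records]
```

An allow-list is used rather than a deny-list so that any new sensitive column added to the production schema later is dropped by default instead of leaking into the copy.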
Recommendations for securing sensitive data in non-production environments
The Perforce team outlined the top four ways businesses can secure their sensitive data in non-production environments:
- Static data masking: Permanently replacing sensitive values with fictitious, yet realistic equivalents.
- Data loss prevention (DLP): A perimeter-defence security approach that detects potential data breaches and theft and attempts to prevent them.
- Data encryption: Reversibly converting data into ciphertext so that only authorised users holding the key can read it.
- Strict access control: A policy that categorises users by roles and other attributes and configures these users’ access to datasets based on these categories.
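The access-control recommendation combines roles with other user attributes. A minimal Python sketch of that idea, assuming hypothetical roles, environments and a department attribute, checks both before granting read access to a dataset:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]
    department: str


@dataclass(frozen=True)
class Dataset:
    name: str
    environment: str          # e.g. "production", "test", "analytics"
    contains_sensitive: bool  # True if the data is unmasked
    owner_department: str


# Hypothetical role-to-environment grants for illustration.
ROLE_GRANTS = {
    "dba": {"production", "test", "analytics"},
    "developer": {"test"},
    "analyst": {"analytics"},
}


def can_read(user: User, ds: Dataset) -> bool:
    # Role check: at least one of the user's roles must grant this environment.
    if not any(ds.environment in ROLE_GRANTS.get(r, set()) for r in user.roles):
        return False
    # Attribute check: unmasked sensitive data is limited to the owning department.
    if ds.contains_sensitive and user.department != ds.owner_department:
        return False
    return True
```

Access is denied by default: a role missing from `ROLE_GRANTS` grants nothing, and a developer can read a masked test copy but not an unmasked one owned by another department.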
The authors wrote: “Protecting sensitive data in general is not easy to do. AI/ML adds to that complexity.
“Tools that specialise in protecting sensitive data in other non-production environments — development, testing, and analytics, for example — are well-positioned to help you protect your AI environment.”