
According to Cybercrime Magazine, global data storage is expected to exceed 200 zettabytes in 2025. One zettabyte is equal to a billion terabytes or a trillion gigabytes. That’s a lot of data to secure.
Inside the corners of those storage systems lurks shadow data: unchecked and uncontrolled data that could expose your company’s sensitive information to criminals.
What is shadow data?
Shadow data is information created, stored, and managed inside systems or applications that aren’t under your company’s control, or that fly under the radar of internal cybersecurity scans. It usually arises when employees use nonapproved or nonstandard applications for work-related activities. This is known as “shadow IT.” Employees are often well-intentioned when installing these applications, typically using the software for productivity and workarounds.
Networks outside your office, such as the ones remote employees connect from, might be less tightly policed for downloaded apps or employees’ devices. But in-person employees can create the same risk if you don’t have a clear policy on safeguarding data, including how it can be used and what can be shared on connected systems. Connected systems include Internet of Things devices, personal devices and artificial intelligence (AI) systems.
A shadow data example
Let’s say your employee downloads an outside application for a presentation they’re giving to their team. They upload proprietary company data as part of the presentation. Their app saves information to an unknown file location on your network and in the cloud. The file location isn’t part of the usual cybersecurity scans and remains off the radar.
A few weeks later, a cybergang breaches your company’s networks and discovers a rogue app installed on some computers. They’re familiar with the app and know where to find its backup files. (Hackers are adept at locating and exploiting cybersecurity weaknesses, including known software vulnerabilities and hidden backup files left by apps.) They grab the backup folders, knowing they could contain valuable information. And the folders aren’t encrypted, because they sit outside the locations your company’s cybersecurity scans cover.
The cybergang isolates data from the employee presentation that contains internal tech and release notes about your top-secret product launch. They demand a ransom in exchange for not leaking your proprietary data.
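The weak point in this scenario is a file location that the regular scans never visit. As a minimal sketch, assuming you can list the folders your scans already cover, the Python below walks a directory tree and reports files that sit outside those folders. The function name `find_unscanned_files` and any folder names you pass in are hypothetical and not part of any security product.

```python
from pathlib import Path


def find_unscanned_files(root, scanned_dirs):
    """Return files under `root` that are not inside any scanned directory."""
    root = Path(root).resolve()
    scanned = [Path(d).resolve() for d in scanned_dirs]
    unscanned = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # A file counts as covered if a scanned directory is one of its ancestors.
        covered = any(s in path.parents for s in scanned)
        if not covered:
            unscanned.append(path)
    return sorted(unscanned)
```

For example, calling `find_unscanned_files` on a shared drive with the list of scanned folders would return every file saved somewhere else on that drive, which is where an app’s forgotten backup folder would show up.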
Shadow data risks
The proliferation of data makes it harder to track and safeguard. Shadow data risk debuted last year in IBM’s “Cost of a Data Breach Report 2024.” According to the 2024 report, shadow data breaches were rising in frequency and cost, with 35% of all breaches involving shadow data. The report revealed that shadow data theft cost more than a typical breach and was harder to identify and contain, especially across multiple data environments (public cloud, private cloud and on-premises).
Shadow data continues to trend in this year’s IBM report, but with an added star: AI. Companies are adopting AI but not overseeing it.
AI shadow data risk: a disturbing upward trend
IBM’s “Cost of a Data Breach Report 2025” revealed that 20% of the organizations that suffered a cyberattack had an incident involving shadow AI. Because AI is conversational, it’s easy to forget that it’s an IT tool that uses data stores to function, like any computer software.
Shadow AI can fly under cybersecurity radar if it’s being used off-grid, since the cybersecurity team doesn’t monitor it. If your employees are using generative AI tools like ChatGPT to write reports, crunch data or create programming code, they could accidentally expose sensitive data while using the tool.
High levels of shadow AI added $670,000 to the average data breach cost compared with businesses that had low or no shadow AI, according to IBM’s 2025 report. Shadow AI breaches also compromised more personally identifiable information (65%) and intellectual property (40%). The report noted that just one unmonitored AI system could lead to widespread exposure. In fact, shadow AI has already claimed a top spot among the factors that increase data breach costs.
As more businesses integrate AI into their operations, shadow AI cybersecurity incidents could disrupt business activities and compromise sensitive data while evading detection. Without clear internal AI governance policies, shadow AI poses a concerning risk:
- 87% of organizations said they have no policies or processes addressing AI risk.
- Almost two-thirds of businesses that experienced a breach didn’t routinely audit their AI models.
- More than three-quarters reported not using adversarial testing on their AI models. (Adversarial testing means deliberately attacking your own AI models, for example with malicious or misleading inputs, to expose weaknesses so you can fix them before a real cyberattack.)
If you’re using AI in any part of your workflow, set up safety measures.
Secure your AI data, including training data, to avoid theft and misuse. Hackers could pollute your AI’s training data to skew its outputs. They could steal intellectual property and sensitive information as shadow data uploaded in an AI prompt.
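One safety measure is to strip obvious sensitive values from a prompt before it reaches an AI tool. The sketch below is only an illustration under stated assumptions: the three patterns (email addresses, U.S. Social Security numbers and card-like digit runs) and the function name `redact_prompt` are hypothetical choices, and simple pattern matching will miss names, trade secrets and anything that depends on context. It doesn’t replace a dedicated data privacy tool.

```python
import re

# Illustrative patterns only; real sensitive data takes many more forms.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact_prompt(text):
    """Replace each match with a placeholder such as [EMAIL] before sending."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_prompt("Email jane.doe@example.com about SSN 123-45-6789."))
# prints: Email [EMAIL] about SSN [SSN].
```

Redacting before the prompt leaves the employee’s machine means the sensitive values never reach the AI system’s storage, whatever that system does with its inputs afterward.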
An AI shadow data liability example
Imagine you roll out a generative AI application on your internal networks. Your executive team decides to share a summary of their quarterly updates on performance issues, finances and strategic plans with some of the department heads. They ask AI to summarize the meeting document into a single-page overview, keeping all names anonymous. It returns an accurate summary without any personal information.
Unbeknownst to the executive team, the AI incorporated the original document into its training data repository for future reference, leaving employee and financial information in an unsecured area. Your organization is now in violation of privacy laws and exposed to lawsuits and other liabilities.
As if that’s not enough, you suffer a cyberattack. Because of the executive team’s unsanctioned use of AI, trade secrets, email addresses, and financial and account information were stolen in the attack. The hackers knew exactly where to look and swiped the information to create a spear-phishing attack. This enabled them to trick your accountant into transferring funds using a deepfaked persona.
Cybersecurity exposures and mitigations
To combat shadow data, you’ll need robust cybersecurity, including zero-trust architecture and strict policies on software and devices. (Zero-trust architecture treats all network traffic as a potential threat.)
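To make the zero-trust idea concrete, here is a deliberately simplified sketch of a deny-by-default access check. The `Request` fields and the `allow` function are hypothetical names chosen for illustration. A real zero-trust deployment involves much more, such as continuous verification and least-privilege access. The point of the sketch is that being on the internal network earns no trust by itself.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user_authenticated: bool   # identity verified, e.g. with multifactor login
    device_managed: bool       # device is enrolled and kept up to date
    app_approved: bool         # application is on the sanctioned list
    on_internal_network: bool  # recorded, but deliberately ignored below


def allow(request):
    """Deny by default: every check must pass, wherever the traffic comes from."""
    return (
        request.user_authenticated
        and request.device_managed
        and request.app_approved
    )


# A request from inside the network is still refused if the app is unapproved.
print(allow(Request(True, True, False, on_internal_network=True)))  # prints: False
```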
Here are some other security considerations:
- Shadow data can lead to unsecured data, increasing the possibility of a breach.
- Shadow data may violate the European Union’s General Data Protection Regulation and the U.S. Health Insurance Portability and Accountability Act. Sensitive data could be stored outside secure and compliant systems, resulting in legal troubles and severe fines.
- Uncontrolled applications can become outdated or lack adequate cybersecurity, making them prime targets for hackers. They could lead to a zero-day vulnerability in your system. (This is a vulnerability that’s unknown to software developers until cybercriminals exploit it.)
- Shadow IT increases the chances of an insider stealing or manipulating your data.
- Shadow data may delay the detection of and response to cyber threats, leading to increased damage and substantial costs.
Here are some ways to mitigate shadow data risks:
- Set clear policies on using nonsanctioned applications. Communicate them to your employees.
- Run regular audits to uncover unauthorized applications within your company.
- Develop an AI oversight plan, including what AI programs are authorized and how they’re contained within your cybersecurity.
- Train your employees on the risks of shadow data when using nonsanctioned apps. Promote safer software practices.
- Implement technology solutions that discover, control and protect shadow IT.
- Involve your IT teams so they can assess application security and fit within your company’s IT ecosystem.
- Establish processes to back up critical data regularly.
- Stay current on evolving technology guidelines, like the National Security Agency’s “Deploying AI Systems Securely: Best Practices for Deploying Secure and Resilient AI Systems.”
- Use data privacy tools to identify sensitive data stored on your networks, especially on generative AI systems.
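The audit step in the list above can be sketched as a comparison between what is installed and what is approved. This is a minimal illustration, not a finished tool: the function name `find_unapproved_apps`, the device names and the application names are all hypothetical. In practice the inventory would come from whatever endpoint management or asset inventory tooling you use.

```python
def find_unapproved_apps(inventory, approved):
    """Map each device to the installed apps that are not on the approved list."""
    approved_lower = {name.lower() for name in approved}
    findings = {}
    for device, apps in inventory.items():
        unapproved = sorted(
            app for app in set(apps) if app.lower() not in approved_lower
        )
        if unapproved:
            findings[device] = unapproved
    return findings


# Hypothetical inventory and approved list, for illustration only.
inventory = {
    "laptop-01": ["Office Suite", "SlideMaker", "Browser"],
    "laptop-02": ["Office Suite", "Browser"],
}
approved = ["office suite", "browser"]

print(find_unapproved_apps(inventory, approved))
# prints: {'laptop-01': ['SlideMaker']}
```

Running a check like this on a schedule turns the audit into a routine report. Each finding is a prompt either to remove the app or to review it for formal approval.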
Shadow data awareness
Keep your data and employees out of the shadows. Be transparent about your cybersecurity policies and why staying within tech boundaries is essential. If there’s an application your employees regularly use, maybe it’s time to officially bring that software on board so employees can use it safely. Whether it’s in the cloud or your internal network, ensure your cybersecurity plan brings shadow data into the light.