The implementation of the General Data Protection Regulation (GDPR) has caused the business community to reconsider how it collects, holds and distributes personal data. The GDPR will affect every business operating in the EU and, with the effective date of May 25, 2018 fast approaching, cyber security professionals are working tirelessly to amend their practices to comply with the new legislation.
The GDPR provides a number of rights for individuals whose data you may be handling. These include the right to be informed, the right of access, the right to rectification, the right to restrict processing, the right to data portability, the right to object, and rights in relation to automated decision making and profiling.
On top of this, the right to erasure means that a data subject can request erasure of all personal data related to them on any one of a number of grounds.
With all the new compliance methods being introduced, one thing seems to slip through the cracks – the fact that it is not uncommon for data to be moved around an organisation outside of official controls, even when established data channels are available.
This paper will look at the risk this rogue data poses to GDPR compliance, how it can be aligned to the GDPR, and the real, day-to-day business benefits that can make this alignment rework cost-effective and justifiable regardless of the GDPR.
What is Snapshot Reporting?
The term ‘snapshot’ is often used in reporting to describe a data extract taken at a specific moment in time. This does not automatically mean the snapshot is a GDPR liability… but it doesn’t mean it isn’t either!
Why Snapshot Reporting Happens
In many cases, a change to the SQL script used to export the data is all that is required to turn a snapshot into a repeatable report. Once that logic is in place, the extract can be refreshed for the full data set whenever required, rather than accumulating a collection of historic extracts to build the same set.
Common situations that can make this challenging include:
- It’s not SQL!
Most off-the-shelf software is packaged with some form of built-in reporting. This may take the form of a report collection that meets the more basic queries the vendor thinks are important, or contain a method of report configuration that allows bespoke reporting… but usually with a subset of functionality to allow basic report development.
The functionality of these built-in reporting utilities varies widely from product to product. Some cater for the extended logic required for non-snapshot reporting; others do not.
- Not Enough Time Stamps
The key to reusable extract logic is being able to consistently identify when something has happened or changed in the source data. When timestamps do not exist for certain events, it becomes impossible to know what has changed or when.
- Available Specialist Reporting Software
Not having access to specialised reporting software with which to build dynamic reports is more common than many would suspect. Sure, your organisation has an embedded reporting service, but it takes forever going through the official channels, and the data warehouse doesn’t have quite what you need anyway.
In this scenario, a snapshot direct from the source data looks like it will solve the problem nicely. And it will, until the source data is updated and the extracts are out of alignment.
Ironically, this snapshot data invariably finds itself being sliced and diced in MS Excel, a product that is perfectly capable of direct data source access and applying the logical approach needed to avoid snapshot reporting.
- In-house Expertise
Every instance of snapshot reporting I have encountered has been a ‘best endeavours’ solution created by well-meaning employees trying to fulfil a business need that would otherwise be impossible to meet. And it is not uncommon for key business activities to be dependent on said snapshots. (It should be pointed out, however, that for an experienced SQL user this sort of change is a logic issue that is well within their skillset to resolve.)
If any variation of the above scenarios exists within your organisation, there is a good chance the accuracy of current reporting is compromised regardless of the GDPR. Fortunately, none is necessarily a showstopper in the handling of snapshot reports, but they can change the focus of the problem from being purely logical to encompassing technical challenges, which we’ll look at in the next section.
Non-repeatable snapshots may have a minimal GDPR risk if they are deleted after use, but are still a challenge to Data Governance and auditing.
It is worth considering the risk snapshot reporting brings to the organisation, before we look at the GDPR implications:
- Manually Intensive
Snapshots don’t always mean that manual activities are required to derive value, but it is usually the case as work normally done within a data warehouse still needs to be applied. This can range from summary operations to merging multiple snapshots together for historic trends.
- Single Point of Failure
It is bad enough when an employee with extensive, specialist knowledge leaves the organisation, but when their private stash of snapshot extracts vanishes with them the risk is exacerbated. And in the likely event that the snapshot extracts have had undocumented manual transformations applied, that knowledge is lost too.
- Outside of Data Governance
If the logic used to create a data extract produces different results today than it did last Thursday, the extract recipient is likely to collect historic snapshots and build an unofficial data repository that is unknown to the Data Protection Officer (DPO).
The wider organisation and its associated data controls cannot know what snapshots have been collected for later use or what additional transformations have been applied. This situation creates a risk for audit trails, data lineage and ‘one version of the truth’ as well as GDPR compliance.
The GDPR Risk
The illustration below shows how personal data can be taken out of Data Governance and GDPR compliance through a snapshot and resurface months later.
What are the GDPR Implications?
Not honouring a request to be forgotten, or using personal data for anything other than its permitted purpose, exposes the organisation to GDPR risk.
There are two levels of financial penalties related to GDPR:
- Up to €10 million or 2% of the organisation’s global annual turnover from the previous financial year (whichever is higher).
- Up to €20 million or 4% of the organisation’s global annual turnover from the previous financial year (whichever is higher).
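The "whichever is higher" rule in both tiers can be expressed as a simple calculation. The sketch below is only an illustration of that arithmetic: the function name `max_fine_ceiling` and the turnover figures are made up for the example, and it returns the upper limit of a tier, not a prediction of any actual penalty.

```python
def max_fine_ceiling(annual_turnover_eur: int, upper_tier: bool) -> float:
    """Return the ceiling for one penalty tier: the fixed amount or the
    percentage of global annual turnover, whichever is higher."""
    if upper_tier:
        fixed_cap_eur, percent = 20_000_000, 4
    else:
        fixed_cap_eur, percent = 10_000_000, 2
    # Multiply before dividing so whole-euro inputs stay exact.
    turnover_based = annual_turnover_eur * percent / 100
    return float(max(fixed_cap_eur, turnover_based))


# Illustrative turnover figures only.
small_org = max_fine_ceiling(100_000_000, upper_tier=False)    # 2% is below the fixed cap
large_org = max_fine_ceiling(1_000_000_000, upper_tier=True)   # 4% is above the fixed cap
print(small_org, large_org)
```

For the smaller organisation the fixed amount applies, while for the larger one the turnover percentage becomes the ceiling.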
The expectation under the GDPR is that the Data Protection Officer (DPO) will have the organisation’s data repositories documented and know exactly where to locate any instances of personal information.
With snapshot extracts being shared outside of any formal controls on distribution and content, the DPO will not have a view of these repositories, which can hold data that has been removed from the rest of the organisation.
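To make the risk concrete, the following minimal sketch checks a snapshot extract for records belonging to data subjects who have already been erased from the governed systems. Everything named here is an assumption for illustration: the extract is taken to be a CSV with a `customer_id` column, the organisation is assumed to hold a set of erased subject identifiers, and `find_erased_subjects` is a hypothetical helper rather than part of any product.

```python
import csv
import io


def find_erased_subjects(snapshot_file, erased_ids, id_column="customer_id"):
    """Return the sorted IDs in a CSV snapshot that appear in the erased set."""
    reader = csv.DictReader(snapshot_file)
    return sorted({row[id_column] for row in reader if row[id_column] in erased_ids})


# Illustrative snapshot content; a real extract would be opened from disk.
snapshot = io.StringIO(
    "customer_id,email\n"
    "c1,a@example.com\n"
    "c2,b@example.com\n"
    "c3,c@example.com\n"
)
erased = {"c2", "c9"}  # hypothetical subjects already erased from source systems

leftovers = find_erased_subjects(snapshot, erased)
print(leftovers)
```

Any identifier reported here is personal data that survives in the snapshot even though the governed systems no longer hold it. Such a check only works for extracts that are known about, which is precisely the visibility problem described above.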
The GDPR may exist because of deliberate exploitation of personal data and shoddy security practices, but this does not make responsible organisations immune to getting caught in the crossfire.
What is the Solution?
The rest of this section outlines two possible approaches to handling the risk of snapshot reporting; their suitability will depend on current reporting practices and available resources.
When a Snapshot is not a Snapshot (Logical Approaches)
In a scenario where the only way to get data for the previous month is to run a snapshot extract on the first day of the current month, the recipient will collect those extracts in order to build quarterly and yearly summaries.
Most snapshots that carry a high GDPR risk are the result of this approach to report building, and they are easily addressed once identified.
In this specific example, the recipient requires two changes to their current practices:
- The option to extract the data for any full month at any time and receive the same result.
- The option to extract the data for any previous quarter (or year) at any time and receive the same result, with that result matching the aggregated monthly extracts for the same period.
These two changes, together with a policy of refreshing every report with current data before distribution, will go a long way towards minimising GDPR risk without putting all data extracts into existing Data Governance.
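The two changes above can be sketched as a date-bounded query in place of a "run it on the first of the month" extract. This is a minimal illustration, not a prescribed implementation: it uses Python's built-in sqlite3 module as a stand-in for whatever source database is in use, and the `orders` table, its `created_at` column and the helpers `month_range` and `total_between` are all hypothetical. It also assumes the event date on each row does not change after the fact.

```python
import sqlite3
from datetime import date


def month_range(year: int, month: int) -> tuple[str, str]:
    """Half-open [start, end) ISO date bounds for one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


def total_between(conn: sqlite3.Connection, start: str, end: str) -> int:
    """Sum order amounts whose event date falls in [start, end)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders "
        "WHERE created_at >= ? AND created_at < ?",
        (start, end),
    ).fetchone()
    return row[0]


# Hypothetical source table with an event date on every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (id, amount, created_at) VALUES (?, ?, ?)",
    [
        (1, 100, "2018-01-05"),
        (2, 250, "2018-01-31"),
        (3, 75, "2018-02-14"),
        (4, 300, "2018-03-01"),
        (5, 40, "2018-04-02"),  # falls outside the first quarter
    ],
)

# Any full month can be requested at any time, not just on the 1st.
monthly = [total_between(conn, *month_range(2018, m)) for m in (1, 2, 3)]
# The quarter is derived from the same logic, so it aligns with the months.
quarter = total_between(conn, "2018-01-01", "2018-04-01")
print(monthly, quarter)
```

Because the period is a parameter rather than "whatever is in the source today", the recipient has no reason to hoard historic extracts. When a report is refreshed, rows erased at source simply no longer appear.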
Extended Enterprise Data Governance
The ideal solution for handling snapshot reporting is to shut it down completely and provision all data through a governed service.
This is likely to be expensive, but still cheaper than the possible GDPR fines!
Unfortunately, it is also likely to be time consuming, so applying the logical approach outlined above as an immediate safeguard is highly recommended.
The specifics of how this can be implemented will vary dramatically from organisation to organisation and are beyond the scope of this paper.
What to Do Next
The following bullet points outline investigatory activities that should be undertaken if there is a suspicion of snapshot reporting within an organisation. The effort needed to carry out the suggestions below depends greatly on what data governance is already in place and the volume of reporting in use.
- Review Known Reports/Extracts
The people using snapshot extracts within an organisation are doing so to meet a business need. If that business need can be fulfilled by a more stable, less manually intensive activity it should be welcomed.
- Review Unknown Reports/Extracts
Finding snapshots that are currently in use but not known to the organisation is its own challenge. The ease of auditing what data is being extracted, and how, will vary depending on the technical approach used, and may range from simply refreshing a canned audit report to manually surveying activities.
Whether the fix is bringing a snapshot under strict Data Governance through an agreed method, such as provisioning it through a data warehouse, or simply improving the snapshot extract logic, it is a success if there is no longer a need to store data extracts outside of Data Governance.
- Remove and/or Refresh
Once troublesome snapshots are identified and new data provisions are in place, any existing snapshot extracts should be deleted, with the user of those extracts confident that the data will be available when they need it (i.e. the user has a process they can invoke to get the data they need from the governed data service).
- Amend Existing Data Governance Processes
With all the above done, it is important to improve or introduce official processes that stop snapshot reporting from proliferating again over time.
Regular audits to ensure Data Governance is maintained should be considered good practice in general, and are essential for continued data health.
The variation between organisations and technologies makes detailed technical or procedural recommendations impossible. Sooo..
Get in Touch!
One of the biggest challenges in addressing snapshot reporting is having the available resource with the required skillset to carry out the work, work that requires a mix of analysis and in-depth knowledge of data handling (including SQL).
In-house talent may be lacking or, more likely, too busy with the day job to take on extra work, and hiring external expertise for a two-week piece of work can be difficult and expensive.
About the Author
I have two decades in the IT industry, consulting for multiple companies and public sector organisations, always with a focus on data in one way or another. I’ve done everything from greenfield data warehouse implementation to statistical analytics.
I am always more than happy to undertake any work that leads to the demise of snapshot reporting!