Bernd H., Senior Software Engineer
Deleting Customer Data (Intentionally!)
Once a project ends, consultants are normally contractually obligated to delete any copies of proprietary client data they received as part of a project, as well as any data they themselves created for the customer’s proprietary use.
The latter part need not even be detailed in a contract – e.g. in Austria, the law defines that given a contract to create some item, that item is the sole property of the client by default. This rule includes not just physical artifacts, but also information (such as software and digital art or documents). GDPR adds another wrinkle to the picture in that it mandates that commercially used personally identifiable information gets deleted once it is no longer required for a legitimate business need, or for legal requirements (such as financial records retention).
A Stitch in Time…
For easy records deletion, a bit of preparation is half the battle. Actionable recommendations include:
- In any web application / “SaaS” you use, create a new “project” or “label” for everything related to a project, then take care to use it for any project-relevant information you input. Verify in advance that the software can actually delete everything tagged in this way at once!
- On local computers and servers you control, use a customer & project file folder hierarchy so you don’t have to go hunt for individual files.
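As a minimal sketch of the folder convention above: if everything for a project lives under one `root/customer/project` folder, deletion becomes a single call. The function names and the layout are assumptions for illustration, and `shutil.rmtree` performs an ordinary delete, not a secure erase.

```python
import shutil
from pathlib import Path

def project_dir(root, customer, project):
    """Return the folder that holds everything for one customer project."""
    return Path(root) / customer / project

def delete_project(root, customer, project):
    """Remove a project's folder tree; return True if something was deleted."""
    target = project_dir(root, customer, project)
    if not target.is_dir():
        return False  # nothing to delete (already gone or never created)
    shutil.rmtree(target)  # ordinary delete, not a secure erase
    return True
```

The customer folder itself stays in place, so other projects for the same client are untouched.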
This leaves three major pain points: Automatically generated files, backups and communication systems.
Automatically Generated Files
By automatically generated files, I mean things like browser caches and the working copies that Microsoft Office and other productivity applications create, often not where you expect them. The latter, especially, can include very sensitive business information, and also “personally identifying information” in the GDPR sense.
The only thing that helps with this scattered information is searching the whole storage of a work machine for relevant terms and names of your contacts in the project and seeing what comes up. On file servers, this issue is (thankfully!) rare.
- Research where the software you use stores temporary or working copies and caches.
- Have tooling that can search the contents of the kind of files that you produce across whole disks to automate search for content that should be deleted. This is often provided by the software itself – e.g., if you have Microsoft Office installed, Windows search can search into Office files. If the software you use stores plain text files, “ripgrep” is an extremely fast and vendor-neutral option.
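Where no dedicated tool is at hand, a whole-tree content search can be sketched in a few lines of Python. This is a rough illustration only: the function name and the size limit are assumptions, and it matches plain text only, so it will not look inside compressed formats such as modern Office documents.

```python
import os
from pathlib import Path

def find_mentions(root, terms, max_bytes=5_000_000):
    """Yield (path, lowercased term) for each file under root containing a term."""
    lowered = [t.lower() for t in terms]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_size > max_bytes:
                    continue  # skip very large files in this sketch
                # Decode leniently so binary junk does not abort the scan.
                text = path.read_bytes().decode("utf-8", errors="ignore").lower()
            except OSError:
                continue  # unreadable file (permissions, vanished, ...)
            for term in lowered:
                if term in text:
                    yield path, term
```

Calling `list(find_mentions("/home/me", ["Jane Doe", "ACME"]))` (path and terms are placeholders) would list candidate files for manual review; the matching is case-insensitive.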
Backups
Backups will generally retain sensitive information even after it has been deleted from the live systems – that’s most of the point of backups, after all!
How backups should be handled will depend on the situation. For extremely sensitive information, backups might well have to be filtered to remove such data when the live copies are removed. This can impose significant costs, especially when “deep” offline backups, such as a large tape library, have to be scrubbed.
In most cases, it will be enough to once again delete data that had been removed for contractual or legal reasons after it has been restored.
- If your customer insists on scrubbing backup copies, ensure that they cover the often very large costs imposed by such an operation.
- If deleting restored copies is sufficient, have a process in place that records contractual deletion of live data so that these deletions can be replayed automatically after a restore or disaster recovery operation.
- Keep up to date on the support your backup tooling offers for either way of handling intentional deletion. This is a rapidly improving field, partly due to GDPR.
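The "record and replay" process from the list above could look roughly like the following sketch. The JSON-lines log format and both function names are assumptions, not a feature of any particular backup tool. The log has to be stored where a restore cannot roll it back to an older state, and the deletion itself is an ordinary delete rather than a secure erase.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def record_deletion(log_path, target, reason):
    """Append one deletion record to a JSON-lines log."""
    entry = {
        "path": str(target),
        "reason": reason,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")

def replay_deletions(log_path):
    """Delete every logged path that exists again, e.g. after a restore."""
    removed = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue  # tolerate blank lines
            target = Path(json.loads(line)["path"])
            if target.is_dir() and not target.is_symlink():
                shutil.rmtree(target)
            elif target.exists() or target.is_symlink():
                target.unlink()
            else:
                continue  # still absent, nothing to do
            removed.append(target)
    return removed
```

Running `replay_deletions` as the last step of every restore procedure keeps contractually deleted data from quietly reappearing; the returned list doubles as an audit trail.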
Communication Systems
Communication software, whether email or instant messaging, is a whole other problem:
- we often communicate with the same contacts about different projects (or entirely different topics) and don’t want to delete the whole messaging history when a project concludes
- we use one address or account to communicate with many different parties, about multiple projects
There are no easy solutions for either problem: In some cases, the best solution may be to create per-project accounts and simply delete the whole account when the project concludes. This still does not guarantee that others remember to use the correct address for any given project if they know other ways to contact you!
In any case, consider that your email or instant messaging provider may retain your full messaging history. For information stored by third parties, you always have to rely on contractual agreements and your provider’s compliance with their own published policies. Here, their history with privacy and data breach issues is your best guide.
- Ensure that your clients are informed which service providers you work with, where appropriate.
- Do due diligence on whom you entrust your data.
- Separate, and delete, project accounts where required.
- Agree with clients in advance on how to handle accidental disclosure to third parties, by you or your clients themselves, and on how to communicate and handle third-party breaches.
- Keep informed how your current (and prospective) providers handle GDPR compliance. Many of the tools developed for compliance are also useful for handling contractual confidentiality.
Encrypting Internet Communications
Ensuring data confidentiality and obviating provider deletion by encrypting communications is a rather advanced topic. Sadly, all solutions for client-side encryption impose significant skill burdens on their users and/or the users’ IT support organizations.
It is essential, however, to consider that unencrypted Internet communication, even before the question of deletion comes up, can potentially be intercepted by many third parties, including:
- Internet service providers
- Internet backbone carriers
- Email providers, for both the sender and recipient
- Intermediate email providers transmitting emails for the sender (popular for deliverability – basically, specialists in staying off spam blacklists)
Here, the winning move may be not to play: Only deliver highly confidential information in person or through known-secure links (e.g. file uploads through a VPN).
- If feasible, encrypt email communications. PGP / GPG may be preferable since it requires no trust in third parties. Its use imposes extreme skill burdens on end users, however, and will likely be infeasible for non-expert users.
- When PGP is infeasible, consider S/MIME. It uses the same certificate validation infrastructure as HTTPS. This, however, means that you have to trust the third parties that issue the certificates (this seems to be “good enough” for e-commerce, after all – but note that certificate authorities have been breached on occasion). Alternatively, you can run your own certificate authority, but this requires very high skill around SSL/TLS and trust management in all involved IT departments.
- Consider not transmitting highly confidential information using email at all.
- For instant messaging, investigate whether the system you use offers end-to-end encryption, or whether you can switch to a provider that offers such an option.
Shortcuts
There are two shortcuts that can save a lot of work: Encryption and hardware investment.
Encryption can help by making deletion unnecessary (or exceedingly easy). If you have encrypted all content that you would have to delete, it is sufficient to delete every copy of the encryption key. This can boil down to destroying (or handing over to your client) a hardware token. Hardware tokens for key management (such as the popular YubiKey) have the additional benefit that private keys generated on the token cannot normally be exported, so no other copies remain in existence.
Another benefit of encryption is that it eliminates most risk from hardware theft – just make sure that you don’t store keys on the same device, or leave hardware tokens plugged in!
Hardware investment is the easiest approach for local workstation/server data deletion of all: You just buy separate systems for each project’s use, store all project data only on them, and fully delete and re-install them before moving them to the next project.
Consider your storage media if you choose this approach: A full secure deletion can impose significant wear & tear on solid state disk media. If you use SSDs, consider combining this approach with encryption: If your data are properly encrypted, securely destroying all copies of the private keys is fully equivalent to secure deletion.
The main drawbacks to this are inconvenience and cost – if you work on multiple concurrent client projects, lugging around (and paying for!) multiple laptops and backup disks can be quite the drag. Nevertheless, this approach can be highly practical if you only have a few projects (e.g. one workstation for company-internal business and one project workstation at a time). Many of the problems discussed above are just non-existent if you choose this approach.
If you use computers with easily replaceable hard disks (e.g. desktop workstations with pluggable HDD frames, some laptops), you might also only switch hard disks between projects, saving time, money and luggage overhead.
If you choose this approach, make its feasibility a factor during hardware purchasing decisions, and always remember to use different encryption keys for different projects!