Data Remanence: Cloud Computing Shell Game

Everybody knows that dragging a file into the trash and then emptying the trash doesn’t actually erase the file. It simply indicates to the file system that the file is deleted, but the data in the file remain on the hard drive until the file system eventually overwrites the file. If you require the actual erasure of deleted files, then you must take an active step to erase the portion of the drive that contained the file, perhaps by explicitly overwriting each bit of the original file. Even then, it may be possible (although generally quite difficult) to recover parts of the original file, due to the magnetic properties of the storage medium. We call this problem data remanence.
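
The "active step" of overwriting a file before deleting it can be sketched as follows. This is a minimal illustration, not a vetted secure-erase tool: the function names overwrite_file and overwrite_and_delete are hypothetical, and the sketch assumes a file system that rewrites blocks in place. On SSDs, journaling or copy-on-write file systems, and virtualized Cloud volumes, the bytes may land on different physical blocks, leaving the old data behind.

```python
import os


def overwrite_file(path: str, passes: int = 1, chunk_size: int = 64 * 1024) -> int:
    """Overwrite a file's contents in place with random bytes.

    Returns the file size in bytes. Assumes the file system rewrites
    the same physical blocks, which is NOT guaranteed on many media.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            # Push the new bytes out of user-space and OS buffers.
            f.flush()
            os.fsync(f.fileno())
    return size


def overwrite_and_delete(path: str, passes: int = 1) -> int:
    """Overwrite the file, then remove its directory entry."""
    size = overwrite_file(path, passes)
    os.remove(path)
    return size
```

Calling overwrite_and_delete("report.txt") would replace the file's bytes with random data and then unlink it. Note that the sketch needs a path on media you control, which is exactly what the Cloud takes away.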

Cloud Computing complicates the data remanence issue enormously. You typically have no visibility into the physical location of your data in the Cloud, so overwriting the physical media is virtually impossible. The Cloud infrastructure may distribute your storage or virtual machine instance across multiple physical drives. Furthermore, deprovisioning that instance is similar to dragging it to the trash: the data that your instance wrote to the various drives remain until the Cloud provider eventually gets around to reallocating the sectors you were using to other instances. Even then, an enterprising hacker might be able to read your data by examining the leftover bits exposed to their newly provisioned instance.

Unfortunately, the current state of the art for dealing with data remanence in the Cloud is a shell game: applications relegate the solution to the infrastructure level, while the infrastructure considers the problem to be at the application level. To make matters worse, no one seems to be focusing on the data remanence problem in the Cloud. That is, except for the hackers, who will be quietly stealing your data before you know what happened.

Encryption: Necessary but not Sufficient

Encryption is the obvious first line of defense against the data remanence problem. Make sure all the data you store in the Cloud are encrypted. Manage your keys locally, rather than putting them in the Cloud. In this way, not only are your data confidential, but all you have to do to securely delete your data is to delete (or expire) the key.
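
The idea of deleting data by deleting its key can be sketched as follows. This is an illustration under stated assumptions, not a production design: LocalKeyStore, encrypt_for_cloud, and decrypt_from_cloud are hypothetical names, and the cipher is a toy keystream built from HMAC-SHA256 so that the example needs only the Python standard library. It provides no integrity protection. A real deployment would use a vetted authenticated cipher from a maintained cryptography library.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class LocalKeyStore:
    """Hypothetical key store that lives on your premises, never in the Cloud."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def create(self, key_id: str) -> None:
        self._keys[key_id] = secrets.token_bytes(32)

    def get(self, key_id: str) -> bytes:
        return self._keys[key_id]  # raises KeyError once the key is shredded

    def shred(self, key_id: str) -> None:
        # A real store must also erase the key from its own backing media.
        del self._keys[key_id]


def encrypt_for_cloud(keys: LocalKeyStore, key_id: str, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext, the only bytes that ever reach the Cloud."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(keys.get(key_id), nonce, len(plaintext))
    return nonce + _xor(plaintext, stream)


def decrypt_from_cloud(keys: LocalKeyStore, key_id: str, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return _xor(body, _keystream(keys.get(key_id), nonce, len(body)))
```

Once keys.shred("ehr") has run, any copy of the blob that lingers on a Cloud drive can no longer be decrypted, because decrypt_from_cloud has no key to work with. The remanence problem does not vanish; it shrinks to the local key store, where you control the media.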

Problem solved, right? Not so fast.

Such application-level encryption has a major limitation. There’s simply not much you can do with encrypted data unless you decrypt them, other than simply store them or move them around. If you decrypt your data in the Cloud, then the data remanence problem once again rears its ugly head. As a result, application-level encryption can only solve the data remanence problem when you’re using the Cloud for storage only. If you want to process your data in the Cloud, the approach is insufficient.

Perhaps we should handle encryption below the application level, say, at the media layer. With media encryption, you essentially have an encrypted volume in the Cloud. You must present the appropriate credential to mount the volume, just as you would a local hard drive that has media encryption. Media encryption protects you from stolen hard drives (or your Cloud provider going bankrupt and putting the drives on eBay), but it is still insufficient for dealing with the data remanence issue.

The limitation of media encryption in the Cloud is that it only protects read/write operations to the file systems or databases that are physically present on the encrypted media. Other operations, such as message queuing, data caching, and logging, may not have adequate protection. In a traditional on-premises server environment, your systems people are in full control of how and where they handle such operational or transitory data. In the Cloud, however, you have no such control. The Cloud provider’s underlying provisioning infrastructure may use a caching scheme as part of its elastic load balancing, and you’d be none the wiser. Remember, you may believe queues or caches are inherently temporary, but the data remanence issue centers on situations where “temporary” really means “unpredictably persistent.”

One approach to addressing this problem that is gaining in popularity is “Virtual Private Storage,” or VPS. With VPS, encryption and decryption (among other capabilities) take place transparently on an intermediary that negotiates all interactions with the Cloud. For example, buy one of the new generation of Cloud appliances, put it in your DMZ, and configure it to encrypt everything going from your network up to the Cloud, while decrypting in the other direction. From the user’s perspective such security measures are entirely transparent; they don’t have to worry about confidentiality or data remanence in the Cloud. From the perspective of the Cloud, none of your data are ever unencrypted, whether written to a hard drive or temporarily stored in a queue or a cache somewhere.

The Missing Piece: Meaningful Use

Unfortunately, neither VPS nor media encryption is a complete solution, because both limit what you can do in the Cloud environment. In essence, all of the encryption approaches we’ve discussed treat the Cloud as a storage option. It’s true that Cloud storage is an essential part of the Infrastructure-as-a-Service (IaaS) story. But what if you want to do more with the Cloud than IaaS?

A wonderful example of this question comes from the healthcare industry. And even if you’re not in healthcare, the same challenges may apply to your organization. As you might expect, there are stringent, heavily regulated standards for the confidentiality of Electronic Health Records (EHRs). Encryption techniques traditionally provide sufficient confidentiality for these sensitive data. As solution providers build Cloud-based EHR applications, however, the data remanence issue rears its ugly head.

Cloud storage itself isn’t the issue. Put EHRs in the Cloud, move them around, and bring them back from the Cloud: no problem there. But the regulations require more than storage. In the US, for example, the HITECH Act “promotes the adoption and meaningful use of health information technology.” It then goes into quite a bit of detail as to what “meaningful use” means, and it’s a lot more than IaaS can provide. For example, e-prescribing (eRx) and clinical decision support are two obvious meaningful uses of EHRs that the healthcare industry requires from Cloud-based solutions.

The challenge is that both eRx and clinical decision support necessitate actually doing something interesting with EHRs in the Cloud, and that means decrypting EHRs in the Cloud, which brings us back to the data remanence issue. IaaS simply cannot fully solve this problem, because the problem lies at the application level. Software-as-a-Service (SaaS) also cannot fully resolve the problem, because SaaS solutions alone cannot deal with the remanence issues inherent in having decrypted data in the Cloud.

The ZapThink Take

Fortunately, there is a third Cloud service model: Platform-as-a-Service (PaaS). ZapThink has lambasted PaaS as warmed-over middleware in the Cloud, and truth be told, many PaaS solutions are still little more than thinly veiled middleware. The fact remains that it’s up to the PaaS vendors to solve the Cloud data remanence problem, since all of the gaps in media encryption and application-level encryption are within the realm of PaaS.

It’s not clear, however, that any PaaS vendor has fully solved this problem yet. There are many moving parts to a platform, after all: messaging, transactionality, data storage and caching, framework APIs, and more. Place those capabilities into the dynamically provisioned Cloud environment. Then, ensure the platform never writes unencrypted data to physical media, even for data in transit.

Essentially, the PaaS vendors must rise to this challenge and build their offerings from the ground up with data remanence in mind. Until they do, no organization should trust them with EHRs or data of similar sensitivity. Of course, with challenge comes opportunity. Are you a vendor who is working on a solution to the Cloud data remanence problem, or a Cloud user who is struggling to find such a solution? Drop us a line, or better yet, check out our new online Cloud Security Fundamentals course.

Image Source: vvracer, Flickr. CC BY 2.0 License.