Tokenization vs. Encryption, and When to Use Them
Before you start planning to manage tokens or encryption keys, it’s important to understand the key differences between tokenization and encryption. This post provides an overview of these techniques, and some guidelines on when to use each.
Tokenization and encryption are both obfuscation techniques. While the intent of both of these methods is to protect sensitive information, their implementations, strengths, and limitations can have unexpected consequences – especially when it comes to data privacy and compliance.
If You Use a Credit Card, You Need to Know Tokenization
Chances are you have used a credit card to pay for something online. If so, you have encountered workflows that rely on tokenization, even if you haven’t designed web-based payment sequences. This means tokenization is still part of your daily life and good to understand, even if just from a layperson’s perspective.
The payment card industry (PCI) defines certain mandates and requirements that payment card transactions must comply with to be considered legitimate. These PCI compliance requirements are in place to ensure the protection of payment data, with the ultimate goal of preventing fraud and other malicious misuse of PCI data like cardholder names, credit card numbers (PANs) and security codes (CVVs or CVCs).
The PCI Security Standards Council defines the standards and practices that merchants and other companies handling PCI data must follow to ensure that credit card and debit card transactions are PCI compliant. These standards and practices are known as the Payment Card Industry Data Security Standard (PCI DSS). While there is no law mandating PCI compliance, such compliance is required to process card-based payments because the Council includes all of the major credit card networks, and courts have upheld the Council’s right to require compliance of any company that wants to use their networks.
Tokenization is one of the core components of PCI compliance. In data privacy technology, a token is a representation of data – like a pointer – that, on its own, has no particular meaning or exploitable value unless you have the ability to map it to the original value. When performing a web-based payment transaction, tokens are used instead of payment card data to keep that data from being intercepted and misused.
Tokenization is great for consumers and for companies that process card-based payments, because it helps to keep credit card data under the consumer’s control, and prevents fraud.
Tokenization: A Token Cannot Be Hacked
The use cases for tokenization expand beyond PCI into protecting sensitive data like personally identifiable information (PII), protected health information (PHI), and more. With tokenization, the actual data is stored somewhere secure and only used when absolutely necessary with a token redemption process called detokenization. A token can be referenced in multiple systems and services, or passed into an analytics pipeline, without compromising the privacy or security of the original data.
A tokenization scheme provides a one-to-one mapping: one token corresponds to one sensitive data element, like a name or PAN. Importantly, a token cannot be deciphered with a decoder. If a bad actor were to breach a database of tokens, they could not do anything useful with the tokens because they wouldn’t have the mapping to the data the token represents without access to the detokenization process. Essentially, tokenization is about substitution and isolation of sensitive data elements.
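To make the substitution idea concrete, here is a minimal sketch of a token vault. It is an illustration only, not how any particular product works: the `TokenVault` class, the `tok_` prefix, and the in-memory dictionaries are all assumptions made for this example. A real vault would add persistent storage, access control, and audit logging.

```python
import secrets


class TokenVault:
    """Toy in-memory vault (hypothetical): maps random tokens to sensitive values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so each value maps to exactly one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it carries no information about the value.
        token = "tok_" + secrets.token_hex(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault's mapping can recover the value.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(token)                    # a random string that is safe to pass around
print(vault.detokenize(token))  # the original value, available only via the vault
```

Because the token is generated randomly rather than computed from the card number, there is nothing to "decode": without the vault's mapping, the token is just a random string.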
For more information on tokenization specifically, check out Demystifying Tokenization: What Every Engineer Should Know.
A simple example of tokenization is a coat check system. You don’t want to carry your coat around so you decide to check it. At a coat check, you swap your coat for a ticket that you will need to present later in order to retrieve your coat. Should you forget to retrieve your coat, the coat check ticket will not keep you warm once you leave the establishment – on its own, the ticket can’t perform the function of the coat it represents nor can you somehow turn the ticket into a coat. In order to get your coat back, you have to exchange the ticket for it at the coat check, as shown in the following diagram.
Tokenization in Payments
Now that we’ve covered tokenization in a bit more depth, let’s take a closer look at the real-world example mentioned earlier in this post: tokenization for payments.
In PCI tokenization, tokens are passed between the merchant and the payment processor. The actual credit card information is only shared among the processor (for example, Stripe), the credit card networks (such as Visa or Mastercard) and the card’s issuing bank.
Here’s how tokenization is used in a typical payments workflow:
- You give the online merchant your PCI data – primary account number (PAN), CVV, and card expiry date – to purchase an item.
- The merchant registers your credit card information with a payment services vault and processor (such as Stripe) and the payment service returns a token. The merchant stores the token.
- When it comes time to issue a transaction against your card, the merchant passes a token representing your credit card information to the payment processor (e.g., Stripe).
- The payment processor then transforms the token into meaningful PCI data, and sends that PCI data on to the credit card network (e.g., Visa).
- The credit card network then passes the PCI data to the issuing bank to complete the payment.
You can see how this is similar to the coat check example, except in this case, the exchange of tokens for sensitive credit card information also has a layer of governance and logging.
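The workflow above can be sketched in a few lines of code. This is a simplified model under stated assumptions, not any real processor’s API: `PaymentProcessor`, `Merchant`, `send_to_card_network`, and every method name are hypothetical. The CVV is left out of the stored record because PCI DSS does not permit storing it after authorization.

```python
import secrets


def send_to_card_network(card, amount_cents):
    # Hypothetical stand-in for the card network and issuing bank.
    return {"approved": True, "last4": card["pan"][-4:], "amount_cents": amount_cents}


class PaymentProcessor:
    """Hypothetical processor that vaults card data and hands out tokens."""

    def __init__(self):
        self._vault = {}
        self.audit_log = []

    def register_card(self, pan, expiry):
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = {"pan": pan, "expiry": expiry}
        return token

    def charge(self, token, amount_cents):
        card = self._vault[token]  # detokenization happens only inside the processor
        self.audit_log.append(("charge", token, amount_cents))
        return send_to_card_network(card, amount_cents)


class Merchant:
    """Hypothetical merchant that stores tokens, never card numbers."""

    def __init__(self, processor):
        self.processor = processor
        self.saved_tokens = {}

    def save_card(self, customer_id, pan, expiry):
        self.saved_tokens[customer_id] = self.processor.register_card(pan, expiry)

    def bill(self, customer_id, amount_cents):
        return self.processor.charge(self.saved_tokens[customer_id], amount_cents)


processor = PaymentProcessor()
merchant = Merchant(processor)
merchant.save_card("customer-1", "4111111111111111", "12/30")
print(merchant.saved_tokens)               # tokens only, no PAN
print(merchant.bill("customer-1", 2500))   # the charge is made through the token
```

The merchant’s own storage holds only tokens, and the processor’s `audit_log` stands in for the governance and logging layer described above.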
For more information about the latest developments in tokenization for payments, check out Network Tokenization: Everything You Need to Know.
Tokenization for Data Residency
The advantages of tokenization go beyond consuming, storing, and processing payment information. Besides protecting sensitive data, tokenization can keep personal data localized to help meet the data residency requirements included in global privacy laws such as Brazil’s LGPD and Europe’s GDPR.
Data residency refers to the physical location of data. In other words, if the server hosting a database of user information is located in a data center in the United States, then the data residency would be described as the United States. With the advent of remote infrastructure and the growth of data collection, some countries have instituted laws governing where the personal data of their citizens can be physically stored.
Tokenization lets PII remain in its country of origin without restricting the use of other, non-sensitive data across globally distributed systems.
In data residency tokenization, tokens are used for the core backend infrastructure, databases, data lakes, analytics, and third party integrations so that PII can remain in one location to comply with data residency requirements.
Here’s how tokenization is used in a typical data residency workflow:
- When you subscribe to an online streaming service, you provide PII such as your name, address, and phone number.
- The streaming service uses a data privacy vault in your region to store your PII in plaintext.
- For the streaming service’s backend, databases, data lakes, analytics, and any other function or third party integration that doesn’t need PII, tokens are used and detokenized as needed.
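The steps above can be sketched as follows. This is a toy model under stated assumptions: `RegionalVault`, `VAULTS`, `sign_up`, and `display_name` are hypothetical names, and each "vault" is an in-memory dictionary standing in for a service hosted inside the given region.

```python
import secrets


class RegionalVault:
    """Hypothetical vault that lives in one region and holds that region's PII."""

    def __init__(self, region):
        self.region = region
        self._records = {}

    def tokenize(self, value):
        token = f"{self.region}_{secrets.token_hex(8)}"
        self._records[token] = value
        return token

    def detokenize(self, token):
        return self._records[token]


# One vault per region; in practice each would run on infrastructure in that region.
VAULTS = {"eu": RegionalVault("eu"), "br": RegionalVault("br")}


def sign_up(region, name, phone, plan):
    vault = VAULTS[region]
    # The globally replicated record holds tokens plus non-sensitive data only.
    return {
        "region": region,
        "name": vault.tokenize(name),
        "phone": vault.tokenize(phone),
        "plan": plan,
    }


def display_name(record):
    # Detokenize only when the name is actually needed, via the regional vault.
    return VAULTS[record["region"]].detokenize(record["name"])


record = sign_up("eu", "Ada Lovelace", "+44 20 0000 0000", "premium")
print(record)                # safe to copy into analytics or a data lake
print(display_name(record))  # resolved through the EU vault
```

Analytics, data lakes, and third-party integrations can work with `record` as-is, while the plaintext name and phone number never leave the regional vault.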
The following diagram shows how this works in the streaming service’s architecture:
To learn more about data residency, see What is Data Residency, and How Can a Data Privacy Vault Help?
Encryption: Algorithms and Keys
Encryption has a history that goes back to the ancient world. The earliest forms used a “key” to substitute one letter or symbol for another, encoding (encrypting) a message into ciphertext that could later be decoded (decrypted) with that same key to recover the original plaintext. With a history of over 3000 years, encryption and decryption have an entire branch of mathematics devoted to them: cryptography.
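As an illustration of that early letter-for-letter approach, here is a small shift cipher in the style usually attributed to Julius Caesar. It is a teaching example only; the function names are made up for this sketch, and a cipher this simple offers no real protection today.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_cipher(text, key):
    """Replace each letter with the one `key` positions later in the alphabet."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return "".join(result)


def encrypt(plaintext, key):
    return shift_cipher(plaintext, key)


def decrypt(ciphertext, key):
    # Decrypting is the same substitution run in the opposite direction.
    return shift_cipher(ciphertext, -key)


ciphertext = encrypt("ATTACK AT DAWN", 3)
print(ciphertext)              # DWWDFN DW GDZQ
print(decrypt(ciphertext, 3))  # ATTACK AT DAWN
```

The same key (here, `3`) both encrypts and decrypts, which is the defining property of the key-based schemes described above.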
In modern cryptography, precise and elaborate algorithms are used to encrypt data into ciphertext in order to transmit sensitive information. Ciphertext is a string of unreadable characters that can represent many types of data – including long-form data like documents or videos.
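To decrypt ciphertext into plaintext, the receiver must have access to the decryption key, which in symmetric encryption is the same key that was used to encrypt the data. The security of the scheme rests on the assumption that only the intended, authorized parties have access to that key.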
But, unlike a token – which is meaningless on its own – encrypted data can be hacked. A bad actor could come into possession of an encryption key or – with enough time and determination – recover the key on their own, either by trying every possible key or by exploiting a mathematical weakness in the algorithm. The stronger an encryption algorithm and key are, the harder it is to hack the encryption.
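A rough calculation shows why key strength matters. The sketch below estimates how long it would take to try every possible key of a given length; the guess rate of one trillion keys per second is an arbitrary assumption chosen for illustration, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def keyspace(bits):
    """Number of distinct keys for a key that is `bits` bits long."""
    return 2 ** bits


def years_to_exhaust(bits, guesses_per_second):
    """Years needed to try every key at the given (assumed) guess rate."""
    return keyspace(bits) / guesses_per_second / SECONDS_PER_YEAR


for bits in (56, 128, 256):
    years = years_to_exhaust(bits, guesses_per_second=1e12)
    print(f"{bits}-bit key: about {years:.3g} years to try every key")
```

Each additional bit doubles the number of keys an attacker has to try. This estimate covers exhaustive search only: a stolen key or a flaw in the algorithm bypasses the key space entirely.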
If we return to our earlier example of the coat check but added encryption, then in addition to swapping your coat for a coat check ticket, the coat would also be locked in a closet accessible only with a key held by the coat checker.
In this example, you can only get your coat if you have the ticket and if the coat checker is present with the key to unlock the closet and retrieve your coat. Anyone in possession of the key can unlock the closet, which means that if the coat checker isn’t careful and a bad actor gets hold of the closet key, you could lose your coat even though you never lost your ticket!
Let’s look at a few real-world applications for encryption.
Encryption for Payments
Let’s look at a real-world example of how encryption is used in the payment card transactions we’ve discussed throughout this post. If we return to the example of a PCI compliant web-based payment, encryption is used to securely pass credit card information between a payment processor, a credit card network, and an issuing bank.
Here’s how encryption is used in a typical payments workflow:
- You give the online merchant your primary account number (PAN), CVV, and expiry date to purchase an item.
- The merchant uses a service like Stripe as their payment processor, passing a token representing your credit card information to Stripe. The merchant stores the payment data as a token.
- Stripe has the ability to transform the token into meaningful data. Using an encryption key, Stripe encrypts the plaintext credit card information into ciphertext, which is then passed on to the credit card network.
- The credit card network then passes the data to the issuing bank to complete the payment. The issuing bank is in possession of the same encryption key Stripe used to encrypt the credit card information and can therefore decrypt the ciphertext back into plaintext and use it to complete the payment.
The following diagram shows which parts of a payment card workflow use encryption:
PCI compliant payments are a good example of how encryption and tokenization can be used together to provide better security than either one could provide on its own.
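Here is a sketch of how the two techniques fit together in that workflow. It rests on several assumptions: `Processor`, `IssuingBank`, and their methods are hypothetical, and the cipher is a one-time pad (XOR with a random shared key used once), chosen only so the example runs with the standard library. Production systems use vetted ciphers from audited cryptography libraries.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # One-time pad: XOR each byte with a key byte. The key must be random,
    # at least as long as the data, and never reused for another message.
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


class Processor:
    """Hypothetical processor: tokenizes for the merchant, encrypts for the network."""

    def __init__(self, shared_key):
        self._vault = {}
        self._shared_key = shared_key

    def register_card(self, pan):
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = pan
        return token

    def build_network_message(self, token):
        pan = self._vault[token]                          # detokenize
        return xor_bytes(pan.encode(), self._shared_key)  # then encrypt


class IssuingBank:
    """Hypothetical bank holding the same key as the processor."""

    def __init__(self, shared_key):
        self._shared_key = shared_key

    def read_message(self, ciphertext):
        return xor_bytes(ciphertext, self._shared_key).decode()


shared_key = secrets.token_bytes(32)
processor = Processor(shared_key)
bank = IssuingBank(shared_key)

token = processor.register_card("4111111111111111")  # what the merchant stores
ciphertext = processor.build_network_message(token)  # what travels on the network
print(bank.read_message(ciphertext))                 # plaintext, recovered by the bank
```

The merchant only ever holds the token, and the card number only travels in encrypted form, so an attacker would need both the vault mapping and the shared key to get anything useful.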
Encryption for Healthcare
Similarly, as more and more aspects of health services migrate from the doctor’s office to the Internet, encryption has become an essential aspect of complying with HIPAA requirements. To electronically transmit PHI, HIPAA requires entities to “implement a mechanism to encrypt PHI whenever deemed appropriate.”
In other words, as PHI travels between entities, such as between a doctor and a patient, the data must be encrypted from plaintext to ciphertext and then decrypted for the recipient to read.
Tokenization vs. Encryption: A Side by Side Comparison
To summarize, tokenization and encryption offer different, but complementary, solutions to the problem of protecting sensitive data. To understand them better, it helps to look at them from several angles: privacy, vulnerability, scalability, flexibility, and management.
The following table compares these techniques:
A Thorough Approach to Privacy Includes Both Tokenization and Encryption
When it comes to ensuring data privacy, tokenization and encryption are complementary technologies and a solid data privacy strategy implements both. Skyflow not only implements both tokenization and encryption, we secure your sensitive data in a data privacy vault and provide polymorphic encryption that enables operations on data without the need to ever decrypt – and potentially expose – sensitive information. To learn more about tokenization and encryption, check out this Partially Redacted podcast episode featuring Skyflow Product Lead Joe McCarron (and join the Partially Redacted community).
You could build all of these features yourself, or you could get these capabilities using Skyflow’s powerful but intuitive API that makes it easy to protect the privacy of sensitive data. You can try out Skyflow’s tokenization, encryption, and other features when you sign up to Try Skyflow.