June 16, 2023
The Build vs. Buy Dilemma: The Real Cost of Data Privacy
As companies’ use of sensitive data continues to grow exponentially and regulations become more stringent, organizations face the crucial decision of whether to build their own data privacy solution or purchase an existing one.
While developers often lean towards building new technical capabilities, they have a tendency to only look at the cost of addressing the initial set requirements when creating new solutions. This can lead to underestimating the secondary costs associated with the maintenance and scalability of an in-house solution.
In this post we take an in-depth look at what’s required (in terms of cost and effort) to build your own data privacy solution, and the various features and requirements you need to think through. We also explore the types of expertise your team will need to build a data privacy solution, and the difference between building for compliance versus building for best-in-class security, without sacrificing data usability.
The Cost of Managing a Service
When developers are asked to assess the cost of a project, they tend to focus solely on the initial development work required to achieve the desired outcome, or in many cases, to achieve 80% of the desired outcome to get the service online and functional. Then, with more context, the team can assess how to complete the remaining 20% of the required functionality. Across many domains of software development, these ratios are often a useful guideline.
But in the case of managing a service the “final 20%” turns out to be much more than 20% of the total work required.
That’s because it’s easy to overlook the long-term aspects of maintenance, scalability, and the ability to accommodate unforeseen future requirements. That’s why, for many technologies (and especially a data privacy service) this “final 20%” is where the bulk of the cost and effort resides. And, a significant portion of that cost persists over time, taking the form of ongoing maintenance expenses.
For this reason, the effort required to create and maintain a product or service is somewhat like an iceberg: most of it is hidden.
As a simple example, consider the idea of self-hosting and running Git yourself versus paying for GitHub.
Would You Build Your Own Git?
Many organizations choose to pay for GitHub so that they have access to features like private repositories, increased storage and resource limits, advanced governance and security, premium support, and enterprise grade features like SSO.
However, instead of paying for GitHub, you could just download and self-host Git. It’s free and open source, so why not just install and use it yourself? That would presumably save your company money, and you would retain total control over your source code. It’s a win-win, right?
But, it’s not quite that simple. There are actually a lot of steps to complete and issues to think through before you can self-host Git for a company to use internally.
And in all likelihood, your full self-hosting costs will end up being higher than if you simply paid to use GitHub.
For example, beyond cloning Git, you’ll need to work through the following challenges to self-host it for your company:
- Server Infrastructure: In order for all the developers in your company to use Git, you’ll need to host it somewhere. So, you’ll need to spin up an AWS EC2 instance and install Git there (or use a similar hosting solution).
- Git Server: You’ll need to use Git server software to manage your repositories, which means using GitLab, or something similar.
- Configuration: Source code security is vitally important, so you’ll need to configure authentication, user management, repository permissions, and potentially other settings on your Git server.
- Domain and SSL Certificate: To protect your source code by enabling HTTPS, you’ll need to register a domain for your Git server and obtain an SSL certificate.
- Security: You can’t let just anyone read your source code, or commit PRs to your repositories, so you need to take steps to secure your repos. Additionally, you’ll need audit logs so that you have an audit trail available in case of mischief by a rogue employee or unauthorized access by malicious actors outside of your company.
- Backup and Disaster Recovery: Your Git server is a mission-critical asset for your company. So, you’re on the hook for backup and recovery. If your server fails, your organization will need a way to keep developing your software. If you want to pay for a backup and recovery system, then you’ll need to build one that backs up the repositories and configuration files to a separate server. You’ll also need to develop a disaster recovery plan and make sure others are trained in how to execute it so you don’t become the “single point of failure” for your company.
- Monitoring and Maintenance: You’ll need to keep an eye on the health of the Git server, so you’ll need a monitoring and logging system. You will also need to keep the server and software up to date.
- Integration and Collaboration: Ideally, your Git server integrates with other tools and services to enhance collaboration and productivity. For example, it should integrate with your company’s issue tracking systems, CI/CD pipelines, and code review tools.
So, something as straightforward as running Git yourself to keep you in control and cut costs now looks very expensive to manage, in terms of both time and resources.
These secondary costs – like maintenance, monitoring, security, recovery, and infrastructure – quickly balloon when you create and maintain your own in-house solutions. On closer examination, these costs turn out to not to be so “secondary” after all.
This is true for any application or service, but it’s especially true for a service-based data privacy solution.
That’s because when it comes to data privacy and security, you can’t afford to cut corners. The cost of a mistake is too high – averaging $4.35m in 2022. These costs include:
- Damaged customer trust, leading to loss of customers
- Damage to your brand, leading to increased customer acquisition costs
- Fines and other regulatory sanctions under laws like GDPR and CPRA
With that said, let’s take a look at the key considerations you need to think through before diving into building your own data privacy solution.
Key Considerations When Building a Data Privacy Solution
Maybe the preceding discussion of challenges around self-hosting Git hasn’t discouraged you from your plans to build your own service-based data privacy solution.
In that case, you’re probably wondering which key requirements you must address to build an effective data privacy solution.
Let’s take a closer look at what’s involved.
Isolation: Protecting Sensitive Data
Just as you might keep your passport, jewelry, and cash isolated and protected within a safe in your home rather than in the kitchen “junk drawer” alongside your batteries, paperclips, and flashlights, sensitive data should be stored separately from other data. Sensitive data isn’l like a JPEG of your company’s logo or how many times a user has opened an app, it’s a vitally important resource. And when you commingle sensitive data with non-sensitive data, there’s no good way to protect it.
To provide effective data privacy, access control for sensitive data should be very different from access control of non-sensitive data: just as a home safe has a combination lock but your kitchen junk drawer doesn’t. With sensitive data, you need to consider not only who has access, but in what format the data is available (e.g. masked, redacted, plaintext), how long access persists before timing out, and what authorized users or services can do with that data. This is fundamentally different from application data like the number of times someone has viewed a document. For non-sensitive data, CRUD permissions are sufficient, but non-sensitive data requires more effective data governance.
If sensitive and non-sensitive data is commingled in a single system, it becomes incredibly complex and confusing to build and support a system which preserves the privacy of sensitive data without making it onerous to use non-sensitive data.
Isolating sensitive data is the only viable solution to protect sensitive data, but of course, it isn’t the only requirement.
Security Measures: Strengthening Data Protection
To build a data privacy solution you must incorporate a range of effective security measures. This includes not only traditional security practices like perimeter security, but also network security, penetration testing, and protection against DDoS attacks.
Additionally, you’ll need fine-grained access controls and fine-grained access controls. For example, to protect sensitive data fields like social security numbers (SSNs), you should go beyond merely limiting which fields can be accessed by users in certain roles. Instead, you should also limit how much of those fields they can query, and limit their queries to a given time period.
So, instead of granting customer service agents (CSAs) access to full SSNs around the clock, you’re giving them access to just the last four digits of the SSN that they need to do their job. And, you’re limiting this access to just when that CSA is on duty.
Just as there’s no good reason for a CSA to access a full SSN, there’s no good reason for a CSA to have access to thousands of customer records in a second. No CSA has a legitimate need for such unrestricted access to sensitive data. Like other users, they should have the lowest level of sensitive data privileged access that they need to do their jobs.
By adopting the principle of least privilege, your company can significantly reduce the risk of unauthorized access to sensitive data.
Testing and Evaluation: Protecting Data Privacy
Data privacy hinges on preventing the authorized use of sensitive data and preventing data breaches. Adopting a zero trust approach to protecting sensitive data helps organizations to verify access and avoid unauthorized use. Additionally, to maintain the integrity of a data privacy solution, you must rely on established encryption implementations and thoroughly test any modifications or extensions of those implementations.
Practices such as using a time-to-live (TTL) implementation when caching sensitive, transient data like CVV codes, decrypting data only when necessary, maintaining sensitive data isolation, protecting encryption keys, and having a zero-day plan in place contribute to the overall effectiveness of a data privacy solution.
Staying Up To Date: Embracing Continuous Improvement
Regulations might be a driving factor for organizations to implement data privacy solutions, but compliance shouldn’t be your sole focus. Instead, privacy and security should be core principles embedded into every aspect of your software and product.
Adhering to regulations is a baseline requirement, but regulatory compliance is no excuse for mishandling sensitive data. Instead, you must go beyond compliance and adopt advanced security practices like zero trust principles to uphold your commitment to comprehensive data privacy. Staying up to date with evolving security practices and industry standards is essential to building and maintaining an effective data privacy solution.
Balancing Performance, Security, and Usability
Building a data privacy solution presents several challenges. Rigorous design and engineering practices are essential, as even a small oversight carries significant consequences. Striking the right balance between performance, security, and usability is a constant challenge.
You can’t simply lock the data up and throw away the key. You need to build in support for retrieving, rendering, manipulating, and passing sensitive data to trusted third parties – all while protecting this data. For example, if you’re storing credit card numbers and you need to issue a card transaction through the Visa network, you’ll need to decrypt the credit card data and repackage it in the format Visa expects, all while meeting PCI DSS encryption and infrastructure standards.
To effectively tackle scale and security challenges, it is crucial to adopt innovation and devise imaginative approaches. For instance, incorporating additional layers of security, such as envelope encryption, can introduce performance overhead. Striking the right balance between optimizing performance and ensuring privacy and security demands thoughtful deliberation, meticulous planning, and flawless execution. Prioritizing simplicity over complexity amplifies security by minimizing the introduction of bugs.
Scaling: Balancing Cost and Efficiency
Data privacy solutions tend to be computationally demanding and storage-intensive, both of which can result in higher costs. To mitigate these challenges, you must design the solution to allow for seamless horizontal scaling. And seamless horizontal scaling requires continuous testing, throughput optimization, and latency minimization. Building infrastructure that seamlessly scales up based on your minute-to-minute needs, combined with proper data set partitioning, reduces the risk of bottlenecks and supports efficient operations.
Now that we’ve reviewed the functional requirements to build and maintain a data privacy solution, let’s take a look at the expertise your team will need to develop, or hire, to meet these requirements.
The Expertise Required to Build a Data Privacy Solution
Building an effective and resilient data privacy solution requires a multidisciplinary team with deep expertise in a variety of areas. Effective collaboration among these experts is crucial to meeting the organization's privacy and security objectives.
Here are some key areas of expertise required to build an effective data privacy solution:
- Database Expertise: Database engineers who understand data storage mechanisms, indexing, and query optimization are crucial for building a scalable and efficient data privacy solution
- Encryption and Key Management: Encryption experts are needed to design and implement secure encryption algorithms and establish proper key management practices to safeguard your sensitive data
- Infrastructure and Security: Infrastructure experts play a critical role in designing and configuring the cloud infrastructure that hosts the data privacy solution to optimize its resiliency, scalability, and security
- Performance Engineering: Performance engineers play a vital role in balancing performance, security, and usability by optimizing the efficiency of data processing and minimizing latency
- Network Engineering: Network engineers design and configure secure network topologies and ensure proper network segmentation to protect sensitive data by isolating it from the rest of your systems
- Disaster Recovery and Business Continuity: Disaster recovery architects develop strategies and mechanisms to recover data in case of catastrophic events and ensure business continuity and data integrity
- Compliance and Privacy Knowledge: Compliance experts with a deep understanding of privacy regulations and best practices are vital to building a solution that aligns with industry standards and legal requirements
Hiring a team with in-depth knowledge in each of these areas is difficult enough, and retaining them could prove to be even more difficult.
So, is there a good alternative to building a data privacy solution yourself?
Why Building a Data Privacy Solution Probably Doesn’t Make Sense
Given a team with the right expertise and enough time, an engineering organization can build nearly anything. So the question isn’t so much whether you could build your own data privacy solution, but rather: what does it make sense for you to build? Another reasonable question you could ask is: which core product features aren’t you building if you are focusing your talent on building an in-house data privacy solution?
For example, would you build your own database, or simply design a new database schema for an existing database technology?
Data Privacy is a Serious Job
Writing software for data privacy is a serious job, and it can’t be taken lightly. You have to apply a high level of design, engineering, security, and operations rigor all the time. And, you need to make the right tradeoffs as often performance, security, and usability are at odds with each other and all design and implementation decisions have to be carefully thought through.
Building something as nuanced as a data privacy solution requires extremely talented engineers. The problem is, unless this is your core product, it’s unlikely to be a customer-facing feature, making recruiting exceedingly difficult as most engineers want to work on the “flagship product”.
This makes recruiting and retaining top talent difficult when building data privacy solutions that, while vital, might not be part of your core product. Also, will you really put your top talent on a project that’s not part of your core, customer-facing product? And if so, for how long?
Of course, beyond the issue of building and retaining a team to build a data privacy vault, you have the issue that you will often face pressure to scope down your data privacy efforts to just what’s required to maintain compliance with standards like PCI DSS or laws like GDPR, CPRA, and HIPAA.
Compliance is a Baseline
Compliance motivates a lot of data privacy work, but it’s frustrating and ineffective to target the minimum bar set by compliance if you want to provide effective data privacy and security. But maintaining a minimal level of compliance is (at least superficially) easier and less expensive than building comprehensive data privacy solutions that will protect your customers and your company, and future-proof your business against the landslide of new data privacy laws being passed globally.
If you’re only building for compliance, rather than for resiliency in the face of the newest, most sophisticated exploits, you’ll lose the battle for security. Many companies, including Equifax, have had major data breaches despite being fully “in compliance”.
To truly prioritize data privacy and security, you’ll need to stay up to date on the latest security, encryption, and privacy-enhancing technologies. And, you’ll also need to employ continuous monitoring to detect new potential threats, stay on top of software updates, and leverage third parties to run penetration testing (“pentesting”) and bug bounty programs.
Sensitive Data Isolation is Paramount
To properly manage and control access to sensitive data, it’s necessary to isolate sensitive data from the rest of your infrastructure and systems. Isolation makes it possible to implement appropriate access controls and protection mechanisms to safeguard the privacy of sensitive data. True isolation is arguably impossible to achieve if you build a data privacy solution yourself, due to how data administration is handled at most companies. Inevitably, if you build it in-house someone within your organization will have full administrative access. This means that the pressures exerted by tight deadlines might lead to this person establishing “backdoors” that circumvent the measures you’ve put in place.
So before you charge down that path of DIY data security, think carefully about the cost beyond just “getting it to work”. Think about the secondary costs, the number of experts needed to do this right and most importantly, the cost of a mistake.
Try Skyflow Data Privacy Vault
The technical requirements for a data privacy solution are formidable. And, as we’ve discussed above, the incentives that exist within most companies around hiring, deadlines, and talent allocation can make building, securing, maintaining, and operating an in-house data privacy solution a formidable challenge.
We built Skyflow Data Privacy Vault to address the data privacy challenge at scale by meeting or exceeding the requirements described above. Because Skyflow Vault is our core product (in fact, it’s our only product), our success depends on meeting the data privacy and security needs of our customers.
To learn more about how Skyflow can help you protect the privacy and security of your sensitive customer data while easing compliance, contact us.