March 29, 2023

Privacy-aware Data Pipelines with Skyflow's Piper Keyes

A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and increased ROI.

However, despite your best efforts, sensitive customer data tends to find its way into your analytics pipelines and ends up in your data warehouses and metrics dashboards. Replicating customer PII to downstream services greatly increases your compliance scope and makes data privacy and security significantly harder to maintain.

In this episode, Piper Keyes, Engineering Lead at Skyflow, joins the show to discuss what goes into building a privacy-aware data pipeline, which tools and technologies you should be using, and how Skyflow addresses this problem.

Topics:

  • What is a data analytics pipeline?
  • What does it mean to build a privacy-aware data pipeline?
  • Can you give some examples of use cases where privacy-aware data pipelines are particularly important?
  • What’s it mean to de-identify data and how does that work?
  • What are some common techniques used to preserve privacy in data pipelines?
  • How does analytics work for de-identified data?
  • How do you balance the need for data privacy with the need for actually being able to use the data?
  • What’s it take to build a privacy-aware pipeline from scratch?
  • What are some of the biggest challenges in building privacy-aware data pipelines?
  • How does something like this work with Skyflow?
  • Let’s say I have customers’ transaction data from Visa. How could I ingest that data into my data warehouse without having to build PCI compliance infrastructure? Walk me through how that works.
  • Could you build a machine learning model based on the de-identified data?
  • Once I have the data in my warehouse, let’s say I need to inform a clinical trial participant about an issue while maintaining their privacy. How could I perform an operation like that?
  • What other use cases does this product enable?
