Data Privacy Engineering: PII Detection, Data Masking, and Right-to-Erasure Pipelines

Build privacy-by-design systems — PII detection and classification, data masking in non-production environments, right-to-erasure pipelines, consent management, and data retention automation.

Viprasol Tech Team
May 19, 2026
13 min read

Privacy compliance (GDPR, CCPA, PIPL) is no longer a legal checkbox — regulators are issuing multi-million dollar fines, and enterprise buyers conduct privacy reviews before signing contracts. Privacy engineering means implementing the technical mechanisms that make compliance possible: knowing where personal data lives, protecting it, and being able to delete it on request.

This guide covers the engineering implementation, not the legal theory.


PII Classification: Know What You Have

You cannot protect data you haven't catalogued. PII classification starts with knowing what personal data your system holds and where.

PII categories (health and biometric data are GDPR Article 9 special categories):

| Category    | Examples                       | Risk Level |
|-------------|--------------------------------|------------|
| Identifiers | Name, email, user ID           | High       |
| Contact     | Phone, address, IP             | High       |
| Health      | Medical records, diagnoses     | Very High  |
| Biometric   | Fingerprints, face data        | Very High  |
| Financial   | Payment method, bank account   | High       |
| Behavioral  | Clickstream, purchase history  | Medium     |
| Device      | Browser fingerprint, device ID | Medium     |

Automated PII detection in your codebase:

# scripts/pii_scanner.py — scan database schema and code for PII fields
import re
from pathlib import Path

# Patterns that suggest PII fields
PII_PATTERNS = {
    'email': re.compile(r'\b(email|e_mail|email_address)\b', re.IGNORECASE),
    'phone': re.compile(r'\b(phone|phone_number|mobile|cell)\b', re.IGNORECASE),
    'name': re.compile(r'\b(first_name|last_name|full_name|display_name)\b', re.IGNORECASE),
    'ssn': re.compile(r'\b(ssn|social_security|tax_id|national_id)\b', re.IGNORECASE),
    'dob': re.compile(r'\b(date_of_birth|dob|birthdate|birth_date)\b', re.IGNORECASE),
    'address': re.compile(r'\b(address|street|postal_code|zip_code)\b', re.IGNORECASE),
    'ip': re.compile(r'\b(ip_address|client_ip|remote_addr)\b', re.IGNORECASE),
    'payment': re.compile(r'\b(card_number|cvv|account_number|routing_number)\b', re.IGNORECASE),
}

def scan_prisma_schema(schema_path: str) -> dict[str, list[str]]:
    """Find PII fields in Prisma schema"""
    findings: dict[str, list[str]] = {}

    with open(schema_path) as f:
        content = f.read()

    current_model = None
    for line in content.splitlines():
        model_match = re.match(r'model (\w+) \{', line)
        if model_match:
            current_model = model_match.group(1)
            continue  # don't scan the model declaration line itself
        if line.strip() == '}':
            current_model = None  # left the model block
            continue

        if current_model:
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(line):
                    findings.setdefault(current_model, []).append(
                        f"{pii_type}: {line.strip()}"
                    )

    return findings

# Run: python scripts/pii_scanner.py
if __name__ == '__main__':
    findings = scan_prisma_schema('prisma/schema.prisma')
    for model, fields in findings.items():
        print(f"\n{model}:")
        for field in fields:
            print(f"  [PII] {field}")
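Name-based scanning only finds fields with descriptive names; a column called `data` or `notes` can still hold emails. A complementary pass samples actual values and matches them against content-level patterns — a minimal sketch (the regexes here are illustrative starting points, not exhaustive):

```python
import re

# Content-level patterns: match the values themselves, not the column names
VALUE_PATTERNS = {
    'email': re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
    'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'phone_us': re.compile(r'\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b'),
    'ipv4': re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),
}

def detect_pii_in_values(values: list[str]) -> dict[str, int]:
    """Count how many sampled values match each PII pattern."""
    counts = {name: 0 for name in VALUE_PATTERNS}
    for value in values:
        for name, pattern in VALUE_PATTERNS.items():
            if pattern.search(value):
                counts[name] += 1
    return {name: n for name, n in counts.items() if n > 0}

# Sample a few rows from a free-text column and flag likely PII
sample = ['alice@example.com', 'call me at 555-867-5309', 'no pii here']
print(detect_pii_in_values(sample))  # flags one email and one phone number
```

In practice you would feed this a few hundred sampled rows per column and flag any column where the match rate crosses a threshold.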

Data Masking for Non-Production Environments

Developers and QA engineers should never work with real production PII. Data masking replaces real values with realistic fake values before data reaches non-production environments.

// scripts/maskProductionData.ts
// Run on a production DB dump before loading into staging

import { faker } from '@faker-js/faker';
import { createHash } from 'crypto';
import { Pool } from 'pg';

interface MaskingStrategy {
  email: (value: string) => string;
  name: (value: string) => string;
  phone: (value: string) => string;
  ip: (value: string) => string;
  // Preserve structure, replace content
}

const masking: MaskingStrategy = {
  // Deterministic: same input always produces same output
  // Allows testing relationships (user A's orders still belong to user A)
  email: (email: string) => {
    const hash = createHash('md5').update(email).digest('hex').substring(0, 8);
    return `test-${hash}@example-masked.com`;
  },

  name: () => faker.person.fullName(),

  phone: () => faker.phone.number('+1##########'),

  ip: (ip: string) => {
    // Truncate the last octet — widely treated as adequate anonymization,
    // though EU guidance stops short of declaring truncated IPs non-personal
    return ip.replace(/\.\d+$/, '.0');
  },
};

// Apply masking to staging database
async function maskDatabase(stagingDb: Pool) {
  console.log('Masking PII in staging database...');

  // Note: digest() requires the pgcrypto extension (CREATE EXTENSION pgcrypto)
  await stagingDb.query(`
    UPDATE users SET
      email = 'test-' || encode(digest(email, 'md5'), 'hex') || '@example-masked.com',
      name = 'Test User ' || id::text,
      phone = '+10000000000',
      date_of_birth = NULL
  `);

  await stagingDb.query(`
    UPDATE billing_info SET
      card_last_four = '0000',
      billing_name = 'Test User',
      billing_address = '123 Test Street, Test City, TC 00000'
  `);

  // Anonymize IP addresses in event logs
  await stagingDb.query(`
    UPDATE events SET
      ip_address = regexp_replace(ip_address, '\\.\\d+$', '.0')
    WHERE ip_address IS NOT NULL
  `);

  console.log('Masking complete');
}
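The deterministic property matters more than it looks: because the same input always produces the same masked output, foreign-key-like relationships survive masking across tables. A self-contained Python sketch of the same idea (MD5 here serves pseudonymization, not security — a keyed HMAC would make re-identification harder):

```python
import hashlib

def mask_email(email: str) -> str:
    """Deterministic pseudonym: same input always yields the same masked value."""
    digest = hashlib.md5(email.encode()).hexdigest()[:8]
    return f"test-{digest}@example-masked.com"

users = [{'id': 1, 'email': 'alice@corp.com'}]
orders = [{'user_email': 'alice@corp.com', 'total': 42}]

masked_users = [{**u, 'email': mask_email(u['email'])} for u in users]
masked_orders = [{**o, 'user_email': mask_email(o['user_email'])} for o in orders]

# The join key survives masking: orders still resolve to the same user
assert masked_orders[0]['user_email'] == masked_users[0]['email']
```

Random masking (a fresh fake value per row) breaks exactly this property, which is why it is reserved here for fields that are never used as join keys, like names.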

Automated masking in CI pipeline:

# .github/workflows/refresh-staging.yml
name: Refresh Staging Data

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly on Sunday at 2am
  workflow_dispatch:

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # needed for scripts/maskProductionData.ts below

      - name: Dump production DB (schema + data)
        run: |
          pg_dump ${{ secrets.PROD_DB_URL }} \
            --no-owner --no-privileges \
            --exclude-table=webhook_events \
            -f production-dump.sql

      - name: Load into staging
        run: psql ${{ secrets.STAGING_DB_URL }} < production-dump.sql

      - name: Apply PII masking
        run: npx ts-node scripts/maskProductionData.ts
        env:
          STAGING_DB_URL: ${{ secrets.STAGING_DB_URL }}
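A masking job that fails silently leaves real PII in staging. A cheap safeguard is a verification step that samples staging rows after masking and fails the workflow if values still look real — a minimal sketch in Python, assuming the masked-email convention from the script above:

```python
import re

# Matches the convention produced by the masking script: test-<8 hex>@example-masked.com
MASKED_EMAIL = re.compile(r'^test-[0-9a-f]{8}@example-masked\.com$')

def verify_masked_emails(emails: list[str]) -> list[str]:
    """Return any emails that do NOT match the expected masked format."""
    return [e for e in emails if not MASKED_EMAIL.match(e)]

# In CI: SELECT email FROM users LIMIT 1000, then fail the job if anything leaks
sample = ['test-1a2b3c4d@example-masked.com', 'real.person@gmail.com']
leaks = verify_masked_emails(sample)
print(f"{len(leaks)} unmasked value(s): {leaks}")
```

Wiring this in as a final workflow step (exit non-zero when `leaks` is non-empty) turns a quiet data leak into a loud pipeline failure.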

🚀 SaaS MVP in 8 Weeks — Seriously

We have launched 50+ SaaS platforms. Multi-tenant architecture, Stripe billing, auth, role-based access, and cloud deployment — all handled by one senior team.

  • Week 1–2: Architecture design + wireframes
  • Week 3–6: Core features built + tested
  • Week 7–8: Launch-ready on AWS/Vercel with CI/CD
  • Post-launch: Maintenance plans from month 3

Right-to-Erasure Pipeline (GDPR Article 17)

GDPR requires you to honour a valid erasure request without undue delay — and in any event within one month (Article 12(3)). This is harder than it sounds — data lives in many places.

// lib/privacy/erasurePipeline.ts

interface ErasureRequest {
  requestId: string;
  userId: string;
  requestedAt: Date;
  completedAt?: Date;
  steps: ErasureStep[];
}

interface ErasureStep {
  name: string;
  status: 'pending' | 'completed' | 'failed' | 'not_applicable';
  completedAt?: Date;
  error?: string;
}

export async function processErasureRequest(userId: string): Promise<ErasureRequest> {
  const requestId = crypto.randomUUID();
  const requestedAt = new Date();
  const steps: ErasureStep[] = [];

  // Capture the email up front — step 1 anonymizes it, but step 3 still needs it
  const { rows } = await db.query('SELECT email FROM users WHERE id = $1', [userId]);
  const userEmail: string = rows[0]?.email;

  const runStep = async (name: string, fn: () => Promise<void>): Promise<void> => {
    try {
      await fn();
      steps.push({ name, status: 'completed', completedAt: new Date() });
    } catch (err: any) {
      steps.push({ name, status: 'failed', error: err.message });
      // Log but continue — partial erasure is better than none
      logger.error({ step: name, userId, error: err.message });
    }
  };

  // Step 1: Anonymize user record (don't delete — keep for financial records, etc.)
  await runStep('anonymize_user_record', async () => {
    await db.query(`
      UPDATE users SET
        email = 'deleted-' || $1 || '@deleted.invalid',
        name = 'Deleted User',
        phone = NULL,
        avatar_url = NULL,
        deleted_at = NOW(),
        deletion_request_id = $2
      WHERE id = $1
    `, [userId, requestId]);
  });

  // Step 2: Delete from analytics/event tracking
  await runStep('delete_analytics_events', async () => {
    // PostHog (exact deletion call varies by SDK — verify against your client library)
    await posthog.delete(userId);
    // Mixpanel
    await mixpanel.people.delete_user(userId);
    // Internal events table
    await db.query('DELETE FROM events WHERE user_id = $1', [userId]);
  });

  // Step 3: Delete from email provider
  await runStep('delete_email_provider', async () => {
    const contacts = await sendgrid.request({
      method: 'POST',
      url: '/v3/marketing/contacts/search',
      body: { query: `email = '${userEmail}'` },
    });
    if (contacts.body.result.length > 0) {
      await sendgrid.request({
        method: 'DELETE',
        url: `/v3/marketing/contacts?ids=${contacts.body.result.map((c: any) => c.id).join(',')}`,
      });
    }
  });

  // Step 4: Delete from support tool
  await runStep('delete_support_tickets', async () => {
    // Intercom: anonymize user
    await intercom.users.update({
      user_id: userId,
      name: 'Deleted User',
      email: `deleted-${userId}@deleted.invalid`,
      custom_attributes: { gdpr_deleted: true },
    });
  });

  // Step 5: Remove from search index
  await runStep('delete_search_index', async () => {
    // Remove all user's documents from Typesense
    await typesense.collections('projects').documents().delete({
      filter_by: `userId:=${userId}`,
    });
  });

  // Step 6: Delete from backups (flag for backup rotation)
  await runStep('flag_backup_deletion', async () => {
    // Mark backups that contain this user for deletion on next rotation
    await db.query(`
      INSERT INTO backup_deletion_queue (user_id, requested_at)
      VALUES ($1, NOW())
    `, [userId]);
    // Actual backup deletion happens when those backups age out (per retention policy)
  });

  // Step 7: Delete uploaded files
  await runStep('delete_user_files', async () => {
    const files = await db.query(
      'SELECT s3_key FROM user_uploads WHERE user_id = $1', [userId]
    );
    await Promise.all(
      files.rows.map(f => s3.deleteObject({ Bucket: 'uploads', Key: f.s3_key }).promise())
    );
    await db.query('DELETE FROM user_uploads WHERE user_id = $1', [userId]);
  });

  // Record completion
  await db.erasureRequests.create({
    requestId,
    userId,
    steps,
    completedAt: new Date(),
  });

  return { requestId, userId, requestedAt, steps };
}
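Because runStep logs failures and continues, something has to re-drive the failed steps until the one-month deadline. A minimal retry sweep, sketched in Python with illustrative step and handler names (run it from a scheduled job until every step completes):

```python
from datetime import datetime

def retry_failed_steps(steps: list[dict], handlers: dict) -> list[dict]:
    """Re-run only the failed steps of a partially completed erasure request."""
    for step in steps:
        if step['status'] != 'failed':
            continue  # completed / not_applicable steps are never re-run
        try:
            handlers[step['name']]()
            step.update(status='completed', completed_at=datetime.now(), error=None)
        except Exception as exc:
            step['error'] = str(exc)  # stays 'failed'; retried on the next sweep
    return steps

steps = [
    {'name': 'anonymize_user_record', 'status': 'completed'},
    {'name': 'delete_analytics_events', 'status': 'failed', 'error': 'timeout'},
]
handlers = {'delete_analytics_events': lambda: None}  # succeeds this time
result = retry_failed_steps(steps, handlers)
print([s['status'] for s in result])
```

Pairing this sweep with an alert on requests older than, say, 21 days with failed steps keeps a stuck integration from quietly blowing the deadline.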

Consent Management

Record and honor marketing consent:

// lib/consent.ts
interface ConsentRecord {
  userId: string;
  purpose: 'marketing' | 'analytics' | 'personalization';
  status: 'granted' | 'withdrawn';
  source: 'signup' | 'settings' | 'cookie_banner';
  recordedAt: Date;
  ipAddress: string;  // Helps demonstrate valid consent (GDPR Art. 7(1))
  userAgent: string;
}

export async function recordConsent(consent: ConsentRecord): Promise<void> {
  // Consent records are immutable — never update, always append
  await db.query(`
    INSERT INTO consent_records
      (user_id, purpose, status, source, recorded_at, ip_address, user_agent)
    VALUES ($1, $2, $3, $4, NOW(), $5, $6)
  `, [
    consent.userId,
    consent.purpose,
    consent.status,
    consent.source,
    consent.ipAddress,
    consent.userAgent,
  ]);
}

export async function hasConsent(userId: string, purpose: string): Promise<boolean> {
  const latest = await db.query(`
    SELECT status FROM consent_records
    WHERE user_id = $1 AND purpose = $2
    ORDER BY recorded_at DESC
    LIMIT 1
  `, [userId, purpose]);

  return latest.rows[0]?.status === 'granted';
}
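Because consent records are append-only, current consent state is derived rather than stored: the latest record per (user, purpose) wins, and no record at all means no consent. The same logic as hasConsent, sketched in-memory in Python:

```python
def has_consent(records: list[dict], user_id: str, purpose: str) -> bool:
    """Latest record for (user, purpose) decides; no record means no consent."""
    relevant = [r for r in records
                if r['user_id'] == user_id and r['purpose'] == purpose]
    if not relevant:
        return False  # consent is opt-in: absence of a record is a 'no'
    latest = max(relevant, key=lambda r: r['recorded_at'])
    return latest['status'] == 'granted'

records = [
    {'user_id': 'u1', 'purpose': 'marketing', 'status': 'granted', 'recorded_at': 1},
    {'user_id': 'u1', 'purpose': 'marketing', 'status': 'withdrawn', 'recorded_at': 2},
]
print(has_consent(records, 'u1', 'marketing'))  # withdrawal is the latest record
```

The append-only design is what makes the audit trail work: every grant and withdrawal survives, so you can show what the consent state was at any point in time.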

💡 The Difference Between a SaaS Demo and a SaaS Business

Anyone can build a demo. We build SaaS products that handle real load, real users, and real payments — with architecture that does not need to be rewritten at 1,000 users.

  • Multi-tenant PostgreSQL with row-level security
  • Stripe subscriptions, usage billing, annual plans
  • SOC2-ready infrastructure from day one
  • We own zero equity — you own everything

Data Retention Automation

-- Automated retention: delete records older than retention period
-- Run as a nightly cron job

-- Delete old analytics events (retain 2 years)
DELETE FROM events
WHERE created_at < NOW() - INTERVAL '2 years';

-- Anonymize inactive users (no login in 3 years)
UPDATE users SET
  email = 'inactive-' || id || '@anonymized.invalid',
  name = 'Inactive User',
  phone = NULL
WHERE last_login_at < NOW() - INTERVAL '3 years'
  AND deleted_at IS NULL;

-- Delete expired sessions
DELETE FROM sessions WHERE expires_at < NOW();
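One caveat on the statements above: on large tables, an unbounded DELETE can hold locks and generate WAL for a long time, so retention sweeps are usually run in bounded batches. The cutoff and batching logic, sketched in Python (field names follow the SQL above; the batch size is illustrative):

```python
from datetime import datetime, timedelta

def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Records created before this timestamp are past their retention period."""
    return now - timedelta(days=retention_days)

def expired_ids(rows: list[dict], cutoff: datetime, batch_size: int = 2) -> list[int]:
    """IDs of rows past retention, oldest first, capped at batch_size per sweep."""
    old = sorted((r for r in rows if r['created_at'] < cutoff),
                 key=lambda r: r['created_at'])
    return [r['id'] for r in old[:batch_size]]

now = datetime(2026, 5, 19)
rows = [
    {'id': 1, 'created_at': datetime(2023, 1, 1)},
    {'id': 2, 'created_at': datetime(2026, 1, 1)},
    {'id': 3, 'created_at': datetime(2022, 6, 1)},
]
cutoff = retention_cutoff(now, 730)  # ~2 years
print(expired_ids(rows, cutoff))
```

In SQL the same pattern is `DELETE ... WHERE id IN (SELECT id ... WHERE created_at < $cutoff ORDER BY created_at LIMIT $batch)`, looped until no rows remain.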

Working With Viprasol

We implement privacy engineering for SaaS products — PII classification, data masking pipelines, right-to-erasure automation, consent management, and privacy impact assessments. Privacy engineering done right protects users and reduces compliance risk.

Talk to our team about privacy engineering implementation.


About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
