Data Privacy Engineering: PII Detection, Data Masking, and Right-to-Erasure Pipelines
Build privacy-by-design systems — PII detection and classification, data masking in non-production environments, right-to-erasure pipelines, consent management,
Privacy compliance (GDPR, CCPA, PIPL) is no longer a legal checkbox — regulators are issuing multi-million dollar fines, and enterprise buyers conduct privacy reviews before signing contracts. Privacy engineering means implementing the technical mechanisms that make compliance possible: knowing where personal data lives, protecting it, and being able to delete it on request.
This guide covers the engineering implementation, not the legal theory.
PII Classification: Know What You Have
You cannot protect data you haven't catalogued. PII classification starts with knowing what personal data your system holds and where.
PII categories (Health and Biometric are GDPR Article 9 special categories and carry the highest risk):
| Category | Examples | Risk Level |
|---|---|---|
| Identifiers | Name, email, user ID | High |
| Contact | Phone, address, IP | High |
| Health | Medical records, diagnoses | Very High |
| Biometric | Fingerprints, face data | Very High |
| Financial | Payment method, bank account | High |
| Behavioral | Clickstream, purchase history | Medium |
| Device | Browser fingerprint, device ID | Medium |
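One way to make the table actionable is to keep it as data next to the code, so scanners and review tooling can look up a field's category and risk level. The sketch below is a hypothetical catalog: the `FIELD_CATEGORIES` entries and the function name are illustrative assumptions, not part of any library.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    MEDIUM = 1
    HIGH = 2
    VERY_HIGH = 3


# Category -> risk level, mirroring the table above.
CATEGORY_RISK = {
    "identifiers": Risk.HIGH,
    "contact": Risk.HIGH,
    "health": Risk.VERY_HIGH,
    "biometric": Risk.VERY_HIGH,
    "financial": Risk.HIGH,
    "behavioral": Risk.MEDIUM,
    "device": Risk.MEDIUM,
}

# Hypothetical field-to-category assignments for one application.
FIELD_CATEGORIES = {
    "users.email": "identifiers",
    "users.phone": "contact",
    "patients.diagnosis": "health",
    "events.clickstream": "behavioral",
}


def highest_risk(fields: list[str]) -> Optional[Risk]:
    """Return the highest risk level among catalogued fields, or None if none are catalogued."""
    risks = [CATEGORY_RISK[FIELD_CATEGORIES[f]] for f in fields if f in FIELD_CATEGORIES]
    return max(risks, default=None)
```

A table or export that touches several fields can then be gated on `highest_risk(...)`, for example requiring extra review when the result is `Risk.VERY_HIGH`.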
Automated PII detection in your codebase:
# scripts/pii_scanner.py: scan a Prisma schema for fields whose names suggest PII
import re
from pathlib import Path

# Field-name patterns that suggest PII
PII_PATTERNS = {
    'email': re.compile(r'\b(email|e_mail|email_address)\b', re.IGNORECASE),
    'phone': re.compile(r'\b(phone|phone_number|mobile|cell)\b', re.IGNORECASE),
    'name': re.compile(r'\b(first_name|last_name|full_name|display_name)\b', re.IGNORECASE),
    'ssn': re.compile(r'\b(ssn|social_security|tax_id|national_id)\b', re.IGNORECASE),
    'dob': re.compile(r'\b(date_of_birth|dob|birthdate|birth_date)\b', re.IGNORECASE),
    'address': re.compile(r'\b(address|street|postal_code|zip_code)\b', re.IGNORECASE),
    'ip': re.compile(r'\b(ip_address|client_ip|remote_addr)\b', re.IGNORECASE),
    'payment': re.compile(r'\b(card_number|cvv|account_number|routing_number)\b', re.IGNORECASE),
}


def scan_prisma_schema(schema_path: str) -> dict[str, list[str]]:
    """Find likely PII fields in a Prisma schema, grouped by model."""
    findings: dict[str, list[str]] = {}
    current_model = None
    for line in Path(schema_path).read_text().splitlines():
        model_match = re.match(r'\s*model\s+(\w+)\s*\{', line)
        if model_match:
            current_model = model_match.group(1)
            continue
        if line.strip() == '}':
            # End of the model block: stop attributing later lines to it
            current_model = None
            continue
        if current_model:
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(line):
                    findings.setdefault(current_model, []).append(
                        f"{pii_type}: {line.strip()}"
                    )
    return findings


# Run: python pii_scanner.py
if __name__ == '__main__':
    findings = scan_prisma_schema('prisma/schema.prisma')
    for model, fields in findings.items():
        print(f"\n{model}:")
        for field in fields:
            print(f"  [PII] {field}")
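Field names only catch columns that are named honestly; free-text columns and logs need value-level checks. The following is a minimal sketch of content-based detection, assuming simple regexes plus a Luhn checksum for card numbers. The patterns and the `detect_pii` name are illustrative and deliberately loose, so expect both false positives and false negatives.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect_pii(text: str) -> dict[str, list[str]]:
    """Return PII-looking values found in free text, keyed by type."""
    found: dict[str, list[str]] = {}
    emails = EMAIL_RE.findall(text)
    if emails:
        found["email"] = emails
    cards = [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]
    if cards:
        found["card_number"] = cards
    return found
```

Running this over a sample of rows from free-text columns (support notes, log messages) complements the schema scan, which only sees column names.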
Data Masking for Non-Production Environments
Developers and QA engineers should never work with real production PII. Data masking replaces real values with realistic fake values before data reaches non-production environments.
// scripts/maskProductionData.ts
// Run against the staging copy of a production dump before anyone gets access
import { faker } from '@faker-js/faker';
import { createHash } from 'crypto';
import { Pool } from 'pg';

interface MaskingStrategy {
  // Preserve structure, replace content
  email: (value: string) => string;
  name: (value: string) => string;
  phone: (value: string) => string;
  ip: (value: string) => string;
}

const masking: MaskingStrategy = {
  // Deterministic: same input always produces same output
  // Allows testing relationships (user A's orders still belong to user A)
  // Caveat: an unkeyed hash can be reversed by hashing candidate emails,
  // so treat the masked data as pseudonymized, not anonymous
  email: (email: string) => {
    const hash = createHash('md5').update(email).digest('hex').substring(0, 8);
    return `test-${hash}@example-masked.com`;
  },
  name: () => faker.person.fullName(),
  // faker.string.numeric replaces the older format-string form of faker.phone.number
  phone: () => `+1${faker.string.numeric(10)}`,
  ip: (ip: string) => {
    // Zero the last IPv4 octet. This reduces identifiability, but a truncated IP
    // can still be personal data in combination with other fields.
    // IPv6 addresses do not match the pattern and pass through unchanged.
    return ip.replace(/\.\d+$/, '.0');
  },
};

// Apply masking to staging database
async function maskDatabase(stagingDb: Pool) {
  console.log('Masking PII in staging database...');
  // digest() comes from the pgcrypto extension
  await stagingDb.query('CREATE EXTENSION IF NOT EXISTS pgcrypto');
  await stagingDb.query(`
    UPDATE users SET
      email = 'test-' || encode(digest(email, 'md5'), 'hex')::text || '@example-masked.com',
      name = 'Test User ' || id::text,
      phone = '+10000000000',
      date_of_birth = NULL
  `);
  await stagingDb.query(`
    UPDATE billing_info SET
      card_last_four = '0000',
      billing_name = 'Test User',
      billing_address = '123 Test Street, Test City, TC 00000'
  `);
  // Anonymize IP addresses in event logs
  await stagingDb.query(`
    UPDATE events SET
      ip_address = regexp_replace(ip_address, '\\.\\d+$', '.0')
    WHERE ip_address IS NOT NULL
  `);
  console.log('Masking complete');
}

// Entry point: the CI job runs this file directly with STAGING_DB_URL set
const pool = new Pool({ connectionString: process.env.STAGING_DB_URL });
maskDatabase(pool)
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  })
  .finally(() => pool.end());
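Deterministic masking with a plain hash has a weakness: anyone who can guess an email address can hash it and match it against the masked value. A keyed hash (HMAC) keeps the determinism while closing that gap, provided the key never reaches staging. The sketch below illustrates the idea, together with IP truncation that also handles IPv6. The function names, the 12-character digest length, and the /24 and /48 prefix choices are assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress


def mask_email(email: str, secret: bytes) -> str:
    """Deterministic keyed pseudonym: the same input and key give the same output."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()[:12]
    return f"test-{digest}@example-masked.com"


def truncate_ip(ip: str) -> str:
    """Zero the host part: the last octet for IPv4, everything past /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

Because the mapping is stable for a given key, joins across tables still line up after masking; rotating the key on each refresh prevents values from being correlated between refreshes.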
Automated masking in CI pipeline:
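# .github/workflows/refresh-staging.yml
name: Refresh Staging Data
on:
  schedule:
    - cron: '0 2 * * 0' # Weekly on Sunday at 02:00 UTC
  workflow_dispatch:
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      # The masking script lives in the repo, so check it out and install dependencies
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Dump production DB (schema + data)
        run: |
          pg_dump "$PROD_DB_URL" \
            --no-owner --no-privileges \
            --exclude-table=webhook_events \
            -f production-dump.sql
        env:
          PROD_DB_URL: ${{ secrets.PROD_DB_URL }}
      # Staging holds unmasked data until the next step finishes; restrict access during the refresh
      - name: Load into staging
        run: psql "$STAGING_DB_URL" < production-dump.sql
        env:
          STAGING_DB_URL: ${{ secrets.STAGING_DB_URL }}
      - name: Apply PII masking
        run: npx ts-node scripts/maskProductionData.ts
        env:
          STAGING_DB_URL: ${{ secrets.STAGING_DB_URL }}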
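A refresh pipeline like this benefits from a final check that fails the job if anything slipped past masking. The sketch below assumes rows have already been fetched as dictionaries and flags any email whose domain is not one of the placeholder domains used in this guide; the function name and row shape are illustrative.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
# Placeholder domains used by the masking and anonymization steps in this guide
MASKED_DOMAINS = {"example-masked.com", "deleted.invalid", "anonymized.invalid"}


def find_unmasked_emails(rows):
    """Return (row_index, email) pairs whose domain is not a known masked domain."""
    leaks = []
    for index, row in enumerate(rows):
        for value in row.values():
            if not isinstance(value, str):
                continue
            for match in EMAIL_RE.finditer(value):
                if match.group(1).lower() not in MASKED_DOMAINS:
                    leaks.append((index, match.group(0)))
    return leaks
```

Exiting non-zero when the returned list is non-empty turns a masking gap into a failed workflow run instead of a silent leak.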
Right-to-Erasure Pipeline (GDPR Article 17)
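GDPR requires you to erase a user's personal data without undue delay, and in any event within one month of receiving a valid erasure request (Article 12(3) allows an extension of up to two further months for complex or numerous requests). This is harder than it sounds because the data lives in many places.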
// lib/privacy/erasurePipeline.ts
import { randomUUID } from 'crypto';
// db, logger, posthog, mixpanel, sendgrid, intercom, typesense, and s3 are assumed to be
// pre-configured clients from elsewhere in the app; exact method names vary by SDK version.

interface ErasureRequest {
  requestId: string;
  userId: string;
  requestedAt: Date;
  completedAt?: Date;
  steps: ErasureStep[];
}

interface ErasureStep {
  name: string;
  status: 'pending' | 'completed' | 'failed' | 'not_applicable';
  completedAt?: Date;
  error?: string;
}

export async function processErasureRequest(userId: string): Promise<ErasureRequest> {
  const requestId = randomUUID();
  const requestedAt = new Date();
  const steps: ErasureStep[] = [];

  const runStep = async (name: string, fn: () => Promise<void>): Promise<void> => {
    try {
      await fn();
      steps.push({ name, status: 'completed', completedAt: new Date() });
    } catch (err: unknown) {
      const message = err instanceof Error ? err.message : String(err);
      steps.push({ name, status: 'failed', error: message });
      // Log but continue: partial erasure is better than none
      logger.error({ step: name, userId, error: message });
    }
  };

  // Read the email before step 1 overwrites it; step 3 needs the original address
  const existing = await db.query('SELECT email FROM users WHERE id = $1', [userId]);
  const userEmail: string | undefined = existing.rows[0]?.email;

  // Step 1: Anonymize user record (don't delete; keep the row for financial records, etc.)
  await runStep('anonymize_user_record', async () => {
    await db.query(`
      UPDATE users SET
        email = 'deleted-' || id::text || '@deleted.invalid',
        name = 'Deleted User',
        phone = NULL,
        avatar_url = NULL,
        deleted_at = NOW(),
        deletion_request_id = $2
      WHERE id = $1
    `, [userId, requestId]);
  });

  // Step 2: Delete from analytics/event tracking
  await runStep('delete_analytics_events', async () => {
    // PostHog
    await posthog.delete(userId);
    // Mixpanel
    await mixpanel.people.delete_user(userId);
    // Internal events table
    await db.query('DELETE FROM events WHERE user_id = $1', [userId]);
  });

  // Step 3: Delete from email provider
  await runStep('delete_email_provider', async () => {
    if (!userEmail) return; // no address on record, nothing to look up
    const contacts = await sendgrid.request({
      method: 'POST',
      url: '/v3/marketing/contacts/search',
      body: { query: `email = '${userEmail}'` },
    });
    if (contacts.body.result.length > 0) {
      await sendgrid.request({
        method: 'DELETE',
        url: `/v3/marketing/contacts?ids=${contacts.body.result.map((c: any) => c.id).join(',')}`,
      });
    }
  });

  // Step 4: Delete from support tool
  await runStep('delete_support_tickets', async () => {
    // Intercom: anonymize user
    await intercom.users.update({
      user_id: userId,
      name: 'Deleted User',
      email: `deleted-${userId}@deleted.invalid`,
      custom_attributes: { gdpr_deleted: true },
    });
  });

  // Step 5: Remove from search index
  await runStep('delete_search_index', async () => {
    // Remove all user's documents from Typesense
    await typesense.collections('projects').documents().delete({
      filter_by: `userId:=${userId}`,
    });
  });

  // Step 6: Delete from backups (flag for backup rotation)
  await runStep('flag_backup_deletion', async () => {
    // Mark backups that contain this user for deletion on next rotation
    await db.query(`
      INSERT INTO backup_deletion_queue (user_id, requested_at)
      VALUES ($1, NOW())
    `, [userId]);
    // Actual backup deletion happens when those backups age out (per retention policy)
  });

  // Step 7: Delete uploaded files
  await runStep('delete_user_files', async () => {
    const files = await db.query(
      'SELECT s3_key FROM user_uploads WHERE user_id = $1', [userId]
    );
    await Promise.all(
      files.rows.map((f: { s3_key: string }) =>
        s3.deleteObject({ Bucket: 'uploads', Key: f.s3_key }).promise())
    );
    await db.query('DELETE FROM user_uploads WHERE user_id = $1', [userId]);
  });

  // Mark the request complete only when every step succeeded; failed steps need a retry
  const allSucceeded = steps.every((s) => s.status !== 'failed');
  const completedAt = allSucceeded ? new Date() : undefined;
  await db.erasureRequests.create({
    requestId,
    userId,
    requestedAt,
    steps,
    completedAt,
  });
  return { requestId, userId, requestedAt, completedAt, steps };
}
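Because the pipeline logs failures and continues, something has to follow up on failed steps before the statutory deadline passes. The sketch below assumes the deadline is counted as one calendar month from receipt (GDPR Article 12(3) speaks of "one month"), clamped to the last day of shorter months. That counting rule and the 7-day escalation window are assumptions for illustration, and the step dictionaries mirror the `ErasureStep` shape above.

```python
import calendar
from datetime import date


def erasure_deadline(received: date) -> date:
    """Same day in the next calendar month, clamped to that month's last day."""
    year = received.year + (received.month // 12)
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


def steps_to_retry(steps: list[dict]) -> list[str]:
    """Names of steps that failed and still need another attempt."""
    return [s["name"] for s in steps if s["status"] == "failed"]


def needs_escalation(received: date, steps: list[dict], today: date) -> bool:
    """True when failed steps remain and the deadline is at most 7 days away."""
    if not steps_to_retry(steps):
        return False
    return (erasure_deadline(received) - today).days <= 7
```

A daily job could load open erasure requests, retry whatever `steps_to_retry` returns, and page a human when `needs_escalation` is true.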
Consent Management
Record and honor consent for each purpose:
// lib/consent.ts
interface ConsentRecord {
  userId: string;
  purpose: 'marketing' | 'analytics' | 'personalization';
  status: 'granted' | 'withdrawn';
  source: 'signup' | 'settings' | 'cookie_banner';
  recordedAt: Date;
  // GDPR Article 7(1) requires being able to demonstrate consent. IP and user agent are
  // common supporting evidence, not mandated fields, and are personal data themselves.
  ipAddress: string;
  userAgent: string;
}

export async function recordConsent(consent: ConsentRecord): Promise<void> {
  // Consent records are immutable: never update, always append
  await db.query(`
    INSERT INTO consent_records
      (user_id, purpose, status, source, recorded_at, ip_address, user_agent)
    VALUES ($1, $2, $3, $4, NOW(), $5, $6)
  `, [
    consent.userId,
    consent.purpose,
    consent.status,
    consent.source,
    consent.ipAddress,
    consent.userAgent,
  ]);
}

export async function hasConsent(
  userId: string,
  purpose: ConsentRecord['purpose'],
): Promise<boolean> {
  // The most recent record wins; no record at all means no consent
  const latest = await db.query(`
    SELECT status FROM consent_records
    WHERE user_id = $1 AND purpose = $2
    ORDER BY recorded_at DESC
    LIMIT 1
  `, [userId, purpose]);
  return latest.rows[0]?.status === 'granted';
}
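The append-only, latest-record-wins behaviour is easy to exercise in isolation. The sketch below uses an in-memory SQLite database as a stand-in for the real store and orders by an auto-incrementing `id` instead of a timestamp, which is an assumed variation: in PostgreSQL, `NOW()` returns the transaction start time, so two records written in the same transaction would tie on `recorded_at`.

```python
import sqlite3


def open_consent_log() -> sqlite3.Connection:
    """Create an in-memory, append-only consent table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE consent_records (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            purpose TEXT NOT NULL,
            status TEXT NOT NULL CHECK (status IN ('granted', 'withdrawn'))
        )
        """
    )
    return conn


def record_consent(conn: sqlite3.Connection, user_id: str, purpose: str, status: str) -> None:
    """Append a new record; existing rows are never updated."""
    conn.execute(
        "INSERT INTO consent_records (user_id, purpose, status) VALUES (?, ?, ?)",
        (user_id, purpose, status),
    )


def has_consent(conn: sqlite3.Connection, user_id: str, purpose: str) -> bool:
    """The most recent record decides; no record means no consent."""
    row = conn.execute(
        "SELECT status FROM consent_records "
        "WHERE user_id = ? AND purpose = ? ORDER BY id DESC LIMIT 1",
        (user_id, purpose),
    ).fetchone()
    return row is not None and row[0] == "granted"
```

Granting and then withdrawing consent leaves both rows in place as an audit trail, while `has_consent` reflects only the withdrawal.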
Data Retention Automation
-- Automated retention: delete records older than retention period
-- Run as a nightly cron job
-- Delete old analytics events (retain 2 years)
DELETE FROM events
WHERE created_at < NOW() - INTERVAL '2 years';
-- Anonymize inactive users (no login in 3 years)
UPDATE users SET
  email = 'inactive-' || id || '@anonymized.invalid',
  name = 'Inactive User',
  phone = NULL
WHERE last_login_at < NOW() - INTERVAL '3 years'
  AND deleted_at IS NULL
  AND email NOT LIKE 'inactive-%@anonymized.invalid'; -- skip rows already anonymized on a previous run
-- Delete expired sessions
DELETE FROM sessions WHERE expires_at < NOW();
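On a large table, a single nightly `DELETE` can run for a long time and hold locks while it does. Deleting in batches is one common mitigation. The sketch below shows the loop using SQLite as a stand-in database; the `RETENTION_POLICY` mapping, function names, and batch size are illustrative assumptions, and timestamps are assumed to be stored as ISO 8601 strings in a single timezone so that string comparison matches time order.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# table -> (timestamp column, retention in days); hypothetical policy
RETENTION_POLICY = {"events": ("created_at", 730)}


def delete_expired(conn, table, ts_column, retention_days, now, batch_size=1000):
    """Delete rows older than the retention window in batches; return rows removed."""
    # Identifiers are interpolated into SQL, so accept only plain names from trusted config
    if not (table.isidentifier() and ts_column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN ("
            f"SELECT rowid FROM {table} WHERE {ts_column} < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total


def run_retention(conn, now=None):
    """Apply every policy entry and report how many rows each table lost."""
    now = now or datetime.now(timezone.utc)
    return {
        table: delete_expired(conn, table, column, days, now)
        for table, (column, days) in RETENTION_POLICY.items()
    }
```

Keeping the policy in one mapping also gives auditors a single place to read the retention periods that the job actually enforces.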
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.