Data Privacy Engineering: PII Detection, Data Masking, and Right-to-Erasure Pipelines
Quick answer. Privacy engineering implements the technical mechanisms behind GDPR, CCPA, and PIPL compliance: cataloguing where personal data lives, masking and protecting it, and building right-to-erasure pipelines that delete it on request. It starts with PII classification, since you cannot protect or delete data you haven't first discovered and categorized.
Privacy compliance (GDPR, CCPA, PIPL) is no longer a legal checkbox: regulators are issuing multimillion-dollar fines, and enterprise buyers conduct privacy reviews before signing contracts. Privacy engineering means implementing the technical mechanisms that make compliance possible: knowing where personal data lives, protecting it, and being able to delete it on request.
This guide covers the engineering implementation, not the legal theory.
PII Classification: Know What You Have
You cannot protect data you haven't catalogued. PII classification starts with knowing what personal data your system holds and where.
PII categories (the rows rated Very High, Health and Biometric, correspond to GDPR Article 9 special categories):
| Category | Examples | Risk Level |
|---|---|---|
| Identifiers | Name, email, user ID | High |
| Contact | Phone, address, IP | High |
| Health | Medical records, diagnoses | Very High |
| Biometric | Fingerprints, face data | Very High |
| Financial | Payment method, bank account | High |
| Behavioral | Clickstream, purchase history | Medium |
| Device | Browser fingerprint, device ID | Medium |
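The table can be encoded as a small lookup so that anything a scanner finds carries a risk level. This is a minimal sketch, assuming the category names and risk levels from the table above. `PII_CATALOG`, `TYPE_TO_CATEGORY`, and `risk_for` are hypothetical names, and the mapping from a detected field type to a category is an illustrative assumption, not a rule from any regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    risk: str
    special_category: bool  # treated here as a GDPR Article 9 special category


# Rows copied from the table above.
PII_CATALOG = {
    "identifiers": Category("Identifiers", "High", False),
    "contact": Category("Contact", "High", False),
    "health": Category("Health", "Very High", True),
    "biometric": Category("Biometric", "Very High", True),
    "financial": Category("Financial", "High", False),
    "behavioral": Category("Behavioral", "Medium", False),
    "device": Category("Device", "Medium", False),
}

# Assumption: how a detected field type maps onto a catalog category.
TYPE_TO_CATEGORY = {
    "email": "identifiers",
    "name": "identifiers",
    "phone": "contact",
    "address": "contact",
    "ip": "contact",
    "payment": "financial",
}


def risk_for(field_type: str) -> str:
    """Return the risk level for a detected field type, or 'Unclassified'."""
    key = TYPE_TO_CATEGORY.get(field_type)
    if key is None:
        return "Unclassified"
    return PII_CATALOG[key].risk
```

Returning "Unclassified" instead of raising keeps unknown field types visible in a report, so someone can decide where they belong.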
Automated PII detection in your codebase:
# scripts/pii_scanner.py: scan a Prisma schema for field names that suggest PII
import re
import sys

# Field-name patterns that suggest PII.
# \b treats "_" as a word character, so prefixed or camelCase names
# (user_email, firstName) need their own alternatives.
PII_PATTERNS = {
    'email': re.compile(r'\b(email|e_mail|email_address)\b', re.IGNORECASE),
    'phone': re.compile(r'\b(phone|phone_number|mobile|cell)\b', re.IGNORECASE),
    'name': re.compile(r'\b(first_name|last_name|full_name|display_name)\b', re.IGNORECASE),
    'ssn': re.compile(r'\b(ssn|social_security|tax_id|national_id)\b', re.IGNORECASE),
    'dob': re.compile(r'\b(date_of_birth|dob|birthdate|birth_date)\b', re.IGNORECASE),
    'address': re.compile(r'\b(address|street|postal_code|zip_code)\b', re.IGNORECASE),
    'ip': re.compile(r'\b(ip_address|client_ip|remote_addr)\b', re.IGNORECASE),
    'payment': re.compile(r'\b(card_number|cvv|account_number|routing_number)\b', re.IGNORECASE),
}


def scan_prisma_schema(schema_path: str) -> dict[str, list[str]]:
    """Return {model name: ["<pii type>: <field line>", ...]} for a Prisma schema."""
    findings: dict[str, list[str]] = {}
    current_model = None
    with open(schema_path, encoding='utf-8') as f:
        for line in f:
            model_match = re.match(r'\s*model\s+(\w+)\s*\{', line)
            if model_match:
                current_model = model_match.group(1)
                continue
            if line.strip().startswith('}'):
                # End of block: stop attributing later lines to this model
                current_model = None
                continue
            if current_model:
                for pii_type, pattern in PII_PATTERNS.items():
                    if pattern.search(line):
                        findings.setdefault(current_model, []).append(
                            f"{pii_type}: {line.strip()}"
                        )
    return findings


# Run: python scripts/pii_scanner.py [path/to/schema.prisma]
if __name__ == '__main__':
    schema = sys.argv[1] if len(sys.argv) > 1 else 'prisma/schema.prisma'
    for model, fields in scan_prisma_schema(schema).items():
        print(f"\n{model}:")
        for field in fields:
            print(f"  [PII] {field}")
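Field names only catch PII that is labeled as such. Free-text columns, logs, and JSON blobs can hold personal data under neutral names. As a complement, here is a minimal sketch of value-level detection, assuming two illustrative detectors: an email regex and a card-number regex filtered by the Luhn checksum. `find_pii_values` and `luhn_valid` are hypothetical helpers. The patterns are deliberately simple and will both miss cases and raise false positives.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii_values(text: str) -> list[tuple[str, str]]:
    """Return (type, value) pairs for emails and Luhn-valid card numbers in text."""
    hits = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        candidate = m.group().strip()
        if luhn_valid(candidate):
            hits.append(("card", candidate))
    return hits
```

The Luhn filter discards most random digit runs, such as order numbers and timestamps, that the card regex alone would flag.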
Data Masking for Non-Production Environments
Developers and QA engineers should never work with real production PII. Data masking replaces real values with realistic fake values before data reaches non-production environments.
// scripts/maskProductionData.ts
// Run against the staging copy of a production dump, before anyone else can access it
import { faker } from '@faker-js/faker';
import { createHash } from 'crypto';
import { Pool } from 'pg';

// Per-value helpers for application-level masking: preserve structure, replace content
interface MaskingStrategy {
  email: (value: string) => string;
  name: (value: string) => string;
  phone: (value: string) => string;
  ip: (value: string) => string;
}

export const masking: MaskingStrategy = {
  // Deterministic: same input always produces same output.
  // Allows testing relationships (user A's orders still belong to user A).
  // An unsalted hash can be reversed by hashing guessed addresses, so treat
  // the output as pseudonymized, not anonymized.
  email: (email: string) => {
    const hash = createHash('md5').update(email).digest('hex').substring(0, 8);
    return `test-${hash}@example-masked.com`;
  },
  name: () => faker.person.fullName(),
  // Recent Faker releases no longer accept a format string in faker.phone.number
  phone: () => `+1${faker.string.numeric(10)}`,
  ip: (ip: string) => {
    // Zero the last IPv4 octet. This reduces precision; whether the result
    // still counts as personal data depends on what else it can be combined with.
    return ip.replace(/\.\d+$/, '.0');
  },
};

// Apply masking to the staging database in bulk
async function maskDatabase(stagingDb: Pool): Promise<void> {
  console.log('Masking PII in staging database...');

  // md5() is built into PostgreSQL; digest() would require the pgcrypto extension
  await stagingDb.query(`
    UPDATE users SET
      email = 'test-' || md5(email) || '@example-masked.com',
      name = 'Test User ' || id::text,
      phone = '+10000000000',
      date_of_birth = NULL
  `);

  await stagingDb.query(`
    UPDATE billing_info SET
      card_last_four = '0000',
      billing_name = 'Test User',
      billing_address = '123 Test Street, Test City, TC 00000'
  `);

  // Anonymize IP addresses in event logs
  await stagingDb.query(`
    UPDATE events SET
      ip_address = regexp_replace(ip_address, '\\.\\d+$', '.0')
    WHERE ip_address IS NOT NULL
  `);

  console.log('Masking complete');
}

// Entry point: connect using the STAGING_DB_URL environment variable
const stagingDb = new Pool({ connectionString: process.env.STAGING_DB_URL });
maskDatabase(stagingDb)
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  })
  .finally(() => stagingDb.end());
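Deterministic masking built on a plain hash of the email keeps relationships intact, but anyone holding a list of candidate addresses can hash them and match the results. A keyed hash closes that gap as long as the key stays out of the non-production environment. This is a minimal sketch, assuming HMAC-SHA256 and the same `test-…@example-masked.com` output shape. `pseudonymize_email` and the 12-character truncation are hypothetical choices.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, key: bytes) -> str:
    """Map an email to a stable masked address using a secret key.

    The same (email, key) pair always yields the same output, so joins
    and foreign-key relationships survive masking.
    """
    normalized = email.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"test-{digest[:12]}@example-masked.com"
```

Rotating the key on each staging refresh also prevents anyone from correlating masked values across refreshes.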
Automated masking in CI pipeline:
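# .github/workflows/refresh-staging.yml
name: Refresh Staging Data
on:
  schedule:
    - cron: '0 2 * * 0' # Weekly on Sunday at 02:00 UTC
  workflow_dispatch:
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      # The masking script lives in the repository, so check it out first
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Dump production DB (schema + data)
        run: |
          pg_dump "$PROD_DB_URL" \
            --no-owner --no-privileges \
            --exclude-table=webhook_events \
            -f production-dump.sql
        env:
          PROD_DB_URL: ${{ secrets.PROD_DB_URL }}
      - name: Load into staging
        run: psql "$STAGING_DB_URL" < production-dump.sql
        env:
          STAGING_DB_URL: ${{ secrets.STAGING_DB_URL }}
      # Staging holds unmasked data until this step finishes; keep it inaccessible until then
      - name: Apply PII masking
        run: npx ts-node scripts/maskProductionData.ts
        env:
          STAGING_DB_URL: ${{ secrets.STAGING_DB_URL }}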
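A refresh job is easier to trust when it ends by checking that masking actually happened. This is a minimal sketch of such a check, assuming masked addresses follow the `test-<hex>@example-masked.com` shape produced by the masking script. `find_unmasked_emails` is a hypothetical helper that a final pipeline step could run over the `users.email` column, failing the job if it returns anything.

```python
import re
from typing import Iterable, Optional

# Shape of a masked address: "test-" + hex digest + fixed masked domain.
MASKED_EMAIL_RE = re.compile(r"^test-[0-9a-f]+@example-masked\.com$")


def find_unmasked_emails(emails: Iterable[Optional[str]]) -> list[str]:
    """Return every non-null email that does not look masked."""
    return [
        email
        for email in emails
        if email is not None and not MASKED_EMAIL_RE.match(email)
    ]
```

The same idea extends to other columns. Checking the output shape instead of trusting the step's exit code catches tables that were added to the schema after the masking script was written.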
Right-to-Erasure Pipeline (GDPR Article 17)
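GDPR requires you to act on a valid erasure request without undue delay, and at the latest within one month of receiving it (Article 12(3) allows an extension of up to two further months for complex or numerous requests). This is harder than it sounds, because the data lives in many places: your primary database, analytics tools, email and support providers, search indexes, backups, and file storage.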
// lib/privacy/erasurePipeline.ts
// Assumes db, logger, and the vendor clients (posthog, mixpanel, sendgrid, intercom,
// typesense, s3) are configured and imported elsewhere in the application.
import { randomUUID } from 'crypto';

interface ErasureRequest {
  requestId: string;
  userId: string;
  requestedAt: Date;
  completedAt?: Date;
  steps: ErasureStep[];
}

interface ErasureStep {
  name: string;
  status: 'pending' | 'completed' | 'failed' | 'not_applicable';
  completedAt?: Date;
  error?: string;
}

export async function processErasureRequest(userId: string): Promise<ErasureRequest> {
  const requestId = randomUUID();
  const requestedAt = new Date();
  const steps: ErasureStep[] = [];

  const runStep = async (name: string, fn: () => Promise<void>): Promise<void> => {
    try {
      await fn();
      steps.push({ name, status: 'completed', completedAt: new Date() });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      steps.push({ name, status: 'failed', error: message });
      // Log but continue: partial erasure is better than none
      logger.error({ step: name, userId, error: message });
    }
  };

  // Read the email before Step 1 overwrites it; Step 3 needs it for the provider lookup
  const existing = await db.query('SELECT email FROM users WHERE id = $1', [userId]);
  const userEmail: string | undefined = existing.rows[0]?.email;

  // Step 1: Anonymize user record (don't delete: keep the row for financial records, etc.)
  await runStep('anonymize_user_record', async () => {
    // Build the placeholder from the id column so $1 is only ever compared against id
    await db.query(`
      UPDATE users SET
        email = 'deleted-' || id::text || '@deleted.invalid',
        name = 'Deleted User',
        phone = NULL,
        avatar_url = NULL,
        deleted_at = NOW(),
        deletion_request_id = $2
      WHERE id = $1
    `, [userId, requestId]);
  });

  // Step 2: Delete from analytics/event tracking
  await runStep('delete_analytics_events', async () => {
    // PostHog
    await posthog.delete(userId);
    // Mixpanel
    await mixpanel.people.delete_user(userId);
    // Internal events table
    await db.query('DELETE FROM events WHERE user_id = $1', [userId]);
  });

  // Step 3: Delete from email provider
  await runStep('delete_email_provider', async () => {
    if (!userEmail) return; // no email on record, nothing to look up
    const contacts = await sendgrid.request({
      method: 'POST',
      url: '/v3/marketing/contacts/search',
      body: { query: `email = '${userEmail}'` },
    });
    if (contacts.body.result.length > 0) {
      await sendgrid.request({
        method: 'DELETE',
        url: `/v3/marketing/contacts?ids=${contacts.body.result.map((c: any) => c.id).join(',')}`,
      });
    }
  });

  // Step 4: Delete from support tool
  await runStep('delete_support_tickets', async () => {
    // Intercom: anonymize user
    await intercom.users.update({
      user_id: userId,
      name: 'Deleted User',
      email: `deleted-${userId}@deleted.invalid`,
      custom_attributes: { gdpr_deleted: true },
    });
  });

  // Step 5: Remove from search index
  await runStep('delete_search_index', async () => {
    // Remove all user's documents from Typesense
    await typesense.collections('projects').documents().delete({
      filter_by: `userId:=${userId}`,
    });
  });

  // Step 6: Delete from backups (flag for backup rotation)
  await runStep('flag_backup_deletion', async () => {
    // Mark backups that contain this user for deletion on next rotation
    await db.query(`
      INSERT INTO backup_deletion_queue (user_id, requested_at)
      VALUES ($1, NOW())
    `, [userId]);
    // Actual backup deletion happens when those backups age out (per retention policy)
  });

  // Step 7: Delete uploaded files
  await runStep('delete_user_files', async () => {
    const files = await db.query(
      'SELECT s3_key FROM user_uploads WHERE user_id = $1', [userId]
    );
    await Promise.all(
      files.rows.map((f: { s3_key: string }) =>
        s3.deleteObject({ Bucket: 'uploads', Key: f.s3_key }).promise()
      )
    );
    await db.query('DELETE FROM user_uploads WHERE user_id = $1', [userId]);
  });

  // Record the outcome; only mark the request complete if no step failed
  const allSucceeded = steps.every((step) => step.status !== 'failed');
  const completedAt = allSucceeded ? new Date() : undefined;
  await db.erasureRequests.create({ requestId, userId, requestedAt, steps, completedAt });

  return { requestId, userId, requestedAt, completedAt, steps };
}
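Because the pipeline logs failures and carries on, something has to follow up on failed steps before the statutory deadline passes. GDPR Article 12(3) sets that deadline at one month from receipt. This is a minimal sketch of the bookkeeping, assuming the deadline is the same calendar day in the following month, clamped to that month's last day. That counting rule is an assumption to confirm with counsel, and `erasure_deadline` and `steps_to_retry` are hypothetical helpers.

```python
import calendar
from datetime import date


def erasure_deadline(received: date) -> date:
    """Same day of the following month, clamped to that month's last day."""
    if received.month == 12:
        year, month = received.year + 1, 1
    else:
        year, month = received.year, received.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


def steps_to_retry(steps: list[dict]) -> list[str]:
    """Names of steps whose status is 'failed', in their original order."""
    return [step["name"] for step in steps if step["status"] == "failed"]
```

A scheduled job could load incomplete requests, retry the steps returned by `steps_to_retry`, and alert when a request's `erasure_deadline` is only a few days away.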
Consent Management
Record consent per purpose (marketing, analytics, personalization) and check it before any processing that depends on it:
// lib/consent.ts
// Assumes db is the application's configured database client
interface ConsentRecord {
  userId: string;
  purpose: 'marketing' | 'analytics' | 'personalization';
  status: 'granted' | 'withdrawn';
  source: 'signup' | 'settings' | 'cookie_banner';
  recordedAt: Date;
  // Supporting evidence: GDPR Art. 7(1) requires being able to demonstrate consent
  ipAddress: string;
  userAgent: string;
}

// recordedAt is set by the database (NOW()), so callers do not supply it
export async function recordConsent(consent: Omit<ConsentRecord, 'recordedAt'>): Promise<void> {
  // Consent records are immutable: never update, always append
  await db.query(`
    INSERT INTO consent_records
      (user_id, purpose, status, source, recorded_at, ip_address, user_agent)
    VALUES ($1, $2, $3, $4, NOW(), $5, $6)
  `, [
    consent.userId,
    consent.purpose,
    consent.status,
    consent.source,
    consent.ipAddress,
    consent.userAgent,
  ]);
}

export async function hasConsent(
  userId: string,
  purpose: ConsentRecord['purpose'],
): Promise<boolean> {
  // The most recent record wins; no record at all means no consent
  const latest = await db.query(`
    SELECT status FROM consent_records
    WHERE user_id = $1 AND purpose = $2
    ORDER BY recorded_at DESC
    LIMIT 1
  `, [userId, purpose]);
  return latest.rows[0]?.status === 'granted';
}
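The append-only, latest-record-wins behaviour is easy to get subtly wrong, so it helps to have a small in-memory model to test against. This is a minimal sketch, assuming the same semantics as the queries above: records are never updated, the most recent record per user and purpose decides, and a missing record means no consent. `ConsentLedger` and `ConsentEvent` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str
    status: str  # "granted" or "withdrawn"
    recorded_at: datetime


class ConsentLedger:
    """Append-only consent history; existing events are never modified."""

    def __init__(self) -> None:
        self._events: list[ConsentEvent] = []

    def record(self, user_id: str, purpose: str, status: str,
               recorded_at: Optional[datetime] = None) -> None:
        if status not in ("granted", "withdrawn"):
            raise ValueError(f"unknown consent status: {status}")
        when = recorded_at or datetime.now(timezone.utc)
        self._events.append(ConsentEvent(user_id, purpose, status, when))

    def has_consent(self, user_id: str, purpose: str) -> bool:
        matching = [
            e for e in self._events
            if e.user_id == user_id and e.purpose == purpose
        ]
        if not matching:
            return False  # no record means no consent
        latest = max(matching, key=lambda e: e.recorded_at)
        return latest.status == "granted"
```

Keeping the full history, not just a current flag, is what lets you show what a user had agreed to at any past point in time.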

Data Retention Automation
-- Automated retention: delete records older than retention period
-- Run as a nightly cron job

-- Delete old analytics events (retain 2 years)
DELETE FROM events
WHERE created_at < NOW() - INTERVAL '2 years';

-- Anonymize inactive users (no login in 3 years)
-- The email filter skips rows already anonymized by an earlier run
UPDATE users SET
  email = 'inactive-' || id || '@anonymized.invalid',
  name = 'Inactive User',
  phone = NULL
WHERE last_login_at < NOW() - INTERVAL '3 years'
  AND deleted_at IS NULL
  AND email NOT LIKE 'inactive-%@anonymized.invalid';

-- Delete expired sessions
DELETE FROM sessions WHERE expires_at < NOW();
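Retention periods are easier to review and test when they live in one table instead of being scattered across SQL statements. This is a minimal sketch, assuming the two periods used above (2 years for events, 3 years for inactive users) approximated as 365-day years. `RETENTION_DAYS`, `cutoff`, and `is_expired` are hypothetical names, and the day-based approximation can differ from SQL's calendar-aware `INTERVAL` by a day or two around leap years.

```python
from datetime import datetime, timedelta

# Retention periods in days (365-day years, an approximation).
RETENTION_DAYS = {
    "events": 2 * 365,
    "inactive_users": 3 * 365,
}


def cutoff(policy: str, now: datetime) -> datetime:
    """Timestamps strictly older than this are past retention."""
    return now - timedelta(days=RETENTION_DAYS[policy])


def is_expired(policy: str, timestamp: datetime, now: datetime) -> bool:
    return timestamp < cutoff(policy, now)
```

Passing `now` explicitly keeps the functions deterministic. That allows a dry run to report how many rows a nightly job would touch before it is allowed to delete anything.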
Our Approach at Viprasol
We implement privacy engineering for SaaS products — PII classification, data masking pipelines, right-to-erasure automation, consent management, and privacy impact assessments. Privacy engineering done right protects users and reduces compliance risk.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 1000+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement.