Raw data in.
Reliable decisions out.
— at any scale.
Data Engineer with 5+ years of experience designing cloud-native data platforms at Amazon and TouchWorld. I build the pipelines behind ML models, BI dashboards, and regulatory compliance, across domains.
high-volume scale
governance built
fraud & risk models
Healthcare Claims Data Platform
End-to-end platform ingesting, validating, and curating insurance claims. Batch and CDC ingestion with HIPAA-compliant handling, late-arriving data logic, and automated quality gates — producing ML-ready feature stores and analytics datasets.
Change Data Capture Pipeline
CDC pipeline from PostgreSQL to Snowflake via AWS DMS. Full idempotency, deduplication, schema evolution, and historical preservation — replacing expensive full-loads with sub-minute incremental ingestion and zero data loss.
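As a rough illustration of the idempotency and deduplication idea, here is a minimal in-memory sketch. It is not the pipeline's actual code. The event shape, the `_seq` and `_deleted` bookkeeping fields, and the `apply_changes` name are all assumptions made for this example; the sequence number stands in for a source log position.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change-event shape: (op, primary_key, sequence_number, row).
# op is "I" (insert), "U" (update) or "D" (delete). The sequence number
# stands in for a source log position and is assumed to increase per key.
ChangeEvent = Tuple[str, int, int, dict]


def apply_changes(target: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Apply CDC events to a keyed target so that replays are harmless."""
    for op, key, seq, row in events:
        current = target.get(key)
        # Skip duplicates and stale events: anything at or below the
        # sequence already applied for this key has been handled before.
        if current is not None and seq <= current["_seq"]:
            continue
        if op == "D":
            # Soft delete: keep a tombstone so the last known values survive
            # and a replayed older insert cannot resurrect the row.
            target[key] = {**(current or {}), "_seq": seq, "_deleted": True}
        else:
            target[key] = {**row, "_seq": seq, "_deleted": False}
    return target


batch = [
    ("I", 1, 10, {"status": "open"}),
    ("U", 1, 12, {"status": "paid"}),
    ("U", 1, 12, {"status": "paid"}),  # duplicate delivery
    ("I", 2, 11, {"status": "open"}),
    ("D", 2, 13, {}),
]
once = apply_changes({}, batch)
# Replaying the same batch against the resulting state changes nothing.
twice = apply_changes({k: dict(v) for k, v in once.items()}, batch)
```

In a warehouse, the same rule would typically be expressed as a merge keyed on the primary key with a sequence comparison; the dictionary here only makes the logic easy to follow.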
Event-Driven Processing Pipeline
Near-real-time pipeline triggered on S3 uploads via EventBridge. Lambda and Glue handle automated transformations with zero idle compute — cutting end-to-end latency from hours to under 2 minutes on a fully serverless architecture.
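A sketch of what the Lambda entry point for such a trigger might look like, assuming the standard EventBridge "Object Created" event that S3 emits. The argument names `--source_path` and `--event_id`, the helper name, and the sample bucket are hypothetical, and the Glue call is left as a comment so the example runs without AWS credentials.

```python
from typing import Any, Dict


def build_job_arguments(event: Dict[str, Any]) -> Dict[str, str]:
    """Translate an S3 'Object Created' EventBridge event into job arguments."""
    if event.get("source") != "aws.s3" or event.get("detail-type") != "Object Created":
        raise ValueError("not an S3 Object Created event")
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    # Passing the event id downstream lets the job detect repeated deliveries.
    return {"--source_path": f"s3://{bucket}/{key}", "--event_id": event["id"]}


def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Lambda entry point: validate the event and hand off to the transform job."""
    args = build_job_arguments(event)
    # In a deployed function, a Glue job could be started here, for example:
    #   boto3.client("glue").start_job_run(JobName="<job-name>", Arguments=args)
    # The job name is a placeholder, not a value from the original project.
    return args


sample_event = {
    "id": "evt-123",
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "raw-claims"},
        "object": {"key": "incoming/file.csv", "size": 1024},
    },
}
result = handler(sample_event, None)
```

Keeping the parsing in a pure function separate from the AWS call makes the trigger logic unit-testable without mocking any services.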
Cloud-Native AWS Data Platform
Library of production-ready Glue jobs, Lambda functions, and architecture patterns with IAM least-privilege policies and Lake Formation governance. Annotated architecture diagrams for batch and event-driven designs — built for reuse across projects.
Spark & Python Utility Library
Reusable PySpark transformation modules and Python utilities with unit tests. Covers window functions, advanced aggregations, partitioning strategies, and common ETL patterns — designed as plug-in components for any pipeline.
DevOps & Infrastructure as Code
Complete DevOps layer: Terraform modules for S3, Glue, and IAM; Dockerized PySpark for local dev parity; Jenkins + GitHub Actions CI/CD on every merge; Kubernetes (EKS) orchestration. Production never gets a surprise.
The modules that separate a junior builder from a senior platform engineer — quality, governance, performance, and observability as first-class concerns.
Data Quality Framework
Schema-agnostic validation enforcing null checks, referential integrity, accepted values, and drift detection — versioned code, not manual checks.
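The checks listed above can be sketched in a few lines of plain Python. This is an illustrative sketch only: rows are modeled as dictionaries, every function name is hypothetical, and "drift" is interpreted here as schema drift (missing or unexpected columns), which is an assumption about what the framework covers.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def check_not_null(rows: List[Row], column: str) -> List[int]:
    """Return indexes of rows where the column is missing or None."""
    return [i for i, r in enumerate(rows) if r.get(column) is None]


def check_accepted_values(rows: List[Row], column: str, accepted: Set[object]) -> List[int]:
    """Return indexes of rows whose value is outside the accepted set."""
    return [i for i, r in enumerate(rows) if r.get(column) not in accepted]


def check_referential_integrity(rows: List[Row], column: str, parent_keys: Set[object]) -> List[int]:
    """Return indexes of rows whose foreign key has no matching parent key."""
    return [i for i, r in enumerate(rows) if r.get(column) not in parent_keys]


def detect_schema_drift(rows: List[Row], expected_columns: Set[str]) -> Dict[str, Set[str]]:
    """Compare the columns actually seen against the expected schema."""
    seen: Set[str] = set()
    for r in rows:
        seen.update(r)
    return {"missing": expected_columns - seen, "unexpected": seen - expected_columns}


claims = [
    {"claim_id": 1, "status": "open", "member_id": 100},
    {"claim_id": 2, "status": "bogus", "member_id": 999},
    {"claim_id": None, "status": "paid", "member_id": 100, "extra": 1},
]
null_failures = check_not_null(claims, "claim_id")
value_failures = check_accepted_values(claims, "status", {"open", "paid"})
fk_failures = check_referential_integrity(claims, "member_id", {100})
drift = detect_schema_drift(claims, {"claim_id", "status", "member_id"})
```

Because each check takes the column name as a parameter, the same functions apply to any table, which is one way to read "schema-agnostic".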
Healthcare Data Governance
HIPAA-compliant layer with dynamic masking, row-level access control, PHI classification, and audit trails — built from real healthcare platform work.
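To make masking and row-level access concrete, here is a small sketch of the two ideas applied to in-memory rows. Column names such as `ssn` and `plan_id`, the mask token, and both function names are assumptions for illustration; a real deployment would more likely enforce these rules inside the warehouse with its native policies.

```python
from typing import Dict, List, Set

MASK = "***MASKED***"  # hypothetical mask token

Row = Dict[str, object]


def visible_rows(rows: List[Row], allowed_plans: Set[str]) -> List[Row]:
    """Row-level access: keep only rows for plans the caller may see."""
    return [r for r in rows if r.get("plan_id") in allowed_plans]


def mask_row(row: Row, phi_columns: Set[str], can_view_phi: bool) -> Row:
    """Dynamic masking: hide PHI columns unless the caller is cleared for PHI."""
    if can_view_phi:
        return dict(row)
    return {
        col: (MASK if col in phi_columns and val is not None else val)
        for col, val in row.items()
    }


records = [
    {"plan_id": "A", "ssn": "123-45-6789", "amount": 120},
    {"plan_id": "B", "ssn": "987-65-4321", "amount": 75},
]
# A caller restricted to plan "A" without PHI clearance.
restricted_view = [mask_row(r, {"ssn"}, False) for r in visible_rows(records, {"A"})]
# A caller cleared for PHI sees the original values for the same rows.
cleared_view = [mask_row(r, {"ssn"}, True) for r in visible_rows(records, {"A"})]
```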
Snowflake Query Optimization
Query optimization patterns and warehouse tuning at scale — clustering keys, materialized views, result caching, and right-sizing with cost analysis.
Pipeline Monitoring & SLA
Production observability with SLA tracking, severity-tiered alerting, and runbook docs — because a pipeline without monitoring is a future incident.
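One way severity-tiered SLA tracking could work is to classify each run by how far past its deadline it finished, or how overdue it is if still running. The tier names and the 30-minute and 2-hour thresholds below are hypothetical values chosen for this sketch, not the project's actual configuration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds; a real setup would tune these per pipeline.
WARNING_LIMIT = timedelta(minutes=30)
HIGH_LIMIT = timedelta(hours=2)


def sla_severity(expected_by: datetime, completed_at: Optional[datetime], now: datetime) -> str:
    """Classify a pipeline run against its SLA deadline."""
    # A run that has not finished is measured against the current time.
    finished = completed_at if completed_at is not None else now
    delay = finished - expected_by
    if delay <= timedelta(0):
        return "ok"
    if delay <= WARNING_LIMIT:
        return "warning"
    if delay <= HIGH_LIMIT:
        return "high"
    return "critical"


deadline = datetime(2024, 1, 1, 6, 0)
on_time = sla_severity(deadline, datetime(2024, 1, 1, 5, 45), datetime(2024, 1, 1, 9, 0))
slightly_late = sla_severity(deadline, datetime(2024, 1, 1, 6, 20), datetime(2024, 1, 1, 9, 0))
still_running = sla_severity(deadline, None, datetime(2024, 1, 1, 9, 0))
```

Each tier can then map to a different alert route, with the runbook link attached to the alert payload.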