Available for Q3 / Q4 engagements

Data systems that just work.

I’m Long Nguyen, a senior data engineer who builds production-grade pipelines, lakehouses, and ML infrastructure for teams that can’t afford to have any of it break. Ten years in, hundreds of millions of rows shipped daily.

10y+
Building data infra
$80M+
Revenue routed through
12
Production deployments
99.95%
Pipeline uptime SLA

Built with the boring, battle-tested stack

Snowflake · Databricks · dbt · Airflow · Kafka · Spark · BigQuery · Postgres · AWS · GCP · Terraform · Python

Selected work · 2018 — 2026

A handful of engagements I’m most proud of.

01 · Retail · Analytics

Fortune 500 retailer

Cut nightly batch from 9h to 22min

Re-architected a brittle Airflow + EMR pipeline into an incremental Spark + dbt lakehouse on Databricks. Same dashboards, fraction of the spend, finance team gets numbers before standup.

Runtime
−96%
Compute spend
−$340k/yr
SLA hit-rate
99.95%
Databricks · dbt · Spark · Airflow

02 · Fintech · Risk

Series-B fintech

Real-time fraud signals in <300ms

Designed a Kafka → Flink → Feature Store pipeline serving fraud models at the edge of the payment flow. Replaced 2 vendor tools, paid for itself in the first quarter.

P95 latency
240ms
Vendors removed
2
Loss avoided
$4.1M
Kafka · Flink · Feature Store · AWS

03 · ML Platform

AI-native SaaS

Shipped an internal ML platform in 11 weeks

Bootstrapped feature pipelines, training infra, model registry, and online inference on GCP for a 30-person product team. From zero to first model in production in under a quarter.

Time to first model
11 wks
Models in prod
8
Eng team size
3
GCP · Vertex AI · Terraform · Python

04 · HealthTech · HIPAA

Healthcare scale-up

HIPAA-grade lakehouse, audited and shipped

Built a fully audited lakehouse on Snowflake with row-level security, lineage, and CDC ingestion from 14 source systems. Passed SOC 2 + HIPAA review on first attempt.

Sources unified
14
Audit findings
0
Data freshness
<5min
Snowflake · Fivetran · dbt · SOC 2

How I work

Four ways teams hire me. Pick the shape that fits the problem.

Data platform builds

End-to-end lakehouse and pipeline architecture — from source systems to dashboards your CFO actually trusts.

  • Snowflake / Databricks / BigQuery
  • dbt + Airflow / Dagster orchestration
  • Streaming with Kafka, Flink, Pub/Sub

ML infrastructure

Feature stores, training infra, online inference. The boring plumbing that makes data scientists look brilliant.

  • Feature store design (Feast, Tecton, in-house)
  • Model registry + CI/CD for ML
  • Real-time + batch inference on AWS / GCP

Pipeline rescue

Inherited a 9-hour batch that fails twice a week? I’ve seen this movie. I know how it ends.

  • Performance + cost audits
  • Incremental refactor, no big-bang rewrites
  • On-call playbooks and runbooks

Fractional data leadership

Embed as your interim head of data. Hire the team, set the roadmap, ship the first quarter, hand it off.

  • Tech strategy + hiring loops
  • Vendor selection and procurement
  • Mentoring early-career engineers

Engagement model

From first call to handoff in under a quarter.

  1. 30 min · free

    Discovery call

    30 minutes. We talk about what's broken, what you've tried, and what success looks like by quarter end.

  2. 48 hours

    Scoped proposal

    Within 48 hours: a written scope, fixed-fee pricing, milestones, and an exit ramp. No surprises.

  3. 4 to 12 weeks

    Build and ship

    Async-first, weekly demos, your team in the loop. Code, infra, and docs land in your repos — never mine.

  4. 1 week + retainer

    Handoff and stay-on

    Runbooks, on-call docs, an internal demo. Optional retainer if you want me on call for the first quarter.

About

I’m the engineer you call when the data has to be right.

Long Nguyen

Senior data engineer · Remote (UTC−5)

For the last decade I’ve been the person teams call when their data systems start lying to the business — pipelines that silently drop rows, dashboards that disagree with the warehouse, ML models retraining on data that no longer exists.

I’ve shipped data infrastructure at Fortune 500 retailers, Series-B fintechs, and AI-native SaaS teams. I write Python and SQL like first languages, lean toward boring tech, and care more about whether the on-call engineer can sleep at night than which framework won this quarter.

I’m booked a quarter ahead, take a maximum of two clients at a time, and price by the engagement, never the hour.

  • Years building data infra: 10+
  • Production deployments: 12
  • Concurrent clients: 2 max
  • Average engagement: 8 weeks

What clients say

Senior people who’ve hired me before, in their own words.

“Long parachuted into a tangled lakehouse migration that two prior consultancies had failed to land. Eight weeks later, we were on schedule and under budget. Worth every dollar.”

VP Engineering

Series-B Fintech

“The most senior data engineer I’ve worked with — and the rare contractor who writes documentation his replacement actually wants to read. We’ve hired him three times.”

Chief Technology Officer

Healthcare scale-up

“He delivered a feature store, a real-time inference layer, and a dashboard our exec team uses daily — in a single quarter. Quietly, without drama.”

Head of Data

AI-native SaaS

Common questions

What people ask before hiring me.

  • Most engagements land between $25k and $120k, fixed-fee, scoped to a clear deliverable. Retainers start at $8k/month for a half-day per week. I never bill hourly — you should know exactly what you’re paying before we start.

  • I’m typically booked 4 — 8 weeks ahead and take a maximum of two concurrent clients. The intro call is the fastest way to find out my next available slot.

  • Yes to all three. Your code lives in your repos under your accounts from day one. I’m comfortable with SOC 2, HIPAA, and PCI environments and have signed plenty of MNDAs.

  • Both, but rescue work is where I’m sharpest — inherited Airflow nightmares, dbt projects nobody owns, lakehouses that cost more than they earn. I prefer incremental refactors over big-bang rewrites.

  • I work solo by default. For larger builds I have a small bench of trusted contractors I can pull in — also senior, also remote, also fixed-fee.

  • I’m on US Eastern (UTC−5) and overlap comfortably with North America, EU, and APAC mornings. I work async-first, with one weekly sync.

Let’s talk

Have a data problem worth solving?

Book a free 30-minute call. Bring the messiest pipeline you have. We’ll either rough out a plan together or I’ll point you to someone better suited.

Replying within 24 hours · Mon — Fri