About the Role
Data is our lifeblood — and you’ll own the systems that keep it flowing. As our Machine Learning Engineer focused on our data platform, you’ll design the pipelines, services, and APIs that get clean, high-quality data into the hands of researchers and into the guts of our training and evaluation systems.
You’ll experiment with new retrieval and processing methods, build dataset quality benchmarks, and ensure that our data infrastructure scales with the demands of large-scale ML.
Responsibilities
Build, operate, and optimize large-scale ingestion, transformation, and retrieval pipelines
Design APIs and services for high-throughput, low-latency data access
Implement data quality benchmarks, versioning, and reproducibility workflows
Collaborate with research engineers to integrate datasets into training and evaluation loops
Ensure strong engineering discipline: version control, reviews, CI/CD, etc.
Qualifications
Deep experience in data engineering and distributed processing systems
Experience with large-scale storage, streaming, and retrieval systems
Strong in Python, SQL, and modern data stacks (Kafka / Spark / Polars / Flink / Airflow / Arrow / Flyte / DuckDB, etc.)
Startup-ready mindset — adaptable, fast-moving, high-ownership
What makes us interesting
Small, elite team of ex-founders, researchers from top AI labs, top CS grads, and engineers from top companies
True ownership: You will not be blocked by bureaucracy, and you will ship meaningful work within weeks rather than months
Serious momentum: We're well funded by top investors, moving fast, and focused on execution
What we do
Ship consumer products powered by cutting-edge AI research
Build infrastructure that supports both research and product
Pursue cutting-edge research that will open up new consumer product forms
The Details
Full-time, onsite role in Menlo Park
Startup hours apply
Generous salary, with additional benefits to be discussed during the hiring process