Location: San Francisco, CA

Work Model: In-person

Industry: AI training data infrastructure

Compensation: $140K-$250K base, plus equity

About The Company

Our partner is a YC-backed company building a new kind of marketplace in the AI training data space. Rather than operating as a labor marketplace, they provide infrastructure that lets data producers transform their existing data into formats AI labs want and sell it directly to those labs. This direct-from-producer model unlocks far more high-value data sources than a labor marketplace would, and the team is growing quickly to keep up with demand.

The Opportunity

This is the company's top hiring priority and a genuinely hard research problem. Because data flows through a decentralized marketplace, ensuring quality at scale is the single biggest bottleneck to growth. As a Research Engineer, you will build the automated systems that verify and assure data quality so that suppliers consistently deliver excellent data to buyers.

You will start by digging into the data manually to understand failure modes, then design systems to automate quality checks at scale, combining rule-based approaches with AI for fuzzier cases and human-in-the-loop review where it makes sense. This is fundamentally a research role focused on building automated systems, not manual QA.

Responsibilities

Identify data quality issues including inconsistencies, formatting problems, and ingestion challenges
Perform initial manual data quality review to deeply understand failure modes
Build systems to automate quality checks at scale using rule-based and AI-driven approaches
Design hybrid systems that balance automation with human-in-the-loop review where appropriate
Continuously improve verification methods as the data landscape and AI tooling evolve

Requirements

Deeply technical, and a fast learner who can ramp quickly in a fast-moving field
Background in AI/ML engineering, or software engineering at an AI-focused company with demonstrable data ingestion and processing experience
Ability to reason about likely data quality problems from first principles
Comfortable owning ambiguous, open-ended problems end to end
Comfortable working in person, full-time, in a San Francisco office
Bonus: experience working with noisy or unstructured data, or sound judgment about when to use automation versus human-in-the-loop review
