Machine Learning Engineer
Software Engineering
San Francisco, CA, USA
USD 190k-280k / year + Equity
About Taxa
The last few years have seen a proliferation of maturing foundation models — AlphaFold, ESM, NT — that have started to learn what biology is.
Atop these models, the field is advancing into RL on in silico rewards (docking scores, predicted stability) or wet-lab rewards (assay readouts, fluorescence, growth).
Taxa believes the time has come to start training on what biology does. We have built the stack to measure exactly that: real, clinically verifiable rewards from interventional studies in humans at a cadence, scale, and resolution no one else has. How you learn from a signal like that is an open and exciting problem; reinforcement learning is the natural starting point, and we call our framing RLCVR: Reinforcement Learning from Clinically Verifiable Rewards.
Over the past four years, Taxa has built the first clinical stack capable of running this closed loop end to end, starting with the skin microbiome:
- Genetic engineering of intractable microbiome species — important species the rest of the field has struggled to work with.
- Reliable delivery of these probiotics to humans — a delivery problem that has been a major barrier across the field.
- High-throughput human study operations — interventional, longitudinal, at a cadence and cost no academic or pharma program approaches.
- Strain-level shotgun metagenomics at scale to resolve microbiome perturbations at extreme resolution: read-level, not species-level.
The convergence of these four (engineered chassis, in situ delivery, study throughput, and strain-level resolution) is what makes an RL loop with sufficient human-derived reward signal possible.
The Opportunity (onsite in San Francisco)
You will be Taxa's first dedicated ML hire. You will build a microbiome foundation model that learns from shotgun metagenomics reads and bacterial genomes at scale, conditioned on strain-level abundances and downstream phenotypes — for in silico design of engineered probiotic interventions, with reinforcement learning on clinically verifiable rewards collected from our ongoing interventional studies.
End-to-end ownership: from raw reads to a model whose predictions get tested in humans, with results coming back to you in weeks, not years. You will own the model, the training pipeline, the reward harness, and the evaluation methodology. This is a combined ML+eval role, and it is the whole loop, not one slice of it. From day one you will collaborate directly with our partners at leading frontier AI labs.
Required Qualifications
- PhD in a quantitative discipline — bioinformatics, computational biology, machine learning, computer science, physics, statistics, bioengineering, applied math, or a related field. We hire for the work people have done, not the department they sat in.
- Bioinformatics expertise — including hands-on experience with DNA / genomic data and shotgun metagenomes. Comfortable with raw, unfiltered read-level data from long- and short-read sequencing technologies.
- Production-grade ML engineering. You write code others can build on — and you've owned models in production, not just in notebooks. PyTorch required; JAX a plus.
- Modern deep learning research experience. You've trained, fine-tuned, or significantly adapted large neural models, and you have the research taste to know which experiments are worth running when ground truth is sparse, delayed, or noisy. We care more about your judgment in an open research problem than your familiarity with any one method.
Highly Desirable Experience
- Familiarity with the quirks of bacterial genomics and microbiome data — compositionality, sparsity, phylogenetic artifacts, genomic plasticity. A clear point of view on how to fold signal into a sequence-first architecture.
- Modeling complex systems over time. Experience with longitudinal, perturbational, or dynamical-systems data — learning how a system responds to an intervention and evolves, not just a static snapshot. Neural ODEs, state-space models, latent-dynamics or perturbation-response modeling are all relevant, and so is rigor about confounding and causality in observational time-series.
- Foundation-model experience, trained at real scale. You've pretrained or substantially adapted a foundation model (ideally on genomic sequences, e.g. Nucleotide Transformer, HyenaDNA, Caduceus, DNABERT), and you've felt where it breaks at scale: you can make the parallelism and memory trade-off calls and defend why, rather than reaching for whatever the default config does.
- Optimizing against feedback. You've trained a model against a reward, preference, or downstream outcome, and you have good judgment when the signal is noisy, delayed, or expensive. The specific method matters less to us than knowing what goes wrong and how you'd catch it. Relevant methods and tools include RL, preference optimization, and reward shaping (GRPO, PPO, DPO, RLOO; TRL, verl, vLLM).
Transparent Compensation
✔️ Base salary — $190,000 - $280,000
✔️ Equity — Equity package as a core component of total compensation.
✔️ Benefits — comprehensive health, dental, and vision insurance.
✔️ Retirement — 401(k) plan with 6% employer match.
✔️ PTO — Unlimited