Resume
Philip Monk
AI infrastructure engineer and systems programmer who leads large-scale model training and inference. Experienced in distributed systems and ML infrastructure across multiple accelerator platforms.
github.com/philipcmonk, phil@pcmonk.me.
Experience
- Essential AI
- Member of Technical Staff, Infra lead (May 2024 - Present)
Led the infrastructure team, responsible for large-scale training and inference.
- Conducted multiple large-scale training runs on GPUs and TPUs, including Rnj-1, an 8B model trained on 8.4T tokens.
- Set up and maintained a bare-metal Kubernetes cluster of AMD MI300X GPUs.
- Maintained a unified training stack across GPUs and multiple generations of TPUs.
- Implemented and validated additional features in flash attention kernels for GPUs and TPUs.
- Wrote a bespoke dataloader to support full replay determinism and maximum batch diversity.
- Devised burn-in and diagnostic tests that reduced badput to below 5% for multi-week training runs.
- Improved compute efficiency across topologies and scales through sweeps, theoretical analysis, and new features (e.g., an efficient sharding strategy for Muon in JAX).
- Implemented and evaluated various architectural features.
- Conducted trials and evaluations of pre-release accelerators with multiple cloud providers.
- Tlon Corporation
- CTO (October 2018 - March 2024)
Worked on Urbit, an “overlay operating system” for writing decentralized applications. Became technical lead of the infrastructure team in 2020, and CTO from 2021.
- Led an engineering organization of 25 people, designing the architecture at every layer of the stack.
- Wrote or substantially contributed to our compiler, language runtime, build system, networking protocols, and concurrency framework.
- Led the design and implementation of our application environment for third-party developers.
- Investigated and fixed numerous performance and reliability issues. Returned to Tlon as an engineer on
- Numerai
- Software engineer (May 2017 - August 2018)
One of a small team of engineers at a hedge fund built around a public machine learning tournament.
- Wrote, deployed, and managed the tournament backend.
- Improved tournament design to eliminate exploits and improve submission quality.
- Tlon Corporation
- Software engineer (May 2014 - August 2016)
Worked on all levels of the Urbit system in a team of five.
- Improved our interpreter, allocator, profiler, filesystem, networking protocol, web server, application environment, build system, compiler, and standard libraries.
- Wrote user tutorials, programming tutorials, cookbook-style recipes, and reference documentation.
Open Source Contributions
Author of Phlisped, a graphical programming editor experiment written in Racket.
Co-author of hoon.vim, Vim syntax highlighting for the (then undocumented) Hoon language.
Education
B.S. in Mathematics from Arizona State University (2012-2014, 2017)