Philip Monk

AI infrastructure engineer and systems programmer who leads large-scale model training and inference. Experienced in distributed systems and ML infrastructure across multiple accelerators.

github.com/philipcmonk, phil@pcmonk.me.

Experience

Essential AI
Member of Technical Staff, Infra lead (May 2024 - Present)

Led the infrastructure team, responsible for large-scale training and inference.

  • Conducted multiple large-scale training runs on GPUs and TPUs, including Rnj-1, an 8B model trained on 8.4T tokens.
  • Set up and maintained a bare-metal Kubernetes cluster of AMD MI300X GPUs.
  • Maintained a unified training stack across GPUs and multiple generations of TPUs.
  • Implemented and validated additional features in flash attention kernels for GPUs and TPUs.
  • Wrote a bespoke dataloader to support full replay determinism and maximum batch diversity.
  • Devised burn-in and diagnostic tests that reduced badput for multi-week training runs to below 5%.
  • Improved compute efficiency under various topologies and at various scales through sweeps, theoretical analysis, and newly implemented features (for example, an efficient sharding strategy for Muon in JAX).
  • Implemented and evaluated various architectural features.
  • Conducted trials and evaluations of pre-release accelerators with multiple cloud providers.
Tlon Corporation
CTO (October 2018 - March 2024)

Worked on Urbit, an “overlay operating system” for writing decentralized applications. Became technical lead of the infrastructure team in 2020 and CTO in 2021.

  • Led an engineering organization of 25 people, designing the architecture at every layer of the stack.
  • Wrote or substantially contributed to our compiler, language runtime, build system, networking protocols, and concurrency framework.
  • Led the design and implementation of our application environment for third-party developers.
  • Investigated and fixed numerous performance and reliability issues. Returned to Tlon as an engineer on
Numerai
Software engineer (May 2017 - August 2018)

One of a small team of engineers at a hedge fund based on a public machine learning tournament.

  • Wrote, managed, and deployed the tournament backend.
  • Improved tournament design to eliminate exploits and improve submission quality.
Tlon Corporation
Software engineer (May 2014 - August 2016)

Worked on all levels of the Urbit system in a team of five.

  • Improved our interpreter, allocator, profiler, filesystem, networking protocol, web server, application environment, build system, compiler, and standard libraries.
  • Wrote user tutorials, programming tutorials, cookbook-style recipes, and reference documentation.

Open Source Contributions

Author of Phlisped, a graphical programming editor experiment written in Racket.

Co-author of hoon.vim, syntax highlighting in Vim for the (then-undocumented) Hoon language.

Education

B.S. in Mathematics from Arizona State University (2012-2014, 2017)