Yao Fu (符 尧)

Deep Learning Engineer at NVIDIA


I am a Deep Learning Engineer at NVIDIA. I received my Ph.D. in Computer Science from The University of Edinburgh, supervised by Prof. Luo Mai. I received my B.Eng. degree in Computer Science and Technology from Sun Yat-sen University in June 2021, where I was supervised by Prof. Di Wu as a member of Yat-sen Honor School.

I study the intersection of machine learning and distributed systems. My goal is to build efficient systems for the large-scale deployment of machine learning models. My current research focuses on the efficient inference of large language models in serverless computing clusters.

news

Jan 13, 2026 I have successfully graduated from the University of Edinburgh with my PhD and joined NVIDIA as a Deep Learning Engineer. Excited for the new chapter!
Sep 26, 2025 Our paper “MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems” has been accepted to the NeurIPS 2025 Datasets and Benchmarks Track. Sicheng will be presenting in San Diego, CA!
Mar 28, 2025 I’ll be giving a tutorial and demo on ServerlessLLM at the SESAME’25 workshop, co-located with ASPLOS and EuroSys in Rotterdam on March 31. I’ll also be attending the main conferences—let me know if you’d like to connect!

publications

  1. ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models
    Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai
    OSDI, 2024
  2. MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
    Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, and Mahesh Marina
    arXiv preprint arXiv:2401.14361, 2024
  3. TorchOpt: An Efficient Library for Differentiable Optimization
    Jie Ren*, Xidong Feng*, Bo Liu*, Xuehai Pan*, Yao Fu, Luo Mai, and Yaodong Yang
    JMLR, 2023
  4. Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update
    Chijun Sima*, Yao Fu*, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, and Luo Mai
    OSDI, 2022
  5. MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
    Yinsicheng Jiang*, Yao Fu*, Yeqi Huang*, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, DaYou Du, Tairan Xu, Kai Zou, Edoardo Maria Ponti, and Luo Mai
    NeurIPS, 2025