Yao Fu (符尧)

1.45, Informatics Forum

Edinburgh, EH8 9AB

Scotland, UK

I am a Ph.D. student in Computer Science at The University of Edinburgh, supervised by Prof. Luo Mai. I received my B.Eng. degree in Computer Science and Technology from Sun Yat-sen University in June 2021, where I was supervised by Prof. Di Wu as a member of the Yat-sen Honor School.

I study the intersection of machine learning and distributed systems. My goal is to build efficient systems for the large-scale deployment of machine learning models. My current research focuses on the efficient inference of large language models in serverless computing clusters.

news

Mar 28, 2025	I’ll be giving a tutorial and demo on ServerlessLLM at the SESAME’25 workshop, co-located with ASPLOS and EuroSys in Rotterdam on March 31. I’ll also be attending the main conferences. Let me know if you’d like to connect!
Jan 22, 2025	I’ll be visiting the University of Pennsylvania (Jan 29) and Columbia University to present our latest work on Serverless AI, focusing on the efficient sharing of AI infrastructure. Check details here. I’ll be in Philly, DC, and NYC from Jan 27 to Feb 5. If you’re in the area and interested in having a chat, let’s meet up!
Oct 01, 2024	I’m honored to serve as a reviewer for IEEE Transactions on Mobile Computing (TMC). I’m excited about the opportunity to contribute to the community in this new role!
May 16, 2024	I’ve been selected as one of the ML and Systems Rising Stars! Thanks to everyone who has supported me along the way! I’ll be attending the workshop at NVIDIA’s headquarters in Santa Clara, CA, on July 15-16.
Mar 21, 2024	Our paper “ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models” has been accepted to OSDI 2024. Preprint available on arXiv. Code will be released soon. Stay tuned!

publications

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai

OSDI, 2024
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, and Mahesh Marina

arXiv preprint arXiv:2401.14361, 2024
TorchOpt: An Efficient Library for Differentiable Optimization

Jie Ren*, Xidong Feng*, Bo Liu*, Xuehai Pan*, Yao Fu, Luo Mai, and Yaodong Yang

JMLR, 2023
Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

Chijun Sima*, Yao Fu*, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, and Luo Mai

OSDI, 2022