publications
*co-primary authors
2024
- ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models. OSDI, 2024
- MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving. arXiv preprint arXiv:2401.14361, 2024
2023
- TorchOpt: An Efficient Library for Differentiable Optimization. JMLR, 2023
- Optimizing the numbers of queries and replies in convex federated learning with differential privacy. IEEE Transactions on Dependable and Secure Computing, 2023
2022
- Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update. OSDI, 2022