Selected Publications
(*: equal contribution, †: project lead)
-
GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving
Shakya Jayakody*, Youpeng Zhao*†, Chinmay Dhanraj Nehate, Jun Wang
MLSys 2026
-
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
Yuanhao Zou, Shengji Jin, Andong Deng, Youpeng Zhao, Jun Wang, Chen Chen
ICLR 2026
[Arxiv]
[Code]
-
Are We Scaling the Right Thing? A System Perspective on Test-Time Reasoning
Youpeng Zhao, Jinpeng LV, Di Wu, Jun Wang, Christopher Gooley
NeurIPS 2025 Efficient Reasoning Workshop
[Arxiv]
-
Classifier Enhancement Using Extended Context and Domain Experts for Semantic Segmentation
Huadong Tang, Youpeng Zhao, Min Xu, Jun Wang, Qiang Wu
IEEE Transactions on Multimedia (TMM)
[Arxiv]
-
MeRino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang
AAAI 2025
[Arxiv]
[Poster]
-
ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
Youpeng Zhao, Jun Wang
ICCAD 2024
[Arxiv]
-
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
Youpeng Zhao, Di Wu, Jun Wang
ISCA 2024
[Arxiv]
-
Parameter-Efficient Vision Transformer with Linear Attention
Youpeng Zhao, Huadong Tang, Yingying Jiang, Qiang Wu, Jun Wang
ICIP 2023
[Arxiv]