Zhuoyang Zhang    张焯扬

I'm a first-year Ph.D. student at MIT EECS, advised by Prof. Song Han. I am mainly interested in vision-centric efficient machine learning, especially for foundation models. Previously, I obtained my Bachelor's degree in computer science from Yao Class, Tsinghua University. During my undergraduate studies, I was fortunate to work with Prof. Li Yi and Prof. Hao Su on 3D computer vision.

Research

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Ming-Yu Liu, Donglai Xiang, Gordon Wetzstein, Tsung-Yi Lin
CVPR 2025
[paper] [website]
NVILA: Efficient Frontier Visual Language Models
Zhijian Liu*, Ligeng Zhu*, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Vishwesh Nath, Jinyi Hu, Sifei Liu, Ranjay Krishna, Daguang Xu, Xiaolong Wang, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
CVPR 2025
[paper] [code] [demo] [website]
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Yecheng Wu*, Zhuoyang Zhang*, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu
ICLR 2025
[paper] [code] [online demo]
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han
ICLR 2025
[paper] [code] [online demo]
EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
Zhuoyang Zhang, Han Cai, Song Han
CVPR 2024 ELVM Workshop
[paper] [code] [online demo]
Sparse Refinement for Efficient High-resolution Semantic Segmentation
Zhijian Liu*, Zhuoyang Zhang*, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han
ECCV 2024
[paper] [code]
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu*, Ruoxi Shi*, Linghao Chen*, Zhuoyang Zhang*, Chao Xu*, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
CVPR 2024
[paper] [website]
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su
Technical Report
[paper] [code]
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
Zhuoyang Zhang*, Yuhao Dong*, Yunze Liu, Li Yi
CVPR 2023
[paper] [code]

Academic Service

  • Conference reviewer: ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, etc.