Selected Publications

Conditional Diffusion Distillation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, and Peyman Milanfar
arXiv:2310.01407, 2023
[Arxiv] [Twitter]

A novel conditional distillation method that turns an unconditional diffusion model into a conditional one, enabling faster sampling while maintaining high image quality.



MULLER: Multilayer Laplacian Resizer for Vision
Zhengzhong Tu, Peyman Milanfar, and Hossein Talebi
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, Paris
[Arxiv] [Paper (PDF)] [Supp] [Code] [Colab] [Poster] [Twitter] [BibTex] @InProceedings{Tu_2023_ICCV, author = {Tu, Zhengzhong and Milanfar, Peyman and Talebi, Hossein}, title = {MULLER: Multilayer Laplacian Resizer for Vision}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {6877-6887} }

A super-lightweight resizer with only a handful of trainable parameters. Plugged into training pipelines, it significantly boosts the performance of the underlying vision task at virtually no extra cost.
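The core idea admits a compact sketch. Below is a minimal NumPy illustration of a multilayer Laplacian resizer: a cheap base resize plus a learnable scale and bias applied to detail (Laplacian) bands. The nearest-neighbor base resizer, box blur, and single-band parameterization here are illustrative simplifications, not the paper's exact design.

```python
import numpy as np

def nn_resize(img, h, w):
    # Nearest-neighbor base resizer (stand-in for the actual base resizer).
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[ys][:, xs]

def box_blur(img, k=5):
    # Separable box blur as a stand-in low-pass filter.
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, img)
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)

def muller_resize(img, h, w, scales=(1.2,), biases=(0.0,)):
    # Base resize, then boost each Laplacian (detail) band with a scale and
    # bias -- the only trainable parameters in this sketch.
    out = nn_resize(img.astype(float), h, w)
    for a, b in zip(scales, biases):
        detail = out - box_blur(out)  # detail band = image minus its low-pass
        out = out + a * detail + b
    return out
```

With `scales=(0,)` and `biases=(0,)` this reduces to the plain base resize; training the few scale/bias parameters jointly with the downstream task is what keeps the parameter count to a handful.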


V2V4Real: A large-scale real-world dataset for Vehicle-to-Vehicle Cooperative Perception
Runsheng Xu, Xin Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, Hongkai Yu, Bolei Zhou, Jiaqi Ma
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, Vancouver
[Arxiv] [Paper (PDF)] [Supp] [Project page] [Code] [Video] [BibTex] @inproceedings{xu2023v2v4real, title={V2v4real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception}, author={Xu, Runsheng and Xia, Xin and Li, Jinlong and Li, Hanzhao and Zhang, Shuo and Tu, Zhengzhong and Meng, Zonglin and Xiang, Hao and Dong, Xiaoyu and Song, Rui and others}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages={13712--13722}, year={2023} }

CVPR 2023 Highlight (2.5% of 9155 submissions)

V2V4Real is the first large-scale real-world dataset for Vehicle-to-Vehicle (V2V) cooperative perception in autonomous driving.



Pik-Fix: Restoring and Colorizing Old Photos
Runsheng Xu*, Zhengzhong Tu*, Yuanqi Du*, Xiaoyu Dong, Jinlong Li, Zibo Meng, Jiaqi Ma, Alan Bovik, Hongkai Yu
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, Waikoloa
[Arxiv] [Paper (PDF)] [Supp] [Code] [Data] [Poster] [Twitter] [BibTex] @InProceedings{Xu_2023_WACV, author = {Xu, Runsheng and Tu, Zhengzhong and Du, Yuanqi and Dong, Xiaoyu and Li, Jinlong and Meng, Zibo and Ma, Jiaqi and Bovik, Alan and Yu, Hongkai}, title = {Pik-Fix: Restoring and Colorizing Old Photos}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2023}, pages = {1724-1734} }

A novel learning framework that can both repair and colorize old, degraded photos, accompanied by a first-of-its-kind dataset of paired real old photos.



CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu*, Zhengzhong Tu*, Hao Xiang, Wei Shao, Bolei Zhou, Jiaqi Ma
Conference on Robot Learning (CoRL), 2022, Auckland
[Arxiv] [Paper (PDF)] [OpenReview] [Code] [Data] [Twitter] [Zhihu] [BibTex] @inproceedings{xu2023cobevt, title={CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers}, author={Xu, Runsheng and Tu, Zhengzhong and Xiang, Hao and Shao, Wei and Zhou, Bolei and Ma, Jiaqi}, booktitle={Conference on Robot Learning}, pages={989--1000}, year={2023}, organization={PMLR} }

A new cooperative BEV map segmentation transformer built on a 3D fused axial attention (FAX) module with linear complexity, which also generalizes to other tasks.



MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li
European Conference on Computer Vision (ECCV), 2022, Tel Aviv
[Arxiv] [Paper (PDF)] [Code] [Colab] [Tensorflow] [Twitter] [Video] [Jishi Live] [Zhihu] [BibTex] @inproceedings{tu2022maxvit, title={Maxvit: Multi-axis vision transformer}, author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao}, booktitle={European conference on computer vision}, pages={459--479}, year={2022}, organization={Springer} }

Featured in Jeff Dean's 2022 Google Research year-in-review blog; selected as a top-3 paper of the year in Ahead of AI #4: A Big Year for AI; retweeted by Yann LeCun: link

A new scalable local-global attention mechanism called multi-axis attention, stacked into a family of hierarchical vision transformers dubbed MaxViT, which attains 86.5% ImageNet-1K top-1 accuracy without extra data and 88.7% top-1 accuracy with ImageNet-21K pre-training.
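The two attention axes can be illustrated with plain array reshapes: block attention groups tokens into non-overlapping p×p windows (local), while grid attention groups tokens into a g×g grid of strided locations (sparse global). The NumPy sketch below shows only the token grouping; the self-attention applied within each group is omitted for brevity.

```python
import numpy as np

def block_partition(x, p):
    # Local axis: non-overlapping p x p windows of an (H, W, C) feature map.
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    # Global axis: a g x g grid whose tokens are strided across the whole map,
    # giving sparse global mixing at the same (linear) cost as block attention.
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)
```

For an 8×8 map with p = g = 4, both partitions yield 4 groups of 16 tokens; block groups are contiguous patches, while grid groups sample every second row and column of the full map.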



V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
Runsheng Xu*, Hao Xiang*, Zhengzhong Tu*, Xin Xia, Feng Yang, Ming-Hsuan Yang, Jiaqi Ma
European Conference on Computer Vision (ECCV), 2022, Tel Aviv
[Arxiv] [Paper (PDF)] [Code] [Data] [Twitter] [Video] [Zhihu] [BibTex] @inproceedings{xu2022v2x, title={V2x-vit: Vehicle-to-everything cooperative perception with vision transformer}, author={Xu, Runsheng and Xiang, Hao and Tu, Zhengzhong and Xia, Xin and Yang, Ming-Hsuan and Ma, Jiaqi}, booktitle={European conference on computer vision}, pages={107--124}, year={2022}, organization={Springer} }

A holistic vision Transformer that uses heterogeneous multi-agent attention and multi-scale window attention to handle common V2X challenges, including latency, pose errors, etc.



MAXIM: Multi-Axis MLP for Image Processing
Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, New Orleans
[Arxiv] [Paper (PDF)] [Supp] [Code] [Colab] [Web Demo] [Twitter] [Slides] [Poster] [Video] [Zhihu] [BibTex] @InProceedings{Tu_2022_CVPR, author = {Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao}, title = {MAXIM: Multi-Axis MLP for Image Processing}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {5769-5780} }

Best paper nomination award (0.4% of 8161 submissions)

An MLP-based architecture that can serve as a foundation model for image processing, achieving SoTA performance on more than 10 benchmarks across a broad range of tasks, including denoising, deblurring, deraining, dehazing, and enhancement.



Subjective Quality Assessment of User-Generated Content Gaming Videos
Xiangxu Yu, Zhengzhong Tu, Zhenqiang Ying, Alan Bovik, Neil Birkbeck, Yilin Wang, Balu Adsumilli
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV workshop), 2022, Waikoloa
[Paper (PDF)] [Database] [BibTex] @InProceedings{Yu_2022_WACV, author = {Yu, Xiangxu and Tu, Zhengzhong and Ying, Zhenqiang and Bovik, Alan C. and Birkbeck, Neil and Wang, Yilin and Adsumilli, Balu}, title = {Subjective Quality Assessment of User-Generated Content Gaming Videos}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops}, month = {January}, year = {2022}, pages = {74-83} }

A new UGC gaming video VQA resource, the LIVE-YT-Gaming database, composed of 600 UGC gaming videos and 18,600 subjective quality ratings collected in an online subjective study.



RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content
Zhengzhong Tu, Xiangxu Yu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan Bovik
IEEE Open Journal of Signal Processing 2, 425-440, 2021
[Arxiv] [Paper (PDF)] [Code] [BibTex] @article{tu2021rapique, title={RAPIQUE: Rapid and accurate video quality prediction of user generated content}, author={Tu, Zhengzhong and Yu, Xiangxu and Wang, Yilin and Birkbeck, Neil and Adsumilli, Balu and Bovik, Alan C}, journal={IEEE Open Journal of Signal Processing}, volume={2}, pages={425--440}, year={2021}, publisher={IEEE} }

Highlighted in OJSP 2022 newsletter

A hybrid blind video quality assessment model for user-generated content that performs comparably to SoTA models with orders-of-magnitude faster runtime.



UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan Bovik
IEEE Transactions on Image Processing 30, 4449-4464, 2021
[Arxiv] [Paper (PDF)] [Code] [Benchmark] [BibTex] @article{tu2021ugc, title={UGC-VQA: Benchmarking blind video quality assessment for user generated content}, author={Tu, Zhengzhong and Wang, Yilin and Birkbeck, Neil and Adsumilli, Balu and Bovik, Alan C}, journal={IEEE Transactions on Image Processing}, volume={30}, pages={4449--4464}, year={2021}, publisher={IEEE} }

The most cited paper published after 2021 in the video quality assessment field

For the first time, we defined and coined the 'UGC-VQA problem', provided a comprehensive benchmark, and built a new compact-feature model with SoTA performance.



Adaptive Debanding Filter
Zhengzhong Tu, Jessie Lin, Yilin Wang, Balu Adsumilli, Alan Bovik
IEEE Signal Processing Letters 27, 1715-1719, 2020
[Arxiv] [Paper (PDF)] [Code] [BibTex] @article{tu2020adaptive, title={Adaptive debanding filter}, author={Tu, Zhengzhong and Lin, Jessie and Wang, Yilin and Adsumilli, Balu and Bovik, Alan C}, journal={IEEE Signal Processing Letters}, volume={27}, pages={1715--1719}, year={2020}, publisher={IEEE} }

A debanding filter that is able to adaptively smooth banded regions while preserving image edges and details, yielding perceptually pleasing results.
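As a rough illustration of the adaptive idea (not the paper's exact filter), the sketch below smooths only low-gradient, band-prone pixels and leaves high-gradient edge and texture pixels untouched; the gradient threshold and box filter here are placeholder choices.

```python
import numpy as np

def adaptive_deband(img, grad_thresh=2.0, k=7):
    # Gradient magnitude flags edges/texture; only near-flat (band-prone)
    # pixels are replaced by their smoothed values.
    img = img.astype(float)
    gy, gx = np.gradient(img)
    grad = np.hypot(gx, gy)
    kernel = np.ones(k) / k
    sm = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, img)
    sm = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, sm)
    return np.where(grad < grad_thresh, sm, img)
```

On a synthetic step image, pixels along the step edge are left intact while flat regions pass through the smoother, which is the behavior a debanding filter needs to avoid washing out real detail.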



BBAND index: A no-reference banding artifact predictor
Zhengzhong Tu, Jessie Lin, Yilin Wang, Balu Adsumilli, Alan Bovik
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
[Arxiv] [Paper (PDF)] [Code] [BibTex] @inproceedings{tu2020bband, title={Bband index: A no-reference banding artifact predictor}, author={Tu, Zhengzhong and Lin, Jessie and Wang, Yilin and Adsumilli, Balu and Bovik, Alan C}, booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages={2712--2716}, year={2020}, organization={IEEE} }

A new distortion-specific no-reference video quality model for predicting banding artifacts in compressed videos.