VCIP 2022 Program

Day 1: 13-Dec-2022:

8:00 – 11:40: Tutorials A (Coffee Break 10 minutes)


3D Signal Compression and Processing

Session Chair: Jin Zeng (Tongji University)


·   Xianming Liu (Harbin Institute of Technology)

·   Yuanchao Bai (Harbin Institute of Technology)

·   Wenbo Zhao (Peng Cheng Laboratory)

·   Zhenyu Li (Harbin Institute of Technology)


Vision Transformer: More is different

Session Chair: Yuanfang Guo (Beihang University)


·   Qiming Zhang (The University of Sydney)

·   Yufei Xu (The University of Sydney)

·   Jing Zhang (The University of Sydney)

·   Dacheng Tao (, Inc.)


Visual Content Creation: history, challenges and applications.

Session Chair: Xinfeng Zhang (University of Chinese Academy of Sciences)


·   Chenfei Wu (Microsoft Research Asia)

·   Nan Duan (Microsoft Research Asia)


12:00 – 14:00: Lunch Break

14:00 – 17:10: Tutorials B (Coffee Break 10 minutes)


Linear Video Coding and Transmission Schemes for Next Generation Video Applications

Session Chair: Zhanyu Ma (Beijing University of Posts and Telecommunications)


·   Anthony Trioux (Univ. Polytechnique Hauts-de-France/INSA Hauts-de-France)

·   François-Xavier Coudoux (Univ. Polytechnique Hauts-de-France/INSA Hauts-de-France)

·   Marco Cagnazzo (Institut Polytechnique de Paris/University of Padua)

·   Michel Kieffer (Univ. Paris-Saclay)


Representation, Evaluation and Utilities of Point Clouds

Session Chair: Ruiping Wang (Institute of Computing Technology, Chinese Academy of Sciences)


·   Weisi Lin (Nanyang Technological University)


Deep Learning for Light Fields

Session Chair: Siheng Chen (Shanghai Jiao Tong University)


·   Junhui Hou (City University of Hong Kong)


17:15 – 17:50: Demo Session (Room 1)

Session Chair: Li Li (University of Science and Technology of China)

1.      FPX-NVC: An FPGA-Accelerated P-frame Based Neural Video Coding System

2.      Real-time Learned Image Codec on FPGA

3.      SalCrop: Spatio-temporal Saliency Based Video Cropping

4.      Intelligent Reflection Elimination Imaging Device based on Polarizer

5.      Portable Eye Movement Feature Collection Device for Children with Autism

6.      Quality-Constant Per-Shot Encoding by Two-Pass Learning-based Rate Factor Prediction

Day 2: 14-Dec-2022:

8:30 – 9:00: Opening (Room 1)

9:00 – 10:00: Keynote 1 (Room 1)

Keynote Topic: Contemporary Visual Computing: A System Perspective

Keynote Speaker:

Prof. Chang-Wen Chen (The Hong Kong Polytechnic University)

Session Chair: Jingjing Meng (University at Buffalo, SUNY)

10:00 – 10:10: Coffee Break

10:10 – 11:40: Oral 1

Room 1: Machine Learning for Multimedia

Session Chair: Mengyuan Liu (Sun Yat-sen University)

1.      One Shot Object Detection Via Hierarchical Adaptive Alignment

2.      BAM: A Bi-directional Attention Module for Masked Face Recognition

3.      MCascade R-CNN: A Modified Cascade R-CNN for Detection of Calcified on Coronary Artery Angiography Images

4.      ACCR: Auto-labeling for Ancient Chinese Handwritten Characters Recognition on CNN

5.      Improved PSP-Net Segmentation Network for Automatic Detection of Neovascularization in Color Fundus Images

6.      Weakly Supervised Region-Level Contrastive Learning for Efficient Object Detection

7.      A Large-scale Sports Tracking Dataset and Progressive Re-detection Based Sports Tracking

8.      PickDet: A Detection Framework for Aerial-view Scene

9.      ML-FDA: Meta-Learning via Feature Distribution Alignment for Few-Shot Learning

Room 2: Learning Based Compression

Session Chair: Weiqi Yan (Auckland University of Technology)

1.      Learned Lossless JPEG Transcoding via Joint Lossy and Residual Compression

2.      CNN-Based Post-Processing Filter for Video Compression with Multi-Scale Feature Representation

3.      Neural Frank-Wolfe Policy Optimization for Region-of-Interest Intra-Frame Coding with HEVC/H.265

4.      A Learning-based Approach for Martian Image Compression

5.      Frequency-aware Learned Image Compression for Quality Scalability

6.      Reducing The Mismatch Between Marginal and Learned Distributions in Neural Video Compression

7.      High-frequency guided CNN for video compression artifacts reduction

8.      Autoencoder-based intra prediction with auxiliary feature

9.      On Pre-chewing Compression Degradation for Learned Video Compression


12:00 – 14:00: Lunch Break

14:00 – 15:30: Oral 2

Room 1: Machine Learning for Multimedia

Session Chair: Yongxin Ge (Chongqing University)

1.      Clothing Retrieval from Class Aware Attention Embedding to KN Loss Learning

2.      DE-CrossDet: Divisible and Extensible Crossline Representation for Object Detection

3.      Mask-Guided Transformer for Human-Object Interaction Detection

4.      ERINet: Effective Rotation Invariant Network for Point Cloud based Place Recognition

5.      CdCLR: Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition

6.      Asynchronous Autoregressive Prediction for Satellite Anomaly Detection

7.      Semantic Compensation Based Dual-Stream Feature Interaction Network for Multi-oriented Scene Text Detection

8.      Annotating Only at Definite Pixels: A Novel Weakly Supervised Semantic Segmentation Method for Sea Fog Recognition

9.      Cross-Layer Feature based Multi-Granularity Visual Classification

Room 2: Learning Based Compression

Session Chair: Ye Luo (Tongji University)

1.      End-to-end Image Compression with Swin-Transformer

2.      Rate Controllable Learned Image Compression Based on RFL Model

3.      Deep Reference Frame Interpolation based Inter Prediction Enhancement for Versatile Video Coding

4.      Human pose-based video compression via forward-referencing using deep learning.

5.      Improving Latent Quantization of Learned Image Compression with Gradient Scaling

6.      Multi-stage locally and long-range correlated feature fusion for Learned In-loop Filter in VVC

7.      Generalized Gaussian Distribution Based Distortion Model for the H.266/VVC Video Coder

8.      History-parameter-based Affine Model Inheritance


15:30 – 16:00: Coffee Break

16:00 – 17:00: Keynote 2 (Room 1)

Keynote Topic: New frontiers in machine learning interpretability

Keynote Speaker:

Prof. Mihaela van der Schaar (University of Cambridge)

Session Chair: Mathias Wien (RWTH Aachen University)

17:00 – 18:00: Grand Challenge

Room 1: Tire pattern image classification based on lightweight network

Grand Challenge Chair: Ying Liu (Xi’an University of Posts and Telecommunications)


17:00 - 17:10 Challenge summary, announcing winning teams

Presenter: Ying Liu (Xi’an University of Posts and Telecommunications)

17:10 - 17:20 Presentation from Winning Team 1

Presenter TBD

17:20 - 17:30 Presentation from Winning Team 2

Presenter TBD

17:30 - 17:40 Presentation from Winning Team 3

Presenter TBD

17:40 - 17:50 Presentation from Winning Team 4

Presenter TBD

17:50 - 18:00 Conclusion and taking photos

Room 2: Practical end-to-end image compression challenge

Grand Challenge Chair: Li Li (University of Science and Technology of China)


17:00 - 17:10 First Track (Coding Performance) - Ranking first team

Presenter TBD

17:10 - 17:20 First Track (Coding Performance) - Ranking second team

Presenter TBD

17:20 - 17:30 Second Track (Decoding Complexity) - Ranking first team

Presenter TBD

17:30 - 17:40 Second Track (Decoding Complexity) - Ranking second team

Presenter TBD

17:40 - 17:50 Third Track (Practical Solution) - Ranking first team

Presenter TBD

17:50 - 18:00 Third Track (Practical Solution) - Ranking second team

Presenter TBD


Day 3: 15-Dec-2022:

9:00 – 10:00: Keynote 3 (Room 1)

Keynote Topic: The future of video communication

Keynote Speaker:

Dr. Baining Guo (Microsoft Research)

Session Chair: Jiwen Lu (Tsinghua University)

10:00 – 10:10: Coffee Break

10:10 – 11:40: Oral 3

Room 1: Machine Learning for Multimedia

Session Chair: Yansong Tang (Tsinghua University)

1.      On Data Annotation Efficiency for Image Based Crowd Counting

2.      Blood Volume Pulse Signal Extraction based on Spatio-Temporal Low-Rank Approximation for Heart Rate Estimation

3.      Space and Level Cooperation Framework for Pathological Cancer Grading

4.      Dual-stream Self-attention Network for Image Captioning

5.      STSI: Efficiently Mine Spatio-Temporal Semantic Information between Different Multimodal for Video Captioning

6.      Texture-aware Network for Smoke Density Estimation

7.      Identify, Guess and Reconstruct: Three Principles for Cloud Removal Task

8.      MAiVAR: Multimodal Audio-Image and Video Action Recognizer

9.      Blind Gaussian Deep Denoiser Network using Multi-Scale Pixel Attention

Room 2: Video Coding

Session Chair: Cheolkon Jung (Xidian University)

1.      Performance Analysis of WebRTC Embedding Optimized HEVC Codec

2.      An Efficient Content-aware Downsampling-based Video Compression Framework

3.      Fast Inter Prediction Mode Decision Method Based on Random Forest For H.266/VVC

4.      Global Homography Motion Compensation for Versatile Video Coding

5.      Adaptive boundary width of Geometric Partitioning Mode for Beyond Versatile Video Coding

6.      Enhanced motion list reordering for video coding

7.      Fast CU Partition Method Based on Extra Trees for VVC Intra Coding

8.      Efficient Interpolation Filters for Chroma Motion Compensation in Video Coding

9.      Block Importance Mapping for Video Encoding

12:00 – 14:00: Lunch Break

12:00 – 14:00: VSPC-TC Meeting (Room 1)

14:00 – 15:00: Panel 1 (Room 1)

Panel Title: Intelligent Medical Imaging

Panel Discussion Format:

1.        Each Panelist will first present 8 minutes on his/her intelligent medical imaging research work (48 minutes)

2.        Moderator will ask a few common questions for the panelists to answer (30 minutes)

3.        Open to the audience for more questions (12 minutes)

4.        Each panelist will be asked to share a one-sentence remark on intelligent medical imaging


Prof. S Kevin Zhou (University of Science & Technology of China)


Prof. Yuan Feng (Shanghai Jiao Tong University)

Prof. Qian Wang (ShanghaiTech University)

Prof. Yinghuan Shi (Nanjing University)

Prof. Xiahai Zhuang (Fudan University)

Prof. Wenxuan Liang (University of Science & Technology of China)

Prof. Dan Wu (Zhejiang University)


15:00 – 15:10: Coffee Break

15:10 – 16:40: Oral 4

Room 1: Point Cloud Compression

Session Chair: Zheng Zhu (PhiGent Robotics)

1.      Near-lossless Point Cloud Geometry Compression Based on Adaptive Residual Compensation

2.      A efficient predictive wavelet transform for LiDAR point cloud attribute compression

3.      Geometry Reconstruction for Spatial Scalability in Point Cloud Compression Based on the Prediction of Neighbours’ Weights

4.      RGBD-based Real-time Volumetric Reconstruction System: Architecture Design and Implementation

5.      PCGFormer: Lossy Point Cloud Geometry Compression via Local Self-Attention

6.      Reduced Reference Quality Assessment for Point Cloud Compression

7.      Distribution-aware Low-bit Quantization for 3D Point Cloud Networks

8.      A Fast Motion Estimation Method With Hamming Distance for LiDAR Point Cloud Compression

9.      Azimuth Adjustment Considering LiDAR Calibration for the Predictive Geometry Compression in G-PCC

Room 2: Quality of Experience

Session Chair: Junlin Hu (Beihang University)

1.      Video Quality Assessment based on Quality Aggregation Networks

2.      No-reference Stereoscopic Image Quality Assessment Based on Parallel Multi-scale Perception

3.      MSCI: A Multi-source Compound Image Database for Compression Distortion Quality Assessment

4.      No Reference Stereoscopic Video Quality Assessment based on Human Vision System

5.      A Fast and Effective Framework for Camera Calibration in Sport Videos

6.      Ultra-High Resolution Image Segmentation with Efficient Multi-Scale Collective Fusion

7.      Multi-information Aggregation Network for Fundus Image Quality Assessment

8.      Semantic Attribute Guided Image Aesthetics Assessment

9.      Quality Assessment of Screen Content Images Based on Multi-Pathway Convolutional Neural Network

17:00 – 17:30: Award Ceremony (Room 1)


Day 4: 16-Dec-2022:

9:00 – 10:00: Keynote 4 (Room 1)

Keynote Topic: More Is Different: ViTAE elevates the art of computer vision

Keynote Speaker:

Prof. Dacheng Tao (JD Explore Academy)

Session Chair: Zhu Li (University of Missouri-Kansas City)

10:00 – 10:10: Coffee Break

10:10 – 11:40: Oral 5

Room 1: Quality of Experience

Session Chair: Jiahuan Zhou (Peking University)

1.      A Sparsity Analysis of Light Field Signal For Capturing Optimization of Multi-view Images

2.      Spectral Analysis of Aerial Light Field for Optimization Sampling and Rendering of Unmanned Aerial Vehicle

3.      High-Speed Scene Reconstruction from Low-Light Spike Streams

4.      MRIQA: Subjective Method and Objective Model for Magnetic Resonance Image Quality Assessment

5.      Recurrent Network with Enhanced Alignment and Attention-Guided Aggregation for Compressed Video Quality Enhancement

6.      On the Importance of Temporal Dependencies of Weight Updates in Communication Efficient Federated Learning

7.      SAD360: Spherical Viewport-Aware Dynamic Tiling for 360-Degree Video Streaming

8.      Distinguishing Computer-generated Images from Photographic Images: A Texture-Aware deep learning-based Method

9.      Flocking Birds of a Feather Together: Dual-step GAN Distillation via Realer-Fake Samples

Room 2: Low-level data processing

Session Chair: Yue Zhao (Chongqing University of Posts and Telecommunications)

1.      DesnowFormer: an effective transformer-based image desnowing network

2.      A Comparative Study of Cross-Model Universal Adversarial Perturbation for Face Forgery

3.      A Privacy-Preserving and End-to-End-Based Encrypted Image Retrieval Scheme

4.      Image Inpainting with Frequency Domain Wavelet Convolution

5.      Visual Analysis motivated Super-Resolution Model for Image Reconstruction

6.      Single Image Super-Resolution Using ConvNeXt

7.      Face Super Resolution based on Contrastive Learning

8.      Refine-PU: A Graph Convolutional Point Cloud Upsampling Network using Spatial Refinement

9.      Controllable Space-Time Video Super-Resolution via Enhanced Bidirectional Flow Warping

12:00 – 14:00: Lunch Break

14:00 – 15:00: Panel 2 (Room 1)

Panel Title: Deep Learning based Image and Video Compression

Panel Discussion Format:

1.        Each Panelist will first present 6-10 minutes on the learning based image/video compression he/she is working on (30-45 minutes)

2.        Moderator will ask a few common questions for the panelists to answer (30 minutes)

3.        Open to the audience for more questions (15 minutes)

4.        Each panelist will be asked to share a one-sentence remark on learning based image/video compression


Prof. Siwei Ma (Peking University)


Prof. Lu Yu (Zhejiang University)

Prof. Zhan Ma (Nanjing University)

Prof. Dong Liu (University of Science and Technology of China)

Dr. Jiaying Liu (Peking University)

Dr. Yan Wang (Tsinghua University)


15:00 – 15:10: Coffee Break

15:10 – 16:40: Oral 6

Room 1: Special Session

Session Chair: Yueqi Duan (Tsinghua University)

1.      Augmented Normalizing Flow for Point Cloud Geometry Coding

2.      PointNetGeM: Simple and Efficient Point Cloud Based Network for Place Recognition

3.      SparseARFM-SI: Rotary Point Cloud Place Recognition Based on Multi-Resolution and Attention Mechanism

4.      Dynamic Mesh Commonality Modeling Using The Cuboidal Partitioning

5.      3D Tensor Display for Non-Lambertian Content

6.      Spike Signal Reconstruction Based on Inter-Spike Similarity

7.      Low Light RAW Image Enhancement Using Paired Fast Fourier Convolution and Transformer

8.      Recurrent Multi-connection Fusion Network for Single Image Deraining

Room 2: Multimedia Content Analysis, Representation, and Understanding

Session Chair: Jinglin Xu (University of Science and Technology Beijing)

1.      Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

2.      CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation

3.      Robust Dynamic Background Modeling for Foreground Estimation

4.      Mining Regional Relation from Pixel-wise Annotation for Scene Parsing

5.      ENDE-GNN: An Encoder-decoder GNN Framework for Sketch Semantic Segmentation

6.      Learning from the NN-based Compressed Domain with Deep Feature Reconstruction Loss