VCIP 2022 Program

Day 1: 13-Dec-2022:

8:00 – 12:00: Tutorials A (Coffee Break 30 minutes)


Linear Video Coding and Transmission Schemes for Next Generation Video Applications


•  Anthony Trioux (Univ. Polytechnique Hauts-de-France/INSA Hauts-de-France)

•  Michel Kieffer (Univ. Paris-Saclay)


Representation, Evaluation and Utilities of Point Clouds


•  Weisi Lin (Nanyang Technological University)


Deep Learning for Light Fields


•  Junhui Hou (City University of Hong Kong)


12:00 – 14:00: Lunch Break

14:00 – 17:30: Tutorials B (Coffee Break 30 minutes)


3D Signal Compression and Processing


•  Xianming Liu (Harbin Institute of Technology)

•  Yuanchao Bai (Harbin Institute of Technology)

•  Wenbo Zhao (Peng Cheng Laboratory)

•  Zhenyu Li (Harbin Institute of Technology)


Vision Transformer: More is different


•  Qiming Zhang (The University of Sydney)

•  Yufei Xu (The University of Sydney)

•  Jing Zhang (The University of Sydney)

•  Dacheng Tao (, Inc.)


Visual Content Creation: history, challenges and applications


•  Chenfei Wu (Microsoft Research Asia)


18:00 – 20:00: Reception


Day 2: 14-Dec-2022:

8:30 – 9:00: Opening (Room ?, TBD)

9:00 – 10:00: Keynote 1 (Room ?, TBD)

Keynote Topic: ?

Keynote Speaker:

•  Prof. Chang-Wen Chen (The Hong Kong Polytechnic University)

10:00 – 10:30: Coffee Break

10:30 – 12:00: Oral 1

Room 1: Machine Learning for Multimedia

1.      One Shot Object Detection Via Hierarchical Adaptive Alignment

2.      BAM: A Bi-directional Attention Module for Masked Face Recognition

3.      MCascade R-CNN: A Modified Cascade R-CNN for Detection of Calcified on Coronary Artery Angiography Images

4.      ACCR: Auto-labeling for Ancient Chinese Handwritten Characters Recognition on CNN

5.      Improved PSP-Net Segmentation Network for Automatic Detection of Neovascularization in Color Fundus Images

6.      Weakly Supervised Region-Level Contrastive Learning for Efficient Object Detection

7.      A Large-scale Sports Tracking Dataset and Progressive Re-detection Based Sports Tracking

8.      PickDet: A Detection Framework for Aerial-view Scene

9.      ML-FDA: Meta-Learning via Feature Distribution Alignment for Few-Shot Learning

Room 2: Learning Based Compression

1.      Learned Lossless JPEG Transcoding via Joint Lossy and Residual Compression

2.      CNN-Based Post-Processing Filter for Video Compression with Multi-Scale Feature Representation

3.      Neural Frank-Wolfe Policy Optimization for Region-of-Interest Intra-Frame Coding with HEVC/H.265

4.      A Learning-based Approach for Martian Image Compression

5.      Frequency-aware Learned Image Compression for Quality Scalability

6.      Reducing The Mismatch Between Marginal and Learned Distributions in Neural Video Compression

7.      High-frequency guided CNN for video compression artifacts reduction

8.      Autoencoder-based intra prediction with auxiliary feature

9.      On Pre-chewing Compression Degradation for Learned Video Compression


12:00 – 14:00: Lunch Break

14:00 – 18:00: Demos (Hall, TBD)

14:00 – 15:30: Oral 2

Room 1: Machine Learning for Multimedia

1.      Clothing Retrieval from Class Aware Attention Embedding to KN Loss Learning

2.      DE-CrossDet: Divisible and Extensible Crossline Representation for Object Detection

3.      Mask-Guided Transformer for Human-Object Interaction Detection

4.      ERINet: Effective Rotation Invariant Network for Point Cloud based Place Recognition

5.      CdCLR: Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition

6.      Asynchronous Autoregressive Prediction for Satellite Anomaly Detection

7.      Semantic Compensation Based Dual-Stream Feature Interaction Network for Multi-oriented Scene Text Detection

8.      Annotating Only at Definite Pixels: A Novel Weakly Supervised Semantic Segmentation Method for Sea Fog Recognition

9.      Cross-Layer Feature based Multi-Granularity Visual Classification

Room 2: Learning Based Compression

1.      End-to-end Image Compression with Swin-Transformer

2.      Rate Controllable Learned Image Compression Based on RFL Model

3.      Deep Reference Frame Interpolation based Inter Prediction Enhancement for Versatile Video Coding

4.      A new way of video compression via forward-referencing using deep learning

5.      Improving Latent Quantization of Learned Image Compression with Gradient Scaling

6.      Multi-stage locally and long-range correlated feature fusion for Learned In-loop Filter in VVC

7.      Generalized Gaussian Distribution Based Distortion Model for the H.266/VVC Video Coder

8.      History-parameter-based Affine Model Inheritance


15:30 – 16:00: Coffee Break

16:00 – 17:00: Keynote 2 (Room ?, TBD)

Keynote Topic: ?

Keynote Speaker:

•  Prof. Mihaela van der Schaar (University of Cambridge)

17:00 – 18:00: Grand Challenge

Room 1: Tire pattern image classification based on lightweight network

Grand Challenge Chair: Ying Liu (Xi’an University of Posts and Telecommunications)


17:00 - 17:10 Challenge summary, announcing winning teams

Presenter: Ying Liu (Xi’an University of Posts and Telecommunications)

17:10 - 17:20 Presentation from Winning Team 1

Presenter TBD

17:20 - 17:30 Presentation from Winning Team 2

Presenter TBD

17:30 - 17:40 Presentation from Winning Team 3

Presenter TBD

17:40 - 17:50 Presentation from Winning Team 4

Presenter TBD

17:50 - 18:00 Conclusion and taking photos

Room 2: Practical end-to-end image compression challenge

Grand Challenge Chair: Li Li (University of Science and Technology of China)


17:00 - 17:10 First Track (Coding Performance) - Ranking first team

Presenter TBD

17:10 - 17:20 First Track (Coding Performance) - Ranking second team

Presenter TBD

17:20 - 17:30 Second Track (Decoding Complexity) - Ranking first team

Presenter TBD

17:30 - 17:40 Second Track (Decoding Complexity) - Ranking second team

Presenter TBD

17:40 - 17:50 Third Track (Practical Solution) - Ranking first team

Presenter TBD

17:50 - 18:00 Third Track (Practical Solution) - Ranking second team

Presenter TBD


Day 3: 15-Dec-2022:

9:00 – 10:00: Keynote 3 (Room ?, TBD)

Keynote Topic: ?

Keynote Speaker:

•  Dr. Baining Guo (Microsoft Research)

10:00 – 10:30: Coffee Break

10:30 – 12:00: Oral 3

Room 1: Machine Learning for Multimedia

1.      On Data Annotation Efficiency for Image Based Crowd Counting

2.      Blood Volume Pulse Signal Extraction based on Spatio-Temporal Low-Rank Approximation for Heart Rate Estimation

3.      Space and Level Cooperation Framework for Pathological Cancer Grading

4.      Dual-stream Self-attention Network for Image Captioning

5.      STSI: Efficiently Mine Spatio-Temporal Semantic Information between Different Multimodal for Video Captioning

6.      Texture-aware Network for Smoke Density Estimation

7.      Identify, Guess and Reconstruct: Three Principles for Cloud Removal Task

8.      MAiVAR: Multimodal Audio-Image and Video Action Recognizer

9.      Blind Gaussian Deep Denoiser Network using Multi-Scale Pixel Attention

Room 2: Video Coding

1.      Performance Analysis of WebRTC Embedding Optimized HEVC CodeC

2.      An Efficient Content-aware Downsampling-based Video Compression Framework

3.      Fast Inter Prediction Mode Decision Method Based On Random Forest For H.266/VVC

4.      Global Homography Motion Compensation for Versatile Video Coding

5.      Adaptive boundary width of Geometric Partitioning Mode for Beyond Versatile Video Coding

6.      Enhanced motion list reordering for video coding

7.      Fast CU Partition Method Based on Extra Trees for VVC Intra Coding

8.      Efficient Interpolation Filters for Chroma Motion Compensation in Video Coding

9.      Block Importance Mapping for Video Encoding

12:00 – 14:00: Lunch Break

12:00 – 14:00: VSPC-TC Meeting (Room ?, TBD)

14:00 – 15:00: Panel 1 (Room ?, TBD)

Panel Title: Intelligent Medical Imaging

Panel Discussion Format:

1.        Each panelist will first present for 8 minutes on his/her intelligent medical imaging research work (48 minutes)

2.        Moderator will ask a few common questions for the panelists to answer (30 minutes)

3.        Open to the audience for more questions (12 minutes)

4.        Each panelist will be asked to share a one-sentence remark on intelligent medical imaging


Prof. S Kevin Zhou (University of Science & Technology of China)


Prof. Yuan Feng (Shanghai Jiao Tong University)

Prof. Qian Wang (ShanghaiTech University)

Prof. Yinghuan Shi (Nanjing University)

Prof. Xiahai Zhuang (Fudan University)

Prof. Wenxuan Liang (University of Science & Technology of China)

Prof. Dan Wu (Zhejiang University)


15:00 – 15:30: Coffee Break

15:30 – 17:00: Oral 4

Room 1: Point Cloud Compression

1.      Residual-based Near-lossless Point Cloud Geometry Compression

2.      A efficient predictive wavelet transform for LiDAR point cloud attribute compression

3.      Geometry Reconstruction for Spatial Scalability in Point Cloud Compression Based on the Prediction of Neighbours’ Weights

4.      RGBD-based Real-time Volumetric Reconstruction System: Architecture Design and Implementation

5.      PCGFormer: Lossy Point Cloud Geometry Compression via Local Self-Attention

6.      Reduced Reference Quality Assessment for Point Cloud Compression

7.      Distribution-aware Low-bit Quantization for 3D Point Cloud Networks

8.      A Fast Motion Estimation Method With Hamming Distance for LiDAR Point Cloud Compression

9.      Azimuth Adjustment Considering LiDAR Calibration for the Predictive Geometry Compression in G-PCC

Room 2: Quality of Experience

1.      Video Quality Assessment based on Quality Aggregation Networks

2.      No-reference Stereoscopic Image Quality Assessment Based on Parallel Multi-scale Perception

3.      MSCI: A Multi-source Compound Image Database for Compression Distortion Quality Assessment

4.      No Reference Stereoscopic Video Quality Assessment based on Human Vision System

5.      A Fast and Effective Framework for Camera Calibration in Sport Videos

6.      Ultra-High Resolution Image Segmentation with Efficient Multi-Scale Collective Fusion

7.      Multi-information Aggregation Network for Fundus Image Quality Assessment

8.      Semantic Attribute Guided Image Aesthetics Assessment

9.      Quality Assessment of Screen Content Images Based on Multi-Pathway Convolutional Neural Network

18:00 – 20:00: Banquet


Day 4: 16-Dec-2022:

9:00 – 10:00: Keynote 4 (Room ?, TBD)

Keynote Topic: ?

Keynote Speaker:

•  Prof. Dacheng Tao (JD Explore Academy)

10:00 – 10:30: Coffee Break

10:30 – 12:00: Oral 5

Room 1: Quality of Experience

1.      A Sparsity Analysis of Light Field Signal For Capturing Optimization of Multi-view Images

2.      Spectral Analysis of Aerial Light Field for Optimization Sampling and Rendering of Unmanned Aerial Vehicle

3.      High-Speed Scene Reconstruction from Low-Light Spike Streams

4.      MRIQA: Subjective Method and Objective Model for Magnetic Resonance Image Quality Assessment

5.      Recurrent Network with Enhanced Alignment and Attention-Guided Aggregation for Compressed Video Quality Enhancement

6.      On the Importance of Temporal Dependencies of Weight Updates in Communication Efficient Federated Learning

7.      SAD360: Spherical Viewport-Aware Dynamic Tiling for 360-Degree Video Streaming

8.      Distinguishing Computer-generated Images from Photographic Images: A Texture-Aware deep learning-based Method

9.      Flocking Birds of a Feather Together: Dual-step GAN Distillation via Realer-Fake Samples

Room 2: Low-level data processing

1.      DesnowFormer: an effective transformer-based image desnowing network

2.      A Comparative Study of Cross-Model Universal Adversarial Perturbation for Face Forgery

3.      A Privacy-Preserving and End-to-End-Based Encrypted Image Retrieval Scheme

4.      Image Inpainting with Frequency Domain Wavelet Convolution

5.      Visual Analysis motivated Super-Resolution Model for Image Reconstruction

6.      Single Image Super-Resolution Using ConvNeXt

7.      Face Super Resolution based on Contrastive Learning

8.      Refine-PU: A Graph Convolutional Point Cloud Upsampling Network using Spatial Refinement

9.      Controllable Space-Time Video Super-Resolution via Enhanced Bidirectional Flow Warping

12:00 – 14:00: Lunch Break

14:00 – 15:00: Panel 2 (Room ?, TBD)

Panel Title: Deep Learning based Image and Video Compression

Panel Discussion Format:

1.        Each panelist will first present for 6-10 minutes on the learning-based image/video compression he/she is working on (30-45 minutes)

2.        Moderator will ask a few common questions for the panelists to answer (30 minutes)

3.        Open to the audience for more questions (15 minutes)

4.        Each panelist will be asked to share a one-sentence remark on learning-based image/video compression


Prof. Siwei Ma (Peking University)


Prof. Lu Yu (Zhejiang University)

Prof. Zhan Ma (Nanjing University)

Prof. Dong Liu (University of Science and Technology of China)

Dr. Jiaying Liu (Peking University)

Dr. Yan Wang (Tsinghua University)


15:00 – 15:30: Coffee Break

15:30 – 17:00: Oral 6

Room 1: Special Session

1.      Augmented Normalizing Flow for Point Cloud Geometry Coding

2.      PointNetGeM: Simple and Efficient Point Cloud Based Network for Place Recognition

3.      SparseARFM-SI: Rotary Point Cloud Place Recognition Based on Multi-Resolution and Attention Mechanism


5.      3D Tensor Display for Non-Lambertian Content

6.      Spike Signal Reconstruction Based on Inter-Spike Similarity

7.      Low Light RAW Image Enhancement Using Paired Fast Fourier Convolution and Transformer

8.      Recurrent Multi-connection Fusion Network for Single Image Deraining


Room 2: Multimedia Content Analysis, Representation, and Understanding

1.      Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

2.      CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation

3.      Robust Dynamic Background Modeling for Foreground Estimation

4.      Mining Regional Relation from Pixel-wise Annotation for Scene Parsing

5.      ENDE-GNN: An Encoder-decoder GNN Framework for Sketch Semantic Segmentation