Meeting ID: 838 7213 1099
Passcode: 668523
The meeting room is open during class time. Please feel free to ask questions in the chat. I will answer by direct message or give an explanation to the whole class.

Report Submission:
UY Moodle – HDU-UY-DD Visual Information Processing

All resources are provided to class participants only. Don’t distribute them to outside parties.
Password for all resources is cvdd.

Schedule (PDF)

Slot, day Contents
01 4th, Feb 22 Introduction
Slides (PDF)
TED ~ Fei-Fei Li (17:58)
Settings – Google Colaboratory | Anaconda (PDF)
Video – Introduction of OpenCV x Colab, home.jpg for setting verification
Getting Started With Google Colab (Google-free)
How to use Google Colaboratory (YouTube)
Computer Vision – Instructional Exercise (Colab)
02 1st, Feb 23 Image preprocessing and feature extraction (Chapter 13)
Slides 13 (PDF)
Video – Chapter 13-1 (51:06)
Homework: Select one paper and post its title as a comment for Assignment 3.
03 3rd, Feb 28 Video – Chapter 13-2 (67:08)
The Pinhole camera (Chapter 14)
Slides 14 (PDF)
Video – Chapter 14 (60:04)
04 4th, Feb 28 Models for transformation (Chapter 15)
Slides 15 (PDF)
Video – Chapter 15 (48:36)
05 4th, Mar 1 Multiple cameras (Chapter 16)
Slides 16 (PDF)
Video – Chapter 16 (52:02)
06 1st, Mar 2 Assignment 1: Stereo Matching (PDF)
Image set – (35.5MB)
Video – Assignment1 (38:53, partially overlaps with Chapter 16)
07 3rd, Mar 7 Models for shape (Chapter 17)
Slides 17 (PDF)
Video – Chapter 17 (54:32)
08 4th, Mar 7 Assignment 2: Deep monocular depth estimation (PDF)
Video – Assignment2 (7:50)
09 4th, Mar 8 Assignment 3: Literature review (1) (PDF)
Video – Assignment3 (3:51)
10 1st, Mar 9 Assignment 3: Literature review (2) & Assignment 2
11 3rd, Mar 14 Assignment 3: Literature review (3)
12 4th, Mar 14 Assignment 3: Literature review (extra), Assignment 2

* 1st: 8:00-10:00, 2nd: 10:10-12:10, 3rd: 13:00-15:00, 4th: 15:10-17:10
(Add 1:00 for JST; e.g., the 1st slot starts at 9:00 JST.)

I assume you can use Google Colaboratory as the environment for python-opencv implementation, although I am well aware of the access situation in China. If it is not available to you, please use Anaconda or another environment. I hope students with a working setup will help those who have trouble.
If you can access Google tools, please follow the instructions for setting up Google Colaboratory.
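Whichever environment you choose, it helps to confirm that the required packages can be imported before starting the exercises. The following is a minimal sketch, not part of the official course material; the function name `check_environment` is hypothetical, and the only assumption is that python-opencv is imported under its usual module name `cv2`.

```python
import importlib

def check_environment(modules=("cv2", "numpy")):
    """Return {module name: version string, or None if it cannot be imported}."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back to "unknown" otherwise.
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report

if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

The same cell runs unchanged in Colab and in an Anaconda environment, so it can be used to compare setups between students.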

Literature review

To find a paper, enter its full title in Google Scholar.

Learning Graph Embeddings for Compositional Zero-shot Learning

Yan Jinzhe(燕 劲哲)
Point Cloud Upsampling via Disentangled Refinement

Zhong Weizhen(钟 维真)
Closed-Form Factorization of Latent Semantics in GANs

Chen Zhoujie(陈 洲杰)
PointGuard: Provably Robust 3D Point Cloud Classification

Gong Aiyue(龚 嫒玥)
Composing Photos Like a Photographer

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

Tan Xuan
Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation

Zhang Xinyuan
Track to Detect and Segment: An Online Multi-Object Tracker

Xi Wenlong(席 文龙)
Pose Recognition with Cascade Transformers

Wu JingTao(吴景涛)
End-to-End Video Instance Segmentation with Transformers

Zhang Xi
Transformer Interpretability Beyond Attention Visualization
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (ICCV2021)

Zhao Haifeng
Improving Sign Language Translation With Monolingual Data by Sign Back-Translation

Chen Zhe(陈 哲)
Depth from Camera Motion and Object Detection

Zhang Yixin
VirFace: Enhancing Face Recognition via Unlabeled Shallow Data

Zhou Haiqiang(周 海强)
CutPaste: Self-Supervised Learning for Anomaly Detection and Localization

Jiang Huasheng (蒋华胜)
Real-Time High-Resolution Background Matting

Zheng Haohao (郑 浩浩)
Distilling Knowledge via Knowledge Review

Bai Yizhuo(白 依卓)
D-NeRF: Neural Radiance Fields for Dynamic Scenes

Zhang Boyang
Body Meshes as Point

Su Xianhua(苏 先华)
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers


You can ask any questions on Slack #vip.
If you have trouble with implementation, first ask the teaching assistant, Prof. Yu Zhengsheng.


This course offers an opportunity to learn 2D and 3D computer vision and computer graphics. It does so by combining fundamentals such as image processing, computational geometry, machine learning, numerical computation, and linear algebra.

The first part covers methods for tracking, recognition, and other functions on 2D images. Specifically, it discusses feature detection and image descriptors, along with machine learning algorithms for image recognition and classification. OpenCV, the de facto standard library for computer vision, will help students understand and prototype these methods.
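As one illustration of feature detection, the sketch below computes a Harris-style corner response with NumPy only. It is an assumption that this particular detector is a useful example here; the helper names, the 3x3 box window, and k = 0.04 are illustrative choices rather than the course's reference implementation.

```python
import numpy as np

def box_filter3(a):
    """Sum over each 3x3 neighbourhood, with zero padding at the border."""
    padded = np.pad(a, 1)
    h, w = a.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.04):
    """Per-pixel corner response R = det(M) - k * trace(M)^2."""
    img = img.astype(np.float64)
    iy, ix = np.gradient(img)      # derivatives along rows (y) and columns (x)
    sxx = box_filter3(ix * ix)     # entries of the structure tensor M
    syy = box_filter3(iy * iy)
    sxy = box_filter3(ix * iy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Synthetic image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
r = harris_response(img)
y, x = np.unravel_index(np.argmax(r), r.shape)
print("strongest response at row", y, "column", x)
```

The response is positive near the corners of the square and negative along its straight edges, which is the property that makes corners usable as features. OpenCV provides a detector of this family as `cv2.cornerHarris`.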

The next part focuses on camera calibration for 3D depth estimation. Special camera models and models of human vision are also covered.
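To show why calibration matters for depth, the sketch below projects one 3D point into two pinhole cameras and recovers its depth from the disparity using Z = f * B / d. It assumes an ideal rectified stereo pair with identical intrinsics and no lens distortion; the focal length, principal point, baseline, and point coordinates are made-up values for illustration.

```python
import numpy as np

def project(K, points):
    """Pinhole projection of N x 3 points (camera coordinates) to N x 2 pixels."""
    homogeneous = (K @ points.T).T           # each row is (u*Z, v*Z, Z)
    return homogeneous[:, :2] / homogeneous[:, 2:3]

def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair."""
    return focal_px * baseline / disparity_px

# Hypothetical intrinsics: focal length and principal point in pixels.
focal_px, cx, cy = 800.0, 320.0, 240.0
K = np.array([[focal_px, 0.0, cx],
              [0.0, focal_px, cy],
              [0.0, 0.0, 1.0]])

baseline = 0.1                               # camera separation along x, in metres
point = np.array([[0.5, -0.2, 4.0]])         # 3D point in left-camera coordinates

left = project(K, point)
# The right camera sits at +baseline along x, so the point shifts by -baseline.
right = project(K, point - np.array([baseline, 0.0, 0.0]))

disparity = left[0, 0] - right[0, 0]         # horizontal pixel shift
depth = depth_from_disparity(focal_px, baseline, disparity)
print("disparity (px):", disparity, "recovered depth (m):", depth)
```

The recovered depth matches the Z coordinate of the original point only because f and B are known exactly here. In practice they come from calibration, so calibration errors propagate directly into the depth estimate.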

In the last part, students study and explain several state-of-the-art algorithms for computer vision and computer graphics. By the end, students should be able to research newly published computer vision and computer graphics methods on their own, implement them, and benefit from them.

Students must have a command of linear algebra and calculus, programming skills (e.g., in Python, C++, or MATLAB), and an understanding of important algorithms and data structures. They should also know the basics of image processing (e.g., image filtering) and machine learning (e.g., clustering, classifiers such as Support Vector Machines, dimensionality reduction, or regression).

Grading Policy

Technical report that involves programming (75%+5%)
Stereo matching (Assignment 1)
Deep monocular depth estimation (Assignment 2)

Literature review and presentation (15%+5%) (Assignment 3)

* No classwork component (10%) for the online class; the +5% on each item above replaces it.


Computer Vision: Models, Learning and Inference
Simon J.D. Prince – Chapters 13-20

OpenCV-Python Tutorials


OpenCV 3.x with Python By Example
Gabriel Garrido and Prateek Joshi

Learning OpenCV 3
Adrian Kaehler and Gary Bradski



Slack #vip in HDU-UY-DD-2022
Masahiro Toyoura (