Description

This course studies learning visual representations for common computer vision tasks, including matching, retrieval, classification, and object detection. It discusses well-known methods, from low-level description to intermediate representation, and their dependence on the end task. It then studies a data-driven approach in which the entire pipeline is optimized jointly in a supervised fashion, according to a task-dependent objective. Deep learning models are studied in detail and interpreted in connection with conventional models. The focus of the course is on recent, state-of-the-art methods and large-scale applications.

The course is part of the master's program Research in Computer Science (SIF) at the University of Rennes 1.

The following refers to the second iteration of the course, from Nov. 2018 to Jan. 2019. Past iterations:

2017-18

Instructor

Yannis Avrithis

Discussions

Piazza

Class

Monday and Wednesday
16:15 - 18:15

Evaluation

Oral presentations: 50%
Written exam: 50%

Planning and Syllabus

Lecture 1: Monday, Nov 19, room B12D i-58 (44)
Introduction
Research field. Neuroscience, computer vision and machine learning background. Modern deep learning. About this course.
[slides]

Lecture 2: Wednesday, Nov 21, room B12D i-58 (44)
Visual representation
Global/local visual descriptors, dense/sparse representation, local feature detectors. Encoding/pooling, vocabularies, bag-of-words. VLAD*, Fisher vectors*, embeddings*. HMAX*.
[slides]

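As a rough illustration of the encoding/pooling ideas in Lecture 2, here is a minimal NumPy sketch of VLAD aggregation. It is not course material; the function name and the signed-square-root normalization are choices of this sketch, and the codebook is assumed given.

    import numpy as np

    def vlad(descriptors, codebook):
        # descriptors: (n, d) local descriptors; codebook: (k, d) visual words
        dists = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
        assign = dists.argmin(axis=1)             # nearest visual word per descriptor
        v = np.zeros(codebook.shape)
        for i, a in enumerate(assign):
            v[a] += descriptors[i] - codebook[a]  # accumulate residuals per cluster
        v = np.sign(v) * np.sqrt(np.abs(v))       # signed square root (power law)
        return v.ravel() / (np.linalg.norm(v) + 1e-12)
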
Lecture 3: Monday, Nov 26, room B12D i-58 (44)
Local features and spatial matching
Derivatives, scale space and scale selection. Edges, blobs, corners/junctions. Dense optical flow / sparse feature tracking. Wide-baseline matching. Geometric models, RANSAC*, Hough transform*. Fast spatial matching*.
[slides]

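To make the robust geometric fitting of Lecture 3 concrete, a minimal RANSAC loop for 2D line fitting; an illustrative sketch, not the course's code, with arbitrary threshold and iteration count.

    import numpy as np

    def ransac_line(points, iters=100, thresh=1.0, seed=None):
        # points: (n, 2); fits a line a*x + b*y + c = 0 robustly
        rng = np.random.default_rng(seed)
        best_line, best_inliers = None, None
        for _ in range(iters):
            # minimal sample: two points define a line hypothesis
            p, q = points[rng.choice(len(points), size=2, replace=False)]
            a, b = q[1] - p[1], p[0] - q[0]       # normal of the segment pq
            c = -(a * p[0] + b * p[1])
            norm = np.hypot(a, b)
            if norm == 0:                         # degenerate sample, resample
                continue
            # consensus: count points within thresh of the hypothesized line
            dist = np.abs(points @ np.array([a, b]) + c) / norm
            inliers = dist < thresh
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_line, best_inliers = (a / norm, b / norm, c / norm), inliers
        return best_line, best_inliers
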
Lecture 4: Wednesday, Nov 28, room B12D i-58 (44)
Codebooks and kernels
Geometry/appearance matching. Bag-of-words. k-means clustering, hierarchical*, approximate*, vocabulary tree*. Soft assignment, max pooling. Match kernels, Hamming embedding, ASMK*. Pyramid matching, spatial pyramids. Hough pyramids*.
[slides]

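Since codebooks are typically built with k-means, here is a plain Lloyd's-algorithm sketch in NumPy; illustrative only, as real vocabularies at scale use the hierarchical or approximate variants listed above.

    import numpy as np

    def kmeans(x, k, iters=20, seed=None):
        # x: (n, d) descriptors; returns a (k, d) codebook (Lloyd's algorithm)
        rng = np.random.default_rng(seed)
        centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
        for _ in range(iters):
            # assignment step: nearest center for every point
            assign = np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)
            # update step: move each center to the mean of its assigned points
            for j in range(k):
                if np.any(assign == j):
                    centers[j] = x[assign == j].mean(axis=0)
        return centers
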
Lecture 5: Wednesday, Dec 5, room B12D i-58 (44)
Learning
Binary classification. Perceptron, support vector machines, logistic regression. Gradient descent, regularization, loss functions, unified model. Multi-class classification. Linear regression*, basis functions. Neural networks, activation functions.
[slides]

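As a pocket example of the gradient-descent view of linear classifiers in Lecture 5, binary logistic regression trained by full-batch gradient descent; names and hyperparameters are arbitrary choices of this sketch.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic(x, y, lr=0.1, epochs=200, lam=1e-3):
        # x: (n, d) inputs; y: (n,) labels in {0, 1}; lam: L2 regularization
        w, b = np.zeros(x.shape[1]), 0.0
        for _ in range(epochs):
            p = sigmoid(x @ w + b)   # predicted probability of class 1
            g = p - y                # gradient of cross-entropy wrt the logit
            w -= lr * (x.T @ g / len(y) + lam * w)
            b -= lr * g.mean()
        return w, b
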
Lecture 6: Monday, Dec 10, room B12D i-58 (44)
Differentiation
Stochastic gradient descent. Numerical gradient approximation. Function decomposition, chain rule, analytical gradient computation, back-propagation. Chaining, splitting and sharing. Common forward and backward flow patterns. Dynamic automatic differentiation.
[slides][pynet]

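In the spirit of the back-propagation material (the linked [pynet] code may differ), a toy scalar reverse-mode autodiff: record local derivatives on the forward pass, then replay the chain rule in reverse.

    class Var:
        # toy scalar reverse-mode autodiff; naive recursion, fine for small graphs
        def __init__(self, value, parents=()):
            self.value, self.parents, self.grad = value, parents, 0.0

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def backward(self, grad=1.0):
            # chain rule: accumulate, then push grad * local derivative to parents
            self.grad += grad
            for parent, local in self.parents:
                parent.backward(grad * local)

    # f(x, y) = x * y + x  =>  df/dx = y + 1 = 4, df/dy = x = 2
    x, y = Var(2.0), Var(3.0)
    f = x * y + x
    f.backward()
    print(x.grad, y.grad)  # 4.0 2.0
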
Lecture 7: Wednesday, Dec 12, room B12D i-58 (44)
Convolution and network architectures
Convolution, cross-correlation, linearity, equivariance, weight sharing. Feature maps, matrix multiplication, 1x1 convolution. Padded, strided, dilated convolution. Pooling and invariance. Convolutional networks: LeNet-5, AlexNet, ZFNet*, VGG, NiN*, GoogLeNet.
[slides]

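For reference on the convolution/cross-correlation distinction in Lecture 7: what deep learning frameworks call "convolution" is cross-correlation. A naive, unvectorized single-channel sketch with stride and padding:

    import numpy as np

    def conv2d(x, w, stride=1, pad=0):
        # x: (H, W) single-channel input; w: (kh, kw) kernel
        # note: no kernel flip, so this is cross-correlation
        x = np.pad(x, pad)
        kh, kw = w.shape
        out_h = (x.shape[0] - kh) // stride + 1
        out_w = (x.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[i, j] = (patch * w).sum()  # dot product with the kernel
        return out
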
Milestone: Sunday, Dec 16
Oral presentation
Selection of papers to present.
[instructions]

Lecture 8: Monday, Dec 17, room B12D i-58 (44)
Optimization and deeper architectures
Optimizers: momentum, RMSprop, Adam, second-order*. Initialization: Gaussian matrices, unit variance, orthogonal*, data-dependent*. Normalization: input, batch, layer*, weight*. Deeper networks: residual, identity mappings*, stochastic depth*, densely connected.
[slides]

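As a concrete example of the optimizers listed in Lecture 8, one Adam update step with bias correction; a sketch following the published algorithm, with its usual default hyperparameters.

    import numpy as np

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # w: parameters; g: gradient; m, v: running moments; t: step (1-based)
        m = b1 * m + (1 - b1) * g      # first moment: smoothed gradient
        v = b2 * v + (1 - b2) * g * g  # second moment: smoothed squared gradient
        m_hat = m / (1 - b1 ** t)      # bias correction for zero initialization
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v
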
Lecture 9: Monday, Jan 7, room B12D i-58 (44)
Object detection
Background: Viola and Jones, DPM, ISM, ESS, object proposals, non-maximum suppression. Two-stage: R-CNN, SPP, fast/faster R-CNN, RPN. Bounding box regression. Part-based: R-FCN, spatial transformers*, deformable convolution. Upsampling*: FCN, feature pyramids. One-stage: OverFeat*, YOLO, SSD*, RetinaNet*, focal loss.
[slides]

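Non-maximum suppression, used by essentially all the detectors above, in a minimal greedy form; the box format and IoU threshold are assumptions of this sketch.

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.5):
        # boxes: (n, 4) as [x1, y1, x2, y2]; returns kept indices, best first
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        order = scores.argsort()[::-1]
        keep = []
        while len(order) > 0:
            i = order[0]
            keep.append(i)
            rest = order[1:]
            # intersection of the kept box with all remaining candidates
            x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
            iou = inter / (areas[i] + areas[rest] - inter)
            # suppress candidates that overlap the kept box too much
            order = rest[iou <= iou_thresh]
        return keep
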
Lecture 10: Wednesday, Jan 9, room B12D i-58 (44)
Retrieval
Local vs. global descriptors for visual search. Pooling from CNN representations: MAC, R-MAC, SPoC*, CroW*. Manifold learning, siamese and triplet architectures. Fine-tuning with contrastive or triplet loss, learning to rank. Graph-based methods, diffusion, unsupervised fine-tuning.
[slides]

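To illustrate the CNN pooling used for retrieval in Lecture 10, a MAC descriptor (global max-pooling of a convolutional feature map) followed by cosine-similarity ranking; the (H, W, C) layout is an assumption of this sketch.

    import numpy as np

    def mac(feature_map):
        # feature_map: (H, W, C) CNN activations -> (C,) MAC descriptor
        v = feature_map.max(axis=(0, 1))        # global max pooling per channel
        return v / (np.linalg.norm(v) + 1e-12)  # L2-normalize for cosine search

    def retrieve(query, database):
        # database: (n, C) L2-normalized descriptors; rank by cosine similarity
        return np.argsort(database @ query)[::-1]
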
Evaluation 1: Wednesday, Jan 16, room B12D i-58 (44)
Written exam

Evaluation 2: Monday, Jan 21, room B12D i-58 (44)
Oral presentations
[instructions]

*All material licensed CC BY-SA 4.0

Prerequisites

Basic knowledge of Linear Algebra, Calculus, Probability, Machine Learning, Signal Processing, and Python.

Related courses

Computer vision

Computer Vision, Cornell University (Huttenlocher), 2008.
Computer Vision, University of Washington (Seitz and Szeliski), 2008.
Object Recognition and Scene Understanding, MIT (Torralba), 2008.
Computer Vision, University of Texas (Grauman), 2009.
Computer Vision, New York University (Fergus), 2018.
Computer Vision, University of Illinois (Lazebnik), 2018.
Advances in Computer Vision, MIT (Freeman, Torralba and Isola), 2018.

Deep learning

Convolutional Neural Networks for Visual Recognition, Stanford University (Li), 2018.
Deep Learning: Do-It-Yourself! Hands-on tour to deep learning, ENS Paris (Lelarge et al.), 2018.
Deep Learning, IDIAP (Fleuret), 2018.
Introduction to Deep Learning, University of Illinois (Lazebnik), 2018.
Deep Learning, University of Amsterdam (Gavves), 2018.
Deep Learning, Université Paris-Saclay (Grisel and Ollion), 2018.
Deep Learning, New York University (LeCun), 2017.
Deep Learning in Computer Vision, University of Toronto (Fidler), 2016.
Topics Course on Deep Learning, UC Berkeley (Bruna), 2016.