This course studies learning visual representations for common computer vision tasks including matching, retrieval, classification, and object detection. The course discusses well-known methods from low-level description to intermediate representation, and their dependence on the end task. It then studies a data-driven approach where the entire pipeline is optimized jointly in a supervised fashion, according to a task-dependent objective. Deep learning models are studied in detail and interpreted in connection with conventional models. The focus of the course is on recent, state-of-the-art methods and large-scale applications.
The course is part of the master's program Research in Computer Science (SIF) at the University of Rennes 1.
The following refers to the second iteration of the course in Nov. 2018 - Jan. 2019. Past iterations follow:
Event | Date | Room | Description | Material* |
---|---|---|---|---|
Lecture 1 | Monday Nov 19 | B12D i-58 (44) | **Introduction** Research field. Neuroscience, computer vision and machine learning background. Modern deep learning. About this course. | [slides] |
Lecture 2 | Wednesday Nov 21 | B12D i-58 (44) | **Visual representation** Global/local visual descriptors, dense/sparse representation, local feature detectors. Encoding/pooling, vocabularies, bag-of-words. VLAD*, Fisher vectors*, embeddings*. HMAX*. | [slides] |
Lecture 3 | Monday Nov 26 | B12D i-58 (44) | **Local features and spatial matching** Derivatives, scale space and scale selection. Edges, blobs, corners/junctions. Dense optical flow / sparse feature tracking. Wide-baseline matching. Geometric models, RANSAC*, Hough transform*. Fast spatial matching*. | [slides] |
Lecture 4 | Wednesday Nov 28 | B12D i-58 (44) | **Codebooks and kernels** Geometry/appearance matching. Bag-of-words. k-means clustering, hierarchical*, approximate*, vocabulary tree*. Soft assignment, max pooling. Match kernels, Hamming embedding, ASMK*. Pyramid matching, spatial pyramids. Hough pyramids*. | [slides] |
Lecture 5 | Wednesday Dec 5 | B12D i-58 (44) | **Learning** Binary classification. Perceptron, support vector machines, logistic regression. Gradient descent, regularization, loss functions, unified model. Multi-class classification. Linear regression*, basis functions. Neural networks, activation functions. | [slides] |
Lecture 6 | Monday Dec 10 | B12D i-58 (44) | **Differentiation** Stochastic gradient descent. Numerical gradient approximation. Function decomposition, chain rule, analytical gradient computation, back-propagation. Chaining, splitting and sharing. Common forward and backward flow patterns. Dynamic automatic differentiation. | [slides][pynet] |
Lecture 7 | Wednesday Dec 12 | B12D i-58 (44) | **Convolution and network architectures** Convolution, cross-correlation, linearity, equivariance, weight sharing. Feature maps, matrix multiplication, 1x1 convolution. Padded, strided, dilated convolution. Pooling and invariance. Convolutional networks: LeNet-5, AlexNet, ZFNet*, VGG, NiN*, GoogLeNet. | [slides] |
Milestone | Sunday Dec 16 | | **Oral presentation** Selection of papers to present. | [instructions] |
Lecture 8 | Monday Dec 17 | B12D i-58 (44) | **Optimization and deeper architectures** Optimizers: momentum, RMSprop, Adam, second-order*. Initialization: Gaussian matrices, unit variance, orthogonal*, data-dependent*. Normalization: input, batch, layer*, weight*. Deeper networks: residual, identity mappings*, stochastic depth*, densely connected. | [slides] |
Lecture 9 | Monday Jan 7 | B12D i-58 (44) | **Object detection** Background: Viola and Jones, DPM, ISM, ESS, object proposals, non-maximum suppression. Two-stage: R-CNN, SPP, fast/faster R-CNN, RPN. Bounding box regression. Part-based: R-FCN, spatial transformers*, deformable convolution. Upsampling*: FCN, feature pyramids. One-stage: OverFeat*, YOLO, SSD*, RetinaNet*, focal loss. | [slides] |
Lecture 10 | Wednesday Jan 9 | B12D i-58 (44) | **Retrieval** Local vs. global descriptors for visual search. Pooling from CNN representations: MAC, R-MAC, SPoC*, CroW*. Manifold learning, siamese and triplet architectures. Fine-tuning with contrastive or triplet loss, learning to rank. Graph-based methods, diffusion, unsupervised fine-tuning. | [slides] |
Evaluation 1 | Wednesday Jan 16 | B12D i-58 (44) | Written exam | |
Evaluation 2 | Monday Jan 21 | B12D i-58 (44) | Oral presentations | [instructions] |