CS699: Representation Learning: Theory and Practice
Welcome to the course homepage of CS699: Representation Learning: Theory and Practice.
The course is designed for PhD students wishing to gain theoretical and applied
skills in Inference, Machine Learning, Information Theory, and Deep Learning.
The goal of this course is to bring students up to speed on the skills required for
publishing Machine Learning papers in top-tier venues (ICML, NIPS, KDD, CVPR, etc.).
We will cover the necessary background in Mathematics (Linear Algebra, Matrix Calculus,
Information Theory) and Programming (Numpy, Computational Graphs, TensorFlow).
The course includes a variety of interrelated theoretical topics, including Deep Learning,
Graphical Models, Variational Methods, Embeddings, and others.
We extensively apply the theoretical concepts to applications (Natural
Language Processing, Computer Vision, Graph Theory). The majority of the grade
is hands-on: specifically, students implement programs that require them to
thoroughly understand both the theory and the application domain. Our goal is to
show students that most of these applications require a similar set of theoretical
skills, which we will teach in the course.
There are no official prerequisites for the course. The unofficial requirements
are good math skills and good programming skills.
Time & Location
Class will be held Mondays and Wednesdays, 10 AM to noon, in VKC 102.
Sami's office hours are held Mondays and Wednesdays, 2:00-3:20 PM, in the basement of Leavey Library (at the open discussion tables).
Instructors
This course is delivered to you by:
Teaching Assistant
Grading
We will follow the following grading scheme:
Component | % of Grade
Assignment 1 (due Sunday, Sept 15) | 8
Assignment 2 (due Saturday, Oct 5) | 16
Assignment 3 (due Sunday, Nov 3) | 14
Assignment 4 | 14
Beyond Assignment | 14
Test [last class; multiple choice] | 14
Participation [in-class, Piazza] | 10
All items are to be completed individually, except for the
Beyond Assignment, which should be completed in groups.
Assignments
The purpose of the assignments is to give you sufficient experience in deriving
mathematical expressions for models, implementing them (in TensorFlow), and
understanding the models, e.g., through visualizations. All assignments must be
completed individually. You can ask your classmates for help, but only under the
following rules:
Assignment discussions must take place without a computer (e.g., at a whiteboard).
No one is allowed to take physical or electronic notes during the discussion. After
the discussion, students must wait an hour before returning to the assignment. The
only electronic items that may be shared are links to third-party information
(e.g., links to papers).
All assignments must be completed on Vocareum. If you have not received access,
please contact Sami.
Late Days
Each student has a total of 7 late days for the semester, which can be used across
all the assignments combined. If a student uses more than 7 late days, they will
receive a 20% deduction for every additional late day on the assignment they are
submitting.
Beyond Assignment
Students are to form groups (of two to four students each) that will extend one
of the assignments in some direction. The direction is entirely up to you, and
there are no guidelines except one: be creative. Some suggestions include trying
the same problem on a different dataset (preferably from a different domain,
e.g., NLP if the assignment was on Vision), or extending the model in some novel way
which should hopefully improve some metrics of your choice. The instructors will
give a list of ideas on each of the assignments. You can choose any of them or
come up with your own.
We will be sending more information about this during the course.
Textbook
There is no single golden textbook in Machine Learning or Representation Learning.
The reason is obvious: if one started drafting a textbook, the field would
change significantly by the time the writing was done. Nonetheless, we will use
the Deep Learning book (though we won't cover the whole thing) and we will
supplement the book with published papers and (free) online notes, as linked on
the syllabus.
In addition, we are currently developing a supplementary handout, to be completed
in class, for the Deep Learning material.
Course Outline
The course outline is a moving target. We will be changing the schedule as we go.
If you have any suggestions (e.g., topics you really like or dislike, even if
they are not listed on the syllabus), please talk to us, either in person (preferred)
or via email.
Week | Monday | Wednesday
Refresher

Aug/26


Reading: Linear Algebra

Welcome; Syllabus Overview; Course Goals; Logistics; Overview of Learning Paradigms;
Matrix Calculus; Backpropagation Algorithm;
[by Sami on whiteboard + slides 1 & 2]


Reading: DL#3;

Probability Theory and Information Theory (part 1: probability mass functions, densities, joints and conditionals; change-of-variable rule)
[by Greg on whiteboard + slides].

NumPy refresher (basics, I/O, broadcasting, shared memory, advanced indexing, boolean ops, concatenation & stacking);
Skipped: come to office hours for help!
[by Sami on whiteboard + slides].
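As a taste of the refresher topics listed above, here is a minimal NumPy sketch (the array values are illustrative, not course material):

```python
import numpy as np

# Broadcasting: a (3, 1) column combines with a (4,) row into a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row              # shape (3, 4)

# Boolean ops and advanced (integer) indexing.
evens = grid[grid % 2 == 0]        # 1-D array of the even entries
picked = grid[[2, 0]]              # rows 2 and 0, in that order (a copy)

# Shared memory: basic slices are views into the original array.
view = grid[0]
view[0] = -1                       # also changes grid[0, 0]

# Concatenation & stacking.
stacked = np.stack([row, row])     # shape (2, 4)
```

Note the view/copy distinction: basic slicing (`grid[0]`) shares memory with the original, while advanced indexing (`grid[[2, 0]]`) always copies.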

Sept/2 
Labor Day Holiday [no class]


Recommended Reading: (DL#5)

Probability Theory and Information Theory (part 2).
[by Greg; slides]

Light intro to Deep Learning. Computation Graphs. Tensors. Transformations. Decision Boundaries.
TensorFlow for gradient calculation.
[by Sami on whiteboard + slides]
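To preview what "TensorFlow for gradient calculation" automates, here is a hand-rolled NumPy sketch of backpropagation through a tiny least-squares computation graph (all names and sizes are illustrative):

```python
import numpy as np

# Tiny computation graph: loss = mean((x @ w - y)**2).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
w = rng.normal(size=(3, 1))
y = rng.normal(size=(8, 1))

pred = x @ w                      # forward pass
err = pred - y
loss = np.mean(err ** 2)

grad_pred = 2 * err / err.size    # backward pass: d loss / d pred
grad_w = x.T @ grad_pred          # chain rule through the matmul

# Sanity check against a finite-difference estimate of one coordinate.
eps = 1e-5
w2 = w.copy(); w2[0, 0] += eps
loss2 = np.mean((x @ w2 - y) ** 2)
```

TensorFlow builds this graph for us and derives the backward pass automatically; the point of the sketch is that each gradient is just the chain rule applied node by node.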

Supervised Representation Learning: P(Y|X)

Sept/9 

Reading (DL#6 except 6.1 and 6.5)

Supplement Handout

Deep Learning: First example. Intuition: what do the layers learn (NLP application).

Geometric and Bayesian Perspectives of Regularization

Geometric and Bayesian Interpretations of minimizing Cross Entropy.
[by Sami; on whiteboard]


Deep Representation Learning for Computer Vision

Euclidean Convolution (DL#9 up to 9.3);
Spatial Pooling
 Dropout

Summary of Computer Vision Tasks
[by Sami; on slides]

Sept/16 
[by Sami; slides above]


Deep Architectures:
Residual Networks, Dense Networks, U-Net, Spatial Transformer Networks,
Normalization, Fully-Convolutional Networks.

Summary of last 2 weeks; Deep Networks as Function Approximators.
[by Sami, slides]

Unsupervised (Representation) Learning: P(X)

Sept/23 

Generative Models

Probabilistic Graphical Models (PGMs)

Basic Intro to Statistical Physics.
[by Aram, slides]


PGMs (continued)

Intro to Variational Learning

Restricted Boltzmann Machines (RBM);
[by Aram, slides]

Sept/30 
 Representation learning goals

Autoencoders

Variational Autoencoders (DL#20)
[by Greg, slides]
[Logistics slides]

 Rate-Distortion, mutual information, noisy channels

Disentanglement.

Invariant Representation.
[by Greg, slides]
[Logistics slides]

Oct/7 
 Desirable Properties of Unsupervised Models
 Autoregressive Models & Density Estimation
 Concrete Implementations: WaveNet and PixelCNN
 Scheduled Sampling
[by Sami, slides]

Review of derivation of Cross Entropy. Deeper dive on the Computational Graphs of:
 Autoregressive Models (Fully-Visible Sigmoid Belief Networks)
 Variational Autoencoders.
Both for Learning and Inference.
[by Sami, on whiteboard]

Oct/14 
Deeper dive on the Computational Graphs of:
 Restricted Boltzmann Machines
 PixelCNN and WaveNet
[by Sami, on whiteboard]

Guest Speaker: Alessandro Achille
 Talk Title: Information in the Weights of Deep Neural Networks

Abstract
We introduce the notion of the information contained in the weights of a Deep Neural Network and show that it can be used to control and describe the training process of DNNs, and can explain how properties such as invariance to nuisance variability and disentanglement emerge naturally in the learned representation. Through its dynamics, stochastic gradient descent (SGD) implicitly regularizes the information in the weights, which can then be used to bound the generalization error through the PAC-Bayes bound. Moreover, the information in the weights can be used to define both a topology and an asymmetric distance in the space of tasks, which can then be used to predict the training time and the performance on a new task given a solution to a pre-training task.
While this information distance models the difficulty of transfer to first approximation, we show the existence of non-trivial irreversible dynamics during the initial transient phase of convergence, when the network is acquiring information, which makes the approximation fail. This is closely related to critical learning periods in biology, and suggests that studying the initial convergence transient can yield important insights beyond those that can be gleaned from the well-studied asymptotics.
[Superset of taught slides]

Representations for variablesized data

Oct/21 
Introduction to Embedding Learning.
 Recommender Systems & Similarity matrices.
 Closed-form Embedding Learning
 Stochastic Embedding Learning
[by Sami, slides & whiteboard]

 Language Embeddings (deriving the skip-gram objective from first principles)
Beyond Sequences and Euclidean Structures: Intro to Machine Learning on Graphs (part 1).
 Notation & overview of applications.
 Transition Matrix; Stationary Distribution; Random Walks
[by Sami, on whiteboard]
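A minimal sketch of the transition-matrix machinery listed above, assuming a toy 4-node undirected graph (the adjacency matrix is illustrative):

```python
import numpy as np

# Toy undirected graph: a triangle on nodes 0-2 plus a pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Row-normalizing the adjacency matrix gives the random-walk transition matrix.
T = A / A.sum(axis=1, keepdims=True)

# Iterating the walk from the uniform distribution converges to the
# stationary distribution; for a connected, non-bipartite undirected graph
# it is proportional to the node degrees.
pi = np.full(4, 0.25)
for _ in range(200):
    pi = pi @ T

degree_pi = A.sum(axis=1) / A.sum()   # [0.25, 0.25, 0.375, 0.125]
```

The triangle makes the walk aperiodic, so the power iteration converges; on a bipartite graph (e.g., a path) it would oscillate instead.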

Oct/28 
Language:
 Explicit matrix-form decomposition
 Context Distribution for language.
Intro to ML on Graphs (part 2):
 Semi-supervision via consistency on related entities
 Unsupervised Embedding: Matrix Factorization; Autoencoding the Adjacency Matrix; Modeling Context;
 Skip-gram on Random Walks
 Learning Context Distributions for Graphs.
[by Sami, on slides]

Sequences & Recurrent Neural Networks (RNNs)
 Simplest Form
 Applications
 Backpropagation through time
 Extensions
 Implementations
[by Sami, slides]
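The simplest RNN form above can be sketched in a few lines of NumPy (random weights and illustrative dimensions, not a trained model):

```python
import numpy as np

# Simplest RNN: h_t = tanh(W h_{t-1} + U x_t), unrolled over a short sequence.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4, 4))   # hidden-to-hidden weights
U = rng.normal(scale=0.1, size=(4, 2))   # input-to-hidden weights

xs = rng.normal(size=(5, 2))             # sequence of 5 two-dim inputs
h = np.zeros(4)                          # initial hidden state
states = []
for x in xs:
    h = np.tanh(W @ h + U @ x)           # one recurrent step
    states.append(h)

states = np.stack(states)                # shape (5, 4)
```

Backpropagation through time is ordinary backpropagation applied to this unrolled loop, with gradients flowing through the shared `W` at every step.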

Nov/4 
Graph Convolution and Graph Neural Networks.
 Graph Neural Networks
 Graph Convolutional Networks (GCNs)
 Graph Pooling
 Higher-Order Graph Convolution
 Mapping nodes onto axes
 Via heuristics
 Spectral Fourier Basis
[by Sami, on slides and whiteboard]

Application: Computational Graphs for Implementing Language/Graph Embeddings and Graph Convolutional Networks
 We can collect votes on which ones we should whiteboard.
[by Sami, on whiteboard]

Misc Topics

Nov/11 
Guest Speaker: Irving Biederman (Neuroscience Professor @ USC)
 Talk Title: Shape-Based Object Recognition in Brains and Machines.
 [Slides]

Guest Speaker: Bryan Perozzi (Senior Research Scientist @ Google AI)

Nov/18 
Student Presentations for Beyond Assignments (1 hour)
 Di, Kexuan, Yuzhong, Zihao: Embedding with Context Attention.
 Karl Pertsch, Ayush Jain, Youngwoon Lee: Prediction Beyond Pixel Reconstruction.
 Sarath Shekkizhar: Laplacian regularized Auto Encoders.
 Iordanis: Towards understanding discrete representations.
Lecture by Greg [slides]
 Normalizing Flows (part 1)

Student Presentations for Beyond Assignments (1 hour)
 Elan, Keshav, Mehrnoosh: Random Walk as a Differentiable Operator
 Chi Zhang, Chenyu, Kiran Lekkala: Inductive Bias for Video Prediction in Atari Games.
 Jiaman, Zhengfei, Yuliang, Pengda: How to generate 3D meshes in different complex topology.
 Avijit: Disentangling Knowledge and Instruction Representations.
 Yilei, Meryem, Yunhao: Multimodal Representation Learning On Social Media.
Lecture by Greg [slides]
 Normalizing Flows (part 2)

Nov/25 
Student Presentations for Beyond Assignments (30 minutes)
 Hamidreza, Omid, Brendan: Uncovering geographical latent factors from speech transcripts
 Nazanin, Pegah: Disentangled Graph VAEs
 TG, Mozhdeh, Ninareh, Myrl: Invertible Functions
Lecture By Greg [slides]
 Mutual information estimation with neural nets (GV)

Thanksgiving Holiday. No class

Dec/2 
Student Presentations
Lightning Talks on:
 Meta-Learning
 Neural Architecture Search
 Practical: training embeddings using TensorFlow
[by Sami, slides]

In-class test (multiple choice) on Misc topics; Farewell; Advice for the future.

The End
