CS699 - Representation Learning: Theory and Practice

Welcome to the course homepage of CS699 - Representation Learning: Theory and Practice.

The course is designed for PhD students wishing to gain theoretical and applied skills in Inference, Machine Learning, Information Theory, and Deep Learning. The goal of this course is to bring students up to speed on the skills required for publishing Machine Learning papers in top-tier venues (ICML, NIPS, KDD, CVPR, etc.). We will cover the necessary background in Mathematics (Linear Algebra, Matrix Calculus, Information Theory) and Programming (NumPy, Computational Graphs, TensorFlow). The course includes a variety of inter-related theoretical topics, including Deep Learning, Graphical Models, Variational Methods, Embeddings, and others. We extensively apply the theoretical concepts to applications (Natural Language Processing, Computer Vision, Graph Theory). The majority of the grade is hands-on: specifically, you will implement programs that require a thorough understanding of both the theory and the application domain. Our goal is to show students that most applications require a similar set of theoretical skills, which we will teach in the course.

There are no official prerequisites for the course. The unofficial requirements are good math skills and good programming skills.

Time & Location

Class will be held Mondays and Wednesdays, 10AM-noon, at VKC 102.

Sami's office hours are held Mondays and Wednesdays, 2-3:20 PM, at Basement of Leavey Library (at the open discussion tables).


This course is delivered to you by:

Aram Galstyan
Greg Ver Steeg
Sami Abu-El-Haija

Teaching Assistant

Kyle Reing


We will use the following grading scheme:

Component                             % of Grade
Assignment 1 (due Sunday Sept 15)     8
Assignment 2 (due Saturday Oct 5)     16
Assignment 3 (due Sunday Nov 3)       14
Assignment 4                          14
Beyond Assignment                     14
Test [last class; multiple choice]    14
Participation [in-class, Piazza]      10
All items are to be completed individually, except for the Beyond Assignment, which should be completed in groups.


The purpose of the assignments is to give you sufficient experience in deriving mathematical expressions for models, implementing them (in TensorFlow), and understanding the models, e.g. through visualizations. All assignments must be completed individually. You can ask your classmates for help, but under the following rules:
  • Assignment discussions must be held without a computer (e.g. at a whiteboard).
  • No one is allowed to take physical or electronic notes during the discussion.
  • After the discussion, students must wait an hour before returning to the assignment.
  • The only electronic material that may be shared is links to third-party information (e.g. paper links).

All assignments must be completed on Vocareum. If you have not received access, please contact Sami.

Late Days

Each student has a total of 7 late days for the semester, which can be used across all assignments combined. A student who uses more than 7 late days will receive a 20% deduction, for every additional late day, on the assignment they are submitting.

Beyond Assignment

Students are to form groups (of 2 to 4 students each) that will extend one of the assignments in some direction. The direction is completely open and there are no guidelines except for one: be creative. Some suggestions: try the same problem on a different dataset (preferably from a different domain, e.g. NLP if the assignment was on Vision), or extend the model in some novel way that hopefully improves some metric of your choice. The instructors will give a list of ideas for each of the assignments. You can choose any of them or come up with your own. We will send more information about this during the course.


There is no one golden textbook in Machine Learning or Representation Learning. The reason is obvious: if one started drafting a textbook, the field would change significantly by the time the writing was done. Nonetheless, we will use the Deep Learning book (though we won't cover the whole thing) and we will supplement it with published papers and (free) online notes, as linked in the syllabus.
In addition, we are currently developing a supplementary handout, to be completed in class, for the Deep Learning material.

Course Outline

The course outline is a moving target. We will be adjusting the schedule as we go. If you have any suggestions (e.g. topics you really like or do not like, even if they are not listed on the syllabus), please talk to us, either in person (preferred) or via email.

  • Reading: Linear Algebra
  • Welcome; Syllabus Overview; Course Goals; Logistics; Overview of Learning Paradigms; Matrix Calculus; Backpropagation Algorithm;
    [by Sami on whiteboard + slides 1 & 2]
  • Reading: DL#3;
  • Probability Theory and Information Theory (part 1: probability mass functions, densities, joints and conditionals. Change-of-variable rule)
    [by Greg on whiteboard + slides].
  • NumPy refresher (basics, I/O, broadcasting, shared memory, advanced indexing, boolean ops, concatenation & stacking); skipped in class: come to office hours for help!
    [by Sami on whiteboard + slides].
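Several of the NumPy topics listed above (broadcasting, views vs. copies under advanced indexing, boolean ops, stacking) can be sketched in a few lines. This is an illustrative snippet, not course-provided code:

```python
import numpy as np

# Broadcasting: a (3, 1) column combines with a (4,) row to give a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row            # shape (3, 4)

# Basic slicing returns a view that shares memory with the original;
# advanced (integer) indexing returns an independent copy.
a = np.zeros(5)
view = a[1:4]
view[:] = 7.0                    # mutates `a` through the shared memory
copy = a[[0, 2, 4]]
copy[:] = -1.0                   # does NOT touch `a`

# Boolean ops select elements satisfying a condition.
b = np.arange(6)
evens = b[b % 2 == 0]

# Concatenation & stacking.
stacked = np.stack([row, row])       # new axis: shape (2, 4)
concat = np.concatenate([row, row])  # same axis: shape (8,)
```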
Sept/2 Labor Day Holiday [no class]
  • Recommended Reading: (DL#5)
  • Probability Theory and Information Theory (part 2).

    [by Greg; slides]
  • Light intro to Deep Learning. Computation Graphs. Tensors. Transformations. Decision Boundaries. TensorFlow for gradient calculation.
    [by Sami on whiteboard + slides]
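The gradient-on-a-computation-graph idea from this lecture can be sketched by hand in plain NumPy (an illustrative example; TensorFlow automates exactly these reverse-order chain-rule steps):

```python
import numpy as np

# Tiny computation graph: loss = sum((W @ x - y)**2).
W = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
y = np.array([0.0, 1.0])

# Forward pass: each line is one node in the graph.
z = W @ x              # matrix-vector product
r = z - y              # residual
loss = np.sum(r ** 2)  # scalar loss

# Backward pass: apply the chain rule to the nodes in reverse order.
d_r = 2.0 * r           # d loss / d r
d_z = d_r               # d r / d z is the identity
d_W = np.outer(d_z, x)  # d loss / d W

# Sanity-check one gradient entry with a finite difference.
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
numeric = (np.sum((W_pert @ x - y) ** 2) - loss) / eps
```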

Supervised Representation Learning: P(Y|X)
  • Reading (DL#6 except 6.1 and 6.5)
  • Supplement Handout
  • Deep Learning: First example. Intuition: what do the layers learn (NLP application).
  • Geometric and Bayesian Perspectives of Regularization
  • Geometric and Bayesian Interpretations of minimizing Cross Entropy.
[by Sami; on whiteboard]
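As a small illustration of the cross-entropy material above: for one-hot labels, the cross entropy between the label and a softmax output equals the negative log-likelihood of the true class, which is what connects minimizing cross entropy to maximum-likelihood estimation. A hypothetical minimal sketch:

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])
label = 0  # index of the true class

# Numerically stable softmax.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Cross entropy against a one-hot label...
one_hot = np.zeros_like(logits)
one_hot[label] = 1.0
cross_entropy = -np.sum(one_hot * np.log(probs))

# ...equals the negative log-likelihood of the true class.
nll = -np.log(probs[label])
```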
  • Deep Representation Learning for Computer Vision
  • Euclidean Convolution (DL#9 up to 9.3); Spatial Pooling
  • Dropout
  • Summary of Computer Vision Tasks

[by Sami; on slides]
Sept/16 [by Sami; slides above]
  • Deep Architectures: Residual Networks, Dense Networks, U-Net, Spatial Transformer Networks, Normalization, Fully-Convolutional Networks.
  • Summary of last 2 weeks; Deep Networks as Function Approximators.

[by Sami, slides]
Unsupervised (Representation) Learning: P(X)
  • Generative Models
  • Probabilistic Graphical Models (PGMs)
  • Basic Intro to Statistical Physics.

[by Aram, slides]
  • PGMs (continued)
  • Intro to Variational Learning
  • Restricted Boltzmann Machines (RBM);

[by Aram, slides]
  • Representation learning goals
  • Autoencoders
  • Variational Autoencoders (DL#20)
[by Greg, slides]
[Logistics slides]
  • Rate-Distortion, mutual information, noisy channels
  • Disentanglement.
  • Invariant Representation.
[by Greg, slides]
[Logistics slides]
  • Desirable Properties of Unsupervised Models
  • Autoregressive Models & Density Estimation
  • Concrete Implementations: WaveNet and PixelCNN
  • Scheduled Sampling

[by Sami, slides]
Review of the derivation of Cross Entropy. Deeper dive into the Computational Graphs of:
  • Autoregressive Models (Fully-Visible Sigmoid Belief Networks)
  • Variational Autoencoders.
Both for Learning and Inference.
[by Sami, on whiteboard]
Oct/14 Deeper dive into the Computational Graphs of:
  • Restricted Boltzmann Machines
  • PixelCNN and WaveNet

[by Sami, on whiteboard]
Guest Speaker: Alessandro Achille
  • Talk Title: Information in the Weights of Deep Neural Networks
  • Abstract: We introduce the notion of information contained in the weights of a Deep Neural Network and show that it can be used to control and describe the training process of DNNs, and can explain how properties, such as invariance to nuisance variability and disentanglement, emerge naturally in the learned representation. Through its dynamics, stochastic gradient descent (SGD) implicitly regularizes the information in the weights, which can then be used to bound the generalization error through the PAC-Bayes bound. Moreover, the information in the weights can be used to define both a topology and an asymmetric distance in the space of tasks, which can then be used to predict the training time and the performance on a new task given a solution to a pre-training task.

    While this information distance models difficulty of transfer to a first approximation, we show the existence of non-trivial irreversible dynamics during the initial transient phase of convergence, when the network is acquiring information, which makes the approximation fail. This is closely related to critical learning periods in biology, and suggests that studying the initial convergence transient can yield important insights beyond those that can be gleaned from the well-studied asymptotics.
[Superset of taught slides]
Representations for variable-sized data
Oct/21 Introduction to Embedding Learning.
  • Recommender Systems & Similarity matrices.
  • Closed-form Embedding Learning
  • Stochastic Embedding Learning
[by Sami, slides & whiteboard]
  • Language Embeddings (deriving skipgram objective from first principles)
Beyond Sequences and Euclidean Structures: Intro to Machine Learning on Graphs (part 1).
  • Notation & overview of applications.
  • Transition Matrix; Stationary Distribution; Random Walks
[by Sami, on whiteboard]
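The transition-matrix and stationary-distribution notions above can be illustrated on a tiny graph: row-normalizing the adjacency matrix gives the random-walk transition matrix, and iterating the walk converges to the degree-proportional stationary distribution (for a connected, non-bipartite undirected graph). A hypothetical sketch:

```python
import numpy as np

# Tiny undirected graph: a triangle (0-1-2) plus a pendant edge (2-3).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

# Row-normalize the adjacency matrix to get the transition matrix.
T = A / A.sum(axis=1, keepdims=True)

# Iterate the random walk from the uniform distribution; it converges
# because the graph is connected and non-bipartite.
pi = np.ones(4) / 4.0
for _ in range(1000):
    pi = pi @ T

# For undirected graphs the stationary distribution is degree / total degree.
expected = A.sum(axis=1) / A.sum()
```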
Oct/28 Language:
  • Explicit Matrix-form decomposition
  • Context Distribution for language.
Intro to ML on Graphs (part 2):
  • Semi-supervision via consistency on related entities
  • Unsupervised Embedding: Matrix Factorization; Auto-encoding the Adjacency Matrix; Modeling Context;
  • Skipgram on Random Walks
  • Learning Context Distributions for Graphs.
[by Sami, on slides]
Sequences & Recurrent Neural Networks (RNNs)
  • Simplest Form
  • Applications
  • Backpropagation through time
  • Extensions
  • Implementations
[by Sami, slides]
Nov/4 Graph Convolution and Graph Neural Networks.
  • Graph Neural Networks
  • Graph Convolutional Networks (GCNs)
  • Graph Pooling
  • Higher-Order Graph Convolution
  • Mapping nodes onto axes
    • Via heuristics
    • Spectral Fourier Basis
[by Sami, on slides and whiteboard]
Application: Computational Graphs for Implementing Language/Graph Embeddings and Graph Convolutional Networks
  • We can collect votes on which ones we should whiteboard.
[by Sami, on whiteboard]
Misc Topics
Nov/11 Guest Speaker: Irving Biederman (Neuroscience Professor @ USC)
  • Talk Title: Shape-Based Object Recognition in Brains and Machines.
  • [Slides]
Guest Speaker: Bryan Perozzi (Senior Research Scientist @ Google AI)
Nov/18 Student Presentations for Beyond Assignments (1 hour)
  • Di, Kexuan, Yuzhong, Zihao: Embedding with Context Attention.
  • Karl Pertsch, Ayush Jain, Youngwoon Lee: Prediction Beyond Pixel Reconstruction.
  • Sarath Shekkizhar: Laplacian regularized Auto Encoders.
  • Iordanis: Towards understanding discrete representations.
Lecture by Greg [slides]
  • Normalizing Flows (part 1)
Student Presentations for Beyond Assignments (1 hour)
  • Elan, Keshav, Mehrnoosh: Random Walk as a Differentiable Operator
  • Chi Zhang, Chen-yu, Kiran Lekkala: Inductive Bias for Video Prediction in Atari Games.
  • Jiaman, Zhengfei, Yuliang, Pengda: How to generate 3D meshes in different complex topology.
  • Avijit: Disentangling Knowledge and Instruction Representations.
  • Yilei, Meryem, Yunhao: Multimodal Representation Learning On Social Media.
Lecture by Greg [slides]
  • Normalizing Flows (part 2)
Nov/25 Student Presentations for Beyond Assignments (30 minutes)
  • Hamidreza, Omid, Brendan: Uncovering geographical latent factors from speech transcripts
  • Nazanin, Pegah: Disentangled Graph VAEs
  • TG, Mozhdeh, Ninareh, Myrl: Invertible Functions
Lecture By Greg [slides]
  • Mutual information estimation with neural nets (GV)

Thanksgiving Holiday. No class
    Student Presentations
    Lightning Talks on:
  • Meta-Learning
  • Neural Architecture Search
  • Practical: training embeddings using TensorFlow
[by Sami, slides]
In class test (multiple choice) on Misc topics; Farewell; Advice for the future.
The End