CMPT 983 (Spring 2022): Special Topics in Artificial Intelligence - Grounded Natural Language Understanding

Overview

This course is a graduate-level, seminar-oriented research course covering topics at the intersection of language, vision, graphics, and robotics. The class focuses on the grounding of language to various representations and modalities. Students are expected to have prior experience with deep learning concepts and frameworks (PyTorch, TensorFlow, etc.) and should be familiar with at least one of the following areas: natural language processing, vision, graphics, or robotics.

Each week, students will read papers in a particular area of language grounding and discuss the papers' contributions, limitations, and interconnections. Students will also work on a research project during the course, culminating in a final presentation and written report. The course aims to provide practical experience in comprehending, analyzing, and synthesizing research in grounded natural language understanding.

Note: This course is NOT an introductory course to natural language processing. If you are interested in learning about natural language processing, CMPT 413/713 is offered in the fall.

Background

There are no formal prerequisites for this class. However, you are expected to be familiar with the following:

For some topics that we will cover in the class, it is also helpful to be familiar with:

Topics

Quick info

Syllabus

Below is a tentative outline for the course.

R: Readings. BG: (Optional) background material or reading for deeper understanding, provided for reference. V: Video.

Date Topic Notes
Jan 10 Introduction to grounding & logistics [slides] BG: The Symbol Grounding Problem
BG: Six lessons from babies
V: How language shapes the way we think
Jan 12 How to read papers & project overview [slides] BG: How to read a paper
Jan 17 Review of basic deep learning models [slides] BG: Deep learning
BG: Contextual word representations
Jan 19 Multimodal embeddings [slides] BG: Multimodal Machine Learning
BG: Contrastive learning
Jan 24 Paper discussion 1 R: ViCo: Word Embeddings from Visual Co-occurrences
R: CLIP
Jan 26 Attention for multimodal grounding [slides] BG: Attention? Attention!
Jan 31 Paper discussion 2 R: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
R: FiLM: Visual Reasoning with a General Conditioning Layer
Feb 2 Vision-and-language pre-training with transformers [slides] BG: Attention Is All You Need
BG: The Illustrated Transformer
Feb 7 Paper discussion 3 R: ViLBERT
R: MERLOT: Multimodal Neural Script Knowledge Models
Feb 9 Text conditioned content generation [slides] BG: Generative models
BG: Text to image survey
BG: 3D generative models
Feb 14 Paper discussion 4 Project proposal due
R: Cross-Modal Contrastive Learning for Text-to-Image Generation
R: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Feb 16 Compositional grounding and structured representations [slides] BG: Linguistic generalization and compositionality in modern artificial neural networks
BG: Relational inductive biases, deep learning, and graph networks
Feb 21 No class - Reading break  
Feb 23 No class - Reading break  
Feb 28 Paper discussion 5 R: Learning to compose neural networks for question answering
R: Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Mar 2 Semantic parsing for grounding [slides] BG: Semantic parsers
BG: Language to Logical Form with Neural Attention
Mar 7 Paper discussion 6 R: Grounded Compositional Semantics For Finding And Describing Images With Sentences
R: Neural Event Semantics for Grounded Language Understanding
Mar 9 Instruction following (intro to RL) [slides] BG: Experience Grounds Language
BG: Extending machine language models toward human-level language understanding
BG: Deep RL
BG: A (Long) Peek into Reinforcement Learning
Mar 14 Paper discussion 7 Project milestone due
R: Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
R: ELLA: Exploration through Learned Language Abstraction
Mar 16 Instruction following (visual language navigation) [slides] BG: Visual language Navigation
Mar 21 Paper discussion 8 R: Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
R: REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Mar 23 Instruction following (rearrangement) [slides] BG: Rearrangement
Mar 28 Paper discussion 9 R: PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
R: A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Mar 30 Speaker-listener models [slides] BG: Rational Speech Acts
Apr 4 Paper discussion 10 R: CLIPort: What and Where Pathways for Robotic Manipulation
Apr 6 Interactive language learning [slides] BG: Power to the people
Apr 11 Project presentations and conclusion [slides] Project writeup due

Grading

General policies

Academic integrity

SFU’s Academic Integrity website explains what is meant by academic dishonesty, where you can find resources to help with your studies, and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the University community. Academic dishonesty, in whatever form, is ultimately destructive of the values of the University. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the University. Please refer to this website.