SFU NLP class: Syllabus

Georges Artsrouni's mechanical brain, a translation device patented in 1933 in France.

Syllabus

The syllabus is preliminary and subject to change. Lecture notes are from prior years; updated notes will be posted after each lecture.

Course Introduction

Lecture notes

Readings (=optional)

I'm sorry Dave, I'm afraid I can't do that. Lillian Lee.
Talking to Computers in Natural Language. Percy Liang.
Advances in Natural Language Processing. Julia Hirschberg and Christopher D. Manning.

Links (=optional)

Speech Recognition Breakthrough for the Spoken, Translated Word. Rick Rashid.
IBM Watson competes in Jeopardy!
NLP and Text Visualization. Anoop Sarkar.
NLP Highlights podcast. Various Allen Institute AI2 Researchers.

Probability models and language (reviewed in tutorial)

Lecture notes

Links (=optional)

Python tidbits for NLP.
Natural Language Corpus Data: Beautiful Data. Peter Norvig.
A Mathematical Theory of Communication. Claude Shannon.
Prediction and Entropy of Printed English. Claude Shannon.

Language models

Text classification

Lecture notes

Readings (=optional)

Naive Bayes and Sentiment Classification (J+M chapter 4). Dan Jurafsky and James H. Martin.
Logistic Regression (J+M chapter 5). Dan Jurafsky and James H. Martin.
Log-linear models, MEMMs and CRFs. Michael Collins.

Feedforward neural networks

Lecture notes

Readings (=optional)

Neural Networks and Neural Language Models (J+M chapter 7). Dan Jurafsky and James H. Martin.
Feedforward neural networks. Michael Collins.
A Primer on Neural Network Models for Natural Language Processing. Yoav Goldberg.

Links (=optional)

A Review of the Neural History of Natural Language Processing. Sebastian Ruder.
Natural Language Understanding with Distributed Representation. Kyunghyun Cho.
Efficient Backprop. LeCun, Bottou, Orr and Muller.
What Every Computer Scientist Should Know About Floating-Point Arithmetic. D. Goldberg.
Gradient checks and parameter updates. Andrej Karpathy.
Computational Graphs, and Backpropagation. Michael Collins.

Word Vectors

Lecture notes

Readings (=optional)

Vector Semantics and Embeddings (J+M chapter 6). Dan Jurafsky and James H. Martin.
Improving Distributional Similarity with Lessons Learned from Word Embeddings. Omer Levy, Yoav Goldberg, and Ido Dagan.

Sequence Models (Hidden Markov Models)

Lecture notes

Readings (=optional)

Hidden Markov Models (J+M Appendix A.1-A.4). Dan Jurafsky and James H. Martin.

Links (=optional)

Introduction to Hidden Markov Models. Anoop Sarkar.
N-grams versus HMMs. Anoop Sarkar.
Parsing with HMMs. Anoop Sarkar.
Viterbi algorithm for HMMs. Anoop Sarkar.
Language models and HMMs. Anoop Sarkar.
Supervised learning of HMMs. Anoop Sarkar.
Lagrange multipliers for HMM parameter updates. Anoop Sarkar.
Hidden Markov Models. Michael Collins.
Forward-backward algorithm. Michael Collins.

Sequence Models (RNNs)

Lecture notes

Recurrent Neural Networks
LSTMs and GRUs
RNN based Neural Language model (experiments) (Tomas Mikolov)
Python Notebook: RNN Language model

Readings (=optional)

RNNs and LSTMs (J+M chapter 8). Dan Jurafsky and James H. Martin.
Understanding LSTM Networks. Christopher Olah.

Links (=optional)

How to implement a recurrent neural network. peterroelants.github.io.
Statistical language models based on neural networks. Tomas Mikolov.

Sequence to Sequence models and Neural Machine Translation

Lecture notes

Sequence to sequence models

Transformers

Lecture notes

Transformers

Readings (=optional)

Transformers (J+M chapter 9). Dan Jurafsky and James H. Martin.
Attention Is All You Need. Vaswani et al. 2017.
The Annotated Transformer. Alexander Rush.

Links (=optional)

Illustrated Transformer. Jay Alammar.

Contextualized Word Embeddings

Lecture notes

Contextualized Word Embeddings

Readings (=optional)

Masked Language Models (J+M chapter 11). Dan Jurafsky and James H. Martin.
Illustrated BERT. Jay Alammar.
Contextual Word Representations: A Contextual Introduction. Noah A. Smith.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Devlin et al. 2018.
Emergent linguistic structure in artificial neural networks trained by self-supervision. Manning et al. 2020.
A Primer in BERTology: What We Know About How BERT Works. Rogers, Kovaleva, and Rumshisky.

Tokenization

Lecture notes

Links (=optional)

BPE tutorial at Huggingface. Huggingface.

Pre-training Language Models

Lecture notes

Lecture notes
Improving language understanding with unsupervised learning (Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever)

Readings (=optional)

Large Language Models (J+M chapter 10). Dan Jurafsky and James H. Martin.

Links (=optional)

Semi-supervised Sequence Learning. Andrew M. Dai, Quoc V. Le.
Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer.
RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.

Decoding

Lecture notes

Lecture notes
Intro to Generation with LLMs (Huggingface)
Generation Strategies (Huggingface)
Contrastive Search (Tian Lan)
Categorical Reparameterization with Gumbel-Softmax (Eric Jang, Shixiang Gu, Ben Poole)

Links (=optional)

A Contrastive Framework for Neural Text Generation. Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

Parameter-efficient Fine Tuning

Lecture notes

Lecture notes

Readings (=optional)

HuggingFace PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware. Sourab Mangrulkar, Sayak Paul.
Prefix-Tuning: Optimizing Continuous Prompts for Generation. Xiang Lisa Li, Percy Liang.
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning. Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao.
LoRA: Low-Rank Adaptation of Large Language Models. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen.
Adapter methods. docs.adapterhub.ml.

Links (=optional)

AdapterHub: A Framework for Adapting Transformers. Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych.
Parameter-Efficient Transfer Learning for NLP. Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly.
Simple, Scalable Adaptation for Neural Machine Translation. Ankur Bapna, Naveen Arivazhagan, Orhan Firat.
AdapterFusion: Non-Destructive Task Composition for Transfer Learning. Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych.
Parameter-Efficient Tuning with Special Token Adaptation. Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen.

Few-shot and in-context learning

Lecture notes

Lecture notes

Readings (=optional)

Model Alignment, Prompting, and In-Context Learning (J+M chapter 12.1, 12.2). Dan Jurafsky and James H. Martin.
Language Models are Unsupervised Multitask Learners. OpenAI.
Language Models are Few-Shot Learners. OpenAI.
GPT-4 Technical Report. OpenAI.
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer.
In-Context Learning Learns Label Relationships but Is Not Conventional Learning. Jannik Kossen, Yarin Gal, Tom Rainforth.

Links (=optional)

In-context Examples Selection for Machine Translation. Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad.
How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation. Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla.