Syllabus
This syllabus is preliminary and subject to change. Lecture notes are from prior years; updated notes will be posted after each lecture.
Lecture notes
Readings (=optional)
Links (=optional)
-
Semi-supervised Sequence Learning.
Andrew M. Dai, Quoc V. Le.
-
Deep contextualized word representations.
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer.
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
-
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context.
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
-
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
-
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
-
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
Lecture notes
Readings (=optional)
-
HuggingFace PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware.
Sourab Mangrulkar, Sayak Paul.
-
Prefix-Tuning: Optimizing Continuous Prompts for Generation.
Xiang Lisa Li, Percy Liang.
-
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.
Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao.
-
LoRA: Low-Rank Adaptation of Large Language Models.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen.
-
Adapter methods.
docs.adapterhub.ml.
Links (=optional)
-
AdapterHub: A Framework for Adapting Transformers.
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych.
-
Parameter-Efficient Transfer Learning for NLP.
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly.
-
Simple, Scalable Adaptation for Neural Machine Translation.
Ankur Bapna, Naveen Arivazhagan, Orhan Firat.
-
AdapterFusion: Non-Destructive Task Composition for Transfer Learning.
Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych.
-
Parameter-Efficient Tuning with Special Token Adaptation.
Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen.
Lecture notes
Readings (=optional)
-
Model Alignment, Prompting, and In-Context Learning (J+M chapter 12.1, 12.2).
Dan Jurafsky and James H. Martin.
-
Language Models are Unsupervised Multitask Learners.
OpenAI.
-
Language Models are Few-Shot Learners.
OpenAI.
-
GPT-4 Technical Report.
OpenAI.
-
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer.
-
In-Context Learning Learns Label Relationships but Is Not Conventional Learning.
Jannik Kossen, Yarin Gal, Tom Rainforth.
Links (=optional)
-
In-context Examples Selection for Machine Translation.
Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad.
-
How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation.
Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla.
Lecture notes
Links (=optional)
-
Model Alignment, Prompting, and In-Context Learning (J+M chapter 12.3).
Dan Jurafsky and James H. Martin.
-
Scaling Instruction-Finetuned Language Models.
Google.
-
Proximal Policy Optimization Algorithms.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov.
-
LIMA: Less Is More for Alignment.
Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy.