Final Project

Start by Feb 15, 2024 or earlier | Due on Apr 11, 2024

The class project is an opportunity for you to apply your newly acquired skills in NLP towards an in-depth application and/or research problem. Your project should aim to answer a scientific question and provide some kind of scientific knowledge gain similar to typical NLP research papers.

Project Ideas

You should pick something you are passionate about or something you find interesting. When selecting a project, make sure that you can find a well-defined dataset to use for it.

Possible project types include:

  • Re-implementing / reproducing a recent paper
  • Applying an existing neural model to a new task
  • Implementing a complex neural architecture
  • Proposing a new neural model or a new variation of an existing model
  • Proposing a new training, optimization, or evaluation scheme
  • Experimental and/or theoretical analysis of an NLP model

For interesting project ideas, consider reading papers at top publishing venues for

You can also check out the titles of group projects from past terms or final projects from Stanford CS224n for inspiration.

Resources

A list of shared task datasets is provided below. In some cases you can also extend your homework code to produce innovative project ideas for these tasks.

Shared Task Collections

CoNLL Shared Tasks

SemEval Shared Tasks

Classification Tasks

Information Extraction

Parsing

Machine Translation

Unlabeled Data for Clustering, Language Models, etc.

Sentiment and Opinion Mining

Natural Language Understanding and Inference

Question Answering

Project Submission

Project title and abstract (due on Feb 22, 2024)

For your project, please submit a Title and an Abstract that describes what topic/problem your group will work on, the scope of the project and the data you plan to use. The title/abstract is not graded, but you will receive feedback on the feasibility of the project.

To submit, go to Coursys. Under the Project Abstract activity, enter your Title and Abstract. Your abstract should be about 250 words and must be under 1000 words.

Project proposal (due on Mar 4, 2024)

For your project proposal please submit a PDF file that describes what problem you plan to work on, what data you will use, and the basic approach you plan to take.

The project proposal PDF should be 1-2 pages in the style of a conference (e.g. ACL/EMNLP) submission. Your project milestone and final report should use the same template. Links to acceptable templates are below:

Make sure the following points are in your proposal.

  • Problem Statement and Motivation
    • Which NLP task do you plan to work on?
    • Which aspect of the problem / task does your group plan to work on (accuracy, interpretability, etc.)?
    • Provide some reasons for your choice.
  • Related Work
    • Briefly describe existing related work (with citations) and how your work relates to it. Will your project be a re-implementation of a paper? Will your project be an attempt to replicate a set of experiments / findings in a paper? Will you be attempting to try something that wasn’t previously tried?
  • Approach
    • Describe the algorithms and machine learning models you plan to use in your project.
    • Using equations is not necessary but if you do, use a clear mathematical style to explain your model(s).
  • Experimental setup
    • Data: Which datasets will you use? If you plan on collecting your own data, describe what data collection protocol you will follow.
    • Implementation: How will you implement your model? What existing code will you use, and what will you implement yourself?
    • Evaluation: What evaluation metric will you use?
    • Comparison: What different baselines and/or variations of your approach will you compare?
  • Timeline and work breakdown
    • What do you plan to achieve by the milestone?
    • How will the work be allocated between the team members?
  • References - provide references using BibTeX

Please upload your proposal.pdf to Gradescope Project Proposal.

Project milestone (due on Mar 21, 2024)

For your project milestone please submit a PDF file that describes the progress you have made so far on your project and your plans for the remainder. For the final write-up you will also need to submit your code, so we encourage you to get started early and submit preliminary code and results with the milestone.

The project milestone PDF should be 3-6 pages in the style of a conference (e.g. ACL/EMNLP) submission. Your final report should use the same template. Links to acceptable templates are below:

Make sure the following points are covered.

  • Progress
    • What have you achieved so far?
    • Are there any issues you encountered?
  • Plans
    • What are the remaining steps?
    • Do you have any plans for addressing challenges or issues?
  • Data
    • What did you find out about the data so far?
  • Results
    • Summary of preliminary results
    • Remaining results that you plan to produce

A good structure for the milestone report will include the following sections. It can then also serve as a draft for your final project write-up.

  • Introduction
    • Motivate the problem, describe your goals, and highlight your findings (if you have findings)
  • Related Work
    • Briefly describe existing related work (with citations) and how your work relates to it
  • Approach
    • Provide details on your main approach and baselines. Be specific. Make clear what part is original, what code you are writing yourself, and what code you are using that is taken from elsewhere (homework, GitHub, etc.).
  • Experiments
    • Describe the dataset (provide some information about the data such as statistics and analysis).
    • Describe the evaluation metrics you will be using.
    • Describe what experiments you plan to run and/or any results you have so far.
    • Also provide training details, training times, etc. (if you have preliminary experiments)
  • Future Work
    • What is your plan for the rest of the project?
  • References - provide references using BibTeX

Grading of the milestone will be based on the progress and the quality of writing.

Please upload your milestone.pdf to Gradescope Project Milestone.

Optionally, you can go to Coursys. Under the Project Milestone activity upload source.zip and output.zip for your project. These will not be graded.

Project Write-up (due on Apr 11, 2024)

You must submit your project write-up as a PDF document. In addition you must submit a Python notebook project.ipynb and your source code for your project in your Github repository:

git@csil-git1.cs.surrey.sfu.ca:USER/nlpclass-1241-g-GROUP.git

Put all your project files into the directory project in your Github repository.

Make sure you have a requirements.txt file for your project so that we can use a virtual environment to run your code.

Your Python notebook must be called project.ipynb.
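
Before submitting, the layout described above can be checked with a few lines of Python. This is only a minimal sketch: the two file names and the project directory come from the instructions above, while the function name `missing_project_files` is hypothetical and the check does not replace reading the requirements yourself.

```python
from pathlib import Path

# File names taken from the instructions above.
REQUIRED_FILES = ("requirements.txt", "project.ipynb")


def missing_project_files(repo_root):
    """Return the required files that are absent from <repo_root>/project.

    `missing_project_files` is a hypothetical helper, not part of any
    course tooling.
    """
    project_dir = Path(repo_root) / "project"
    return [name for name in REQUIRED_FILES
            if not (project_dir / name).is_file()]
```

Calling `missing_project_files(".")` from the root of your repository returns an empty list when both files are in place.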

In addition to writing code for a good project submission, the description of what you did for your project as a PDF report is also a very important part of your project submission. The PDF writeup should have the following sections and must cover the following information:

  • Introduction
    • Which aspect of the problem / task did your group choose to improve and reasons for your choice.
  • Approach
    • Describe the algorithms and machine learning models used in your project. Use clear mathematical notation and diagrams to explain your model(s).
  • Experiments
    • Data - What data was used? How was it used? Is there anything interesting about the data?
    • Implementation - What code did you use? Did you implement everything yourself? If you used homework code, state which homework code you used. Identify exactly which code used in your project was not written by your group (e.g. use of an aligner from an open-source project).
    • Evaluation - Describe what kind of evaluation you are doing and what metrics you are using.
    • Methods being compared - Which methods are you comparing against each other?
    • Results - Include a detailed comparison of the different methods and an analysis of the results. Did you improve over the baseline? Why or why not?
  • Conclusion
    • What did you learn from your experiments?
    • What could be fixed in your approach?
    • What did you not have time to finish that you think would be a useful addition to your project?
  • References
    • Provide references using BibTeX
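
For the Evaluation and Results items above, a comparison between a baseline and your own method might be computed along the following lines. This is a minimal sketch that assumes a classification task scored by accuracy; your project may need a different metric, and the function names `accuracy` and `compare_methods` are hypothetical.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def compare_methods(outputs_by_method, references):
    """Score every method against the same references.

    `outputs_by_method` maps a method name (e.g. "baseline") to its list
    of predicted labels on the test set.
    """
    return {name: accuracy(preds, references)
            for name, preds in outputs_by_method.items()}
```

Scoring every method against the same reference file keeps the numbers in your results table directly comparable.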

Please read this guide to presenting your work. Also available is a video tutorial covering the same material.

Submit your project on Coursys

Please upload your report.pdf to Gradescope Final Project Report.

To submit your code and output, go to Coursys. Under the Final Project Report activity submit the following files:

  • report.pdf: this is the final project report
  • source.zip: this zip file should contain your Python notebook and only the source code you have written (along with a requirements.txt for a virtualenv). Do not include any data files in this zip file. Please also include a README.username file in this zip file, as you have done for all your homeworks.
  • output.zip: the output of your project implementation on a dataset. Please include the evaluation code and references so that we can check the evaluation you present in your write-up. Note that this should contain only your output on the test data file of some dataset, plus any evaluation code and clear instructions on how to run the evaluation script. Do not include large models or data in this zip file. If you have large models or data that you need to share with us, please put them in networked file storage (e.g. SFU vault) and provide a link and/or instructions for downloading the data and/or models.
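
One way to build source.zip without accidentally including data files is sketched below. It relies on an assumption: the suffixes in `EXCLUDED_SUFFIXES` are only a guess at what counts as data or model files and should be adjusted for your project. The function name `build_source_zip` is hypothetical.

```python
import zipfile
from pathlib import Path

# Assumption: these suffixes mark data or model files in your project.
EXCLUDED_SUFFIXES = {".csv", ".tsv", ".jsonl", ".pt", ".bin"}


def build_source_zip(project_dir, zip_path):
    """Archive the files under project_dir, skipping data and model files.

    Returns the archived paths relative to project_dir.
    """
    project_dir = Path(project_dir)
    zip_path = Path(zip_path)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() in EXCLUDED_SUFFIXES:
                continue
            # Never add the archive to itself if it lives inside project_dir.
            if path.resolve() == zip_path.resolve():
                continue
            rel = path.relative_to(project_dir).as_posix()
            zf.write(path, rel)
            archived.append(rel)
    return archived
```

Inspecting the returned list (or opening the archive) before uploading is a quick way to confirm that the notebook, requirements.txt and README.username are present and that no data slipped in.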

The instructions for submission and development are provided in more detail in Homework 0.

There are no grace days for project submission! So submit early and often.

That’s it. You are done with your Final Project!

Grading of Final Project Work

The final projects for this course will be graded using the following criteria:

  • Originality
  • Substance (amount of work done for the project)
  • Well-documented use of prior results from research papers
  • Clarity of the writing, code and documentation quality
  • Quality of experimental design
  • Quality of evaluation and results
  • Theoretical insights / Practical insights
  • Group work (did the group work effectively together)
  • Overall score (based on the above criteria, but can include other factors like overall polish or creativity)

Project Grading Framework

The project marks are distributed as follows:

  • Proposal. Project proposal. 20 marks (see the section on Project Proposal for grading details)
  • Milestone. Description of progress and project plans. 20 marks (see the section on Project Milestone for grading details)
  • Work. Work done in the project. Results obtained. 14 marks (see the section on Grading of the Final Project Work for grading details)
  • Report. Description and analysis of what was done. 26 marks (see the section on Project Write-up for grading details)
  • Poster. Performance at the online poster session and presentation quality. 20 marks (for video and presentation)