CC6205 - Natural Language Processing

This is a course on natural language processing.

Info

This course aims to provide a comprehensive introduction to Natural Language Processing (NLP) by covering essential concepts. We strive to strike a balance between traditional techniques, such as N-gram language models, Naive Bayes, and Hidden Markov Models (HMMs), and modern deep neural networks, including word embeddings, recurrent neural networks (RNNs), and transformers.

The course material draws from various sources. In many instances, sentences from these sources are directly incorporated into the slides. The neural network topics primarily rely on the book Neural Network Methods for Natural Language Processing by Goldberg. Non-neural network topics, such as Probabilistic Language Models, Naive Bayes, and HMMs, are sourced from Michael Collins' course and Dan Jurafsky's book. Additionally, some slides are adapted from online tutorials and other courses, such as Manning's Stanford course.

2025 Update: We are currently updating the course program to incorporate new capabilities of Large Language Models and explore interesting applications. As a result, we will omit some topics covered in previous iterations, such as Naive Bayes, Hidden Markov Models, and Convolutional Neural Networks. Additionally, we will assume a basic understanding of machine learning concepts, including train/test/validation splits, cross-validation, and fundamental models (SVM, Naive Bayes, and possibly Random Forest). However, the material from previous years remains available.

Slides

As we implement this new version of the course, we will upload all the new slides.

Unit 1: NLP fundamentals

  1. Introduction to Natural Language Processing
  2. Vector semantics (in Spanish)
  3. Fundamental questions about language
  4. Probabilistic Language Models
  5. Linear Models

Unit 2: Neural networks and NLP

  1. Neural Networks
  2. Word Vectors
  3. Recurrent Neural Networks
  4. Sequence-to-sequence + Attention
  5. Transformers + BERT

Unit 3: Large Language Models: new paradigm and open questions

  1. GPT + Emergent Abilities in LLMs
  2. Retrieval Augmented Generation
  3. Interpretability
  4. Agents
  5. Ethics in NLP

Evaluation

  • NC: 2 tests (after Unit 1 and Unit 2)

  • NT: 3 group homework assignments

  • NP: Group presentation of a paper of your choice (we will upload the list of papers soon)

  • NF (final grade): (NC + NT + NP) / 3 (see the sketch after this list)

    • Important: NC, NT, and NP must each be >= 4.0
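
For concreteness, a minimal Python sketch of the grading rule above (the function name and the explicit cutoff check are our illustration, not official course code):

```python
def final_grade(nc: float, nt: float, np_grade: float) -> float:
    """NF = (NC + NT + NP) / 3, valid only when each component is >= 4.0."""
    if min(nc, nt, np_grade) < 4.0:
        raise ValueError("NC, NT, and NP must each be >= 4.0")
    return (nc + nt + np_grade) / 3

# Example: final_grade(5.0, 5.5, 4.5) returns 5.0
```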

Slides from previous years

  1. Introduction to Natural Language Processing | (tex source file), video 1, video 2
  2. Vector Space Model and Information Retrieval | (tex source file), video 1, video 2
  3. Probabilistic Language Models | (tex source file), notes, video 1, video 2, video 3, video 4
  4. Text Classification and Naive Bayes | (tex source file), notes, video 1, video 2, video 3
  5. Linear Models | (tex source file), video 1, video 2, video 3, video 4
  6. Neural Networks | (tex source file), video 1, video 2, video 3, video 4
  7. Word Vectors | (tex source file) video 1, video 2, video 3
  8. Sequence Labeling and Hidden Markov Models | (tex source file), notes, video 1, video 2, video 3, video 4
  9. MEMMs and CRFs | (tex source file), notes 1, notes 2, video 1, video 2, video 3 (optional)
  10. Convolutional Neural Networks | (tex source file), video
  11. Recurrent Neural Networks | (tex source file), video 1, video 2
  12. Sequence to Sequence Models and Attention | (tex source file), video 1, video 2
  13. Transformer Architecture | (tex source file), video 1
  14. Contextualized Embeddings and Large Language Models, video 1, video 2, video 3
  15. Large Language Models Usage and Evaluation Patterns, video

NLP Libraries and Tools

  1. NLTK: Natural Language Toolkit
  2. Gensim
  3. spaCy: Industrial-strength NLP
  4. Torchtext
  5. AllenNLP: Open source project for designing deep learning-based NLP models
  6. HuggingFace Transformers
  7. ChatGPT
  8. Google Bard
  9. Stanza - A Python NLP Library for Many Human Languages
  10. FlairNLP: A very simple framework for state-of-the-art Natural Language Processing (NLP)
  11. WEFE: The Word Embeddings Fairness Evaluation Framework
  12. WhatLies: A library that tries to help you answer the question "What lies in word embeddings?"
  13. LASER: a library to calculate and use multilingual sentence embeddings
  14. Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch
  15. Datasets: a lightweight library with one-line dataloaders for many public datasets in NLP
  16. RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams
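
To give a flavor of how these libraries are typically used, here is a minimal sketch with HuggingFace Transformers' pipeline API (a default pretrained model is downloaded from the Hub on first run, and the default model may vary across library versions):

```python
from transformers import pipeline

# Build a sentiment-analysis pipeline backed by a default pretrained model.
classifier = pipeline("sentiment-analysis")

# Classify a sample sentence; the output is a list of label/score dicts.
result = classifier("Natural language processing is fascinating!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.999...}]
```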

Notes and Books

  1. Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin.
  2. Michael Collins' NLP notes.
  3. A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg.
  4. Natural Language Understanding with Distributed Representation by Kyunghyun Cho
  5. A Survey of Large Language Models
  6. Natural Language Processing Book by Jacob Eisenstein
  7. NLTK book
  8. Embeddings in Natural Language Processing by Mohammad Taher Pilehvar and Jose Camacho-Collados
  9. Dive into Deep Learning Book
  10. Contextual Word Representations: A Contextual Introduction by Noah A. Smith

Other NLP Courses

  1. CS224n: Natural Language Processing with Deep Learning, Stanford course
  2. Deep Learning in NLP: slides by Horacio Rodríguez
  3. David Bamman's NLP Slides @Berkeley
  4. CS 521: Statistical Natural Language Processing by Natalie Parde, University of Illinois
  5. 10 Free Top Notch Natural Language Processing Courses

Videos

  1. Natural Language Processing MOOC videos by Dan Jurafsky and Chris Manning, 2012
  2. Natural Language Processing MOOC videos by Michael Collins, 2013
  3. Natural Language Processing with Deep Learning by Chris Manning and Richard Socher, 2017
  4. CS224N: Natural Language Processing with Deep Learning | Winter 2019
  5. Computational Linguistics I by Jordan Boyd-Graber, University of Maryland
  6. Visualizing and Understanding Recurrent Networks
  7. BERT Research Series by Chris McCormick
  8. Successes and Challenges in Neural Models for Speech and Language - Michael Collins
  9. More on Transformers: BERT and Friends, by Jorge Pérez

Other Resources

  1. ACL Portal
  2. Awesome-nlp: A curated list of resources dedicated to Natural Language Processing
  3. NLP-progress: Repository to track the progress in Natural Language Processing (NLP)
  4. Corpora Mailing List
  5. 🤗 Open LLM Leaderboard
  6. Real World NLP Book: AllenNLP tutorials
  7. The Illustrated Transformer: a very illustrative blog post about the Transformer
  8. Better Language Models and Their Implications OpenAI Blog
  9. Understanding LoRA and QLoRA — The Powerhouses of Efficient Finetuning in Large Language Models
  10. RNN effectiveness
  11. SuperGLUE: a benchmark of Natural Language Understanding tasks
  12. decaNLP, The Natural Language Decathlon: a benchmark for studying general NLP models that can perform a variety of complex natural language tasks
  13. Chatbot and Related Research Paper Notes with Images
  14. Ben Trevett's torchtext tutorials
  15. PLMpapers: a collection of papers about Pre-Trained Language Models
  16. The Illustrated GPT-2 (Visualizing Transformer Language Models)
  17. Linguistics, NLP, and Interdisciplinarity Or: Look at Your Data, by Emily M. Bender
  18. The State of NLP Literature: Part I, by Saif Mohammad
  19. From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
  20. 10 ML & NLP Research Highlights of 2019 by Sebastian Ruder
  21. Towards a Conversational Agent that Can Chat About…Anything
  22. The Super Duper NLP Repo: a collection of Colab notebooks covering a wide array of NLP task implementations
  23. The Big Bad NLP Database, a collection of nearly 300 well-organized, sortable, and searchable natural language processing datasets
  24. A Primer in BERTology: What we know about how BERT works
  25. How Self-Attention with Relative Position Representations works
  26. Deep Learning Based Text Classification: A Comprehensive Review
  27. Teaching NLP is quite depressing, and I don't know how to do it well by Yoav Goldberg
  28. The NLP index
  29. 100 Must-Read NLP Papers