CS4248 Natural Language Processing

Click a lecture to expand. New notebooks added weekly as lectures progress.

Setup Guide

HTML (no setup)

Click the green HTML button to view a pre-rendered version of the notebook in your browser. Read-only, no setup needed.

Google Colab (no setup)

Click the yellow Colab button to open a runnable notebook in Google Colab. To save your work: File > Save a copy in Drive.

Greyed-out Colab buttons indicate notebooks that may not work in Colab due to missing dependencies.

Local Jupyter

Clone the repo and start Jupyter from inside it:

git clone https://github.com/lavanyagarg112/CS4248_NB_Drive.git
cd CS4248_NB_Drive
jupyter lab    # or: jupyter notebook

Install Jupyter if needed: pip install jupyterlab or pip install notebook

Then click the blue Jupyter button on this page to open the notebook in your running server. If the port differs from 8888, update the Base URL field above.
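If port 8888 is already in use, Jupyter can be started on another port with the standard `--port` flag (then point the Base URL at that port, e.g. http://localhost:8889):

```shell
# Start JupyterLab on an alternative port; works the same for `jupyter notebook`
jupyter lab --port 8889
```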

Lecture 1: What is NLP and Why is it so Hard?

5 notebooks
Data Preparation for Training LLMs — An Overview
Token Indexing with Vocabularies (optional)
Working with Batches for Sequence Tasks (optional)
Data Batching for Training LLMs (optional)
NumPy — Basic Tutorial (optional)

Lecture 2: Strings & Words

8 notebooks
Regular Expressions
Text Tokenization
Word Tokenizer (implementation from scratch)
Byte-Pair Encoding
WordPiece
Text Normalization
Stemming & Lemmatization
Porter Stemmer (optional)
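The Byte-Pair Encoding notebook builds a subword tokenizer by repeatedly merging the most frequent symbol pair. As a rough sketch of a single merge step (a toy corpus and helper names of my own choosing, not the course implementation):

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across the corpus, weighted by word frequency.
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    # Rewrite each word, replacing occurrences of `pair` with one merged symbol.
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: words as symbol tuples with frequencies.
corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 3, ("n", "e", "w"): 2}
pair = most_frequent_pair(corpus)   # ('l', 'o') — 8 weighted occurrences
corpus = merge_pair(corpus, pair)
print(corpus)
```

A full BPE tokenizer just repeats this step for a fixed number of merges and records the merge order.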

Lecture 3: n-Gram Language Models

5 notebooks
Language Models
n-Gram Language Models (basic)
n-Gram Language Models (advanced)
RNN-based Language Models (optional)
Transformer-based Language Models (optional)
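The core idea behind the n-gram notebooks is estimating next-word probabilities from counts. A minimal bigram sketch (toy sentences of my own, with maximum-likelihood estimates and no smoothing):

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries.
sentences = [["<s>", "the", "cat", "sat", "</s>"],
             ["<s>", "the", "dog", "sat", "</s>"],
             ["<s>", "the", "cat", "ran", "</s>"]]

bigrams = Counter()
unigrams = Counter()
for sent in sentences:
    unigrams.update(sent[:-1])            # history counts (</s> is never a history)
    bigrams.update(zip(sent, sent[1:]))   # adjacent word pairs

def p(word, history):
    # Maximum-likelihood estimate: P(word | history) = c(history, word) / c(history)
    return bigrams[(history, word)] / unigrams[history]

print(p("cat", "the"))  # 2 of the 3 bigrams starting with "the" continue with "cat"
```

The "advanced" notebook topics (smoothing, backoff) exist precisely because unseen bigrams make this MLE estimate zero.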

Lecture 4: Structure in Language

5 notebooks
Part-of-Speech Tagging
Constituency Parsing
Dependency Parsing
POS Tagging with HMMs (optional)
CYK Algorithm (optional)

Lecture 5: Text Classification

4 notebooks
Multinomial Naive Bayes
Vector Space Model
Text Classification (optional)
Naive Bayes Classifier (from scratch) (optional)
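The multinomial Naive Bayes notebook scores a document by combining a class prior with per-word likelihoods. A toy sketch of that scoring rule with add-one (Laplace) smoothing — the documents and labels here are made up for illustration:

```python
import math
from collections import Counter

# Toy training data: (tokens, label)
docs = [(["great", "fun", "great"], "pos"),
        (["boring", "plot"], "neg"),
        (["fun", "film"], "pos")]

prior = Counter(y for _, y in docs)
word_counts = {y: Counter() for y in prior}
for tokens, y in docs:
    word_counts[y].update(tokens)
vocab = {w for tokens, _ in docs for w in tokens}

def log_score(tokens, y):
    # log P(y) + sum_w log P(w | y), with add-one smoothing over the vocabulary
    total = sum(word_counts[y].values())
    score = math.log(prior[y] / len(docs))
    for w in tokens:
        score += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
    return score

def classify(tokens):
    return max(prior, key=lambda y: log_score(tokens, y))

print(classify(["fun", "great"]))  # pos
```

Working in log space avoids underflow when multiplying many small probabilities.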

Lecture 6: Introduction to Connectionist Machine Learning

10 notebooks
Logistic Regression (Basics)
Artificial Neural Networks (Basic Architecture)
Gradient Descent
Backpropagation (Basic Examples)
Bias & Variance (Machine Learning)
Logistic Regression (optional)
The Linear Layer (optional)
The Softmax Function (optional)
Backpropagation (optional)
Implementing an ANN from Scratch (optional)
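The logistic regression and gradient descent notebooks fit together: the model is a sigmoid over a linear score, trained by following the gradient of the negative log-likelihood. A minimal 1-D sketch (toy data and learning rate chosen arbitrarily):

```python
import math

# Toy 1-D dataset: negative inputs labeled 0, positive inputs labeled 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.5
for _ in range(200):
    # Gradient of the mean negative log-likelihood: (sigmoid(wx + b) - y) * x
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(sigmoid(w * 2.0 + b))   # close to 1: confident positive prediction
```

Backpropagation generalizes exactly this update to multi-layer networks by applying the chain rule layer by layer.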

Lecture 7: Word Embeddings

3 notebooks
Word & Text Embeddings (Overview)
Word2Vec (Basics)
Word2Vec (Training from Scratch)
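Word embeddings are usually compared with cosine similarity, which ignores vector length and measures only direction. A small sketch with made-up 3-d vectors (real Word2Vec embeddings typically have hundreds of dimensions):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, just to show the comparison.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
banana = [0.1, 0.05, 0.99]
print(cosine(king, queen) > cosine(king, banana))  # True
```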

Lecture 8: Encoder-Decoder

3 notebooks
Recurrent Neural Networks
Language Modeling with RNNs
Working with Batches for Sequence Tasks

Lecture 9: Transformers

7 notebooks
Attention Mechanism
Transformers (Basic Architecture)
Positional Encodings (Basics)
Positional Encodings (Original Transformer)
Positional Encodings — RoPE (optional)
Machine Translation with Transformers
Masking in Sequence Models
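The attention mechanism at the heart of the Transformer notebooks is scaled dot-product attention: score a query against every key, softmax the scores, and take the weighted sum of the values. A single-query sketch in plain Python (vectors here are tiny toy examples):

```python
import math

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    # Scaled dot-product attention for one query vector.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(q, K, V)
print(out)  # leans toward the first value vector, since q matches the first key
```

Masking (the last notebook above) simply sets selected scores to a large negative value before the softmax so those positions get near-zero weight.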

Lecture 10: LLMs

6 notebooks
Positional Encodings (RoPE)
Building a GPT-Style LLM from Scratch
Working with the OpenAI API
Using Pretrained LLMs Locally
Data Preparation for Training LLMs (optional)
Data Batching for Training LLMs (optional)