Assignment 6: Diffusion Models

Welcome to the world of generative AI! In this assignment, you will implement a Denoising Diffusion Probabilistic Model (DDPM) from scratch. Diffusion models learn to generate data by reversing a gradual noising process: the forward process incrementally corrupts clean data with Gaussian noise until the signal is destroyed, while the reverse process learns to undo this corruption step by step. By the end of this assignment, you will train a neural network on 2D point distributions in Python, serialize the learned weights, and bring the model to life in real time through a WebGL shader that visualizes thousands of particles flowing between noise and structure.

This assignment has two parts. Part A covers training in a Jupyter notebook (Python/PyTorch). Part B covers the real-time GLSL shader implementation. You should complete Part A first, then use the trained weights in Part B.

Reading

You may find the following materials helpful:

Course Slides on Diffusion Models and Score-Based Generative Models
Ho et al., "Denoising Diffusion Probabilistic Models" (NeurIPS 2020)
Song et al., "Denoising Diffusion Implicit Models" (ICLR 2021)
Lilian Weng, "What are Diffusion Models?" (Blog Post)
Yang Song, "Generative Modeling by Estimating Gradients of the Data Distribution" (Blog Post)
tanelp/tiny-diffusion (GitHub)
CGAI Tutorial: Google Colab
CGAI Tutorial: Training a Network in PyTorch

Starter Code

Please visit the following GitHub repository to get our latest starter code: https://github.com/cg-gatech/cgai. Make sure to run git pull to synchronize the latest version. Make sure you can access the default CGAI web page after starting the npm server.

The starter code for this assignment is located in the folder src/app/(assignment)/assignment/A6. This folder contains the main page page.tsx, the GLSL shader fragment.glsl, and the Jupyter notebook diffusion.ipynb.

To view the assignment page, navigate to http://localhost:3000/assignment/A6 (note that the port number may vary depending on the available ports on your local computer). After completing Part A and pasting your trained weights into the shader, the page will display a real-time animation of particles diffusing between structured data and Gaussian noise.

Mathematical Background

A Denoising Diffusion Probabilistic Model (DDPM) consists of two paired processes: a fixed forward process that gradually destroys data by adding Gaussian noise, and a learned reverse process that reconstructs data by iteratively denoising. Our implementation follows Ho et al., "Denoising Diffusion Probabilistic Models" (NeurIPS 2020).

Forward Process

The forward process is a Markov chain that gradually corrupts clean data \(\mathbf{x}_0\) with Gaussian noise over \(T\) steps: \[ q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = \prod_{t=1}^{T} q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \] Each transition adds a small amount of noise controlled by a variance schedule \(\beta_1, \dots, \beta_T\): \[ q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\; \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1},\; \beta_t \mathbf{I}\right) \] Define \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\). By the properties of Gaussians, we can directly sample \(\mathbf{x}_t\) from \(\mathbf{x}_0\) in closed form, skipping all intermediate steps: \[ q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\; (1 - \bar{\alpha}_t)\mathbf{I}\right) \] Via reparameterization, sampling becomes: \[ \mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \] As \(t \to T\), \(\bar{\alpha}_t \to 0\) and \(\mathbf{x}_t\) converges to pure Gaussian noise. We use a linear schedule for \(\beta_t\) and index it with a continuous \(t \in [0, 1]\), mapped to integer indices internally.

Reverse Process

The reverse process is also a Markov chain, but runs backwards in time. The learned reverse transition is: \[ p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\; \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\right) \] where the mean \(\boldsymbol{\mu}_\theta\) is parameterized via the noise predictor \(\boldsymbol{\epsilon}_\theta\): \[ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}} \!\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) \] For the variance, Ho et al. found that fixing \(\boldsymbol{\Sigma}_\theta = \tilde{\beta}_t \mathbf{I}\) works as well as learning it, where the posterior variance is: \[ \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t \] Combining the mean and fixed variance, each reverse step becomes: \[ \mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\!\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sqrt{\tilde{\beta}_t}\,\mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \] At the final step \(t = 0\), no noise is added (\(\mathbf{z} = \mathbf{0}\)). Starting from \(\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) and iterating for \(t = T{-}1, \dots, 0\) produces a sample from the learned distribution.

Part A: Training (Jupyter Notebook)

Open diffusion.ipynb and work through it from top to bottom. The notebook walks you through dataset sampling, the diffusion schedule, network architecture, training, and reverse sampling. You will fill in the sections marked with the comments Your implementation starts and Your implementation ends.

Step 0: Imports and Setup

Run this section to install the required packages and import all necessary libraries. No implementation is required for this step.

Step 1: 2D Point Cloud Dataset

The dataset code is fully provided. Read through the three available distributions and choose one by setting DATASET_NAME:

circle: points sampled uniformly on a 1D circular curve.
checkerboard: points sampled uniformly from the white squares of a 4×4 grid in \([-1,1]^2\).
image: points sampled from the dark regions of a custom image (see sample_from_image).

Run the visualization cell to confirm that your dataset looks correct before proceeding. No implementation is required for this step.

Step 2: Diffusion Schedule and Forward Process

Your first task is to complete the DiffusionSchedule class, which manages the noise schedule throughout training and sampling. See the Mathematical Background section above for the full derivation.

Step 2.1: Precompute Schedule Quantities

Inside __init__(), precompute and store the quantities that the reverse process will need at every step:

\(\alpha_t = 1 - \beta_t\)
\(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\) (cumulative product of \(\alpha\)s)
\(\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t\) (posterior variance)

Step 2.2: Forward Sampling

Implement q_sample(), which corrupts a batch of clean points \(\mathbf{x}_0\) to produce noisy samples \(\mathbf{x}_t\). Using the closed-form reparameterization, you can jump directly to any noise level without iterating through intermediate steps: \[ \mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \] The method receives continuous time values \(t \in [0, 1]\). Use the provided helper _t_to_idx(), which maps a continuous \(t\) to the nearest integer index in the schedule arrays, to look up the corresponding \(\bar{\alpha}_t\).

Step 3: Denoising Network

The overall network architecture is provided: a time-conditioned MLP \(\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\) that predicts the noise added at step \(t\). The input \(\mathbf{x}_t\) is first lifted with a NeRF-style sinusoidal positional encoding before being fed into the network: \[ \gamma(\mathbf{x}_t) = \bigl[\mathbf{x}_t,\; \sin(2^0 \mathbf{x}_t),\; \cos(2^0 \mathbf{x}_t),\; \dots,\; \sin(2^{L-1} \mathbf{x}_t),\; \cos(2^{L-1} \mathbf{x}_t)\bigr] \] where \(L\) is the number of frequency bands. The scalar \(t \in [0,1]\) is concatenated directly (without encoding) to the encoded \(\mathbf{x}_t\), and the combined vector is passed through 3 hidden layers of width 48 with ReLU activations, followed by a linear output layer that produces a 2D noise prediction.

Copy your positional encoding implementation from the A3b NeRF assignment into _positional_encoding. Then read through forward to understand how the encoded input is assembled and passed through the MLP. No other changes are needed in this step.

Step 4: Training

Implement train_step(), which performs one gradient update. Each call should carry out the following steps:

Sample a time step \(t\) uniformly from \([0, 1]\) for each item in the batch.
Corrupt the clean data: call q_sample() to obtain the noisy sample \(\mathbf{x}_t\) and the ground-truth noise \(\boldsymbol{\epsilon}\).
Predict the noise with the network: \(\hat{\boldsymbol{\epsilon}} = \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\).
Compute the MSE loss: \(\mathcal{L} = \mathbb{E}\left[\|\boldsymbol{\epsilon} - \hat{\boldsymbol{\epsilon}}\|^2\right]\).
Perform a gradient update using the provided optimizer and learning-rate scheduler.

Step 5: Reverse Diffusion Sampling

Implement the core reverse step inside ddpm_sample(). The reverse process starts from pure Gaussian noise \(\mathbf{x}_T\) and iterates from step \(T{-}1\) down to \(0\), progressively denoising until structured samples emerge. At each step:

Convert the integer step index to a continuous time value \(t \in [0, 1]\) for the network.
Retrieve \(\alpha_t\), \(\bar{\alpha}_t\), and \(\tilde{\beta}_t\) from the schedule.
Predict noise: \(\hat{\boldsymbol{\epsilon}} = \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\).
Compute the posterior mean: \[ \boldsymbol{\mu}_\theta = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\hat{\boldsymbol{\epsilon}}\right) \]
Sample the previous state: \(\mathbf{x}_{t-1} = \boldsymbol{\mu}_\theta + \sqrt{\tilde{\beta}_t}\,\mathbf{z}\), where \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\). Skip the noise injection at the final step \(t = 0\).

Step 6: Serialize Model

After training and sampling are complete, run the serialization cell at the end of the notebook. It writes a file called serialized_model.txt, which encodes your trained network weights as a GLSL function using the same packed-matrix approach from the NeRF assignment. Open the file and paste its contents into the queryNetwork placeholder in fragment.glsl, replacing the stub return vec2(0.0). No implementation is required for this step.

Part B: GLSL Visualization (Shader)

The shader implementation lives in fragment.glsl. Before implementing the two functions below, paste the contents of serialized_model.txt into the queryNetwork placeholder at the top of the file, replacing the stub return vec2(0.0). This fills in your trained neural network weights so the shader can predict noise at any position and time.

Step 7: Forward Process in the Shader

Implement forwardProcess(), which performs one Markov step of the forward diffusion chain. Given the current particle position xt and the next time step t_next, apply: \[ \mathbf{x}' = \sqrt{1 - \beta(t)}\,\mathbf{x} + \sqrt{\beta(t)}\,\boldsymbol{\epsilon} \] The Gaussian noise \(\boldsymbol{\epsilon}\) is already drawn for you in the local variable z, and the schedule function beta() is provided in the shader.

Step 8: Reverse Process in the Shader

Implement reverseProcess(), which performs one DDPM reverse step. This mirrors what you implemented in Python in Step 5. Given the current position xt, the current time t, and the previous time t_prev:

Compute the schedule quantities: \(\alpha = 1 - \beta(t)\), \(\bar{\alpha} = \text{alphaBar}(t)\), and the posterior variance \(\tilde{\beta} = \frac{1 - \text{alphaBar}(t_{\text{prev}})}{1 - \bar{\alpha}} \cdot \beta(t)\).
Predict noise by calling queryNetwork(xt, t).
Compute the posterior mean: \[ \boldsymbol{\mu} = \frac{1}{\sqrt{\alpha}}\left(\mathbf{x} - \frac{1 - \alpha}{\sqrt{1 - \bar{\alpha}}}\,\boldsymbol{\epsilon}\right) \]
Sample the previous state: \(\mathbf{x}' = \boldsymbol{\mu} + \sqrt{\max(\tilde{\beta}, 0)}\,\boldsymbol{\epsilon}\). Skip the noise term when t_prev ≤ 0.0.

The Gaussian noise variable z, and the helper functions beta() and alphaBar(), are already provided in the shader.

Creative Expression

In the Creative Expression section of this assignment, train the model on a different 2D distribution (for example, using the image-based sampler with a custom image, or designing your own point distribution). Re-serialize the weights, paste them into the shader, and showcase the resulting animation. You are also encouraged to modify the shader to customize the visual presentation (for example, experimenting with particle colors, sizes, or trails). The creative expression theme for this assignment is From Chaos to Order.

Submission

Your completed diffusion.ipynb with all cell outputs visible
Your completed fragment.glsl with trained weights pasted in and both functions implemented
A screen recording of the WebGL diffusion animation trained on the bee image dataset
A screen recording of your creative expression WebGL animation
A concise technical explanation of your implementation

Grading

This assignment is worth a total of 8 points, with the grading criteria outlined as follows:

Technical Contribution (7 points):
- Step 2: 2 points
- Step 4: 1 point
- Step 5: 2 points
- Step 7: 1 point
- Step 8: 1 point
Creative Expression (1 point): Train on a custom distribution and showcase it in the WebGL visualization. The theme is From Chaos to Order.

Sharing Your Work

You are encouraged to share your graphical work with the class. If you want to do so, please upload your video to the Ed Discussion post A6 Gallery: From Chaos to Order. Share with us your unique diffusion animation!