Welcome to the world of generative AI! In this assignment, you will implement a Denoising Diffusion Probabilistic Model (DDPM) from scratch. Diffusion models learn to generate data by reversing a gradual noising process: the forward process incrementally corrupts clean data with Gaussian noise until the signal is destroyed, while the reverse process learns to undo this corruption step by step. By the end of this assignment, you will train a neural network on 2D point distributions in Python, serialize the learned weights, and bring the model to life in real time through a WebGL shader that visualizes thousands of particles flowing between noise and structure.
This assignment has two parts. Part A covers training in a Jupyter notebook (Python/PyTorch). Part B covers the real-time GLSL shader implementation. You should complete Part A first, then use the trained weights in Part B.
You may find the following materials helpful:
Please visit the following GitHub repository to get our latest starter code: https://github.com/cg-gatech/cgai. Make sure to run git pull to synchronize the latest version. Make sure you can access the default CGAI web page after starting the npm server.
The starter code for this assignment is located in the folder src/app/(assignment)/assignment/A6. This folder contains the main page page.tsx, the GLSL shader fragment.glsl, and the Jupyter notebook diffusion.ipynb.
To view the assignment page, navigate to http://localhost:3000/assignment/A6 (note that the port number may vary depending on the available ports on your local computer). After completing Part A and pasting your trained weights into the shader, the page will display a real-time animation of particles diffusing between structured data and Gaussian noise.
A Denoising Diffusion Probabilistic Model (DDPM) consists of two paired processes: a fixed forward process that gradually destroys data by adding Gaussian noise, and a learned reverse process that reconstructs data by iteratively denoising. Our implementation follows Ho et al., "Denoising Diffusion Probabilistic Models" (NeurIPS 2020).
The forward process is a Markov chain that gradually corrupts clean data \(\mathbf{x}_0\) with Gaussian noise over \(T\) steps: \[ q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = \prod_{t=1}^{T} q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \] Each transition adds a small amount of noise controlled by a variance schedule \(\beta_1, \dots, \beta_T\): \[ q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\; \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1},\; \beta_t \mathbf{I}\right) \] Define \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\). By the properties of Gaussians, we can directly sample \(\mathbf{x}_t\) from \(\mathbf{x}_0\) in closed form, skipping all intermediate steps: \[ q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\; (1 - \bar{\alpha}_t)\mathbf{I}\right) \] Via reparameterization, sampling becomes: \[ \mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \] As \(t \to T\), \(\bar{\alpha}_t \to 0\) and \(\mathbf{x}_t\) converges to pure Gaussian noise. We use a linear schedule for \(\beta_t\) and index it with a continuous \(t \in [0, 1]\), mapped to integer indices internally.
The reverse process is also a Markov chain, but runs backwards in time. The learned reverse transition is: \[ p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\; \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\right) \] where the mean \(\boldsymbol{\mu}_\theta\) is parameterized via the noise predictor \(\boldsymbol{\epsilon}_\theta\): \[ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}} \!\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) \] For the variance, Ho et al. found that fixing \(\boldsymbol{\Sigma}_\theta = \tilde{\beta}_t \mathbf{I}\) works as well as learning it, where the posterior variance is: \[ \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t \] Combining the mean and fixed variance, each reverse step becomes: \[ \mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\!\left(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sqrt{\tilde{\beta}_t}\,\mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \] At the final step \(t = 0\), no noise is added (\(\mathbf{z} = \mathbf{0}\)). Starting from \(\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) and iterating for \(t = T{-}1, \dots, 0\) produces a sample from the learned distribution.
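To make the reverse-step formula concrete, here is a minimal NumPy sketch of a single update \(\mathbf{x}_t \to \mathbf{x}_{t-1}\). The function and array names (`eps_pred`, `alphas`, `alpha_bars`, `betas`) are illustrative, not the notebook's; the predicted noise is passed in rather than computed by a network:

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_pred, alphas, alpha_bars, betas, rng):
    """One DDPM reverse step x_t -> x_{t-1}, given the predicted noise eps_pred."""
    a_t, ab_t, b_t = alphas[t], alpha_bars[t], betas[t]
    # Posterior mean: (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    if t == 0:
        return mean  # no noise is added at the final step
    ab_prev = alpha_bars[t - 1]
    beta_tilde = (1.0 - ab_prev) / (1.0 - ab_t) * b_t  # posterior variance
    return mean + np.sqrt(beta_tilde) * rng.standard_normal(x_t.shape)
```

Note that the same three quantities \(\alpha_t\), \(\bar{\alpha}_t\), and \(\tilde{\beta}_t\) appear in both the mean and the noise term, which is why the schedule class precomputes them once.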
Open diffusion.ipynb and work through it from top to bottom. The notebook walks you through dataset sampling, the diffusion schedule, network architecture, training, and reverse sampling. You will fill in the sections marked with the comments Your implementation starts and Your implementation ends.
Run this section to install the required packages and import all necessary libraries. No implementation is required for this step.
The dataset code is fully provided. Read through the three available distributions (including the image-based sampler sample_from_image) and choose one by setting DATASET_NAME. Run the visualization cell to confirm that your dataset looks correct before proceeding. No implementation is required for this step.
Your first task is to complete the DiffusionSchedule class, which manages the noise schedule throughout training and sampling. See the Mathematical Background section above for the full derivation.
Inside __init__(), precompute and store the quantities that the reverse process will need at every step: the variance schedule \(\beta_t\), the terms \(\alpha_t = 1 - \beta_t\), the cumulative products \(\bar{\alpha}_t\), and the posterior variances \(\tilde{\beta}_t\).
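As a reference, the precomputation might look like the following sketch. The parameter names (T, beta_start, beta_end) and attribute names are illustrative; match whatever the notebook's skeleton uses:

```python
import numpy as np

class DiffusionSchedule:
    def __init__(self, T=1000, beta_start=1e-4, beta_end=0.02):
        self.T = T
        # Linear variance schedule beta_1, ..., beta_T
        self.betas = np.linspace(beta_start, beta_end, T)
        self.alphas = 1.0 - self.betas                # alpha_t
        self.alpha_bars = np.cumprod(self.alphas)     # running product of alpha_s
        # Posterior variances beta_tilde_t; with alpha_bar before the first
        # step taken as 1, the variance at the first index is 0
        ab_prev = np.concatenate([[1.0], self.alpha_bars[:-1]])
        self.posterior_var = (1.0 - ab_prev) / (1.0 - self.alpha_bars) * self.betas
```

Storing these as arrays lets both q_sample() and the reverse sampler do a single index lookup per step instead of recomputing products.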
Implement q_sample(), which corrupts a batch of clean points \(\mathbf{x}_0\) to produce noisy samples \(\mathbf{x}_t\). Using the closed-form reparameterization, you can jump directly to any noise level without iterating through intermediate steps:
\[
\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
\]
The method receives continuous time values \(t \in [0, 1]\). Use the provided helper _t_to_idx(), which maps a continuous \(t\) to the nearest integer index in the schedule arrays, to look up the corresponding \(\bar{\alpha}_t\).
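A possible shape for this method, written here as free functions over NumPy arrays for clarity (the notebook's version is a method on the schedule class, and its _t_to_idx may differ in detail; the rounding rule below is an assumption):

```python
import numpy as np

def _t_to_idx(t, T):
    # Map continuous t in [0, 1] to the nearest integer schedule index
    return np.clip(np.rint(t * (T - 1)).astype(int), 0, T - 1)

def q_sample(x0, t, alpha_bars, rng):
    """Corrupt clean 2D points x0 to noise level t; returns (x_t, eps)."""
    idx = _t_to_idx(t, len(alpha_bars))
    ab = alpha_bars[idx][:, None]        # per-example alpha_bar, broadcast over xy
    eps = rng.standard_normal(x0.shape)  # ground-truth noise, returned for the loss
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return x_t, eps
```

Returning \(\boldsymbol{\epsilon}\) alongside \(\mathbf{x}_t\) matters: the training loss regresses the network's prediction against exactly this noise sample.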
The overall network architecture is provided: a time-conditioned MLP \(\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\) that predicts the noise added at step \(t\). The input \(\mathbf{x}_t\) is first lifted with a NeRF-style sinusoidal positional encoding before being fed into the network: \[ \gamma(\mathbf{x}_t) = \bigl[\mathbf{x}_t,\; \sin(2^0 \mathbf{x}_t),\; \cos(2^0 \mathbf{x}_t),\; \dots,\; \sin(2^{L-1} \mathbf{x}_t),\; \cos(2^{L-1} \mathbf{x}_t)\bigr] \] where \(L\) is the number of frequency bands. The scalar \(t \in [0,1]\) is concatenated directly (without encoding) to the encoded \(\mathbf{x}_t\), and the combined vector is passed through 3 hidden layers of width 48 with ReLU activations, followed by a linear output layer that produces a 2D noise prediction.
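The encoding \(\gamma(\mathbf{x}_t)\) can be written in a few lines; this NumPy sketch mirrors the formula above (the notebook's _positional_encoding is the PyTorch equivalent):

```python
import numpy as np

def positional_encoding(x, L=6):
    """NeRF-style encoding: [x, sin(2^0 x), cos(2^0 x), ..., sin(2^{L-1} x), cos(2^{L-1} x)]."""
    feats = [x]  # keep the raw coordinates as the first block
    for i in range(L):
        feats.append(np.sin(2.0**i * x))
        feats.append(np.cos(2.0**i * x))
    return np.concatenate(feats, axis=-1)
```

For 2D input with \(L\) bands the output has \(2 + 4L\) features per point; the exponentially spaced frequencies let a small MLP represent sharp spatial detail in the noise field.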
Copy your positional encoding implementation from the A3b NeRF assignment into _positional_encoding. Then read through forward to understand how the encoded input is assembled and passed through the MLP. No other changes are needed in this step.
Implement train_step(), which performs one gradient update. Each call should carry out the following steps:
1. Sample a batch of clean points \(\mathbf{x}_0\) from the dataset and random times \(t \in [0, 1]\).
2. Call q_sample() to obtain the noisy sample \(\mathbf{x}_t\) and the ground-truth noise \(\boldsymbol{\epsilon}\).
3. Predict the noise with the network and compute the MSE loss against \(\boldsymbol{\epsilon}\).
4. Backpropagate and take an optimizer step.
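A hedged PyTorch sketch of one training step, assuming a model callable as model(x_t, t), a schedule whose q_sample() accepts torch tensors, and a standard optimizer (all names illustrative; adapt to the notebook's skeleton):

```python
import torch
import torch.nn.functional as F

def train_step(model, schedule, optimizer, x0, device="cpu"):
    """One DDPM gradient update on a batch of clean 2D points x0."""
    batch = x0.shape[0]
    # One random continuous time per example
    t = torch.rand(batch, device=device)
    # Forward-corrupt the batch, keeping the ground-truth noise
    x_t, eps = schedule.q_sample(x0, t)
    # Regress the predicted noise against the true noise (the simple DDPM loss)
    eps_pred = model(x_t, t)
    loss = F.mse_loss(eps_pred, eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

This is the "simple" objective from Ho et al.: plain MSE on the noise, with no loss weighting across time steps.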
Implement the core reverse step inside ddpm_sample(). The reverse process starts from pure Gaussian noise \(\mathbf{x}_T\) and iterates from step \(T{-}1\) down to \(0\), progressively denoising until structured samples emerge. At each step:
1. Predict the noise \(\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\) with the trained network.
2. Compute the posterior mean using the reverse-step formula from the Mathematical Background section.
3. Add Gaussian noise scaled by \(\sqrt{\tilde{\beta}_t}\), except at the final step \(t = 0\), where no noise is added.
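The full sampling loop might look like this NumPy sketch, assuming a trained predictor eps_model(x, t) that expects \(t \in [0, 1]\) and a schedule object exposing T, alphas, alpha_bars, and posterior_var arrays (all names illustrative):

```python
import numpy as np

def ddpm_sample(eps_model, schedule, n=1000, rng=None):
    """Generate n 2D points by running the reverse chain from pure noise."""
    rng = rng or np.random.default_rng(0)
    T = schedule.T
    x = rng.standard_normal((n, 2))                  # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        a_t = schedule.alphas[t]
        ab_t = schedule.alpha_bars[t]
        eps = eps_model(x, np.full(n, t / (T - 1)))  # network takes t in [0, 1]
        mean = (x - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)
        if t > 0:
            x = mean + np.sqrt(schedule.posterior_var[t]) * rng.standard_normal(x.shape)
        else:
            x = mean                                 # final step: no noise
    return x
```

The same three operations (predict, compute the mean, add scaled noise) are exactly what reverseProcess() will perform per particle in the shader in Part B.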
After training and sampling are complete, run the serialization cell at the end of the notebook. It writes a file called serialized_model.txt, which encodes your trained network weights as a GLSL function using the same packed-matrix approach from the NeRF assignment. Open the file and paste its contents into the queryNetwork placeholder in fragment.glsl, replacing the stub return vec2(0.0). No implementation is required for this step.
The shader implementation lives in fragment.glsl. Before implementing the two functions below, paste the contents of serialized_model.txt into the queryNetwork placeholder at the top of the file, replacing the stub return vec2(0.0). This fills in your trained neural network weights so the shader can predict noise at any position and time.
Implement forwardProcess(), which performs one Markov step of the forward diffusion chain. Given the current particle position xt and the next time step t_next, apply:
\[
\mathbf{x}' = \sqrt{1 - \beta(t)}\,\mathbf{x} + \sqrt{\beta(t)}\,\boldsymbol{\epsilon}
\]
The Gaussian noise \(\boldsymbol{\epsilon}\) is already drawn for you in the local variable z, and the schedule function beta() is provided in the shader.
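For reference, the same single-step update expressed in NumPy (the shader version operates per particle, with z and beta() already supplied; the function below is only an illustration of the math):

```python
import numpy as np

def forward_step(x, beta_t, z):
    """One Markov step of the forward chain: x' = sqrt(1 - beta) * x + sqrt(beta) * z."""
    return np.sqrt(1.0 - beta_t) * x + np.sqrt(beta_t) * z
```

At \(\beta_t = 0\) the step is the identity, and as \(\beta_t\) grows each step blends more of the particle's position toward fresh Gaussian noise.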
Implement reverseProcess(), which performs one DDPM reverse step. This mirrors what you implemented in Python in Step 5. Given the current position xt, the current time t, and the previous time t_prev:
1. Predict the noise with queryNetwork(xt, t).
2. Compute the posterior mean using the reverse-step formula from the Mathematical Background section.
3. Add Gaussian noise scaled by \(\sqrt{\tilde{\beta}_t}\), skipping the noise term when t_prev ≤ 0.0.
The Gaussian noise variable z, and the helper functions beta() and alphaBar(), are already provided in the shader.
In the Creative Expression section of this assignment, train the model on a different 2D distribution (for example, using the image-based sampler with a custom image, or designing your own point distribution). Re-serialize the weights, paste them into the shader, and showcase the resulting animation. You are also encouraged to modify the shader to customize the visual presentation (for example, experimenting with particle colors, sizes, or trails). The creative expression theme for this assignment is From Chaos to Order.
Submit the following files:
- diffusion.ipynb with all cell outputs visible
- fragment.glsl with trained weights pasted in and both functions implemented

This assignment is worth a total of 8 points, with the grading criteria outlined as follows:
You are encouraged to share your graphical work with the class. If you want to do so, please upload your video to the Ed Discussion post A6 Gallery: From Chaos to Order. Share with us your unique diffusion animation!