CS 180 Project 5

Part A0: Set Up

After 20 denoising and 20 upsampling steps, here are the results:

an oil painting of a snowy mountain village
a man wearing a hat
a rocket ship

The first image doesn't quite capture the texture of an oil painting; instead, it looks more like a children's cartoon.

The second image is pretty realistic. It even kept the background an empty void, since no background description is provided in the prompt.

The last image also looks cartoony, perhaps because the phrase “rocket ship” appears more often in descriptions of cartoons than of actual spacecraft.

I experimented with the prompt “a man wearing a hat” using 100, 500, and 1000 inference steps.

100
500
1000

Interestingly, the men look different in each of the images. The 1000-step output shows an outdoor background, as opposed to the previous attempts, which kept the background a solid color. I also observe a strange halo around the person in the 100- and 500-step outputs.

Part A1: Sampling Loops

1.1: Implementing the forward pass

Noised images at different noise levels:

250
500
750
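
For reference, the forward process that produced these is x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I). A minimal sketch of that step, assuming alphas_cumprod is the scheduler's cumulative-product-of-alphas tensor (variable names here are my own):

```python
import torch

def forward(im, t, alphas_cumprod):
    """Noise a clean image `im` to timestep `t`.

    Implements x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,
    where abar_t = alphas_cumprod[t] (assumed to come from the scheduler).
    """
    abar_t = alphas_cumprod[t]
    eps = torch.randn_like(im)  # eps ~ N(0, I)
    return torch.sqrt(abar_t) * im + torch.sqrt(1 - abar_t) * eps
```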

1.2: Classical Denoising

Using the Gaussian blur denoising method, I got the following results:

250
500
750

These just look like blurred versions of the noisy images.
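
This baseline is just a Gaussian blur applied to each noisy image; a quick sketch using torchvision (the kernel size and σ are example values, not necessarily the ones I used):

```python
from torchvision.transforms.functional import gaussian_blur

def blur_denoise(noisy_im, kernel_size=5, sigma=2.0):
    # Blurring suppresses high-frequency noise, but it also blurs away
    # image detail, which is why the results above look so soft.
    return gaussian_blur(noisy_im, kernel_size=kernel_size, sigma=sigma)
```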

1.3: Implementing One Step Denoising

Using the noise estimate from the UNet, here are the denoised results at the three timesteps:

t=250
t=500
t=750

The denoised result at t=250 looks the best, presumably because its input is closer to the original image than the other two. At the other end of the spectrum, the denoised result at t=750 still looks quite blurry and doesn't resemble the Campanile much.
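
The one-step estimate simply inverts the forward equation using the UNet's noise prediction; a minimal sketch (noise_est is the noise predicted by the stage-1 UNet for x_t at timestep t):

```python
import torch

def one_step_denoise(x_t, t, noise_est, alphas_cumprod):
    """Estimate the clean image from x_t given the UNet's noise estimate.

    Inverts x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps:
        x0_est = (x_t - sqrt(1 - abar_t) * noise_est) / sqrt(abar_t)
    """
    abar_t = alphas_cumprod[t]
    return (x_t - torch.sqrt(1 - abar_t) * noise_est) / torch.sqrt(abar_t)
```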

1.4: Implementing Iterative Denoising

Here are the intermediate steps of the iterative denoising process:

t=690
t=540
t=390
t=240
t=90
final image
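
Each of the steps above moves from a noisier timestep t to a cleaner timestep t′ using the DDPM-style update over strided timesteps. A sketch of one update (variance term omitted; variable names are my own):

```python
import torch

def denoise_step(x_t, t, t_prime, noise_est, alphas_cumprod):
    """One iterative-denoising update from timestep t to a cleaner t'.

    x_{t'} = (sqrt(abar_t') * beta / (1 - abar_t)) * x0_est
           + (sqrt(alpha) * (1 - abar_t') / (1 - abar_t)) * x_t
    with alpha = abar_t / abar_t', beta = 1 - alpha, and x0_est the
    one-step clean-image estimate from the current noise prediction.
    """
    abar_t, abar_tp = alphas_cumprod[t], alphas_cumprod[t_prime]
    alpha = abar_t / abar_tp
    beta = 1 - alpha
    x0_est = (x_t - torch.sqrt(1 - abar_t) * noise_est) / torch.sqrt(abar_t)
    return (torch.sqrt(abar_tp) * beta / (1 - abar_t)) * x0_est \
         + (torch.sqrt(alpha) * (1 - abar_tp) / (1 - abar_t)) * x_t
```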

Compared to the previous methods:

one-step estimation
Gaussian blur

1.5 Diffusion Model Sampling

The results don’t look too great. I tried with various seeds to no avail.
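
Sampling here is just the iterative denoising loop started from pure noise; a sketch, assuming an iterative_denoise function like the one above, where i_start=0 means "start at the noisiest strided timestep":

```python
import torch

def sample(iterative_denoise, n_samples=5, shape=(3, 64, 64), device="cuda"):
    # Draw pure Gaussian noise and run the full denoising loop on it.
    return [iterative_denoise(torch.randn(1, *shape, device=device), i_start=0)
            for _ in range(n_samples)]
```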

1.6 Classifier-Free Guidance

Now the generations look much nicer! Here are the results:
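
Classifier-free guidance blends an unconditional and a conditional noise estimate at every step; a minimal sketch (γ = 7 is shown only as an example scale; the names are my own):

```python
def cfg_noise_estimate(noise_cond, noise_uncond, gamma=7.0):
    # eps = eps_uncond + gamma * (eps_cond - eps_uncond)
    # gamma > 1 extrapolates past the conditional prediction,
    # which is what makes the generations look much nicer.
    return noise_uncond + gamma * (noise_cond - noise_uncond)
```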

1.7 Image-to-image Translation

test_im

Here are the results from running iterative_denoise_cfg on test_im:

starting index=1
starting index=3

starting index=5
starting index=7
starting index=10
starting index=20

Buddhist Temple

Here’s the other image I tested on (original):

Tibetan Buddhist temple found in Gansu province

Results:

starting index=1
starting index=3
starting index=5
starting index=7
starting index=10
starting index=20

Cow

Original image:

Results:

starting index=1
starting index=3
starting index=5
starting index=7
starting index=10
starting index=20

There is an interesting regression at starting index 7, coming right after starting index 5, where the central shape looks much more like the original cow.
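
All of these edits follow the same SDEdit-style recipe: noise the original image to one of the strided timesteps (a larger starting index means less noise added) and then run the CFG denoising loop from that index. A sketch, where forward and iterative_denoise_cfg are assumed to be the functions from parts 1.1 and 1.6:

```python
def image_to_image(im, i_start, strided_timesteps, forward, iterative_denoise_cfg):
    # Noise the input to strided_timesteps[i_start], then denoise from there.
    # Smaller i_start -> more noise added -> the result drifts further from
    # the input, which is why small starting indices look nothing like the
    # original while larger ones stay close to it.
    t = strided_timesteps[i_start]
    noisy = forward(im, t)
    return iterative_denoise_cfg(noisy, i_start=i_start)
```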

1.7.1: Editing Hand-Drawn Images

For the Avocado:

Original:

Edited:

1
3
5
7
10
20

Hand drawn 1:

Edited:

1
3
5
7
10
20

Hand drawn 2:

Edited:

1
3
5
7
10
20

There is another interesting regression at start index=7, though the result is pretty “creative”.

1.7.2: Inpainting

Test image:

Image
Mask
To Replace
Results

Temple:

Image
Temple Mask
To Replace
Results

Cow:

Image
Cow Mask
To Replace
Results
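
The inpainting loop is the same CFG denoising loop, except that after every step the pixels outside the mask are forced back to the original image noised to the current timestep, so only the masked region is actually generated. A sketch of that per-step projection (mask is the binary "to replace" region; forward is assumed from part 1.1):

```python
def inpaint_project(x_t, t, original_im, mask, forward):
    # x_t <- m * x_t + (1 - m) * forward(x_orig, t)
    # Inside the mask: keep the freshly generated content.
    # Outside the mask: snap back to the original image at this noise level.
    return mask * x_t + (1 - mask) * forward(original_im, t)
```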

1.7.3 Text-Conditioned Image-to-image Translation

For my own images, I'm using the cow with two different prompts.

Prompt: a photo of a dog

start index=20
Original Image

The composition of this last image very closely resembles that of the original cow! So it seems you don't need that many timesteps of noise if you want the composition to remain similar.

Prompt: a lithograph of a skull

start index=20
Original Image

1.8: Visual Anagrams

an oil painting of people around a campfire + an oil painting of an old man

a photo of the amalfi coast + an oil painting of a snowy mountain village

a lithograph of a skull + a lithograph of waterfalls
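
Each anagram averages two noise estimates per step: one for the image under the first prompt, and one for the vertically flipped image under the second prompt, flipped back before averaging. A sketch of the combined estimate (unet_eps is a stand-in for a call that returns the UNet's noise prediction for an image, timestep, and prompt embedding):

```python
import torch

def anagram_noise_estimate(x_t, t, unet_eps, embeds_1, embeds_2):
    eps_1 = unet_eps(x_t, t, embeds_1)  # first prompt, upright image
    # Second prompt sees the image upside down; flip its estimate back upright.
    x_flipped = torch.flip(x_t, dims=[-2])
    eps_2 = torch.flip(unet_eps(x_flipped, t, embeds_2), dims=[-2])
    return (eps_1 + eps_2) / 2
```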

1.9 Hybrid images

High pass: an oil painting of people around a campfire

Low pass: a photo of the amalfi coast

High pass: a lithograph of waterfalls

Low pass: a lithograph of a skull

High pass: a pencil

Low pass: a rocket ship
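
For the hybrids, the two prompts' noise estimates are combined in frequency space: the low frequencies of one plus the high frequencies of the other, so the image reads as one prompt up close and the other from far away. A sketch using a Gaussian low-pass (the kernel size and σ are example values):

```python
from torchvision.transforms.functional import gaussian_blur

def hybrid_noise_estimate(eps_low, eps_high, kernel_size=33, sigma=2.0):
    low = gaussian_blur(eps_low, kernel_size=kernel_size, sigma=sigma)
    # High-pass = original estimate minus its low-pass component.
    high = eps_high - gaussian_blur(eps_high, kernel_size=kernel_size, sigma=sigma)
    return low + high
```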

Part B1: Training a Single-Step Denoising UNet

1.1 - 1.2 Implementing the UNet & Training a Denoiser

To produce the noised images, I sample Gaussian noise with randn and add it to the clean images. Here are the noised images:
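
Concretely, the noised images follow z = x + σ·ε with ε drawn from randn; a minimal sketch:

```python
import torch

def add_noise(x, sigma):
    # z = x + sigma * eps,  eps ~ N(0, I)
    return x + sigma * torch.randn_like(x)
```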

1.2.1 Training

Loss graph of training batches:

Some results

Original Images
Noised Images
Results after epoch 1
Results after epoch 5

1.2.2 Out-of-Distribution Testing

Here are the results for out-of-distribution (OOD) testing with different values of σ:

Part B2:

2.1 - 2.3: Adding Time Conditioning to UNet

Time Conditional Loss
Epoch 5

Epoch 20
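
To condition the UNet on the timestep, the (normalized) scalar t is passed through small fully-connected blocks and injected into intermediate feature maps. A minimal sketch of the idea (this is an illustration, not my exact architecture; out_ch must match the channel count of the feature map being modulated):

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Small MLP that turns a normalized timestep into a per-channel vector."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_ch, out_ch), nn.GELU(),
                                 nn.Linear(out_ch, out_ch))

    def forward(self, t):
        return self.net(t)

def inject_time(feat, t, fc_block):
    # t has shape (N, 1) with values normalized to [0, 1].
    emb = fc_block(t)                    # (N, C)
    return feat * emb[:, :, None, None]  # broadcast the modulation over H, W
```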

2.4: Add Class Conditioning to UNet

Class Conditional Loss
Epoch 5
Epoch 20
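
Class conditioning works the same way, but with a one-hot class vector, and with the class dropped (zeroed out) for a fraction of each batch so the model also learns the unconditional estimate needed for classifier-free guidance at sampling time. A sketch of preparing the conditioning vector (the 10% drop rate is an example; names are my own):

```python
import torch
import torch.nn.functional as F

def make_class_condition(labels, num_classes=10, p_uncond=0.1):
    # One-hot encode the digit class, then zero out the whole vector for a
    # random fraction of the batch so the UNet also learns to denoise
    # unconditionally (needed for CFG at sampling time).
    c = F.one_hot(labels, num_classes).float()  # (N, num_classes)
    keep = (torch.rand(labels.shape[0], device=labels.device) > p_uncond).float()
    return c * keep[:, None]
```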