CS 180 Project 5

Part A0: Set Up

After 20 denoising and 20 upsampling steps, here are the results:

an oil painting of a snowy mountain village
a man wearing a hat
a rocket ship

The first image doesn't quite capture the texture of an oil painting; instead, it looks more like a children's cartoon.

The second image is pretty realistic. It even kept the background an empty void, since no background description is provided in the prompt.

The last image also looks cartoony, perhaps because the phrase “rocket ship” appears more often in descriptions of cartoons than of actual spacecraft.

I experimented with the prompt “a man wearing a hat” using 100, 500, and 1000 inference steps.

100
500
1000

Interestingly, the men look different in each of the images. The 1000-step output shows an outdoor background, as opposed to the previous attempts, which kept the background a solid color. I also observe a strange halo around the person in the 100- and 500-step outputs.

Part A1: Sampling Loops

1.1: Implementing the forward pass

Noised images at different noise levels:

250
500
750
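
For reference, the forward process that produced these is x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I). A minimal sketch of that step, assuming alphas_cumprod is the scheduler's cumulative-product-of-alphas tensor (variable names here are my own):

```python
import torch

def forward(im, t, alphas_cumprod):
    """Noise a clean image `im` to timestep `t`.

    Implements x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,
    where abar_t = alphas_cumprod[t] (assumed to come from the scheduler).
    """
    abar_t = alphas_cumprod[t]
    eps = torch.randn_like(im)  # eps ~ N(0, I)
    return torch.sqrt(abar_t) * im + torch.sqrt(1 - abar_t) * eps
```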

1.2: Classical Denoising

Using the Gaussian blur denoising method, I got the following results:

250
500
750

These just look like blurred versions of the noisy images.
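
This baseline is just a Gaussian blur applied to each noisy image; a quick sketch using torchvision (the kernel size and σ are example values, not necessarily the ones I used):

```python
from torchvision.transforms.functional import gaussian_blur

def blur_denoise(noisy_im, kernel_size=5, sigma=2.0):
    # Blurring suppresses high-frequency noise, but it also blurs away
    # image detail, which is why the results above look so soft.
    return gaussian_blur(noisy_im, kernel_size=kernel_size, sigma=sigma)
```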

1.3: Implementing One Step Denoising

Using the noise estimate from the UNet, here are the denoised results at the three timesteps:

t=250
t=500
t=750

The denoised result at t=250 looks the best, presumably because its input is closer to the original image than the other two. At the other end of the spectrum, the denoised result at t=750 still looks quite blurry and doesn't resemble the Campanile much.
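
The one-step estimate simply inverts the forward equation using the UNet's noise prediction; a minimal sketch (noise_est is the noise predicted by the stage-1 UNet for x_t at timestep t):

```python
import torch

def one_step_denoise(x_t, t, noise_est, alphas_cumprod):
    """Estimate the clean image from x_t given the UNet's noise estimate.

    Inverts x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps:
        x0_est = (x_t - sqrt(1 - abar_t) * noise_est) / sqrt(abar_t)
    """
    abar_t = alphas_cumprod[t]
    return (x_t - torch.sqrt(1 - abar_t) * noise_est) / torch.sqrt(abar_t)
```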

1.4: Implementing Iterative Denoising

Here are the intermediate steps of the iterative denoising process:

t=690
t=540
t=390
t=240
t=90
final image
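
Each of the steps above moves from a noisier timestep t to a cleaner timestep t′ using the DDPM-style update over strided timesteps. A sketch of one update (variance term omitted; variable names are my own):

```python
import torch

def denoise_step(x_t, t, t_prime, noise_est, alphas_cumprod):
    """One iterative-denoising update from timestep t to a cleaner t'.

    x_{t'} = (sqrt(abar_t') * beta / (1 - abar_t)) * x0_est
           + (sqrt(alpha) * (1 - abar_t') / (1 - abar_t)) * x_t
    with alpha = abar_t / abar_t', beta = 1 - alpha, and x0_est the
    one-step clean-image estimate from the current noise prediction.
    """
    abar_t, abar_tp = alphas_cumprod[t], alphas_cumprod[t_prime]
    alpha = abar_t / abar_tp
    beta = 1 - alpha
    x0_est = (x_t - torch.sqrt(1 - abar_t) * noise_est) / torch.sqrt(abar_t)
    return (torch.sqrt(abar_tp) * beta / (1 - abar_t)) * x0_est \
         + (torch.sqrt(alpha) * (1 - abar_tp) / (1 - abar_t)) * x_t
```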

Compared to the previous methods:

one-step estimation
Gaussian blur

1.5 Diffusion Model Sampling

The results don’t look too great. I tried with various seeds to no avail.
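
Sampling here is just the iterative denoising loop started from pure noise; a sketch, assuming an iterative_denoise function like the one above, where i_start=0 means "start at the noisiest strided timestep":

```python
import torch

def sample(iterative_denoise, n_samples=5, shape=(3, 64, 64), device="cuda"):
    # Draw pure Gaussian noise and run the full denoising loop on it.
    return [iterative_denoise(torch.randn(1, *shape, device=device), i_start=0)
            for _ in range(n_samples)]
```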

1.6 Classifier-Free Guidance

Now the generations look much nicer! Here are the results:
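
Classifier-free guidance blends an unconditional and a conditional noise estimate at every step; a minimal sketch (γ = 7 is shown only as an example scale; the names are my own):

```python
def cfg_noise_estimate(noise_cond, noise_uncond, gamma=7.0):
    # eps = eps_uncond + gamma * (eps_cond - eps_uncond)
    # gamma > 1 extrapolates past the conditional prediction,
    # which is what makes the generations look much nicer.
    return noise_uncond + gamma * (noise_cond - noise_uncond)
```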

1.7 Image-to-image Translation

test_im

Here are the results from running iterative_denoise_cfg on test_im:

starting index=1
starting index=3

starting index=5
starting index=7
starting index=10
starting index=20

Buddhist Temple

Here’s the other image I tested on (original):

Tibetan Buddhist temple found in Gansu province

Results:

starting index=1
starting index=3
starting index=5
starting index=7
starting index=10
starting index=20

Cow

Original image:

Results:

starting index=1
starting index=3
starting index=5
starting index=7
starting index=10
starting index=20

There is an interesting regression at starting index 7, coming right after starting index 5, where the central shape looks much more like the original cow.
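
All of these edits follow the same SDEdit-style recipe: noise the original image to one of the strided timesteps (a larger starting index means less noise added) and then run the CFG denoising loop from that index. A sketch, where forward and iterative_denoise_cfg are assumed to be the functions from parts 1.1 and 1.6:

```python
def image_to_image(im, i_start, strided_timesteps, forward, iterative_denoise_cfg):
    # Noise the input to strided_timesteps[i_start], then denoise from there.
    # Smaller i_start -> more noise added -> the result drifts further from
    # the input, which is why small starting indices look nothing like the
    # original while larger ones stay close to it.
    t = strided_timesteps[i_start]
    noisy = forward(im, t)
    return iterative_denoise_cfg(noisy, i_start=i_start)
```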

1.7.1: Editing Hand-Drawn Images

For the Avocado:

Original:

Edited:

1
3
5
7
10
20

Hand drawn 1:

Edited:

1
3
5
7
10
20

Hand drawn 2:

Edited:

1
3
5
7
10
20

There is another interesting regression at start index=7, though the result is pretty “creative”.

1.7.2: Inpainting

Test image:

Image
Mask
To Replace
Results

Temple:

Image
Temple Mask
To Replace
Results

Cow:

Image
Cow Mask
To Replace
Results
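
The inpainting loop is the same CFG denoising loop, except that after every step the pixels outside the mask are forced back to the original image noised to the current timestep, so only the masked region is actually generated. A sketch of that per-step projection (mask is the binary "to replace" region; forward is assumed from part 1.1):

```python
def inpaint_project(x_t, t, original_im, mask, forward):
    # x_t <- m * x_t + (1 - m) * forward(x_orig, t)
    # Inside the mask: keep the freshly generated content.
    # Outside the mask: snap back to the original image at this noise level.
    return mask * x_t + (1 - mask) * forward(original_im, t)
```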

1.7.3 Text-Conditioned Image-to-image Translation

For my own images, I'm using the cow with two different prompts.

Prompt: a photo of a dog

start index=20
Original Image

The composition of this last image very closely resembles that of the original cow! So it seems you don't need that many timesteps of noise if you want the composition to remain similar.

Prompt: a lithograph of a skull

start index=20
Original Image

1.8: Visual Anagrams

an oil painting of people around a campfire + an oil painting of an old man

a photo of the amalfi coast + an oil painting of a snowy mountain village

a lithograph of a skull + a lithograph of waterfalls
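
Each anagram averages two noise estimates per step: one for the image under the first prompt, and one for the vertically flipped image under the second prompt, flipped back before averaging. A sketch of the combined estimate (unet_eps is a stand-in for a call that returns the UNet's noise prediction for an image, timestep, and prompt embedding):

```python
import torch

def anagram_noise_estimate(x_t, t, unet_eps, embeds_1, embeds_2):
    eps_1 = unet_eps(x_t, t, embeds_1)  # first prompt, upright image
    # Second prompt sees the image upside down; flip its estimate back upright.
    x_flipped = torch.flip(x_t, dims=[-2])
    eps_2 = torch.flip(unet_eps(x_flipped, t, embeds_2), dims=[-2])
    return (eps_1 + eps_2) / 2
```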

1.9 Hybrid images

High pass: an oil painting of people around a campfire

Low pass: a photo of the amalfi coast

High pass: a lithograph of waterfalls

Low pass: a lithograph of a skull

High pass: a pencil

Low pass: a rocket ship
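
For the hybrids, the two prompts' noise estimates are combined in frequency space: the low frequencies of one plus the high frequencies of the other, so the image reads as one prompt up close and the other from far away. A sketch using a Gaussian low-pass (the kernel size and σ are example values):

```python
from torchvision.transforms.functional import gaussian_blur

def hybrid_noise_estimate(eps_low, eps_high, kernel_size=33, sigma=2.0):
    low = gaussian_blur(eps_low, kernel_size=kernel_size, sigma=sigma)
    # High-pass = original estimate minus its low-pass component.
    high = eps_high - gaussian_blur(eps_high, kernel_size=kernel_size, sigma=sigma)
    return low + high
```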

Part B1: Training a Single-Step Denoising UNet

1.1 - 1.2 Implementing the UNet & Training a Denoiser

To produce the noised images, I sample Gaussian noise with randn and add it to the clean images. Here are the noised images:
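
Concretely, the noised images follow z = x + σ·ε with ε drawn from randn; a minimal sketch:

```python
import torch

def add_noise(x, sigma):
    # z = x + sigma * eps,  eps ~ N(0, I)
    return x + sigma * torch.randn_like(x)
```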

1.2.1 Training

Loss graph of training batches:

Some results

Original Images
Noised Images
Results after epoch 1
Results after epoch 5

1.2.2 Out-of-Distribution Testing

Here are the results for out-of-distribution (OOD) testing with different values of σ:

Part B2:

2.1 - 2.3: Adding Time Conditioning to UNet

Time Conditional Loss
Epoch 5

Epoch 20
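
To condition the UNet on the timestep, the (normalized) scalar t is passed through small fully-connected blocks and injected into intermediate feature maps. A minimal sketch of the idea (this is an illustration, not my exact architecture; out_ch must match the channel count of the feature map being modulated):

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Small MLP that turns a normalized timestep into a per-channel vector."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_ch, out_ch), nn.GELU(),
                                 nn.Linear(out_ch, out_ch))

    def forward(self, t):
        return self.net(t)

def inject_time(feat, t, fc_block):
    # t has shape (N, 1) with values normalized to [0, 1].
    emb = fc_block(t)                    # (N, C)
    return feat * emb[:, :, None, None]  # broadcast the modulation over H, W
```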

2.4: Add Class Conditioning to UNet

Class Conditional Loss
Epoch 5
Epoch 20
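
Class conditioning works the same way, but with a one-hot class vector, and with the class dropped (zeroed out) for a fraction of each batch so the model also learns the unconditional estimate needed for classifier-free guidance at sampling time. A sketch of preparing the conditioning vector (the 10% drop rate is an example; names are my own):

```python
import torch
import torch.nn.functional as F

def make_class_condition(labels, num_classes=10, p_uncond=0.1):
    # One-hot encode the digit class, then zero out the whole vector for a
    # random fraction of the batch so the UNet also learns to denoise
    # unconditionally (needed for CFG at sampling time).
    c = F.one_hot(labels, num_classes).float()  # (N, num_classes)
    keep = (torch.rand(labels.shape[0], device=labels.device) > p_uncond).float()
    return c * keep[:, None]
```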