Stable Diffusion Survey - CTyun (天翼云) Developer Community

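1 Introduction to Stable Diffusion

Stable Diffusion (SD), released in 2022, is an emerging machine learning technique based on Latent Diffusion Models.

SD supports text-to-image generation, image-to-image generation and image inpainting. Text-to-image generates an image from an input text description and is SD's core capability. Image-to-image and inpainting are two extensions built on top of it.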
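Image-to-image reuses the text-to-image machinery. Instead of starting from pure noise, it adds noise to the encoded input image up to an intermediate step and then runs only the remaining denoising steps. The sketch below maps a `strength` value in [0, 1] to the sampler steps that actually run. The parameter name `strength` and this rounding rule follow a convention seen in open-source pipelines such as diffusers, but treat both as illustrative assumptions.

```python
from typing import List


def img2img_schedule(num_inference_steps: int, strength: float) -> List[int]:
    """Return the indices of the sampler steps that image-to-image actually runs.

    strength = 0.0 keeps the input image unchanged (no steps run);
    strength = 1.0 discards it and behaves like text-to-image (all steps run).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Number of steps to run, counted back from the end of the schedule.
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = num_inference_steps - init_steps
    # The input latent is noised to the level of step t_start, then denoised from there on.
    return list(range(t_start, num_inference_steps))


if __name__ == "__main__":
    steps = img2img_schedule(50, 0.75)
    print(len(steps), steps[0])
```

With 50 sampler steps and strength 0.75, only the last 37 steps run. As a result, more of the input image's layout survives than at strength 1.0.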

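Stable Diffusion is currently used mainly for entertainment and creative work. In the future it is expected to find wider application in areas such as design, education and medicine.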

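The main characteristics of SD are:

1. Open source: the code and model weights are publicly released and can be used and modified. The code is permissively licensed (for example, the Stability-AI/stablediffusion repository uses MIT). The weights are released under the CreativeML OpenRAIL-M family of licenses, which adds use-based restrictions;

2. High quality: the generated images are of relatively high quality, with rich detail and vivid colors;

3. Fast: diffusion runs in a compressed latent space rather than in pixel space, so a GPU can generate large numbers of high-quality images quickly;

4. Controllable: the style and content of the generated images can be controlled in fine detail;

5. Interpretable: in principle the model's decision process can be analyzed, which may help avoid generating harmful content;

6. Fine-tunable: users can fine-tune the model on their own datasets so that it generates images better suited to their needs.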

2 How Stable Diffusion Works

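SD is a latent diffusion model that injects a text condition into its UNet (through cross-attention) to generate images from text. Its core comes from Latent Diffusion. Conventional diffusion models generate directly in pixel space, whereas Latent Diffusion generates in a latent space.

Latent Diffusion first uses an autoencoder to compress images into the latent space. A diffusion model then generates the image latents. Finally, the latents are fed into the autoencoder's decoder to produce the generated image, as illustrated in the accompanying figure.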
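The generation path described above (text condition, denoising in latent space, decoding) can be sketched end to end on plain Python lists. This is a toy under stated assumptions, not SD's code:

- `encode_text`, `predict_noise` and `decode` are hypothetical stand-ins for the text encoder, the text-conditioned UNet and the VAE decoder.
- The noise schedule is the linear DDPM one.
- Each update is a deterministic DDIM-style step.

The demo uses an "oracle" noise predictor that already knows the target latent. The loop therefore recovers that latent up to floating-point error.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def make_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> Vector:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (DDPM-style)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def generate(prompt: str,
             encode_text: Callable[[str], Vector],
             predict_noise: Callable[[Vector, int, Vector], Vector],
             decode: Callable[[Vector], Vector],
             alpha_bars: Vector,
             latent_dim: int = 16,
             seed: int = 0) -> Vector:
    """Text -> condition, noise -> denoised latent (deterministic DDIM steps), latent -> image."""
    rng = random.Random(seed)
    cond = encode_text(prompt)                               # hypothetical text encoder
    x = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]     # start from pure noise in latent space
    for t in reversed(range(len(alpha_bars))):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = predict_noise(x, t, cond)                      # hypothetical text-conditioned UNet
        # Estimate the clean latent, then step to the less noisy level t-1 (DDIM with eta = 0)
        x0 = [(xi - math.sqrt(1.0 - ab_t) * ei) / math.sqrt(ab_t) for xi, ei in zip(x, eps)]
        x = [math.sqrt(ab_prev) * a + math.sqrt(1.0 - ab_prev) * e for a, e in zip(x0, eps)]
    return decode(x)                                         # hypothetical VAE decoder


if __name__ == "__main__":
    alpha_bars = make_alpha_bars()
    target = [0.5] * 16  # pretend this is the latent that "matches" the prompt

    def oracle_noise(x: Vector, t: int, cond: Vector) -> Vector:
        # Toy predictor that knows the clean latent (`cond`) and returns the exact noise in x
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * ci) / math.sqrt(1.0 - ab) for xi, ci in zip(x, cond)]

    image = generate("a cat", lambda p: target, oracle_noise, lambda z: [2.0 * v for v in z], alpha_bars)
    print([round(v, 6) for v in image[:4]])
```

A real UNet only approximates the noise, so the sampler's choice and step count affect the result. The overall control flow, however, is the same: encode the text, denoise a latent, decode.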

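SD involves the following two main concepts:

Diffusion process: in physics, a diffusion process is a random walk in which particles move randomly from one position to another. In diffusion models, the forward process gradually adds Gaussian noise to the data over many small steps until the data is nearly pure noise. The model is trained to reverse this process, removing the noise step by step. The diffusion itself does not reduce dimensionality. In SD, the move to a lower-dimensional space is done by the autoencoder, and diffusion then operates inside that latent space.
Diffusion equation: the diffusion equation is the mathematical equation that describes a diffusion process and gives the probability distribution of the random walk. In diffusion models, the forward process has a closed-form Gaussian distribution at every step. A noisy sample at any step can therefore be drawn directly from the clean data and used as a training example.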
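In diffusion models the forward noising process has a closed form:

x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ε ~ N(0, I) and ᾱ_t = (1 − β_1)(1 − β_2)…(1 − β_t) for a noise schedule β_1, …, β_T.

The sketch below samples it in pure Python. The "scaled linear" schedule (β spaced linearly in square-root space from 0.00085 to 0.012 over 1000 steps) is the default configuration commonly distributed with SD v1 checkpoints. Treat it as an assumption rather than a detail from this survey.

```python
import math
import random
from typing import List


def scaled_linear_alpha_bars(num_steps: int = 1000,
                             beta_start: float = 0.00085,
                             beta_end: float = 0.012) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), with betas spaced linearly in sqrt space."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) directly from x0."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    ab = scaled_linear_alpha_bars()
    rng = random.Random(0)
    for t in (0, 499, 999):
        # The signal weight sqrt(alpha_bar_t) shrinks toward 0 as t grows: the data fades into noise
        print(t, round(math.sqrt(ab[t]), 4), [round(v, 3) for v in q_sample([1.0] * 4, t, ab, rng)])
```

Because any step can be sampled in one shot, training can pick a random t per example instead of simulating the whole chain.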

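Stable Diffusion requires a pretrained autoencoder (AutoEncoder) consisting of an encoder and a decoder.

The encoder compresses an image, and diffusion is carried out in the latent space. The decoder finally maps the result back to the original pixel space. The paper behind the model calls this approach perceptual compression.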
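To give a sense of scale for perceptual compression: the SD v1 autoencoder downsamples each spatial side by a factor of 8 and uses 4 latent channels. A 512×512 RGB image therefore becomes a 4×64×64 latent. The helpers below (hypothetical names, for illustration only) compute the latent shape and how many times fewer values the diffusion model has to process.

```python
from typing import Tuple


def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> Tuple[int, int, int]:
    """Shape (C, H/f, W/f) of the latent produced by an f-times downsampling encoder."""
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return channels, height // factor, width // factor


def compression_ratio(height: int, width: int, image_channels: int = 3, **kwargs) -> float:
    """Number of values in the RGB image divided by the number of values in its latent."""
    c, h, w = latent_shape(height, width, **kwargs)
    return (image_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(512, 512))        # (4, 64, 64)
    print(compression_ratio(512, 512))   # 48.0
```

The same 48× ratio holds at 768×768 (a 4×96×96 latent). This reduction is why diffusing in latent space is much cheaper than diffusing over raw pixels.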

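Stable Diffusion is trained as follows:

1) Train an autoencoder (a KL-regularized VAE). It compresses images into a latent space with a spatial downsampling factor of 8 and decodes latents back into images;

2) Encode the image captions with a pretrained, frozen CLIP text encoder (ViT-L/14 for SD v1) to obtain text embeddings;

3) On a large image-text dataset (subsets of LAION-5B), add noise to the image latents following the DDPM forward process. Train the UNet to predict the added noise, with the text embeddings injected through cross-attention;

4) Randomly drop the text condition for a fraction of the training samples (10% for SD v1), so that classifier-free guidance can be used at sampling time;

5) At inference, start from random noise in latent space and denoise it iteratively with the UNet under text guidance. Then decode the result with the autoencoder's decoder to obtain an image that matches the text description.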
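In SD the UNet is trained to predict the noise added to a latent. The text condition is randomly dropped for some samples, so that at sampling time the conditional and unconditional predictions can be combined with classifier-free guidance. The minimal sketch below covers both pieces on plain lists. The function names, the 10% drop rate default and the guidance scale of 7.5 (a commonly used default) are illustrative assumptions. `predict_noise` is a hypothetical stand-in for the UNet, and `None` denotes the empty (unconditional) prompt.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def training_loss(x0: Vector, cond: Vector,
                  predict_noise: Callable[[Vector, int, Optional[Vector]], Vector],
                  alpha_bars: Vector, rng: random.Random, p_uncond: float = 0.1) -> float:
    """One noise-prediction (epsilon) MSE term for a single latent x0 and its text embedding."""
    t = rng.randrange(len(alpha_bars))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    ab = alpha_bars[t]
    x_t = [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, eps)]
    # Drop the text condition for a fraction of samples so the same UNet learns both predictions
    used_cond = None if rng.random() < p_uncond else cond
    pred = predict_noise(x_t, t, used_cond)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, guidance_scale: float = 7.5) -> Vector:
    """Classifier-free guidance at sampling time: eps_uncond + s * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    print(cfg_combine([0.0, 1.0], [1.0, 1.0]))  # [7.5, 1.0]
```

A guidance scale of 1 reproduces the purely conditional prediction. Larger values push samples further toward the text, usually at some cost in diversity.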

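3 Problems Facing Stable Diffusion

Stable Diffusion can help users generate the scenes and images they want quickly and accurately. However, serving Stable Diffusion currently faces the following problems: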

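1) The request throughput of a single Pod is limited. If too many requests are routed to the same Pod, the server becomes overloaded and fails. The number of requests each Pod processes concurrently must therefore be controlled precisely.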
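One way to enforce a per-Pod concurrency cap inside the serving process is a counter or semaphore in front of the generation call, rejecting requests beyond the limit. The sketch below uses `asyncio`. `ConcurrencyLimiter`, `fake_generate` and the reject-when-full policy are hypothetical choices for illustration. In Kubernetes the same cap is often enforced at the platform level instead, for example through Knative Serving's `containerConcurrency`.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ConcurrencyLimiter:
    """Admit at most `limit` in-flight requests per Pod; reject the rest immediately."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be >= 1")
        self.limit = limit
        self.in_flight = 0

    async def run(self, handler: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        # Check and increment happen without an intervening await, so no lock is needed
        # on a single-threaded event loop.
        if self.in_flight >= self.limit:
            # Fail fast (e.g. map to HTTP 429/503) so the caller can retry on another Pod
            raise RuntimeError("pod busy")
        self.in_flight += 1
        try:
            return await handler(*args)
        finally:
            self.in_flight -= 1


async def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a slow GPU generation call."""
    await asyncio.sleep(0.05)
    return f"image for {prompt}"


async def main() -> None:
    limiter = ConcurrencyLimiter(limit=1)
    results = await asyncio.gather(
        limiter.run(fake_generate, "a"),
        limiter.run(fake_generate, "b"),
        return_exceptions=True,
    )
    print(results)  # the second request is rejected while the first is still running


if __name__ == "__main__":
    asyncio.run(main())
```

Rejecting instead of queueing keeps latency predictable on a saturated Pod. A bounded queue is the alternative when callers cannot retry elsewhere.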

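2) GPU resources are scarce and expensive. Resources should be used on demand, and GPUs should be released promptly during off-peak periods.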
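The on-demand goal can be expressed as a scaling rule driven by in-flight requests. Size the workload to the number of Pods needed at the per-Pod concurrency cap from problem 1, and scale to zero after an idle period. The sketch below shows such a rule. The function name, the idle window and the replica cap are hypothetical. Production setups usually delegate this to an autoscaler that supports scale-to-zero, such as Knative or KEDA, rather than hand-written code.

```python
import math


def desired_replicas(in_flight: int, per_pod_concurrency: int, current: int,
                     idle_seconds: float, scale_to_zero_after: float = 300.0,
                     max_replicas: int = 8) -> int:
    """Hypothetical scaling rule: enough Pods for the current load, zero after a long idle period."""
    if per_pod_concurrency < 1:
        raise ValueError("per_pod_concurrency must be >= 1")
    if in_flight > 0:
        # Each Pod may hold at most per_pod_concurrency requests; cap the total GPU spend
        return min(math.ceil(in_flight / per_pod_concurrency), max_replicas)
    # No traffic: keep the current Pods warm until the idle window elapses, then release every GPU
    return 0 if idle_seconds >= scale_to_zero_after else current


if __name__ == "__main__":
    print(desired_replicas(in_flight=5, per_pod_concurrency=2, current=1, idle_seconds=0))    # 3
    print(desired_replicas(in_flight=0, per_pod_concurrency=2, current=3, idle_seconds=600))  # 0
```

Scaling to zero has a trade-off: it frees GPUs, but the next request pays the cold-start cost of pulling the image and loading the model. Real autoscalers also smooth decisions over a time window before scaling down.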

4 Functional Testing

Image build (Dockerfile):

# Base image: CUDA 11.8 runtime on Ubuntu 22.04
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

# Use the Aliyun apt mirror, install system packages, and clean the apt lists in the same layer
RUN set -ex && \
    sed -i s/archive.ubuntu.com/mirrors.aliyun.com/g /etc/apt/sources.list && \
    sed -i s/security.ubuntu.com/mirrors.aliyun.com/g /etc/apt/sources.list && \
    apt-get update && \
    apt-get install -y wget git python3 python3-venv python3-pip libglib2.0-0 ffmpeg libsm6 libxext6 && \
    rm -rf /var/lib/apt/lists/*

# Package index for dependencies the webui installs at startup (read by its launch script)
ENV INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple

# Point pip at the Tsinghua mirror; trusted-host expects a host name, not a URL
RUN mkdir -p ~/.pip && \
    printf '[global]\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\n[install]\ntrusted-host = pypi.tuna.tsinghua.edu.cn\n' > ~/.pip/pip.conf

# PyTorch wheels built for CUDA 11.7 (they bundle their own CUDA runtime libraries)
RUN python3 -m pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

# Pinned Python dependencies of the webui
RUN python3 -m pip install git+https://github.com/TencentARC/GFPGAN.git@8d2447a2d918f8eba5a4a01463fd48e45126a379 --prefer-binary
RUN python3 -m pip install git+https://github.com/openai/CLIP.git@d50d76daa670286dd6cacf3bcd80b5e4823fc8e1 --prefer-binary
RUN python3 -m pip install git+https://github.com/mlfoundations/open_clip.git@bb6e834e9c70d9c27d0dc3ecedeebeaeb1ffad6b --prefer-binary
RUN python3 -m pip install xformers==0.0.16rc425 --prefer-binary
RUN python3 -m pip install pyngrok --prefer-binary

# The webui itself, plus the upstream repositories it loads, checked out at pinned commits
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git /stable-diffusion-webui
RUN git clone https://github.com/Stability-AI/stablediffusion.git /stable-diffusion-webui/repositories/stable-diffusion-stability-ai && \
    git -C /stable-diffusion-webui/repositories/stable-diffusion-stability-ai checkout cf1d67a6fd5ea1aa600c4df58e5b47da45f6bdbf
RUN git clone https://github.com/CompVis/taming-transformers.git /stable-diffusion-webui/repositories/taming-transformers && \
    git -C /stable-diffusion-webui/repositories/taming-transformers checkout 24268930bf1dce879235a7fddd0b2355b84d7ea6
RUN git clone https://github.com/crowsonkb/k-diffusion.git /stable-diffusion-webui/repositories/k-diffusion && \
    git -C /stable-diffusion-webui/repositories/k-diffusion checkout 5b3af030dd83e0297272d861c19477735d0317ec
RUN git clone https://github.com/sczhou/CodeFormer.git /stable-diffusion-webui/repositories/CodeFormer && \
    git -C /stable-diffusion-webui/repositories/CodeFormer checkout c5b4593074ba6214284d6acd5f1719b6c5d739af
RUN git clone https://github.com/salesforce/BLIP.git /stable-diffusion-webui/repositories/BLIP && \
    git -C /stable-diffusion-webui/repositories/BLIP checkout 48211a1594f1321b00f14c9f7a5b4813144b2fb9

RUN python3 -m pip install -r /stable-diffusion-webui/repositories/CodeFormer/requirements.txt --prefer-binary
RUN python3 -m pip install -r /stable-diffusion-webui/requirements_versions.txt --prefer-binary

# Extensions (several are cloned through the gitcode.net and ghproxy.com mirrors)
RUN set -ex && cd /stable-diffusion-webui && \
    git clone https://gitcode.net/ranting8323/sd-webui-additional-networks.git extensions/sd-webui-additional-networks && \
    git clone https://gitcode.net/ranting8323/sd-webui-cutoff extensions/sd-webui-cutoff && \
    git clone https://ghproxy.com/https://github.com/toshiaki1729/stable-diffusion-webui-dataset-tag-editor.git extensions/stable-diffusion-webui-dataset-tag-editor && \
    git clone https://ghproxy.com/https://github.com/yfszzx/stable-diffusion-webui-images-browser extensions/stable-diffusion-webui-images-browser && \
    git clone https://gitcode.net/ranting8323/stable-diffusion-webui-wd14-tagger.git extensions/stable-diffusion-webui-wd14-tagger && \
    git clone https://gitcode.net/overbill1683/stable-diffusion-webui-localization-zh_Hans.git extensions/stable-diffusion-webui-localization-zh_Hans && \
    git clone https://gitcode.net/ranting8323/a1111-sd-webui-tagcomplete.git extensions/a1111-sd-webui-tagcomplete && \
    git clone https://github.com/Mikubill/sd-webui-controlnet.git extensions/sd-webui-controlnet

RUN python3 -m pip install -r /stable-diffusion-webui/extensions/sd-webui-controlnet/requirements.txt --prefer-binary

# Gradio web UI port
EXPOSE 7860

WORKDIR /stable-diffusion-webui/

# --listen: bind to 0.0.0.0; --xformers: memory-efficient attention; --medvram: lower VRAM use at some speed cost;
# --enable-insecure-extension-access: allow managing extensions even when --listen is set
CMD ["python3", "launch.py", "--listen", "--xformers", "--medvram", "--enable-insecure-extension-access"]
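Once the container is running, the webui can also be driven over HTTP. The AUTOMATIC1111 webui exposes its REST endpoints, such as `POST /sdapi/v1/txt2img`, only when launched with the `--api` flag. The CMD above does not include that flag, so adding it is an assumption of this sketch. The helpers use only the standard library. The base URL `http://localhost:7860` and the chosen parameter values are illustrative. The response is assumed to carry base64-encoded images in its `images` field.

```python
import base64
import json
import urllib.request
from typing import List


def build_txt2img_request(prompt: str, base_url: str = "http://localhost:7860",
                          steps: int = 20, width: int = 512, height: int = 512,
                          cfg_scale: float = 7.0, seed: int = -1) -> urllib.request.Request:
    """Build a POST request for the webui's txt2img endpoint (requires --api)."""
    payload = {"prompt": prompt, "steps": steps, "width": width, "height": height,
               "cfg_scale": cfg_scale, "seed": seed}
    return urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body: bytes) -> List[bytes]:
    """Decode the base64 strings in the response's `images` list into raw image bytes."""
    return [base64.b64decode(img) for img in json.loads(response_body)["images"]]


if __name__ == "__main__":
    req = build_txt2img_request("a watercolor lighthouse at dusk")
    # Against a running webui started with --api:
    # with urllib.request.urlopen(req, timeout=600) as resp:
    #     for i, png in enumerate(decode_images(resp.read())):
    #         with open(f"out_{i}.png", "wb") as f:
    #             f.write(png)
    print(req.get_method(), req.full_url)
```

The long timeout reflects that a single request blocks until generation finishes on the GPU.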

Workload deployment manifest:
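A workload for this image would typically be a Kubernetes Deployment with one GPU per Pod. The sketch below builds such a manifest as a Python dict and serializes it to JSON, which `kubectl apply -f` accepts. The image name, labels, probe timings and replica count are hypothetical. `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin, which is assumed to be installed in the cluster.

```python
import json


def sd_webui_deployment(image: str, replicas: int = 1, gpus_per_pod: int = 1) -> dict:
    """Hypothetical Deployment manifest for the webui image built above."""
    labels = {"app": "sd-webui"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "sd-webui", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "sd-webui",
                        "image": image,
                        "ports": [{"containerPort": 7860}],  # matches EXPOSE 7860
                        # Extended resources such as GPUs are requested through limits
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                        # Model loading is slow, so delay readiness checks generously
                        "readinessProbe": {
                            "tcpSocket": {"port": 7860},
                            "initialDelaySeconds": 60,
                            "periodSeconds": 10,
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(sd_webui_deployment("registry.example.com/sd-webui:latest"), indent=2))
```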

Dataset download: https://huggingface.co/

