High-Resolution Image Synthesis with Latent Diffusion Models

High-Resolution Image Synthesis with Latent Diffusion Models: Paper Summary

Title

High-Resolution Image Synthesis with Latent Diffusion Models

Abstract

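Diffusion models (DMs) achieve state-of-the-art results in image synthesis, and their formulation allows the generation process to be guided without retraining. However, because DMs typically operate directly in pixel space, both training and inference require a large amount of compute.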

The paper addresses this limitation by applying DMs in the latent space of a pretrained autoencoder. (Advantages of training in latent space: reduced complexity while preserving detail.)
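To get a feel for the complexity reduction, the following is a minimal arithmetic sketch. The downsampling factor f=8, the 4 latent channels, and the 256x256 image size are assumed example values chosen for illustration; they are not stated in this summary, and the helper names are hypothetical.

```python
# Illustrative arithmetic only: how much smaller a latent grid is than the pixel grid.
# f=8 and latent_channels=4 are assumed example values, not taken from this summary.

def latent_shape(height, width, f, latent_channels):
    """Shape (h, w, c) of the latent for an image downsampled by factor f per side."""
    if height % f or width % f:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (height // f, width // f, latent_channels)


def num_elements(shape):
    """Total number of values in a tensor of shape (h, w, c)."""
    h, w, c = shape
    return h * w * c


pixel = (256, 256, 3)  # RGB image in pixel space
latent = latent_shape(256, 256, f=8, latent_channels=4)
ratio = num_elements(pixel) / num_elements(latent)

print(latent, ratio)  # (32, 32, 4) 48.0
```

Under these assumptions, every denoising step processes 48 times fewer values than it would in pixel space.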

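By introducing cross-attention layers, the paper turns the diffusion model into a powerful and flexible generator for general conditioning inputs such as text. This also enables a range of tasks such as high-resolution synthesis and inpainting.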
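A rough sketch of the cross-attention idea, using the standard scaled dot-product form softmax(QK^T / sqrt(d)) V: queries come from the image-side features, while keys and values come from the conditioning tokens. All sizes, the random weights, and the function names below are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Queries from the latent features, keys and values from the conditioning."""
    q = matmul(latent_tokens, w_q)        # (N, d)
    k = matmul(cond_tokens, w_k)          # (M, d)
    v = matmul(cond_tokens, w_v)          # (M, d)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # (d, M)
    scores = matmul(q, k_t)               # (N, M)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights    # output is (N, d)


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
latent_tokens = random_matrix(6, 8, rng)  # 6 spatial positions, 8 features each
cond_tokens = random_matrix(3, 5, rng)    # 3 conditioning tokens (e.g. text), 5 features each
w_q = random_matrix(8, 4, rng)
w_k = random_matrix(5, 4, rng)
w_v = random_matrix(5, 4, rng)

out, weights = cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v)
print(len(out), len(out[0]))  # 6 4
```

Each spatial position ends up with a weighted mix of the conditioning tokens, which is why the same mechanism can accept text or other modalities as long as they are encoded into a token sequence.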

As a result, LDMs need far less compute than pixel-based DMs while showing competitive performance on a variety of generative tasks (inpainting, class-conditional image synthesis, text-to-image, unconditional image generation, super-resolution).

Introduction

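DMs are resource-intensive. The reweighted training objective tries to mitigate this by undersampling the initial denoising steps, but DMs remain computationally demanding because the model is still trained and evaluated in pixel space.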

The paper argues that reducing this resource consumption without impairing DM performance requires lowering the computational complexity of both training and sampling.

Problems with diffusion models

Enormous computational cost
Evaluating a trained model is also slow because of the many sequential steps
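The sequential-step problem can be illustrated with a toy sampling loop. The denoiser below is a placeholder that only counts its calls, and the step count of 1000 is an assumed example value; nothing here is a real denoising rule.

```python
class CountingDenoiser:
    """Stand-in for a trained network; it only counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return [0.9 * v for v in x]  # placeholder update, not a real denoising rule


def sample(denoiser, x_start, num_steps):
    """Run the reverse process: each step needs the output of the previous one."""
    x = x_start
    for t in range(num_steps, 0, -1):
        x = denoiser(x, t)
    return x


denoiser = CountingDenoiser()
x0 = sample(denoiser, [1.0] * 16, num_steps=1000)
print(denoiser.calls)  # 1000
```

Because the steps cannot run in parallel, total sampling cost is roughly the number of steps times the cost of one network evaluation, so shrinking the tensor each evaluation works on pays off at every step.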

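The authors train an autoencoder that provides a lower-dimensional representation space that is perceptually equivalent to the data space (a lower-dimensional manifold can still hold enough of the image information). The diffusion model is then trained in this learned latent space, and the resulting model class is called Latent Diffusion Models (LDMs).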
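As a sketch of what "training the diffusion model in the latent space" means, the usual noise-prediction objective can be written once for pixel space and once for latent space. The notation is an assumption introduced here: \mathcal{E} is the encoder, z_t is the noised latent at step t, and \epsilon_\theta is the denoising network.

```latex
L_{DM}  = \mathbb{E}_{x,\, \epsilon \sim \mathcal{N}(0,1),\, t}
          \left[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert_2^2 \right]

L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, \epsilon \sim \mathcal{N}(0,1),\, t}
          \left[ \lVert \epsilon - \epsilon_\theta(z_t, t) \rVert_2^2 \right]
```

The only change is that the network sees the noised latent z_t instead of the noised image x_t; the encoder is trained beforehand and kept fixed.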

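An advantage of this approach is that the encoding stage only needs to be trained once and can then be reused for training several different diffusion models, or for entirely different tasks.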

Conclusion

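The proposed method scales gracefully to higher-dimensional data, which makes the following possible:
(a) a compression level that gives more faithful and detailed reconstructions than previous work
(b) efficient application to high-resolution synthesis of megapixel images
It achieves competitive performance on multiple tasks and datasets while lowering computational cost, and it also significantly reduces inference cost compared to pixel-based DMs.
Unlike previous work that learns the encoder/decoder architecture and the score-based prior simultaneously, the proposed approach does not need a delicate weighting of reconstruction and generative abilities. This ensures faithful reconstruction and requires very little regularization of the latent space.
For densely conditioned tasks such as super-resolution, inpainting, and semantic synthesis, the model can be applied in a convolutional fashion and render large, consistent images of about 1024² pixels.
A general-purpose conditioning mechanism based on cross-attention enables multi-modal training.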
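For the cross-attention conditioning in the last point, the training objective can be sketched by feeding an encoded condition into the denoiser. The symbols are assumptions introduced for illustration: y is the conditioning input (for example a text prompt) and \tau_\theta is an encoder that maps y to the token sequence used as keys and values in cross-attention.

```latex
L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0,1),\, t}
          \left[ \lVert \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(y)) \rVert_2^2 \right]
```

Swapping in a different \tau_\theta for a different modality leaves the rest of the objective unchanged, which is what makes the mechanism general-purpose.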

Summary: because LDMs work in a compressed space, they can handle high-resolution synthesis such as megapixel images.

High-Resolution Image Synthesis with Latent Diffusion Models: Thoughts

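What did the authors want to accomplish?

Find a space that is perceptually equivalent to pixel space but computationally more suitable for high-resolution image synthesis
Achieve competitive performance on a variety of tasks while greatly lowering computational cost
Ensure faithful reconstruction while requiring very little regularization of the latent space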

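What are the key elements of this approach?

A cross-attention mechanism that is effective for training attention-based models on various input modalities
A general-purpose autoencoding stage that only needs to be trained once, so it can be reused for training multiple DMs or applied to entirely different tasks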

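Can you (the reader) use this paper yourself?

It can be used in many ways: high-resolution image processing, background synthesis, text-to-image, and so on.
It may be possible to combine GANs and LDMs into a model with lower computational complexity and higher performance.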

What other references do you want to follow up on?

[Diffusion-GAN: Training GANs with Diffusion]

https://arxiv.org/abs/1306.1091

Deep Generative Stochastic Networks Trainable by Backprop

[Collection of papers on various diffusion models]

https://paperswithcode.com/method/diffusion

Papers with Code - Diffusion Explained

Paper GitHub

https://github.com/CompVis/latent-diffusion

Reference blogs

https://kimjy99.github.io/%EB%85%BC%EB%AC%B8%EB%A6%AC%EB%B7%B0/ldm/

[Paper Review] High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)

https://velog.io/@yeonheedong/RUS-High-Resolution-Image-Synthesis-with-Latent-Diffusion-Models

[RUS] High-Resolution Image Synthesis with Latent Diffusion Models

https://dlaiml.tistory.com/entry/LDM-High-Resolution-Image-Synthesis-with-Latent-Diffusion-Models

LDM: High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer (2021.12) [Ludwig Maximilian University of Munich & IWR, Heidelberg University, Runway ML]
