VQ-VAE (Neural Discrete Representation Learning)

윤제로의 제로베이스

윤_제로 2022. 10. 12. 23:11

VQ-VAE

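In brief, VQ-VAE has two defining features:

1) The encoder outputs discrete codes.

2) The prior is learned rather than static.

To elaborate, VQ-VAE works with discrete representations. Because it uses vector quantization (VQ), the posterior and prior distributions are categorical, and samples drawn from them index an embedding table.

The selected embeddings are what the decoder receives as input.

Vector quantization maps each input vector to an entry of a dictionary-like table, the codebook.

Classically the codebook is built with k-means clustering; in VQ-VAE it is learned jointly with the encoder and decoder, with an update that behaves like online k-means.

The codebook is then used to compress the data: each vector is stored as the index of its nearest codebook entry.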
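As a rough illustration of how much the codebook compresses, the sketch below counts bits before and after quantization. The numbers (a 128 x 128 x 3 image at 8 bits per value, a 32 x 32 grid of latents, K = 512) are taken from the ImageNet setup described in the VQ-VAE paper, not from this post, and the helper name `bits_per_code` is made up for the example.

```python
def bits_per_code(num_codes: int) -> int:
    """Bits needed to store one codebook index when there are num_codes entries."""
    return (num_codes - 1).bit_length()

# Raw image: 128 x 128 pixels, 3 channels, 8 bits per channel value.
raw_bits = 128 * 128 * 3 * 8
# Quantized image: a 32 x 32 grid of indices into a codebook with K = 512 entries.
latent_bits = 32 * 32 * bits_per_code(512)

compression_ratio = raw_bits / latent_bits
print(f"{compression_ratio:.1f}x fewer bits")
```

Under these assumptions each index costs 9 bits, so the latent grid is about 42.7 times smaller than the raw image.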

Discrete Latent Variables

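The latent embedding space e (a K x D table) holds the discrete representations and is called the codebook.

K: the size of the discrete latent space (a K-way categorical).

D: the dimensionality of each embedding vector e_i.

The input x goes into the encoder, which outputs z_e(x).

The discrete latent z is then found by a nearest-neighbour lookup: z is the index of the embedding vector in e that is closest to z_e(x).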

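Equation 1:

$$q(z = k \mid x) = \begin{cases} 1 & \text{for } k = \arg\min_j \lVert z_e(x) - e_j \rVert_2, \\ 0 & \text{otherwise} \end{cases} \tag{1}$$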

Here q(z=k|x) is deterministic, and a uniform prior is defined over z. As a result the KL divergence is a constant, equal to log K.
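The constant can be checked numerically. The sketch below, with a made-up helper name and K = 512 chosen only for illustration, computes KL(q || p) for a one-hot posterior and a uniform prior and shows that the value does not depend on which code k was selected.

```python
import numpy as np

def kl_one_hot_vs_uniform(k: int, num_codes: int) -> float:
    """KL(q || p) for a one-hot posterior q(z=k|x) and a uniform prior p(z) = 1/K."""
    q = np.zeros(num_codes)
    q[k] = 1.0
    p = np.full(num_codes, 1.0 / num_codes)
    mask = q > 0  # terms with q = 0 contribute nothing (0 * log 0 is taken as 0)
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

K = 512
values = [kl_one_hot_vs_uniform(k, K) for k in (0, 17, 511)]
print(values, np.log(K))
```

Because the term is the same for every input, it contributes no gradient and can be dropped from the training objective.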

Equation 2:

$$z_q(x) = e_k, \quad \text{where } k = \arg\min_j \lVert z_e(x) - e_j \rVert_2 \tag{2}$$

z_q(x) is the element of the embedding space nearest to z_e(x); the encoder output passes through this discretisation bottleneck before it reaches the decoder.
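The nearest-neighbour lookup can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function name `quantize`, the tiny codebook (K = 3, D = 2), and the sample encoder outputs are all made up.

```python
import numpy as np

def quantize(z_e: np.ndarray, codebook: np.ndarray):
    """Map each encoder output to its nearest codebook vector.

    z_e:      (N, D) encoder outputs z_e(x)
    codebook: (K, D) embedding vectors e_1..e_K
    returns:  indices of shape (N,) and z_q of shape (N, D)
    """
    # Squared L2 distance between every z_e row and every codebook row: (N, K).
    # The argmin of the squared distance equals the argmin of the distance.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])  # K = 3, D = 2
z_e = np.array([[0.9, 1.2], [-0.1, 0.2], [-0.8, 1.7]])
indices, z_q = quantize(z_e, codebook)
print(indices)  # [1 0 2]
```

Only `indices` needs to be stored or modelled by the prior; `z_q` can always be recovered from the codebook.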

Learning

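Equation 2 has no well-defined gradient, because the argmin is not differentiable.

The gradient can be approximated in a way similar to the straight-through estimator.

The gradient at the decoder input z_q(x) is copied to the encoder output z_e(x).

In the forward pass z_q(x) is passed to the decoder; in the backward pass the gradient is copied unchanged to the encoder.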
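The forward/backward asymmetry can be written out by hand. The sketch below assumes a toy linear decoder `x_hat = z_q @ W` with a squared-error loss, which is a stand-in chosen so the gradient is easy to derive manually; the function name and all values are hypothetical.

```python
import numpy as np

def straight_through_grad(z_e, codebook, W, x):
    """Forward with z_q, backward copying dL/dz_q onto z_e (toy linear decoder)."""
    # Forward: nearest-neighbour lookup, then decode the quantized vector.
    k = ((z_e[None, :] - codebook) ** 2).sum(axis=1).argmin()
    z_q = codebook[k]
    x_hat = z_q @ W
    loss = 0.5 * np.sum((x_hat - x) ** 2)
    # Backward: the gradient of the loss at the decoder input z_q ...
    grad_z_q = (x_hat - x) @ W.T
    # ... is handed to the encoder output unchanged, skipping the argmin.
    grad_z_e = grad_z_q.copy()
    return loss, z_q, grad_z_e

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
W = np.eye(2)                      # hypothetical decoder weights
x = np.array([2.0, 0.0])           # reconstruction target
z_e = np.array([0.8, 0.9])         # encoder output
loss, z_q, grad_z_e = straight_through_grad(z_e, codebook, W, x)
print(loss, z_q, grad_z_e)
```

In autodiff frameworks the same effect is commonly obtained by feeding the decoder `z_e + stop_gradient(z_q - z_e)`, whose value is z_q but whose gradient flows to z_e.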

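Equation 3:

$$L = \log p(x \mid z_q(x)) + \lVert \mathrm{sg}[z_e(x)] - e \rVert_2^2 + \beta \lVert z_e(x) - \mathrm{sg}[e] \rVert_2^2 \tag{3}$$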

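The first term, the reconstruction loss, optimises both the decoder and the encoder (the encoder through the straight-through gradient).

The second term, the codebook loss, is needed because the straight-through copy leaves the embeddings e_i without any gradient from the reconstruction loss, so they are learned with vector quantization instead. The VQ objective moves each embedding e_i toward the encoder outputs z_e(x) assigned to it.

For simplicity the latent is written as a single variable, but in practice the encoder produces a 1D, 2D, or 3D field of N latents that all share the same K embeddings. The loss is unchanged except that the k-means (codebook) and commitment terms are averaged over the N latents.

The third term, the commitment loss, is added because the volume of the embedding space is unconstrained and can grow without bound when the embeddings e_i do not train as fast as the encoder. It makes the encoder commit to an embedding, and beta controls how much weight the commitment loss gets.

In the equation above, sg denotes stop-gradient: it is the identity in the forward pass and has zero derivative, so its argument is treated as a constant during the update.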
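To make the role of sg concrete, the sketch below computes the three terms for a single latent and writes out by hand the only gradients that the codebook and commitment terms produce. Squared error stands in for the reconstruction term, beta = 0.25 is the value used in the VQ-VAE paper rather than something stated in this post, and the function name and inputs are made up.

```python
import numpy as np

def vq_vae_loss_terms(x, x_hat, z_e, e_k, beta=0.25):
    """Values of the three loss terms for one latent, plus the gradients that
    the stop-gradients leave for the codebook and commitment terms."""
    recon = np.sum((x - x_hat) ** 2)          # stand-in for the reconstruction term
    codebook = np.sum((z_e - e_k) ** 2)       # ||sg[z_e(x)] - e||^2
    commit = beta * np.sum((z_e - e_k) ** 2)  # beta * ||z_e(x) - sg[e]||^2
    # sg[.] freezes one side of each term, so each term updates one variable only.
    grad_e = 2.0 * (e_k - z_e)                # codebook term: pulls e toward z_e
    grad_z_e = 2.0 * beta * (z_e - e_k)       # commitment term: pulls z_e toward e
    return recon + codebook + commit, grad_e, grad_z_e

x = np.array([1.0, 0.0])
x_hat = np.array([0.5, 0.0])
z_e = np.array([0.8, 0.9])
e_k = np.array([1.0, 1.0])
total, grad_e, grad_z_e = vq_vae_loss_terms(x, x_hat, z_e, e_k)
print(total, grad_e, grad_z_e)
```

The codebook and commitment terms have the same form and differ only in which variable receives the gradient, which is exactly what the two sg placements encode.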

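Equation 4:

$$\log p(x) = \log \sum_k p(x \mid z_k)\, p(z_k) \approx \log p(x \mid z_q(x))\, p(z_q(x)) \tag{4}$$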

The decoder p(x|z) is trained with z = z_q(x), obtained by MAP inference.

Consequently, once training has converged, the decoder should not allocate any probability mass to p(x|z) for z != z_q(x).
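A small numerical check shows why the sum over all latents can be replaced by its single dominant term. The numbers below are hypothetical: K = 4 latents under a uniform prior, and a decoder that places almost all of its likelihood on the latent picked by the encoder.

```python
import numpy as np

prior = np.full(4, 0.25)                         # p(z_k), uniform over K = 4 codes
likelihood = np.array([1e-9, 1e-9, 0.3, 1e-9])   # p(x | z_k), concentrated on k = 2
k = 2                                            # index of z_q(x)

exact = np.log(np.sum(likelihood * prior))       # log sum_k p(x | z_k) p(z_k)
approx = np.log(likelihood[k] * prior[k])        # log p(x | z_q(x)) p(z_q(x))
gap = exact - approx
print(exact, approx, gap)
```

The gap is never negative, since the approximation drops non-negative terms from the sum, and it shrinks as the decoder's likelihood for the other latents goes to zero.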

MAP (Maximum A Posteriori estimation)

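Equation 5:

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid x) = \arg\max_{\theta}\, p(x \mid \theta)\, p(\theta) \tag{5}$$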

MAP estimation finds the optimal parameter by combining the observed data with prior knowledge.
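A coin-flip example, unrelated to VQ-VAE itself, shows how the prior changes the estimate. It assumes a Bernoulli likelihood with a Beta(a, b) prior, for which the MAP estimate is the mode of the Beta posterior; the function names and the counts are made up.

```python
def bernoulli_mle(heads: int, flips: int) -> float:
    """Maximum-likelihood estimate of the heads probability (data only)."""
    return heads / flips

def bernoulli_map(heads: int, flips: int, a: float, b: float) -> float:
    """MAP estimate: mode of the Beta(a + heads, b + flips - heads) posterior.

    Assumes a > 1 and b > 1 so that the posterior has an interior mode.
    """
    return (heads + a - 1) / (flips + a + b - 2)

mle = bernoulli_mle(3, 3)            # three heads in three flips
map_est = bernoulli_map(3, 3, 2, 2)  # same data plus a Beta(2, 2) prior
print(mle, map_est)
```

With only three flips the data alone says the coin always lands heads, while the prior pulls the MAP estimate back toward 0.5.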

Priors

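The prior distribution over the discrete latents, p(z), is a categorical distribution.

It can be made autoregressive by conditioning each z on the other z in the feature map.

While the VQ-VAE itself is being trained, the prior is kept constant and uniform. After training, an autoregressive distribution over z is fitted (a PixelCNN for images, a WaveNet for raw audio in the paper) so that new x can be generated by ancestral sampling.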
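Ancestral sampling from an autoregressive categorical prior can be sketched with a toy model. The first-order transition table below is only a stand-in for a learned PixelCNN or WaveNet, and K = 3, the probabilities, and the function name are all made up for illustration.

```python
import numpy as np

def sample_latents(initial, transition, length, rng):
    """Ancestral sampling of code indices z_1..z_T from a first-order
    autoregressive categorical prior p(z_1) * prod_t p(z_t | z_{t-1})."""
    z = [rng.choice(len(initial), p=initial)]
    for _ in range(length - 1):
        # Each new index is drawn conditioned on the previously sampled one.
        z.append(rng.choice(transition.shape[1], p=transition[z[-1]]))
    return np.array(z)

K = 3
initial = np.full(K, 1.0 / K)                  # p(z_1)
transition = np.array([[0.8, 0.1, 0.1],        # row i holds p(z_t | z_{t-1} = i)
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
rng = np.random.default_rng(0)
codes = sample_latents(initial, transition, length=16, rng=rng)
print(codes)
```

The sampled indices would then be looked up in the codebook and passed to the decoder to produce x.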
