Overview
VQGAN compresses images into a grid of discrete tokens drawn from a learned codebook, which lets a transformer generate images the same way language models generate text.
VQGAN and Codebook Image Synthesis is a computer-vision practice that interprets or generates visual media for analysis, operations, and creative work.
Deep Dive
VQGAN, introduced in the 2021 paper 'Taming Transformers for High-Resolution Image Synthesis,' combines a vector-quantized autoencoder (VQ-VAE) with adversarial and perceptual training. The encoder maps an image to a small grid of feature vectors; each vector is snapped to the nearest entry in a learned codebook of, say, 1024 discrete codes, turning the image into a sequence of integer tokens. A decoder reconstructs the image from those tokens and is trained with a GAN discriminator and a perceptual loss so that reconstructions look sharp rather than blurry. Because images are now discrete token sequences, an autoregressive transformer can model them like language, predicting tokens one at a time. Combined with CLIP guidance, VQGAN powered popular early text-to-image tools.
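The tokenization step described above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the toy codebook has 4 two-dimensional entries instead of roughly 1024 learned ones, the "encoder output" is hard-coded, and the helper names `nearest_code`, `tokenize`, and `detokenize` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def tokenize(grid: Sequence[Sequence[Vector]], codebook: Sequence[Vector]) -> List[int]:
    """Flatten an H x W grid of feature vectors into raster-order integer tokens."""
    return [nearest_code(vec, codebook) for row in grid for vec in row]


def detokenize(tokens: Sequence[int], codebook: Sequence[Vector], width: int):
    """Look tokens up in the codebook and restore the grid a decoder would consume."""
    vecs = [list(codebook[t]) for t in tokens]
    return [vecs[i:i + width] for i in range(0, len(vecs), width)]


# Toy codebook (assumption): 4 codes of dimension 2.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Stand-in for a 2 x 2 grid of continuous encoder outputs.
grid = [
    [(0.1, -0.1), (0.9, 0.2)],
    [(0.2, 0.8), (1.1, 0.9)],
]

tokens = tokenize(grid, codebook)                   # integer sequence for a transformer
quantized = detokenize(tokens, codebook, width=2)   # snapped vectors for the decoder
print(tokens)
```

The integer list is what an autoregressive transformer would model; the looked-up vectors are what the decoder would receive in place of the original continuous features.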
Technical Insight
The key operation is vector quantization: each continuous encoder output is replaced by its nearest codebook vector, with a 'straight-through' gradient estimator so that the encoder keeps learning even though the lookup is not differentiable. Adding a patch-based GAN discriminator on top of the autoencoder is what allows VQGAN to use a small token grid (such as …
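To make the straight-through idea concrete, here is a hand-derived toy example with no autodiff library. It assumes a two-entry codebook and a made-up squared-error loss on the quantized vector; the function names are hypothetical. The true gradient of the nearest-neighbour lookup with respect to the encoder output is zero almost everywhere, so the estimator simply copies the gradient that arrives at the quantized vector back onto the encoder output.

```python
def quantize(z, codebook):
    """Forward pass: replace z with its nearest codebook vector (non-differentiable)."""
    idx = min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(z, codebook[k])),
    )
    return idx, list(codebook[idx])


def loss_grad_wrt_zq(z_q, target):
    """Gradient of the toy loss sum((z_q - target)^2) with respect to z_q."""
    return [2.0 * (q - t) for q, t in zip(z_q, target)]


def straight_through(grad_zq):
    """Backward pass: treat quantization as the identity, so dL/dz := dL/dz_q."""
    return list(grad_zq)


codebook = [(0.0, 0.0), (1.0, 1.0)]  # toy codebook (assumption)
z = [0.4, 0.3]                       # continuous encoder output (toy value)
target = [1.0, 1.0]                  # stand-in for what the decoder side prefers
lr = 0.2                             # illustrative step size

idx_before, z_q = quantize(z, codebook)
grad_z = straight_through(loss_grad_wrt_zq(z_q, target))

# One gradient step on the encoder output using the copied gradient.
z = [zi - lr * g for zi, g in zip(z, grad_z)]
idx_after, _ = quantize(z, codebook)
print(idx_before, idx_after)
```

After one step the encoder output moves far enough that it snaps to the other code, which is the learning signal the estimator is meant to preserve. In autodiff frameworks the same trick is commonly written as `z + stop_gradient(z_q - z)`.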
Mastering VQGAN and Codebook Image Synthesis
To build a deeper understanding, treat VQGAN and Codebook Image Synthesis as an applied practice rather than a single artifact: define the requirements, make assumptions explicit, and separate what a reliable system can handle from what still needs expert judgment.
In practice, strong teams pair accuracy metrics for VQGAN and Codebook Image Synthesis with operational factors such as data quality, lighting variation, and label consistency. They document explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than one-time benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
Visual AI can automate inspection, detection, and labeling tasks at scale. At the same time, image rights and consent can become legal risks when provenance is unclear. A sound approach combines rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements evolve.
Strategic Impact
Visual AI can automate inspection, detection, and labeling tasks at scale.
Creative teams can prototype ideas quickly with fewer manual iterations.
Applications can use image and video signals that were previously hard to process.
In high-stakes deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.
Real-World Implementation
Encoding a photo into a 16x16 grid of codebook tokens so that a transformer can model and regenerate it.
Combining VQGAN with CLIP guidance to create the surreal 'VQGAN + CLIP' AI art that went viral in 2021.
Compressing images into compact discrete codes for efficient storage or downstream training.
Serving as the image tokenizer inside larger generators such as MaskGIT and multimodal transformers.
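A quick back-of-the-envelope calculation shows why a 16x16 token grid is attractive for storage and sequence modeling. The 16x16 grid and the 1024-entry codebook come from the text; the 256x256 RGB source image at 8 bits per channel is an assumption made only for this example, and `token_budget` is a hypothetical helper.

```python
def token_budget(grid_h: int, grid_w: int, codebook_size: int):
    """Return (token count, bits per token, total bits) for a token grid."""
    tokens = grid_h * grid_w
    # Smallest number of bits that can index every codebook entry.
    bits_per_token = (codebook_size - 1).bit_length()
    return tokens, bits_per_token, tokens * bits_per_token


tokens, bits_per_token, total_bits = token_budget(16, 16, 1024)

# Assumed source image: 256 x 256 pixels, 3 channels, 8 bits per channel.
raw_bits = 256 * 256 * 3 * 8
ratio = raw_bits / total_bits

print(tokens, bits_per_token, total_bits // 8, round(ratio, 1))
```

Under these assumptions the image becomes 256 tokens of 10 bits each, or 320 bytes of indices. The figure ignores the cost of storing the codebook and decoder, and the reconstruction is lossy, so it is a rough size comparison rather than a codec benchmark.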
Implementation Practices
VQGAN and Codebook Image Synthesis in practice
The same guidance applies to each of the four use cases listed under Real-World Implementation: tokenizing photos for transformer modeling, CLIP-guided generation, compact discrete codes for storage or downstream training, and image tokenization inside larger generators. Teams usually get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product gains and error costs over time.
Risks & Guardrails
Image rights and consent can become legal risks when provenance is unclear.
Model performance can vary across lighting conditions, demographics, and environments.
False predictions can go unnoticed unless confidence thresholds are monitored.
Implementation Roadmap
Define acceptance criteria for accuracy, recall, and error cost.
Test with data that matches real production conditions.
Add human review for low-confidence or high-impact predictions.
Track model drift and revalidate after camera or dataset changes.
Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
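The evidence-gate and human-review steps in this roadmap can be expressed as a small check. This is only a sketch: the metric names, the threshold values, and the `GateCriteria`, `gate_decision`, and `needs_human_review` helpers are all hypothetical and would need to be replaced by whatever a team actually measures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GateCriteria:
    """Acceptance criteria agreed before rollout (values are illustrative)."""
    min_accuracy: float
    min_recall: float
    max_error_cost: float


def gate_decision(metrics: Dict[str, float], criteria: GateCriteria) -> str:
    """Return 'expand' if every criterion is met, else 'pause: <failed checks>'."""
    failures = []
    if metrics["accuracy"] < criteria.min_accuracy:
        failures.append("accuracy")
    if metrics["recall"] < criteria.min_recall:
        failures.append("recall")
    if metrics["error_cost"] > criteria.max_error_cost:
        failures.append("error_cost")
    return "expand" if not failures else "pause: " + ", ".join(failures)


def needs_human_review(confidence: float, high_impact: bool,
                       threshold: float = 0.8) -> bool:
    """Route low-confidence or high-impact predictions to a person."""
    return high_impact or confidence < threshold


criteria = GateCriteria(min_accuracy=0.9, min_recall=0.8, max_error_cost=5.0)
decision = gate_decision(
    {"accuracy": 0.95, "recall": 0.7, "error_cost": 1.0}, criteria
)
print(decision)
```

Re-running the same gate after a camera or dataset change is one simple way to turn the revalidation step into a repeatable check.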