Visual AI GUIDE

Autoregressive Image Generation

Autoregressive image generation builds images one piece at a time, predicting each token from everything generated before it.

Overview

Rather than producing a whole picture in one pass, an autoregressive model emits it piece by piece, with each token predicted from everything generated so far. This matters because the same next-token machinery that powers language models can produce coherent, controllable images.

As a computer-vision workflow, autoregressive image generation interprets or produces visual media for analysis, operations, and creative work.

Deep Dive

Autoregressive image generation treats an image as a sequence and predicts it element by element, with each new element conditioned on all previous ones. Early work such as PixelRNN and PixelCNN predicted images one raw pixel at a time, scanning row by row, which was slow but theoretically clean. Modern systems instead first compress the image into a grid of discrete tokens using a VQ-VAE-style encoder; a Transformer then predicts those tokens left to right. OpenAI's DALL-E 1 and Google's Parti followed this recipe, generating image tokens conditioned on a text prompt before decoding them back to pixels. The main advantages are exact likelihood modeling and a unified architecture shared with language models. The cost is sequential, and therefore slow, sampling.
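The tokenization step described above can be sketched in a few lines. This is a minimal illustration only, assuming a made-up four-entry codebook and hand-written latent vectors in place of a trained VQ-VAE encoder; the names CODEBOOK, nearest_code, tokenize, and detokenize are hypothetical.

```python
# Toy VQ-style tokenizer. The codebook and latents are invented for
# illustration; a real system learns both.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def nearest_code(vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(idx):
        return sum((a - b) ** 2 for a, b in zip(vec, CODEBOOK[idx]))
    return min(range(len(CODEBOOK)), key=dist)


def tokenize(latent_grid):
    """Quantize an H x W grid of latent vectors and flatten it in raster order."""
    return [nearest_code(vec) for row in latent_grid for vec in row]


def detokenize(tokens, width):
    """Look tokens up in the codebook and restore the grid layout."""
    vectors = [CODEBOOK[t] for t in tokens]
    return [vectors[i:i + width] for i in range(0, len(vectors), width)]


# A 2 x 2 grid of made-up encoder outputs.
latents = [[(0.1, 0.2), (0.9, 0.1)],
           [(0.2, 0.8), (0.7, 0.9)]]
tokens = tokenize(latents)
print(tokens)                       # the sequence a Transformer would model
print(detokenize(tokens, width=2))  # quantized latents handed to the decoder
```

The flattened token list is what the Transformer sees; the grid shape only matters again when the decoder turns tokens back into pixels.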

Technical Insight

The model factorizes the joint probability of all tokens into a product of conditionals: p(x) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1}). A Transformer with causal (masked) attention ensures that each position sees only earlier tokens. During training it predicts every token in a single parallel pass using teacher forcing, but at inference it must sample one token at a time, feeding each one back in as context. A learned codebook maps the tokens back to image patches, which a decoder upsamples into the final pixels.
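The factorization, the causal mask, and the gap between training and inference can be made concrete with a toy example. This is a sketch under stated assumptions: next_token_probs is a hand-written stand-in for a learned causal Transformer, and VOCAB is an arbitrary codebook size chosen for illustration.

```python
import math
import random

VOCAB = 4  # hypothetical codebook size


def next_token_probs(prefix):
    """Toy stand-in for a causal Transformer: p(x_i | x_1..x_{i-1}).

    It simply favours repeating the previous token; a real model is learned.
    """
    if not prefix:
        return [1.0 / VOCAB] * VOCAB
    probs = [0.1] * VOCAB
    probs[prefix[-1]] = 0.7
    return probs


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def log_likelihood(tokens):
    """Chain rule: log p(x) = sum_i log p(x_i | x_<i).

    Every term depends only on the known prefix (teacher forcing), so a real
    model can evaluate all positions in one parallel pass.
    """
    return sum(math.log(next_token_probs(tokens[:i])[tok])
               for i, tok in enumerate(tokens))


def sample(length, rng):
    """Inference is sequential: each sampled token is fed back as context."""
    tokens = []
    for _ in range(length):
        probs = next_token_probs(tokens)
        tokens.append(rng.choices(range(VOCAB), weights=probs)[0])
    return tokens


rng = random.Random(0)
image_tokens = sample(16, rng)  # e.g. a 4 x 4 token grid in raster order
print(image_tokens, log_likelihood(image_tokens))
```

Note that log_likelihood needs no loop-carried state, which is why training parallelizes, while sample cannot compute step i before step i-1 has been drawn.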

Mastering Autoregressive Image Generation

To build a deeper understanding, treat autoregressive image generation as a working practice rather than a single technique: define the requirements, make assumptions explicit, and separate what reliable automation can handle from what still needs expert judgment.

In practice, strong teams ground autoregressive image generation in operational realities such as data quality, lighting variation, and label consistency. They write down explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Visual AI can automate inspection, detection, and labeling tasks at scale. At the same time, image rights and consent can become legal risks when provenance is unclear. A sound approach combines rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Visual AI can automate inspection, detection, and labeling tasks at scale.

Creative teams can prototype ideas quickly with fewer manual iterations.

Applications can use image and video signals that were previously hard to process.

In production-grade deployments, each of these benefits translates into measurable operating rules, ownership boundaries, and repeatable review practices, so that teams build trust instead of amplifying ambiguity.

The Future of Autoregressive Image Generation

Speed is the central battleground. Techniques such as parallel, masked-token decoding (MaskGIT, Muse) emit many tokens at once, and speculative decoding borrowed from language models is being adapted to images. Researchers are also merging text and image tokens in a single autoregressive backbone so that one model can both read and draw, as seen in multimodal systems. Expect autoregressive and diffusion ideas to keep converging, with hybrid models that combine the controllability of tokens with the quality of diffusion.
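The idea behind parallel masked-token decoding can be sketched as follows. This is a simplified illustration in the spirit of MaskGIT, not a reproduction of it: predict is a random stand-in for a bidirectional Transformer, and the cosine schedule, VOCAB, and MASK values are assumptions chosen for the example.

```python
import math
import random

MASK = -1
VOCAB = 4  # hypothetical codebook size


def predict(tokens, rng):
    """Toy stand-in for a bidirectional Transformer.

    Returns a (token, confidence) guess for every position; random here.
    """
    return [(rng.randrange(VOCAB), rng.random()) for _ in tokens]


def parallel_decode(length, steps, rng):
    """Fill a fully masked sequence in a fixed number of rounds."""
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        guesses = predict(tokens, rng)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Cosine schedule: how many positions should still be masked afterwards.
        keep_masked = math.floor(length * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident guesses; the rest are re-predicted next round.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = guesses[i][0]
        if MASK not in tokens:
            break
    return tokens


decoded = parallel_decode(length=16, steps=4, rng=random.Random(0))
print(decoded)
```

Here 16 tokens are produced in 4 model calls instead of 16, which is the source of the speed-up; the trade-off is that tokens committed in the same round are not conditioned on each other.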

Real-World Implementation

DALL-E 1 generated images by autoregressively predicting a grid of discrete image tokens from a text prompt.

Google's Parti scaled an autoregressive text-to-image Transformer to 20 billion parameters for detailed, prompt-faithful outputs.

PixelCNN and PixelRNN demonstrated pixel-by-pixel generation and are still used as teaching baselines for likelihood-based models.

MaskGIT and Muse use parallel masked-token decoding to speed up token-based image synthesis. They keep the discrete image tokens used by autoregressive models but are trained with masked-token prediction rather than strict left-to-right order.

In Practice

Teams usually get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product gains and error costs over time.


Risks & Guardrails

Image rights and consent can become legal risks when provenance is unclear.

Model performance can vary across lighting conditions, demographics, and environments.

False positives can go unnoticed unless confidence thresholds are monitored.

Implementation Roadmap

1. Define acceptance criteria for accuracy, recall, and error cost.

2. Test with data that matches real production conditions.

3. Add human review for low-confidence or high-impact predictions.

4. Track model drift and revalidate after camera or dataset changes.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
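One way to make the evidence gate concrete is to encode the acceptance criteria and check them automatically. The sketch below is illustrative only: the thresholds in Criteria, the cost weights, and the sample records are all hypothetical, and a real rollout would derive them from its own requirements.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    # Hypothetical thresholds, not recommendations.
    min_accuracy: float = 0.90
    min_recall: float = 0.85
    max_error_cost: float = 50.0
    review_below_confidence: float = 0.60


def evaluate(records, cost_fp=1.0, cost_fn=5.0):
    """records: list of (predicted, actual, confidence) tuples."""
    tp = sum(1 for p, a, _ in records if p and a)
    tn = sum(1 for p, a, _ in records if not p and not a)
    fp = sum(1 for p, a, _ in records if p and not a)
    fn = sum(1 for p, a, _ in records if not p and a)
    return {
        "accuracy": (tp + tn) / len(records),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "error_cost": fp * cost_fp + fn * cost_fn,
    }


def gate(metrics, criteria):
    """Return True only if every acceptance criterion is met."""
    return (metrics["accuracy"] >= criteria.min_accuracy
            and metrics["recall"] >= criteria.min_recall
            and metrics["error_cost"] <= criteria.max_error_cost)


def needs_human_review(confidence, criteria):
    """Route low-confidence predictions to a person (step 3)."""
    return confidence < criteria.review_below_confidence


criteria = Criteria()
records = [(True, True, 0.95), (True, False, 0.55), (False, False, 0.90),
           (False, True, 0.40), (True, True, 0.80)]
metrics = evaluate(records)
print(metrics, "expand rollout" if gate(metrics, criteria) else "pause rollout")
print([needs_human_review(conf, criteria) for _, _, conf in records])
```

With these made-up records the gate fails on accuracy and recall, so the rollout would pause until the gap is closed; rerunning the same check after each camera or dataset change covers step 4.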
