Overview
Autoregressive image generation builds an image one piece at a time, predicting each token from everything generated before it. It matters because the same next-token machinery that powers language models can also produce coherent, controllable images.
Autoregressive image generation belongs to the family of computer vision workflows that interpret or generate visual media for analysis, automation, and creation.
Deep Dive
Autoregressive image generation treats an image as a sequence and predicts it element by element, with each new element conditioned on all previous ones. Early work such as PixelRNN and PixelCNN predicted images one raw pixel at a time, scanning row by row, which was slow but conceptually clean. Modern systems instead first compress the image into a grid of discrete tokens using a VQ-VAE-style encoder, then have a Transformer predict those tokens in raster order (left to right, top to bottom). OpenAI's DALL-E 1 and Google's Parti followed this recipe, generating image tokens conditioned on a text prompt before decoding them back to pixels. The main benefits are explicit likelihood modeling and a unified architecture shared with language models. The cost is that sampling is sequential and therefore slow.
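The tokenization step described above can be sketched in a few lines. This is a minimal illustration, assuming the encoder has already produced a small grid of latent vectors; the codebook values, the 2x2 grid, and all function names are hypothetical and chosen only for the example. A real VQ-VAE-style tokenizer learns both the encoder and the codebook.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to `latent`."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(latent, codebook[k]))


def tokenize_grid(latents: List[List[Vector]], codebook: List[Vector]) -> List[int]:
    """Quantize a 2D grid of latents and flatten it in raster order."""
    return [quantize(cell, codebook) for row in latents for cell in row]


def detokenize(tokens: List[int], codebook: List[Vector], width: int) -> List[List[Vector]]:
    """Look up each token in the codebook and rebuild the 2D grid."""
    cells = [codebook[t] for t in tokens]
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Hypothetical 4-entry codebook of 2-D vectors and a 2x2 grid of encoder outputs.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [
    [(0.1, -0.1), (0.9, 0.2)],
    [(0.2, 0.8), (1.1, 0.9)],
]

tokens = tokenize_grid(latents, codebook)      # the sequence a Transformer would model
grid = detokenize(tokens, codebook, width=2)   # what a pixel decoder would receive
print(tokens)  # [0, 1, 2, 3]
```

The flattened token list is the sequence the autoregressive model is trained on; the reverse lookup is the input to the decoder that produces pixels.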
Technical Insight
The model factorizes the joint probability of all tokens into a product of conditionals: $p(x) = \prod_{i=1}^{N} p(x_i \mid x_1, \ldots, x_{i-1})$. A Transformer with causal (masked) attention enforces that each position sees only earlier tokens. During training it predicts every token in parallel using teacher forcing, but at inference it must sample one token at a time, feeding each sampled token back in as input. A learned codebook maps tokens back to image patch embeddings, which a decoder turns into the final pixels.
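A small sketch can make the factorization, the causal mask, and the sampling loop concrete. It assumes a toy stand-in for the Transformer: `toy_conditional` is a hypothetical function that maps a prefix to a distribution over a 4-token vocabulary, and is not how any real model computes its probabilities.

```python
import itertools
import math
import random
from typing import List, Sequence

VOCAB_SIZE = 4  # hypothetical codebook size


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_conditional(prefix: Sequence[int]) -> List[float]:
    """Stand-in for a causal Transformer: returns p(x_i | x_1..x_{i-1})."""
    favored = sum(prefix) % VOCAB_SIZE
    logits = [2.0 if k == favored else 0.0 for k in range(VOCAB_SIZE)]
    return softmax(logits)


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def log_likelihood(tokens: Sequence[int]) -> float:
    """Chain rule: log p(x) = sum_i log p(x_i | x_<i).

    Every prefix is taken from the given sequence itself, which is what
    teacher forcing means during training.
    """
    return sum(math.log(toy_conditional(tokens[:i])[tokens[i]])
               for i in range(len(tokens)))


def sample(n: int, rng: random.Random) -> List[int]:
    """Inference: draw one token at a time and feed it back as context."""
    tokens: List[int] = []
    for _ in range(n):
        probs = toy_conditional(tokens)
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0])
    return tokens


rng = random.Random(0)
generated = sample(6, rng)
mask = causal_mask(3)

# Sanity check: the conditionals define a valid joint distribution, so the
# probabilities of all length-3 sequences sum to 1.
total = sum(math.exp(log_likelihood(seq))
            for seq in itertools.product(range(VOCAB_SIZE), repeat=3))
print(generated, round(total, 6))
```

Note the asymmetry the sketch exposes: `log_likelihood` could evaluate all positions independently because every prefix is known, while `sample` cannot start step i until step i-1 has finished.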
Teaching Autoregressive Image Generation
To build deep understanding, treat autoregressive image generation as an operating model, not a single feature: define the outcomes you want, make the reasoning explicit, and separate what the system can do reliably from what still needs expert judgment.
In practice, strong teams balance the accuracy of autoregressive image generation against operational realities such as data quality, lighting variation, and label consistency. They write clear success criteria, evaluate against real data and real workflows, and iterate on observed failure patterns rather than one-time benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
Visual AI can automate inspection, detection, and tagging tasks at scale. At the same time, image rights and consent can become a legal risk if provenance is unclear. The strongest approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and update safeguards continuously as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
Visual AI can automate inspection, detection, and tagging tasks at scale.
Creative teams can prototype concepts faster with fewer manual revisions.
Operations can use image and video signals that were previously hard to operationalize.
In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.
Real-World Implementations
DALL-E 1 generated images by autoregressively predicting a grid of discrete image tokens conditioned on a text caption.
Google's Parti scaled an autoregressive text-to-image Transformer to 20 billion parameters to produce detailed, faithful scenes.
PixelCNN and PixelRNN demonstrated raw pixel-by-pixel generation and are still used as teaching baselines for likelihood-based models.
MaskGIT and Muse use parallel masked-token decoding to speed up token-based image synthesis. They keep the discrete-token setup but are trained with masked-token prediction rather than a strict left-to-right factorization.
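To illustrate why parallel masked-token decoding needs fewer model passes than token-by-token sampling, here is a simplified sketch of the idea. It assumes a cosine schedule for how many tokens stay masked after each pass, and it uses `toy_predict`, a hypothetical random stand-in for the bidirectional Transformer. It does not reproduce the exact procedure of MaskGIT or Muse.

```python
import math
import random
from typing import List, Optional, Tuple

MASK = None  # sentinel for a position that has not been decoded yet


def toy_predict(tokens: List[Optional[int]], rng: random.Random,
                vocab_size: int = 4) -> List[Tuple[int, float]]:
    """Hypothetical model call: a (token, confidence) guess for every position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_remaining(step: int, total_steps: int, n: int) -> int:
    """Tokens left masked after `step` (1-based) under a cosine schedule."""
    ratio = math.cos(math.pi / 2 * step / total_steps)
    return int(math.floor(ratio * n))


def parallel_decode(n: int, total_steps: int, rng: random.Random) -> List[Optional[int]]:
    tokens: List[Optional[int]] = [MASK] * n
    for step in range(1, total_steps + 1):
        preds = toy_predict(tokens, rng)  # one model pass covers all positions
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        n_commit = len(masked) - masked_remaining(step, total_steps, n)
        # Commit the most confident predictions; the rest stay masked.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens


rng = random.Random(0)
decoded = parallel_decode(n=16, total_steps=4, rng=rng)
print(decoded)
```

With these example settings, 16 tokens are filled in over 4 passes, whereas strictly sequential sampling would need 16 passes, one per token.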
Implementation Patterns
Autoregressive Image Generation in practice
Across the implementations listed above (DALL-E 1, Parti, PixelCNN and PixelRNN, MaskGIT and Muse), teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.
Risks & Guardrails
Image rights and consent can become a legal risk if provenance is unclear.
Model performance can vary across lighting conditions, demographics, and environments.
False positives may go unnoticed unless confidence levels are monitored.
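The last two risks can be monitored with very little code. The sketch below assumes evaluation records carry a slice label (for example a lighting condition), a predicted and an actual outcome, and a confidence score; the `Record` fields, the sample data, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    slice_name: str    # e.g. a lighting condition (hypothetical field)
    predicted: bool
    actual: bool
    confidence: float


def false_positive_rate_by_slice(records: List[Record]) -> Dict[str, float]:
    """False positives divided by actual negatives, computed per slice."""
    false_positives: Dict[str, int] = defaultdict(int)
    negatives: Dict[str, int] = defaultdict(int)
    for r in records:
        if not r.actual:
            negatives[r.slice_name] += 1
            if r.predicted:
                false_positives[r.slice_name] += 1
    return {name: false_positives[name] / negatives[name] for name in negatives}


def needs_review(record: Record, threshold: float = 0.8) -> bool:
    """Flag low-confidence predictions for a human to check."""
    return record.confidence < threshold


records = [
    Record("daylight", True, True, 0.95),
    Record("daylight", False, False, 0.90),
    Record("daylight", True, False, 0.60),
    Record("low_light", True, False, 0.85),
    Record("low_light", True, False, 0.55),
    Record("low_light", False, False, 0.70),
]

rates = false_positive_rate_by_slice(records)
review_queue = [r for r in records if needs_review(r)]
print(rates, len(review_queue))
```

In this made-up data the low-light slice has a higher false positive rate than daylight, and one of its false positives carries a confidence above the threshold, which is why slice-level rates are worth tracking alongside confidence.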
Getting Started Guide
Define acceptance criteria for precision, recall, and error cost.
Test with data that resembles real production conditions.
Add human review for low-confidence or high-impact predictions.
Monitor model drift and revalidate after camera or dataset changes.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
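The evidence-gate idea can be expressed as a simple check of measured metrics against acceptance criteria. This is a minimal sketch: the counts and the thresholds below are hypothetical, and a real gate would also cover error cost and drift.

```python
from typing import Dict, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def evidence_gate(metrics: Dict[str, float],
                  criteria: Dict[str, float]) -> Dict[str, bool]:
    """For each criterion, report whether the measured value meets the minimum."""
    return {name: metrics.get(name, 0.0) >= minimum
            for name, minimum in criteria.items()}


# Hypothetical evaluation counts and acceptance thresholds.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
criteria = {"precision": 0.85, "recall": 0.80}

results = evidence_gate({"precision": precision, "recall": recall}, criteria)
ready = all(results.values())
print(results, ready)  # {'precision': True, 'recall': False} False
```

Here precision passes but recall does not, so the rollout would pause until the recall gap is closed.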