VISUAL AI GUIDE

Diffusion Transformers

Overview

Diffusion Transformers (DiTs) replace the convolutional U-Net at the heart of image and video generators with a Transformer backbone. This architecture powers leading systems such as Stable Diffusion 3 and OpenAI's Sora, and it scales very well as more compute is added.

Diffusion Transformers belong to computer vision workflows that interpret or generate visual media for analysis, operations, and creative work.

Deep Dive

Diffusion models generate images by starting from pure noise and iteratively denoising it into a coherent image. For years the network doing that denoising was a U-Net, a convolutional architecture. The Diffusion Transformer, introduced by Peebles and Xie in 2022, replaces the U-Net with a Transformer. The image is first compressed into a latent space and split into small patches, and each patch becomes a token, much like words in a language model. The Transformer then processes these tokens with self-attention at every denoising step. The key finding was that DiT sample quality improves predictably as model size increases and patch size decreases (both of which add compute per image), following clean scaling trends. This scalability is why text-to-video systems and high-end text-to-image systems have largely moved to Transformer backbones.
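
The patch-to-token step can be sketched in a few lines. This is a minimal pure-Python illustration, assuming a latent stored as nested lists of shape channels x height x width and square, non-overlapping patches; `patchify` and `num_tokens` are hypothetical names, not functions from any DiT codebase. The 4 x 32 x 32 latent matches what a 256 x 256 image becomes under an autoencoder that downsamples 8x into 4 channels, as in the DiT setup.

```python
from typing import List

Latent = List[List[List[float]]]  # [channels][height][width]


def patchify(latent: Latent, p: int) -> List[List[float]]:
    """Split a C x H x W latent into non-overlapping p x p patches.

    Each patch is flattened (channel-major) into one token of length C*p*p.
    Tokens are returned in row-major order over the patch grid.
    """
    c, h, w = len(latent), len(latent[0]), len(latent[0][0])
    if h % p or w % p:
        raise ValueError("height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            tokens.append([
                latent[ch][top + i][left + j]
                for ch in range(c)
                for i in range(p)
                for j in range(p)
            ])
    return tokens


def num_tokens(h: int, w: int, p: int) -> int:
    """Sequence length the Transformer sees for an h x w latent and patch size p."""
    return (h // p) * (w // p)


# Toy 4 x 32 x 32 latent with distinct values so patch contents are easy to check.
latent = [[[float(ch * 1000 + y * 32 + x) for x in range(32)] for y in range(32)] for ch in range(4)]
tokens = patchify(latent, p=2)
print(len(tokens), len(tokens[0]))  # 256 tokens, each of length 4*2*2 = 16
for p in (8, 4, 2):
    print(p, num_tokens(32, 32, p))  # 16, 64, 256 tokens
```

Halving the patch size quadruples the number of tokens, which is why smaller patches cost more compute and, per the DiT results, tend to give better samples.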

Technical Insight

The key innovation is how DiTs inject conditioning such as the timestep and text information. Rather than simple concatenation, they use adaptive layer normalization (adaLN), in which the network predicts the scale and shift parameters of the normalization layers from the conditioning signal. The adaLN-Zero variant initializes this so that each block starts out as an identity function, which stabilizes training. Patches are flattened into tokens, processed by standard Transformer blocks with self-attention, and then reassembled into a latent image that is finally decoded back into pixels.
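
A minimal sketch of one adaLN-Zero residual branch follows, in pure Python for a single token. In the DiT paper the conditioning vector is the sum of the timestep and class embeddings, and each residual branch (attention and MLP) gets its own shift, scale, and gate regressed from it, with the gates initialized to zero. Here only one branch is shown, the whole modulation regressor is zero-initialized, and a random linear map (hypothetical) stands in for attention or the MLP; the class name `AdaLNZeroBranch` is illustrative only.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = output dim, cols = input dim


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """LayerNorm without learned affine params; adaLN supplies scale and shift instead."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


class AdaLNZeroBranch:
    """One residual branch of a DiT-style block with adaLN-Zero conditioning."""

    def __init__(self, dim: int, cond_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        # Hypothetical inner sublayer: a random linear map standing in for attention/MLP.
        self.inner = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
        # Regressor cond -> (shift, scale, gate), each of size dim. Zero-initialized,
        # so before training shift = scale = gate = 0 for every conditioning input.
        self.modulation = [[0.0] * cond_dim for _ in range(3 * dim)]

    def __call__(self, x: Vector, cond: Vector) -> Vector:
        mod = matvec(self.modulation, cond)
        d = self.dim
        shift, scale, gate = mod[:d], mod[d:2 * d], mod[2 * d:]
        # Modulate normalized activations with condition-dependent scale and shift.
        h = [n * (1 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]
        h = matvec(self.inner, h)
        # The gate scales the residual update; gate = 0 makes the branch an identity map.
        return [xi + g * hi for xi, g, hi in zip(x, gate, h)]


block = AdaLNZeroBranch(dim=4, cond_dim=3)
x = [0.5, -1.0, 2.0, 0.0]
cond = [1.0, 0.2, -0.3]  # e.g. a timestep embedding combined with a class or text embedding
print(block(x, cond) == x)  # True: a zero-initialized block passes its input through
```

Because the gate starts at zero, a deep stack of these blocks initially behaves like the identity, and training gradually learns how much each block should contribute for a given timestep and prompt.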

Mastering Diffusion Transformers

To build deep understanding, treat Diffusion Transformers as an operating model rather than a single feature: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams balance the accuracy of Diffusion Transformers against operational realities such as data quality, lighting variation, and label consistency. They document explicit success criteria, run evaluations against real data and real workflows, and iterate based on observed failure patterns rather than one-time benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Visual AI can automate inspection, detection, and labeling at scale. At the same time, image rights and consent can become legal risks when provenance is unclear. The most robust approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and continuously review safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Visual AI can automate inspection, detection, and labeling at scale. In high-quality deployments, this translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so teams scale confidence rather than ambiguity.

Creative teams can prototype concepts quickly with fewer manual revisions.

Operations teams can use image and video signals that were previously impractical to work with.

The Future of Diffusion Transformers

Diffusion Transformers are becoming the default backbone for generative media. Their token-based design makes them a natural fit for unifying image, video, and even multimodal generation under a single scalable architecture. Research is pushing toward longer video, higher resolution, and efficient attention to reduce the quadratic cost of attending over many tokens. Expect convergence between language and vision models, where the same Transformer recipes for scaling and infrastructure apply to both, accelerating progress on world models and interactive video.
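
The quadratic cost is easy to see with back-of-the-envelope arithmetic. The sketch below assumes hypothetical latent video sizes (32 x 32 latent frames, 2 x 2 spatial patches, one frame per temporal patch) and counts only the entries of one full self-attention score matrix, ignoring heads, layers, and the MLPs whose cost grows linearly.

```python
def video_tokens(frames: int, height: int, width: int, pt: int, p: int) -> int:
    """Number of spacetime patch tokens for a latent video of frames x height x width."""
    return (frames // pt) * (height // p) * (width // p)


def attention_scores(n_tokens: int) -> int:
    """Entries in one full self-attention score matrix (per head, per layer)."""
    return n_tokens * n_tokens


short_clip = video_tokens(frames=16, height=32, width=32, pt=1, p=2)
long_clip = video_tokens(frames=32, height=32, width=32, pt=1, p=2)
print(short_clip, attention_scores(short_clip))  # 4096 16777216
print(long_clip, attention_scores(long_clip))    # 8192 67108864
```

Doubling the clip length doubles the token count but quadruples the attention work, which is why longer and higher-resolution video pushes research toward more efficient attention.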

Real-World Implementations

OpenAI's Sora applies a Transformer backbone to spacetime patches to generate minute-long videos that follow text prompts with high fidelity.

Stable Diffusion 3 uses a multimodal Diffusion Transformer (MMDiT) to better align generated images with detailed text descriptions.

Researchers have scaled DiTs to billions of parameters and observed image quality improve predictably, which guides decisions about compute budgets.

A studio uses a DiT-based model to extend short clips, treating the additional video frames as extra patch tokens to denoise.

Usage Patterns

Diffusion Transformers in Practice

OpenAI's Sora applies a Transformer backbone to spacetime patches to generate minute-long videos that follow text prompts closely. Teams usually get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.
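
Sora's public description mentions spacetime patches but not its exact patch sizes or code, so the following is only an illustrative sketch under stated assumptions: a single-channel latent video stored as nested lists, cut into non-overlapping pt x p x p "tubelets", each flattened into one token. The function name `spacetime_patchify` and the toy sizes are hypothetical.

```python
from typing import List

Video = List[List[List[float]]]  # [frames][height][width], single channel for brevity


def spacetime_patchify(video: Video, pt: int, p: int) -> List[List[float]]:
    """Cut a video into non-overlapping pt x p x p tubelets, one token per tubelet."""
    t, h, w = len(video), len(video[0]), len(video[0][0])
    if t % pt or h % p or w % p:
        raise ValueError("video dims must be divisible by the patch dims")
    tokens = []
    for t0 in range(0, t, pt):
        for y0 in range(0, h, p):
            for x0 in range(0, w, p):
                tokens.append([
                    video[t0 + dt][y0 + dy][x0 + dx]
                    for dt in range(pt)
                    for dy in range(p)
                    for dx in range(p)
                ])
    return tokens


# Toy 4-frame, 8 x 8 latent video with distinct values per position.
video = [[[float(f * 100 + y * 10 + x) for x in range(8)] for y in range(8)] for f in range(4)]
tokens = spacetime_patchify(video, pt=2, p=4)
print(len(tokens), len(tokens[0]))  # 8 tokens, each of length 2*4*4 = 32
```

Because a clip is just a longer or shorter token sequence, the same Transformer can in principle handle different durations and resolutions without architectural changes.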

Stable Diffusion 3 uses a multimodal Diffusion Transformer (MMDiT) to better align generated images with detailed text descriptions.
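
The core MMDiT idea, as described for Stable Diffusion 3, is that text and image tokens keep separate weights but share one attention operation over the joined sequence. Below is a single-head pure-Python sketch of that joint attention, assuming random (hypothetical) projection weights; the real model also uses multiple heads and per-modality MLPs and adaLN, all omitted here. `joint_attention` is an illustrative name, not the Stable Diffusion 3 API.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


def joint_attention(text: Matrix, image: Matrix, dim: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Single-head attention over concatenated text and image tokens.

    Each modality has its own Q/K/V projections (random stand-ins here), but one
    attention runs over the combined sequence, so image tokens attend to text
    tokens and vice versa.
    """
    rng = random.Random(seed)
    weights = {m: [rand_matrix(dim, dim, rng) for _ in range(3)] for m in ("text", "image")}
    q, k, v = [], [], []
    for mod, toks in (("text", text), ("image", image)):
        wq, wk, wv = weights[mod]
        q += matmul(toks, wq)
        k += matmul(toks, wk)
        v += matmul(toks, wv)
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    attn = [softmax([s / math.sqrt(dim) for s in row]) for row in matmul(q, k_t)]
    out = matmul(attn, v)
    # Split the joint sequence back into the two streams.
    return out[: len(text)], out[len(text):]


rng = random.Random(1)
text_tokens = rand_matrix(3, 4, rng)   # 3 caption tokens, width 4
image_tokens = rand_matrix(5, 4, rng)  # 5 image patch tokens, width 4
text_out, image_out = joint_attention(text_tokens, image_tokens, dim=4)
print(len(text_out), len(image_out))  # 3 5
```

Compared with cross-attention, where text only supplies keys and values, this two-way flow lets the text representation also adapt to the image being generated.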

Researchers have scaled DiTs to billions of parameters and observed image quality improve predictably, which guides decisions about compute budgets.
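
"Predictable" improvement usually means a quality metric follows roughly a power law in compute, which can be fit on small runs and extrapolated before committing a larger budget. The sketch below fits `metric ~= a * compute**b` by least squares in log-log space. The data points are synthetic, generated from a known power law purely to exercise the fit, and are not DiT results; `fit_power_law` and `predict` are hypothetical helper names.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], metric: List[float]) -> Tuple[float, float]:
    """Least-squares fit of metric ~= a * compute**b in log-log space. Returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(m) for m in metric]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b


def predict(a: float, b: float, compute: float) -> float:
    return a * compute ** b


# Synthetic points from metric = 50 * compute**-0.3 (NOT measured results).
compute = [1e1, 1e2, 1e3, 1e4]
metric = [50 * c ** -0.3 for c in compute]
a, b = fit_power_law(compute, metric)
print(round(a, 3), round(b, 3))  # 50.0 -0.3
print(predict(a, b, 1e5))        # extrapolated metric at 10x the largest budget
```

In a real scaling study the fit should be checked against held-out larger runs, since the trend can bend outside the measured range.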

A studio uses a DiT-based model to extend short clips, treating the additional video frames as extra patch tokens to denoise.
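
One way to realize "extra frames as extra tokens" is an inpainting-style loop: append noise tokens for the new frames, denoise the whole sequence, and re-impose the known frames after every step so only the new frames are generated. This is an assumed approach, not the studio's documented method. It is simplified: a real sampler would re-noise the known frames to the current noise level rather than inserting them clean. `extend_clip` and `toy_denoise` are hypothetical, and `toy_denoise` is a stand-in for a DiT call.

```python
import random
from typing import Callable, List

Token = List[float]
Frames = List[List[Token]]  # [frame][patch] -> token vector


def extend_clip(known_frames: Frames, new_frames: int,
                denoise_step: Callable[[Frames, int], Frames],
                steps: int, seed: int = 0) -> Frames:
    """Denoise extra frames' tokens while keeping the known frames fixed."""
    rng = random.Random(seed)
    n_patches, dim = len(known_frames[0]), len(known_frames[0][0])
    noise = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
             for _ in range(new_frames)]
    frames = [list(map(list, f)) for f in known_frames] + noise
    for step in reversed(range(steps)):
        frames = denoise_step(frames, step)
        # Re-impose the known frames so only the appended frames are generated.
        frames[: len(known_frames)] = [list(map(list, f)) for f in known_frames]
    return frames


def toy_denoise(frames: Frames, step: int) -> Frames:
    """Stand-in for a DiT: pull every token halfway toward the first frame's token."""
    anchor = frames[0]
    return [[[0.5 * (v + a) for v, a in zip(tok, anchor_tok)]
             for tok, anchor_tok in zip(frame, anchor)] for frame in frames]


known = [[[float(f + p), float(f - p)] for p in range(3)] for f in range(2)]
extended = extend_clip(known, new_frames=2, denoise_step=toy_denoise, steps=10)
print(len(extended))  # 4 frames: 2 original + 2 generated
```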

Risks & Guardrails

Image rights and consent can become legal risks when provenance is unclear.

Model performance can vary across lighting conditions, demographics, and environments.

False positives can go unnoticed unless confidence thresholds are monitored.
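
Both of the last two risks can be surfaced by slicing evaluation results and reporting error rates at the deployed confidence threshold. A minimal sketch, assuming each record carries a slice label (such as a lighting condition), a model confidence, and a ground-truth flag; the records below are toy values, and `slice_report` and the 0.1 alert level are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float, bool]  # (slice label, model confidence, ground-truth positive?)


def slice_report(records: Iterable[Record], threshold: float) -> Dict[str, Dict[str, float]]:
    """Per-slice false-positive rate and recall at a given confidence threshold."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for slice_name, confidence, actual in records:
        predicted = confidence >= threshold
        key = ("tp" if actual else "fp") if predicted else ("fn" if actual else "tn")
        counts[slice_name][key] += 1
    report = {}
    for name, c in counts.items():
        negatives, positives = c["fp"] + c["tn"], c["tp"] + c["fn"]
        report[name] = {
            "fpr": c["fp"] / negatives if negatives else 0.0,
            "recall": c["tp"] / positives if positives else 0.0,
        }
    return report


records = [
    ("daylight", 0.9, True), ("daylight", 0.2, False), ("daylight", 0.7, True), ("daylight", 0.1, False),
    ("low_light", 0.8, False), ("low_light", 0.6, True), ("low_light", 0.75, False), ("low_light", 0.3, True),
]
report = slice_report(records, threshold=0.5)
flagged = [name for name, r in report.items() if r["fpr"] > 0.1]
print(report)
print(flagged)  # ['low_light']
```

An aggregate metric over all eight records would hide that every error comes from the low-light slice.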

Implementation Playbook

1

Define acceptance thresholds for precision, recall, and the cost of errors. Treat this and every later step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

2

Test on data that resembles real production conditions.

3

Add human review for low-confidence or high-impact predictions.

4

Track model drift and revalidate after changes to cameras or datasets.
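
The playbook steps can be sketched as small checks that a pipeline runs on production-like evaluation data. Everything here is an assumption for illustration: the thresholds and cost weights are hypothetical, and the population stability index (PSI) is one common drift statistic swapped in as an example; its 0.2 alert level is a rule of thumb, not a standard.

```python
import math
from typing import List


def passes_gate(tp: int, fp: int, fn: int, min_precision: float, min_recall: float,
                cost_fp: float, cost_fn: float, max_cost: float) -> bool:
    """Step 1: acceptance gate on precision, recall, and total cost of errors."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    error_cost = fp * cost_fp + fn * cost_fn
    return precision >= min_precision and recall >= min_recall and error_cost <= max_cost


def needs_human_review(confidence: float, high_impact: bool, min_confidence: float = 0.8) -> bool:
    """Step 3: route low-confidence or high-impact predictions to a person."""
    return high_impact or confidence < min_confidence


def population_stability_index(expected: List[float], actual: List[float], eps: float = 1e-6) -> float:
    """Step 4: PSI between two binned distributions given as bin proportions."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        psi += (a - e) * math.log(a / e)
    return psi


# Step 2: these counts should come from production-like test data.
gate_ok = passes_gate(tp=90, fp=5, fn=10, min_precision=0.9, min_recall=0.85,
                      cost_fp=1.0, cost_fn=5.0, max_cost=60.0)
baseline = [0.25, 0.25, 0.25, 0.25]  # confidence histogram at validation time
this_week = [0.1, 0.2, 0.3, 0.4]     # e.g. after a camera change
drift_psi = population_stability_index(baseline, this_week)
print(gate_ok, needs_human_review(0.65, high_impact=False), drift_psi > 0.2)  # True True True
```

When the drift check fires, the gate in step 1 is rerun on fresh data before the rollout continues.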
