Overview
Diffusion Transformers (DiTs) replace the convolutional U-Net at the heart of image and video generators with a Transformer backbone. This architecture powers leading systems such as Stable Diffusion 3 and OpenAI's Sora, and it scales remarkably well as compute is added.
Diffusion Transformers belong to the computer-vision workflows that interpret or generate visual media for analysis, operations, and creative work.
Deep Dive
A diffusion model generates images by starting from pure noise and iteratively denoising it into a coherent image. For years the network performing that denoising was a U-Net, a convolutional architecture. The Diffusion Transformer, introduced by Peebles and Xie in 2022, replaces the U-Net with a Transformer. The image is first compressed into a latent space and split into small patches, and each patch becomes a token, much like a word in a language model. The Transformer then processes these tokens with self-attention at every denoising step. The key finding is that DiT performance improves predictably as model size increases and patch size decreases, following clean scaling laws. This scalability is why text-to-video and high-end text-to-image systems have largely moved to Transformer backbones.
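The patch-to-token step can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function names `patchify` and `unpatchify`, the channel-first layout, and the 4 x 32 x 32 latent shape are assumptions chosen for the example, and the learned linear embedding that normally follows is omitted.

```python
import numpy as np


def patchify(latent: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) latent into (num_tokens, patch * patch * C) tokens."""
    c, h, w = latent.shape
    if h % patch or w % patch:
        raise ValueError("patch size must divide both H and W")
    gh, gw = h // patch, w // patch
    # Expose the patch grid: (C, gh, patch, gw, patch)
    x = latent.reshape(c, gh, patch, gw, patch)
    # Bring the grid axes to the front: (gh, gw, patch, patch, C)
    x = x.transpose(1, 3, 2, 4, 0)
    # One row per patch, flattened into a feature vector
    return x.reshape(gh * gw, patch * patch * c)


def unpatchify(tokens: np.ndarray, patch: int, channels: int,
               height: int, width: int) -> np.ndarray:
    """Inverse of patchify: rebuild the (C, H, W) latent from its tokens."""
    gh, gw = height // patch, width // patch
    x = tokens.reshape(gh, gw, patch, patch, channels)
    # Back to (C, gh, patch, gw, patch) before merging the grid axes
    x = x.transpose(4, 0, 2, 1, 3)
    return x.reshape(channels, height, width)


# Hypothetical latent: 4 channels on a 32 x 32 grid.
latent = np.arange(4 * 32 * 32, dtype=np.float32).reshape(4, 32, 32)
for p in (8, 4, 2):
    tokens = patchify(latent, p)
    print(f"patch={p}: {tokens.shape[0]} tokens of width {tokens.shape[1]}")
```

Halving the patch size quadruples the number of tokens, which is where the extra compute behind the "smaller patches, better quality" trend comes from.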
Technical Insight
A key innovation is how DiTs inject conditioning such as the timestep and the text prompt. Instead of simple concatenation, they use adaptive layer normalization (adaLN), in which the network predicts the scale and shift parameters of the normalization layers from the conditioning signal. The adaLN-zero variant initializes these so that each block starts as the identity, which stabilizes training. Images are patchified into tokens, processed by standard Transformer blocks with self-attention, then reassembled and decoded back into pixels.
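To make the adaLN-zero idea concrete, here is a toy NumPy sketch of one conditioned residual block. It is a simplified illustration under stated assumptions: the class name `AdaLNZeroBlock` is hypothetical, a single random linear map stands in for the attention and MLP sub-layers, and the conditioning vector is projected directly, without the embedding network a real model would use.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each token over its features, with no learned affine terms."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class AdaLNZeroBlock:
    """Toy block: x + gate * f(layer_norm(x) * (1 + scale) + shift)."""

    def __init__(self, dim: int, cond_dim: int, rng: np.random.Generator):
        # Zero-initialized projection, so shift, scale and gate all start at 0.
        self.w_mod = np.zeros((cond_dim, 3 * dim))
        self.b_mod = np.zeros(3 * dim)
        # Stand-in for the attention/MLP sub-layer (an assumption of this sketch).
        self.w_f = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x: np.ndarray, cond: np.ndarray) -> np.ndarray:
        # Predict the modulation parameters from the conditioning signal.
        mod = cond @ self.w_mod + self.b_mod
        shift, scale, gate = np.split(mod, 3, axis=-1)
        h = layer_norm(x) * (1.0 + scale) + shift
        # With gate == 0 the residual branch vanishes and the block is the identity.
        return x + gate * (h @ self.w_f)


rng = np.random.default_rng(0)
block = AdaLNZeroBlock(dim=8, cond_dim=4, rng=rng)
tokens = rng.standard_normal((5, 8))  # 5 tokens, 8 features each
cond = rng.standard_normal(4)         # e.g. a timestep plus prompt embedding
print(np.allclose(block(tokens, cond), tokens))  # identity at initialization
```

Because the gate starts at zero, every block passes its input through unchanged at the start of training, and the conditioning pathway is learned gradually from there.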
Mastering Diffusion Transformers
To build a deeper understanding, treat Diffusion Transformers as a usage pattern rather than a single artifact: define requirements, clarify assumptions, and separate what a reliable process can handle from what still requires expert judgment.
In practice, strong teams use Diffusion Transformers to balance accuracy against operational realities such as data quality, lighting variation, and label consistency. They document success criteria explicitly, test against real data and workflows, and iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
Visual AI can automate inspection, detection, and labeling tasks at scale. At the same time, image rights and consent can become legal risks when provenance is unclear. A sound approach pairs rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements evolve.
Strategic Impact
Visual AI can automate inspection, detection, and labeling tasks at scale.
Creative teams can prototype ideas quickly with few manual iterations.
Operations can use image and video signals that were previously hard to process.
In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.
Real-World Implementation
OpenAI's Sora uses a Transformer backbone over spacetime patches to generate minute-long, high-fidelity videos from text prompts.
Stable Diffusion 3 adopts a multimodal Diffusion Transformer (MMDiT) to better align generated images with detailed text descriptions.
Researchers scale DiTs to billions of parameters and observe image quality improving predictably, which guides compute-budget decisions.
A studio uses a DiT-based model to extend short clips, treating the added video frames as additional patch tokens to denoise.
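A rough way to reason about the compute side of these examples is to count tokens. The sketch below is an illustration only: the helper names, the 32 x 32 latent grid, and the frame counts are assumptions for the example and do not describe how any of the systems above actually tokenize video.

```python
def num_tokens(frames: int, height: int, width: int,
               patch: int, temporal_patch: int = 1) -> int:
    """Count patch tokens for a latent clip of shape (frames, height, width)."""
    if height % patch or width % patch or frames % temporal_patch:
        raise ValueError("patch sizes must divide the latent dimensions")
    return (frames // temporal_patch) * (height // patch) * (width // patch)


def attention_pairs(tokens: int) -> int:
    """Full self-attention compares every token with every other token."""
    return tokens * tokens


# Hypothetical clip: 16 latent frames on a 32 x 32 grid, patch size 2.
base = num_tokens(frames=16, height=32, width=32, patch=2)
# Extending the clip by 8 frames simply adds more tokens to denoise.
extended = num_tokens(frames=24, height=32, width=32, patch=2)

print(base, extended)                                        # 4096 6144
print(attention_pairs(extended) / attention_pairs(base))     # 2.25
```

Under these assumptions, a 50% longer clip means 50% more tokens but 2.25 times as many attention pairs, which is one reason token counts feed directly into compute-budget decisions.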
Practice Patterns
Diffusion Transformers in practice
The same pattern holds across the four deployments listed above, from Sora's minute-long video generation to the studio extending short clips: teams typically get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product outcomes and the cost of errors over time.
Risks & Guardrails
Image rights and consent can become legal risks when provenance is unclear.
Model performance can vary with lighting, demographics, and environment.
False positives can go unnoticed unless confidence thresholds are reviewed.
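As an illustration of the last point, the following sketch checks how many false positives survive at a given confidence threshold. It assumes a small labeled validation set; the scores, labels, and function names are made up for the example.

```python
def flagged_labels(scores, labels, threshold):
    """Ground-truth labels of the predictions whose confidence meets the threshold."""
    return [label for score, label in zip(scores, labels) if score >= threshold]


def false_positives(scores, labels, threshold):
    """Count flagged predictions whose ground-truth label is negative (0)."""
    return sum(1 for label in flagged_labels(scores, labels, threshold) if label == 0)


def precision(scores, labels, threshold):
    """Share of flagged predictions that are correct, or None if nothing is flagged."""
    flagged = flagged_labels(scores, labels, threshold)
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


# Hypothetical validation data: model confidence and ground truth (1 = true hit).
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.40]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.5, 0.85):
    print(threshold,
          false_positives(scores, labels, threshold),
          precision(scores, labels, threshold))
```

On this toy data, raising the threshold from 0.5 to 0.85 removes both false positives but also drops one true hit, which is the trade-off a threshold review is meant to surface.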
Implementation Roadmap
Define acceptance criteria for accuracy, recall, and cost of errors.
Test with data that matches real production conditions.
Add human review for low-confidence or high-impact predictions.
Track model drift and revalidate after camera or dataset changes.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
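The evidence-gate idea can be expressed as a small check that runs before each rollout stage. This is a sketch under assumptions: the `Gate` fields, the metric names, and the thresholds are hypothetical placeholders, not values taken from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str       # roadmap step this gate belongs to
    metric: str     # key expected in the measured results
    minimum: float  # lowest acceptable value for that metric


def failed_gates(gates, measured):
    """Names of gates whose metric is missing or below its minimum."""
    failed = []
    for gate in gates:
        value = measured.get(gate.metric)
        if value is None or value < gate.minimum:
            failed.append(gate.name)
    return failed


def rollout_decision(gates, measured):
    """Pause if any gate fails; expand usage only when all evidence is in."""
    failed = failed_gates(gates, measured)
    return ("pause", failed) if failed else ("expand", [])


# Hypothetical gates for the first two roadmap steps.
gates = [
    Gate("acceptance criteria", "precision", 0.90),
    Gate("production-like data", "production_recall", 0.80),
]
print(rollout_decision(gates, {"precision": 0.95, "production_recall": 0.70}))
```

A missing metric is treated the same as a failing one, so a stage cannot pass simply because nobody measured it.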