Overview
Vision-Language-Action (VLA) models are large neural networks that take camera images and a written instruction as input and directly output robot motor commands. They matter because they bring the broad generalization of foundation models to physical machines, letting a single model control a robot across many tasks instead of requiring hand-coded behavior for each one.
Within computer vision, VLA models for robotics belong to the family of workflows that interpret or generate visual media for analysis, operations, and creation.
Deep Dive
A VLA model fuses three streams: vision (camera frames), language (a goal such as 'put the cup in the sink'), and action (joint angles, gripper open/close, or end-effector velocities). Google DeepMind's RT-2 was the landmark: it took a vision-language model pretrained on web images and text, then co-fine-tuned it on robot trajectories so that the same network could answer 'what fruit is this?' and also emit actions encoded as text tokens. Open models such as OpenVLA (7B parameters) and Physical Intelligence's pi-0 followed. Most importantly, these models show 'emergent' transfer: web knowledge (recognizing a brand logo, understanding 'the smallest one') carries over to manipulation, so the robot handles objects and instructions it never saw during robot training.
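The three streams can be pictured as a typed interface. The following is a minimal sketch, not any real model's API: the names `Observation`, `Action`, `VLAPolicy`, `HoldStillPolicy`, and `run_step` are hypothetical, and the placeholder policy ignores its inputs where a real VLA would run a neural network.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Observation:
    """Vision and language inputs for one control step."""
    image: List[List[List[int]]]  # H x W x 3 RGB frame (nested lists keep the sketch dependency-free)
    instruction: str              # e.g. "put the cup in the sink"


@dataclass
class Action:
    """Action output for one control step."""
    joint_angles: List[float]  # radians, one per joint
    gripper_open: bool


class VLAPolicy(Protocol):
    """Anything with an act() method that maps an observation to an action."""
    def act(self, obs: Observation) -> Action: ...


class HoldStillPolicy:
    """Hypothetical placeholder: ignores its inputs and holds a fixed pose."""

    def __init__(self, num_joints: int = 7) -> None:
        self.num_joints = num_joints

    def act(self, obs: Observation) -> Action:
        return Action(joint_angles=[0.0] * self.num_joints, gripper_open=True)


def run_step(policy: VLAPolicy, obs: Observation) -> Action:
    """One control step: camera frame plus instruction in, motor command out."""
    return policy.act(obs)


frame = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]  # tiny blank 4x4 image
action = run_step(HoldStillPolicy(), Observation(frame, "put the cup in the sink"))
print(action)
```

Keeping the policy behind a single `act` method means the control loop does not need to know whether actions come from text tokens or from a continuous action head.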
Technical Insight
Most VLAs discretize continuous actions into tokens so that the transformer can predict them autoregressively, like words. RT-2 maps each action dimension to one of 256 bins and emits the result as a text string. Newer designs such as pi-0 instead attach a diffusion or flow-matching 'action expert' head to a pretrained vision-language backbone, producing smooth, high-frequency action chunks (e.g., 50 Hz) rather than discrete steps, which improves dexterity.
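The 256-bin discretization can be sketched as follows. This is an illustration under stated assumptions rather than RT-2's actual code: uniform bins, the per-dimension bounds `low` and `high`, the space-separated string format, and the function names are all choices made here for the example.

```python
import numpy as np

NUM_BINS = 256  # bins per action dimension, as described above


def tokenize_action(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer bin in [0, num_bins - 1]."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    scaled = (action - low) / (high - low)  # each dimension now lies in [0, 1]
    # Truncate to a bin index; the upper bound itself falls into the last bin.
    return np.minimum((scaled * num_bins).astype(int), num_bins - 1)


def detokenize_action(tokens, low, high, num_bins=NUM_BINS):
    """Recover the bin-center value for each token."""
    return low + (np.asarray(tokens) + 0.5) / num_bins * (high - low)


# Assumed bounds: three normalized motion dimensions plus a gripper value in [0, 1].
low = np.array([-1.0, -1.0, -1.0, 0.0])
high = np.array([1.0, 1.0, 1.0, 1.0])

action = [0.25, -1.0, 1.0, 0.5]
tokens = tokenize_action(action, low, high)
text = " ".join(str(t) for t in tokens)  # the string a language model would emit
recovered = detokenize_action(tokens, low, high)

print(text)       # 160 0 255 128
print(recovered)
```

Round-tripping through the bins loses at most half a bin width per dimension, which is the precision cost of treating actions as words.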
Mastering Vision-Language-Action Models for Robotics
To build a deep understanding, treat VLA models for robotics as an operating model, not a single feature: define the desired outcomes, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams balance VLA models for robotics against operational realities such as data quality, lighting variation, and label consistency. They write explicit success criteria, evaluate against real-world data and workflows, and iterate on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
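Evaluating against real conditions and iterating on failure patterns can start from a very small tally. The sketch below assumes a hypothetical trial log of `(condition, succeeded, failure_mode)` tuples; the condition names and failure labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical trial log: (condition, succeeded, failure_mode)
trials = [
    ("bright", True, None),
    ("bright", True, None),
    ("bright", False, "missed_grasp"),
    ("dim", False, "wrong_object"),
    ("dim", False, "missed_grasp"),
    ("dim", True, None),
]


def success_by_condition(trials):
    """Success rate per operating condition (e.g. lighting)."""
    totals, wins = defaultdict(int), defaultdict(int)
    for condition, succeeded, _ in trials:
        totals[condition] += 1
        wins[condition] += int(succeeded)
    return {c: wins[c] / totals[c] for c in totals}


def failure_patterns(trials):
    """Count how often each failure mode was observed."""
    return Counter(mode for _, succeeded, mode in trials if not succeeded)


rates = success_by_condition(trials)
patterns = failure_patterns(trials)

print(rates)
print(patterns.most_common())
```

Splitting results by condition shows gaps that a single aggregate success rate would hide, and the failure tally suggests what to fix first.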
Visual AI can scale inspection, detection, and tagging tasks. At the same time, image rights and consent can become a legal risk if provenance is unclear. The strongest approach pairs speed of experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
Visual AI can scale inspection, detection, and tagging tasks.
Creative teams can prototype concepts faster with fewer manual revisions.
Operations can act on image and video signals that were previously hard to process.
In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.
Real-World Applications
RT-2 controlling a Google kitchen robot to 'move the banana to the number 3', using digits it learned from web text rather than from robot demos.
OpenVLA, an open-source 7B model, fine-tuned by labs to run tabletop pick-and-place on low-cost arms.
Physical Intelligence's pi-0 folding laundry and bussing tables by chaining many sub-skills from a single instruction.
A warehouse arm told to 'pick the most fragile item' and inferring which item that is from its appearance.
Implementation Patterns
Vision-Language-Action Models for Robotics in Action
The four applications above (RT-2's kitchen robot, OpenVLA's tabletop pick-and-place, pi-0's laundry folding and table bussing, and the warehouse arm's fragile-item picking) share one pattern: teams typically get better results when they define quality thresholds upfront, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.
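A quality threshold with a human escalation path can be sketched as a simple routing rule. Everything here is an assumption for illustration: the `Prediction` fields, the 0.8 threshold, and the idea that the model reports a usable confidence score at all.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Prediction:
    task: str
    confidence: float  # model-reported score in [0, 1] (assumed to be available)
    high_impact: bool  # e.g. the task involves fragile or costly items


def route(pred: Prediction, min_confidence: float = 0.8) -> str:
    """Send edge cases to a person; let routine, confident cases run."""
    if pred.high_impact or pred.confidence < min_confidence:
        return "human_review"
    return "execute"


preds = [
    Prediction("place cup in sink", 0.95, False),
    Prediction("pick unknown item", 0.55, False),
    Prediction("move glassware", 0.97, True),
]

routes = [route(p) for p in preds]
tally = Counter(routes)  # tracked over time, this shows how often people are pulled in

print(routes)
print(tally)
```

Logging the tally over time gives a rough view of the trade-off: more autonomous executions mean higher productivity, while the reviewed cases indicate where errors would have been costly.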
Risks & Guardrails
Image rights and consent can become a legal risk if provenance is unclear.
Model performance can vary across lighting conditions, demographics, and environments.
False positives can go unnoticed unless confidence levels are monitored.
Getting Started Guide
Define acceptance thresholds for precision, recall, and error cost.
Test with data that resembles real production conditions.
Add human review for low-confidence or high-impact predictions.
Track model drift and revalidate after camera or dataset changes.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand use.
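The evidence gate can be expressed as a small check over evaluation counts. This is a sketch under assumptions: the threshold values, the per-error costs, and the function names are hypothetical and would come from your own acceptance criteria.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Standard precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def evidence_gate(tp, fp, fn, cost_per_fp, cost_per_fn,
                  min_precision, min_recall, max_error_cost):
    """Return ("expand", []) if every criterion is met, else ("pause", failed_criteria)."""
    precision, recall = precision_recall(tp, fp, fn)
    error_cost = fp * cost_per_fp + fn * cost_per_fn
    failed = []
    if precision < min_precision:
        failed.append("precision")
    if recall < min_recall:
        failed.append("recall")
    if error_cost > max_error_cost:
        failed.append("error_cost")
    return ("expand" if not failed else "pause", failed)


# Hypothetical evaluation run on production-like data.
decision, failed = evidence_gate(
    tp=90, fp=5, fn=10,
    cost_per_fp=2.0, cost_per_fn=5.0,
    min_precision=0.90, min_recall=0.95, max_error_cost=100.0,
)
print(decision, failed)  # pause ['recall']
```

Re-running the same gate after a camera or dataset change is one way to make the revalidation step concrete: the rollout expands only when the list of failed criteria is empty.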