Overview
Multi-Agent Reinforcement Learning (MARL) trains several agents that share an environment, each adapting its behavior while the others adapt too. This matters because many real-world problems (traffic, markets, robot teams) involve many decision-makers, not one.
Multi-Agent Reinforcement Learning sits inside the broader AI toolkit. Once you understand it, other AI topics become easier to evaluate and compare.
Deep Dive
In single-agent reinforcement learning, one agent learns a policy by maximizing reward in a fixed environment. MARL adds more agents, and that changes everything: from each agent's point of view, the environment is non-stationary because the others keep changing their policies. Agents can cooperate (sharing a team reward, like robots playing soccer), compete (zero-sum, as in poker or pursuit-evasion), or do a mix of both. Researchers use formalisms such as Markov games (stochastic games), which generalize the single-agent Markov Decision Process. Landmark results include DeepMind's AlphaStar reaching Grandmaster level in StarCraft II and OpenAI Five defeating Dota 2 professionals, both relying on populations of agents trained against one another through self-play.
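To make the cooperative, competitive, and mixed cases concrete, here is a minimal sketch that treats a one-state Markov game as a payoff table for two agents and labels it by its reward structure. The function name `classify_game` and the three example tables are illustrative assumptions, not part of any library.

```python
from typing import Dict, Tuple

JointAction = Tuple[int, int]
# Maps a joint action (a0, a1) to one reward per agent (hypothetical two-agent setup).
Payoffs = Dict[JointAction, Tuple[float, float]]

def classify_game(payoffs: Payoffs) -> str:
    """Label a two-agent payoff table as cooperative, zero-sum, or mixed."""
    if all(r0 == r1 for r0, r1 in payoffs.values()):
        return "cooperative"  # both agents always receive the same team reward
    if all(r0 + r1 == 0 for r0, r1 in payoffs.values()):
        return "zero-sum"     # one agent's gain is exactly the other's loss
    return "mixed"

# Shared team reward: the agents are paid only when they pick the same action.
coordination = {(0, 0): (1.0, 1.0), (0, 1): (0.0, 0.0),
                (1, 0): (0.0, 0.0), (1, 1): (1.0, 1.0)}

# Matching pennies: agent 0 wins on a match, agent 1 wins on a mismatch.
matching_pennies = {(0, 0): (1.0, -1.0), (0, 1): (-1.0, 1.0),
                    (1, 0): (-1.0, 1.0), (1, 1): (1.0, -1.0)}

# Prisoner's dilemma (0 = cooperate, 1 = defect): interests only partly align.
prisoners_dilemma = {(0, 0): (-1.0, -1.0), (0, 1): (-3.0, 0.0),
                     (1, 0): (0.0, -3.0), (1, 1): (-2.0, -2.0)}

for name, game in [("coordination", coordination),
                   ("matching pennies", matching_pennies),
                   ("prisoner's dilemma", prisoners_dilemma)]:
    print(name, "->", classify_game(game))
```

A full Markov game adds states and transition probabilities on top of this; the single-state case is used here only to keep the reward structure visible.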
Technical Insight
The core challenge is non-stationarity: as every agent updates its policy, the others face a moving target, so naive independent learning can fail to converge. A popular remedy is centralized training with decentralized execution (CTDE), used by algorithms such as MADDPG and QMIX. During training, a critic sees all agents' observations and actions in order to compute stable gradients, but at deployment each agent acts using only its own local observations. This combines coordinated learning with efficient, independent operation.
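The CTDE idea can be sketched on a toy task. This is not MADDPG or QMIX: it is a simplified tabular illustration under stated assumptions. Each of two agents gets a private observation in {0, 1}, the team is rewarded only if both agents pick the action matching their own observation, and exact expected updates replace sampling so the run is deterministic. The names `team_reward`, `train`, and `act` are hypothetical.

```python
import math
from itertools import product

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def team_reward(o0, o1, a0, a1):
    """Hypothetical task: reward 1 only if each action matches that agent's observation."""
    return 1.0 if (a0 == o0 and a1 == o1) else 0.0

def train(steps=200, actor_lr=1.0, critic_lr=0.5):
    # logits[i][obs][action]: actor i conditions only on its own observation.
    logits = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    # Centralized critic: one value per (o0, o1, a0, a1), used during training only.
    q = {k: 0.0 for k in product(range(2), repeat=4)}
    for _ in range(steps):
        # Critic update: regress toward the team reward for every joint situation.
        for k in q:
            q[k] += critic_lr * (team_reward(*k) - q[k])
        # Snapshot both policies so the two actors update simultaneously.
        pi = [[softmax(row) for row in logits[i]] for i in range(2)]
        for i in range(2):
            j = 1 - i
            for oi in range(2):
                # Value of each own action: average the centralized critic over
                # the other agent's (uniform) observation and current policy.
                vals = []
                for ai in range(2):
                    v = 0.0
                    for oj, aj in product(range(2), repeat=2):
                        key = (oi, oj, ai, aj) if i == 0 else (oj, oi, aj, ai)
                        v += 0.5 * pi[j][oj][aj] * q[key]
                    vals.append(v)
                baseline = sum(p * v for p, v in zip(pi[i][oi], vals))
                # Exact softmax policy-gradient step for this observation row.
                for ai in range(2):
                    logits[i][oi][ai] += actor_lr * pi[i][oi][ai] * (vals[ai] - baseline)
    return logits

def act(agent_logits, obs):
    """Decentralized execution: uses only the agent's own logits and observation."""
    row = agent_logits[obs]
    return row.index(max(row))

trained = train()
print([act(trained[0], o) for o in (0, 1)], [act(trained[1], o) for o in (0, 1)])
```

Note that `act` never touches the critic or the other agent's observation; everything joint lives inside `train`, which is the separation CTDE relies on.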
Mastering Multi-Agent Reinforcement Learning
To build a deeper understanding, treat Multi-Agent Reinforcement Learning as a working model rather than a single thing: define the requirements, make the assumptions explicit, and separate what a reliable system can do from what still needs expert judgment.
In practice, strong teams applying Multi-Agent Reinforcement Learning first build a solid working model, then map that model onto production constraints. They write down explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than on a one-time benchmark win. This is where theoretical understanding turns into durable capability across product, policy, and operations.
It helps you clearly separate technical claims from marketing language. At the same time, different teams may use the same term differently, so define scope early. A stable approach is to combine rapid experimentation with governance: run pilots, gather evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
It helps you clearly separate technical claims from marketing language.
You can ask better implementation questions before spending money or time.
Teams with a shared understanding make better product, policy, and learning decisions.
In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than scale ambiguity.
Real-World Implementation
Coordinating fleets of warehouse robots so they move packages without colliding or jamming the aisles.
Traffic-signal control, where each intersection is an agent learning to reduce city-wide congestion.
Game-playing AI such as OpenAI Five (Dota 2) and AlphaStar (StarCraft II), trained through self-play among many agents.
Coordinating bids and demand response among distributed batteries and homes in a smart power grid.
Patterns in Practice
Whether the task is coordinating warehouse robots, controlling traffic signals, training game-playing agents, or balancing a smart grid, teams usually get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both product gains and the cost of errors over time.
Risks & Guardrails
Different teams may use the same term differently, so define scope early.
Benchmarks can look strong while real-world performance is uneven.
Neglecting data quality and evaluation design usually produces fragile results.
Implementation Roadmap
Start with a plain-language definition of the outcome you want.
Choose one success metric and one failure condition before you test.
Run a small pilot with representative data, not a polished demo.
Document where Multi-Agent Reinforcement Learning helps and where simpler approaches are better.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
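The evidence-gate idea from the roadmap can be written down as a small check. This is only a sketch: the function name `evidence_gate`, its parameters, and every threshold value are hypothetical and would come from your own pilot.

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    proceed: bool
    reason: str

def evidence_gate(success_metric: float, success_target: float,
                  failure_rate: float, failure_limit: float) -> GateResult:
    """Decide whether a pilot may expand; all thresholds are hypothetical."""
    # The failure condition is checked first: a tripped limit blocks rollout
    # even when the success metric looks good.
    if failure_rate > failure_limit:
        return GateResult(False, "failure condition tripped: pause rollout")
    if success_metric < success_target:
        return GateResult(False, "success metric below target: close the gap")
    return GateResult(True, "criteria met: expand usage")

# Example with made-up pilot numbers.
print(evidence_gate(success_metric=0.9, success_target=0.8,
                    failure_rate=0.01, failure_limit=0.05))
```

Fixing both thresholds before the pilot runs is what keeps the gate from being adjusted to fit the result.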