Overview
Odds Ratio Preference Optimization (ORPO) is a fine-tuning method that teaches a language model both good behavior and human preferences in a single training pass. It matters because it skips the separate reward model and the reference model that other methods typically require, which makes alignment cheaper and simpler.
ORPO is part of the language-AI stack used to read, generate, classify, and transform text and speech at scale.
Deep Dive
ORPO, introduced by Hong, Lee, and Thorne in 2024, merges supervised fine-tuning (SFT) and preference alignment into a single step. Most alignment pipelines first run SFT on good examples and then apply a second stage such as RLHF or DPO, which needs a frozen copy of the model (the reference) along with curated preference pairs. ORPO removes the reference model entirely. Its loss adds a penalty term to the usual next-token objective: it raises the odds the model assigns to the chosen (preferred) response while lowering the odds of the rejected one. Because it uses an odds ratio rather than a hard log-probability margin, the penalty is mild, so the model learns to prefer good responses without catastrophically forgetting how to generate fluent text.
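As a rough illustration of the data this single stage consumes, the sketch below models one preference example as a prompt with a chosen and a rejected response. The `PreferencePair` class and the `validate_pair` helper are hypothetical names used only for illustration, not part of any library, and the example text is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One training example: a prompt plus a preferred and a dispreferred reply."""
    prompt: str
    chosen: str    # response the model should favor
    rejected: str  # response the model should disfavor


def validate_pair(pair: PreferencePair) -> None:
    """Raise ValueError for examples that cannot carry a preference signal."""
    if not pair.prompt.strip():
        raise ValueError("prompt must be non-empty")
    if not pair.chosen.strip() or not pair.rejected.strip():
        raise ValueError("chosen and rejected must be non-empty")
    if pair.chosen == pair.rejected:
        raise ValueError("chosen and rejected must differ")


# Hypothetical example record.
example = PreferencePair(
    prompt="How do I reset my password?",
    chosen="Open Settings, choose Security, then select Reset password.",
    rejected="I don't know.",
)
validate_pair(example)  # returns None for a well-formed pair
```

The same triple serves both parts of the objective: the chosen response supplies the next-token target, and the chosen/rejected contrast supplies the preference penalty.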
Technical Insight
The ORPO loss is the SFT cross-entropy loss plus a weighted penalty: the negative log-sigmoid of the log odds ratio between the chosen and rejected responses. Odds are defined as p/(1-p), so the ratio compares how much more likely the model is to produce the good response than the bad one. Using odds rather than raw probabilities keeps the contrast mild, which avoids over-suppressing rejected tokens, a failure that could degrade a model that has no reference model to anchor it.
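The loss described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the sequence probability p is the length-normalized token likelihood (the exponential of the mean token log-probability), and the weight `lam=0.1` is an illustrative value. The function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class OrpoTerms(NamedTuple):
    total: float           # sft + lam * odds_ratio_penalty
    sft: float             # negative mean log-probability of the chosen response
    odds_ratio_penalty: float
    log_odds_ratio: float


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(
    chosen_token_logps: Sequence[float],
    rejected_token_logps: Sequence[float],
    lam: float = 0.1,  # assumed weight on the penalty term
) -> OrpoTerms:
    """Combine the SFT term with the odds-ratio penalty for one preference pair."""
    chosen_avg = sum(chosen_token_logps) / len(chosen_token_logps)
    rejected_avg = sum(rejected_token_logps) / len(rejected_token_logps)
    sft = -chosen_avg
    log_or = log_odds(chosen_avg) - log_odds(rejected_avg)
    penalty = -log_sigmoid(log_or)
    return OrpoTerms(sft + lam * penalty, sft, penalty, log_or)


# Made-up per-token log-probabilities for illustration.
good = orpo_loss([-0.1, -0.2], [-1.0, -2.0])  # model already prefers chosen
bad = orpo_loss([-1.0, -2.0], [-0.1, -0.2])   # model prefers rejected
print(good.odds_ratio_penalty < bad.odds_ratio_penalty)
```

The penalty shrinks toward zero as the model's odds for the chosen response grow relative to the rejected one, and it equals log 2 when the two are equally likely.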
Mastering Odds Ratio Preference Optimization
To build a deep understanding, treat ORPO as an operating model rather than a single feature: define the outcomes you want, make the reasoning explicit, and separate what the system can do reliably from what still needs expert judgment.
In practice, strong teams run ORPO alongside prompt design, retrieval, and review loops as one coordinated system. They write clear success criteria, evaluate against real data and real workflows, and iterate on observed failure patterns instead of chasing one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
Language workflows can move faster without sacrificing consistency. At the same time, hallucinated facts can slip silently into reports, support flows, or research outputs. The most robust approach pairs experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
Language workflows can move faster without sacrificing consistency.
Access expands across languages and communication styles.
Teams can spend more time on judgment while automation handles repetition.
In high-quality deployments, each of these translates into measurable operating rules, ownership boundaries, and repeatable review practices, so teams scale confidence rather than ambiguity.
Real-World Applications
Fine-tuning an open-source 7B chat model on preference pairs without loading a second reference copy, reducing GPU memory use.
A startup steering a customer-support assistant toward polite, policy-compliant responses in a single training run instead of SFT followed by DPO.
Researchers comparing ORPO with DPO on the same dataset to show comparable alignment at lower compute.
Adapting a base model to a specialized domain (e.g., legal writing) where pairs of good and bad examples are available but there is no budget for a reward model.
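The memory argument behind the first application can be made concrete with back-of-the-envelope arithmetic. The sketch below counts model weights only and assumes 2 bytes per parameter (as with bf16 or fp16); gradients, optimizer state, and activations are ignored, so the overall saving in a real training run is smaller than the weight-only figure suggests. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights only, in GiB (2 bytes per parameter assumed)."""
    return num_params * bytes_per_param / 2**30


policy = weight_memory_gib(7e9)     # the model being trained
reference = weight_memory_gib(7e9)  # frozen copy a reference-based method would load

print(f"policy only (no reference copy): {policy:.1f} GiB")
print(f"policy + frozen reference:       {policy + reference:.1f} GiB")
```

Under these assumptions, dropping the reference copy removes one full set of 7B weights from GPU memory.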
Implementation Patterns
Odds Ratio Preference Optimization in practice
The same four scenarios recur as implementation patterns: fine-tuning an open-source 7B chat model on preference pairs without a second reference copy (which roughly halves the GPU memory needed for model weights), steering a customer-support assistant in a single training run instead of SFT followed by DPO, comparing ORPO with DPO on the same dataset, and adapting a base model to a specialized domain such as legal writing when there is no budget for a reward model.
In each case, teams generally get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.
Risks & Guardrails
Hallucinated facts can slip silently into reports, support flows, or research outputs.
Prompt sensitivity can produce inconsistent results across similar requests.
Sensitive text data may be exposed if access controls are weak.
Getting Started Guide
Define the output format, tone, and quality thresholds before rollout.
Ground responses in trusted sources whenever accuracy matters.
Keep a human review checkpoint for high-stakes outputs.
Track failure patterns and refine prompts or workflows regularly.
Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
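The evidence-gate idea in this guide can be sketched as a simple threshold check. The metric names, threshold values, and the `evidence_gate` function below are all hypothetical and serve only to illustrate the pause-or-expand decision.

```python
def evidence_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of criteria that are missing or below their threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    ]


# Hypothetical criteria and measurements for one rollout step.
thresholds = {"format_pass_rate": 0.98, "grounded_answer_rate": 0.95}
metrics = {"format_pass_rate": 0.99, "grounded_answer_rate": 0.91}

failures = evidence_gate(metrics, thresholds)
decision = "expand" if not failures else "pause"
print(decision, failures)  # prints: pause ['grounded_answer_rate']
```

A missing metric counts as a failure here, which keeps the gate closed until the evidence actually exists.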