Language AI Guide

Odds Ratio Preference Optimization

Odds Ratio Preference Optimization (ORPO) is a fine-tuning method that teaches a language model good behavior and human preferences in a single pass.

Overview

ORPO matters because it skips the usual separate reward model and reference model, which makes alignment cheaper and simpler.

Odds Ratio Preference Optimization is part of the language-AI stack used to read, generate, organize, and transform text and speech at scale.

Deep Dive

ORPO, introduced by Hong, Lee, and Thorne in 2024, merges supervised fine-tuning (SFT) and preference alignment into a single stage. Most alignment pipelines first run SFT on good examples, then run a second stage such as RLHF or DPO that needs a frozen copy of the model (the reference) along with the collected preference pairs. ORPO removes the reference model entirely. Its loss adds a penalty term to the standard next-token prediction objective: it raises the odds the model assigns to the chosen (preferred) response while pushing down the odds of the rejected one. Because it uses the odds ratio rather than a raw log-probability gap, the penalty is gentle, so the model learns to prefer good responses with less risk of forgetting how to generate fluent text.
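For reference, the objective described here can be written out as below. The notation is an assumption chosen for this guide: $x$ is the prompt, $y_w$ the chosen response, $y_l$ the rejected response, $\lambda$ the penalty weight, $\sigma$ the logistic sigmoid, and $P_\theta(y \mid x)$ the likelihood the model assigns to a response (implementations typically use a length-normalized value, which is worth checking against the paper).

```latex
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[ \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}} \right]

\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left(
      \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}
    \right)

\mathrm{odds}_\theta(y \mid x)
  = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```

The SFT term is computed on the chosen response only, so a single forward and backward pass over each preference pair covers both objectives.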

Technical Insight

The ORPO loss is the SFT cross-entropy loss plus a weighted negative log-sigmoid of the log odds ratio between the chosen and rejected responses. The odds of a response equal p/(1-p), so the ratio compares how much more likely the model is to produce the good response than the bad one. Using odds instead of raw probabilities keeps the contrast gentle, which avoids over-suppressing rejected tokens in a way that could unintentionally degrade the model.
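A minimal numeric sketch of this loss for a single preference pair, using only the Python standard library. The function names, the default weight `lam=0.1`, and the assumption that the inputs are length-normalized log-probabilities are illustrative choices, not taken from the paper or from any library.

```python
import math


def log_odds(log_p: float) -> float:
    """Return log(p / (1 - p)) given log p, for 0 < p < 1."""
    # log1p(-exp(log_p)) is log(1 - p), computed without forming 1 - p directly.
    return log_p - math.log1p(-math.exp(log_p))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(sft_nll: float, log_p_chosen: float,
              log_p_rejected: float, lam: float = 0.1) -> float:
    """SFT loss plus lam times the odds-ratio penalty for one pair.

    sft_nll is the cross-entropy on the chosen response; the two log-prob
    arguments are assumed to be average per-token log-probabilities.
    """
    log_odds_ratio = log_odds(log_p_chosen) - log_odds(log_p_rejected)
    odds_ratio_penalty = -log_sigmoid(log_odds_ratio)
    return sft_nll + lam * odds_ratio_penalty


# Toy values: the model gives the chosen response p = 0.6 and the rejected p = 0.2.
good_order = orpo_loss(1.0, math.log(0.6), math.log(0.2), lam=0.5)
bad_order = orpo_loss(1.0, math.log(0.2), math.log(0.6), lam=0.5)
```

With these toy numbers the odds are 1.5 and 0.25, the odds ratio is 6, and the penalty is log(7/6). Swapping the two probabilities makes the penalty larger, which is the signal that pushes the model toward the chosen response.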

Mastering Odds Ratio Preference Optimization

To build a deeper understanding, treat Odds Ratio Preference Optimization as a usage pattern rather than a single artifact: define the requirements, make the assumptions explicit, and separate what a reliable system can handle from what still needs expert judgment.

In practice, strong teams that use Odds Ratio Preference Optimization design prompts, retrieval, and evaluation loops as one integrated communication system. They write explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Language workflows can move faster without sacrificing consistency. At the same time, hallucinated facts can silently enter reports, support flows, or research outputs. A stable approach is to pair rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Language workflows can move faster without sacrificing consistency.

It broadens access across languages and communication styles.

Teams can spend more time on judgment while automation handles repetition.

In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than amplify ambiguity.

The Future of Odds Ratio Preference Optimization

ORPO is gaining traction because it cuts memory and compute by dropping the reference model, which appeals to teams fine-tuning on limited hardware. Expect it to appear more often in open-source recipes and as a built-in option in libraries such as Hugging Face TRL. Future work may tune the lambda weight automatically, combine ORPO with other reference-free objectives, and extend it to multimodal and very large models, where holding two copies in memory is expensive.
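The memory argument can be made concrete with back-of-the-envelope arithmetic. The sketch below counts model weights only and assumes 16-bit parameters (2 bytes each). The function names are hypothetical, and the estimate ignores gradients, optimizer state, and activations, so the overall saving in a real training run is smaller than the weights-only figure suggests.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for one copy of the model weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def weights_footprint_gb(num_params: float, bytes_per_param: int = 2,
                         needs_reference: bool = False) -> float:
    """Weights-only footprint: one trainable copy, plus a frozen copy if required."""
    copies = 2 if needs_reference else 1
    return copies * weight_memory_gb(num_params, bytes_per_param)


# A 7B-parameter model in 16-bit precision.
with_reference = weights_footprint_gb(7e9, needs_reference=True)
without_reference = weights_footprint_gb(7e9)
print(f"with reference copy:    {with_reference:.1f} GB")
print(f"without reference copy: {without_reference:.1f} GB")
```

Under these assumptions, one copy of a 7B model takes 14 GB and two copies take 28 GB, so dropping the frozen reference halves the memory spent on weights.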

Real-World Implementation

Fine-tuning an open-source 7B chat model on preference pairs without loading a second reference copy, which roughly halves the GPU memory needed for model weights compared with a setup that also holds a frozen reference.

Bootstrapping alignment for a customer-support assistant so that it prefers on-brand, policy-compliant responses, in a single training run instead of SFT followed by DPO.

Researchers benchmarking ORPO against DPO on the same dataset to show comparable alignment with lower compute.

Adapting a base model to a specialized domain (for example, legal drafting) where pairs of good and bad examples exist but the budget for a reward model does not.
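All of these use cases start from the same kind of data: a prompt paired with a preferred and a dispreferred response. The sketch below shows one way such a record might be represented and sanity-checked before training. The field names `prompt`, `chosen`, and `rejected` follow a common convention but are an assumption here, and the `validate` helper and the sample text are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PreferencePair:
    prompt: str    # the instruction or user message
    chosen: str    # the preferred response
    rejected: str  # the dispreferred response


def validate(pair: PreferencePair) -> List[str]:
    """Return a list of problems found in one preference pair (empty if none)."""
    problems = []
    if not pair.prompt.strip():
        problems.append("empty prompt")
    if not pair.chosen.strip():
        problems.append("empty chosen response")
    if not pair.rejected.strip():
        problems.append("empty rejected response")
    # Identical responses carry no preference signal for the odds-ratio term.
    if pair.chosen.strip() == pair.rejected.strip():
        problems.append("chosen and rejected are identical")
    return problems


example = PreferencePair(
    prompt="Draft a one-sentence confidentiality clause.",
    chosen="Each party shall keep the other party's confidential information secret.",
    rejected="Don't tell anyone, ok?",
)
issues = validate(example)
```

Filtering out pairs that fail such checks is cheap, and it matters more for ORPO than for a two-stage pipeline because the chosen response also serves as the SFT target.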

In Practice

Across these use cases, teams usually get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product gains and the cost of errors over time.

Risks & Guardrails

Hallucinated facts can silently enter reports, support flows, or research.

Prompt sensitivity can produce inconsistent results on similar requests.

Sensitive text data can leak if controls are weak.

Implementation Roadmap

1. Define the output format, tone, and quality criteria before launch.

2. Ground answers in trusted sources wherever it matters.

3. Keep a human review step for high-stakes outputs.

4. Track failure patterns and periodically revise prompts or workflows.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
