Theoretic

KTO: Model Alignment as Prospect Theoretic Optimization

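1. Introduction. This report introduces KTO (Kahneman-Tversky Optimization), an alignment method for large language models based on prospect theory. The method designs a human-aware loss function (HALO) to directly maximize the utility of model generations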

ALIGNMENT Model KTO Optimization Theoretic

admin 5 months ago

[NIPS2017] A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning: Notes

Table of contents: Preface; Background and Related Work; Neural Fictitious Self-Play; Policy-Space Response Oracles; Meta-Strategy Solvers; Deep Cogni

Notes GAME Theoretic Unified Reinforcement

admin 5 months ago
