Linux大棚 – a tech blog that stays true to its original aspirations, a quiet corner in a restless age
  1. Tags
  2. Theoretic
  • KTO: Model Alignment as Prospect Theoretic Optimization

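    1. Introduction. This report introduces KTO (Kahneman-Tversky Optimization), an alignment method for large language models based on prospect theory. By designing a human-aware loss function (HALO), the method directly maximizes the utility of model generations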
    ALIGNMENT Model KTO Optimization Theoretic
    admin 4 months ago
    56 0
  • [NIPS2017] A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning: Notes

    Contents: Preface; Background and Related Work; Neural Fictitious Self-Play; Policy-Space Response Oracles; Meta-Strategy Solvers; Deep Cogni
    Notes GAME Theoretic Unified Reinforcement
    admin 4 months ago
    43 0
Copyright © 2022 All Rights Reserved 豫ICP备2021025688号-21