Computer Vision
ReTRM: Re-Thinking the Reward Model Training Paradigm
This paper introduces a reward model training method that applies linear direct-sum decomposition and singular value decomposition (SVD) to separate the inference space from the latent space when training the reward model.
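To make the linear-algebra idea concrete, here is a minimal sketch of how an SVD can split a vector space into two complementary subspaces whose direct sum is the whole space. It is not the paper's training procedure: the matrix `W`, the rank cutoff `k`, the vector `h`, and the helper `split_by_svd` are all hypothetical, and which subspace would play the role of the "inference" or "latent" space is an assumption the summary above does not settle.

```python
import numpy as np


def split_by_svd(W, k):
    """Return projectors (P, Q) onto the top-k right-singular subspace
    of W and onto its orthogonal complement (hypothetical helper)."""
    # Rows of Vh form an orthonormal basis of R^n, where n = W.shape[1].
    _, _, Vh = np.linalg.svd(W, full_matrices=True)
    V_k = Vh[:k].T                      # (n, k) basis of the first subspace
    P = V_k @ V_k.T                     # orthogonal projector onto it
    Q = np.eye(W.shape[1]) - P          # projector onto the complement
    return P, Q


# Toy data: a random matrix stands in for a learned weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
h = rng.standard_normal(6)

P, Q = split_by_svd(W, k=2)
h_a, h_b = P @ h, Q @ h                 # the two components of h

# Direct sum: the components rebuild h exactly and are orthogonal.
print(np.allclose(h_a + h_b, h))
print(abs(h_a @ h_b) < 1e-9)
```

Because `P` and `Q` are complementary orthogonal projectors, every vector has exactly one decomposition into the two subspaces, which is the property that lets the two parts be treated separately.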