Computer Vision

ReTRM: Re-Thinking the Reward Model Training Paradigm

This paper introduces a reward model training method that applies linear direct sum decomposition and singular value decomposition (SVD) to separate the inference space from the latent space when training the reward model.
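As a rough illustration of the linear algebra named in the summary, the sketch below uses an SVD to split a vector space into two complementary subspaces whose direct sum is the whole space. It is not the paper's procedure: the matrix `W`, the cut-off `k`, and the function name `split_by_svd` are hypothetical, and the summary does not say which subspace would play the role of the inference space or the latent space.

```python
import numpy as np

def split_by_svd(W, k):
    """Return projectors onto the span of the top-k right singular
    vectors of W and onto its orthogonal complement (illustrative only)."""
    # full_matrices=True yields a complete orthonormal basis of R^d in Vt.
    _, _, Vt = np.linalg.svd(W, full_matrices=True)
    V_top = Vt[:k].T       # shape (d, k): top-k right singular vectors
    V_rest = Vt[k:].T      # shape (d, d - k): remaining basis vectors
    P_top = V_top @ V_top.T
    P_rest = V_rest @ V_rest.T
    return P_top, P_rest

# Hypothetical data: a random 6x4 matrix standing in for a learned weight.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
P_top, P_rest = split_by_svd(W, k=2)

# Any vector decomposes uniquely into one component per subspace.
x = rng.standard_normal(4)
x_top, x_rest = P_top @ x, P_rest @ x
```

Because the two projectors are built from disjoint parts of one orthonormal basis, `x_top + x_rest` reconstructs `x` and the two components are orthogonal, which is what makes the split a direct sum.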
