Large Language Model
ReTRM: Re-Thinking the Reward Model Training Paradigm
This paper introduces a new reward model training method that applies direct sum decomposition from linear algebra together with singular value decomposition (SVD) to separate the inference space from the latent space when training the reward model.
From Policy Gradient (PG) to Group Sequence Policy Optimization (GSPO): Some Thoughts on RLHF
This article introduces the policy gradient algorithm starting from the LLM-RL optimization objective, then works step by step up to the GSPO algorithm. It also points out current problems with RLHF and summarizes solutions drawn from experiments I have conducted.
HARP: Hallucination Detection Via Reasoning Subspace Projection
An interesting work on LLM hallucination detection. Using the idea of direct sum decomposition from linear algebra, it decomposes the hidden state into the sum of a component in the semantic space and a component in the reasoning space.
Service material writing system based on RAG and NL2SQL fusion architecture
RAG process design and DeepSeek fine-tuning
LLM reasoning system based on FastAPI and vLLM
vHPR is an LLM reasoning system built on FastAPI and vLLM, intended primarily for evaluating LLMs on benchmarks. It provides a complete benchmark evaluation API and a performance monitoring system.