Large Language Model

ReTRM: Re-Thinking the Reward Model Training Paradigm

This paper introduces a reward model training method that applies direct sum decomposition and SVD to separate the inference space from the latent space when training the reward model.


From Policy Gradient (PG) to Group Sequence Policy Optimization (GSPO): Some Thoughts on RLHF

This article introduces the policy gradient algorithm, starting from the optimization objective of LLM-RL, and gradually works up to the GSPO algorithm. It also points out current problems with RLHF and summarizes solutions drawn from some experiments I have conducted.


HARP: Hallucination Detection Via Reasoning Subspace Projection

This is an interesting work on LLM hallucination detection. Using the idea of direct sum decomposition from linear algebra, it decomposes the hidden state into the sum of a semantic-space component and a reasoning-space component.
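
As a rough illustration of the direct sum idea only, and not the paper's actual procedure, the sketch below splits a vector into two orthogonal components using an SVD basis. The matrix `W`, the cutoff `k`, and the function name `split_hidden_state` are all hypothetical, chosen just to make the example self-contained.

```python
import numpy as np

def split_hidden_state(h, W, k):
    """Split h into components lying in two complementary subspaces.

    Assumption for illustration: the top-k right singular vectors of a
    hypothetical matrix W span the "semantic" subspace, and the remaining
    right singular vectors span the "reasoning" subspace.
    """
    # full_matrices=True makes Vt a complete orthonormal basis of R^d
    _, _, Vt = np.linalg.svd(W, full_matrices=True)
    V_sem = Vt[:k].T   # shape (d, k)
    V_rea = Vt[k:].T   # shape (d, d - k)
    # Orthogonal projection of h onto each subspace
    h_sem = V_sem @ (V_sem.T @ h)
    h_rea = V_rea @ (V_rea.T @ h)
    return h_sem, h_rea

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 8))   # stand-in matrix, not a real model weight
h = rng.standard_normal(8)         # stand-in hidden state
h_sem, h_rea = split_hidden_state(h, W, k=5)
```

Because the two bases together form an orthonormal basis of the whole space, the two components add back up to `h` and are orthogonal to each other, which is what makes the decomposition a direct sum.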


Service material writing system based on RAG and NL2SQL fusion architecture

RAG process design and DeepSeek fine-tuning


LLM reasoning system based on FastAPI and vLLM

vHPR is an LLM reasoning system built on FastAPI and vLLM. Its primary purpose is to run LLM benchmark evaluations. It provides a complete benchmark evaluation API and a performance monitoring system.
