Reinforcement Learning for Large Language Models: Group Relative Policy Optimization (GRPO) - FeynmanWiki