Odysseus: Stable RL Training of VLMs for Long-Horizon Game Decision-Making - FeynmanWiki