2.8m Gmail.txt May 2026


The paper addresses the "SFT plateau," a phenomenon in which the performance of Large Language Models (LLMs) under Supervised Fine-Tuning (SFT) stops improving even as the dataset size increases [11, 22]. The authors use a specific set of chart-to-code data to demonstrate this limitation and propose Multimodal Structured Reinforcement Learning (MSRL) as a solution [11, 22].

2. Methodology

Supervised Fine-Tuning (SFT) Phase

Baseline Model: Qwen2.5-VL-7B-Instruct [11, 22].
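
As a rough illustration of what "stops improving even as the dataset size increases" can mean in practice, the sketch below takes (dataset size, score) pairs and reports the smallest size after which every further increase in data adds less than a chosen margin. The function name `find_plateau`, the `min_gain` threshold, and the sample curve are all hypothetical. This is one simple way to operationalize a plateau, not the measurement used in the paper.

```python
from typing import List, Optional, Tuple


def find_plateau(points: List[Tuple[int, float]], min_gain: float) -> Optional[int]:
    """Return the smallest dataset size after which every further step up in
    data improves the score by less than `min_gain`, or None if no such size.

    `points` holds hypothetical (dataset_size, score) pairs; the criterion is
    an illustrative definition of a plateau, not the paper's.
    """
    pts = sorted(points)
    # Stop one short of the end: the last point has no later step to test.
    for i in range(len(pts) - 1):
        later_gains = [pts[j + 1][1] - pts[j][1] for j in range(i, len(pts) - 1)]
        if all(gain < min_gain for gain in later_gains):
            return pts[i][0]
    return None


# Hypothetical scaling curve: large early gains, then almost none.
curve = [(1_000, 40.0), (10_000, 55.0), (100_000, 60.0), (1_000_000, 60.5), (2_000_000, 60.6)]
print(find_plateau(curve, min_gain=1.0))  # 100000
```

The threshold drives the answer: on a noisy benchmark, a small `min_gain` may never trigger, so smoothing the curve or averaging several runs first would be a reasonable extension.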

The SFT stage requires 60 hours of training on 16 H800 GPUs. The RL stages take an additional 34 hours on 24 H800 GPUs [11]. One training stage uses 22k data pairs focusing on textual accuracy.
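
The reported training times can be combined into a rough compute budget by multiplying wall-clock hours by GPU count. The sketch assumes every GPU is busy for the full duration of its stage and that the stages run one after another; `gpu_hours` is a hypothetical helper.

```python
def gpu_hours(wall_clock_hours: float, num_gpus: int) -> float:
    """GPU-hours, assuming all GPUs are busy for the whole run."""
    return wall_clock_hours * num_gpus


sft = gpu_hours(60, 16)  # SFT stage: 60 h on 16 H800 GPUs
rl = gpu_hours(34, 24)   # RL stages combined: 34 h on 24 H800 GPUs
total = sft + rl

print(f"SFT: {sft:.0f} GPU-h, RL: {rl:.0f} GPU-h, total: {total:.0f} GPU-h")
# SFT: 960 GPU-h, RL: 816 GPU-h, total: 1776 GPU-h
```

Under these assumptions the RL stages cost about 85% of the SFT compute (816 / 960 = 0.85), even though they run on more GPUs.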


To break the plateau, the authors implement a two-stage Reinforcement Learning (RL) process [11].
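
A two-stage RL schedule can be sketched as an ordered list of stages, each scoring sampled outputs with its own reward. This is only an illustration of the staged structure under stated assumptions: `textual_reward` uses `difflib.SequenceMatcher` similarity as a stand-in for a "textual accuracy" signal, and `compiles_reward` (a syntax check on generated Python) is a hypothetical second signal. Neither is claimed to be the reward used in MSRL, and the policy-update step itself is omitted.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# A reward maps (generated_code, reference_code) to a score in [0, 1].
Reward = Callable[[str, str], float]


def textual_reward(generated: str, reference: str) -> float:
    """Hypothetical textual-accuracy reward: character-level similarity ratio."""
    return SequenceMatcher(None, generated, reference).ratio()


def compiles_reward(generated: str, reference: str) -> float:
    """Hypothetical second-stage reward: 1.0 if the code parses as Python."""
    try:
        compile(generated, "<generated>", "exec")
    except SyntaxError:
        return 0.0
    return 1.0


# Stages run in order; each one scores rollouts with its own reward.
STAGES: List[Tuple[str, Reward]] = [
    ("stage_1_textual", textual_reward),
    ("stage_2_hypothetical", compiles_reward),
]


def score_rollouts(reward: Reward, rollouts: List[Tuple[str, str]]) -> List[float]:
    """Score (generated, reference) pairs; a policy update would consume these."""
    return [reward(gen, ref) for gen, ref in rollouts]


# Hypothetical rollouts: one exact match, one missing its closing parenthesis.
rollouts = [("plt.bar(x, y)", "plt.bar(x, y)"), ("plt.bar(x, y", "plt.bar(x, y)")]
for name, reward in STAGES:
    print(name, [round(s, 2) for s in score_rollouts(reward, rollouts)])
```

Keeping each stage's reward as a plain function makes the schedule easy to reorder or extend; a real setup would feed these scores into an RL update such as a policy-gradient step.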