Ai2 updates its Olmo 3 family of models to Olmo 3.1, after an additional round of extended RL training to boost performance.
The company is positioning its new offerings as a business-ready way for enterprises to build domain-specific agents without first needing to create foundation models.
Macaron AI – a startup known for its Personal AI Agent – is betting on this very idea. Today, the company is officially ...
Reinforcement Pre-Training (RPT) is a new method for training large language models (LLMs) by reframing the standard task of predicting the next token in a sequence as a reasoning problem solved using ...
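The blurb above leaves the training loop implicit, so here is a minimal, purely illustrative Python sketch of the core idea it describes: next-token prediction scored as a verifiable reasoning task, so that standard RL machinery can run over raw pre-training text. Every name in it (`Rollout`, `toy_policy`, `next_token_reward`, `score_batch`) is a hypothetical stand-in of my own, not anything from the RPT work itself.

```python
# Sketch only: next-token prediction recast as a verifiable reasoning task.
# A real policy would be an LLM that samples a reasoning trace before
# committing to a token; `toy_policy` below is a hypothetical placeholder.

from dataclasses import dataclass

@dataclass
class Rollout:
    reasoning: str    # chain-of-thought produced before the guess
    prediction: str   # the token the policy finally commits to

def toy_policy(prefix: str) -> Rollout:
    # Hypothetical stand-in for the model: just echo the last word seen.
    words = prefix.split()
    last = words[-1] if words else ""
    return Rollout(
        reasoning=f"The prefix ends with '{last}', so guess it repeats.",
        prediction=last,
    )

def next_token_reward(rollout: Rollout, gold_token: str) -> float:
    # Verifiable reward: 1.0 if the guessed token matches the corpus, else 0.0.
    return 1.0 if rollout.prediction == gold_token else 0.0

def score_batch(prefixes: list[str], gold_tokens: list[str]) -> list[float]:
    # Roll out the policy on each prefix and collect rewards; in a full
    # pipeline these rewards would drive a policy-gradient update.
    return [
        next_token_reward(toy_policy(p), g)
        for p, g in zip(prefixes, gold_tokens)
    ]

if __name__ == "__main__":
    prefixes = ["the cat sat on the", "reframing next-token prediction as"]
    gold = ["mat", "reasoning"]
    print(score_batch(prefixes, gold))  # this toy policy earns [0.0, 0.0]
```

The point of the sketch is the reward function, not the policy: because the "correct answer" for every position already exists in the corpus, the signal is free and verifiable at pre-training scale, which is what makes reframing next-token prediction as a reasoning problem amenable to RL.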
AI scaling faces diminishing returns due to the growing scarcity of high-quality, high-entropy data from the internet, pushing the industry towards richer, synthetic data. Nvidia is strategically ...