- Published on
Llama-4 doesn't disappoint!
- Authors
- Name
- AbnAsia.org
- @steven_n_t
- Ease of deployment is now a more important OSS feature than sheer size. Meta emphasizes that Llama 4 Scout can run on a single H100 (with Int4 quantization), in contrast to Llama 3.1 405B, which was powerful but ultimately saw less adoption. Mixture of Experts (MoE) is a good way forward for an OSS strategy.
- A new technique called MetaP sets training hyperparameters in a smart way. There are not many details, but I bet it's something close to the Bayesian optimization in Ax, an open-source framework from Meta that runs adaptive experiments (like A/B tests) with a limited trial budget.
- The post-training strategy is to down-weight SFT/DPO and up-weight RL, because heavy SFT can over-constrain the model and reduce exploration during RL.
- An earlier model checkpoint can serve as a critic for its later self. For example, the model filters out easy prompts for the next training iteration, and it keeps getting better at that filtering as it trains.
- Llama 4 Behemoth is trained with FP8 precision on 32K GPUs and 30T tokens. Its post-training has to prune 95% of the SFT data, compared to 50% for the smaller models. Basically, most of the training data is too easy for the large model.
- The tricks that enable the 10M-token context seem quite simple: (1) interleave attention layers that have no positional embedding with ordinary ones. The idea comes from the paper that introduced NoPE (No Positional Embedding), clever name lol; (2) scale the attention softmax temperature at inference time as the context grows.
- Grok is now the SOTA standard for LLM social bias! Quote: "Llama 4 performs significantly better than Llama 3 and is comparable to Grok" on political leaning and refusal to answer.
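
To make the two long-context tricks above concrete, here is a minimal sketch in plain Python. It is not Meta's implementation: the names `uses_positional_embedding`, `length_scale`, and `scaled_softmax` are hypothetical, and so are the `nope_interval` and `train_len` parameters and the log-ratio scaling rule. The only idea it tries to show is that a softmax over many more keys tends to flatten, so multiplying the logits by a factor that grows with context length sharpens it again.

```python
import math
from typing import List


def uses_positional_embedding(layer_idx: int, nope_interval: int = 2) -> bool:
    """Return True if this layer keeps its positional embedding.

    Assumption: every `nope_interval`-th layer (counting from 1) is a NoPE
    layer. The real interleaving ratio is not stated in this post.
    """
    return (layer_idx + 1) % nope_interval != 0


def length_scale(context_len: int, train_len: int = 8192) -> float:
    """Hypothetical rule: scale logits by log(context) / log(train length).

    Contexts no longer than the training length are left untouched.
    """
    if context_len <= train_len:
        return 1.0
    return math.log(context_len) / math.log(train_len)


def scaled_softmax(logits: List[float], context_len: int,
                   train_len: int = 8192) -> List[float]:
    """Softmax over attention logits after length-dependent scaling."""
    s = length_scale(context_len, train_len)
    scaled = [s * x for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Same logits, short vs. very long context: the long one is sharper.
short_ctx = scaled_softmax([2.0, 1.0, 0.0], context_len=4096)
long_ctx = scaled_softmax([2.0, 1.0, 0.0], context_len=8192 ** 2)
```

With these toy numbers, the top key receives more attention mass in the long-context case than in the short one, which is the whole point of the adjustment.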
Congrats to the team on another stellar release!
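
P.S. The "checkpoint as critic" point is easy to picture in code. The sketch below is only my guess at the shape of such a filter, not the team's pipeline: `filter_easy_prompts`, the `attempt` callback, and the `samples` and `max_pass_rate` parameters are all hypothetical. It assumes "easy" means the current checkpoint solves the prompt in most sampled attempts.

```python
from typing import Callable, List


def filter_easy_prompts(
    prompts: List[str],
    attempt: Callable[[str, int], bool],
    samples: int = 4,
    max_pass_rate: float = 0.5,
) -> List[str]:
    """Keep only prompts the current checkpoint does not solve too often.

    `attempt(prompt, i)` is a hypothetical hook that returns True when the
    checkpoint's i-th sampled answer to `prompt` is judged correct.
    """
    kept = []
    for prompt in prompts:
        passes = sum(1 for i in range(samples) if attempt(prompt, i))
        if passes / samples <= max_pass_rate:
            kept.append(prompt)  # still hard enough to be worth training on
    return kept


def toy_attempt(prompt: str, sample_idx: int) -> bool:
    # Toy stand-in for a real model call: short prompts always get solved.
    return len(prompt) < 10


hard = filter_easy_prompts(["2+2?", "prove the lemma in full"], toy_attempt)
```

In a real loop, `attempt` would call the latest checkpoint, so the filter gets stricter as the model improves and the surviving prompts drift toward the medium-to-hard end.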
Author
AiUTOMATING PEOPLE, ABN ASIA was founded by people with deep roots in academia, with work experience in the US, Holland, Hungary, Japan, South Korea, Singapore, and Vietnam. ABN Asia is where academia and technology meet opportunity. With our cutting-edge solutions and competent software development services, we're helping businesses level up and take on the global scene. Our commitment: Faster. Better. More reliable. In most cases: Cheaper as well.