What makes Llama 3 good


"The release of Llama 3 was very sparse in terms of technical/research details. I tried to extract the hidden details from the release blog post to answer common questions. 👀

Why is Llama 3 better than Llama 2?

  • Scaled up pretraining 7x, from 2T tokens to 15T, on sequences of 8,192 tokens.

  • Improved data quality with new filtering, including heuristic filters, NSFW filters, semantic deduplication (👀) approaches, and text classifiers to predict data quality.

  • Used Llama 2 to generate synthetic training data to train text-quality classifiers.

  • Extensive experiments to find the best data mix from different sources
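
To make the filtering bullets above concrete, here is a toy sketch of a heuristic filter followed by near-duplicate removal. Meta has not published its pipeline, so everything here is an assumption: the function names, the thresholds, and especially the use of word-set Jaccard overlap, which is a lexical stand-in for the embedding-based "semantic" deduplication the post mentions.

```python
import re


def passes_heuristics(text, min_words=5, max_symbol_ratio=0.3):
    """Cheap rule-based quality check; both thresholds are illustrative guesses."""
    words = text.split()
    if len(words) < min_words:
        return False
    # Count characters that are neither alphanumeric nor whitespace.
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return symbols / len(text) <= max_symbol_ratio


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(docs, threshold=0.8):
    """Keep a document only if it is not too similar to one already kept."""
    kept, kept_token_sets = [], []
    for doc in docs:
        tokens = set(re.findall(r"\w+", doc.lower()))
        if any(jaccard(tokens, seen) >= threshold for seen in kept_token_sets):
            continue  # near-duplicate of an earlier document
        kept.append(doc)
        kept_token_sets.append(tokens)
    return kept


def clean_corpus(docs):
    """Apply the heuristic filter first, then deduplicate the survivors."""
    return dedupe([d for d in docs if passes_heuristics(d)])
```

A real pipeline would swap the Jaccard step for similarity over learned embeddings and add a trained quality classifier, but the overall shape (filter, then deduplicate) is the same.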

What changes were made to Llama 3?

  • Used an attention mask to ensure self-attention does not cross document boundaries. This wasn’t done for Llama 2 (or OpenAI's GPT-3).

  • Increased input sequence length from 4,096 to 8,192 tokens

  • New tokenizer with a 128k vocabulary, requiring about 15% fewer tokens than Llama 2 to encode the same text. It will also improve multilingualism for continued pretraining or future versions. (That is why 7B became 8B → bigger embedding layer)

  • All model sizes use grouped query attention (GQA)
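
The document-boundary mask from the first bullet above can be sketched in a few lines. This is a minimal illustration, assuming several documents are packed back to back into one training sequence; the function name and the list-of-lists representation are my own choices, not Meta's implementation.

```python
def document_causal_mask(doc_lengths):
    """Build a boolean attention mask for documents packed into one sequence.

    mask[q][k] is True when query position q may attend to key position k,
    i.e. when k is not in the future (causal) and both positions belong to
    the same document.
    """
    # Assign each position the index of the document it came from.
    doc_ids = [doc for doc, n in enumerate(doc_lengths) for _ in range(n)]
    total = len(doc_ids)
    return [
        [k <= q and doc_ids[q] == doc_ids[k] for k in range(total)]
        for q in range(total)
    ]


# Two packed documents of lengths 2 and 3.
mask = document_causal_mask([2, 3])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# ..x..
# ..xx.
# ..xxx
```

The result is block-diagonal and lower-triangular: the first token of the second document (row 3) sees only itself, not the document packed before it.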

How was Llama 3 Instruct trained?

  • Used a combination of supervised fine-tuning (SFT), rejection sampling (RS), proximal policy optimization (PPO), and direct preference optimization (DPO)

  • Training on preference rankings helps the model pick the right answer on reasoning tasks

  • Fine-tuning data includes public datasets as well as over 10M human-annotated examples. It is unclear how these are split between the Reward Model and the Instruct Model

  • High-quality prompts and preference rankings (a good Reward Model) are key

  • My Guess: 1️⃣ SFT → 2️⃣ Rejection Sampling → ( 3️⃣ DPO → 4️⃣ PPO) where 3️⃣ & 4️⃣ are repeated/iterated

  • My Guess: A good Reward Model was the key for Llama 3 Instruct to become that good.

Others:

  • 5% of the pretraining dataset is non-English/code data in 30 languages.

  • Even after 15T tokens, the model performance improved log-linearly 🤯

  • Human evaluation was performed with 1,800 prompts across 12 different topics
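
As an aside on the DPO step named in the Instruct training list: the standard per-pair DPO objective is -log sigmoid(beta * ((log-prob of the chosen answer under the policy minus under the reference) - (the same difference for the rejected answer))). The sketch below computes that for a single preference pair. The function and argument names are hypothetical, beta = 0.1 is only an illustrative value, and nothing here is known about how Meta actually implemented this stage.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy likes each answer than the frozen reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference the margin is zero and the loss is log(2); it falls as the policy shifts probability toward the chosen answer and rises if it shifts toward the rejected one.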

Let's hope Meta will release a paper, along with the Reward Models, in the near future. 🤞🏻"
