Saturday, May 25, 2024

Are the Latest Open LLMs Good? Comparison of DPO and PPO.


In Short:

April 2024 saw several major open LLM releases, including Mixtral 8x22B, Llama 3, Phi-3, and OpenELM. A new study compared the effectiveness of DPO and PPO for LLM alignment, finding that PPO generally outperforms DPO. Other notable research papers from April covered a wide range of topics around large language models and their applications. Overall, it has been an exceptional month for LLM research and new model releases.

April 2024: Major LLM Releases

In April 2024, prominent transformer-based LLMs were released by Mistral AI, Meta AI, Microsoft, and Apple. Let’s delve into the details of these new releases and the impact they are expected to have on the industry.

Mixtral 8x22B by Mistral AI

Mixtral 8x22B, the latest mixture-of-experts (MoE) model from Mistral AI, replaces each transformer feed-forward block with eight expert networks, of which only a subset is activated per token. Released under an Apache 2.0 open-source license, it follows the success of its predecessor, Mixtral 8x7B, released in January 2024.

Key Highlights:

  • Strong performance on the Measuring Massive Multitask Language Understanding (MMLU) benchmark
  • Sparse MoE routing with eight experts per layer, activating only a few per token so inference cost stays well below the total parameter count
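The routing idea behind a sparse MoE layer can be sketched in a few lines. This is a minimal, illustrative implementation for a single token, not Mistral AI's actual code; the function and variable names (`moe_layer`, `gate_w`, `top_k`) are placeholders, and real models route batches of tokens and add load-balancing losses.

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Sparse mixture-of-experts forward pass for one token (illustrative).

    x       : (d,) input vector for one token
    gate_w  : (n_experts, d) router weights
    experts : list of callables, each mapping (d,) -> (d,)
    top_k   : number of experts activated for this token
    """
    logits = gate_w @ x                       # one router score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only the chosen experts run; the rest stay idle, which is the sparsity win
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
gate_w = rng.normal(size=(n_experts, d))
# Toy experts: each is just a linear map here
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = moe_layer(rng.normal(size=d), gate_w, experts)
print(y.shape)  # (4,)
```

With top-2 routing over eight experts, each token touches only a quarter of the expert parameters, which is why MoE models can be large in total size yet comparatively cheap at inference time.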

Meta AI’s Llama 3 Models

The Llama 3 models from Meta AI represent a significant advancement in openly available LLM technology. Building on the success of Llama 2, these models feature a larger vocabulary and grouped-query attention for more efficient inference.

Key Features:

  • Trained on 15 trillion tokens, a substantially larger dataset than Llama 2’s
  • Released in 8B and 70B parameter sizes
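Grouped-query attention, mentioned above, lets several query heads share one key/value head, shrinking the KV cache at inference time. The sketch below shows the core mechanism with toy shapes; it is a simplified single-sequence version, and the head counts and names (`n_q_heads`, `n_kv_heads`) are illustrative rather than Llama 3's actual configuration.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Grouped-query attention (illustrative): query heads share K/V heads.

    q    : (n_q_heads, T, d)  queries, one set per query head
    k, v : (n_kv_heads, T, d) keys/values, fewer heads than queries
    """
    n_q_heads, n_kv_heads = q.shape[0], k.shape[0]
    group = n_q_heads // n_kv_heads           # query heads per shared K/V head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                       # which shared K/V head this query head uses
        scores = q[h] @ k[kv].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        out[h] = weights @ v[kv]
    return out

rng = np.random.default_rng(1)
T, d = 5, 16
q = rng.normal(size=(8, T, d))   # 8 query heads
k = rng.normal(size=(2, T, d))   # only 2 K/V heads, each shared by 4 query heads
v = rng.normal(size=(2, T, d))
print(grouped_query_attention(q, k, v).shape)  # (8, 5, 16)
```

Because the KV cache scales with the number of K/V heads rather than query heads, this 4:1 grouping cuts cache memory roughly fourfold versus standard multi-head attention.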

Microsoft’s Phi-3 LLM

Microsoft’s Phi-3 LLM, which follows a Llama-like architecture, has garnered attention for its strong performance despite being trained on far fewer tokens than Llama 3. Notably, Phi-3 emphasizes dataset quality over quantity.

Key Points:

  • The 3.8B Phi-3-mini reportedly matches or outperforms the larger Llama 3 8B despite a smaller training set of 3.3 trillion tokens
  • Trained on heavily filtered web data and synthetic data to boost quality

Apple’s OpenELM LLM Family

Apple introduces the OpenELM suite of small LLMs designed for on-device deployment. With an efficient training and inference framework, OpenELM aims to provide accessible LLM solutions for a wide range of applications.

Key Features of OpenELM:

  • Available in 270M, 450M, 1.1B, and 3B sizes
  • Includes instruction-tuned variants trained with rejection sampling and direct preference optimization

DPO vs. PPO for LLM Alignment

A comprehensive study examines the effectiveness of Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) for LLM alignment via reinforcement learning from human feedback (RLHF). The study finds that PPO generally outperforms DPO and offers practical recommendations for getting the best results from each method.

Main Findings:

  • PPO exhibits superior performance compared to DPO when applied correctly
  • DPO can struggle with out-of-distribution data; an additional supervised fine-tuning stage on in-distribution data helps
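Part of DPO's appeal is that, unlike PPO, it needs no separate reward model: the preference objective can be written directly over log-probabilities. The sketch below shows the standard DPO loss for a single preference pair; the function name and scalar-input signature are simplifications for illustration, since real implementations operate on batches of token-level log-probabilities.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (all inputs are summed log-probs).

    pi_*  : log-probs of the chosen/rejected response under the policy being trained
    ref_* : log-probs under the frozen reference (SFT) model
    beta  : strength of the implicit KL penalty keeping the policy near the reference
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response than the reference model does, minus the same for the rejected one
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# Loss falls below log(2) once the policy prefers the chosen answer
# relative to the reference; it equals log(2) when there is no preference yet
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Minimizing this objective pushes the policy to widen the margin between chosen and rejected responses while `beta` anchors it to the reference model, which is the implicit regularization that PPO achieves with an explicit KL term.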


With the advancements in major LLM releases and research studies, April 2024 marks a significant month for the development of transformer-based models. Each model discussed offers unique strengths and applications, contributing to the evolving landscape of language modeling technologies.
