Most modern LLMs use the Mixture of Experts architecture

"Most modern LLMs use the Mixture of Experts architecture ...and it is quite trivial to implement! This implementation is pretty similar to the one you can find in Mistral 7B.

An ""expert,"" in that case, is just a simple feed-forward network, and we have a router that is in charge of routing the tokens to the right expert. To route, we just have a linear layer that maps from hidden states to probabilities associated with each expert, and we just select the experts related to those probabilities. The resulting hidden states are just the weighted sum of the output of those experts. That's it!"

Author

AiUTOMATING PEOPLE. ABN ASIA was founded by people with deep roots in academia and work experience in the US, Holland, Hungary, Japan, South Korea, Singapore, and Vietnam. ABN Asia is where academia and technology meet opportunity. With our cutting-edge solutions and competent software development services, we're helping businesses level up and take on the global scene. Our commitment: Faster. Better. More reliable. In most cases: Cheaper as well.
