MultiRay is Meta’s new platform for running large-scale, state-of-the-art AI models. Models hosted on MultiRay convert raw input (text, images) from posts, comments, and other content in Meta apps into an “embedding” — a more machine learning–friendly, general-purpose intermediate format that can be shared across multiple machine learning (ML) models. Because the embedding is shared, the compute-intensive work of understanding the raw input happens only once, reducing the complexity and energy needed for tasks such as bullying detection.
Since the general-purpose embedding is computed only once, MultiRay saves significantly on the compute power needed to run advanced AI models across a wide range of use cases. This approach pairs well with other important efficiency optimizations, such as the AI accelerators MultiRay uses extensively.
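The compute-once, reuse-everywhere pattern described above can be sketched in a few lines. This is a minimal illustration, not MultiRay's actual API: the hash-based `embed` function stands in for an expensive foundation model, the cache stands in for MultiRay's serving layer, and the two scoring "heads" are hypothetical downstream tasks.

```python
# Hypothetical sketch of sharing one embedding across several ML tasks.
# A cheap hash stands in for the expensive foundation model; lru_cache
# stands in for a real serving-layer cache.
import hashlib
from functools import lru_cache

EMBED_DIM = 8  # real embeddings have hundreds or thousands of dimensions


@lru_cache(maxsize=1024)
def embed(text: str) -> tuple:
    """Expensive step: run once per unique input, then served from cache."""
    digest = hashlib.sha256(text.encode()).digest()
    return tuple(b / 255.0 for b in digest[:EMBED_DIM])


def bullying_score(embedding: tuple) -> float:
    """Cheap task-specific head #1 (toy scorer for illustration)."""
    return sum(embedding) / len(embedding)


def topic_score(embedding: tuple) -> float:
    """Cheap task-specific head #2, reusing the same embedding."""
    return max(embedding)


e = embed("example post text")          # expensive encoder runs once
scores = (bullying_score(e), topic_score(e))  # both tasks reuse it
embed("example post text")              # served from cache, no recompute
assert embed.cache_info().hits == 1 and embed.cache_info().misses == 1
```

The design point is that only `embed` is costly; each additional consumer of the embedding adds a cheap head rather than another full model invocation.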
MultiRay provides two state-of-the-art models today: TextRay, a text understanding model, and PostRay, a text-plus-image multimodal understanding model. These models are used by more than 125 teams across Meta, handling up to 20 million queries per second (QPS) and 800 billion queries per day.
Why it matters:
Applying state-of-the-art AI models to complex, real-world tasks has driven them to become larger and more energy-intensive to run. With MultiRay in place, teams at Meta can rapidly and affordably build on these cutting-edge foundational models, reusing a centrally computed embedding at a fraction of the cost of building and training a full system themselves. The reduced complexity also means they can move faster, with less need for specialized expertise in natural language processing or computer vision.
Take a deeper dive:
Learn more about how MultiRay is optimizing efficiency for large-scale AI models.