It’s a frigid winter day as Jess wakes up on her first-ever solo vacation. As she opens her suitcase, her heart sinks. In all her excitement, she left her winter coat at home. At a moment’s notice, her AI lifestyle assistant suggests several wool trench parkas from nearby shops. Not only do they pair perfectly with the pink polka-dot scarf she packed, but they also match her personal style: fitted tailoring; quirky details, like floral piping; natural fabrics; and bargain prices.
The thing is, now Jess owns two winter coats. She opens her digital closet, finds her old coat, and within seconds, the same lifestyle assistant posts a listing that showcases the coat on a 360-degree virtual avatar, so potential buyers can see exactly how it would look on them. It also automatically offers buyers styling tips. By the time she’s finished getting ready, she already has a few messages from potential buyers.
Facebook AI researchers and engineers are building the technology to — one day — let you do all this in one holistic, intelligent system. It means teaching machines what people easily learn: understanding the different items a closet or an apartment would have, how a garment relates to an accessory, and how an online product might look in real life — and doing this for millions of people around the world. Today we’re sharing new details on the cutting-edge AI techniques we’ve built to get us there. Our new system, GrokNet, can understand precise specifics about what’s in nearly any photo. We’ve also built technology that can automatically turn a 2D phone video into an interactive 360-degree view. We’re now one step closer to our vision of making anything shoppable while personalizing to individual taste.
Pixel-perfect computer vision
Shopping is generally challenging for AI because it’s so subjective — people’s preferences can shift depending on the time of day, the occasion, the weather, and what they saw on their feed, along with many other ever-changing factors. But an even more fundamental challenge is that machines see visuals as just pixels.
Advancements in segmentation (identifying which pixels make up different objects) are thanks to technology pioneers like Tamara Berg, a Facebook Research Scientist and an award-winning, previously tenured professor at the University of North Carolina at Chapel Hill. She’s combined her deep expertise in the field of computer vision and her lifelong passion for fashion to invent new ways for computers to recognize various things in pictures, including clothing.
“When I first started working on clothing segmentation, about a decade ago, computer systems weren’t very accurate in identifying clothes in photos,” Berg says. “Today, we’ve achieved more than 80 percent accuracy — we’ve come a long way.”
Clothing is especially challenging for AI to identify because bodies have many shapes, and pieces of clothing are often layered or hidden behind hair. A new clothing parsing technique that Berg and her team researched called “instance mask projection” helps computers virtually see clothing despite occlusions. It works by first predicting a box around each item of clothing in an image, providing a rough segmentation for the item, and then precisely labeling each pixel with its clothing type, based on examples learned from curated training data sets.
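The projection step described above can be sketched in a few lines. This is a simplified illustration, not the researchers’ implementation: the `Detection` structure, the class ids, and the rule that a higher-confidence detection wins on overlap are all assumptions made for the example, and the learned per-pixel labeling is replaced by simply copying each detection’s class id onto the pixels its rough mask covers.

```python
from dataclasses import dataclass
from typing import List, Tuple

BACKGROUND = 0  # label for pixels not covered by any detection


@dataclass
class Detection:
    """One detected garment (hypothetical structure for this sketch)."""
    class_id: int                    # made-up ids: 1 = coat, 2 = scarf
    score: float                     # detector confidence
    box: Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive
    mask: List[List[int]]            # rough binary mask, same size as the box


def project_instance_masks(height: int, width: int,
                           detections: List[Detection]) -> List[List[int]]:
    """Paste each box-level mask onto a full-image, per-pixel label map.

    Detections are pasted in order of increasing confidence, so where two
    masks overlap the higher-scoring detection decides the pixel's label
    (an assumption of this sketch).
    """
    label_map = [[BACKGROUND] * width for _ in range(height)]
    for det in sorted(detections, key=lambda d: d.score):
        top, left, bottom, right = det.box
        for r in range(top, bottom):
            for c in range(left, right):
                if det.mask[r - top][c - left]:
                    label_map[r][c] = det.class_id
    return label_map


# A 4x4 toy image: a coat in the top-left, partly covered by a scarf.
coat = Detection(1, 0.80, (0, 0, 3, 3), [[1, 1, 1], [1, 1, 1], [1, 1, 1]])
scarf = Detection(2, 0.95, (0, 1, 2, 4), [[1, 1, 1], [0, 1, 0]])
labels = project_instance_masks(4, 4, [coat, scarf])
for row in labels:
    print(row)
```

In this toy image the scarf overwrites the coat pixels it covers, and everything outside both boxes stays labeled as background.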
“I want to build something that ultimately feels like having one of your best friends with you whenever you get dressed, shop, want a recommendation, or simply need some inspiration,” Berg says. That’s why she is prototyping an intelligent digital closet, which lets you take photos of your outfits and digitize each item within seconds. The digital closet can provide not only outfit suggestions based on planned activities or weather but also fashion inspiration based on products and styles that you like, so you can shop in the context of what you already own.
Scanning products in billions of photos
Research Scientist Sean Bell set out to build a universal product recognition model that can help billions of people shop from any photo. This is a huge AI challenge because optimizing a model for one category of items sometimes makes it less effective on another. “An important insight that ultimately helped make this possible was that each fine-grained category has an inherent level of difficulty, and easier tasks don’t need as much labeled data for training as harder tasks,” Bell says.
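The article does not say how this insight is applied in practice. One hypothetical way to read it is as a budgeting rule: split a fixed number of labeled examples across tasks in proportion to how hard each task is. In the sketch below, the function name, the task names, and the difficulty scores are all invented for illustration.

```python
from typing import Dict


def allocate_label_budget(difficulty: Dict[str, float],
                          total_labels: int) -> Dict[str, int]:
    """Split a labeling budget across tasks in proportion to difficulty."""
    if total_labels < 0:
        raise ValueError("total_labels must be non-negative")
    total_difficulty = sum(difficulty.values())
    if total_difficulty <= 0:
        raise ValueError("difficulty scores must sum to a positive number")

    # Ideal (possibly fractional) share for each task.
    shares = {task: total_labels * score / total_difficulty
              for task, score in difficulty.items()}
    # Round down, then hand out the leftover labels to the tasks
    # that lost the most to rounding.
    budget = {task: int(share) for task, share in shares.items()}
    leftover = total_labels - sum(budget.values())
    by_remainder = sorted(shares, key=lambda t: shares[t] - budget[t],
                          reverse=True)
    for task in by_remainder[:leftover]:
        budget[task] += 1
    return budget


# Made-up difficulty scores: a harder task gets a larger share of the labels.
difficulty = {"object category": 1.0, "color": 2.0, "exact product": 5.0}
budget = allocate_label_budget(difficulty, total_labels=1000)
print(budget)
```

With these invented scores, the easiest task receives an eighth of the budget and the hardest receives five eighths.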
The resulting system, called GrokNet, performs better than previous computer vision systems. It’s deployed on Marketplace, where it automatically suggests attributes such as colors and materials when sellers upload a photo of an item for sale, which makes posting a listing much easier. On the buyer’s side, the detailed attributes GrokNet predicts let you search Marketplace not just for black chairs, for example, but specifically for a black leather sectional sofa, even if the seller didn’t explicitly add those details to the description.
We’re also using GrokNet to test automatic product tagging suggestions on Facebook Pages to help people discover items from businesses they like. When Page admins upload a photo, GrokNet can suggest products to tag by visually matching items in the photo against the Page’s product catalog. Product tags make it easier for people to find the items featured in content in their feed. Say you’re eyeing a pair of sneakers in a post where only the brand is tagged. Today, you might have to search through the brand’s website to find the product. In the future, our AI system would automatically scan the product catalog and identify the exact pair of shoes, making for a more convenient shopping experience. With AI-powered product tagging, businesses will be able to more easily showcase entire catalogs of products to billions of people worldwide within seconds.
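Visual matching of this kind is commonly done by comparing embedding vectors, and the sketch below shows that general idea rather than GrokNet’s actual pipeline. It assumes each catalog item and each item cropped from a photo has already been turned into a numeric embedding by some vision model. The product ids, the three-dimensional vectors, and the `min_similarity` threshold are made up for the example.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_product_tag(query: List[float],
                        catalog: Dict[str, List[float]],
                        min_similarity: float = 0.8
                        ) -> Optional[Tuple[str, float]]:
    """Return the closest catalog product, or None if nothing is close enough."""
    best_id, best_sim = None, -1.0
    for product_id, embedding in catalog.items():
        sim = cosine_similarity(query, embedding)
        if sim > best_sim:
            best_id, best_sim = product_id, sim
    if best_id is None or best_sim < min_similarity:
        return None
    return best_id, best_sim


# Toy catalog with hand-written embeddings (real ones would come from a model).
catalog = {
    "sneaker-red": [0.9, 0.1, 0.0],
    "sneaker-blue": [0.1, 0.9, 0.0],
    "boot-brown": [0.0, 0.2, 0.9],
}
match = suggest_product_tag([0.8, 0.2, 0.1], catalog)
no_match = suggest_product_tag([1.0, 1.0, 1.0], catalog)
print(match, no_match)
```

The threshold matters for a suggestion feature: returning nothing when no catalog item is a close match avoids proposing a wrong tag to the Page admin.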
We’ll also be able to use GrokNet to help customers easily find exactly what they’re looking for. Storefronts will be able to offer personalized suggestions about which products are most relevant to each shopper, which products go together, and how they’re being worn, and shoppers will be able to click through to purchase when they find things they like in their feeds.
Transforming ordinary photos into 3D-like views
Product descriptions and tags are useful, but any shopper knows the real test is seeing what an item actually looks like. The inability to see a product’s condition, size, and scale is one of the biggest sources of friction in online shopping today. And not all sellers have access to expensive video equipment to market their products.
To address this, we’re introducing Rotating View, a state-of-the-art 3D-like photo capability that lets any seller with a camera phone turn regular 2D video into an interactive, multi-dimensional panoramic view of a Marketplace listing. We’ve started testing the feature with Marketplace sellers on iOS.
And to enhance product listings a step further, we already offer features on Facebook and Instagram shopping that infuse augmented reality (AR) effects into advertisements and product display pages so that shoppers can try on products virtually before they buy. We drew from our Spark AR platform, which allows any brand or business to design its own effects, overlaying 3D objects on real-life environments and people using computer vision-powered tracking. We’re excited to expand on this offering to support more businesses and products in the future.
The road to AI-powered shopping
These efforts in building AI to make shopping easier are part of our vision to build machines that can learn, understand, and plan the way people do.
Still, there’s a long road ahead to a fully integrated AI-first experience. We need systems to learn the relationship between items — that a red scarf would make an outfit pop more than a string of pearls would, and that denim jeans are great for dinner with friends but not for a formal wedding. Systems need to learn creative expression and anticipate our needs to predict the right gift based on the context — what to buy for a best friend’s baby shower versus a coworker’s housewarming.
The truly intelligent shopping systems of the future will combine AI with other leading-edge technologies, such as augmented reality and virtual reality, and draw on interactions across our social media community to make the online shopping experience better for everyone around the world.