For instance, we already use AI to help detect harmful misinformation, hate speech, and manipulated images. As a company, we continuously invest in AI to help keep people safe on our platforms. In particular, we invest heavily in research and technology to better understand people’s ages across our platforms. Today, we’re sharing new advancements for our adult classifier — an AI model we’ve developed to help detect whether someone is a teen or an adult.
As we invest in AI to better understand people’s ages, we are carefully working through the complexities of how we apply this technology. There are important risks we need to consider:
- Correctly categorizing teens is important to help prevent them from accessing adult features, like Facebook Dating or Mentorship, and to enable us to put the appropriate safeguards in place.
- Correctly categorizing adults is important not only because it allows them to access services and features that are appropriate for them, but also because it helps mitigate risks and child safety issues that could arise on platforms where adults and teens are both present. We don’t allow adults to message teens who don’t follow them, for example.
We are uniquely positioned to tackle this challenge, thanks to the research and work we’ve done to develop privacy-preserving AI systems. In addition, we have a long history of working collaboratively with industry partners through the Open Compute Project, as well as with experts and academics from NYU, Carnegie Mellon, UCSF, the University of Illinois, and others, to help everyone innovate and improve together.
Why is this so hard?
The job of our adult classifier is to help determine whether someone is an adult (18 and over) or a teen (13–17). When people first sign up to use our services, we ask them to enter their birth date. But people aren’t always accurate (or honest), and we’ve seen in practice that misrepresenting age is a common problem across the industry.
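The teen/adult split described above comes down to bucketing an age into the 13–17 or 18-and-over range. As a minimal sketch (the function name and the `"underage"` handling are our own illustration, not Meta’s code), the stated-age bucketing from a self-reported birth date might look like:

```python
from datetime import date

def age_bucket(birth_date: date, today: date) -> str:
    """Bucket a self-reported birth date into 'teen' (13-17) or 'adult' (18+).

    Hypothetical helper: signup only collects a birth date, so this
    reflects the *stated* age, not necessarily the person's real age.
    """
    # Whole years of age, accounting for whether the birthday has
    # already occurred this calendar year.
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day)
    )
    if age >= 18:
        return "adult"
    if age >= 13:
        return "teen"
    return "underage"  # below the minimum age for the service
```

The gap the classifier addresses is exactly that this stated bucket can be wrong when people misrepresent their birth date.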
With billions of people around the world using our services, we need a scalable way to understand what it looks like when the age someone provides us doesn’t match their actual age. What are the clues, or signals, we have that would help us determine whether someone inaccurately shared their age with us? We need signals that are common across different ages and countries and that we can use in a systematic and privacy-protective way.
Determining what information is useful and proportionate, training an AI model on that information, and having teams in place to help verify the information and implement the right privacy and data protection guardrails is a complex process that requires time and significant resources. And that’s just the initial step. For a model to stay current, we need to continually retrain it on the latest information and check that its age detections are accurate. Over time, if we identify other signals that could help improve the accuracy, we’ll test those and retrain the model to include them.
No matter how accurate an AI system is, it will occasionally make an incorrect determination. When that happens, we know it’s important to have a way for people to verify their age manually. Unfortunately, that process is not as simple as a store clerk checking someone’s ID card. Not everyone has a government ID, particularly young people, and many people face financial and social challenges in obtaining IDs in their home countries. Even those who have IDs may not feel comfortable uploading their information online. That’s why we’ve created a menu of options, giving people several ways to verify their age.
How we’re using AI to address this challenge
To develop our adult classifier, we first train an AI model on signals such as profile information (for example, when a person’s account was created) and interactions with other profiles and content. For example, people in the same age group tend to interact similarly with certain types of content. From those signals, the model learns to make determinations about whether someone is an adult or a teen.
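To make the idea concrete, here is a toy sketch of scoring an account from per-account signals. The signal names, weights, and the logistic form are purely illustrative assumptions on our part; Meta has not published its model, features, or training procedure.

```python
import math

# Illustrative signal names and hand-picked weights; the real model,
# feature set, and training procedure are not public.
WEIGHTS = {
    "account_age_days": 0.002,        # long-lived accounts skew adult
    "adult_content_interactions": 0.8,
    "teen_content_interactions": -0.9,
}
BIAS = -0.5

def adult_probability(signals: dict) -> float:
    """Logistic score over per-account signals: estimated P(adult)."""
    z = BIAS + sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def classify(signals: dict, threshold: float = 0.5) -> str:
    """Threshold the score into the two buckets the classifier outputs."""
    return "adult" if adult_probability(signals) >= threshold else "teen"
```

In practice such weights would be learned from labeled data rather than set by hand, and many more signals would be involved.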
To evaluate the performance of the model, we develop an “evaluation dataset.” That dataset is created by having teams manually review certain data points that we believe to be strong signals of age, such as birthday posts. Identifying details are removed before these posts are shared with the team to make a determination about the age of the person who posted them. Once the team has made that determination, they label the data with a note indicating whether the post was made by an adult or a teen. These labeled data points then make up our evaluation dataset.
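The two steps described here — stripping identifying details, then attaching a human reviewer’s label — could be sketched like this. The record fields, function names, and which fields count as “identifying” are our assumptions for illustration, not Meta’s actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class LabeledExample:
    post_id: str  # opaque identifier; identifying details already removed
    text: str     # anonymized post content shown to the reviewer
    label: str    # "adult" or "teen", assigned by a human reviewer

def strip_identifiers(post: dict) -> dict:
    """Illustrative anonymization step: keep only the fields a reviewer
    needs to judge age, dropping names, handles, location, and so on."""
    return {"post_id": post["post_id"], "text": post["text"]}

def label_example(post: dict, reviewer_label: str) -> LabeledExample:
    """Combine an anonymized post with a reviewer's age determination."""
    if reviewer_label not in ("adult", "teen"):
        raise ValueError("label must be 'adult' or 'teen'")
    cleaned = strip_identifiers(post)
    return LabeledExample(cleaned["post_id"], cleaned["text"], reviewer_label)
```

Ordering matters here: anonymization happens before the reviewer ever sees the post, matching the process the text describes.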
We then evaluate our classifier on a country-by-country basis. Before applying the classifier to a new country, we look at its performance across several criteria, including overall accuracy and accuracy across different groups of people. For example, since we use interactions with content as a signal, we look at how our model performs for people who have not been on our platform for very long and therefore have not yet interacted with much content. But the work is not done once the classifier is up and running. To check that our determinations are up-to-date, we regularly rerun the classifier to include the latest information.
Each time we retrain the model, we check its age detections against the labeled evaluation dataset to measure the model’s accuracy. We have a sophisticated framework that ensures our evaluation dataset is representative of the people using our services and that our model accuracy metrics generalize to that population.
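The evaluation described in the last two paragraphs — overall accuracy plus accuracy across different groups of people, measured against the labeled set — can be sketched as a simple per-group tally. The grouping keys (for example, splitting out brand-new accounts) are our own illustration; Meta’s actual evaluation framework is not public.

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """examples: iterable of (group, true_label, predicted_label) triples
    drawn from the labeled evaluation dataset.

    Returns per-group accuracy plus an "__overall__" entry, so a model
    that looks fine on average but underperforms for one group (e.g.
    accounts too new to have interacted with much content) is caught
    before rollout in a new country.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in examples:
        for key in (group, "__overall__"):
            total[key] += 1
            if truth == pred:
                correct[key] += 1
    return {g: correct[g] / total[g] for g in total}
```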
Our adult classifier has significantly improved our ability to provide age-appropriate experiences to the people who use our services, but there is room to improve on this work. We are continuously testing new types of signals that might improve our ability to detect whether someone is a teen or an adult. Whether we include a new signal type in our model is determined on a case-by-case basis; a rigorous review process assesses whether the signal’s data use is proportionate and identifies any privacy and data protection guardrails we should put into place. As we learn more about which signals make the greatest impact, we retrain the model to keep it as current and as accurate as possible. For example, we are testing AI models that use natural language processing to help determine whether a user is an adult or a teen based on writing styles common to adults or teens.
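As a loose illustration of a writing-style signal like the one mentioned above, consider a score that compares word choices against small marker lexicons. The marker words and the scoring scheme are toy assumptions of ours; real NLP models would learn such features from data rather than use fixed word lists.

```python
import re

# Toy lexicons standing in for learned writing-style features; the
# actual NLP models and features are not public.
TEEN_MARKERS = {"lol", "omg", "fr", "lowkey"}
ADULT_MARKERS = {"mortgage", "invoice", "commute", "colleagues"}

def style_signal(text: str) -> float:
    """Return a score in [-1, 1]: positive leans adult, negative leans teen.

    Intended as one weak signal among many feeding a larger model,
    not as a standalone classifier.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    adult = sum(t in ADULT_MARKERS for t in tokens)
    teen = sum(t in TEEN_MARKERS for t in tokens)
    if adult + teen == 0:
        return 0.0  # no style evidence either way
    return (adult - teen) / (adult + teen)
```

A signal like this would go through the same case-by-case proportionality review before being added to the model.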
Developing human-reviewed evaluation sets for multiple languages is an important but time- and resource-intensive exercise. To get labeled data points in a language, we need to hire teams that are fluent in that language, and it can take years to get enough labels to meet the minimum criteria to feel confident that the evaluation dataset is representative of the country. This is a challenge we’re working on now and will continue to address. Our goal is to expand the use of our AI more widely across Meta technologies and in more countries globally.
We know that the more we do to solve these challenges, the more we’ll be able to help protect the people using our services. We hope that sharing our efforts to better understand age encourages others to help build on these solutions, and we can all improve together.