Airbnb's Attribute Prioritization System: Machine Learning for Extracting Guest Preferences from Unstructured Text
Airbnb’s Attribute Prioritization System uses a machine‑learning pipeline called LATEX to extract and map guest‑mentioned amenities, activities and places from reviews, messages and tickets, then predicts and ranks the most important attributes per listing, giving hosts personalized suggestions to improve listings and match traveler needs.
Airbnb aims to create comfortable experiences by understanding guest needs and matching them with hosts. Guest feedback is a valuable resource.
Airbnb built the Attribute Prioritization System (APS) to understand what guests need, as expressed in amenity requests, review comments, and support tickets, and how those needs vary with a listing's location, type, and price and with the purpose of the trip.
Using a scalable data‑driven platform, APS provides personalized suggestions to hosts about which property attributes to acquire, showcase, and verify, and helps guests see listings that best match their travel goals.
The core of APS is a machine‑learning pipeline called LATEX (Listing Attribute Extraction). LATEX processes large volumes of unstructured text (messages, reviews, support tickets, listing descriptions) in two steps:
The Named Entity Recognition (NER) module, built with a textCNN model fine‑tuned on internally annotated data, extracts key phrases belonging to five categories: amenities, activities, events, specific attractions, and generic places.
The Entity Mapping module maps the extracted phrases to standardized property attributes using an unsupervised approach that computes cosine similarity between phrase embeddings and attribute label embeddings, producing a confidence score.
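The entity-mapping step can be illustrated with a minimal sketch. It assumes phrase and attribute-label embeddings are already available as plain vectors; the three-dimensional toy vectors, the function names, and the `min_confidence` cutoff are hypothetical and are not taken from Airbnb's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_phrase_to_attribute(phrase_vec, attribute_vecs, min_confidence=0.5):
    """Return (best_label, score); best_label is None when the score is below the cutoff.

    min_confidence is an assumed threshold, used here only for illustration.
    """
    best_label, best_score = None, -1.0
    for label, vec in attribute_vecs.items():
        score = cosine_similarity(phrase_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_confidence:
        return None, best_score
    return best_label, best_score


# Toy embeddings (hypothetical values); a real system would use learned embeddings.
ATTRIBUTE_VECS = {
    "hot tub": [0.9, 0.1, 0.0],
    "fireplace": [0.1, 0.9, 0.1],
    "parking": [0.0, 0.2, 0.9],
}
phrase_vec = [0.8, 0.2, 0.1]  # stands in for an extracted phrase such as "jacuzzi"

label, confidence = map_phrase_to_attribute(phrase_vec, ATTRIBUTE_VECS)
print(label, round(confidence, 3))
```

Because the mapping is unsupervised, the similarity score doubles as the confidence score: a phrase that is not close to any attribute label simply falls below the cutoff and is left unmapped.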
Frequency counts of mapped attributes across different text sources are aggregated; attributes mentioned more often are considered more important.
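As a rough illustration of this aggregation, the sketch below counts mapped attributes per text source and overall. The `(source, attribute)` input format and the sample mentions are assumptions made for the example.

```python
from collections import Counter


def aggregate_mentions(mentions):
    """Count attribute mentions per source and across all sources.

    mentions: iterable of (source, attribute) pairs, e.g. ("review", "hot tub").
    """
    per_source = {}
    total = Counter()
    for source, attribute in mentions:
        per_source.setdefault(source, Counter())[attribute] += 1
        total[attribute] += 1
    return per_source, total


# Hypothetical mapped mentions from three text sources.
mentions = [
    ("review", "hot tub"),
    ("message", "hot tub"),
    ("review", "parking"),
    ("ticket", "hot tub"),
    ("review", "fireplace"),
    ("message", "parking"),
]

per_source, total = aggregate_mentions(mentions)
# Attributes ordered from most to least mentioned.
ranked = [attribute for attribute, _ in total.most_common()]
print(ranked)
```

Keeping the per-source counts alongside the totals makes it possible to weight sources differently later, for example treating a support ticket as a stronger signal than a passing mention in a review.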
Personalized attribute rankings are computed for each listing by predicting the mention frequency of each attribute based on location, type, capacity, and luxury level, then scoring importance to produce a ranked list.
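One simple way to picture this ranking is to replace the predictive model with a per-segment average: listings are grouped by the features named above, and the mean mention count in a segment serves as the expected frequency for any listing in it. This is a stand-in for illustration only, not the model APS uses; the field names (`capacity_bucket`, `tier`) and the sample data are hypothetical.

```python
from collections import defaultdict


def segment_key(listing):
    """Group listings by the features the ranking conditions on (assumed field names)."""
    return (listing["location"], listing["type"], listing["capacity_bucket"], listing["tier"])


def fit_segment_means(listings):
    """Average mention count per attribute within each segment."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for listing in listings:
        key = segment_key(listing)
        counts[key] += 1
        for attribute, n in listing["mentions"].items():
            sums[key][attribute] += n
    return {
        key: {attribute: s / counts[key] for attribute, s in attrs.items()}
        for key, attrs in sums.items()
    }


def rank_attributes(listing, segment_means, top_k=3):
    """Rank attributes by expected mention frequency for the listing's segment."""
    expected = segment_means.get(segment_key(listing), {})
    # Sort by descending expected frequency; break ties alphabetically.
    return sorted(expected, key=lambda a: (-expected[a], a))[:top_k]


# Hypothetical training listings with observed mention counts.
listings = [
    {"location": "mountain", "type": "cabin", "capacity_bucket": "5-8", "tier": "standard",
     "mentions": {"hot tub": 4, "fireplace": 2}},
    {"location": "mountain", "type": "cabin", "capacity_bucket": "5-8", "tier": "standard",
     "mentions": {"hot tub": 2, "grill": 1}},
    {"location": "city", "type": "apartment", "capacity_bucket": "1-4", "tier": "standard",
     "mentions": {"parking": 5, "subway station": 3}},
]
segment_means = fit_segment_means(listings)

new_cabin = {"location": "mountain", "type": "cabin", "capacity_bucket": "5-8", "tier": "standard"}
print(rank_attributes(new_cabin, segment_means))
```

A new listing with no text of its own still receives a ranked list, because the expectation comes from comparable listings rather than from its own history.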
Examples show that for a mountain cabin (capacity 6, $50/night) the top attributes are hot tub, fireplace, lake view, mountain view, grill, and kayak, whereas for a city apartment the top attributes are parking, restaurant, convenience store, and subway station.
To handle sparse data for niche listing categories, a Bayesian inference model and a supervised neural network (WiDeText) are used to estimate expected frequencies and the probability that a guest will confirm an attribute.
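The article does not specify the Bayesian model, so the sketch below uses a Beta-Binomial estimate as one plausible illustration of the idea: a segment's observed mention rate is shrunk toward a global rate, and the shrinkage is strongest when the segment has little data. The function name, the `prior_strength` parameter, and all numbers are assumptions; WiDeText is not modeled here.

```python
def smoothed_rate(mentions, exposures, global_rate, prior_strength=20.0):
    """Posterior mean of a mention rate under a Beta prior centered on global_rate.

    mentions:       number of texts in the segment that mention the attribute
    exposures:      number of texts observed in the segment
    global_rate:    mention rate across all listings (prior mean)
    prior_strength: pseudo-observations behind the prior (assumed value)
    """
    if exposures < 0 or mentions < 0 or mentions > exposures:
        raise ValueError("require 0 <= mentions <= exposures")
    alpha = prior_strength * global_rate
    beta = prior_strength * (1.0 - global_rate)
    return (mentions + alpha) / (exposures + alpha + beta)


# Niche segment: 2 mentions in 3 texts. The raw rate of 0.67 is unreliable,
# so the estimate stays close to the global rate of 0.10.
niche = smoothed_rate(mentions=2, exposures=3, global_rate=0.10)

# Well-covered segment: 200 mentions in 300 texts. The data dominates the prior.
common = smoothed_rate(mentions=200, exposures=300, global_rate=0.10)

print(round(niche, 3), round(common, 3))
```

The same shrinkage behavior is what makes estimates usable for niche listing categories: with few observations the prior carries the estimate, and as evidence accumulates the observed rate takes over.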
The system enables hosts to receive actionable suggestions such as adding frequently asked‑for amenities, highlighting well‑reviewed features, and clarifying potential sources of guest confusion.
Future work includes surfacing the most important facilities directly on listing detail pages, expanding the model to more languages and data sources, and providing hosts with additional prompts (e.g., dynamic pricing, discount strategies) to improve listing attractiveness.
Airbnb Technology Team