Embedding-based neural retrieval is a prevalent approach to address the semantic gap problem, which often arises in product search on tail queries. In contrast, popular queries typically lack context and have broad intent, where additional context from users' historical interactions can be helpful. In this paper, the authors share their novel approach to address both: the semantic gap problem, followed by an end-to-end trained model for personalized semantic retrieval. They propose learning a unified embedding model that incorporates graph, transformer, and term-based embeddings end to end, and share their design choices for an optimal tradeoff between performance and efficiency. They share their learnings in feature engineering, hard negative sampling strategy, and application of a transformer model, including a novel pre-training strategy and other tricks for improving search relevance and deploying such a model at industry scale. Their personalized retrieval model significantly improves the overall search experience, as measured by a 5.58% increase in search purchase rate and a 2.63% increase in site-wide conversion rate, aggregated across multiple A/B tests on live traffic.
Our loss from negative examples is a weighted sum of the individual losses from each mining strategy, and we linearly update the weights during training. We warm up training with only uniform negatives, then linearly decay the loss weight for uniform negatives while increasing the weight for hard negatives, as we found this schedule ideal for convergence and performance.
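A minimal sketch of such a schedule, assuming a simple step-based linear interpolation between the two loss terms; the function names, warmup length, and weight endpoints are assumptions for illustration, not values from the paper.

```python
# Hypothetical linear schedule for mixing uniform- and hard-negative losses.
def negative_loss_weights(step: int, warmup_steps: int, total_steps: int):
    """Return (w_uniform, w_hard) loss weights for the current training step."""
    if step < warmup_steps:
        # Warmup phase: train with uniform (random/in-batch) negatives only.
        return 1.0, 0.0
    # After warmup, linearly shift weight from uniform to hard negatives.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 1.0 - progress, progress


def combined_negative_loss(loss_uniform, loss_hard, step, warmup_steps, total_steps):
    """Weighted sum of the per-strategy negative losses at this training step."""
    w_uniform, w_hard = negative_loss_weights(step, warmup_steps, total_steps)
    return w_uniform * loss_uniform + w_hard * loss_hard
```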
4.5 Loss Function

Threshold-based pruning is a widely adopted approach in the retrieval layer, employed to eliminate irrelevant candidates. Though pairwise loss functions are widely used, they aren't suited for threshold-based pruning. Our approach incorporates a hinge loss framework to establish a threshold during the model training phase itself. Since our training data consists of different interaction types, and each interaction type represents a different degree of relevance, we employ a multi-part hinge loss where each part is associated with a different threshold. Given output score $\hat{y}$ and true label $y$, our loss function can be expressed as:

$$\ell(\hat{y}, y) = \Bigg( \sum_{t \in T} \mathbb{I}[y = t] \, \big(-\min(0,\; \hat{y} - \epsilon_t)\big) \Bigg) + \mathbb{I}_{neg} \, \max(0,\; \hat{y} - \epsilon_{neg})$$

where $T$ is the set of all positive interaction types, $\epsilon_t$ is the threshold corresponding to each type, and $\mathbb{I}_{neg}$, $\epsilon_{neg}$ are the indicator variable and threshold for negative samples respectively.
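A minimal PyTorch sketch of this multi-part hinge loss as reconstructed above; note that $-\min(0, \hat{y} - \epsilon_t) = \max(0, \epsilon_t - \hat{y})$. The interaction types and threshold values (`POS_THRESHOLDS`, `NEG_THRESHOLD`) are hypothetical placeholders, not the paper's settings.

```python
import torch

# Hypothetical per-interaction-type thresholds (e.g. click / add-to-cart / purchase).
POS_THRESHOLDS = {1: 0.3, 2: 0.5, 3: 0.7}
NEG_THRESHOLD = 0.1

def multi_part_hinge_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """scores: model output per (query, product) pair; labels: interaction type id, 0 = negative."""
    loss = torch.zeros_like(scores)
    # Positive parts: penalize positives whose score falls below the threshold of their type.
    for t, eps_t in POS_THRESHOLDS.items():
        mask = (labels == t).float()
        loss = loss + mask * torch.clamp(eps_t - scores, min=0.0)
    # Negative part: penalize negatives whose score exceeds the negative threshold.
    neg_mask = (labels == 0).float()
    loss = loss + neg_mask * torch.clamp(scores - NEG_THRESHOLD, min=0.0)
    return loss.mean()
```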
5 ANN-BASED PRODUCT BOOSTING
When forming a candidate set of products for a query, it is beneficial to retrieve candidates that are both semantically relevant to the query and appealing to customers. Within Etsy, our inverted-index candidate retrieval system associates products with a query-independent "quality score" $Q(l)$, and employs multiplicative boosting to compute a candidate score as the product of its quality score and query-listing relevance score $r(q, l)$, i.e. $s(q, l) = r(q, l) \cdot Q(l)$. This quality score can account for properties such as high product rating, product freshness, and shop conversion rate that are known to increase engagement independently of query-listing relevance. We implement additive boosting within ANN-based semantic retrieval by enriching our model-derived product vectors with additional numerical features and adding corresponding feature weights to query vectors. Given original product embedding $v_l$ and query embedding $v_q$, we create hydrated vectors $v_l' = \text{concat}([v_l; f(l)])$ and $v_q' = \text{concat}([v_q; w])$, where $f(l)$ is the vector of numerical quality features for listing $l$ and $w$ is the corresponding vector of feature weights.
We then model the candidate score as $s(q, l) = \text{dot}(v_q', v_l') = \text{dot}(v_q, v_l) + \text{dot}(f(l), w)$. For serving, we simply index hydrated product vectors rather than the original product embeddings, and query our index with a hydrated query vector.
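A minimal NumPy sketch of the vector hydration and the resulting score decomposition; the embedding dimensions, feature values, and weights below are illustrative only, and whether the feature weights are learned or hand-tuned is not specified here.

```python
import numpy as np

def hydrate_product(v_l: np.ndarray, quality_features: np.ndarray) -> np.ndarray:
    """v_l' = concat([v_l; f(l)]): append numerical quality features to the product embedding."""
    return np.concatenate([v_l, quality_features])

def hydrate_query(v_q: np.ndarray, feature_weights: np.ndarray) -> np.ndarray:
    """v_q' = concat([v_q; w]): append the corresponding feature weights to the query embedding."""
    return np.concatenate([v_q, feature_weights])

# Example: 4-d embeddings with 2 hypothetical quality features (e.g. rating, freshness).
v_q = np.array([0.1, 0.3, 0.2, 0.4])
v_l = np.array([0.2, 0.1, 0.4, 0.3])
f_l = np.array([4.8, 0.9])      # hypothetical product quality features f(l)
w   = np.array([0.05, 0.10])    # hypothetical boosting weights

score = np.dot(hydrate_query(v_q, w), hydrate_product(v_l, f_l))
# The inner product decomposes exactly as dot(v_q, v_l) + dot(f(l), w),
# so indexing hydrated product vectors yields additive boosting at ANN query time.
assert np.isclose(score, np.dot(v_q, v_l) + np.dot(f_l, w))
```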