Open-domain question answering requires retrieval systems able to cope with the heterogeneous nature of questions, providing accurate answers across a broad spectrum of query types and topics. To deal with such topic heterogeneity within a single model, we propose DESIRE-ME, a neural information retrieval model that leverages the Mixture-of-Experts framework to combine multiple specialized neural models. We rely on Wikipedia data to train an effective neural gating mechanism that classifies the incoming query and correspondingly weights the predictions of the different domain-specific experts. This allows DESIRE-ME to specialize adaptively in multiple domains. Through extensive experiments on publicly available datasets, we show that our proposal effectively generalizes domain-enhanced neural models. DESIRE-ME excels in handling open-domain questions adaptively, boosting the underlying state-of-the-art dense retrieval model by up to 12% in NDCG@10 and 22% in P@1.
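As a rough illustration of this design, the sketch below shows one way a gating network can weight domain-specific experts. It is a minimal sketch, not the paper's actual implementation: the class name, the gating head, and the per-domain expert encoders are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MoERetriever(nn.Module):
    """Illustrative Mixture-of-Experts query encoder: a gating network
    classifies the query into domains, and its softmax scores weight the
    outputs of domain-specialized encoders (hypothetical names)."""

    def __init__(self, query_dim: int, num_domains: int, experts: nn.ModuleList):
        super().__init__()
        self.experts = experts  # one specialized encoder per domain
        self.gate = nn.Sequential(  # assumed gating head
            nn.Linear(query_dim, num_domains),
            nn.Softmax(dim=-1),
        )

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        # query_emb: (batch, query_dim) dense query representation
        weights = self.gate(query_emb)                       # (batch, num_domains)
        expert_out = torch.stack(
            [expert(query_emb) for expert in self.experts], dim=1
        )                                                    # (batch, num_domains, dim)
        # Weighted combination of the domain experts' predictions.
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)
```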
Metrics and baselines. We assess the results of the experiments using MAP@100, MRR@100, R@100, NDCG@10, NDCG@3, and P@1. While NDCG@10 and R@100 are commonly used on the BEIR benchmarks, the additional metrics allow us to gain a deeper understanding of the potential improvement of DESIRE-ME at small cutoffs. We also report statistically significant differences according to a Bonferroni-corrected two-sided paired Student's t-test with p-value < 0.001. We rely on the ranx library for evaluation (a usage sketch is given after the baseline list below). To simplify comparative evaluations and to allow the computation of other evaluation metrics, all the runs are made publicly available on ranxhub. We compare DESIRE-ME variants integrated within the following state-of-the-art dense retrieval models, all available on HuggingFace: COCO-DR, COCO-DR XL, and Contriever. For each dense retrieval model, we compare against the following baselines:
Base. The original dense retrieval model without MoE in a zero-shot scenario.
Fine-tuned. We fine-tuned the base models on the training data with a batch size of 32 and a learning rate of 10^-6 for 10 epochs. All the other training hyper-parameters are taken from their original settings.
Random gating (RND-G). We use randomly generated weights to merge the specializers' outputs (a minimal sketch of this baseline follows below). This baseline is introduced to assess the benefits of our supervised gating function. All other DESIRE-ME settings are unchanged.
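The evaluation itself can be reproduced with a few lines of ranx. This is a sketch under assumptions: the file paths are placeholders, and the exact multiple-comparison correction applied by ranx's compare report should be checked against the library's documentation.

```python
from ranx import Qrels, Run, evaluate, compare

# Placeholder paths to TREC-formatted qrels and run files.
qrels = Qrels.from_file("nq-test-qrels.txt", kind="trec")
base = Run.from_file("contriever-base.run", kind="trec")
desire_me = Run.from_file("contriever-desire-me.run", kind="trec")

metrics = ["map@100", "mrr@100", "recall@100", "ndcg@10", "ndcg@3", "precision@1"]

# Scores for a single run.
print(evaluate(qrels, desire_me, metrics))

# Two-sided paired Student's t-test between runs; the paper reports
# Bonferroni-corrected significance at p < 0.001.
report = compare(qrels, runs=[base, desire_me], metrics=metrics,
                 stat_test="student", max_p=0.001)
print(report)
```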
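For completeness, one possible reading of the RND-G baseline is sketched below: the learned gate's scores are replaced by random weights when merging the specializers' outputs. Normalizing the weights to sum to one is our assumption, not stated in the text.

```python
import torch

def random_gating(batch_size: int, num_domains: int) -> torch.Tensor:
    """RND-G baseline sketch: random weights replace the learned gate's
    scores when merging the specializers' outputs. The normalization to
    sum to one is an assumption on our part."""
    weights = torch.rand(batch_size, num_domains)
    return weights / weights.sum(dim=-1, keepdim=True)
```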
4.2 Results and Discussion
Answering RQ1. To answer RQ1, we conduct multiple experiments using the NQ, HotpotQA, and FEVER datasets to assess DESIRE-ME's capability to enhance the effectiveness of the underlying dense retrieval model. The results on the three datasets are reported in Table 2, Table 3, and Table 4, respectively. Table 2 reports the results of the experiments conducted with the NQ dataset. The figures reported in the table show that fine-tuning the base model on the training data does not yield any benefit, and that integrating DESIRE-ME into the different dense retrieval systems always results in a remarkable improvement in performance. Irrespective of the metric considered and the dense retriever used, our solution boosts the base models by a statistically significant margin. The relative improvement for Contriever reaches an astonishing 12% in NDCG@10.
Table 2. Results on the NQ dataset. In italic the best results per model, in boldface the best results overall. The symbol * indicates a statistically significant difference over Base, Fine-tuned, and RND-G.

Retriever    Variant      MAP@100  MRR@100  R@100   NDCG@10  P@1     NDCG@3
BM25         -            0.292    0.295    0.758   0.339    0.198   0.268
COCO-DR      Base         0.441    0.455    0.923   0.504    0.325   0.424
COCO-DR      Fine-tuned   0.433    0.446    0.942   0.501    0.310   0.411
COCO-DR      RND-G        0.434    0.448    0.926   0.499    0.313   0.417
COCO-DR      DESIRE-ME    0.463*   0.477*   0.941   0.526*   0.339*  0.448*
Contriever   Base         0.432    0.446    0.927   0.498    0.311   0.414
Contriever   Fine-tuned   0.427    0.438    0.940   0.497    0.295   0.406
Contriever   RND-G        0.441    0.457    0.928   0.510    0.320   0.426
Contriever   DESIRE-ME    0.493*   0.511*   0.941   0.559*   0.379*  0.480*
COCO-DR XL   Base         0.480
COCO-DR XL   Fine-tuned   0.465
COCO-DR XL   RND-G        0.488
COCO-DR XL   DESIRE-ME    0.510*
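The relative gains quoted above and in the abstract follow directly from the Contriever rows of Table 2:

```latex
% Relative improvement of DESIRE-ME over Base (Contriever rows of Table 2)
\frac{0.559 - 0.498}{0.498} \approx 0.122 \quad \text{(NDCG@10, about 12\%)}
\qquad
\frac{0.379 - 0.311}{0.311} \approx 0.219 \quad \text{(P@1, about 22\%)}
```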