Poly-DPR
Introduction
-
What is the name of the Poly-DPR paper?
Improving Biomedical Information Retrieval with Neural Retrievers
-
What are the main contributions of the Poly-DPR paper?
- each context is represented by k vectors
- still employs MIPS (maximum inner product search)
- TempQG: template-based question generation method
- can generate a large number of in-domain questions
- 2 pretraining tasks: ETM and RSM
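A minimal sketch of how "k vectors per context" can still be served by MIPS. It assumes a single query vector and takes a context's score to be the maximum inner product over its k vectors; the function names and toy vectors are illustrative, not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly_score(query_vec, context_vecs):
    """Score one context: max inner product over its k vectors (assumed scoring rule)."""
    return max(dot(query_vec, c) for c in context_vecs)

def rank_by_flat_mips(query_vec, contexts, top_n):
    """Rank contexts by searching one flat index that holds all n*k vectors.

    Each flat entry remembers which context it came from, so the first hit
    for a context is also its max-scoring vector.
    """
    flat = [(ctx_id, vec) for ctx_id, vecs in enumerate(contexts) for vec in vecs]
    flat.sort(key=lambda item: dot(query_vec, item[1]), reverse=True)
    ranked = []
    for ctx_id, _ in flat:
        if ctx_id not in ranked:
            ranked.append(ctx_id)
        if len(ranked) == top_n:
            break
    return ranked

# Toy data: 3 contexts, k = 2 vectors each, dimension 2.
contexts = [
    [[1, 0], [0, 1]],
    [[2, 0], [0, 0.5]],
    [[0.5, 0.5], [0.2, 0.1]],
]
query = [1, 0.5]
print(rank_by_flat_mips(query, contexts, top_n=2))  # -> [1, 0]
```

A real system would swap the exhaustive sort for an approximate inner-product index; the de-duplication by context id is the only extra step compared with single-vector DPR.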
Method
-
Poly-DPR uses ColBERT-style MaxSim scoring
-
TempQG uses a seq2seq model with a passage and a template as input
-
TempQG templates are generated by masking low-frequency tagged biological entities
- also do DPR with text input to select
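A rough sketch of the template-creation step, assuming questions arrive with their biological entities already tagged. The placeholder token, the frequency threshold, and the example questions are all made up for illustration; the notes do not record the paper's actual values.

```python
from collections import Counter

MASK = "[ENTITY]"  # hypothetical placeholder token

def entity_frequencies(tagged_questions):
    """Count how often each tagged entity appears across all questions."""
    counts = Counter()
    for _, entities in tagged_questions:
        counts.update(e.lower() for e in entities)
    return counts

def make_template(question, entities, counts, max_freq=1):
    """Mask every tagged entity whose corpus frequency is at most max_freq."""
    template = question
    for ent in entities:
        if counts[ent.lower()] <= max_freq:
            template = template.replace(ent, MASK)
    return template

# Invented examples: (question, tagged entities)
tagged = [
    ("What is the role of BRCA1 in breast cancer?", ["BRCA1", "breast cancer"]),
    ("Which drugs are used to treat breast cancer?", ["breast cancer"]),
]
counts = entity_frequencies(tagged)
print(make_template(*tagged[0], counts))
# -> What is the role of [ENTITY] in breast cancer?
```

The resulting template, together with a passage, would then be the input to the seq2seq generator.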
-
Poly-DPR pretraining task: Extended Title Mapping (ETM) retrieves an abstract based on
the title concatenated with the top TF-IDF words
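A small sketch of how an ETM-style query could be built. It assumes the top words are picked from the abstract itself and uses a smoothed IDF; both choices, along with the function names and toy corpus, are assumptions rather than details recorded in these notes.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def top_tfidf_words(abstract, corpus, m=3):
    """Return the m words of the abstract with the highest TF-IDF score."""
    docs = [set(tokenize(d)) for d in corpus]
    tf = Counter(tokenize(abstract))
    n_docs = len(docs)

    def idf(word):
        df = sum(word in d for d in docs)
        return math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumed)

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(tf, key=lambda w: (-tf[w] * idf(w), w))[:m]

def etm_query(title, abstract, corpus, m=3):
    """Build the query: title followed by the abstract's top-m TF-IDF words."""
    return " ".join([title] + top_tfidf_words(abstract, corpus, m))

corpus = [
    "aspirin aspirin lowers fever in patients",
    "fever in patients with influenza",
    "patients with influenza recover in days",
]
print(etm_query("Aspirin and fever", corpus[0], corpus, m=2))
# -> Aspirin and fever aspirin lowers
```

During pretraining, such a query would be paired with its source abstract as the positive retrieval target.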
Results
- Poly-DPR results: best performance with the most granular sub-document representations
- obvious trade-off with time
Conclusions
- good experimental evidence for using sub-document representations vs. encoding the entire document
- Compare against the T5 query generation used in GPL
Reference
@article{DBLP:journals/corr/abs-2201-07745,
author = {Man Luo and
Arindam Mitra and
Tejas Gokhale and
Chitta Baral},
title = {Improving Biomedical Information Retrieval with Neural Retrievers},
journal = {CoRR},
volume = {abs/2201.07745},
year = {2022},
url = {https://arxiv.org/abs/2201.07745},
eprinttype = {arXiv},
eprint = {2201.07745},
timestamp = {Fri, 21 Jan 2022 13:57:15 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2201-07745.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}