Visual Question Answering (VQA) Task

From GM-RKB

A Visual Question Answering (VQA) Task is a QA task that is also a vision-and-language task, in which a system must answer a natural-language question about a given image.
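The following is a minimal sketch of the task's input/output contract: a VQA system maps an (image, question) pair to an answer string, which is then compared against reference answers. All names (`VQAExample`, `VQAModel`, `exact_match_accuracy`, `toy_model`) are hypothetical. Images are represented as nested lists of RGB tuples purely for illustration. Scoring uses simple normalized exact match against any reference answer, which is a simplification and not the scoring rule of any particular benchmark.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical container for one VQA instance (illustrative only).
@dataclass
class VQAExample:
    image: List[List[tuple]]  # H x W grid of (R, G, B) pixel values
    question: str             # natural-language question about the image
    answers: List[str]        # reference answers from annotators

# A VQA system is any function from (image, question) to an answer string.
VQAModel = Callable[[List[List[tuple]], str], str]

def normalize(text: str) -> str:
    """Lowercase, strip surrounding whitespace, and drop a trailing period."""
    return text.strip().lower().rstrip(".")

def exact_match_accuracy(model: VQAModel, data: Sequence[VQAExample]) -> float:
    """Fraction of examples whose prediction matches any reference answer."""
    if not data:
        return 0.0
    correct = 0
    for ex in data:
        pred = normalize(model(ex.image, ex.question))
        if pred in {normalize(a) for a in ex.answers}:
            correct += 1
    return correct / len(data)

# Toy usage: two 2x2 single-color images.
red_img = [[(255, 0, 0)] * 2 for _ in range(2)]
blue_img = [[(0, 0, 255)] * 2 for _ in range(2)]
data = [
    VQAExample(red_img, "What color is the image?", ["red", "Red"]),
    VQAExample(blue_img, "What color is the image?", ["blue"]),
]

def toy_model(image, question):
    # Ignores the question; names the dominant channel of the first pixel.
    r, g, b = image[0][0]
    channels = [r, g, b]
    return ["red", "green", "blue"][channels.index(max(channels))]

print(exact_match_accuracy(toy_model, data))  # 1.0
```

Note that `toy_model` scores perfectly while ignoring the question entirely, a small illustration of how simplistic, repetitive questions can let a system succeed without genuinely reasoning jointly over vision and language.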



References

2022

  • (Schwenk et al., 2022) ⇒ Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, and Roozbeh Mottaghi. (2022). “A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge.” In: European Conference on Computer Vision, pages 146-162. Cham: Springer Nature Switzerland.
    • QUOTE: "The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is hindered by a set of common limitations. These include a reliance on relatively simplistic questions that are repetitive in both concepts and linguistic structure, little world knowledge needed outside of the paired image, and limited reasoning required to arrive at the correct answer."

2015