HybridQA
Meet HybridQA!
A large-scale multi-hop question answering dataset over heterogeneous information in both structured tabular and unstructured textual forms.
Why Hybrid Question Answering?
HIGH-QUALITY
Mechanical Turk;
Strict Quality Control
LARGE-SCALE
13K Wikipedia Tables;
293K hyperlinked passages;
70K natural questions.
HYBRID
Semantic Understanding;
Symbolic Reasoning.
OPEN-DOMAIN
Reasoning over open-domain Wikipedia tables.
Explore
We have designed an interface for you to view the data. Please click here to explore the dataset and have fun!
Example
In this task, you are given a Wikipedia table together with its hyperlinked passages. The goal is to answer a multi-hop question that draws on information from both forms (structured and unstructured data).
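As a rough illustration of the task setup, the sketch below models a table whose cells hyperlink to passages, then performs one hop inside the table followed by one hop into the linked text. Everything here is hypothetical: the `Cell` and `HybridTable` classes, the `hop` helper, and the toy rows and passages are invented for illustration and do not reflect the released data format.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    text: str
    # Keys of passages hyperlinked from this cell (hypothetical field).
    links: list = field(default_factory=list)


@dataclass
class HybridTable:
    header: list    # column names
    rows: list      # each row is a list of Cell objects
    passages: dict  # passage key -> passage text


def hop(table, where_col, where_val, target_col):
    """Select rows where `where_col` equals `where_val` (structured hop),
    then follow the hyperlinks of the `target_col` cell (textual hop)."""
    where_idx = table.header.index(where_col)
    target_idx = table.header.index(target_col)
    results = []
    for row in table.rows:
        if row[where_idx].text == where_val:
            cell = row[target_idx]
            linked = [table.passages[key] for key in cell.links]
            results.append((cell.text, linked))
    return results


# Toy data, invented purely for this example.
table = HybridTable(
    header=["Athlete", "Medal"],
    rows=[
        [Cell("Athlete A", ["/wiki/Athlete_A"]), Cell("Gold")],
        [Cell("Athlete B", ["/wiki/Athlete_B"]), Cell("Silver")],
    ],
    passages={
        "/wiki/Athlete_A": "Athlete A was born in City X.",
        "/wiki/Athlete_B": "Athlete B was born in City Y.",
    },
)

# "Where was the gold medalist born?" needs the table to find the athlete
# and the linked passage to find the birthplace.
evidence = hop(table, "Medal", "Gold", "Athlete")
print(evidence)
```

A real system would still need to extract the answer span ("City X") from the retrieved passage. The sketch only shows why neither the table nor the text alone is sufficient.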
Download (Train/Test Data, Code)
Reasoning Types
The questions require multiple hops between the two information forms; the most typical reasoning patterns are demonstrated as follows:
Paper
Please cite our paper as below if you use the Hybrid dataset.
@article{chen2020hybridqa,
title={HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data},
author={Chen, Wenhu and Zha, Hanwen and Chen, Zhiyu and Xiong, Wenhan and Wang, Hong and Wang, William},
journal={Findings of EMNLP 2020},
year={2020}
}
Copyright © UCSB NLP Group
3530 Phelps Hall
University of California, Santa Barbara
Santa Barbara, CA 93106-5110
HybridQA is produced by the UCSB NLP Lab.
The dataset is under a Creative Commons Attribution 4.0 International License.
Contact the author at wenhuchen@cs.ucsb.edu.