TabFact
Meet TABFACT!
A large-scale dataset with 16k Wikipedia tables as evidence for 118k human-annotated statements, built to study fact verification with semi-structured evidence.
Why TABFACT?
HIGH-QUALITY
Mechanical Turk + Post filtering
LARGE-SCALE
16k Wikipedia tables as evidence for 118k human-annotated statements for verification.
LOGIC-BASED
Natural language inference based on logical reasoning.
OPEN-DOMAIN
Reasoning over open-domain Wikipedia tables.
Explore
We have designed an interface for viewing the data. Please click here to explore the dataset and have fun!
Example
In this task, you are given a Wikipedia table with its caption. The goal is to distinguish which statements are entailed by the table and which are refuted by it. An example is shown below:
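As a rough illustration of the task format, the Python sketch below pairs a captioned table with statements and labels each one as entailed or refuted. The table, the statements, the `Example` class, and the hand-written `check` functions are all invented for illustration; they are not taken from TabFact and do not reflect the released data format or any model from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Example:
    caption: str
    table: List[Row]
    statement: str
    # Hypothetical hand-written check standing in for a learned verifier.
    check: Callable[[List[Row]], bool]


def verify(example: Example) -> str:
    """Return 'entailed' if the check holds on the table, else 'refuted'."""
    return "entailed" if example.check(example.table) else "refuted"


# Toy table, invented for illustration (not from the dataset).
table = [
    {"team": "red", "wins": "3"},
    {"team": "blue", "wins": "5"},
]

examples = [
    Example(
        "toy league standings",
        table,
        "blue has more wins than red",
        lambda t: int(t[1]["wins"]) > int(t[0]["wins"]),
    ),
    Example(
        "toy league standings",
        table,
        "every team has at least 4 wins",
        lambda t: all(int(r["wins"]) >= 4 for r in t),
    ),
]

labels = [verify(e) for e in examples]
for e, label in zip(examples, labels):
    print(f"{e.statement!r}: {label}")
```

In the actual task no such check is given: a system has to work out the comparison or aggregation implied by the statement from the text and the table alone.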
Download (Train/Test Data, Code)
Statistics
The statements are collected through two channels, a simpler one and a more complex one, which involve reasoning at different levels of difficulty. The proportion of higher-order semantics in the annotated statements for the two channels is shown as follows:
Paper
Please cite our paper as below if you use the TabFact dataset.
@inproceedings{2019TabFactA,
title={TabFact: A Large-scale Dataset for Table-based Fact Verification},
author={Wenhu Chen and Hongmin Wang and Jianshu Chen and Yunkai Zhang and Hong Wang and Shiyang Li and Xiyou Zhou and William Yang Wang},
booktitle = {International Conference on Learning Representations (ICLR)},
address = {Addis Ababa, Ethiopia},
month = {April},
year = {2020}
}
Acknowledgement
We sincerely thank Ice Pasupat for releasing his complex table-QA dataset and Victor Zhong for his WikiSQL dataset. Our work is deeply inspired by these brilliant papers. We also thank Jiawei Wu and Xin Wang for sharing their website template.
Copyright © UCSB NLP Group
3530 Phelps Hall
University of California, Santa Barbara
Santa Barbara, CA 93106-5110
TabFact is produced by the UCSB NLP Lab.
The dataset is released under a Creative Commons Attribution 4.0 International License.
Contact the TabFact author at wenhuchen@cs.ucsb.edu.