Leveraging Data Recasting to Enhance Tabular Reasoning

About

Creating challenging tabular inference data is essential for learning complex reasoning. Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is difficult to scale. The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness. In this work, we present a framework for semi-automatically recasting existing tabular data to combine the benefits of both approaches. We use our framework to build tabular NLI instances from five datasets that were originally intended for tasks such as table2text generation, tabular Q/A, and semantic parsing. We demonstrate that recasted data can serve both as evaluation benchmarks and as augmentation data to improve performance on tabular NLI tasks. Furthermore, we investigate the effectiveness of models trained on recasted data in the zero-shot setting, and analyse performance trends across different types of recasted datasets.

Framework

Pipeline for generating recasted NLI data. We first create entailments and contradictions from the given base annotation. We then create a counterfactual table by taking a contradiction as the new base annotation. The subscript OG denotes the "Original" table and CF the "Counterfactual" table. Note that Base Entailment_OG contradicts Table_CF and Base Entailment_CF contradicts Table_OG. This pair always exhibits this property, but there can be statements that entail (or contradict) both the OG and CF tables.
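
As a concrete illustration, the sketch below recasts a single table-QA annotation into four TNLI examples following the pipeline above. The helper functions, the table-as-row-dicts representation, and the naive declarative template are illustrative assumptions, not the framework's actual implementation.

```python
# A minimal sketch of the recasting pipeline shown in the figure above,
# assuming a table is a list of row dicts. Helper names and the naive
# templates are assumptions; the framework uses more careful,
# dataset-specific rewriting.
import copy
import random

def to_declarative(question: str, answer: str) -> str:
    # Naive template turning a QA pair into a declarative hypothesis.
    return f"The answer to the question '{question}' is {answer}."

def swap_answer_cell(table: list, old: str, new: str) -> list:
    # Build the counterfactual table by replacing the gold-answer cell.
    table_cf = copy.deepcopy(table)
    for row in table_cf:
        for col, val in row.items():
            if val == old:
                row[col] = new
    return table_cf

def recast_qa_pair(table, question, answer, distractors):
    """One table-QA annotation -> four TNLI (table, hypothesis, label) triples."""
    wrong = random.choice(distractors)
    entail_og = to_declarative(question, answer)  # true of the OG table
    contra_og = to_declarative(question, wrong)   # false of the OG table
    table_cf = swap_answer_cell(table, answer, wrong)
    # On the counterfactual table the two hypotheses swap labels,
    # mirroring the Entailment_OG / Entailment_CF property noted above.
    return [
        {"table": table,    "hypothesis": entail_og, "label": "entail"},
        {"table": table,    "hypothesis": contra_og, "label": "contradict"},
        {"table": table_cf, "hypothesis": contra_og, "label": "entail"},
        {"table": table_cf, "hypothesis": entail_og, "label": "contradict"},
    ]
```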

Source Datasets and Recasted Datasets

Using the framework outlined above, we recast the five datasets listed below. All datasets use open-domain Wikipedia tables, comparable to TabFact. In addition, these datasets share reasoning types with TabFact, such as counting, minimum/maximum, ranking, superlatives, comparatives, and uniqueness, among others. Source datasets and statistics for the various recasted datasets are given below. QA-TNLI combines recasted data from both FeTaQA and WikiTableQuestions. Test splits are created by randomly sampling 10% of the examples from each dataset (see the sketch after the table).

| Source Dataset | Task | Generated NLI Dataset | Entail | Contradict | Total |
|---|---|---|---|---|---|
| WikiTableQuestions (Pasupat and Liang, 2015a) | Table Question Answering | QA-TNLI | 32k | 77k | 109k |
| FeTaQA (Nan et al., 2022) | Table Question Answering | QA-TNLI (counts combined above) | | | |
| WikiSQL (Zhong et al., 2017) | Table Semantic Parsing | WikiSQL-TNLI | 300k | 385k | 685k |
| Squall (Shi et al., 2020b) | Table Semantic Parsing | Squall-TNLI | 105k | 93k | 198k |
| ToTTo (Parikh et al., 2020) | Table To Text Generation | ToTTo-TNLI | 493k | 357k | 850k |
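
A minimal version of the 10% test-split procedure mentioned above might look like the following; the JSON-lines file name and per-line format are assumptions.

```python
# Hold out a random 10% of a recasted dataset as a test split.
import json
import random

def split_dataset(path: str, test_frac: float = 0.10, seed: int = 0):
    with open(path) as f:
        # One JSON-encoded TNLI example per line (assumed format).
        examples = [json.loads(line) for line in f]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(examples)
    n_test = int(len(examples) * test_frac)
    return examples[n_test:], examples[:n_test]  # (train, test)

# Hypothetical file name, for illustration only.
train, test = split_dataset("wikisql_tnli.jsonl")
```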