```
@InProceedings{tanDIDAN2020,
  author    = {Reuben Tan and Bryan A. Plummer and Kate Saenko},
  title     = {Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2020}
}
```
For each image, we extract 36 region features using a Faster R-CNN model (https://github.com/peteanderson80/bottom-up-attention) pretrained on Visual Genome. The region features for each image are stored in a separate .npy file.
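As a minimal sketch, loading one image's stored features might look like the following. The file name is a placeholder, and the 2048-dimensional feature size is the bottom-up-attention default rather than something stated above:

```python
import numpy as np

# Hypothetical file name; each image's features live in their own .npy file.
features = np.load("image_representations/example_image.npy")

# Expect 36 region features per image. The 2048-d feature size is the
# bottom-up-attention default and is an assumption, not stated above.
print(features.shape)  # e.g. (36, 2048)
```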
We use the spaCy Python library to parse the articles and captions and detect named entities. We store this information as a dictionary where the keys are the article names and the values are sets of detected named entities.
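A minimal sketch of building such a dictionary with spaCy (the model name, the example text, and the save path are illustrative assumptions, not taken from this repository):

```python
import pickle
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def build_ner_dict(texts):
    """Map each article/caption name to the set of named entities spaCy detects."""
    return {name: {ent.text for ent in nlp(text).ents} for name, text in texts.items()}

# Hypothetical usage: texts maps article names to their raw text.
texts = {"article_001": "President Biden met Angela Merkel in Berlin."}
ner_dict = build_ner_dict(texts)

with open("ner_dict.pkl", "wb") as f:  # illustrative save path
    pickle.dump(ner_dict, f)
```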
Required Arguments
captioning_dataset_path: Path to the GoodNews captioning dataset JSON file
fake_articles: Path to the generated (fake) articles
image_representations_dir: Directory containing the object representations of the images
real_articles_dir: Directory containing the preprocessed Torch text files for real articles
fake_articles_dir: Directory containing the preprocessed Torch text files for generated articles
real_captions_dir: Directory containing the preprocessed Torch text files for real captions
ner_dir: Directory containing the dictionary of named entities for each article and caption
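For reference, a hypothetical invocation wiring these arguments together might look like the following. The script name train.py and all paths are placeholders, not taken from this repository:

```
python train.py \
  --captioning_dataset_path data/captioning_dataset.json \
  --fake_articles data/generated_articles.json \
  --image_representations_dir data/image_features/ \
  --real_articles_dir data/real_articles/ \
  --fake_articles_dir data/fake_articles/ \
  --real_captions_dir data/real_captions/ \
  --ner_dir data/ner/
```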