Compare Two CSV Files Using Python
Last Updated: 30 Apr, 2025
We are given two CSV files, and our task is to compare them and print their differences in Python. In this article, we will look at three commonly used methods: the pandas compare() method, set operations and the difflib module.
The examples below use two sample files. file1.csv contains:

```csv
Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
```
file2.csv contains:

```csv
Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
```
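To try the examples locally, the two sample files can be created with a short script. This is only a convenience sketch: it assumes you want file1.csv and file2.csv in the current working directory, and it ends each file with a newline so the line-based methods see consistent line endings.

```python
from pathlib import Path

# Sample data copied from file1.csv and file2.csv above
FILE1 = """Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
"""
FILE2 = """Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
"""

# Write both files to the current working directory (adjust paths if needed)
Path("file1.csv").write_text(FILE1)
Path("file2.csv").write_text(FILE2)
```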
Using compare()
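The compare() method of a pandas DataFrame (added in pandas 1.1) compares two DataFrames element by element and returns only the rows and columns where the values differ. Both DataFrames must be identically labeled, meaning they have the same row index and the same columns; otherwise compare() raises a ValueError. Because rows are matched by index label rather than by content, it works best when both files list the same records in the same order.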
```python
import pandas as pd

# Load each CSV file into its own DataFrame
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Compare DataFrames: keep only the cells that differ
res = df1.compare(df2)
print(res)
```
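Output
The printed result contains only rows 1 and 2, with a self column (the value from df1) and an other column (the value from df2) under each of Name, Age and City.
Explanation: It first reads file1.csv and file2.csv into two separate DataFrames, df1 and df2. The compare() method then lines the two DataFrames up by row index, so row 1 of df1 (Emily) is compared with row 1 of df2 (Michael) even though they describe different people. That is why every column differs in rows 1 and 2, while row 0 (John) is left out because it is identical in both files.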
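Because compare() matches rows by index label, one way to compare records by a key column instead is to make that column the index and align both DataFrames on the union of keys before calling compare(). The sketch below assumes that Name uniquely identifies a row (true for the sample data, not in general) and reads the sample data from strings with io.StringIO so it runs without the files.

```python
import io

import pandas as pd

# Sample data from file1.csv and file2.csv, inlined so the sketch is self-contained
CSV1 = """Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
"""
CSV2 = """Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
"""

# Assumption: Name is a unique key in both files
df1 = pd.read_csv(io.StringIO(CSV1)).set_index("Name")
df2 = pd.read_csv(io.StringIO(CSV2)).set_index("Name")

# Give both frames the same row labels; names missing from one side become NaN
keys = df1.index.union(df2.index)
res = df1.reindex(keys).compare(df2.reindex(keys))
print(res)
```

With this alignment, John drops out because his row is identical, Michael shows only the Age change (40 versus 45), and Emily and Emma show NaN on the side where they are missing. Reindexing introduces NaN values, so Age becomes a float column.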
Using set operations
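This method reads both files line by line and stores their lines as sets. The set difference a - b gives the lines that appear in file1.csv but not in file2.csv, and b - a gives the reverse. Because sets are unordered and drop duplicates, this approach ignores row order and repeated rows. It also compares raw text, so a difference in whitespace, quoting or a missing newline at the end of a file counts as a changed line.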
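```python
# Read every line (including its trailing newline) into a set per file
with open('file1.csv') as f1, open('file2.csv') as f2:
    a = set(f1.readlines())
    b = set(f2.readlines())

print(a - b)  # lines only in file1.csv
print(b - a)  # lines only in file2.csv
```
Output
For the sample files, a - b contains the Emily line and the Michael line with age 40, while b - a contains the Michael line with age 45 and the Emma line. The header and John lines are in both sets, so they are not printed. Each element is the raw line, usually including its trailing newline, and the order within a printed set can vary.
Explanation: It opens file1.csv and file2.csv, reads their contents line by line and stores them as sets a and b. The difference a - b shows lines present in file1.csv but not in file2.csv, and b - a shows lines present in file2.csv but not in file1.csv.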
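Parsing each line with the standard csv module before building the sets avoids some of the raw-text pitfalls: a quoted field such as "John" compares equal to John, and a missing newline at the end of a file no longer matters. The sketch below uses io.StringIO with a variant of the sample data (file2 quotes two fields and deliberately has no final newline) and a hypothetical helper named row_set(); with real files you would pass open('file1.csv', newline='') instead.

```python
import csv
import io


def row_set(lines):
    """Hypothetical helper: parse CSV lines into a set of row tuples.

    Row order and duplicate rows are lost, exactly as with the plain set approach.
    """
    return {tuple(row) for row in csv.reader(lines)}


# Variant of the sample data: file2 quotes "John" and "New York" and has no trailing newline
file1 = io.StringIO("Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n")
file2 = io.StringIO('Name,Age,City\n"John",25,"New York"\nMichael,45,Chicago\nEmma,35,San Francisco')

a = row_set(file1)
b = row_set(file2)

# sorted() gives a stable, readable order instead of arbitrary set order
only_in_1 = sorted(a - b)
only_in_2 = sorted(b - a)
print("Only in file1.csv:", only_in_1)
print("Only in file2.csv:", only_in_2)
```

The John row is not reported even though its raw text differs, because csv.reader removes the quotes before the rows are compared.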
Using difflib
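Python’s difflib module compares sequences, such as the lists of lines read from two files, and produces output similar to the Unix diff command. Its unified_diff() and context_diff() functions generate unified and context diffs that show which lines were added or removed. In a unified diff, a changed line appears as a removal followed by an addition.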
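For comparison with the unified format, a context diff of the same data can be produced with difflib.context_diff(). In this sketch the lines are written inline instead of being read from the files; in context format, lines that were replaced are marked with ! rather than as separate - and + lines.

```python
import difflib

# Lines of file1.csv and file2.csv, each ending in '\n' as readlines() would return them
old = ["Name,Age,City\n", "John,25,New York\n", "Emily,30,Los Angeles\n", "Michael,40,Chicago\n"]
new = ["Name,Age,City\n", "John,25,New York\n", "Michael,45,Chicago\n", "Emma,35,San Francisco\n"]

# context_diff() yields the diff as strings; collect them to print or inspect
diff = list(difflib.context_diff(old, new, fromfile="file1.csv", tofile="file2.csv"))
print("".join(diff), end="")
```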
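```python
import difflib

with open('file1.csv') as f1, open('file2.csv') as f2:
    # unified_diff() returns a generator of diff lines
    d = difflib.unified_diff(f1.readlines(), f2.readlines(),
                             fromfile='file1.csv', tofile='file2.csv')
    for line in d:
        # Diff lines already carry their newline, so suppress print()'s own
        print(line, end='')
```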
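Output
For the sample files, the diff starts with the --- file1.csv and +++ file2.csv header lines followed by a single hunk. The header row and the John row are shown unchanged as context, the Emily row and the Michael row with age 40 are marked as removed (-), and the Michael row with age 45 and the Emma row are marked as added (+).
Explanation: It opens file1.csv and file2.csv, reads their contents and passes both lists of lines to difflib.unified_diff(), which generates a line-by-line comparison in the unified diff format. Lines prefixed with - appear only in file1.csv, lines prefixed with + appear only in file2.csv, and lines prefixed with a space are common context.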
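If you also want a hint about what changed inside a line, difflib.ndiff() is another option: for a pair of lines it judges similar enough, it adds ? guide lines with ^ under the differing characters. The sketch below applies it to the same data written inline; which lines get paired is decided by difflib's similarity heuristics, so treat the pairing as illustrative.

```python
import difflib

# Same lines as file1.csv and file2.csv, written inline for a self-contained sketch
old = ["Name,Age,City\n", "John,25,New York\n", "Emily,30,Los Angeles\n", "Michael,40,Chicago\n"]
new = ["Name,Age,City\n", "John,25,New York\n", "Michael,45,Chicago\n", "Emma,35,San Francisco\n"]

# ndiff() prefixes lines with '- ', '+ ', '  ' and, for similar pairs, '? ' hint lines
result = list(difflib.ndiff(old, new))
print("".join(result), end="")
```

Here the two Michael rows differ by a single character, so they should be paired, and the ? lines point at the changed digit in the age (40 versus 45).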