Tutorial

How to Compare Two Lists and Find What's Different

May 23, 20266 min readBy Sam A.
How to Compare Two Lists and Find What's Different

Comparing two lists means identifying which items appear in one list but not the other, which appear in both, and which have been added or removed. The result is three sets: items only in List A, items only in List B, and items in both.

This comes up constantly in data work: which emails unsubscribed since last month, which SKUs dropped from a catalog update, which users are in the test group but not the control group. The task is simple to describe and surprisingly annoying to do correctly in most tools.

Here's how to do it in a browser, in Excel, in Python, and in SQL.

Method 1: Compare Two Lists in a Browser (Fastest for One-Off Tasks)

When to use it: you have two lists you need to diff once — not as part of an automated pipeline. Paste, compare, copy the result, move on.

How to do it:

  1. Go to dedup.ing/compare-lists.
  2. Paste List A into the left textarea.
  3. Paste List B into the right textarea.
  4. Click Compare.

The tool returns three sets immediately:

  • Only in A — items that were removed or not carried over to B
  • Only in B — items that are new in B, not present in A
  • In both — items that appear in both lists unchanged

Each set has its own Copy and Download button. All processing runs in your browser — nothing is uploaded.

Options worth knowing:

  • Case-insensitive — treats Alice and alice as the same item. Enable this for email addresses, names, and any data where capitalization varies by source.
  • Trim whitespace — catches differences caused by a trailing space rather than a genuine value change.

Practical tip: always enable both options unless you have a specific reason to care about exact case. The most common cause of false "differences" in a list comparison is a capitalization inconsistency between sources, not a genuine data change.

Method 2: Compare Two Lists in Excel

When to use it: your data is already in a spreadsheet and you want to stay there.

Excel doesn't have a dedicated list-comparison feature. The standard approach uses COUNTIF to check whether each value in one list appears in the other.

Step-by-step:

Assume List A is in column A and List B is in column B, each starting at row 2.

To find items in A that are NOT in B (removed):

=IF(COUNTIF($B:$B, A2)=0, "Only in A", "In both")

Add this formula in column C, next to each row in List A. Filter column C for "Only in A".

To find items in B that are NOT in A (added):

=IF(COUNTIF($A:$A, B2)=0, "Only in B", "In both")

Add this in column D, next to each row in List B. Filter column D for "Only in B".

Important: Excel's COUNTIF is case-insensitive by default. Alice and alice will match. If you need case-sensitive comparison, use EXACT() inside a SUMPRODUCT:

=IF(SUMPRODUCT(--(EXACT($B:$B, A2)))=0, "Only in A", "In both")

Limitation: this approach works well up to around 50,000 rows. Above that, recalculation slows significantly. For large datasets, use Python or a browser tool with file upload.

Method 3: Compare Two Lists in Python

When to use it: you're comparing lists programmatically, as part of a script or pipeline, or your lists have more than 50,000 rows.

Python's built-in set type makes list comparison a three-liner:

python
list_a = set(open("list_a.txt").read().splitlines())
list_b = set(open("list_b.txt").read().splitlines())

only_in_a = list_a - list_b        # removed
only_in_b = list_b - list_a        # added
in_both   = list_a & list_b        # shared

For case-insensitive comparison:

python
list_a = set(line.strip().lower() for line in open("list_a.txt"))
list_b = set(line.strip().lower() for line in open("list_b.txt"))

The .strip() call handles leading and trailing whitespace at the same time.

For CSV files where you're comparing one column:

python
import csv

def read_column(filepath, column_name):
    with open(filepath) as f:
        return set(row[column_name].strip().lower() for row in csv.DictReader(f))

emails_a = read_column("export_jan.csv", "email")
emails_b = read_column("export_feb.csv", "email")

only_in_jan = emails_a - emails_b
only_in_feb = emails_b - emails_a
in_both     = emails_a & emails_b

set operations run in O(n) time, so this handles millions of rows without slowing down.

Method 4: Compare Two Lists in SQL

When to use it: your lists live in a database and you want to compare them without exporting.

Items in table A but not table B:

sql
SELECT a.value
FROM   list_a a
WHERE  a.value NOT IN (SELECT value FROM list_b);

Or, more efficiently with a LEFT JOIN:

sql
SELECT a.value
FROM   list_a a
LEFT JOIN list_b b ON LOWER(a.value) = LOWER(b.value)
WHERE  b.value IS NULL;

The LOWER() call makes the comparison case-insensitive. Without it, most databases (PostgreSQL in particular) will treat Alice and alice as different values and produce incorrect results.

Full three-way split:

sql
-- Only in A
SELECT 'only_in_a' AS source, a.value
FROM list_a a LEFT JOIN list_b b ON LOWER(a.value) = LOWER(b.value)
WHERE b.value IS NULL

UNION ALL

-- Only in B
SELECT 'only_in_b', b.value
FROM list_b b LEFT JOIN list_a a ON LOWER(b.value) = LOWER(a.value)
WHERE a.value IS NULL

UNION ALL

-- In both
SELECT 'in_both', a.value
FROM list_a a INNER JOIN list_b b ON LOWER(a.value) = LOWER(b.value);

Choosing the Right Method

Your situationBest method
One-off comparison, any formatBrowser tool
Data already in a spreadsheet, <50K rowsExcel COUNTIF
Automated pipeline or >50K rowsPython sets
Lists live in a databaseSQL LEFT JOIN
Need a shareable diff reportBrowser tool (download result)

The browser tool is the right default for most manual tasks. Use code when you need to repeat the comparison on updated data.

The Most Common Mistakes

Not normalizing case before comparing. User@Domain.com and user@domain.com are the same email address. If your comparison is case-sensitive and your sources have inconsistent capitalization, you'll see false differences everywhere. Enable case-insensitive matching or lowercase both lists before comparing.

Not trimming whitespace. A value that ends with a space is a different string from one that doesn't. This is the second most common cause of phantom differences in list comparisons. Trim before you compare.

Comparing the wrong column in a CSV. If you have a file with headers and you compare the entire row rather than the key column, a change in any field (even a date or a phone number) will flag the row as different. Isolate the identifier column first.

Treating "in both" as identical. Two items that share the same key value may differ in other fields. A list comparison tells you the key exists in both lists — it doesn't tell you the rest of the row is the same. If you need a full row-level diff, that's a separate operation.

Frequently Asked Questions

What does it mean when an item is "only in A"?
The item exists in List A but not in List B. Depending on your context, this means it was removed, it didn't make the cut, or it simply wasn't in the second source. It doesn't tell you why it's missing — that's a question for your data.

Can I compare two CSV files, not just plain-text lists?
Yes. The dedup.ing/compare-lists tool accepts CSV files via drag-and-drop or file upload. In the browser tool, use Advanced mode to select which column defines the comparison key. In Python, use csv.DictReader and read only the relevant column.

How do I find items that appear in both lists but have changed?
A standard list comparison only checks whether a key value is present in both lists. It doesn't compare the rest of the row. For a full row-level diff — where you want to know that user_id 1042 exists in both but their email address changed — you need a dedicated diff tool or a SQL query that joins on the key and compares the other columns explicitly.

What's the difference between a list comparison and deduplication?
Deduplication removes repeated items within a single list. List comparison finds differences between two separate lists. They're related operations — both depend on accurate matching — but they answer different questions. Deduplicate first if either list has internal duplicates; otherwise, duplicates in one list will inflate your "in both" or "only in A/B" counts.

Does the browser tool store my lists?
No. All processing runs in your browser. Your data is never uploaded to a server. If you use the optional shareable result URL feature (Pro tier), you explicitly opt in and the result is stored for 30 days before automatic deletion.

My two lists have different column orders in CSV. Does that matter?
Only if you're comparing entire rows rather than a specific column. If you select the key column (email, ID, name) for comparison, column order in the file doesn't matter. If you're doing a row-level comparison, normalize the column order first.

Get new tutorials in your inbox.

No spam, just useful updates when we ship something new or write something worth reading.

Related articles