Sunday, May 4, 2008

Compare Directories using Diff in Linux

To compare 2 files, we use the diff command. How do we compare 2 directories? Specifically, we want to know which files and subdirectories are common to both, and which exist in only 1 directory but not the other.

Unix old-timers may remember the dircmp command. Alas, that command is not available in Linux. In Linux, we use the same diff command to compare directories as well as files.

$ diff  ~peter ~george
Only in /home/peter: announce.doc
diff /home/peter/.bashrc /home/george/.bashrc
76,83d72
<
< # Customization by Peter
< export LESS=-m
< export GREP_OPTIONS='--color=always'
< shopt -s histappend
< shopt -s cmdhist
< export PROMPT_COMMAND="history -a;$PROMPT_COMMAND"
< #echo keycode 58 = Escape |loadkeys -
Only in /home/george: .mcoprc
Only in /home/peter: .metacity
Only in /home/george: .newsticker-images
Only in /home/peter: .notifier.conf
Only in /home/george: targets.txt
Only in /home/peter: .xsession-errors


Without any option, diffing 2 directories will tell you which files exist in only 1 directory and not the other. Files that are common to both directories (e.g., .bashrc in the above listing) are diffed to see if and how their contents differ; common files with identical contents are not listed at all.
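
As a rough illustration of that behavior, the sketch below builds two throwaway directories and diffs them. The directory and file names are made up for the example, and the exact wording of the messages assumes GNU diff run in the C locale.

```shell
# Build two small throwaway directories (all names here are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
echo "same" > "$tmp/a/common.txt";  echo "same" > "$tmp/b/common.txt"
echo "one"  > "$tmp/a/changed.txt"; echo "two"  > "$tmp/b/changed.txt"
echo "x" > "$tmp/a/only_a.txt"
echo "y" > "$tmp/b/only_b.txt"

# diff exits with status 1 when it finds differences, so record the status
# instead of treating it as a failure.
status=0
out=$(cd "$tmp" && LC_ALL=C diff a b) || status=$?
printf '%s\n' "$out"
echo "exit status: $status"
```

The identical common.txt should not appear in the output, changed.txt is diffed line by line, and the two only_* files show up as "Only in" lines.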

If you are NOT interested in file differences, just add the -q (or --brief) option.

diff -q ~peter ~george  |sort
Files /home/peter/.bashrc and /home/george/.bashrc differ
Only in /home/george: .mcoprc
Only in /home/george: .newsticker-images
Only in /home/george: targets.txt
Only in /home/peter: .metacity
Only in /home/peter: .notifier.conf
Only in /home/peter: .xsession-errors
Only in /home/peter: announce.doc


diff orders its output alphabetically by file/subdirectory name. I prefer to group them by whether they are common, and whether they only exist in the first or second directory. That is why I piped the output of diff through sort in the above command.

Note that by default diff does not reach into the subdirectories to compare the files and subdirectories at that level. To change its behavior to recursively go down subdirectories, add -r.

diff -qr ~peter ~george  |sort
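
To make the effect of -r concrete, here is a small sketch that runs diff both ways on a pair of made-up directories whose only difference is inside a subdirectory. The names a, b, sub and notes.txt are placeholders, and the message text assumes GNU diff in the C locale.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a/sub" "$tmp/b/sub"
echo "old" > "$tmp/a/sub/notes.txt"
echo "new" > "$tmp/b/sub/notes.txt"

# Without -r, diff only notes that both sides have a subdirectory named sub.
flat=$(cd "$tmp" && LC_ALL=C diff -q a b) || true
# With -r, diff descends into sub and compares the files inside it.
deep=$(cd "$tmp" && LC_ALL=C diff -qr a b) || true

printf 'without -r: %s\n' "$flat"
printf 'with -r:    %s\n' "$deep"
```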

36 comments:

Anonymous said...

Any way to do this across an SSH tunnel?

Peter Leung said...

phyzome

Assuming both directories reside on the remote machine:

ssh user@123.123.123.123 diff -rq dir1 dir2


Peter

Anonymous said...

I was hoping for something to compare remote to local.

I'll just mount the remote filesystem using SSH-fuse or whatever.

Anonymous said...

This is wrong on some systems.... For example on Ubuntu using diff (GNU diffutils) 2.8.1, you must use the -r switch to get the full comparison.

Anonymous said...

Thanks for the article...
I just backed up 30+ gig of data. I wanted to ensure I had a good copy.
I created an empty file in the middle of the directory structure of the backup.
I found it as a file on backup, but not original, and three files that were not the same.
Appreciate the guidance!

Brian Schimmel said...

@phyzome (he won't need it anymore, but maybe someone else finds it useful):

If you want to compare files between local and remote machines, look into "rsync". As the name says, its purpose is to sync them, but you can use it without actually changing anything.

Especially if you are interested in missing/additional files AND changed files, rsync is cool, because it can compare the file contents without transferring the files over the wire, which makes it very fast.

Rob Staveley (Tom) said...

In a situation where I had to compare remote directories on separate sites quickly, and where time stamps were no good, I used find with cksum to good effect.

find . -type f -exec cksum {} \;

Directing the output to a text file for the two directories then left me two text files to compare.
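
The manifest approach described above might look roughly like the following. This is a sketch rather than the commenter's actual script: the manifest function and all file names are invented for illustration, and it sorts each listing by path so that the two manifests line up before they are diffed.

```shell
# Print one "checksum size path" line per regular file under $1, sorted by path.
# (manifest is a hypothetical helper name.)
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3)
}

# Throwaway test data.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
echo "same" > "$tmp/a/same.txt";    echo "same" > "$tmp/b/same.txt"
echo "one"  > "$tmp/a/changed.txt"; echo "two"  > "$tmp/b/changed.txt"
echo "x"    > "$tmp/a/only_a.txt"

manifest "$tmp/a" > "$tmp/a.cksum"
manifest "$tmp/b" > "$tmp/b.cksum"

# Lines that differ between the manifests name the files that differ.
changes=$(diff "$tmp/a.cksum" "$tmp/b.cksum") || true
printf '%s\n' "$changes"
```

For a directory on another machine, the same find pipeline could presumably be run through ssh with its output redirected to a local manifest file, so that only the checksums cross the wire.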

Tim said...

Do you guys have a recommendation for what to do in the following case: I have two potentially different directory trees, say A and B, and I want to know which files are somewhere in A but not in B, and vice versa.

There are programs like "fdupes" that look for duplicate files across directory trees but they also look for duplicates within A and B. This can be rather painful if A and/or B contains a lot of files.

In my situation I have two directory trees with pictures which I want to unify.

volty said...

@Tim

fdupes -f A B

Files from A will appear only if A itself contains more than one identical copy.

Anonymous said...

@Tim

You can use rsync too.

rsync -avn source-dir/ target-dir/

This will list the files that are different (or new) in source-dir compared to target-dir.

If you run it without the -n option, it will copy the missing (or different) files over to the target.

Anonymous said...

My directory has object files, but I need to diff only the files other than object files. Can someone help me?

Anonymous said...

The rsync command I posted above compares files based on last modification date and size. If you want to compare the CONTENT of the files, you can make rsync use checksums (-c):

rsync -avcn source-dir/ target-dir

This will list the files whose content is different in the two directories. Again, if you want to replace the files that are different in target-dir with the files from source-dir, remove the -n option.

Parapat's Notes said...

Very nice article. It is a very useful way to do comparisons. Thanks a lot.. :)

realtechtalk said...

This is a very useful post and will give people peace of mind when backing up files.

I'm using the diff -r command before deleting the original copy of some backups that I've transferred to a new RAID array.

Keep in mind I did run rsync twice to do this, but this data is important to me so I want to be sure nothing is missing and that the file integrity is intact.

AlliXSenoS said...

if you want the list of colliding filenames:

diff -sq dir1 dir2 | grep -v "Only in"

Anonymous said...

White text on black background sucks. But thanks for the post.

Anonymous said...

If you're looking to compare the contents of two directories, and want to see how they are the same (not how they are different), and getting cmpdir isn't an option, try this:

diff -q -y -s dir1 dir2 | grep "are identical$"

It's ugly and IO intensive, but works for me. The diff is running with the "-s" option to also output items that are identical. The grep then only picks out the stuff that is identical and doesn't display differences.

-A

Anonymous said...

Thank you, it helped a lot

Anonymous said...

I use vimdiff to diff a local file against a remote one. I know this is an old discussion but eh...

vimdiff scp://user@host//file/path/to/file/file.txt ~/local/file/path/file.txt

Oliver said...

Your simple and clear explanation was a great help, just what I needed while I was having nightmares in front of the threatening black shell.

I sincerely thank you :)

PS : maybe you could update it with an easy way to export to a text file the results of a diff, but that's a detail.

Pedro Bezunartea López said...

@Oliver

That's certainly an easy one, just redirect the stdout to a file with ">":

diff -rq dir1 dir2 > dir1-dir2-diffs

:)

Unknown said...

I just ran across this today. I was trying to compare directories and got a "command not found" response. I appreciate the information.

I just wish they had just created a command alias with documentation that the command had been deprecated. Imagine the script errors that happen when people pull out an old script or try to build from old sources.

Anonymous said...

What should I do if I want to copy the files which are different to a third directory?
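
One possible approach, sketched under the assumption that "different" means missing from, or not byte-identical in, the second directory: walk the first tree, compare each file with cmp, and copy the ones that fail the comparison. The copy_changed function and the dir1/dir2/dir3 names are hypothetical, and file names containing newlines are not handled.

```shell
# Copy every file under $1 that is absent from or differs in $2 into $3,
# keeping its relative path. (copy_changed is a made-up helper name.)
copy_changed() {
    src=$1
    ref=$2
    dest=$3
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        # cmp -s is silent; a non-zero status means "differs" or "missing".
        if ! cmp -s "$src/$rel" "$ref/$rel" 2>/dev/null; then
            mkdir -p "$dest/$(dirname "$rel")"
            cp -p "$src/$rel" "$dest/$rel"
        fi
    done
}

# Throwaway test data.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/sub" "$tmp/dir2/sub"
echo "same" > "$tmp/dir1/same.txt";        echo "same" > "$tmp/dir2/same.txt"
echo "one"  > "$tmp/dir1/sub/changed.txt"; echo "two"  > "$tmp/dir2/sub/changed.txt"
echo "x"    > "$tmp/dir1/only1.txt"

copy_changed "$tmp/dir1" "$tmp/dir2" "$tmp/dir3"
(cd "$tmp/dir3" && find . -type f | sort)
```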

chk said...

Thank you! Very useful!!

Lee V. Mangold said...

A useful trick I use a lot is:

diff -qr dir1 dir2 | grep -v .svn | sort

That excludes all my .svn directories, which are really irrelevant if you're doing this..
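
GNU diff also has an -x (--exclude) option that skips files and directories whose base name matches a pattern, which avoids filtering the output after the fact and should also spare diff from reading the excluded files. A minimal sketch, with made-up file names, that skips both .svn directories and object files:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/.svn" "$tmp/dir2/.svn"
echo "r1" > "$tmp/dir1/.svn/entries"; echo "r2" > "$tmp/dir2/.svn/entries"
echo "o1" > "$tmp/dir1/main.o";       echo "o2" > "$tmp/dir2/main.o"
echo "c1" > "$tmp/dir1/main.c";       echo "c2" > "$tmp/dir2/main.c"

# -x takes a shell-style pattern matched against base names; quote it so the
# shell does not expand it first.
report=$(cd "$tmp" && LC_ALL=C diff -qr -x .svn -x '*.o' dir1 dir2 | sort) || true
printf '%s\n' "$report"
```

Only main.c should be reported here, since the other two differences sit behind excluded names.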

Marc Online said...

Thanks man. This post helped me with comparing two directories with lots of files.

Sincerely,
Marc_Online_

Purvi said...

Great productive tip. Saved me time comparing two directories.

BASTA! said...

Is there a way to get a machine-readable rather than human-readable output, particularly for "Files X and Y differ" lines? My problem with this format is that 'X' and 'Y' are not delimited in any way, so if 'X' happens to be 'foo and bar' then there is no way to parse these lines unambiguously.

So, how do I?
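
One workaround that does not depend on diff's message format is to build the report from find and cmp instead. The sketch below (compare_trees and the status labels are invented names) prints a status, a tab, and a single relative path per line. Because both sides share the same relative path, a name such as 'foo and bar' stays unambiguous; names containing newlines would still break it.

```shell
# Print "STATUS<TAB>relative/path" for files that differ or exist on one side.
compare_trees() {
    left=$1
    right=$2
    (cd "$left" && find . -type f) | while IFS= read -r rel; do
        if [ ! -e "$right/$rel" ]; then
            printf 'only_left\t%s\n' "$rel"
        elif ! cmp -s "$left/$rel" "$right/$rel"; then
            printf 'differ\t%s\n' "$rel"
        fi
    done
    (cd "$right" && find . -type f) | while IFS= read -r rel; do
        if [ ! -e "$left/$rel" ]; then
            printf 'only_right\t%s\n' "$rel"
        fi
    done
}

# Throwaway test data, including a name with " and " in it.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
echo "one" > "$tmp/a/foo and bar"; echo "two" > "$tmp/b/foo and bar"
echo "x" > "$tmp/a/left only"
echo "y" > "$tmp/b/right only"

result=$(compare_trees "$tmp/a" "$tmp/b")
printf '%s\n' "$result"
```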

Evan Donovan said...

If you want to just compare what files and subdirectories are different in one directory from another, ignoring differences within files and common subdirectories/files, you could add the following to your .bashrc or .bash_profile:

dircmp() { diff -q "$@" | grep -v "^Files" | grep -v "^Common"; }

Anonymous said...

Lots of interesting tips here, but I am still confused. I tried the following commands under Ubuntu 12.04 to compare a directory (recursively) to a copy on a mounted SMB share:

rsync -avn

rsync -avnc

diff -rq

in increasing order of execution time.
The rsync commands produced identical lists containing several directories and files. The diff command, however, did not report any differences. How is this possible?

Micheal Johnson said...

I normally use ls | sort (or ls -l if I want to include file permissions, ownership, sizes and modification/access times in my comparison, or ls -R if I want to do a recursive comparison). So I use it like this:

ls first_directory | sort > ~/tempfile1
ls second_directory | sort > ~/tempfile2
diff ~/tempfile1 ~/tempfile2

Of course you have to pipe it through sort as well because otherwise the lines sometimes come out in a different order even if they are the same. I have never had any problems with using this method but it is nice to have a slightly more elegant way of doing this!
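
A variation on the same idea uses comm instead of diff, which splits two sorted listings into names found only in the first, only in the second, or in both. A small sketch with invented directory names; find is used rather than ls so that subdirectories are included, and like the ls method it compares names only, not contents.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/first/sub" "$tmp/second"
touch "$tmp/first/shared.txt" "$tmp/second/shared.txt"
touch "$tmp/first/sub/extra.txt" "$tmp/second/new.txt"

# comm needs both inputs sorted the same way, hence LC_ALL=C on sort and comm.
(cd "$tmp/first"  && find . | LC_ALL=C sort) > "$tmp/list1"
(cd "$tmp/second" && find . | LC_ALL=C sort) > "$tmp/list2"

only_first=$(LC_ALL=C comm -23 "$tmp/list1" "$tmp/list2")
only_second=$(LC_ALL=C comm -13 "$tmp/list1" "$tmp/list2")
in_both=$(LC_ALL=C comm -12 "$tmp/list1" "$tmp/list2")

printf 'only in first:\n%s\n' "$only_first"
printf 'only in second:\n%s\n' "$only_second"
printf 'in both:\n%s\n' "$in_both"
```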

Anonymous said...

It's a really nice, useful post. For those who are interested, I have added two tools that are really powerful.

Tools like "rsync" and "diff" are really great. I would also mention "unison" and the new shining "syncany".

UNISON: To sync two folders with unison simply do a:
unison folderA folderB

In contrast to rsync, unison does a sync in both directions. Furthermore, unison uses an rsync-like transfer protocol and works on Windows, Linux and Mac. Unison is very mature and stable software.

SYNCANY: Use
sy status
to show differences
sy up
to upload differences
sy down
to download differences

Syncany also has version control capabilities. Nevertheless, syncany is in alpha status as of 2016.

Both tools are for advanced usage, and a simple "diff -rq folderA folderB" does the job. With rsync, unison and syncany you can automate tasks, share folders like Dropbox and sync them over the internet.

Greetings and hopefully it helped you.

mbless.de said...

Diff folders by checksums, between 'local' and SSH 'remote':

rsync --dry-run -aci --delete /local/path/ -e "ssh -i ~/.ssh/sshkey" user@domain.de:/remote/path/

Do a simulation only (-n, --dry-run), as if you were archiving (-a, --archive),
compare based on checksums (-c, --checksum), output a change summary for all updates
(-i, --itemize-changes), as if you were mirroring to the destination (--delete).

Unknown said...

We can use diff -rq path1/ path2/ > list.txt (where path1 and path2 are the two directories).
The whole comparison listing will be saved in list.txt.

Anonymous said...

Thanks, it works.

Anonymous said...

Thank you very much, you saved me a lot of time !