Sunday, May 4, 2008

Compare Directories using Diff in Linux

To compare 2 files, we use the diff command. How do we compare 2 directories? Specifically, we want to know which files/subdirectories are common to both, and which exist in only 1 directory but not the other.

Unix old-timers may remember the dircmp command. Alas, that command is not available in Linux. In Linux, we use the same diff command to compare directories as well as files.

$ diff  ~peter ~george
Only in /home/peter: announce.doc
diff /home/peter/.bashrc /home/george/.bashrc
76,83d72
<
< # Customization by Peter
< export LESS=-m
< export GREP_OPTIONS='--color=always'
< shopt -s histappend
< shopt -s cmdhist
< export PROMPT_COMMAND="history -a;$PROMPT_COMMAND"
< #echo keycode 58 = Escape |loadkeys -
Only in /home/george: .mcoprc
Only in /home/peter: .metacity
Only in /home/george: .newsticker-images
Only in /home/peter: .notifier.conf
Only in /home/george: targets.txt
Only in /home/peter: .xsession-errors


Without any option, diffing 2 directories tells you which files exist in only 1 directory and not the other. Files that are common to both directories (e.g., .bashrc in the above listing) are diffed to see if and how their contents differ; common files with identical contents produce no output. Common subdirectories are only reported as such, not compared.
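
To try this default behavior without touching real home directories, here is a minimal sketch that builds a throwaway sandbox. The directory names a and b and all file names are made up for illustration, and the message wording assumes GNU diff.

```shell
#!/bin/sh
# Hypothetical sandbox: two small directories under a temporary location.
work=$(mktemp -d)
mkdir -p "$work/a/sub" "$work/b/sub"
echo "same"   > "$work/a/common.txt"
echo "same"   > "$work/b/common.txt"
echo "left"   > "$work/a/changed.txt"
echo "right"  > "$work/b/changed.txt"
echo "only a" > "$work/a/only_a.txt"
echo "only b" > "$work/b/only_b.txt"

# Plain diff on two directories. LC_ALL=C keeps the messages in English.
# diff exits with status 1 when it finds differences, hence the "|| true".
plain=$(cd "$work" && LC_ALL=C diff a b) || true

# Expect "Only in" lines for the unique files, a normal diff for changed.txt,
# nothing at all for the identical common.txt, and a note about sub/.
printf '%s\n' "$plain"

rm -rf "$work"
```

The identical file common.txt never shows up in the output, which is why a silent diff is the sign of two matching directories.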

If you are NOT interested in file differences, just add the -q (or --brief) option.

$ diff -q ~peter ~george | sort
Files /home/peter/.bashrc and /home/george/.bashrc differ
Only in /home/george: .mcoprc
Only in /home/george: .newsticker-images
Only in /home/george: targets.txt
Only in /home/peter: .metacity
Only in /home/peter: .notifier.conf
Only in /home/peter: .xsession-errors
Only in /home/peter: announce.doc


diff orders its output alphabetically by file/subdirectory name. I prefer to group the lines by whether the files are common to both directories or exist only in the first or second directory. That is why I piped the output of diff through sort in the above command.
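
The effect of sort on the ordering can be seen in a small sketch. The sandbox directories a and b and their file names are hypothetical, and the message wording assumes GNU diff.

```shell
#!/bin/sh
# Hypothetical sandbox directories "a" and "b"; all file names are made up.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
echo left  > "$work/a/notes.txt"
echo right > "$work/b/notes.txt"
echo x > "$work/a/alpha.txt"
echo y > "$work/b/beta.txt"
echo z > "$work/a/zeta.txt"

# diff walks the entries by name: alpha.txt, beta.txt, notes.txt, zeta.txt,
# so lines for the two directories come out interleaved.
by_name=$(cd "$work" && LC_ALL=C diff -q a b) || true

# sort regroups them: "Files ..." lines first, then each "Only in DIR" group.
# LC_ALL=C pins the messages to English and the sort to plain byte order.
grouped=$(printf '%s\n' "$by_name" | LC_ALL=C sort)

# Once grouped, one side is easy to pick out by its fixed prefix.
only_a=$(printf '%s\n' "$grouped" | grep '^Only in a: ') || true

printf '%s\n' "$grouped"
rm -rf "$work"
```

The grouping is a side effect of the line prefixes ("Files" sorts before "Only in", and directory names sort within that), not something diff offers as an option.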

Note that by default diff does not reach into the subdirectories to compare the files and subdirectories at that level. To change its behavior to recursively go down subdirectories, add -r.

$ diff -qr ~peter ~george | sort
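
The difference -r makes, and a way to use diff's exit status in a script, can be sketched as follows. The sandbox layout is hypothetical; the exit codes (0 for no differences, 1 for differences, 2 for trouble) are the ones diff documents.

```shell
#!/bin/sh
# Hypothetical sandbox: the only difference sits one level down, in sub/.
work=$(mktemp -d)
mkdir -p "$work/a/sub" "$work/b/sub"
echo one > "$work/a/sub/deep.txt"
echo two > "$work/b/sub/deep.txt"

# Without -r, the differing file inside sub/ goes unnoticed.
shallow=$(cd "$work" && LC_ALL=C diff -q a b) || true

# With -r, diff descends into sub/ and reports it.
deep=$(cd "$work" && LC_ALL=C diff -qr a b | LC_ALL=C sort)

# A pipeline into sort hides diff's exit status, so run diff on its own
# when a script needs a yes/no answer.
status=0
(cd "$work" && diff -qr a b > /dev/null) || status=$?
case $status in
    0) verdict="identical" ;;
    1) verdict="different" ;;
    *) verdict="error" ;;
esac

printf '%s\n%s\n' "$deep" "$verdict"
rm -rf "$work"
```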

36 comments:

  1. Any way to do this across an SSH tunnel?

  2. phyzome

    Assuming both directories reside on the remote machine:

    ssh user@123.123.123.123 diff -rq dir1 dir2


    Peter

  3. I was hoping for something to compare remote to local.

    I'll just mount the remote filesystem using SSH-fuse or whatever.

  4. This is wrong on some systems.... For example on Ubuntu using diff (GNU diffutils) 2.8.1, you must use the -r switch to get the full comparison.

  5. Thanks for the article...
    I just backed up 30+ gig of data. I wanted to ensure I had a good copy.
    As a test, I created an empty file in the middle of the directory structure of the backup.
    diff reported it as a file in the backup but not in the original, along with three files that were not the same.
    Appreciate the guidance!

  6. @phyzome (he won't need it anymore, but maybe someone else finds it useful):

    If you want to compare files between local and remote machines, look into "rsync". As the name says, its purpose is to sync them, but you can use it without actually changing anything.

    Especially if you are interested in missing/additional files AND changed files, rsync is cool, because it can compare the file contents without transferring the files over the wire, which makes it very fast.

  7. In a situation where I had to quickly compare remote directories on separate sites and the time stamps were no good, I used find with cksum to good effect.

    find . -type f -exec cksum {} \;

    Directing the output to a text file for each of the two directories then left me two text files to compare.

  8. Do you guys have a recommendation for what to do in the following case: I have two potentially different directory trees, say A and B, and I want to know which files are somewhere in A but not in B, and vice versa.

    There are programs like "fdupes" that look for duplicate files across directory trees but they also look for duplicates within A and B. This can be rather painful if A and/or B contains a lot of files.

    In my situation I have two directory trees with pictures which I want to unify.

  9. @Tim

    fdupes -f A B

    Files from A will appear only if A itself contains more than one identical copy.

  10. @Tim

    You can use rsync too.

    rsync -avn source-dir/ target-dir/

    This will list the files that are different (or new) in source-dir compared to target-dir.

    If you run it without the -n option, it will copy the missing (or different) files over to the target.

  11. My directory has object files, but I need to diff only the files other than object files. Can someone help me?

  12. The rsync command I posted above compares files based on last modification date and size. If you want to compare the CONTENT of the files, you can make rsync use checksums (-c):

    rsync -avcn source-dir/ target-dir

    This will list the files whose content is different in the two directories. Again, if you want to replace the files that are different in target-dir with the files from source-dir, remove the -n option.

  13. Very nice article. It very useful way for comparison. Thanks a lot.. :)

  14. This is a very useful post and will give people peace of mind when backing up files.

    I'm using the diff -r command before deleting the original copy of some backups that I've transferred to a new RAID array.

    Keep in mind I did run rsync twice to do this, but this data is important to me so I want to be sure nothing is missing and that the file integrity is intact.

  15. if you want the list of colliding filenames:

    diff -sq dir1 dir2 | grep -v "Only in"

  16. White text on black background sucks. But thanks for the post.

  17. If you're looking to compare the contents of two directories, and want to see how they are the same (not how they are different), and getting cmpdir isn't an option, try this:

    diff -q -y -s dir1 dir2 | grep "are identical$"

    It's ugly and IO intensive, but works for me. The diff is running with the "-s" option to also output items that are identical. The grep then only picks out the stuff that is identical and doesn't display differences.

    -A

  18. Thank you, it helped a lot

  19. I use vimdiff to diff a local file against a remote one. I know this is an old discussion, but eh...

    vimdiff scp://user@host//file/path/to/file/file.txt ~/local/file/path/file.txt

  20. Your simple and clear explanation was a great help, just what I needed while I was having nightmares in front of the threatening black shell.

    I sincerely thank you :)

    PS : maybe you could update it with an easy way to export to a text file the results of a diff, but that's a detail.

  21. @Oliver

    That's certainly an easy one, just redirect the stdout to a file with ">":

    diff -rq dir1 dir2 > dir1-dir2-diffs

    :)

  22. I just ran across this today. I was trying to compare directories and got a "command not found" response. I appreciate the information.

    I just wish they had just created a command alias with documentation that the command had been deprecated. Imagine the script errors that happen when people pull out an old script or try to build from old sources.

  23. what to do if i want to copy the files which are different to a third directory??

  24. Thank you! Very useful!!

  25. A useful trick I use a lot is:

    diff -qr dir1 dir2 | grep -v '\.svn' | sort

    That excludes all my .svn directories, which are really irrelevant if you're doing this..

  26. Thanks man. This post helped me with comparing two directories with lots of files.

    Sincerely,
    Marc_Online_

  27. Great productive tip. Saved me time comparing two directories.

  28. Is there a way to get a machine-readable rather than human-readable output, particularly for "Files X and Y differ" lines? My problem with this format is that 'X' and 'Y' are not delimited in any way, so if 'X' happens to be 'foo and bar' then there is no way to parse these lines unambiguously.

    So, how do I?

  29. If you want to just compare what files and subdirectories are different in one directory from another, ignoring differences within files and common subdirectories/files, you could add the following to your .bashrc or .bash_profile:

    dircmp() { diff -q "$@" | grep -v "^Files" | grep -v "^Common"; }

  30. Lots of interesting tips here, but I am still confused. I tried the following commands under Ubuntu 12.04 to compare a directory (recursively) to a copy on a mounted SMB share:

    rsync -avn

    rsync -avnc

    diff -rq

    in increasing order of execution time.
    The rsync commands produced identical lists containing several directories and files. The diff command, however, did not report any differences. How is this possible?

  31. I normally use ls | sort (or ls -l if I want to include file permissions, ownership, sizes and modification/access times in my comparison, or ls -R if I want to do a recursive comparison). So I use it like this:

    ls first_directory | sort > ~/tempfile1
    ls second_directory | sort > ~/tempfile2
    diff ~/tempfile1 ~/tempfile2

    Of course you have to pipe it through sort as well because otherwise the lines sometimes come out in a different order even if they are the same. I have never had any problems with using this method but it is nice to have a slightly more elegant way of doing this!

  32. It's a really useful, nice post. For those who are interested, I have added two tools that are really powerful.

    Tools like "rsync" and "diff" are really great. I would also mention "unison" and the shiny new "syncany".

    UNISON: To sync two folders with unison, simply run:
    unison folderA folderB

    In contrast to rsync, unison syncs in both directions. Furthermore, unison includes rsync and works on Windows, Linux and Mac. Unison is very mature and stable software.

    SYNCANY: Use
    sy status
    to show differences,
    sy up
    to upload differences, and
    sy down
    to download differences.

    syncany also has version control capabilities. Nevertheless, syncany is in alpha status as of 2016.

    Both tools are for advanced usage; a simple "diff -rq folderA folderB" does the job. With rsync, unison and syncany you can automate tasks, share folders like Dropbox and sync them over the internet.

    Greetings and hopefully it helped you.

  33. Diff folders by checksums, between 'local' and SSH 'remote':

    rsync --dry-run -aci --delete /local/path/ -e "ssh -i ~/.ssh/sshkey" user@domain.de:/remote/path/

    This does a simulation only (-n, --dry-run), as if you were archiving (-a, --archive),
    compares by checksum (-c, --checksum), outputs a change summary for all updates
    (-i, --itemize-changes), and behaves as if you were mirroring to the destination (--delete).

  34. We can use diff -rq path1/ path2/ > list.txt (where path1 and path2 are directories).
    The whole comparison listing will be saved in list.txt.

  35. Thanks, it works.

  36. Thank you very much, you saved me a lot of time !

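
The find-with-cksum idea from comment 7 can be sketched as a small helper. The function name manifest and the sandbox files are hypothetical, and this is only one way to arrange it: each side writes a sorted checksum listing, and the two listings are then compared with an ordinary file diff.

```shell
#!/bin/sh
# manifest DIR: print "checksum size ./relative/path" for every regular file
# under DIR, sorted by path so that two listings line up.
# (Hypothetical helper name; cksum prints checksum, byte count and file name.)
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3)
}

# Hypothetical sandbox standing in for the two sites.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
echo same  > "$work/a/common.txt"
echo same  > "$work/b/common.txt"
echo left  > "$work/a/changed.txt"
echo right > "$work/b/changed.txt"

# One listing per directory; on separate machines each would be produced
# locally and only the small text files would need to be brought together.
manifest "$work/a" > "$work/a.cksum"
manifest "$work/b" > "$work/b.cksum"

# Identical files yield identical lines, so only changed.txt shows up here.
delta=$(diff "$work/a.cksum" "$work/b.cksum") || true
printf '%s\n' "$delta"

rm -rf "$work"
```

Because the listings use relative paths, the two directories can live at different absolute locations without every line showing up as a difference.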