Saturday, January 29, 2011

How can I get diff to show only added and deleted lines? If diff can't do it, what tool can?

  • That's what diff does by default... Maybe you need to add some flags to ignore whitespace?

    diff -b -B
    

    This should ignore blank lines (-B) and changes in the amount of whitespace (-b).

    C. Ross : No, it shows CHANGED lines as well (lines that have a character or four different). I want lines that only exist in left or right.
    markdrayton : You could argue that the differing versions of a CHANGED file each exist only in left or right.
    Cian : There's no way for diff (or any other tool) to reliably tell what's a change, and what's a deleted line being replaced by a new line.
    KFro : Technically, diff treats a "changed" line as if the original line was deleted and a new line was added...so technically it is showing you only added and deleted lines.
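
KFro's point can be seen in a small experiment. This is only a sketch: the file names left and right and their contents are made up for illustration, and mktemp -d is assumed to be available. One line differs by a single character, and diff reports it as a "c" (change) hunk made up of one removed line and one added line.

```shell
# Hypothetical scratch files; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/left"
printf 'alpha\nbetta\ngamma\n' > "$tmpdir/right"

# diff exits with status 1 when the files differ, so guard with || true.
diff_out=$(diff "$tmpdir/left" "$tmpdir/right" || true)
printf '%s\n' "$diff_out"

rm -rf "$tmpdir"
```

The output is a "2c2" header followed by "< beta", a "---" separator and "> betta": the changed line appears once as a line from the left file and once as a line from the right file.
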
  • comm might do what you want. From its man page:

    DESCRIPTION

    Compare sorted files FILE1 and FILE2 line by line.

    With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

    These columns are suppressable with -1, -2 and -3 respectively.

    Example:

    [root@dev ~]# cat a
    common
    shared
    unique
    
    [root@dev ~]# cat b
    common
    individual
    shared
    
    [root@dev ~]# comm -3 a b
        individual
    unique
    

    And if you just want the unique lines and don't care which file they're in:

    [root@dev ~]# comm -3 a b | sed 's/^\t//'
    individual
    unique
    

    As the man page says, the files must be sorted beforehand.
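
If the inputs are not already sorted, one possible approach is to sort them into temporary copies first. The sketch below assumes hypothetical files a and b in a scratch directory, created here only for illustration. It also builds the tab character with printf rather than relying on sed understanding \t, which not every sed does.

```shell
# Hypothetical scratch files, deliberately unsorted.
tmpdir=$(mktemp -d)
printf 'unique\ncommon\nshared\n' > "$tmpdir/a"
printf 'shared\nindividual\ncommon\n' > "$tmpdir/b"

# comm needs sorted input, so sort each file into a temporary copy.
sort "$tmpdir/a" > "$tmpdir/a.sorted"
sort "$tmpdir/b" > "$tmpdir/b.sorted"

# -3 drops the column of common lines; sed strips the leading tab
# that comm puts in front of lines unique to the second file.
tab=$(printf '\t')
only_in_one=$(comm -3 "$tmpdir/a.sorted" "$tmpdir/b.sorted" | sed "s/^${tab}//")
printf '%s\n' "$only_in_one"

rm -rf "$tmpdir"
```

This prints individual and unique, matching the example above. Note that sorting discards the original line order, so the result says which lines differ but not where they were in the file.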

  • No, diff doesn't actually show the differences between two files in the way one might think. It produces a sequence of editing commands for a tool like patch to use to change one file into another.

    The difficulty for any attempt at doing what you're looking for is how to define what constitutes a line that has changed versus a deleted one followed by an added one. There is also the question of what to do when lines are added, deleted and changed adjacent to each other.

    Kamil Kisiel : My thoughts exactly. What percentage of characters in a line has to change in order to consider it a new one instead of a modification of the original? Technically even if you have one character in common, you could consider it a "change" instead of a deletion and insertion.
    Dennis Williamson : It's been a long time since I've looked at the `diff` sources, but I seem to remember all manner of gyrations to keep track of where two files match to stay in sync, and I think there's a threshold for giving up based on how far apart the lines are. But I don't remember any intra-line matching except for (optionally) collapsed white space or ignoring case. Or (perhaps) words to that effect. In any case, it's all about `patch` and "vgrep" just comes along for the ride. Maybe. On Tuesday.
  • Another way to look at it:

    Show lines that only exist in file a:

    comm -23 a b
    

    Show lines that only exist in file b:

    comm -13 a b
    

    Show lines that only exist in one file or the other:

    comm -3 a b | sed 's/^\t//'
    
    From TomOnTime
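
The two one-sided commands can also be combined so that each line is labelled with the file it came from. This is only a sketch: the "< " and "> " markers are borrowed from diff's output style as an assumption about what is wanted, and the files a and b are recreated in a scratch directory purely for illustration.

```shell
# Hypothetical scratch files, already sorted as comm requires.
tmpdir=$(mktemp -d)
printf 'common\nshared\nunique\n' > "$tmpdir/a"
printf 'common\nindividual\nshared\n' > "$tmpdir/b"

# Lines only in a are marked "< "; lines only in b are marked "> ".
labelled=$(
  comm -23 "$tmpdir/a" "$tmpdir/b" | sed 's/^/< /'
  comm -13 "$tmpdir/a" "$tmpdir/b" | sed 's/^/> /'
)
printf '%s\n' "$labelled"

rm -rf "$tmpdir"
```

This prints "< unique" followed by "> individual". Because comm matches whole lines, a line that differs by even one character shows up on both sides, which is the same behaviour discussed in the comments above.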
