18. Future Projects

Here are some ideas for improving GNU diff and patch. The GNU project has identified some improvements as potential programming projects for volunteers. You can also help by reporting any bugs that you find.

If you are a programmer and would like to contribute something to the GNU project, please consider volunteering for one of these projects. If you are seriously contemplating work, please write to [email protected] to coordinate with other volunteers.

18.1 Suggested Projects for Improving GNU diff and patch

One should be able to use GNU diff to generate a patch from any pair of directory trees, and given the patch and a copy of one such tree, use patch to generate a faithful copy of the other. Unfortunately, some changes to directory trees cannot be expressed using current patch formats; also, patch does not handle some of the existing formats. These shortcomings motivate the following suggested projects.

18.1.1 Handling Multibyte and Varying-Width Characters

diff, diff3 and sdiff treat each line of input as a string of unibyte characters. This can mishandle multibyte characters in some cases. For example, when asked to ignore spaces, diff does not properly ignore a multibyte space character.

Also, diff currently assumes that each byte is one column wide, and this assumption is incorrect in some locales, e.g., locales that use UTF-8 encoding. This causes problems with the `-y' or `--side-by-side' option of diff.

These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.
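The gap between bytes and characters can be observed directly from the shell. The following is a minimal sketch, assuming a POSIX shell; the sample character is only an illustration, and the character count it prints depends on the locale in effect.

```shell
# U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
ch=$(printf '\303\251')

# wc -c counts bytes, so it reports 2 for what a UTF-8 reader sees as
# a single character.
nbytes=$(printf '%s' "$ch" | wc -c | tr -d ' ')
echo "bytes: $nbytes"

# wc -m counts characters according to the current locale.  The result is 1
# only when a UTF-8 locale is in effect, so it is printed, not assumed.
nchars=$(printf '%s' "$ch" | wc -m | tr -d ' ')
echo "characters in this locale: $nchars"
```

A tool that equates one byte with one column would reserve two columns for this character, which is the kind of misalignment described above for `-y'.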

The IBM GNU/Linux Technology Center Internationalization Team has proposed some patches to support internationalized diff (http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz). Unfortunately, these patches are incomplete and apply to an older version of diff, so more work is needed in this area.

18.1.2 Handling Changes to the Directory Structure

diff and patch do not handle some changes to directory structure. For example, suppose one directory tree contains a directory named `D' with some subsidiary files, and another contains a file with the same name `D'. `diff -r' does not output enough information for patch to transform the directory subtree into the file.
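The directory-versus-file case can be reproduced with a few commands. This is a sketch only: the tree names `old' and `new' are made up for the example, and the exact wording of the message depends on the diff version.

```shell
tmp=$(mktemp -d)

# In the old tree, D is a directory with one file; in the new tree, D is a file.
mkdir -p "$tmp/old/D" "$tmp/new"
echo 'contents' > "$tmp/old/D/file"
echo 'now a file' > "$tmp/new/D"

# diff exits with a nonzero status when it finds a difference; capture it
# without aborting the script.
status=0
out=$(cd "$tmp" && diff -r old new) || status=$?

printf '%s\n' "$out"
echo "exit status: $status"
rm -rf "$tmp"
```

The output notes that the two names refer to different kinds of file, but it carries neither the contents of `D/file' nor the contents of the new `D', so patch has nothing to apply.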

There should be a way to specify that a file has been removed without having to include its entire contents in the patch file. There should also be a way to tell patch that a file was renamed, even if there is no way for diff to generate such information. There should be a way to tell patch that a file's time stamp has changed, even if its contents have not changed.

These problems can be fixed by extending the diff output format to represent changes in directory structure, and extending patch to understand these extensions.

18.1.3 Files that are Neither Directories Nor Regular Files

Some files are neither directories nor regular files: they are unusual files like symbolic links, device special files, named pipes, and sockets. Currently, diff treats symbolic links like regular files; it treats other special files like regular files if they are specified at the top level, but simply reports their presence when comparing directories. This means that patch cannot represent changes to such files. For example, if you change which file a symbolic link points to, diff outputs the difference between the two files, instead of the change to the symbolic link.

diff should optionally report changes to special files specially, and patch should be extended to understand these extensions.
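The symbolic-link example from this section can be reproduced as follows. This is a sketch that assumes diff follows symbolic links, as described above, and that the readlink utility is available; the file names are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"

# Both trees hold identical regular files a and b.
for tree in old new; do
    echo 'alpha' > "$tmp/$tree/a"
    echo 'beta'  > "$tmp/$tree/b"
done

# The only real change: the link points to a in one tree and to b in the other.
ln -s a "$tmp/old/link"
ln -s b "$tmp/new/link"
echo "old target: $(readlink "$tmp/old/link")"
echo "new target: $(readlink "$tmp/new/link")"

# When diff follows the links, it reports the contents of a versus b.
out=$(cd "$tmp" && diff -r old new) || true
printf '%s\n' "$out"
rm -rf "$tmp"
```

Applying the resulting output as a patch would rewrite file contents through the link, leaving the link target itself unchanged.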

18.1.4 File Names that Contain Unusual Characters

When a file name contains an unusual character like a newline or white space, `diff -r' generates a patch that patch cannot parse. The problem is with the format of diff output, not just with patch, because with odd enough file names one can cause diff to generate a patch that is syntactically correct but patches the wrong files. The format of diff output should be extended to handle all possible file names.
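To inspect how a particular diff writes such a name, one can print just the header lines of a unified diff. The sketch below uses a hypothetical file called `my file'; it makes no assumption about whether the installed diff quotes the name.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"

# The same file name, containing a space, in both trees.
echo 'one' > "$tmp/old/my file"
echo 'two' > "$tmp/new/my file"

out=$(cd "$tmp" && diff -ru old new) || true

# Keep only the "---" and "+++" header lines, where the file names appear.
headers=$(printf '%s\n' "$out" | grep -E '^(---|\+\+\+) ')
printf '%s\n' "$headers"
rm -rf "$tmp"
```

In each header line a reader of the patch has to decide where the file name ends and the time stamp begins, which is where a space or newline in the name becomes ambiguous.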

18.1.5 Outputting Diffs in Time Stamp Order

Applying patch to a multiple-file diff can result in files whose time stamps are out of order. GNU patch has options to restore the time stamps of the updated files (see section 10.5 Updating Time Stamps on Patched Files), but sometimes it is useful to generate a patch that works even if the recipient does not have GNU patch, or does not use these options. One way to do this would be to implement a diff option to output diffs in time stamp order.
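Until such an option exists, a wrapper script can approximate it. The function below is a rough sketch, not a proposed implementation: the name diff_by_mtime is made up, it handles only a single flat directory, it orders files by the time stamps in the new directory (oldest first), and it assumes file names contain no newlines.

```shell
# Usage: diff_by_mtime OLD_DIR NEW_DIR
# Emit unified diffs for files present in both directories, ordered by the
# modification time of the file in NEW_DIR, oldest first.
diff_by_mtime() {
    old=$1
    new=$2
    # ls -t sorts by modification time, newest first; -r reverses that order.
    ls -tr "$new" | while IFS= read -r name; do
        # Skip anything that is not a regular file in both directories.
        [ -f "$old/$name" ] && [ -f "$new/$name" ] || continue
        # An exit status of 1 from diff only means "files differ".
        diff -u "$old/$name" "$new/$name" || :
    done
}
```

Because patch processes a multiple-file diff in order, applying the output of `diff_by_mtime old new' should update the files in the same relative order as their original time stamps.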

18.1.6 Ignoring Certain Changes

It would be nice to have a feature for specifying two strings, one in from-file and one in to-file, which should be considered to match. Thus, if the two strings are `foo' and `bar', then if two lines differ only in that `foo' in file 1 corresponds to `bar' in file 2, the lines are treated as identical.

It is not clear how general this feature can or should be, or what syntax should be used for it.

A partial substitute is to filter one or both files before comparing, e.g.:

sed 's/foo/bar/g' file1 | diff - file2

However, this outputs the filtered text, not the original.
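A small worked example shows both the benefit and the limitation. The file contents here are invented for illustration.

```shell
tmp=$(mktemp -d)
printf 'int foo;\nint x;\n' > "$tmp/file1"
printf 'int bar;\nint y;\n' > "$tmp/file2"

# Rewrite foo to bar in file1 before comparing, so that line 1 matches.
# diff exits with status 1 when the inputs differ, hence the "|| true".
out=$(sed 's/foo/bar/g' "$tmp/file1" | diff - "$tmp/file2") || true
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the second line is reported as changed. Had the first line differed in some other way as well, the output would have shown the filtered `bar' text for file1, not the original `foo'.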

18.2 Reporting Bugs

If you think you have found a bug in GNU cmp, diff, diff3, or sdiff, please report it by electronic mail to the GNU utilities bug report mailing list [email protected]. Please send bug reports for GNU patch to [email protected]. Send as precise a description of the problem as you can, including the output of the `--version' option and sample input files that produce the bug, if applicable. If you have a nontrivial fix for the bug, please send it as well, preferably as a patch. It may simplify the maintainer's job if the patch is relative to a recent test release, which you can find in the directory ftp://alpha.gnu.org/gnu/diffutils/.

