I have two HTML files that I believe are identical, but different diff tools are giving conflicting results.
The two files were downloaded from a remote URL using PHP cURL (http://php.net/manual/en/book.curl.php). They were downloaded on different days, but I believe the content, including all of the markup, has not changed. Determining whether there were changes is in fact the purpose of the application.
The file sizes are identical; both 358,341 bytes. A visual inspection of the content shows them to be identical.
To make sure there are no differences in the markup or other content, I’ve used DiffMerge on my local machine, and it reports that the files are identical.
However, when I SSH into a CentOS server and compare them by running
diff file1.html file2.html
I get the following output:
12159,12161c12159,12161
<
<
<
---
>
>
>
12163,12172c12163,12172
<
<
<
<
<
<
<
<
<
<
---
>
>
>
>
>
>
>
>
>
>
12174c12174
<
---
>
When I look at those lines in a text editor, I see no noticeable differences. What does this output actually mean?
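For reference, in diff's default ("normal") output, a header such as `12159,12161c12159,12161` means lines 12159 to 12161 of the first file were changed (`c`) into lines 12159 to 12161 of the second; the `<` lines come from file1.html and the `>` lines from file2.html. Because those lines look blank, a plausible guess (an assumption, not confirmed) is that they differ only in characters an editor does not display, such as trailing spaces, tabs versus spaces, a stray `\r`, or a non-breaking space. The minimal Python sketch below prints each differing line with `repr()` so such characters become visible; `invisible_diff` and `read_raw_lines` are hypothetical helper names, not part of any existing tool.

```python
import difflib
import sys


def invisible_diff(lines_a: list[bytes], lines_b: list[bytes]) -> list[str]:
    """Compare two lists of raw lines and describe each changed line with
    repr(), so whitespace and control bytes (b'\\r', b'\\t', b'\\xc2\\xa0', ...)
    that a text editor hides become visible."""
    report = []
    # autojunk=False: otherwise, in long inputs, very common lines (such as
    # blank lines) may be treated as junk and matched poorly.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for n in range(i1, i2):  # lines removed from / changed in file A
            report.append(f"< {n + 1}: {lines_a[n]!r}")
        for n in range(j1, j2):  # lines added to / changed in file B
            report.append(f"> {n + 1}: {lines_b[n]!r}")
    return report


def read_raw_lines(path: str) -> list[bytes]:
    # Binary mode: no decoding or newline translation, so b"\r" survives.
    # readlines() on a binary file splits on b"\n" only, as diff does.
    with open(path, "rb") as f:
        return f.readlines()


if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = read_raw_lines(sys.argv[1]), read_raw_lines(sys.argv[2])
    for line in invisible_diff(a, b):
        print(line)
```

Run it as `python3 <script> file1.html file2.html`. If the output shows a trailing `\r` on one side only, GNU diff's `--strip-trailing-cr` option should make those hunks disappear, which would point to line endings as the cause.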
I’ve also used a web-based diff tool, https://github.com/chrisboulton/php-diff, which reports exactly the same line numbers as being different. However, when its output is viewed in “side by side” mode, both sides are exactly the same!
Does anyone have any ideas how to debug this, or what the issue might be? The files were downloaded using the same script and method both times, and, as far as I know, there are no encoding differences.
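Since both files are exactly 358,341 bytes, one way to narrow this down is to compare them byte by byte and to count which line-ending style each file uses. Equal sizes do not rule out whitespace differences: a tab and a space are both one byte, for example. The Python sketch below assumes the difference is in whitespace or line endings rather than visible content; `byte_differences` and `line_ending_counts` are hypothetical names. On the CentOS server, `cmp -l file1.html file2.html` should also list the differing byte positions, with the byte values in octal.

```python
import sys


def byte_differences(data_a: bytes, data_b: bytes, limit: int = 10) -> list[tuple[int, int, int, int]]:
    """Return up to `limit` (offset, line, byte_a, byte_b) tuples for the
    positions where the two inputs differ. Offsets are 0-based; line numbers
    are 1-based and count b"\\n" the way diff does. Only the common prefix is
    compared, which is enough here because both files have the same size."""
    diffs = []
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            line = data_a.count(b"\n", 0, offset) + 1
            diffs.append((offset, line, x, y))
            if len(diffs) == limit:
                break
    return diffs


def line_ending_counts(data: bytes) -> dict[str, int]:
    """Count Windows (CRLF), Unix (LF only) and bare CR line endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF only": data.count(b"\n") - crlf,
        "CR only": data.count(b"\r") - crlf,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        a, b = fa.read(), fb.read()
    print("file1 line endings:", line_ending_counts(a))
    print("file2 line endings:", line_ending_counts(b))
    for offset, line, x, y in byte_differences(a, b):
        # bytes([x]) shows the raw byte, e.g. b'\r', b'\t' or b' '
        print(f"offset {offset}, line {line}: {bytes([x])!r} vs {bytes([y])!r}")
```

If the reported line numbers match the hunks from `diff` (12159 to 12174), the printed byte pairs should show exactly which invisible characters changed between the two downloads.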