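[Translation] Diff tool
By robot-v1.0
Permalink: https://www.kyfws.com/applications/diff-tool-zh/
Copyright notice: unless stated otherwise, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.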
Original article: https://www.codeproject.com/Articles/3666/Diff-tool
Original author: Stephane Rodriguez.
Translated by this site's robot-v1.0
## Preface
A simple diff tool, usable on arbitrary file formats, with a nice html rendering.
A simple yet useful diff tool with html frontend
This article provides a simple C++ WIN32 tool to perform diffs on arbitrary files. It also features a nice and workable html output.
## 1. Diff tools
So you ask me why on earth I should need a diff tool when I already have windiff in the devstudio package? windiff is fine when you have nothing else to do the job, but it leaves a lot to be desired, especially the unproductive way the diffs are presented.
After all, if the diffs are so badly presented that you spend your time just figuring them out, why shouldn't windiff be replaced by something better?
That said, if you are interested in merging as well as diffing, you can buy a third party tool such as [Araxis](http://www.araxis.com), or use a free tool such as WinMerge. If you are focused on Xml content, MS lets you play with the Xml diff patch, a C#-based diff tool specialized for Xml.
I was driven to write this tool because I deal with configuration files that change over time. I wanted not only something to show me the diffs over time, but also something that integrates nicely into the automation chain. This requirement in fact excluded third party tools, because none provided both the API I needed and the appropriate rendering format. I also loved the idea that diff algorithms and the associated techniques were something new to me.
So I took the keyboard and wrote this simple tool. The engine itself took me a couple of hours, which means a diff tool can't be that hard to build. Ok, what do we have then:
- Ability to start the tool by double-click (GUI), with a multiple-file selection
- Ability to start the tool in batch mode
- Fast and simple algorithms
- Custom options, such as disabling case sensitiveness and indent
- Side-by-side Html rendering, with key diff coloring
- Blank lines used to sync content from both source files
## 2. Using it
It's important to note that, although the article's picture shows a diff between Xml files, this tool can be used for ALL possible text file formats you might think of. It's an agnostic diff tool.
### 2.1 interactive mode
Simply double-click on the executable, then choose the two files to compare in the multi-selection File dialog. The Html rendering is automatically displayed in your default browser as soon as the diff engine has finished the job.
### 2.2 command line mode
In batch mode, the syntax is:
```
diff.exe <file1> <file2> <htmlfile>
```
For example, with files such as `Bookshop.xml`, `Bookshop.4.xml` and `Bookshop.5.xml`:
```
diff.exe "c:...\Bookshop.4.xml" "c:...\Bookshop.5.xml" diff.html
```
For those of you expecting to pipe the output somewhere else, I have provided another project file, `diffstdoutput.dsp`, which is a console application with stdout output.
### 2.3 using options
Options are passed on the command line with `-c` and `-i`, and are held by the `CFileOptions` class, through which case sensitiveness and indent can be disabled.
## 3. Compiling it
Although the main `diff.dsp` project file uses MFC (Open-File dialog, CString, CFile), the diff engine itself does not require it. In fact, I have provided a separate `diffengine.dsp` project, which produces a static library and relies only on WIN32 (I have added my own CString and CFile classes to it).
Tested on 9X/2K. Both VC++6 and VC++7 workspaces are provided.
## 4. Developing it
### 4.1 the diff engine
I have always thought that producing a diff was a difficult engineering problem. I was wrong. Against all odds, the first design I thought of has held up well over time. Basically, what I have is a structure which, for each line of both source files, attaches a signature and a status.
The signature is a precalculated token that lets me compare strings from the two source files very fast, without actually going through `strcmp` or anything that might be wrapped around it:
```cpp
// preprocessing the file, build precalculated tables
BOOL CFilePartition::PreProcess(/*in*/CString &szFilename, /*in*/CFileOptions &options)
{
  ASSERT( !szFilename.IsEmpty() );
  if (szFilename.IsEmpty())
  {
    OutputDebugString("error : empty input filename\r\n");
    return FALSE;
  }

  SetName(szFilename);
  SetOptions(options);

  // read the file first,
  // and build the table of tokens

  CStdioFile f;
  if ( !f.Open(szFilename, CFile::modeRead) )
  {
    TCHAR szError[MAX_PATH];
    sprintf(szError, "error : cannot open %s\r\n", szFilename.GetBuffer(0));
    OutputDebugString(szError);
    return FALSE;
  }

  //
  CString s;
  while ( f.ReadString(s) ) // (reads both Unix and Windows files)
    AddString(s);

  f.Close();

  return TRUE;
}

// store it
void CFilePartition::AddString(/*in*/CString &s, /*in*/long i)
{
  CFileLine *p = new CFileLine();
  ASSERT(p);
  if (p)
  {
    m_arrTokens.Add( p->SetLine(s, m_options) );
    m_arrLines.Add( p );
  }
}

// shows how the token is calculated
long CFileLine::SetLine(/*in*/CString &s, /*in*/CFileOptions &o)
{
  m_s = s;

  CString so = GetLineWithOptions(s,o); // filters the input line
                                        // according to options (case, indent, ...)

  long nToken = 0;
  long nLength = so.GetLength();
  TCHAR *lpString = so.GetBuffer(0);
  for (long i=0; i<nLength; i++)
    nToken += 2*nToken + *(lpString++); // (George V. Reilly hint)

  return nToken;
}
```
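To make the filtering step concrete, here is a minimal sketch in standard C++ of what option filtering followed by the token recurrence might look like. `LineOptions`, `FilterLine` and `LineToken` are hypothetical names, not the article's classes. The body of the article's `GetLineWithOptions` is not shown, so the filtering rules below (dropping leading whitespace, lower-casing) are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the article's CFileOptions.
struct LineOptions {
    bool caseSensitive = true;    // false: "Foo" and "foo" get the same token
    bool indentSensitive = true;  // false: leading spaces and tabs are ignored
};

// Filters a line according to the options before it is hashed.
// Assumed behaviour; the article's own filter is not shown.
std::string FilterLine(const std::string& line, const LineOptions& opt) {
    std::size_t start = 0;
    if (!opt.indentSensitive) {
        while (start < line.size() &&
               (line[start] == ' ' || line[start] == '\t'))
            ++start;
    }
    std::string out = line.substr(start);
    if (!opt.caseSensitive) {
        for (char& c : out)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Same recurrence as the article (token = 3 * token + character), computed
// on an unsigned type so that overflow wraps around instead of being undefined.
unsigned long LineToken(const std::string& line, const LineOptions& opt) {
    unsigned long token = 0;
    for (unsigned char c : FilterLine(line, opt))
        token = 3 * token + c;
    return token;
}
```

With both options switched off, `"  <Book>"` and `"<book>"` yield the same token, so the two lines compare as equal. Two genuinely different lines can also share a token, so a cautious engine would probably confirm a token match with a real string comparison.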
The status is an enum that records what was found when comparing the two source files: I want to know what was changed, what was added, and what was deleted.
Once the tokens are all ready, I go through all content lines of the first source file, which is the reference file. Each line is matched against the other source file's content. Any time a line is matched, it is straightforward to know whether its counterpart in the other source file is at the same "height" or not. If it is further down, that is because a block has been added, so that block gets the `added` status. The algorithm is as follows:
```cpp
// performs a diff between the reference file (f1) and the other file (f2)
// CFilePartition instances are actually virtual file objects
// results : two new virtual file objects with a status for each
//           content line
//
BOOL CDiffEngine::Diff( /*in*/CFilePartition &f1, /*in*/CFilePartition &f2,
                        /*out*/CFilePartition &f1_bis, /*out*/CFilePartition &f2_bis)
{
  long nbf1Lines = f1.GetNBLines();

  long i = 0;
  long nf2CurrentLine = 0;

  while ( i<nbf1Lines )
  {
    // process this line
    long nLinef2 = nf2CurrentLine;
    if ( f1.MatchLine(i,f2,nLinef2) )
    {
      // matched, either the lines were identical, or f2 has added something
      if (nLinef2 > nf2CurrentLine)
      {
        // add blank lines to f1_bis
        long j = nLinef2 - nf2CurrentLine;
        while ( j>0 )
        {
          f1_bis.AddBlankLine();
          f2_bis.AddString( f2.GetRawLine(nLinef2-j), Added );

          j--;
        }
      }

      // exactly matched
      f1_bis.AddString( f1.GetRawLine(i), Normal);
      f2_bis.AddString( f2.GetRawLine(nLinef2), Normal);

      nf2CurrentLine = nLinef2 + 1; // next line in f2
    }
    else
    {
      ... // checking out "change" or "deletion"
    }

    i++; // next line in f1
  }

  return TRUE;
}
```
Then funny things begin to happen. Matching the other source file against the reference file gives only the first half of the cake. Since both files play a dual role, it is worth taking advantage of the relations obtained when the other file becomes the reference in turn. The resulting algorithm cross-references the two files alternately, much like a DNA shape (or whatever you might think of at the moment). That's how the ... dots above get their implementation:
```cpp
long nLinef1 = i;
if ( f2.MatchLine(nLinef2, f1, nLinef1) )
{
  // the dual line in f2 can be found in f1, that's because
  // the current line in f1 has been deleted
  f1_bis.AddString( f1.GetLine(i), Deleted);
  f2_bis.AddBlankLine();

  // this whole block is flagged as deleted
  if (nLinef1>i+1)
  {
    long j = nLinef1 - (i+1);
    while ( j>0 )
    {
      i++;

      f1_bis.AddString( f1.GetRawLine(i), Deleted);
      f2_bis.AddBlankLine();

      j--;
    }
  }

  // note : nf2CurrentLine is not incremented
}
else
{
  // neither added, nor deleted, so it's flagged as changed
  f1_bis.AddString( f1.GetRawLine(i), Changed);
  f2_bis.AddString( f2.GetRawLine(nLinef2), Changed);

  nf2CurrentLine = nLinef2 + 1; // next line in f2
}
```
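The two passes can be condensed into a small, self-contained sketch in standard C++. It is a simplified reading of the approach rather than the author's code. `DiffLines`, `FindLine`, `Emit`, `Row` and `Status` are hypothetical names. Lines are compared as plain strings instead of tokens, blank padding rows are given the `Normal` status, and lines left over at the end of either file are handled explicitly. All of these are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Status { Normal, Changed, Added, Deleted };

struct Row {
    std::string text;  // empty string for a blank padding row
    Status status;
};

typedef std::vector<std::string> Lines;
typedef std::vector<Row> Column;

const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Index of the first line equal to `wanted` at or after `from`, else kNotFound.
std::size_t FindLine(const Lines& file, std::size_t from, const std::string& wanted) {
    for (std::size_t k = from; k < file.size(); ++k)
        if (file[k] == wanted) return k;
    return kNotFound;
}

// Appends one row to each output column, so both always have the same length.
void Emit(Column& out1, const std::string& t1, Status s1,
          Column& out2, const std::string& t2, Status s2) {
    out1.push_back(Row{t1, s1});
    out2.push_back(Row{t2, s2});
}

void DiffLines(const Lines& f1, const Lines& f2, Column& out1, Column& out2) {
    std::size_t i = 0, j = 0;
    while (i < f1.size() && j < f2.size()) {
        std::size_t m = FindLine(f2, j, f1[i]);
        if (m != kNotFound) {
            // f1[i] sits further down in f2: the f2 lines skipped over were added.
            for (; j < m; ++j)
                Emit(out1, "", Status::Normal, out2, f2[j], Status::Added);
            Emit(out1, f1[i], Status::Normal, out2, f2[j], Status::Normal);
            ++i; ++j;
            continue;
        }
        // Swap the roles: can the current f2 line be found further down in f1?
        std::size_t d = FindLine(f1, i + 1, f2[j]);
        if (d != kNotFound) {
            // Yes: f1 lines i..d-1 were deleted. j is left alone so that the
            // next iteration pairs f1[d] with f2[j].
            for (; i < d; ++i)
                Emit(out1, f1[i], Status::Deleted, out2, "", Status::Normal);
        } else {
            // Neither added nor deleted: flag the pair as changed.
            Emit(out1, f1[i], Status::Changed, out2, f2[j], Status::Changed);
            ++i; ++j;
        }
    }
    // Leftovers at the end of either file.
    for (; i < f1.size(); ++i)
        Emit(out1, f1[i], Status::Deleted, out2, "", Status::Normal);
    for (; j < f2.size(); ++j)
        Emit(out1, "", Status::Normal, out2, f2[j], Status::Added);
}
```

For a reference file with lines `a b c d` and another file with lines `a x c n d`, this sketch pairs `b` with `x` as changed and places `n`, flagged as added, opposite a blank row. Because the forward search takes the first match it finds, a line that recurs often (a closing brace, say) can pull the alignment further down than intended.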
Please note that during this process we add blank lines to either the reference file or the other file any time a line is flagged as `added` or `deleted`. That's because we want a perfect row match between the source files when the results are presented. Of course, we do this work on virtual file objects, the `CFilePartition` instances, not on the actual source files, which are left untouched.
That's pretty much all there is to it. This code is below the 500-line threshold!
Be sure to note that the algorithms presented here may have flaws, or may be uselessly lengthy, especially if you happen to have been working on such algorithms for a while. This is a 1.0 release. Please feel free to contribute.
### 4.2 the html renderer
I wanted something nice to show, fast to produce, and easy to work with. This simple renderer is simply the result of these requirements.
Being nice means that I wanted windiff to be purged from my mind for the rest of my life. I have had enough of that horizontal view with overlapped files, especially when, adding to the frustration, it's obvious that horizontal views go against intuition when it comes to comparing files. Having a vertical, non-overlapped view was the number one requirement, and was easy to come up with by using html table cell tags.
Next, I wanted it to be produced fast. There is actually not much to say about it. The output of the diff engine is two virtual file instances in which the status is known for each line of content of the actual files. To produce the diff, I only have to choose a color for each status and use CSS html styles to override the row formatting. Using styles exemplifies a *de facto* factorization. Think about it next time you create ASP code!
In addition, I didn't want to miss the opportunity to let the report be customized. Here is a simple API:
```cpp
// adds a header and a footer to the resulting html report
void SetTitles(CString &szHeader, CString &szFooter);

// defines sequentially :
// - the color of the source text (of the form #FF4444)
// - the color of the background
// - the color of lines that have changed
// - the color of lines that have been added
// - the color of lines that have been deleted
void SetColorStyles(CString &szText, CString &szBackground, CString &szChanged,
                    CString &szAdded, CString &szDeleted);
```
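As an illustration of how five such colors could end up in the report, here is a minimal sketch in standard C++ that turns them into a CSS block. The class names `C`, `A` and `D` are the ones used in the article's `Serialize` code. Everything else (`ColorStyles`, `BuildStyleBlock`, the default colors, the exact CSS rules) is an assumption, since the stylesheet the tool really emits is not shown here.

```cpp
#include <string>

// Hypothetical holder for the five colors taken by SetColorStyles.
// The defaults are placeholders chosen for this sketch only.
struct ColorStyles {
    std::string text = "#000000";
    std::string background = "#FFFFFF";
    std::string changed = "#FFFF88";
    std::string added = "#88FF88";
    std::string deleted = "#FF8888";
};

// Builds one CSS rule per status so that a table cell only needs a class name.
std::string BuildStyleBlock(const ColorStyles& c) {
    std::string css = "<style type='text/css'>\r\n";
    css += "td { color:" + c.text + "; background-color:" + c.background + "; }\r\n";
    css += "td.C { background-color:" + c.changed + "; }\r\n";  // changed
    css += "td.A { background-color:" + c.added + "; }\r\n";    // added
    css += "td.D { background-color:" + c.deleted + "; }\r\n";  // deleted
    css += "</style>\r\n";
    return css;
}
```

Keeping the colors in one style block means that changing the look of the report touches a single place, while the rows themselves carry nothing more than a status class.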
Finally, being easy to work with is a result of the blank lines added to the opposite file whenever lines are flagged as added or deleted. Doing so, we ensure that code blocks still line up perfectly after small or big changes. The resulting diff is easy to browse.
```cpp
CString CDiffEngine::Serialize(/*in*/CFilePartition &f1,
                               /*in*/CFilePartition &f2)
{
  // html header
  // (the html tags inside the string literals of this function are not
  //  preserved in this copy; only line breaks and visible text remain)
  CString s =
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    " File Diff \r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n" + m_szHeader +
    ""
    "\r\n"
    ""
    "\r\n"
    "old versionnew version"
    " ( changed "
    " "
    "added deleted) "
    "<SELECT id='fontoptions' "
    "onchange='maintable.style.fontSize=this.options[this.selectedIndex]"
    ".value'>"
    "6pt7pt8pt"
    "9pt"
    "\r\n"
    ""
    + f1.GetName() + "" +
    f2.GetName() + ""
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n"
    "\r\n" ;

  long nbLines = f1.GetNBLines();
  if (nbLines==0)
  {
    s += "empty files";
  }
  else
  {
    s += ""
         ""
         "\r\n";
  }

  const char *arrStatus[4] = { "", " class='C'", " class='A'", " class='D'" };

  CString sc;

  // write content
  //
  for (long i=0; i<nbLines; i++)
  {
    sc += "<td width=50%" + CString(arrStatus[ f1.GetStatusLine(i) ]) + ">" +
          Escape(f1.GetRawLine(i)) + "";
    sc += "<td width=50%" + CString(arrStatus[ f2.GetStatusLine(i) ]) + ">" +
          Escape(f2.GetRawLine(i)) + "";
  } // for i

  s += sc;

  if (nbLines>0)
    s += ""
         "\r\n";

  // write html footer
  s += m_szFooter + "\r\n"
       "\r\n";

  return s;
}
```
```cpp
// a helper aimed to make sure tag symbols are passed as content
// (the entity strings were decoded in this copy and are restored from
//  context; the tab width of four is an assumption)
CString CDiffEngine::Escape(CString &s)
{
  CString o;
  long nSize = s.GetLength();
  if (nSize==0) return CString("&nbsp;");

  TCHAR c;
  BOOL bIndentation = TRUE;

  for (long i=0; i<nSize; i++)
  {
    c = s.GetAt(i);
    if (bIndentation && (c==' ' || c=='\t'))
    {
      if (c==' ')
        o += "&nbsp;";
      else
        o += "&nbsp;&nbsp;&nbsp;&nbsp;";
      continue;
    }
    bIndentation = FALSE;

    if (c=='<')
      o += "&lt;";
    else if (c=='>')
      o += "&gt;";
    else if (c=='&')
      o += "&amp;";
    else
      o += c;
  }
  return o;
}
```
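To show how the status classes and the escaping fit together, here is a minimal sketch in standard C++ that renders two aligned columns as table rows. `Row`, `EscapeHtml`, `RenderCell` and `RenderRows` are hypothetical names. The numeric order of `Status` is inferred from the order of the class strings in `arrStatus`, and the `<tr>` and `</td>` tags are assumptions about markup that is not shown here. Indentation handling is left out for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Order assumed to follow the article's arrStatus table: "", C, A, D.
enum class Status { Normal = 0, Changed = 1, Added = 2, Deleted = 3 };

struct Row {
    std::string text;
    Status status;
};

// Replaces the three characters that would otherwise be read as markup.
std::string EscapeHtml(const std::string& s) {
    if (s.empty()) return "&nbsp;";  // keeps an empty cell at full row height
    std::string out;
    for (char c : s) {
        switch (c) {
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '&': out += "&amp;"; break;
            default:  out += c;
        }
    }
    return out;
}

// One table cell; the status picks the CSS class.
std::string RenderCell(const Row& r) {
    static const char* const kClass[] = { "", " class='C'", " class='A'", " class='D'" };
    return "<td width='50%'" + std::string(kClass[static_cast<int>(r.status)]) +
           ">" + EscapeHtml(r.text) + "</td>";
}

// One table row per pair of lines; both columns are expected to be equally long.
std::string RenderRows(const std::vector<Row>& left, const std::vector<Row>& right) {
    std::string html;
    for (std::size_t i = 0; i < left.size() && i < right.size(); ++i)
        html += "<tr>" + RenderCell(left[i]) + RenderCell(right[i]) + "</tr>\r\n";
    return html;
}
```

Because the diff engine pads both columns with blank lines, every row holds exactly one line from each side, which is what keeps the side-by-side view aligned.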
### 4.3 wrap up
```cpp
CString szFile1 = "...";
CString szFile2 = "...";
CString szOutfile = "..."; // .html file

CFileOptions o;
if (!bCaseOption) o.SetOption( CString("case"), CString("no") );
if (!bIndentOption) o.SetOption( CString("indent"), CString("no") );

CFilePartition f1;
f1.PreProcess( szFile1, o ); // precalculate tokens

CFilePartition f2;
f2.PreProcess( szFile2, o ); // precalculate tokens

CFilePartition f1_bis, f2_bis;

CDiffEngine d;
d.Diff(f1,f2,f1_bis,f2_bis); // actual diff
d.ExportAsHtml(szOutfile, d.Serialize(f1_bis, f2_bis)); // wrap up
```
## 5. Update history
- 16 Feb - initial release
- 23 Feb - added:
  - diff options (`CFileOptions` class): case sensitiveness can be disabled, as well as indent
  - the ability to use stdout to pipe the output somewhere else
  - dynamic font size selection
- 10 May - added the following features:
  - folder support: difftool now builds a report of entire folders. The use case is the ability to track changes by comparing the last modified dates of each file pair.
  - added line numbers, changed the font for better readability
## License
This article, along with any associated source code and files, is licensed under [The Code Project Open License (CPOL)](http://www.codeproject.com/info/cpol10.aspx).