About file resolution Jeff Anton Perforce Software COMPANY CONFIDENTIAL 1/7/2004 This is a description of how the diff merge process works as I understand it. First lets get some concepts down... Users of the merge class diffmerge supply files and call DiffMerge::Read many time until the contents are delivered. On each call a chunk of results are returned along with a return code which indicates from which file the results originate from and whether the results are in conflict. The best way to see this work is to run p4 merge3 which shows the result blocks and the source/conflict indicator. If you use the '-r' option to merge3 then the RCS like merge result is written not the individual blocks. p4 merge3 runs entirely in the p4 client and on the files provided. Now lets discuss how diffmerge works in different levels of detail. The merge process starts by getting the differences between the base file and each of the two legs. That is two diffs. These diffs are represented by regions of lines known to be different. For example, lines 5 through 10 in the base file do not match lines 7 through 11 in the other (leg) file. Regions of lines between these diff regions will match. Each leg has a list of regions which do not match against the base. These lists are ordered. DiffMerge::Read takes these two lists of diffs and then runs a complex state machine which produces the results. Now about that DiffMerge::Read state machine. DiffMerge::Read returns blocks of output which can be one of ALL meaning that both legs are in agreement with the base file L1 and/or L2 with possible conflict indicating that some or both legs differ from the base and whether those legs conflict in their differences with the base. This is pretty simple once you understand the diff regions mentioned previously. You just have to start at the start of the base file and look at the top diff region of each of the legs. If the legs diff regions both start after the start of the file the method will advance its position in the base file and emit the lines common to all files until the first diff is reached. Once there is a diff the two legs are examined to see which one comes first and then if they overlap. If there is no overlap then it is a simple change imposed by one of the two legs. The state machine emits the base version lines which cover the diff, then the lines in the leg which those lines were changed to (or deleted). Then the machine goes back to the top emitting lines in common until the next diff region is reached. The hard bit occurs when both legs of a diff region overlap. Then a conflict is present. To report that conflict the two diff regions of the two legs have to be made to cover the same lines in the base file. This can be that one leg changes three lines and the other leg changes only two lines. This "common base coverage" is needed because in the merged file we want to report the base lines and each of the conflicting legs without emitting lines of the base multiple times and for general readability. The leg diff regions can be expanded by extending the base and leg regions by identical number of lines. This will increase a region to include lines which do not really differ, but it can make the two diff legs source regions match the base region. However, this extention might bump up against the next diff region in which case that next diff region must be merged into the first and then the other leg diff region must again be looked at to assure common base coverage. If there are many alternating overlapping changes this can make a conflict appear very large. Once the conflicting diff regions were settled, those regions which be checked for identical content and if identical the changes would be in both legs but not the base and that was reported as a BOTH case and emitted once as a common change. That's basically it except it is worth discussing what overlap really means. The overlap test occurs by testing the line number of the diff regions on the source (base) side of the diff region because the source side of the diff refers to the lines of the base file which are a common reference. Pre 2003.1 perforce considered diffs to overlap if the two leg diff regions touched. The diff regions are represented as a start line number which is included in the region and an end line number which is just past the end of the region. This is an interval sometimes represented symbolicly as "[start, end)" The 2003.1 release changed the overlap rule to allow these regions/intervals to be adjacent (touch) as long as they do not really overlap (i.e. loose vs. strict overlap.) but two diff regions which are inserts have base file regions where the start and end lines numbers are the same i.e. no lines in the base are changed/removed. The test was carefully coded so that two inserts at the same point in the base would be reported as a conflict but the other touching cases would not. This was found to cause some confusion when inserts were adjacent to changes and those inserts and changes had lines in common. This caused apparent duplicate changes so the overlap rule was changed again to report that inserts adjacent to changes would be in conflict leaveing only the case that two changes (i.e. diff regions with non-empty source/base regions) that are adjacent will not be considered overlapping. (This change was done through a bug fix/change.) New conflict resolution.... The new conflict handling code occurs once a conflict is found but before that conflict is adjusted to have a common base range as mentioned above. When the first conflict is encountered a third diff between the two legs is performed. This diff provides information on regions common to the two legs and helps breakup conflict regions into more regions which would not conflict. The new code works as follows... If a common region between the two legs is found which overlaps both legs of the conflict then we can be smart, otherwise we just handle the conflict as before. Since there is some common region between the two halfs of the conflict there is something smart we can do. We start by checking if the common region covers the first lines of the two change legs. If neither change leg starts in the common region, both legs are split to create to conflicting regions with nothing in common and that is then handled normally. If one leg is in the common region but the other is not, the other leg has a new change. This is split off and handled as a LEG1 or LEG2 change. This split is tricky to cover the base range or not. The base range adjustment is based on the leg being at the start or end of diff legs. If both legs start with lines in common, if the diff legs have different line counts the smaller leg is expanded to match the longer leg to the extent that lines continue to stay in common. This expansion is handled in part by the ExpandBy method which takes care to see if the expansion runs up against subsequent diffs from that leg and carefully adjusts the base if subsequent legs are included. At this point we make another check if the two diff legs have the same number of lines and if not the longer is split to match the shorter. After all this we have a BOTH (i.e. not conflicting block) which is handled normally. There is one last check for an insert being adjacent to a change and insert or change is allowed through without conflict. This was considered a conflict earlier so as to get this case into the new conflict handling code, but the new conflict code will generate apparent inserts so those need to go through. Because the new code can tell when inserts match changes we should not see the duplicate lines problem that was seen and required stricter handleing. This process iterates again and that's it.