When an object cross-references an object from another resource, EMF may use a proxy instead of the actual object. As long as you do not access the element in question, EMF does not need to load the resource that contains it. The proxy is a placeholder that tells EMF which resource to load, and which of that resource's objects is referenced, when the value is actually accessed.
Proxy resolution is generally transparent, but many EMF-based tools do not treat proxies as first-class citizens: they simply resolve them without considering whether that is needed. EMF Compare, on the other hand, only resolves the proxies that are strictly necessary. Whatever the phase of comparison, we strive to never hold the whole model in memory.
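The placeholder idea can be illustrated without EMF at all. The following is a minimal sketch in plain Java, not EMF's implementation: the `Proxy` class, the `resourcePath#fragment` string format and the `loader` function are all hypothetical names chosen for the illustration. It only shows that the containing resource is loaded on first access and not before.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ProxySketch {
    /** Hypothetical placeholder for an object stored in another resource ("resourcePath#fragment"). */
    static final class Proxy {
        private final String resourcePath;
        private final String fragment;
        private final Function<String, Map<String, String>> loader;
        private String resolved; // stays null until the value is first accessed

        Proxy(String uri, Function<String, Map<String, String>> loader) {
            // Assumption of this sketch: the URI always contains a '#'.
            int hash = uri.indexOf('#');
            this.resourcePath = uri.substring(0, hash);
            this.fragment = uri.substring(hash + 1);
            this.loader = loader;
        }

        boolean isProxy() {
            return resolved == null;
        }

        /** Loads the containing resource on first access only. */
        String get() {
            if (resolved == null) {
                resolved = loader.apply(resourcePath).get(fragment);
            }
            return resolved;
        }
    }

    public static void main(String[] args) {
        Map<String, String> library = new HashMap<>();
        library.put("//P2", "package P2");
        int[] loads = {0};
        Proxy proxy = new Proxy("library.ecore#//P2", path -> {
            loads[0]++; // count how often the "resource" gets loaded
            return library;
        });
        System.out.println(proxy.isProxy() + ", loads so far: " + loads[0]);
        System.out.println(proxy.get() + ", loads so far: " + loads[0]);
    }
}
```

In real EMF code the equivalent checks go through the `EObject` API rather than a hand-written class like this one.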
The initial resolution phase is the most intensive in terms of proxy resolution and I/O operations. Though we never hold the whole logical model in memory at any given time, we do resolve all cross-references of the compared resources, on all sides of the comparison. Since the logical model may be spread over many different files on disk, it could be very heavy in memory if loaded all at once. However, even though EMF Compare does resolve all fragments composing that logical model, it also unloads each of them as soon as its cross-references are registered. In other words, what we create is a dependency graph between the resources, not a loaded model. Afterwards, we only reload in memory those resources that have actually changed, and can thus contain differences. There will be proxies between these "changed" resources and the unchanged ones we have decided not to reload, but EMF Compare will never resolve these proxies again (and, in fact, will prevent other tools from resolving them).
At the time of writing, the user interface never resolves any proxies either. This might change in the future for a better user experience, since unresolved proxies usually end up displayed in odd ways.
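The "load, register cross-references, unload" cycle described above can be sketched in plain Java. This is an illustration under stated assumptions, not EMF Compare's resolver: the `disk` map stands in for the files on disk, resources are identified by simple strings, and "loading" is simulated with a counter so that the peak number of simultaneously loaded resources can be observed.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class DependencyGraphSketch {
    /** Simulated storage: resource path -> paths of the resources it cross-references. */
    private final Map<String, Set<String>> disk;
    private final Map<String, Set<String>> graph = new LinkedHashMap<>();
    private int loaded = 0;
    private int peakLoaded = 0;

    DependencyGraphSketch(Map<String, Set<String>> disk) {
        this.disk = disk;
    }

    /** Walks the cross-references from 'start', keeping only the dependency edges. */
    Map<String, Set<String>> resolveFrom(String start) {
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(start);
        while (!toVisit.isEmpty()) {
            String path = toVisit.poll();
            if (graph.containsKey(path)) {
                continue; // already registered
            }
            loaded++; // simulated load
            peakLoaded = Math.max(peakLoaded, loaded);
            Set<String> refs = disk.getOrDefault(path, Collections.emptySet());
            graph.put(path, new LinkedHashSet<>(refs));
            loaded--; // simulated unload, right after the references are registered
            toVisit.addAll(refs);
        }
        return graph;
    }

    int peakLoaded() {
        return peakLoaded;
    }

    /** Only resources of the logical model that actually changed are reloaded. */
    Set<String> resourcesToReload(Set<String> changed) {
        Set<String> result = new LinkedHashSet<>(graph.keySet());
        result.retainAll(changed);
        return result;
    }
}
```

The design point the sketch tries to capture is that the graph outlives the loaded content: after resolution, only the edges remain, and the set of resources to reload is the intersection of the logical model with the changed files.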
The equality helper is a central concept of EMF Compare. EMF Compare's aim is, of course, to compare objects together: objects whose comparison is not trivial and cannot be reduced to a mere "equal or not" test. However, we still need to be able to compare these objects at all times and, whenever possible, without a full-fledged comparison.
The equality helper is used in all phases of the comparison, from matching to merging (please see the overview for a bird's eye view of the different comparison phases, or their detailed descriptions below). The matching phase is precisely the time when EMF Compare tries to pair the elements from one side with the elements from the other side. At that point, we do not yet know which element matches which other. For all subsequent phases, however, the equality helper relies on information from the comparison itself (the Match elements) to make a fail-fast test for element "equality".
When we do not have this information, the equality helper resorts to less optimal algorithms. For any object that is not an EMF EObject, we use strict equality through == and Object#equals() calls. One of the causes of EMF Compare failing to match attribute values together is in fact the lack of an equals implementation on custom datatypes (see the FAQ for more on that particular issue).
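The order of checks described above can be sketched as follows. This is a plain-Java illustration, not the actual equality helper: the class name, `registerMatch` and the identity map standing in for the Match elements are hypothetical, and falling back to `equals` for unmatched objects of any kind is an assumption of the sketch.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class EqualityHelperSketch {
    /** Hypothetical stand-in for the Match elements: element -> its counterpart on the other side. */
    private final Map<Object, Object> matched = new IdentityHashMap<>();

    void registerMatch(Object left, Object right) {
        matched.put(left, right);
        matched.put(right, left);
    }

    boolean matchingValues(Object a, Object b) {
        if (a == b) {
            return true; // strict identity
        }
        if (a == null || b == null) {
            return false;
        }
        Object counterpart = matched.get(a);
        if (counterpart != null) {
            return counterpart == b; // fail-fast: match information is available
        }
        return a.equals(b); // fallback: relies on the datatype implementing equals
    }

    public static void main(String[] args) {
        EqualityHelperSketch helper = new EqualityHelperSketch();
        // String implements equals, so distinct instances with the same content are equal.
        System.out.println(helper.matchingValues("abc", new String("abc")));
        // StringBuilder does not override equals, much like a custom datatype without one.
        System.out.println(helper.matchingValues(new StringBuilder("x"), new StringBuilder("x")));
    }
}
```

The second call in `main` shows the failure mode mentioned above: two values with identical content are reported as different because their type inherits `Object#equals`.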
Note that the equality helper will be used extensively, and that any performance hit or improvement here will make a huge difference for the whole comparison process. Likewise, any mistake you make when implementing a custom equality helper will introduce a lot of bugs.
As seen above, EMF Compare considers proxies as real citizens of the EMF realm. This mainly shows in the matching mechanism. EMF Compare uses a scoping mechanism to determine which elements should be matched together, and which others should be ignored. Any element that is outside of the comparison scope will be ignored by the comparison engine and left alone (if it is a proxy, it won't even be loaded). This also means that we won't really have a way to compare these proxies (or other out-of-scope values) when the Diff process encounters them.
For example, an element that is outside of the comparison scope, but referenced by another element that is in the scope, needs specific comparison means: we've ignored it during the matching phase, so we don't know which "out-of-scope" element corresponds to which "other out-of-scope" element. Consider the following: in the left model, a package P1 contains another package P2. In the right model, a package P1' contains a package P2'. We've told EMF Compare that P2 and P2' are out of the comparison scope. Now how do we determine that the reference from P1 to P2 has changed (or, in this example, that it did not change)?
This is a special case that is handled by the IEqualityHelper. Specifically, when such cases are encountered, EMF Compare falls back to using the URI of the two objects to check for equality. This behavior can be changed by customizing the IEqualityHelper (see above).
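A minimal sketch of that URI fallback, in plain Java: the `Element` class and the example URI are hypothetical, and real code would work with EObjects and their EMF URIs instead of strings.

```java
class UriFallbackSketch {
    /** Hypothetical stand-in for an out-of-scope model element, identified only by its URI. */
    static final class Element {
        final String uri;

        Element(String uri) {
            this.uri = uri;
        }
    }

    /** Out-of-scope elements were never matched, so the URI is the only thing left to compare. */
    static boolean sameOutOfScopeTarget(Element left, Element right) {
        return left.uri.equals(right.uri);
    }

    public static void main(String[] args) {
        // P2 as referenced from the left model, P2' as referenced from the right model.
        Element p2Left = new Element("platform:/resource/lib/model.ecore#//P2");
        Element p2Right = new Element("platform:/resource/lib/model.ecore#//P2");
        // Same URI on both sides: the reference from P1 is considered unchanged.
        System.out.println(sameOutOfScopeTarget(p2Left, p2Right));
    }
}
```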
By default, the only elements that EMF Compare considers "out of scope" are Ecore's "EGenericType" elements. These are usually meaningless as far as comparison is concerned (as they are located in derived references and will be merged along with their "true" difference anyway). Please note that, when used from the user interface, EMF Compare narrows down the scope even further by resolving the logical model and determining which resources are actually candidates for differences.
The comparison scope provides EMF Compare with information on the content of ResourceSets, Resources or EObjects, according to the entry point of the comparison. Take note that the scope is only used during the matching phase. The differencing phase only uses the result of the matching phase to proceed.
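The role of the scope as a filter over model content can be sketched in plain Java. This is not the EMF Compare scope API: `Node`, `ScopeSketch` and `coveredContents` are hypothetical names, types are plain strings, and the choice to keep visiting the children of a filtered-out element is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ScopeSketch {
    /** Hypothetical stand-in for a model element: a type name, a name and some children. */
    static final class Node {
        final String type;
        final String name;
        final List<Node> children;

        Node(String type, String name, Node... children) {
            this.type = type;
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    private final Predicate<Node> inScope;

    ScopeSketch(Predicate<Node> inScope) {
        this.inScope = inScope;
    }

    /** Names of all descendants of 'root' that the matching phase would get to see. */
    List<String> coveredContents(Node root) {
        List<String> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private void collect(Node node, List<String> out) {
        for (Node child : node.children) {
            if (inScope.test(child)) {
                out.add(child.name);
            }
            collect(child, out); // filtered-out elements are skipped, their children are still visited
        }
    }

    public static void main(String[] args) {
        Node root = new Node("EPackage", "P1",
                new Node("EClass", "A", new Node("EGenericType", "genericA")),
                new Node("EClass", "B"));
        // Mirrors the default described above: only EGenericType elements are left out.
        ScopeSketch scope = new ScopeSketch(node -> !"EGenericType".equals(node.type));
        System.out.println(scope.coveredContents(root));
    }
}
```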
PENDING description of the algorithm, why do we use it, references