Touch the boxes.
Ideally, the same language of manipulation, and any tool built from it, should be able to describe at least basic responses such as ‘rollovers’ and ‘clicks’. This is possible so long as the forms have quantifiable properties like ‘touched’ or ‘pressed’ that can be used to drive other qualities. For example, a box’s ‘touched-ness’ could be set to drive a quality of a mapping between the mouse and a box. Furthermore, we might grant that any mapping has a property ‘active’, so that it may be deactivated or re-activated at will.
box1 touched -> (mouse position -> box2 position) active
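
One way to read the expression above is as two mappings, where the outer one writes to the ‘active’ property of the inner one. The following is only a minimal sketch of that reading in Python, not an implementation of any actual tool. The names `Box`, `Mapping`, `follow`, `gate` and `step`, the square hit test, and the choice to evaluate the gate before the inner mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """Hypothetical form with a position and a 'touched' property."""
    position: Point
    size: float = 10.0
    touched: bool = False

    def contains(self, p: Point) -> bool:
        # Assumed hit test: the box is an axis-aligned square.
        x, y = self.position
        return x <= p[0] <= x + self.size and y <= p[1] <= y + self.size


@dataclass
class Mapping:
    """Hypothetical mapping: copies a source value to a sink while active."""
    source: Callable[[], Any]
    sink: Callable[[Any], None]
    active: bool = True

    def update(self) -> None:
        if self.active:
            self.sink(self.source())


mouse = {"position": (0.0, 0.0)}
box1 = Box(position=(0.0, 0.0))
box2 = Box(position=(50.0, 50.0))


def set_box2_position(p: Point) -> None:
    box2.position = p


# Inner mapping: mouse position -> box2 position (starts inactive).
follow = Mapping(lambda: mouse["position"], set_box2_position, active=False)


def set_follow_active(value: bool) -> None:
    follow.active = bool(value)


# Outer mapping: box1 touched -> (inner mapping) active.
gate = Mapping(lambda: box1.touched, set_follow_active)


def step(mouse_position: Point) -> None:
    """Advance one frame: read the mouse, then update gate, then follow."""
    mouse["position"] = mouse_position
    box1.touched = box1.contains(mouse_position)
    gate.update()
    follow.update()
```

Calling `step((5.0, 5.0))` puts the mouse over `box1`, so `follow` becomes active and `box2` moves to the mouse. A later `step((100.0, 100.0))` leaves `box1`, `follow` is deactivated, and `box2` stays where it was.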