If X and Y need to move together to prevent a collision, it sounds like some start positions might also cause problems. If you move Y back first, wouldn't there then always be enough space to move X afterwards?
Anyway, we always have a direction flag, since it is a single bit: it is either min or max. So if you invert the logic to prevent min from being marked, max gets marked instead, and the problem just moves to users with a max endstop. What you need is something similar to Z homing with 2 motors and 2 endstops. There the move only ends when both endstops are hit, and once one endstop is hit, that motor does not move any more. In your case you would need to change the endstop check for the homing flag and set the X steps to 0 when the X endstop is hit, but continue the move instead of stopping it. The same applies to Y if the Y endstop is hit first.
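To make the idea concrete, here is a rough standalone sketch of that logic. It is only a simulation, not firmware code: `HomingResult`, `homeXYTogether`, and the step counters are names I made up for illustration, and the endstop is modelled simply as "position reached 0". The point is just that hitting one endstop freezes that axis while the combined move keeps running until both are hit.

```cpp
// Hypothetical simulation of homing X and Y in one combined move.
// None of these names come from the firmware; they are assumptions.
struct HomingResult {
    int xStepsTaken;  // steps X actually moved before its endstop triggered
    int yStepsTaken;  // steps Y actually moved before its endstop triggered
    bool bothHomed;   // true only if both endstops were reached
};

// xDistance / yDistance: steps from the start position to each endstop.
// maxSteps: safety limit so the move ends even if an endstop never triggers.
HomingResult homeXYTogether(int xDistance, int yDistance, int maxSteps) {
    int xPos = xDistance;
    int yPos = yDistance;
    bool xActive = true;
    bool yActive = true;
    HomingResult r{0, 0, false};

    for (int i = 0; i < maxSteps && (xActive || yActive); ++i) {
        // Endstop check: a triggered axis gets no more steps,
        // but the move itself is NOT aborted.
        if (xActive && xPos == 0) xActive = false;
        if (yActive && yPos == 0) yActive = false;

        // Step only the axes that have not hit their endstop yet.
        if (xActive) { --xPos; ++r.xStepsTaken; }
        if (yActive) { --yPos; ++r.yStepsTaken; }
    }

    r.bothHomed = (xPos == 0 && yPos == 0);
    return r;
}
```

With X starting 3 steps and Y 5 steps away from their endstops, X stops after 3 steps while Y keeps going for 2 more, and the move only ends once both are home. In real firmware the "set steps to 0" part would have to happen inside the stepper/endstop handling, which this sketch does not try to reproduce.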