Large geometric foundation models (e.g., MASt3R, with 688.6M parameters) achieve state-of-the-art 3D reconstruction from stereo pairs, but they are prohibitively large to deploy on resource-constrained platforms. This limitation is particularly relevant for lunar space missions. We propose a distillation framework that combines (i) SVD-based initialization of the student decoder from the teacher (optimal in the Eckart–Young sense), (ii) a feature-alignment loss at intermediate ViT layers, and (iii) full fine-tuning of the encoder.
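The two training-time ingredients above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that "SVD-based decoder initialization" means replacing a teacher linear weight `W` with two low-rank factors whose product is the truncated SVD of `W`. By the Eckart–Young theorem, that product is the best rank-`r` approximation in Frobenius norm. The sketch also assumes the feature-alignment loss is a mean squared error between student tokens, mapped through a learned projection, and teacher tokens. The function names `svd_init` and `feature_alignment_loss`, the projection matrix `proj`, and the choice of MSE are all hypothetical.

```python
import numpy as np


def svd_init(teacher_weight, rank):
    """Hypothetical: factor a teacher weight W (d_out x d_in) into
    A (d_out x rank) and B (rank x d_in) so that A @ B is the truncated SVD of W,
    i.e. the best rank-`rank` approximation in Frobenius norm (Eckart-Young)."""
    u, s, vt = np.linalg.svd(teacher_weight, full_matrices=False)
    root = np.sqrt(s[:rank])
    a = u[:, :rank] * root          # scale the columns of U_r by sqrt(sigma_i)
    b = root[:, None] * vt[:rank]   # scale the rows of V_r^T by sqrt(sigma_i)
    return a, b


def feature_alignment_loss(student_feats, teacher_feats, proj):
    """Hypothetical MSE alignment between intermediate ViT tokens.
    student_feats: (n_tokens, d_s), teacher_feats: (n_tokens, d_t), proj: (d_s, d_t)."""
    diff = student_feats @ proj - teacher_feats
    return float(np.mean(diff ** 2))


# Usage on random data: the approximation error matches the Eckart-Young bound.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
a, b = svd_init(w, rank=8)
sigma = np.linalg.svd(w, compute_uv=False)
err = np.linalg.norm(w - a @ b)                 # Frobenius norm of the residual
bound = np.sqrt(np.sum(sigma[8:] ** 2))         # sqrt of the discarded sigma_i^2
```

Splitting each singular value as `sqrt(sigma)` across both factors keeps the two new layers on a similar scale. This is one common convention. Absorbing `Sigma` entirely into either factor yields the same product.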
Inference was run with our actual checkpoint-50 for every model on each pair.
If you use this work, please cite the paper below.
This work was supported by the French Agence Nationale de la Recherche (ANR, “Investissements d’avenir”, ANR-21-ESRE-0051) and the European Space Agency (ESA, contract 4000140461/23/NL/GLC/my).