INTRODUCING DINOV3

DINOv3 scales self-supervised learning (SSL) for images to produce our strongest universal vision backbones, enabling breakthrough performance across diverse domains.

DINOV3 OVERVIEW

We scaled unsupervised training to 7B-parameter models and datasets of 1.7B images, using a fraction of the compute required by weakly-supervised methods. Even with the backbones kept frozen during evaluation, they achieve absolute state-of-the-art performance across diverse domains.

SSL unlocks domains where annotations are scarce or costly. The backbones enable state-of-the-art results on tasks ranging from object detection in web imagery to canopy height mapping in satellite and aerial imagery.

High-resolution dense features from a single DINOv3 backbone enable leading performance across vision tasks, including object detection, depth estimation, and segmentation, without any finetuning.

We release a comprehensive model suite addressing a wide range of use cases, including broad coverage of ViT sizes and efficient ConvNeXt models for on-device deployment.

PERFORMANCE

DINOv3 sets a new standard in vision foundation models. For the first time, a model trained with SSL outperforms weakly-supervised models on a broad range of probing tasks, from fine-grained image classification to semantic segmentation to object tracking in video.

APPROACH

Pre-training data is curated from a large unlabeled dataset. During pre-training, the model learns general-purpose visual representations by matching features between different augmented views of the same image. In post-training, the model is distilled into more efficient models.

A pre-trained DINOv3 model can be easily tailored to a task by training a lightweight adapter on a small amount of annotated data.

DINO Evolution

DINOv3 marks a new milestone in self-supervised training at scale.
It builds upon the scaling progress of DINOv2, increasing the model size 6x and the training data 12x.

DINO
Initial research proof of concept, with 80M-parameter models trained on 1M images.

DINOv2
First successful scaling of an SSL algorithm. 1B-parameter models trained on 142M images.

DINOv3
An order of magnitude larger training run than v2, with a particular focus on dense features.

Meta © 2025
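
The adapter step described under APPROACH (a frozen backbone plus a small trained head) can be illustrated with a linear probe. The sketch below is only a toy under stated assumptions: `frozen_backbone` is a hypothetical stand-in that passes 2-D inputs through unchanged, not the DINOv3 model or its API, the data is synthetic, and the "adapter" is a plain logistic-regression head. Only the head's weights are updated; the backbone is never touched.

```python
import math
import random

random.seed(0)  # make the toy data reproducible


def frozen_backbone(image):
    # Hypothetical stand-in for a frozen pre-trained backbone (assumption).
    # A real backbone would map pixels to a feature vector; here the "image"
    # is already a short list of floats and is returned unchanged.
    return list(image)


def make_toy_dataset(n_per_class=20, spread=0.3):
    # Synthetic two-class data: class 0 near (-1, -1), class 1 near (1, 1).
    data = []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            image = [random.gauss(center, spread), random.gauss(center, spread)]
            data.append((image, label))
    return data


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    # Fit a binary logistic-regression head with full-batch gradient descent.
    # The features are precomputed, so the backbone receives no gradient.
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                     # gradient of the log loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Classify by the sign of the linear score.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


dataset = make_toy_dataset()
features = [frozen_backbone(img) for img, _ in dataset]  # extracted once
labels = [y for _, y in dataset]

w, b = train_linear_probe(features, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"train accuracy: {accuracy:.2f}")
```

Because the features are extracted once and reused, training the head is cheap, which is what makes this kind of adapter practical with only a small amount of annotated data.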