In this First Break article from February 2025, TGS experts Altay Sansal, Ben Lasscock and Alejandro Valenciano tackle the complexities of training a seismic foundation model at scale on a global dataset of 63 seismic volumes, leveraging a cloud-native, digitalized seismic data infrastructure to meet the data engineering challenges while avoiding data duplication.
The pre-trained ViT-MAE model is an emerging technology in seismic processing and interpretation [Lasscock 2024, Sheng 2023]. Much as large language models marked a step change in natural language processing, this new approach to AI has the potential to disrupt geophysical applications. Until now, such studies have been limited to small, open-source datasets built on synthetic data and older seismic imaging and processing techniques. [Ordonez 2024] reported a more expansive study that high-graded a subset of 60,000 2D crops for pre-training from a larger dataset of 20 surveys. In each case, these studies have demonstrated the efficacy of pre-training a seismic foundation model (SFM) and then using or fine-tuning it on various downstream tasks, including seismic salt and facies classification.
The highly scalable characteristics of the ViT-MAE technology, particularly when applied in 3D [Lasscock 2024], have yet to be explored in the geophysical literature. In computer vision, it has been established [Zhai 2022] that larger models pre-trained on large datasets (ImageNet-21k and JFT-300M) achieve better performance on image classification tasks. This study tackles the problem of scaling ViT-MAE models trained on seismic data to a global corpus of 63 seismic surveys, and evaluates whether a downstream task can be efficiently fine-tuned from these large pre-trained models to outperform existing AI methods in generalisation capacity.
Figure 1 - Modified schematic of the ViT-MAE pre-training concept [He 2021]. Large 3D data patches are loaded in batches; 90% of the patches are masked, and the model reconstructs the original data from the remaining 10% of visible patches together with the mask tokens.
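To make the masking idea in Figure 1 concrete, the snippet below is a minimal, hypothetical PyTorch-style sketch of MAE-style random token masking; the tensor shapes, 90% mask ratio, and function name are illustrative assumptions and not the article's actual implementation.

```python
# Minimal sketch of MAE-style random masking on tokenized 3D seismic patches (illustrative only).
# Shapes, patch counts, and the 90% mask ratio are assumptions for demonstration.
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.9):
    """Keep a random subset of patch tokens; return visible tokens, mask, and restore indices."""
    batch, num_tokens, dim = tokens.shape
    num_keep = int(num_tokens * (1.0 - mask_ratio))

    noise = torch.rand(batch, num_tokens)             # random score per token
    ids_shuffle = torch.argsort(noise, dim=1)         # ascending: lowest-noise tokens are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)   # inverse permutation to restore order

    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, dim))

    # Binary mask in original token order: 0 = visible (fed to encoder), 1 = masked (reconstructed)
    mask = torch.ones(batch, num_tokens)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return visible, mask, ids_restore

# Example: a batch of 4 volumes tokenized into 512 3D patches with embedding dimension 768.
tokens = torch.randn(4, 512, 768)
visible, mask, ids_restore = random_masking(tokens, mask_ratio=0.9)
print(visible.shape)  # torch.Size([4, 51, 768]) -> only ~10% of tokens reach the encoder
```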
As we train large models, data management becomes a crucial enabling technology, both for exploring and curating such a large corpus of data and for efficiently saturating the large GPU clusters required to train the models in a timely manner. Tackling this problem on seismic data presents unique challenges. We will explain how cloud object storage and the MDIO seismic data format [Sansal 2023] are used efficiently in pre-training a 660M-parameter 3D seismic ViT-H model. We will then assess the model's usefulness by fine-tuning it for salt interpretation. The salt interpretation model builds on our SaltNet dataset, consisting of interpretations from 23 seismic volumes, and we will compare model IoU scores with existing state-of-the-art 2D and 3D U-Net models [Warren 2023, Roberts 2024].
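For reference on the comparison metric mentioned above, here is a small, hypothetical sketch of how an IoU score for a binary salt mask could be computed; the function name, threshold, and array shapes are assumptions for illustration and are not taken from the article or the SaltNet dataset.

```python
# Illustrative sketch of an intersection-over-union (IoU) score for a binary salt mask.
# Function name and the 0.5 threshold are assumptions for demonstration purposes.
import numpy as np

def salt_iou(pred_probs: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a thresholded probability volume and a binary salt interpretation."""
    pred = pred_probs >= threshold
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection / union) if union > 0 else 1.0

# Example on a random 3D crop: scores near 1/3 are expected for uncorrelated random masks.
rng = np.random.default_rng(0)
pred_probs = rng.random((64, 64, 64))
target = rng.random((64, 64, 64)) > 0.5
print(f"IoU: {salt_iou(pred_probs, target):.3f}")
```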
Read the full article here.