Where Do We Go from Here? The future of artificial intelligence in archaeology

by Christian Horn (Swedish Rock Art Research Archives, Department of Historical Studies, University of Gothenburg) and Ashely Green (Centre for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg)

Artificial intelligence (AI) is a growing field encompassing machine learning (1) and deep learning (2), in which machines learn patterns and features from training data. AI is widely applied in diverse areas of research, such as medicine, ecology, and sustainability. Even at this seemingly early stage, there have already been many AI applications in archaeology, largely in remote sensing and aerial archaeology that cover large landscapes and datasets. With the growth of big data and new technologies, there is a question of whether machines can interpret archaeological data like professionals and whether these interpretations can be relied upon without human verification. The application of machine learning to archaeological questions has grown rapidly in just the last five years, providing new workflows for recording and interpreting aerial, geophysical, and artefact data. It has also created new problems and anxieties, which we shall briefly discuss below.

In our work, we use convolutional neural networks to semi-automatically detect rock art images in visualisations produced from 3D documentation as well as documentations made with older techniques, such as rubbing or tracing (Horn et al. 2021). Machines can only learn from the information with which they are provided. Training is a crucial step in generating any useable, accurate model. While transfer learning (3) (retraining the last layer(s) of a trained model on a similar dataset) can be helpful when working with small datasets, there is still much progress to be made in creating labelled datasets for supervised learning in archaeological applications of AI. Training models that are not overfitting (memorising the training data) for several similar tasks requires large datasets, transfer learning, data augmentation, or a combination thereof. As such, data labelling and retraining models should be continuous. While citizen science projects have helped create large training datasets, typical archaeological applications are often limited by the available data, especially data that has already been labelled. These issues of small datasets, unlabelled data, and overfitting models have been common points of concern in advancing the role of deep learning in archaeology.
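The notion of overfitting as memorisation can be made concrete with a small sketch. It is not the method used in Horn et al. 2021; the feature tuples, labels, and the helper names `make_memoriser` and `overfitting_gap` are all hypothetical and chosen only for illustration. The "model" here is a plain lookup table, which is the extreme case of memorising the training data.

```python
def accuracy(model, examples):
    """Fraction of (features, label) examples the model labels correctly."""
    correct = sum(model(features) == label for features, label in examples)
    return correct / len(examples)


def make_memoriser(training_examples, default_label):
    """Build a 'model' that only recalls training examples it has already seen."""
    lookup = {features: label for features, label in training_examples}
    return lambda features: lookup.get(features, default_label)


def overfitting_gap(model, training_examples, validation_examples):
    """Difference between training accuracy and accuracy on unseen data."""
    return accuracy(model, training_examples) - accuracy(model, validation_examples)


# Hypothetical descriptors (line count, hull-like curve present) with labels.
training = [((2, 1), "boat"), ((4, 0), "horse"), ((3, 1), "boat"), ((5, 0), "horse")]
validation = [((6, 1), "boat"), ((7, 0), "horse"), ((8, 1), "boat"), ((1, 0), "horse")]

model = make_memoriser(training, default_label="boat")
gap = overfitting_gap(model, training, validation)
print(f"training accuracy:   {accuracy(model, training):.2f}")
print(f"validation accuracy: {accuracy(model, validation):.2f}")
print(f"gap:                 {gap:.2f}")
```

A large gap between the two accuracies is the usual warning sign that a model has memorised rather than generalised, which is why held-out data matters even when labelled examples are scarce.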

Deep learning offers advantages for geospatial and geophysical problems, where the digitisation and interpretation of large datasets can be tedious and incur bias from the interpreter. Some of the more straightforward applications (such as land classification) have been successful due to the clear boundaries and simple, consistent morphologies of features and land formations. This advantage is not available to the recent wave of work on the interpretation of 3D/2.5D visualisations and representative images of more creative subjects (such as decorated ceramics/pottery as well as painted and carved rock art). Human creativity is biased, messy, and complex, especially when the outcomes of such creativity are drawn together from several centuries.

Reducing the complexity of training data is one method to improve the classification accuracy of the models, but in so doing, we risk losing the nuances that indicate the differences and variances analysed. Rock art – in our case from southern Scandinavia, dating to the Bronze Age (1800/1700–550 BC) – is an excellent example of this because several features make it inherently difficult to classify through AI approaches. We can single out two characteristics of rock art that impede a straightforward classification in a trained model.

Firstly, rock art is multi-phased. This means that newer images overlay older ones and partially destroy them by physically removing earlier lines. In such cases, it becomes difficult to differentiate the original features from added elements. For example, a well-established chronology exists for boats based on particular features like ships’ stems. However, Early Bronze Age boats were updated during the Late Bronze Age with contemporary stems. Secondly, there is a fluidity of form and constructional elements, making boats and animals (such as deer and horses) appear very similar. This is further complicated by the presence of boats with horse heads, and it gets even messier with partial boats that look like humans and vice versa.

Such data often lead to misclassifications. For example, horses are partially classified as boats, while boat prows and stems are classified as humans. We argue that such misclassifications should not be dismissed as errors. Instead, every misclassification should be investigated because it can teach us not only about our data and our models but also about our biases and perhaps even about the past. Looking at our Scandinavian data, for example, we can begin to understand how much the lines between things (boats, weapons, etc.), animals (deer, horses, snakes, etc.) and humans are blurred. We can discover more instances of such blurring across the corpus of Scandinavian rock art. This phenomenon conveys the sense that the Bronze Age carvers were well aware of this and constructed their images ambiguously on purpose (Fig. 1). Thus, training models (and humans!) to recognise these nuances is a vital but complex undertaking.
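One simple way to start investigating misclassifications, rather than dismissing them, is to tally which classes are confused with which. The sketch below is illustrative only: the labels are invented, and `confusion_counts` and `misclassified_pairs` are hypothetical helper names, not part of any published workflow.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def misclassified_pairs(counts):
    """Return the (true, predicted) pairs that disagree, most frequent first."""
    errors = [(pair, n) for pair, n in counts.items() if pair[0] != pair[1]]
    return sorted(errors, key=lambda item: (-item[1], item[0]))


# Hypothetical labels, invented purely for illustration.
true_labels = ["horse", "horse", "boat", "boat", "human", "boat"]
predicted_labels = ["boat", "horse", "boat", "human", "human", "human"]

counts = confusion_counts(true_labels, predicted_labels)
for (true, predicted), n in misclassified_pairs(counts):
    print(f"{true} classified as {predicted}: {n}")
```

Pairs that recur, such as boats read as humans, are the ones worth taking back to the panels themselves, since they may reflect deliberate ambiguity rather than model failure.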

Figure 1: Is the encircled figure a boat or an animal? An example of the ambiguous construction of Bronze Age rock art in Scandinavia. (Panel in Aspeberget (Tanum 12:1), visualisation from a laser scan using “ratopoviz” https://github.com/Swedish-Rock-Art-Research-Archives/rock-art-ratopoviz-gui)

For these more complex tasks, it is crucial to have an extensive training dataset and even systematically augment the data to demonstrate the possible variations that can occur in real-world examples. This means that when we train our models, we need to provide data at various degrees of rotation, in different contrasts and lighting conditions (and much more) to create sustainable, reusable models. We need feedback from real-world results. For example, ground verification through excavation for LiDAR data or renewed inspection of carved panels for rock art data may prove insightful. Perhaps most importantly, we need the comprehensive knowledge base of experienced professionals to engage in a dialogue with machine-produced models. Deep learning and other methods are powerful tools to assist in the initial data interpretation and reduce human bias in classification. However, the consistency of modern algorithms does not equate to a correct interpretation of the human past. What if the inconsistencies and biases of researchers are actually necessary to arrive conjointly at a fuller interpretation that gets closer to past realities? We believe this to be the case.
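As a minimal sketch of systematic augmentation, the example below generates rotated and contrast-adjusted variants of a tiny image stored as a grid of values between 0 and 1. It makes several simplifying assumptions: rotations are restricted to 90-degree steps, contrast is a linear rescaling around a midpoint, and the function names and the 2x2 image are hypothetical. Real pipelines would typically rely on an image library and a wider range of transformations.

```python
def rotate90(image):
    """Rotate a 2D grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_contrast(image, factor, midpoint=0.5):
    """Scale pixel values (assumed in [0, 1]) around a midpoint, then clamp."""
    return [
        [min(1.0, max(0.0, midpoint + factor * (pixel - midpoint))) for pixel in row]
        for row in image
    ]


def augment(image, contrast_factors=(0.5, 1.0, 1.5)):
    """Return every combination of four 90-degree rotations and the given contrasts."""
    variants = []
    rotated = image
    for _ in range(4):
        for factor in contrast_factors:
            variants.append(adjust_contrast(rotated, factor))
        rotated = rotate90(rotated)
    return variants


# Hypothetical 2x2 "image" with values chosen only for illustration.
image = [[0.0, 0.25], [0.75, 1.0]]
variants = augment(image)
print(len(variants))  # 4 rotations x 3 contrast settings
```

Each variant keeps the label of the original image, so a small labelled dataset can present the model with more of the variation it will meet in real-world examples.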

Perhaps as computational capabilities become more advanced and researchers build and share training datasets, it will become possible for machines to accurately interpret the nuances in human-made features, landscapes, or geospatial data. However, in the current state of AI in archaeology, most applications are far from automatic data interpretation (4) that does not require confirmation, verification, or at minimum ground-truthing by professionals. With continuous training in a feedback loop and secondary verification of AI outputs by archaeologists, machine learning can be fruitfully integrated into more aspects of archaeology.
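The feedback loop described here can be sketched in a few lines, assuming every model output is checked by an archaeologist and the confirmed labels are fed back into the training data. The identifiers, labels, and function names below are hypothetical; the retraining step itself is left out because it depends on the model in use.

```python
def verify(predictions, expert_labels):
    """Pair each model prediction with the expert's label for the same item."""
    records = []
    for item_id, predicted in predictions.items():
        label = expert_labels[item_id]
        records.append(
            {"id": item_id, "predicted": predicted, "label": label,
             "agrees": predicted == label}
        )
    return records


def agreement_rate(records):
    """Share of model outputs that the expert confirmed unchanged."""
    return sum(record["agrees"] for record in records) / len(records)


def extend_training_set(training_set, records):
    """Return a new training set that includes the expert-verified labels."""
    return training_set + [(record["id"], record["label"]) for record in records]


# Hypothetical model outputs and expert verdicts for four figures.
predictions = {"fig_1": "boat", "fig_2": "human", "fig_3": "horse", "fig_4": "boat"}
expert_labels = {"fig_1": "boat", "fig_2": "boat", "fig_3": "horse", "fig_4": "horse"}

records = verify(predictions, expert_labels)
training_set = extend_training_set([("fig_0", "boat")], records)
print(f"expert agreement: {agreement_rate(records):.0%}")
print(f"training examples after feedback: {len(training_set)}")
```

Tracking the agreement rate over successive rounds gives a rough indication of whether the outputs are beginning to need less checking and editing.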

With a consistent feedback system in place, deep learning models can be improved. Eventually, their outputs will require less checking and editing. In time, there may be enough data available to have models that are more than 50% accurate in interpreting subjective materials. So, for now, there should be a focus on creating comprehensive training datasets to help train deep learning models that can be used to create more efficient workflows for tedious tasks such as digitisation and documentation of remote sensing and survey data.

Moving forward, we should try to improve our efficiency with semi-automated data interpretation. Archaeologists should welcome this trend because it does not threaten our jobs. We study our data because we want to know about ancient people and their societies. Just like us, the people inhabiting the past were biased, inconsistent, and unpredictable. Researchers using AI need to take this into account. An understanding of these aspects of the human condition needs a place at the table in our reconstructions. There can be no doubt that AI will become an integral part of archaeology, just like the computer has. Future archaeologists will not give these tools much thought, just like we do not give much thought to using Excel, SPSS, or GIS software. To students who will become these future scholars – we highly recommend learning a programming language, like Python, C++, Lisp, etc.! You will be grateful for these investments in the future.


  • Horn, C., Ivarsson, O., Lindhé, C., Potter, R., Green, A. & Ling, J. 2021, Artificial Intelligence, 3D Documentation, and Rock Art: Approaching and Reflecting on the Automation of Identification and Classification of Rock Art Images, Journal of Archaeological Method and Theory, doi:10.1007/s10816-021-09518-6.

1) Machine Learning is a subfield of artificial intelligence (AI) that uses algorithms to infer or predict patterns from given data, often used for cluster analysis and simple classification or detection tasks, such as edge detection. Machine learning can be supervised (using labelled, or known, data), unsupervised (using data without labels and allowing the computer to make correlations), or semi-supervised (using a dataset with labelled and unlabelled data).
2) Deep Learning is a subfield of machine learning which uses deeper neural networks to learn patterns and representations in data, most often images. Deep learning tasks can include object detection, instance segmentation, and semantic segmentation.
3) Transfer Learning or fine-tuning is a method often used in deep learning that applies information learned in an original training task to new, similar data to improve training metrics. Transfer learning is often used when large, labelled datasets are limited, for example, using a model trained on the ImageNet or MNIST databases and retraining the last layer(s) in a network with different images or new class labels.
4) In the context of archaeological data and AI, automatic data interpretation is the use of computational methods to automatically infer patterns in or draw conclusions and predictions from spatial, 2.5D/3D, or archival data.
