
The problem
Geological logging of diamond drill core underpins our understanding of the subsurface in mining projects. The process involves describing lithology, alteration, mineralisation, and many other features. Early on, it can feel natural to record every subtle detail. But as a project moves from exploration into feasibility and production, the logging schema – the set of classes and terms used to describe the rock – tends to shift from a bottom-up, observation-based approach to a top-down, model-driven one. This evolution often introduces confusion, creates “noise,” and can end up not doing what it is meant to do – support people making decisions within the operation.

We see a lot of logging schemas at Datarock, and some common issues include:

  • Too many classes lumped into a single field, e.g. 57 options within the Lith1 column.
  • Classes without meaningful differences between them, making consistent logging impossible.
  • Classes with large internal variability that should be split.
  • Classes relying on unobservable geological knowledge (e.g. relative stratigraphic position).

Most importantly, many of these classes are not linked to a clear business requirement. Early in a project it seems logical to record every detail, as we don’t yet know what’s important. Later, though, many classes might not align with what actually matters: improved recovery, cost savings, or better geotechnical decisions. When we are exploring, it may seem important to map every variation of mineral assemblage within our granodiorites, but over time we should review whether that’s actually useful. Our logging should evolve as our understanding does.

Geologists are classifiers too
It helps to think of a geologist as a kind of machine learning (ML) classifier (there are a lot of geologists on LinkedIn who will not appreciate this, but bear with me). An ML model assigns data to categories, and if there are too many categories, it struggles – each class demands more training data and computational effort. Similarly, a geologist faced with an overly complex schema must spend extra time and brainpower distinguishing categories that don’t add real value. The result is less consistent, slower logging, and lower-quality data.

An important rule of thumb for a machine learning model: if you have two classes, you might need 100 examples of each to build a classifier. If you have 10 classes, you don’t need 100 examples of each – you likely need 500-1000. Redundant classes make life hard.
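To make this concrete, here is a minimal sketch (ours, not from any Datarock pipeline) using synthetic scikit-learn data: holding labels fixed at 100 per class, a simple linear classifier tends to lose accuracy as the class count grows. Every parameter here is an illustrative assumption.

```python
# Toy illustration: with labels fixed at 100 per class, a simple linear
# classifier typically loses accuracy as classes are added.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

for n_classes in (2, 5, 10):
    X, y = make_classification(
        n_samples=100 * n_classes,  # 100 examples per class, held fixed
        n_features=20,
        n_informative=10,
        n_classes=n_classes,
        random_state=0,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{n_classes:>2} classes, 100 labels each: test accuracy {acc:.2f}")
```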

Additionally, and this is our opinion, if geologists are too focused on what they are logging, then it becomes less likely they are thinking about why they are logging it. Without the context of how the data is being used, it is much easier to “forget” the end customer and hand over a lower-quality product.

This doesn’t mean we ignore geological complexity or stop geologists from being scientists. Early on it’s important to capture that complexity, and later in the cycle a geologist’s job is to ensure that our geological understanding of the project remains as accurate and clear as possible. But from a data-capture point of view, as the project develops we need to focus on the parts that really matter for decisions and outcomes.

A business-driven approach
Data scientists know that a good classification scheme starts with the problem it needs to solve. If the challenge is improving ore recovery, then the classes should highlight factors that affect recovery, not subtle but irrelevant geological details.

For geological logging, the same principle applies. If clay content affects processing, define classes that capture clay-related features. Keep these initial categories broad, and only refine them if further analysis shows it will help improve outcomes. This approach mirrors how ML solutions develop: begin simply, and only add complexity if justified.
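As a hypothetical sketch of what “broad first” can look like in practice, the snippet below collapses a granular legacy schema into a handful of processing-relevant classes. Every code and grouping here is invented for illustration.

```python
# Hypothetical sketch: collapse a granular legacy schema into a few broad,
# processing-relevant classes. All codes and groupings are invented.
import pandas as pd

BUSINESS_CLASSES = {
    "GRD1": "fresh_host", "GRD2": "fresh_host", "GRD3": "fresh_host",
    "GRDK": "clay_altered", "SAPR": "clay_altered",
    "QVN": "vein_dominated",
}

logs = pd.DataFrame({
    "hole_id": ["DH001", "DH001", "DH001"],
    "from_m": [0.0, 12.5, 30.0],
    "lith1": ["GRD2", "SAPR", "QVN"],
})

# Broad classes first; only split (e.g. clay_altered into subtypes) if
# further analysis shows the split improves a business outcome.
logs["material_class"] = logs["lith1"].map(BUSINESS_CLASSES)
print(logs)
```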

With ML models, today’s geological logging can be more dynamic than ever. This approach brings the process full circle, enabling real feedback loops – where processing and downstream datasets inform and refine logging, and drill core imagery can be reclassified again and again to solve business problems at every stage as rock moves through the operation.

What we have learnt at Datarock
Datarock initially tried replicating traditional lithology logs, building ML models that mirrored large, complex schemas. These models required many labels, were tough to train, and often underperformed. They were like a junior geologist trying to master a complicated schema – overloaded and inefficient.

We now do things differently. We work with companies to define models tied directly to business outcomes. Whether it’s targeting certain ore zones, mapping marker units, or identifying material types that affect processing, we focus on what influences economic results. We drop irrelevant classes and avoid making one model solve multiple unrelated problems. This targeted approach makes our models more accurate, easier to understand, and far easier to train.

When a new business question arises, we simply add a new model or carefully introduce a new class – rather than forcing one model to do everything. 
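Here is a toy sketch of that “one small model per business question” idea. All questions and class lists are invented, and nothing here reflects Datarock’s actual model structure.

```python
# Toy sketch: one small model per business question, rather than one
# monolithic lithology model. All questions and class lists are invented.
from dataclasses import dataclass

@dataclass
class TargetedModel:
    question: str       # the business question this model answers
    classes: list[str]  # a deliberately small class list

registry = [
    TargetedModel("Is this interval ore zone?", ["ore", "waste"]),
    TargetedModel("Is this a marker unit?", ["marker", "other"]),
    TargetedModel("Will this material slow the plant?", ["clay_rich", "clean"]),
]

# A new business question means a new small model, not a bigger schema.
registry.append(
    TargetedModel("Is this interval geotechnically weak?", ["weak", "competent"])
)
```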

Digital geology

One of the primary drivers of how we have historically logged in the core shed is the fact that physical drill core logging was a one-shot event. Once logged, the data and categories were locked in, and it was very difficult to go back and audit or change those logs. By contrast, digital data – like geochemistry, scanner output, and core imagery – can be revisited and refined.

This flexibility allows for continuous improvement. Manually, or with automated methods, we can start with a lean, business-focused schema, review the results, and expand only if the data justifies it. This approach not only supports operational goals – like improving recovery or cutting costs – but also gives geologists confidence to evolve their logging. If a new feature proves important later, there’s a path to reclassify the backlog. We no longer need to capture everything up front. Logging is no longer fixed – it can grow as understanding grows. Align the schema with the mine’s business objectives, focus on a few key signatures, and adjust as necessary.
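As a rough sketch of what reclassifying the backlog could look like, assuming a trained scikit-learn-style classifier and some feature-extraction step – both are placeholders for whatever pipeline is actually in use, not Datarock’s API:

```python
# Rough sketch of "reclassify the backlog": re-run a lean schema's trained
# classifier over historical core imagery. `model` and `featurise` are
# placeholders, not a real product API.
from pathlib import Path

def reclassify_backlog(model, featurise, image_dir: Path) -> dict[str, str]:
    """Return {image name: predicted class} for every historical core photo."""
    results = {}
    for img_path in sorted(image_dir.glob("*.jpg")):
        features = featurise(img_path)             # e.g. a CNN embedding
        results[img_path.name] = model.predict([features])[0]
    return results
```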

A practical example: Datarock worked with a mine that had logged its veins into two classes. One consultant visit later (they have to change something), the mine had five classes. Traditionally, that new schema could only be logged on new holes, or very slowly from the photos. We built a model to redefine their vein classes and applied it to over a million metres of historical drilling, allowing them to analyse the classes spatially and decide whether the extra detail added any value. If the classes don’t actually help, don’t log them going forward (automatically or manually). Without that testing, there is a good chance those five vein classes would be logged forever, until no one could remember why they were logged in the first place.

Learnings

Logging core manually is a lot of work, and it’s made harder by inefficient logging schemas. In this age of drill core digitalisation, we believe there should be a change in how these schemas are set up and maintained over the life of an asset. Datarock’s experience shows that moving from full lithology replication to targeted, outcome-driven classification models leads to better results – clearer decisions, easier training, and more meaningful data. This approach sets both geologists and ML models on the path to delivering real, measurable value.