An extended version of this article is also available in German on heise Developer.

 

If you work in a technical role in IT, you have probably heard about, experienced, identified, introduced, or hopefully reduced technical debt. In contrast, Domain Debt is seldom talked about. In this article, I compare the two and explain why Domain Debt needs to be taken more seriously.

What is technical debt?

When I ask what technical debt is, the answers vary. Some say it is the postponement or even the non-execution of certain tasks. Who has not heard that documentation was not updated or code was not tested? Others say it is simply poor programming style, missing coding standards, or the use of antipatterns. However, a common point in all definitions seems to be that technical debt primarily acts as an expense driver. Software that contains technical debt is harder to write and maintain than software without it. Work simply takes longer and is therefore more expensive. The longer technical debt persists, the more expensive its elimination becomes. And writing further code on top of it makes things even worse.

In the software industry, we seem to have gotten used to this kind of debt. We try to fight it, but somehow we cannot avoid it. No judgement!

However, as an industry we rarely, if ever, discuss business-driven debt or, as I call it, Domain Debt. I have never even heard anyone consider such a thing. Or have I?

(Mis)Understanding

Alberto Brandolini once said

“It’s developers’ (mis)understanding, not expert knowledge, that gets released in production.”

If this is true (and I do believe it is), am I correct to claim that developers must become domain experts? I am convinced this should be the case. Even more, I sometimes meet such developers – sorry, domain experts – out there.

But what if developers are not such experts yet? What if developers are still in a learning process, on their way to becoming experts? They make mistakes, because failure is vital to learning. For example, developers sometimes assume that concepts and models from other contexts of the domain are applicable within their current context of work. Sometimes they build inaccurate abstractions and models because they are missing context information: simple things like the big picture, or how their current task relates to the users' needs or the business process. As a result, rework and refactoring are inevitable. And more often than we would like, developers make (probably misled) hundred-thousand-dollar decisions by writing code that does not align with business goals which no one has ever clearly expressed.

Whatever the kind of failure or misunderstanding, it will be reflected in code. We spend time writing code and hopefully gain further insights while doing so. We refactor code, adjust models, reshape contexts, or change teams and responsibilities. This takes time, effort, and money. So, isn't this code an expense driver? Is this debt, that is, domain debt rather than technical debt?

A story on debt analysis

Let me give you a concrete example to explain what I mean.

I was working with a few teams on a large project re-implementing parts of a software system for a railway company. Of course, when working on a railway system you are modelling (sometimes not so) obvious things, such as train runs, segments, route sections, legs, stops, and so on.

We had contexts and use cases which relied heavily on different models of train runs. In one context, a train run consisted of a sequence of stops. A stop basically refers to a train station together with an arrival and a departure time. In other contexts, we had use cases which made use of train runs consisting of a sequence of sections. A section has an origin and a destination station with a departure and an arrival time, respectively. Moreover, there were models where a train run consisted of a sequence of segments, each segment consisting of sections. Others in turn consisted of segments which consisted of stops.

Sometimes these models seemed to be interchangeable, though they differed in subtleties. For example, you could easily transform a sequence of sections into a sequence of stops. However, depending on the context, one model suited a certain use case better than the other.
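To make the difference between the stop-based and the section-based view more tangible, here is a minimal Java sketch. The record names, their fields, and the helper `stopsFrom` are assumptions made for illustration, not the project's actual code. Times are modelled as `LocalTime`, which ignores runs that cross midnight.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified models; names and fields are assumptions for illustration.
record Stop(String station, LocalTime arrival, LocalTime departure) {}

record Section(String origin, LocalTime departure, String destination, LocalTime arrival) {}

final class TrainRunModels {

    // Derives the stop-based view from a contiguous sequence of sections.
    // The first stop has no arrival and the last stop has no departure (null here).
    static List<Stop> stopsFrom(List<Section> sections) {
        List<Stop> stops = new ArrayList<>();
        if (sections.isEmpty()) {
            return stops;
        }
        Section first = sections.get(0);
        stops.add(new Stop(first.origin(), null, first.departure()));
        for (int i = 0; i < sections.size(); i++) {
            Section current = sections.get(i);
            // The departure of an intermediate stop comes from the following section.
            LocalTime departure = i + 1 < sections.size()
                ? sections.get(i + 1).departure()
                : null;
            stops.add(new Stop(current.destination(), current.arrival(), departure));
        }
        return stops;
    }
}
```

Even in this small sketch, one of the subtleties shows up: a stop combines information from two neighbouring sections, so the two views are related but not identical.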

Variations of train run models.


Back then, we had a team which was struggling with the implementation of certain use cases. The implementation felt awkward to them, and every change to the software took quite a while. The team had discussions with an architect and came back with a “new” model. They drew some UML diagrams and were quite happy with their achievements. When it came to estimating the cost and effort of a refactoring (basically a rewrite of core parts), the estimate was quite high. Hence, we tried to figure out a way to refactor incrementally, which would allow them to gain some quick wins upfront.

Reducing debt by gaining insight

We tried to find such quick wins by analyzing the code and especially its change history. By using methods from Adam Thornhill’s book Software Design X-Rays (interestingly subtitled Fix Technical Debt with Behavioral Code Analysis), we could find some hotspots in the code. These hotspots were parts of the code which had a high complexity, had undergone many changes, were highly coupled, and had been touched by nearly all developers at some point.
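A heavily simplified Java sketch of the change-history part of such an analysis could look as follows. It is not the tooling from the book. The `Change` and `FileStats` records and the `rank` method are hypothetical, and the sketch assumes the version control log has already been exported as one record per file and commit. Complexity and coupling are left out.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One file touched by one commit; a hypothetical shape for exported version control data.
record Change(String commitId, String file, String author) {}

// Change statistics for a single file.
record FileStats(String file, int revisions, int authors) {}

final class Hotspots {

    // Ranks files by number of commits, then by number of distinct authors, highest first.
    static List<FileStats> rank(List<Change> changes) {
        Map<String, Set<String>> commitsByFile = new HashMap<>();
        Map<String, Set<String>> authorsByFile = new HashMap<>();
        for (Change change : changes) {
            commitsByFile.computeIfAbsent(change.file(), f -> new HashSet<>()).add(change.commitId());
            authorsByFile.computeIfAbsent(change.file(), f -> new HashSet<>()).add(change.author());
        }
        Comparator<FileStats> byRevisions = Comparator.comparingInt(FileStats::revisions);
        Comparator<FileStats> ranking = byRevisions
            .thenComparingInt(FileStats::authors)
            .reversed()
            .thenComparing(FileStats::file); // stable order for ties
        return commitsByFile.keySet().stream()
            .map(file -> new FileStats(
                file,
                commitsByFile.get(file).size(),
                authorsByFile.get(file).size()))
            .sorted(ranking)
            .toList();
    }
}
```

Files at the top of such a ranking are only candidates. Whether they are real hotspots still needs a look at the code itself and at its history.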

I did some more investigation on these hotspots and found some interesting insights. When I looked at the history of that code, I first found a method introduced which looked similar to this:

List<Section> getSections() {
    // Flatten the sections of all segments into a single list
    return getSegments().stream()
        .flatMap(segment -> segment.getSections().stream())
        .collect(Collectors.toList());
}

This method was used to support delta calculation when train runs change, e.g. when additional stops are planned:

List<Delta> calculateDelta(TrainRun base, TrainRun updated) {
    return calculateDelta(base.getSections(), updated.getSections());
}

And I found a line of code like this:

List<Section> sectionsAdded = detectSectionsAdded();

where detectSectionsAdded uses the calculateDelta method.

Going forward in the history of that code, I came to a point where this last line was changed to

List<Section> stopsAdded = detectSectionsAdded();

This little change in wording made me curious. Were we calculating based on stops or sections here?
I dug into this change and found a corresponding change to the calculateDelta method in the same commit. The method changed to this:

List<Delta> calculateDelta(TrainRun base, TrainRun updated) {
    return calculateDelta(calculateStopsFrom(base.getSections()), calculateStopsFrom(updated.getSections()));
}

List<Stop> calculateStopsFrom(List<Section> sections) {
    // ...
}

And then later in history another change caught my attention. A change which moved the line of code which made me curious to another class and rephrased it to:

List<Stop> stopsAdded = detectStopsAdded(diff);

At first glance, this looked like a simple refactoring. The delta detection algorithm was changed from working on sections to working on stops. But now the code was cluttered with transformations from segments to sections to stops … back and forth.
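For comparison, here is a minimal sketch of what delta detection might look like when it works on stops directly, without going through sections first. The `Stop` record and the body of `detectStopsAdded` are assumptions for illustration and not the team's implementation. In particular, the sketch identifies a stop by its station name only, which would not be enough for a run that visits the same station twice.

```java
import java.time.LocalTime;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stop model; names and fields are assumptions for illustration.
record Stop(String station, LocalTime arrival, LocalTime departure) {}

final class StopDelta {

    // Returns the stops of the updated run whose station does not occur in the base run.
    static List<Stop> detectStopsAdded(List<Stop> base, List<Stop> updated) {
        Set<String> baseStations = base.stream()
            .map(Stop::station)
            .collect(Collectors.toSet());
        return updated.stream()
            .filter(stop -> !baseStations.contains(stop.station()))
            .collect(Collectors.toList());
    }
}
```

With stops as the core of the model, a calculation like this needs no conversion step at all.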

Don’t make the same mistake twice

After seeing this, I understood why the team was struggling with the implementation and could not get up to speed. It became clear to me that we needed to understand whether the algorithm should use stops or sections as the core of the model. I assumed the team had been going through a learning process and had gained some insights while implementing the stop-based algorithm. Our UX designer and the product owner both confirmed my assumption during a review. For both, it was clear that from an end-user perspective only a stop-based algorithm made sense.

Remember that the team wanted to rewrite the service? I took a second look at the proposed “new” model, especially the UML diagrams. At the core of a class diagram I saw a “TrainRun” with “Sections”. And moreover, I saw “Stops” derived from “Sections”. This was the total opposite of what the team had learned and the experts had proposed.

Obviously, I strongly advised against this rewrite. Instead, I proposed some smaller refactorings to get rid of the sections altogether. However, that is not the point. My key takeaway from this was not my dissuasion, but a question that came to my mind.

What am I trying to tackle here?

What am I trying to tackle here? The team wanted a refactoring. I was suggesting a refactoring. Are we not doing refactorings because of technical debt? At least that is what I hear throughout most of our industry: “We need to refactor this to reduce technical debt.”

In contrast, the large codebase built around the existing section-based model was clean, well tested, and followed many software design principles. However, the model was fundamentally wrong because the developers lacked expertise in this part of the domain. As a result of this design choice, the cost of adding every new feature was higher than it would have been with a stop-based model.

What about debt? As said before, debt primarily acts as an expense driver. The initial implementation of the model took time and money. So did the refactorings done by the team. The rewrite would have cost a lot of money, too. So, in this sense, there were a couple of expense drivers.

But was this technical debt? There was no word about microservices, Kafka, Kubernetes, databases, missing documentation, bad coding style, low code coverage … you name it. Just a model that did not suit the use case. Maybe just developers’ (mis)understandings in code. Apart from the fact that code is always a technical artifact, I could not find any technical aspects here.

Domain debt

So this is not technical debt, it’s business understanding debt, or as I like to think of it, domain debt. And there are tons of such debt out there. We as a software industry talk far too little about it. We take far too little responsibility for it.

I see a lot of responsibility within the Domain-Driven Design (DDD) community, though. It carves domains into bounded contexts to keep models consistent and to demarcate them. It maintains a Ubiquitous Language to build a rigorous and common language for everyone involved in the project. And it offers methods like Event Storming or Domain Storytelling to get a quick basic understanding of the domain even in early product or project stages.

We have the tools and methods at hand to prevent such Domain Debt. Now developers, product owners, project managers, and everyone involved in our industry, please take your responsibility to prevent and tackle Domain Debt.

 

 

 

Acknowledgements

Thank you, Nick Tune, for reviewing the first draft, and for your feedback and contribution.