Tomammon.net – Operational Intent

This is Part 5 in a 7-part series discussing the www.tomammon.net online resume application. Check out the architectural overview for some context, or see the links at the end of the article to navigate to other posts in this series.

What is “Operational Intent”?

I see Operational Intent as a set of maintenance practices that go along with the decisions made during the design of a system. It’s the stuff you have to do to keep the system running smoothly. If the Operational Intent is created at the same time you are designing the network, you are far more likely to design something that the business can actually consume and use as a competitive advantage. Hint: the person designing the network should not be the principal author of Operational Intent.

For a more concrete example, consider the routing policy portion of a network design. Let’s say that route redistribution between multiple routing domains is a part of your design. There are a number of ways to control the flow of prefixes from one domain to another, each with its own pros and cons. Once you have selected a method for controlling the redistribution of routing information, you have created an entity that has to be cared for and looked after. For example, if you chose prefix lists to control the routing information flow, you have to understand when and how to maintain these prefix lists. To determine Operational Intent for this part of the design, document the answers to questions like these:

  • When and why will the prefix lists be modified?
  • How specific should the matches in the prefix lists be? Should they always be exact matches, or can we use variable prefix lengths? Does it even matter how long the match is, so long as it matches the new prefixes being added?
  • How will our automation framework interact with this component? Is the name of the prefix list constructed using some type of logic? Or is the name just known in a database or some other state store? Should humans normally be touching this part of the configuration?
  • If human operators are going to be touching this, what is the minimum level of skill that I, as the designer, assume they will possess? If the operators’ skill changes over time, does that have practical consequences for the function of the system down the road?
  • What are the consequences to the stability of the system if this prefix list is accidentally deleted? What if it’s blown wide open with something like “permit 10.0.0.0/8 le 32”?
  • As parts of the network are divested or consolidated, will this component be audited and cleaned up, and if so, how?
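
To make the “blown wide open” question a little more concrete, here is a minimal Python sketch that models a single prefix-list line and flags entries that match a very wide range of prefix lengths. It assumes the common Cisco-style meaning of ge and le. The names PrefixListEntry, parse_entry, and is_wide_open, along with the max_spread threshold of 8, are hypothetical choices for illustration and do not come from any real tool.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PrefixListEntry:
    """One 'permit' line of a prefix list (hypothetical model, not a vendor API)."""
    network: ipaddress.IPv4Network
    ge: Optional[int] = None  # minimum prefix length matched, if set
    le: Optional[int] = None  # maximum prefix length matched, if set

    def length_range(self) -> Tuple[int, int]:
        """Return the (shortest, longest) prefix lengths this entry matches."""
        base = self.network.prefixlen
        if self.ge is None and self.le is None:
            return base, base  # no ge/le means exact-length match only
        low = self.ge if self.ge is not None else base
        high = self.le if self.le is not None else 32
        return low, high

    def matches(self, candidate: ipaddress.IPv4Network) -> bool:
        """True if the candidate falls inside the network and the length range."""
        low, high = self.length_range()
        return candidate.subnet_of(self.network) and low <= candidate.prefixlen <= high


def parse_entry(text: str) -> PrefixListEntry:
    """Parse a line such as 'permit 10.0.0.0/8 le 32' (assumed simplified syntax)."""
    tokens = text.split()
    if len(tokens) < 2 or tokens[0] != "permit" or len(tokens) % 2 != 0:
        raise ValueError(f"unsupported prefix-list line: {text!r}")
    options = {"ge": None, "le": None}
    for keyword, value in zip(tokens[2::2], tokens[3::2]):
        if keyword not in options:
            raise ValueError(f"unknown keyword: {keyword!r}")
        options[keyword] = int(value)
    return PrefixListEntry(ipaddress.IPv4Network(tokens[1]), options["ge"], options["le"])


def is_wide_open(entry: PrefixListEntry, max_spread: int = 8) -> bool:
    """Flag entries whose matched length range is wider than max_spread bits."""
    low, high = entry.length_range()
    return high - low > max_spread
```

Run against every entry in a rendered configuration, a check like this could feed the audit and “what if” questions above. The right threshold depends entirely on the design, which is exactly the kind of assumption an OI document should record.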

It’s not always possible to know the answers to these types of questions in advance, but they serve as a way to bring the design of the system down to operational reality. If you are designing something and can’t answer most of the questions above, or if the answers trouble you, that’s a signal that you need to rethink the design. If you are an operator, and the answers to these questions are unclear or are wildly different when asked of different members of your team, that’s a sign that something isn’t right with the design.

Another valuable role of an OI document is to connect the expertise of the system designer to the expertise of the operator. In my career I have met some very talented operators who could run circles around me in their ability to monitor and automate the networks I have built. Working with individuals like that is a very rewarding experience for me, because my designs almost always become more practical and more functional after they review them from an operational perspective. The OI document allows the designer to spell out their assumptions about how the system will be maintained. That communication of assumptions cannot be overvalued.

Operational Intent for Tomammon.net

Here’s a representative sample of what an OI document might look like for Tomammon.net. Since the networking is pretty simple, we’ll focus on the application services.

Maintenance of Static Content and Application Code

  • Static content and application code are all hosted on GitHub in the tommmonet repo.
  • Changes made to any of these elements are pushed up to the GitHub repo from the dev environment, and then pulled down by each Content Node independently.
  • For risky changes, a Content Node can be pulled out of the round-robin DNS A record before pulling the new code down, and then tested. Rolling back to an older, functional version of the code is handled by git on the Content Node.
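
The pull, test, and roll back steps for a risky change can be sketched in Python as follows. This is only an illustration under some assumptions: the node has already been removed from the round-robin A record by some other means, the code lives in a git working copy at repo_dir, and smoke_test is a hypothetical callable that returns True when the node looks healthy. The git subcommands used (rev-parse, pull --ff-only, reset --hard) are standard, but the function names here are made up for the example.

```python
import subprocess
from typing import Callable


def run_git(repo_dir: str, *args: str) -> str:
    """Run a git command inside repo_dir and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        check=True,           # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def update_node(
    repo_dir: str,
    smoke_test: Callable[[], bool],
    git: Callable[..., str] = run_git,
) -> bool:
    """Pull new code on one Content Node; roll back if the smoke test fails.

    Assumes the node is already out of the DNS rotation (not handled here).
    Returns True if the new code was kept, False if it was rolled back.
    """
    previous = git(repo_dir, "rev-parse", "HEAD")  # remember the known-good commit
    git(repo_dir, "pull", "--ff-only")             # fetch and fast-forward only
    if smoke_test():
        return True
    git(repo_dir, "reset", "--hard", previous)     # return to the known-good commit
    return False
```

Passing the git runner in as a parameter keeps the logic testable without touching a real repository. Putting the node back into the A record is left to the operator once update_node returns.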

Monitoring the Application and Database Layers

  • The app server can be monitored directly via the “testapi” call; see the code example for the API below.
  • The database contains a static table with generic content, which is retrieved using the “testdb” call; see the code example for the API below.
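
As a rough illustration of how a monitor might use these two calls, here is a minimal Python sketch. The URL layout (a base URL followed by /testapi or /testdb), the use of HTTP 200 as the success signal, and the function names are all assumptions made for the example; the real API may look different. It also assumes that “testdb” is served through the app server, so a failing “testapi” makes the “testdb” result uninformative.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_get(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except OSError:
        return 0         # DNS failure, refused connection, timeout, etc.


def check_node(base_url: str, fetch: Callable[[str], int] = http_get) -> Dict[str, bool]:
    """Call both test endpoints on one node and record which ones succeeded."""
    return {call: fetch(f"{base_url}/{call}") == 200 for call in ("testapi", "testdb")}


def diagnose(results: Dict[str, bool]) -> str:
    """Turn the two results into a coarse description of where the fault is."""
    if not results["testapi"]:
        return "app server unreachable or unhealthy"
    if not results["testdb"]:
        return "app server up, database path failing"
    return "healthy"
```

A monitor could call check_node once per Content Node and alert on anything other than "healthy". The fetch parameter exists so the logic can be exercised without a live server.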

Common Database Problems and Solutions

  • If connectivity problems arise in the transport network between the Content Nodes and the master database, replication problems can result. Before addressing these problems, confirm that network connectivity is healthy and stable between the slave and master.
  • Database replication problems can often be cleared by simply restarting replication from the MariaDB client (on the slave side, in the Content Node inside the tandb_slave container), using these commands:

 

Final Thoughts

If there is a healthy balance of power between the designers and the operators of systems, the concepts of Operational Intent can produce real wins for the business. At its heart, OI is about collaboration between designers and operators as equal partners, enabled by open communication about technical decisions and requirements.

In the next article in the series, I’ll make some confessions about the weaknesses and problems of the resume application and its supporting infrastructure.

Your documentation will be read by almost nobody. Write it anyway.

Documentation is a chore that IT professionals hate doing. For most of us, it’s right up there with picking staples out of the carpet or scrubbing the tile around the toilet in your home. If you’re nodding in agreement right now, may I humbly suggest a different perspective?

Documentation can provide some substantial benefits, but you will probably have to adjust your ideas about how and when to create it to realize those benefits. Let’s talk about some of the reasons, spoken and unspoken, that we don’t ever seem to find the time to document.

“I am a senior engineer. It’s not a good use of the company’s opex to have me doing something a technical writer or junior engineer should be doing.”

The more “senior” you are, the more your documentation has the potential to positively influence your organization. A network design engineer, for example, will often find flaws in their design while drawing the diagrams that explain the solution they are building. This has certainly been the case with me whenever I have worked in that role. This implies, then, that the time to document is not after the design work is done, but rather, during the design effort. How cool would it be to find your problems and mistakes before you present budget numbers to your management or customer, rather than after? If you’ve ever felt the pit in your stomach when you realize that you’ve forgotten a license or goofed up the math on the number of boxes, you know what I’m talking about.

Also, it is highly unlikely that a technical writer or junior engineer can put together documentation with anywhere near the efficiency that you can. Perhaps you are insecure about your ability to effectively communicate in writing what is in your head. Maybe the hundreds of buttons and knobs in Visio turn what should be a 5-minute task into a 2-hour ordeal for you. Whatever your hangup is, it is well worth it to you and your career to figure out how to overcome it.

“The network is the documentation”

I have some sympathy for this point of view, because a well-designed and well-implemented network will be configured consistently. Someone who truly understands the protocols and how they interoperate can look at the configuration of a couple of devices and pretty well predict what the original designer’s objectives were.

This ideal network exists almost nowhere, however. Networks experience configuration “decay” or “drift” over time, which happens for a variety of reasons. There is a critical shortage of networking professionals who actually understand networking at a fundamental level, so it is unlikely (statistically speaking) that your organization employs very many of them. Good docs can be the thing that makes it possible for a less experienced engineer to succeed in their assigned tasks. Good docs can help an experienced and truly senior engineer cut past hours of trying to read your mind and actually accomplish a complex change or troubleshoot a difficult problem.

If you are working in one of the rare shops that has embraced DevOps and the principles of CI/CD in their network infrastructure, the documentation is even more important to you, because it lays the foundations for the code that will have to be written to bring your solution to life.

“I don’t have time to write docs”

This argument is the first cousin of the much-maligned and super-lame “I don’t have time to automate”. Good documentation allows an implementation engineer to successfully implement something that you designed without having to redo all of the mental gymnastics that you went through to come up with the design, and this is a good thing. If you wear both hats, then good docs help you quickly get back to that mental zone you were in when you created the design. Humans forget things at an alarming rate, and you are likely not immune to the forgetting.

If your team is large, or if the rate of change in your network infrastructure is high, you can accomplish impressive economies of scale with good docs. Without good docs, engineers will continue to run like hamsters on a wheel, redoing the same work over and over again. If the success of other engineers matters to you (and it should if you have any vision about where our industry is headed), then you will see that your time is well spent writing good documentation.

“If the other engineers were smart enough, they wouldn’t need me to babysit them with a diagram or wall of text about how networking works – they should be able to figure it out on their own.”

This kind of statement is usually born of hubris. An engineer who says or thinks this is likely deluded about their own awesomeness, and is quick to talk down to or about other “less awesome” engineers. If this is you, I strongly urge you to reconsider your position. You never know which of those junior engineers has had it with your treatment of them and is working their tail off at night and on the weekends, while you’re relaxing, to catch up with you. When they do catch you (and it will come faster than you think), you will end up leaning on your position of authority instead of your expertise, and you’ll lose credibility.

A much better course would be to learn some humility and write down your thoughts and reasoning, and maybe even draw a few diagrams, in the spirit of helping the junior guys along. You may end up with an ally who is committed to, and proficient at, doing all of the tasks you don’t want to do, and everyone will win.

“If I document the design of the network, I’m giving away the secret sauce. People will use it against me. Or worse, I will lose my relevance and be seen by management as not vital to their success.”

It’s true that documentation can be used for political purposes, sometimes even at multiple levels of management above you. I have personally experienced having my docs used in this way – but it has never hurt me.

Some engineers fear that their manager will see them as a replaceable cog in the machine if they document what they’re doing. Or they think that if nobody has to come to them personally for their wisdom about how to build or fix the network, they will lose power and influence. My experience is the opposite. Having my name on the docs has always given me more influence, not less. I’ve had good managers and bad managers, but I’ve never had a manager that tried to manage me out or give me less interesting work because I shared my knowledge with my peers. In the end, the things that make up our experience and expertise can’t be distilled down into a document – but the explanation of our designs can be, and should be.

Documentation does not make you replaceable – it makes the work that you do consumable. If it’s easily consumable, it’ll very quickly become the standard that everyone uses.