Foundations of MPLS: Label Switching

In this article I will use the terms LSR, Edge LSR, Ingress LSR, Egress LSR, P, PE, and LSP. I won’t spend much time defining them. For a review of the terminology, here’s a decent summary video. My aim with this piece is to illustrate the fundamentals of label switching by looking inside a functional MPLS network with CLI commands and packet captures.

Whenever you see a packet capture image in this article, you can click on it to go to that packet capture’s location in my github repo to download it. Sometimes it can be handy to explore the packet capture with your local copy of wireshark.

In this post, we will follow an ICMP packet from the time it is sent from router PE3, through the network, to when it lands on its destination, router PE4. You may want to open the below diagram in another window so that you can keep it in sight as you read through the rest of the article.

The label-switched path (LSP) that our ICMP packet follows looks like this:

A few facts:

  • The LSP only exists in the MPLS domain. In simple, classical MPLS the beginning and end of the LSP are always on an Edge LSR (or PE) router. Even if we were pinging from Host8 to Host7, the LSP would still only exist on routers PE3, P1, P2, and PE4.
  • The LSP is unidirectional. We are concerned only with the path the packet takes towards router PE4. The packet coming back will take a different LSP.

How does traffic know to follow this LSP?

To answer that, let’s see what the packet looks like as it travels along the LSP. Here it is when it exits router PE3’s Gi1 (downstream) interface.

An MPLS header with a label value of 102 has been inserted between the ethernet and IP headers. This is what the router is using to forward the packet.

The forwarding table in the router shows that label 102 is assigned to the prefix that we are pinging, 4.4.4.4/32:
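On a Cisco router, that table comes from show mpls forwarding-table. Here’s a representative sketch – the outgoing label of 102 is the real one from our capture, while PE3’s local label, byte count, and next hop are illustrative placeholders:

    ! Illustrative sketch -- only outgoing label 102 is taken from the capture
    PE3#show mpls forwarding-table 4.4.4.4 32
    Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
    Label      Label      or Tunnel Id     Switched      interface
    305        102        4.4.4.4/32       0             Gi1        10.0.13.1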

The next time this packet will be on the wire is between P1 and P2. Let’s take a look at it as it leaves P1’s Gi2 (downstream) interface:

An MPLS header is again seen, but this time we see a different label value. Label 102 has been swapped out for Label 201. Now let’s look at the forwarding table on P1.
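A sketch of P1’s entry – labels 102 and 201 are the real values from our captures, while the next hop and byte count are illustrative:

    ! Illustrative sketch -- labels 102 and 201 are from the captures above
    P1#show mpls forwarding-table 4.4.4.4 32
    Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
    Label      Label      or Tunnel Id     Switched      interface
    102        201        4.4.4.4/32       1234          Gi2        10.0.12.2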

You’ve probably noticed there are not one but two label values associated with a prefix at each LSR – a Local Label and an Outgoing Label.

This is a good time to stop and ask the question, on that first router, PE3, how did label 102 get picked? Two discrete things are going on with label 102:

  1. The label value 102 (from router PE3’s perspective) has to be assigned by something. That something is the label manager process running on the downstream router, P1 – the next hop toward the destination. In most vendor implementations, the LDP process is responsible for assigning labels to prefixes.
  2. The label value 102 is chosen by P1 to represent the prefix 4.4.4.4/32. We can clearly see from our show command output above that PE3 knows about that mapping between 4.4.4.4 and label 102. How did it learn that? LDP is one protocol that MPLS LSRs can use to tell each other about which labels are assigned to which prefixes.

Router P1 allocated label 102 to the prefix 4.4.4.4/32, and then turned around and used LDP to tell router PE3 that if it wants to send packets to 4.4.4.4, it’s going to have to put label 102 in the MPLS header.

LDP Quick Summary

  • LDP discovers neighbors using multicast hellos and then nails up TCP sessions between all discovered neighbors using port 646. This TCP session is what the LSRs use to exchange their label bindings. A label binding is simply the mapping between a prefix and a label value.
  • The most common configuration of LDP will only discover directly-connected neighbors.
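For reference, getting LDP running on a Cisco IOS router usually takes just a few lines. A minimal sketch, assuming CEF is already enabled – the interface name and addressing are placeholders, not the lab config:

    ! Minimal illustrative sketch -- interface and addressing are placeholders
    mpls label protocol ldp
    !
    interface GigabitEthernet1
     ip address 10.0.13.3 255.255.255.0
     mpls ip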

The last time we will see the packet on the wire will be as P2 forwards it towards the final destination on PE4. This will reveal an important behavior of MPLS forwarding.

The MPLS header is gone, even though the P2 – PE4 link is an MPLS link. Why is this happening? This behavior is known as Penultimate Hop Popping (PHP). To understand why PHP is useful, check out this and this. We can see that PHP was specifically requested by router PE4 by examining the forwarding table on router P2 and the label bindings table on router PE4.
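Here’s a sketch of P2’s entry – the local label of 201 matches our capture, and Pop Label in the Outgoing Label column is the behavior we care about (byte count and next hop are illustrative):

    ! Illustrative sketch -- local label 201 is from the capture above
    P2#show mpls forwarding-table 4.4.4.4 32
    Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
    Label      Label      or Tunnel Id     Switched      interface
    201        Pop Label  4.4.4.4/32       1234          Gi1        10.0.24.4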

Router P2 knows that it needs to perform PHP, as shown by Pop Label in the Outgoing Label field. But how did it learn that it needs to pop the label for traffic destined to this prefix? The same way that it learns all of the other labels – the downstream router (PE4) told it to, by use of a special label called “implicit-null”. Here’s the relevant section of the label bindings table on router PE4:
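A sketch of that section – the imp-null local binding is the real point here, while the LSR ID and rev number are illustrative:

    ! Illustrative sketch -- the imp-null local binding is the point of interest
    PE4#show mpls ldp bindings 4.4.4.4 32
      lib entry: 4.4.4.4/32, rev 4
            local binding:  label: imp-null
            remote binding: lsr: 2.2.2.2:0, label: 201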

Router PE4 has allocated a label with a value of 3, which is a reserved label. This label (shown as imp-null in the example above) instructs the upstream neighbor – here P2, the penultimate hop – to pop the label and then use its unicast IP routing table to forward the packet to the tail end of the LSP (router PE4).

Let’s run a traceroute to see what the LSP looks like from end to end. Cisco’s implementation shows us which labels are used on each hop.
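A sketch of that traceroute – the labels are the ones we saw in the captures, while the hop addresses are placeholders:

    PE3#traceroute 4.4.4.4
      1 10.0.13.1 [MPLS: Label 102 Exp 0] 2 msec 2 msec 2 msec
      2 10.0.12.2 [MPLS: Label 201 Exp 0] 2 msec 2 msec 2 msec
      3 10.0.24.4 3 msec 3 msec 2 msec

Notice that no label is reported on the final hop – that’s PHP at work again.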

Now let’s add the labels for 4.4.4.4/32 on each hop to our diagram to show the complete end to end LSP.

The Role of the IGP

You probably have a lingering question in the back of your mind at this point – “How does the router know about the destination prefix (4.4.4.4/32) in the first place?” The answer is simple – the IGP provides the prefixes, like it normally does in a non-MPLS network. In this example, we are using IS-IS as our IGP. Here’s what the routing table looks like on PE3:
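A sketch of the relevant entries – IS-IS routes show up with an administrative distance of 115; the next hops and metrics here are illustrative:

    PE3#show ip route isis
    ...
    i L2  2.2.2.2/32 [115/20] via 10.0.13.1, GigabitEthernet1
    i L2  4.4.4.4/32 [115/30] via 10.0.13.1, GigabitEthernet1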

Labels are assigned based on which prefixes are in the routing table. If we look at the full label bindings table on PE3 we see that a label has been assigned for each of the above routes:

Which IGP should you use in your MPLS network? Anything will work – even RIP or static routes if it comes down to it. For some of the more advanced features, such as traffic engineering, you’ll need to run a link-state IGP. But for basic implementations, it matters not how the routes get into the routing table, as long as they get there. Choosing an IGP is an entire discussion on its own, so I won’t go into it here.

So What?

If running label switching in the network results in the packets taking the exact same path as would be accomplished just using a flat IGP, why would you do it?

MPLS applications (such as L3VPN, L2VPN, or Traffic Engineering) run over the top of a label-switched infrastructure. The LSP we have been using in this article is a transport mechanism, and is generally not useful all by itself. For example, if you were going to run L3VPNs on your MPLS network, you would see an additional label on the label stack as the packets transit across the network. But we have to have the label switching foundations laid before we can start using some of these other applications.

Why Does This Matter?

  • Label switching provides the foundations upon which MPLS applications rest.
  • To quickly troubleshoot issues with MPLS forwarding, you must understand how LSPs are built.
  • One control-plane protocol that builds LSPs for us is LDP – there are other choices, such as RSVP. It’s also possible to build LSPs by carrying labels inside of BGP, as is the case with L3VPNs.

Tomammon.net – The Ugly Bits

This is Part 6 in a 7-part series discussing the www.tomammon.net online resume application. Check out the architectural overview for some context, or see the links at the end of the article to navigate to other posts in this series.

In this post we’ll take a look at some of the design problems in the resume application.

Some Obvious Problems

The first set of problems comes from the low-budget nature of the project. There is no proper datacenter for most of the components, and the hypervisors running the data warehouse and one of the Content Nodes have lots of single points of failure in their power, cooling, and network connectivity. There’s also no redundancy for internet connectivity. The system is built to be able to handle these points of failure – for example, if the data warehouse goes away for a short period, both Content Nodes will continue to serve content, but no updates to the data can be made. But in general, this is not a robust application in terms of infrastructure.

Backend and Container Networking

The use of remote access VPN as a backend network (for remote Content Nodes) is hokey at best. But, when you’re on a shoestring, you do what you have to do to get things working. On the container side, I could have used one of the available overlays to connect the slave databases to the master, at a container level, instead of letting Docker do its thing with iptables. I chose not to go that route in order to speed up delivery of the project.

DNS Issues

The DNS infrastructure is somewhat fragile.

  • The Content Node HA mechanism is round-robin DNS. I have plans to build some monitoring logic to pull out the A record representing a non-responsive content node (see the sketch after this list), but this is not currently implemented. This means that if one of the content nodes goes offline, some percentage of client traffic will end up getting dropped on the floor.
  • Even when the monitoring logic pulls the record, most browsers cache DNS data independently of the OS, and they also don’t necessarily care about what the stub resolver in the OS is doing after they get an answer. Just because the TTL is set to 300 seconds doesn’t mean that the browser’s DNS cache is going to expire it that quickly. Even if it did, 5 minutes is a long time to wait for a failed content node to disappear.
  • DNS is hosted entirely on Route53. ’nuff said.
  • There is no real CDN. If you happen to be in Europe and if your stub resolver happens to pick the Romanian content node, you’ll get a little better performance than if it picked the North American node. But that choice is not deterministic based on your location.
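For what it’s worth, here’s a minimal sketch of what that planned (and, again, not yet implemented) record-pulling logic might look like in Python with boto3 – the function shape, zone ID, and record name are all assumptions for illustration:

    # Hypothetical sketch of the planned health logic -- not running in production
    import boto3

    def keep_only_healthy_nodes(zone_id, record_name, healthy_ips, ttl=300):
        """Rewrite the round-robin A recordset to contain only healthy node IPs."""
        r53 = boto3.client('route53')
        r53.change_resource_record_sets(
            HostedZoneId=zone_id,
            ChangeBatch={
                'Changes': [{
                    'Action': 'UPSERT',
                    'ResourceRecordSet': {
                        'Name': record_name,
                        'Type': 'A',
                        'TTL': ttl,
                        'ResourceRecords': [{'Value': ip} for ip in healthy_ips],
                    },
                }]
            },
        )

Even with something like this in place, the browser caching issues described above would still apply.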

Application Code

While debugging a different, unrelated problem with the web service containers, I realized I had a single point of failure between the web server and app server layers inside the content nodes. You can find this in the index.php code on my github account. Do you see it? Hint: look at line 15 in index.php. This relates to how I am using mounts in docker to keep the code on the docker host instead of inside each container. I can think of a few ways to mitigate this defect, but do you have ideas? If so, put them in the comments below or hit me up on twitter and I’d love to discuss them with you.

I’m sure that I’m missing other problems with my design, and I would love to hear any feedback that anybody wants to share. My purpose in building the application has been to broaden my horizons and try to get a taste of life outside my silo. If you are someone who builds web applications, I’d love to hear any criticism you might have – I’m sure I will benefit greatly from it.

The next post is the last in this series. I’ll summarize what I have learned and how my perspective has changed, and hopefully that will inspire you to look outside your particular silo as well.

Tomammon.net – Operational Intent

This is Part 5 in a 7-part series discussing the www.tomammon.net online resume application. Check out the architectural overview for some context, or see the links at the end of the article to navigate to other posts in this series.

What is “Operational Intent”?

I see Operational Intent as a set of maintenance practices that go along with the decisions made during the design of a system. It’s the stuff you have to do to keep the system running smoothly. If the Operational Intent is created at the same time you are designing the network, you are far more likely to design something that the business can actually consume and use as a competitive advantage. Hint: the person designing the network should not be the principal author of Operational Intent.

For a more concrete example, consider the routing policy portion of a network design. Let’s say that route redistribution between multiple routing domains is a part of your design. There are a number of ways to control the flow of prefixes from one domain to another, each with their pros and cons. Once you have selected a method for controlling the redistribution of routing information, you have created an entity that has to be cared for and looked after. For example, if you chose prefix lists to control the routing information flow, you have to understand when and how to maintain these prefix lists. To determine Operational Intent for this part of the design, document the answers to questions like these:

  • When and why will the prefix lists be modified?
  • How specific should the matches in the prefix lists be? Should they always be exact matches, or can we use variable prefix lengths? Does it even matter how long the match is, so long as it matches the new prefixes being added?
  • How will our automation framework interact with this component? Is the name of the prefix list constructed using some type of logic? Or is the name just known in a database or some other state store? Should humans normally be touching this part of the configuration?
  • If human operators are going to be touching this, what is the minimum level of skill that I, as the designer, assume they will possess? If the operators’ skill changes over time, does that have practical consequences for the function of the system down the road?
  • What are the consequences to the stability of the system if this prefix list is accidentally deleted? What if it’s blown wide open with something like “permit 10.0.0.0/8 le 32”?
  • As parts of the network are divested or consolidated, will this component be audited and cleaned up, and if so, how?

It’s not always possible to know the answers to these types of questions in advance, but they serve as a way to bring the design of the system down to operational reality. If you are designing something and can’t answer most of the questions above, or if the answers trouble you, that’s a signal that you need to rethink the design. If you are an operator, and the answers to these questions are unclear or are wildly different when asked of different members of your team, that’s a sign that something isn’t right with the design.

Another valuable role of an OI document is to connect the expertise of the system designer to the expertise of the operator. In my career I have met some very talented operators who could run circles around me in their ability to monitor and automate the networks I have built. Working with individuals like that is a very rewarding experience for me, because my designs almost always become more practical and more functional after they are examined from an operational perspective. The OI document allows the designer to spell out their assumptions about how the system will be maintained. That communication of assumptions cannot be overvalued.

Operational Intent for Tomammon.net

Here’s a representative sample of what an OI document might look like for Tomammon.net. Since the networking is pretty simple, we’ll focus on the application services.

Maintenance of Static Content and Application Code

  • Static content and application code are all hosted on github in the tomammon.net repo.
  • Changes made to any of these elements are pushed up to the github repo from the dev environment, and then pulled down by each Content Node independently.
  • For risky changes, a Content Node can be pulled out of the round-robin DNS A record before pulling the new code down, and then tested. Rolling back to an older, functional version of the code is handled by git on the Content Node.

Monitoring the Application and Database Layers

  • The app server can be monitored directly via the “testapi” call; see the code example below.
  • The database contains a static table with generic content, which is retrieved using the “testdb” call; see the code example below.
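Both checks amount to simple HTTP calls against the app layer. A hypothetical sketch – the endpoint path and parameter name here are assumptions, not the actual API (the real code lives in the github repo):

    # Hypothetical endpoint path -- the real one is in the repo
    curl 'http://appsrv-1/api.php?call=testapi'
    curl 'http://appsrv-1/api.php?call=testdb'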

Common Database Problems and Solutions

  • If connectivity problems arise in the transport network between the Content Nodes and the master database, replication problems can result. Before addressing these problems, confirm that network connectivity is healthy and stable between the slave and master.
  • Database replication problems can often be cleared by simply restarting replication from the MariaDB client (on the slave side, in the Content Node inside the tandb_slave container), using these commands:
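The standard sequence from the MariaDB client on the slave is:

    -- Run on the slave, inside the tandb_slave container
    STOP SLAVE;
    START SLAVE;
    SHOW SLAVE STATUS\G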


Final Thoughts

If there is a healthy balance of power between the designers and the operators of systems, the concepts of Operational Intent can produce real wins for the business. At its heart, OI is about collaboration between designers and operators as equal partners, enabled by open communication about technical decisions and requirements.

In the next article in the series, I’ll make some confessions about the weaknesses and problems of the resume application and its supporting infrastructure.

Tomammon.net – Public Cloud Integration

This is Part 4 in a 7-part series discussing the www.tomammon.net online resume application. Check out the architectural overview for some context, or see the links at the end of the article to navigate to other posts in this series.

In this post we’ll see the role that public cloud plays in the application. The majority of the business logic that powers www.tomammon.net runs in private facilities. However, it makes sense (as is the case in many modern applications) to run part of it from the public cloud. Here’s the strategic view of the application again:


The biggest cloud component of the application is the DNS Service.

DNS Design

Amazon’s Route53 is used to provide name services. Route53 is authoritative for tomammon.net. Global site redundancy for the application is provided by populating a single A record with the public IP addresses of each Content Node to create a round-robin DNS entry. Two content nodes are currently in service, as shown here:
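A zone-file-style sketch of that recordset (the actual node addresses are omitted here):

    ;; Round-robin A record -- one answer per Content Node
    www.tomammon.net.   300   IN   A   <content-node-1 public IP>
    www.tomammon.net.   300   IN   A   <content-node-2 public IP>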

The TTL for these records is set low at 300 seconds, to allow for changes in the recordset to propagate quickly. This approach has some major flaws, which we’ll discuss in Part 6: The Ugly Bits. Also, in Part 5: Operational Intent, I’ll show how the AWS API is used to make these DNS records responsive to the health of the application.

Static Assets

Some of the static assets (such as my profile picture) for the site are located in an Amazon S3 bucket, which is called via the tomammon.com domain:


It’s pretty painless to set up a simple static website on S3. All of the dynamic components of www.tomammon.net are located in private compute facilities, so I’m only using public cloud for very specific parts of the application, which keeps the costs quite low.

References

  1. Hosting a Static Website on Amazon S3
  2. Example: Setting up a Static Website Using a Custom Domain
  3. How Do I Configure an S3 Bucket for Static Website Hosting?


Your documentation will be read by almost nobody. Write it anyway.

Documentation is a chore that IT professionals hate doing. For most of us, it’s right up there with picking staples out of the carpet or scrubbing the tile around the toilet in your home. If you’re nodding in agreement right now, may I humbly suggest a different perspective?

Documentation can provide some substantial benefits, but you will probably have to adjust your ideas about how and when to create it to realize those benefits. Let’s talk about some of the reasons, spoken and unspoken, that we don’t ever seem to find the time to document.

“I am a senior engineer. It’s not a good use of the company’s opex to have me doing something a technical writer or junior engineer should be doing.”

The more “senior” you are, the more your documentation has the potential to positively influence your organization. A network design engineer, for example, will often find flaws in his design while he is drawing diagrams that explain the solution he is building. This has certainly been the case with me whenever I have worked in that role. This implies, then, that the time to document is not after the design work is done, but rather, during the design effort. How cool would it be to find your problems and mistakes before you present budget numbers to your management or customer, rather than after? If you’ve ever felt the pit in your stomach when you realize that you’ve forgotten a license or goofed up the math on the number of boxes, you know what I’m talking about.

Also, it is highly unlikely that a technical writer or junior engineer can put together documentation with anywhere near the efficiency that you can. Perhaps you are insecure about your ability to effectively communicate in writing what is in your head. Maybe the hundreds of buttons and knobs in Visio turn what should be a 5-minute task into a 2-hour ordeal for you. Whatever your hangup is, it is well worth it to you and your career to figure out how to overcome those things.

“The network is the documentation”

I have some sympathy for this point of view, because a well-designed and well-implemented network will be configured consistently. Someone who truly understands the protocols and the interoperation of them can look at the configuration of a couple of devices and pretty well predict what the original designer’s objectives were.

This ideal network exists almost nowhere, however. Networks experience configuration “decay” or “drift” over time, which happens for a variety of reasons. There is a critical shortage of networking professionals who actually understand networking at a fundamental level, so it is unlikely (statistically speaking) that your organization employs very many of them. Good docs can be the thing that makes it possible for a less experienced engineer to succeed in their assigned tasks. Good docs can help an experienced and truly senior engineer cut past hours of trying to read your mind and actually accomplish a complex change or troubleshoot a difficult problem.

If you are working in one of the rare shops that has embraced DevOps and the principles of CI/CD in their network infrastructure, the documentation is even more important to you, because it lays the foundations for the code that will have to be written to bring your solution to life.

“I don’t have time to write docs”

This argument is the first cousin of the much-maligned and super-lame “I don’t have time to automate”. Good documentation allows an implementation engineer to successfully implement something that you designed without having to redo all of the mental gymnastics that you went through to come up with the design, and this is a good thing. If you wear both hats, then good docs help you quickly get back to that mental zone you were in when you created the design. Humans forget things at an alarming rate, and you are likely not immune to the forgetting.

If your team is large, or if the rate of change in your network infrastructure is high, you can accomplish impressive economies of scale with good docs. Without good docs, engineers will continue to run like hamsters on a wheel, continually re-doing the same work over and over and over again. If the success of other engineers matters to you (and it should, if you have any vision about where our industry is headed), then you will see that your time is well spent writing good documentation.

“If the other engineers were smart enough, they wouldn’t need me to babysit them with a diagram or wall of text about how networking works – they should be able to figure it out on their own.”

This kind of statement is usually born of hubris. An engineer who says or thinks this is likely deluded about their own awesomeness, and is quick to talk down to or about other “less awesome” engineers. If this is you, I strongly urge you to reconsider your position. You never know which of those junior engineers has had it with your treatment of them and is working their tail off at night and on the weekends, while you’re relaxing, to catch up with you. When they do catch you (and it will come faster than you think), you will end up leaning on your position of authority instead of your expertise, and you’ll lose credibility.

A much better course would be to learn some humility and write down your thoughts and reasoning, and maybe even draw a few diagrams, in the spirit of helping the junior guys along. You may end up with an ally who is committed to, and proficient at, doing all of the tasks you don’t want to do, and everyone will win.

“If I document the design of the network, I’m giving away the secret sauce. People will use it against me. Or worse, I will lose my relevance and be seen by management as not vital to their success.”

It’s true that documentation can be used for political purposes, sometimes even at multiple levels of management above you. I have personally experienced having my docs used in this way – but it has never hurt me.

Some engineers fear that their manager will see them as a replaceable cog in the machine if they document what they’re doing. Or they think that if nobody has to come to them personally for their wisdom about how to build or fix the network, they will lose power and influence. My experience is the opposite. Having my name on the docs has always given me more influence, not less. I’ve had good managers and bad managers, but I’ve never had a manager that tried to manage me out or give me less interesting work because I shared my knowledge with my peers. In the end, the things that make up our experience and expertise can’t be distilled down into a document – but the explanation of our designs can be, and should be.

Documentation does not make you replaceable – it makes the work that you do consumable. If it’s easily consumable, it’ll very quickly become the standard that everyone uses.

Tomammon.net – Services Infrastructure and Application Code

This is Part 3 in a 7-part series discussing the www.tomammon.net online resume application. Check out the architectural overview for some context, or see the links at the end of the article to navigate to other posts in this series.

In this post we will take a look at the application and the services that it consumes. I wrote most of the code myself. The only code I didn’t write myself was the frontend HTML and CSS, which I took from a template several years ago. Unfortunately, the author didn’t leave his contact information in the source code and I have long since forgotten his name. If you are him, please email me so that I can give you credit.

Let’s start with a quick recap of the high level design of the application. We have Content Nodes, a Data Warehouse, DNS, and Internet Transport. We’ll focus on the content nodes for today. We’ll start our tour at the point that the user’s browser connects to one of the content nodes. In the next post (Public Cloud Components) we will look at how DNS fits into all of this.

The content node is where the bulk of the application’s work is done. It is built using a set of 6 docker containers, which interact as shown in the diagram below.

Here are a few things to notice:

  • The Load Balancer distributes incoming HTTP requests using the round-robin method to each of the websrv-* containers. Here’s a snippet of the relevant HAProxy configuration:
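A representative sketch of that round-robin backend – the pool and server names are assumptions, not the production config:

    # Illustrative HAProxy backend -- names and addresses are placeholders
    backend websrv_pool
        balance roundrobin
        server websrv-1 websrv-1:80 check
        server websrv-2 websrv-2:80 check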

The full config file is available here.

  • Once an HTTP request has been dispatched from the load balancer, the websrv-* containers serve the static content (static HTML code in index.php, etc.) and make API calls to the appsrv-* containers for the parts of the app that are dynamic. For example, the data in the “Vendors and Technologies” part of the resume:

lives in several MariaDB database instances, and not natively in the index.php file. This content is retrieved by index.php making an API call to the app server layer, like this:
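The call amounts to a simple HTTP GET from PHP. A hypothetical sketch – the endpoint name and host are assumptions:

    <?php
    // Hypothetical call from index.php to the app server layer
    $json = file_get_contents('http://appsrv-1/api.php?call=vendors');
    $vendors = json_decode($json, true);
    ?>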

  • The appsrv-* containers answer the API calls made from the web server layer, translate them into SQL queries, query the local MariaDB slave, and return the results of the query back to the web server layer in JSON format. Here’s an example of one of those API endpoints, which resides on the appsrv-* containers:
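A hypothetical sketch of such an endpoint – the credentials, table, and column names are illustrative, not the production code:

    <?php
    // Hypothetical API endpoint on an appsrv-* container
    $db = new mysqli('tandb_slave', 'appuser', 'secret', 'resume');
    $result = $db->query('SELECT item FROM vendors');
    $rows = [];
    while ($row = $result->fetch_assoc()) {
        $rows[] = $row;
    }
    header('Content-Type: application/json');
    echo json_encode($rows);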

  • The MariaDB database is a slave receiving row-based replication from the master DB instance, which we’ll cover a little later on in the post.

That’s a quick drive-by of the flow of data through a Content Node. If you’re interested in more details on any of the code, leave a comment below or email me and I’d be happy to elaborate. You can also see all of the configuration files and application code used by the resume application in my github account if you are so inclined.

Container Configuration

A Linux container is, basically, a way to package application code in such a way that it is portable and free of dependencies on the host on which it runs. Sort of like a virtual machine, but way more lightweight. Using containers is one way to implement a microservices architecture. I chose to run my containers with Docker. See the references section at the end for some good resources on learning containers and Docker.

The containers are connected as shown below.

  • For all but the database slave, I have built custom containers for each service in the application, and these are hosted on dockerhub. The Dockerfiles for these are located in my github account. A Dockerfile is a set of instructions for building a custom docker container.
  • A Docker network is really just a software bridge running on the host. After deploying the app, you can see that this bridge is running on the host:
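Something along these lines – the network name and IDs here are illustrative:

    # Docker creates a br-<network id> bridge on the host for each network
    $ docker network ls
    NETWORK ID     NAME       DRIVER    SCOPE
    f3b2c1a9d8e7   frontend   bridge    local
    $ ip link show type bridge
    4: br-f3b2c1a9d8e7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...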

  • The load balancer has front and back side connectivity, similar to how you might implement it with a physical appliance. It has a bind mount to allow the configuration file to live on the host server instead of inside the container itself. This container is based on Alpine Linux, a very lightweight distro that is popular in container deployments. Here’s some output from the docker inspect command that shows how the load balancer’s bind mount and network connections are configured:
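A sketch of the relevant fragment – the container name and host-side path are assumptions, while the in-container path is the HAProxy image’s standard config location:

    # Illustrative sketch -- container name and host path are placeholders
    $ docker inspect --format '{{ json .Mounts }}' loadbalancer
    [{"Type":"bind","Source":"/opt/tomammon/haproxy.cfg",
      "Destination":"/usr/local/etc/haproxy/haproxy.cfg", ... }]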

  • The webserver and appserver containers are based on the official PHP docker images – the only customization I had to make was adding the mysqli database extension, which is not part of the official images I wanted to use. The Dockerfile for the appserver image shows how this is done:
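The pattern the official PHP images document for this is just a couple of lines (the base image tag here is an assumption):

    # Illustrative sketch -- the base image tag is a placeholder
    FROM php:7.2-apache
    RUN docker-php-ext-install mysqli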

  • docker-compose is a Python utility that allows you to specify all of the parameters for your containers in a YAML configuration file. Without it, you’d have to specify all of the options for each container, such as the networks it connects to, the image it requires, the mounts it requires, etc., at runtime, which can get tedious. The full docker-compose for www.tomammon.net is available in my github account, but here’s a section of the docker-compose.yml file that I built to specify the parameters for all of the containers. This one focuses on the load balancer container:
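A representative sketch of that section – the image name, file paths, and network names are assumptions, not the actual docker-compose.yml:

    # Illustrative docker-compose sketch for the load balancer service
    version: "3"
    services:
      loadbalancer:
        image: tomammon/loadbalancer:latest
        ports:
          - "80:80"
        volumes:
          - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
        networks:
          - frontend
          - backend
    networks:
      frontend:
      backend: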

Database Replication

The slave database instances (tandb_slave containers) are read-only copies of the database running on the master. Changes to the database are made on the master and then replicated out to the slaves. As we’ll discuss in Part 5: Operational Intent, this scheme has some maintenance and operations drawbacks, but overall it is the simplest choice available, and as we all know, Simplicity is Sophistication.

Designing for High Availability

We can see that there are several places in the application where failures could be absorbed while allowing the application to continue functioning. For example, there are multiple web services and multiple app services, and the load balancer is doing health checks to enable it to remove broken containers from service if they fail. But we still have some single points of failure. The load balancer itself is not in an HA pair, and neither is the database slave.

These were deliberate choices in the design. There is always a tradeoff between complexity, maintenance burden, and redundancy. Often, the first two come at the expense of the third, and vice versa. To balance these factors, I chose to wrap up the entire set of microservices in a VMWare virtual machine, which I could then replicate as many times as I wanted. This replication would also give me one other key benefit – I now have the possibility of locating the content near to the users who will be consuming it. Add to this the relationship of the master database to the application, and we end up with an application that looks like this:

All of this is running over a combination of private networks, public Internet connectivity for end users, private VPN tunnels that ride over public Internet transport, and public cloud infrastructure. Currently, the production application has two Content Nodes, one in Salt Lake City, Utah, and another in Romania. As we progress in this series of blog posts, we’ll discuss the interaction between all of these components in greater detail.

References

  1. HAProxy Official Documentation
  2. HAProxy Quickstart w/ full example config file
  3. Creating a Simple REST API in PHP
  4. How To Install and Use Docker on CentOS 7
  5. Learn Docker in 12 Minutes
  6. Overview of Docker Compose
  7. How Does MySQL Replication Really Work?
  8. How To Set Up Master Slave Replication in MySQL
  9. Tom’s Github Account – tomammon.net repo
  10. Tom’s DockerHub Account

TomAmmon.net – Network Infrastructure

This is Part 2 in a 7-part series discussing the www.tomammon.net online resume application. Check out the architectural overview for some context, or see the links at the end of the article to navigate to other posts in this series.

The Network Base Layer

The network infrastructure surrounding tomammon.net is pretty simple. All of the components to the left of the Internet cloud in the diagram below run on my home network infrastructure, but only the parts of my home network relevant to our topic are shown. Let’s start with a connectivity-focused perspective.

Network Connectivity Design

Most of the connectivity is delivered with physical appliances, while the nodes that run the application code are all virtualized. The load balancer is the only device that straddles these two worlds. This device is a VNF running the following software:

  • CentOS Linux for the operating system
  • HAProxy as the load balancing engine
  • The Free Range Routing suite to provide connectivity into the OSPF routing domain

The load balancer advertises a loopback, taken from PA space, into the core network. This public IP address is one of the addresses configured in the DNS A record for the application. We’ll discuss the DNS components in Part 4: Public Cloud Integration.
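A minimal FRR sketch of that loopback advertisement – the address shown is a documentation placeholder, not the real PA-space IP:

    ! Illustrative FRR config -- the address is a placeholder
    interface lo
     ip address 203.0.113.10/32
    !
    router ospf
     network 203.0.113.10/32 area 0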

OSPF Design

A simple single-area OSPF design keeps things clean. Since the routing domain is quite small, there is no need to run multiple areas or turn nerd knobs. Here are a few things to keep in mind:

  • The Internet Router unconditionally redistributes a default route into the OSPF domain to provide Internet access to the core network (see the sketch after this list). Since there are no other exit points that do not depend on this single internet circuit, there’s no need to make the advertisement of the Type 5 LSA conditional on a received default route.
  • The VPN headend (A Cisco ASA 5505) is running as an Anyconnect Remote Access VPN. It injects /32 static routes for remote clients, which are then redistributed into OSPF to provide connectivity for remote Content Nodes. The remote Content Nodes run OpenConnect, an open source SSL VPN client, to connect to the VPN headend.
  • The load balancer is not technically an ASBR, since all it is really doing, from an OSPF perspective, is advertising a stub network that represents its public loopback IP.
  • The load balancer is only required because other public websites run on this same hypervisor, using the same public IP address. This first load balancer simply directs traffic intended for www.tomammon.net to the local Content Node using an “ACL” – HAProxy’s equivalent of an F5 iRule.
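Here’s the sketch promised above – an illustrative IOS-style rendering of the unconditional default origination and the static-route redistribution (the process ID and exact options are assumptions, not the actual router config):

    ! Illustrative sketch -- process ID and options are placeholders
    router ospf 1
     default-information originate always
     redistribute static subnets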

Compute Virtualization and Containers

The hypervisors running on the local network provide the Data Warehouse containing all of the dynamic content for the application, one of the content nodes, and the load balancer. Since the data warehouse resides on a shared database host, it is containerized. From the perspective of the resume application, there is no requirement for the data warehouse to run in a container – it was just a convenient way to provide an isolated environment on a compute resource that was already in service. We’ll go in to much more detail on the use of containers in Part 3.

The links below dive into the details of www.tomammon.net.

Under the Hood of www.tomammon.net

I began my network automation journey in earnest a few years ago. As I learned about the building blocks of automation, I found my mind wandering beyond the confines of vendor-driven network designs. In my years as a networking engineer I have learned that I need to deliver a real product, satisfying real business requirements, to truly understand how a thing works. With those principles in mind I set out to build my online resume (www.tomammon.net) as a microservices-oriented web application. In this series of blog posts I will describe its architecture and share lessons that I have learned along the way.

Requirements

My resume application needed to demonstrate my basic functional understanding of:

  • Microservices Architecture
  • APIs
  • Application Delivery Control (load balancing)

This was in addition to all of the packet delivery expertise I have developed over the course of my career. I also wanted to provide for some level of fault tolerance and high availability. I would need to make use of virtualization technologies to meet all of the above requirements with minimal costs, since no revenue would be directly generated by the application.

Finally, I should say that my goal was not to become a software developer, or to claim that I have any special proficiency with any of these technologies, outside of the networking components. My purpose with the project was to gain an appreciation for the workloads that run on the networks I build, and to gain the perspective needed to be a successful network architect.

At the highest level, the resume application consists of 4 major components:

  • Content Nodes – the discrete compute units that serve application content to users
  • Data Warehouse – a central database that houses all of the dynamic content
  • Domain Name System (DNS)
  • Internet Transport

The links below will dive into the details of how these components interact and connect. I’ll also discuss the supporting network infrastructure and other parts of the system that make it all work.