a newsletter by J. B. Crawford

just disconnect the internet

So, let's say that a security vendor, we'll call them ClownStrike, accidentally takes down most of their Windows install base with a poorly tested content update. Rough day at the office, huh? There are lots of things you could say about this, lots of reasons it happens this way, lots of people to blame and not to blame, etc., etc., but nearly every time a major security incident like this hits the news, you see a lot of people repeating an old refrain:

these systems shouldn't be connected to the internet.

Every time, I get a little twitch.

The idea that computer systems just "shouldn't be connected to the internet," for security or reliability purposes, is a really common one. It's got a lot of appeal to it! But there aren't really that many environments where it's actually done. In this unusually applied and present-era article, I want to talk a little about the real considerations around "just not connecting it to the internet," and why I wish people wouldn't bring it up unless they're ready to take those considerations seriously.

We Live in a Society

In the abstract, computers can perform valuable work by doing, well, computation. In practice, the computation is rarely that important. In industry, there is a lot more "information technology" than there is "computation." Information technology inherently needs to ingest and produce information, and while that was once facilitated by a department of Operators loading tapes, we have found the whole Operator thing to be costly and slow compared to real-time communications.

In other words, the modern business computer is, first and foremost, a communications device.

There are not that many practical line-of-business computer systems that produce value without interconnection with other line-of-business computer systems. These interconnections often cross organizational and geographical boundaries.

I am thinking, for example, of the case of airline reservation and scheduling systems disabled by the CrowdStrike, er, sorry, whatever I called them incident. These are fundamentally communications systems, and have their origins as replacements for the telephone and telegraph. It is not possible to simply not internetwork them, because networking is inherent to their function.

Networking is important to maintenance and operations

But let's consider systems that don't actually require real-time communications to perform their business purpose. Network connectivity still tends to be really valuable for these.

For one, consider maintenance: how does a system obtain software updates if you have no internet connection? How is that system monitored?

And even if you think you can avoid those requirements by declaring a system "complete" and without the need for any updates or real-time monitoring or intervention, business requirements have the frustrating habit of changing over time, and network connectivity reduces the cost of handling those changes tremendously.

What does it mean for a system to not be connected to the internet?

First, we need to consider the fact that there are as many forms of "not connected to the internet" as there are ways of being connected to the internet. For this reason alone, proposing that a system shouldn't be internet-connected is usually too nonspecific to really discuss. Let's consider a menu of possibilities:

List 1:

  1. A single device with no network connection at all.
  2. A system of devices that is "air-gapped" in the strictest sense, with no connection to any network other than its private local-area one, where data never crosses the security boundary.
  3. That same system, but someone carries DVD-Rs across the security boundary to introduce new data to the private network.
  4. That same system, but a cross-domain solution or "data diode" allows movement of data from a wider (or lower-security) network into the private (or higher-security) network.
  5. That same system, where the cross-domain solution does not have a costly and difficult to obtain NSA certification.

List 2:

  1. A system of devices which interconnect over a private wide-area network using fully independent physical infrastructure with physical precautions against tampering.
  2. That same system, but the independent physical infrastructure is run through commodity shared ducts.
  3. That same system, but the infrastructure is leased dark fiber.
  4. That same system, but the infrastructure is wavelengths on lit fiber.
  5. That same system, but the infrastructure is "virtual private ethernet" implemented by the provider using, let's say, MPLS.
  6. That same system, but the infrastructure is "virtual private ethernet" implemented by the provider using a tunneling solution with encryption and authentication.

List 3:

  1. A system of devices which interconnect over a common-carrier network (such as, we might even dare say, the internet), where private network traffic is tunneled through encryption and authentication performed by hardware devices.
  2. That same system, but the hardware devices do not have a costly and difficult to obtain NSA certification.
  3. That same system, but the tunneling is performed by a software solution that is well-designed such that it configures the operating system network stack, at a low level, to prevent any traffic bypassing the tunnel, and this has been validated by someone much smarter than me.
  4. That same system, but not so well designed and validated by someone like me.
  5. That same system, but the "software solution" is like Wireguard and an iptables script that has been "thoroughly tested" by someone on Reddit.

List 4:

  1. A system of devices which interconnect on a private network that has interconnection to the internet that is strictly limited by policy-based routing or other reliable methods, such that only very narrowly defined traffic flows are possible.
  2. That same system, but the permissible network flows are documented in some old Jira tickets and some of them were, you know, just thrown in to make it work.
  3. That same system, but it's basically protected by a firewall that's pretty liberal about outbound flows (maybe with IPS or something), and pretty restrictive about inbound flows.

List 5:

  1. An AWS private VPC without any routing elsewhere.
  2. An AWS private VPC with PrivateLinks and other AWS networking baubles that allow it to communicate with other private VPCs.
  3. That same system, but some of the interconnected VPCs can route traffic to/from the internet.
  4. An AWS private VPC with NAT GW and IGW but the security groups are set up pretty tight in both directions. (A quick way to check which of these variants you've actually got is sketched just below.)
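
Telling these AWS variants apart usually takes a little spelunking in the account itself. As a rough sketch, assuming the AWS CLI is configured and using a made-up VPC ID, you can at least ask what is actually attached to the "private" VPC:

    VPC=vpc-0123456789abcdef0    # hypothetical VPC ID
    # any internet gateway attached to it?
    aws ec2 describe-internet-gateways --filters Name=attachment.vpc-id,Values=$VPC
    # any NAT gateways in it?
    aws ec2 describe-nat-gateways --filter Name=vpc-id,Values=$VPC
    # and what do its route tables actually point at?
    aws ec2 describe-route-tables --filters Name=vpc-id,Values=$VPC --query 'RouteTables[].Routes[]'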

These are all things that I have seen described as non-internet-connected. Take a moment to work through each list and mark the point at which you think that is no longer a reasonable claim. It's okay, I'll wait.

I'm not going to provide threat modeling for all of these scenarios because it would go on for pages, but you can probably see that pretty much every option is at least slightly different in terms of attack surface and risk.

This might seem like an annoying or pedantic argument, but this is actually the biggest reason I get irritated when people say that something should never be connected to the internet. What do they mean by that? When someone says that an airline reservation system shouldn't be internet-connected, they clearly don't actually mean the strictest form of that contention (no network connection at all) unless their name is Adama and they liked when airline reservation centers had big turntables of paper cards they spun around to check off your seat. They must mean one of the midpoints presented above, which are pretty much all coherent positions, but all positions with different practical considerations.

This ambiguity makes it hard to actually, seriously consider the merits of dropping internet connectivity.

Non-internet connected systems are so very, very annoying

In my day job, I work with a wide variety of clients with a wide variety of cultures, IT architectures, and so on. Some of them are in highly regulated industries or defense or whatever, and so they actually conduct software operations in networks with either no internet connectivity or tightly restricted internet connectivity.

When I discover this to be the case, I mentally multiply all of the schedule/cost estimates by a factor of, I would say, 3 to 10, depending on where they fall on the above lists (usually 3x to 5x for list 5 and 10x to a bajillion times forever for list 1, just rule of thumb).

Here's the thing: virtually the entire software landscape has been designed with the assumption of internet connectivity. Your operating system wants to obtain its updates from online servers. If you are paying for expensive licenses for your operating system, the vendor probably offers additional expensive licenses for infrastructure to perform updates within your private network. If you are getting your operating system for free-as-in-beer, it's a good bet you can figure it out yourself, but if you're using anything too new and cutting-edge it might be a massive hassle.

But that just, you know, scratches the surface. You probably develop and deploy software using a half dozen different package managers with varying degrees of accommodation for operating against private, internal repositories. Some of them make this easy, some of them don't, but the worst part is that you will have to figure it out about fifty times because of the combinatorial complexity of multiple package managers, multiple ways of invoking them, and multiple environments in which they are invoked.
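
To make that concrete, here's roughly what it looks like for just two of those package managers, using a made-up internal mirror host (nexus.internal.example); now repeat the exercise for every other ecosystem, CI runner, build container, and developer workstation you have:

    # Python: point pip at the internal index
    pip config set global.index-url https://nexus.internal.example/repository/pypi-proxy/simple

    # Node: point npm at the internal registry
    npm config set registry https://nexus.internal.example/repository/npm-proxy/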

If you are operating a private network, your internal services probably don't have TLS certificates signed by a popular CA that is in root programs. You will spend many valuable hours of your life trying to remember the default password for the JRE's special private trust store and discovering all of the other things that have special private trust stores, even though your operating system provides a perfectly reasonable trust store that is relatively easy to manage, because of Reasons. You will discover that in some tech stacks this is consistent but in others it depends on what libraries you use.
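
If you want a taste of the tedium: here's roughly what just the base case looks like on a RHEL-flavored host with a reasonably modern JDK, using a hypothetical internal root CA, before you even get to the applications that bring their own bundles:

    # add the internal CA to the operating system trust store
    cp internal-root-ca.crt /etc/pki/ca-trust/source/anchors/
    update-ca-trust extract

    # ...which the JRE will ignore, because it ships its own cacerts store
    # (and yes, the default password really is "changeit")
    keytool -importcert -cacerts -storepass changeit -noprompt \
        -alias internal-root-ca -file internal-root-ca.crt

    # repeat in spirit for Python (certifi / REQUESTS_CA_BUNDLE), Node
    # (NODE_EXTRA_CA_CERTS), and whatever else brought its own trust store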

A bunch of the software you use will want to perform cloud licensing and get irritated when it cannot phone home for entitlements. You will have to go back and forth with your vendors to figure out a workaround somewhere between "add these ninety seven /16s to your firewall exceptions" and "wait six months while we figure out the internal process to issue you a bespoke licensing scheme."

All of your stuff that requires updates or content updates will have some different process you have to follow to obtain those updates and then provide them internally. Here's a not at all made up example, but a real one I have personally lived through: you will find that a particular (and particularly hated) enterprise software vendor provides content updates for offline use only through a customer support portal that is held over from three acquisitions ago, and that it is only possible to get an account in that customer support portal by getting an entitlement manually added in a different customer support portal held over from two acquisitions ago. It will take over three months of support tickets and escalations through your named account executive to get accounts opened in successively older customer support portals until you can finally get into the right one, which incidentally has an invalid TLS cert you are reassured is not something to worry about. Once you download your offline content update, you will find that the documented process to apply it no longer works, and it will take a long email chain with one of the engineers to get the right instructions. You paid a five-figure sum for a 1-year license to this software and it has now nearly elapsed while you figured out how to use it. You will of course get an extension on that license pro bono, because this is enterprise software sales and what is a quarter worth of my salary between friends, but they won't manage to issue the extension license until after your original one has already expired, causing a painful interruption in CI pipelines and a violent revolution by the developers.

I am sorry, you are not my therapist, I will try to stop remembering that dark time in my career. Don't worry, the software in question seems to have fallen out of favor and cannot hurt you.

So, like, that's an over-the-top example (but seriously, a real one!), but you get the point. It's not really that any individual part of operating in an offline environment is hard---I mean some of them are, but most of them aren't. It's a death by a thousand cuts. Every single thing you ever do is harder when you do not have internet connectivity, and you will pay for it in money and time.

The largest problem by far is that almost everyone who develops software assumes that their product will not need to operate in an offline environment, and if they find out that it does they will fix that with duct tape and shell scripts because it only matters for a small portion of their customers. You, the person with the offline environment, will become the proud owner of their technical debt.

None of this really needs to be this way; it's just how it is! There are not really that many offline environments, and they tend to be found in big institutions that have adapted to the fact that they make everything cost more and take longer, and are surprisingly tolerant of vendors who perform a three stooges routine every time you say "air-gap," because that's what pretty much every vendor does. Except for, like, Red Hat; I genuinely think Red Hat is pretty good about this, but you betcha that what you save in time you are paying in cash.

Not many people do this

That's kind of the point, right? The problem with non-internet-connected environments is that they are rare. The stronger versions, things from List 1 and List 2, are mostly only seen in defense and intelligence, although I have also seen some banks with pretty impressive practices. You will note that defense and intelligence, and even banks, are also famously industries where everything costs way too much and takes way too long. These correlations are probably not coincidences.

Even the weaker forms tend to be limited to highly-regulated industries (finance and healthcare are the big ones), although you see the occasional random software company that just takes security really seriously and keeps things locked down. Occasionally.

Okay, let's stop just complaining

Here's the thing: I genuinely do not think that "fewer systems should be connected to the internet" is a bad idea. I really wish that things were different, and that every part of the software industry were more prepared and more comfortable operating in environments with no or limited internet connectivity. But that is not the world that we currently live in! So let's get optimistic: what should we be doing right now?

  1. Apply restrictive network policy to as much of your stuff as possible. Cloud providers generally make this easier than it has ever been before: operating a practical non-internet-routed environment in AWS is not exactly easy, but it's also not all that hard. If you stay within the lanes of all the AWS managed services, it's mostly pain-free. You will pay for this, but, you know, AWS always gets their check anyway.

  2. Build software with offline environments in mind. Any time that you need to phone home to get something, provide a way to disable it (if practical) or a way to override the endpoint that will be used. If the latter, keep in mind that you will also need to come up with a way for a customer to feasibly host their own endpoint (a minimal sketch of what that can look like follows this list). If you keep to simple static files, that's really easy, just nginx and a directory or whatever. If it's an API or something, well, you're probably going to have to ship your internal implementation. Brace yourself for the maintenance overhead.

  3. Try to think about the little assumptions that go into connecting to other services that become more complex in an offline environment. Please, for the love of God, do not assume you can reach LetsEncrypt. But that's not the only TLS problem, offline environments virtually always imply internal certificate authorities. Use the system trust store. Please. I am begging you.

  4. Avoid fetching any kind of requirements or dependencies at deploy time. One of the advantages Docker supposedly brought us was making all of the requirements of a given package self-contained, but I still run into Docker containers that can't start if they can't reach the npm repos or something. And now I have yet another place to fix configuration, trust stores, and so on, inside your stupid Docker container. It has made things more difficult instead of less.
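
As a minimal sketch of the "override the endpoint" case from item 2, with every name made up: suppose the product honors a hypothetical UPDATE_BASE_URL setting instead of hardcoding the vendor's CDN. Then the customer's side of it is a periodic sync plus a web server:

    # the product reads a hypothetical UPDATE_BASE_URL knob instead of a hardcoded vendor CDN
    export UPDATE_BASE_URL=https://mirror.internal.example/product-updates

    # someone on the connected side periodically pulls the vendor content and carries it across
    rsync -a product-updates/ /srv/mirror/product-updates/

...and serving it internally really is just nginx and a directory:

    # /etc/nginx/conf.d/product-updates.conf  (hypothetical)
    server {
        listen 443 ssl;
        server_name mirror.internal.example;
        ssl_certificate     /etc/pki/tls/certs/mirror.crt;       # signed by the internal CA
        ssl_certificate_key /etc/pki/tls/private/mirror.key;
        root /srv/mirror;
        autoindex on;
    }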

Have I mentioned that Docker, paradoxically, actually makes offline environments more difficult to manage? Yeah, because virtually every third-party Docker container has at least a TLS trust store you'll have to modify. Docker is, itself, a profound example of how the modern software industry simply assumes that everything is running On The Internet.
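
The usual fix is to re-base each third-party image with the internal CA added, which is its own little treadmill. A sketch, assuming a Debian-ish base image that has ca-certificates installed, with all of the names made up:

    # Dockerfile.internal (internal-root-ca.crt sits next to it in the build context)
    FROM vendor/agent:1.2.3
    COPY internal-root-ca.crt /usr/local/share/ca-certificates/internal-root-ca.crt
    RUN update-ca-certificates

    # rebuild and push to the internal registry
    docker build -f Dockerfile.internal -t registry.internal.example/vendor/agent:1.2.3 .
    docker push registry.internal.example/vendor/agent:1.2.3

Multiply by however many third-party images you run, and by every upstream release of each of them.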

Anyway

I wrote this out in a bit of a huff because I have seen "why were they connected to the internet at all?" like four times in response to the CrowdStrike incident. I know, I am committing the cardinal sin of taking things that people on the internet say seriously, but I feel obligated to point out: internet connectivity is pretty much completely orthogonal to what happened. CrowdStrike content updates are the kind of thing that, in a perfect world, you would promptly make available in your offline environment. In practice, an internal CrowdStrike update mirror would probably lag days, weeks, months, or years behind, because that's what usually ends up happening in "hard" offline environments, but that's a case of two wrongs making a right.

Which they do, more often than you would think, in the world of information technology.

Don't worry, I'll be back next time with something more carefully written and less relevant to the world we live in. I just got in a mood, you know? I just spent like half the day copying Docker images into an offline environment and then fixing them all. I have to find something to occupy the time while a certain endpoint security agent pegs the CPU and makes every "docker save" take ten minutes.
