When the server still hums but the vendor has left

The PowerEdge under the desk has run for nine years. Three rebuilds, two memory upgrades, one motherboard swap. Last week, Dell's support portal showed it as past end-of-service-life. Then the renewal email from the OEM arrived, asking for three times what the gear cost new.

Most teams treat this moment as binary: replace, or pay the renewal. The other options, the ones that require discipline, get called risky and discarded.

This is the moment to think clearly.

The four arms of the decision

Every piece of aging hardware sits at the same fork. You have four choices, not two:

1. Keep it under OEM support. The expensive, comfortable option. Pay what the vendor asks, sign the contract, sleep at night. Most teams default here without examining the math.
2. Keep it under third-party maintenance (TPM). Comparable coverage, often 40–60% cheaper, from a provider whose business is keeping post-EOL gear alive. TPM has matured into a credible industry.
3. Repair as needed, without a contract. Self-insure. Stock the failure-prone parts on a shelf. Accept that an outage might mean a few hours of degraded service. Cheap, but it demands discipline.
4. Replace. Buy new. Refresh the rack. Walk forward.

Most decision frameworks collapse this to OEM renewal or replacement because the choice is easier when the menu is shorter. That is intellectually lazy. Each of the four has a real role.

Three questions that change the answer

Before any of the four can be the correct choice, you need honest answers to three questions.

First: what does the machine do?

A domain controller and a print server are not the same thing. A 9-year-old server that authenticates 800 users every morning is mission-critical. A 9-year-old server that hosts a quarterly report PDF is not. The hardware doesn't know what it's running; you do. Match the support tier to the workload, not the chassis.

Second: what is the cost of an hour of downtime?

Not what you tell your CFO. What it actually costs. For an SMB law firm, an hour of email outage means a few hours of staff time, recovered later. For an e-commerce shop on a Saturday, it is direct lost revenue. Until you have honest numbers for this, you cannot price your support tier correctly. A 24×7×4 onsite contract (round-the-clock coverage with a four-hour onsite response) on a server whose downtime costs $200/hour is irrational: the premium is likely to exceed the cost of the downtime it prevents.
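One way to make the $200/hour example concrete is a break-even check: a contract pays for itself only when the downtime it avoids in a year costs more than its premium. The sketch below assumes you can estimate outages per year and how long each takes to resolve with and without the contract. The function names and every figure in it (a $5,000 annual premium, two outages a year, 12 hours versus 4 hours to restore) are hypothetical placeholders, not vendor quotes.

```python
"""Break-even check for a hardware support contract.

All figures are hypothetical placeholders; substitute your own estimates.
"""


def avoided_downtime_cost(outages_per_year: float,
                          hours_without_contract: float,
                          hours_with_contract: float,
                          cost_per_hour: float) -> float:
    """Yearly downtime cost the contract avoids: outages x hours saved x hourly cost."""
    hours_saved = hours_without_contract - hours_with_contract
    return outages_per_year * hours_saved * cost_per_hour


def break_even_cost_per_hour(annual_premium: float,
                             outages_per_year: float,
                             hours_without_contract: float,
                             hours_with_contract: float) -> float:
    """Hourly downtime cost at which the avoided downtime equals the premium."""
    hours_saved_per_year = outages_per_year * (hours_without_contract - hours_with_contract)
    if hours_saved_per_year <= 0:
        raise ValueError("the contract must save some downtime to ever break even")
    return annual_premium / hours_saved_per_year


if __name__ == "__main__":
    # Hypothetical: two hardware outages a year, 12 h to restore from shelf
    # spares vs. 4 h with a four-hour onsite response, $200/hour downtime.
    premium = 5_000.0
    avoided = avoided_downtime_cost(2, 12, 4, 200.0)
    print(f"Avoided downtime per year: ${avoided:,.0f} vs. premium ${premium:,.0f}")
    print(f"Break-even: ${break_even_cost_per_hour(premium, 2, 12, 4):,.2f}/hour")
```

With these placeholder numbers the contract avoids $3,200 of downtime a year against a $5,000 premium, and would only break even if an hour of downtime cost more than $312.50. A fuller estimate would also count the staff time spent on each repair, which this sketch ignores.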

Third: what are the failure modes that actually happen, and which can you handle yourself?

Power supplies fail. Disks fail. Backplane capacitors leak. iDRAC batteries die. Each of these has a known mean-time-between-failure (MTBF) rating and a known replacement procedure that takes a competent technician 15 to 45 minutes. If you have a competent technician on staff and can pre-stock $300 of spare parts, you have already replicated 80% of what an OEM support contract delivers.
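A published mean time between failures (MTBF) can be turned into a rough count of how many spares the shelf should hold each year. The sketch below assumes a constant failure rate (independent exponential lifetimes), which is a simplification: nine-year-old gear is usually in its wear-out phase, so treat the results as a floor rather than a forecast. The inventory and MTBF values are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def expected_failures_per_year(mtbf_hours: float, units: int = 1) -> float:
    """Expected failures per year across `units` identical parts,
    each replaced on failure, under a constant failure rate."""
    return units * HOURS_PER_YEAR / mtbf_hours


def annual_failure_probability(mtbf_hours: float, units: int = 1) -> float:
    """Probability that at least one of `units` identical parts fails within a year,
    assuming independent exponential lifetimes (combined rate = units / MTBF)."""
    return 1.0 - math.exp(-units * HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inventory: part -> (units in service, assumed MTBF in hours).
    inventory = {
        "disk": (8, 1_000_000),
        "power supply": (2, 100_000),
    }
    for part, (units, mtbf) in inventory.items():
        print(f"{part:>12}: {expected_failures_per_year(mtbf, units):.3f} expected/yr, "
              f"P(at least one) = {annual_failure_probability(mtbf, units):.1%}")
```

The expected-failures figure is the one to stock against; the probability figure answers the narrower question of whether the shelf gets touched at all this year.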

Answer these three honestly and the right choice presents itself. Skip them and you will either overpay for support you don't need or run blind on something that should have been replaced two years ago.

Where each option actually wins

OEM support wins when the workload is mission-critical, the team has no spare technical capacity, the regulatory environment requires named-vendor support documentation, or the platform is recent enough that the OEM's discount is competitive with TPM.

TPM wins when the gear is past OEM end-of-sale but not past your usefulness threshold, the workload is important but tolerates a half-day of degraded service, the budget delta from OEM is meaningful, and the TPM provider has demonstrated capability with your specific platform (ask for references on the same model).

Self-insure wins when the workload is non-critical, the team has someone who can swap a PSU at 11pm without panic, parts availability is documented (eBay supply is usually fine for 5- to 10-year-old enterprise gear), and the downtime cost has been estimated honestly.

Replace wins when the workload is critical AND any of the following holds: the gear is past its rational service life on power efficiency alone, the workload has outgrown what the platform can serve, regulatory requirements have shifted, or every other path costs more over the next five years than a refresh does now.
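The last condition for replacement is a cost comparison, and it is easy to sketch. The version below assumes each path can be summarized by a one-time spend, a yearly support premium, a yearly power-and-cooling bill, and an expected number of outage hours per year, priced at your downtime cost. The `Path` type and every figure are hypothetical illustrations, and the sketch ignores discounting and resale value.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    upfront: float                # one-time spend now (new hardware, spares shelf)
    annual_support: float         # contract premium per year (0 if none)
    annual_power: float           # electricity and cooling per year
    expected_outage_hours: float  # expected hardware downtime per year


def five_year_cost(path: Path, downtime_cost_per_hour: float, years: int = 5) -> float:
    """Undiscounted total cost over `years`, including expected downtime."""
    yearly = (path.annual_support
              + path.annual_power
              + path.expected_outage_hours * downtime_cost_per_hour)
    return path.upfront + years * yearly


def cheapest(paths: list[Path], downtime_cost_per_hour: float) -> str:
    """Name of the path with the lowest five-year cost."""
    return min(paths, key=lambda p: five_year_cost(p, downtime_cost_per_hour)).name


# Hypothetical figures for one aging server; replace with your own quotes.
# "replace" assumes a warranty bundled with the purchase and a more efficient chassis.
PATHS = [
    Path("OEM support", upfront=0, annual_support=6_000, annual_power=1_200, expected_outage_hours=4),
    Path("TPM", upfront=0, annual_support=3_000, annual_power=1_200, expected_outage_hours=8),
    Path("self-insure", upfront=300, annual_support=0, annual_power=1_200, expected_outage_hours=16),
    Path("replace", upfront=25_000, annual_support=0, annual_power=600, expected_outage_hours=2),
]

if __name__ == "__main__":
    for cost_per_hour in (200, 2_000):
        for p in PATHS:
            print(f"${cost_per_hour}/h  {p.name:<12} ${five_year_cost(p, cost_per_hour):>9,.0f}")
        print(f"  cheapest at ${cost_per_hour}/h: {cheapest(PATHS, cost_per_hour)}")
```

With these placeholder numbers, self-insuring is cheapest at $200/hour of downtime ($22,300 over five years) while replacement is cheapest at $2,000/hour ($48,000): the same server gets a different answer once the downtime cost changes.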

The pattern that keeps showing up

The teams that consistently make this decision well share a habit: they have a written hardware register, updated quarterly, with three columns per asset: workload criticality, current support state, and downtime cost in dollars per hour. That single document changes the conversation from "what do we feel like doing?" to "what does the math say?"
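A register like this can live in a spreadsheet, but keeping it as a plain CSV makes the quarterly review easy to check automatically. The sketch below is a minimal illustration: the column names, the hostnames, and the extra `last_reviewed` column (added so overdue reviews can be flagged) are assumptions rather than a prescribed format, and "quarterly" is approximated as 92 days.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=92)  # assumption: "quarterly" ~ 92 days

# Hypothetical register: the three columns from the text plus a review date.
SAMPLE_REGISTER = """\
asset,criticality,support_state,downtime_cost_per_hour,last_reviewed
dc01,critical,OEM,1500,2024-05-01
files01,important,TPM,300,2024-04-10
print01,low,self-insured,40,2023-11-15
"""


@dataclass
class Asset:
    name: str
    criticality: str               # e.g. "critical", "important", "low"
    support_state: str             # e.g. "OEM", "TPM", "self-insured"
    downtime_cost_per_hour: float  # dollars per hour of outage
    last_reviewed: date


def load_register(text: str) -> list[Asset]:
    """Parse the CSV register into Asset records."""
    return [
        Asset(
            name=row["asset"],
            criticality=row["criticality"],
            support_state=row["support_state"],
            downtime_cost_per_hour=float(row["downtime_cost_per_hour"]),
            last_reviewed=date.fromisoformat(row["last_reviewed"]),
        )
        for row in csv.DictReader(io.StringIO(text))
    ]


def overdue_for_review(assets: list[Asset], today: date) -> list[str]:
    """Names of assets not reviewed within the last quarter."""
    return [a.name for a in assets if today - a.last_reviewed > REVIEW_INTERVAL]


def by_downtime_cost(assets: list[Asset]) -> list[str]:
    """Asset names ordered from most to least expensive to lose for an hour."""
    return [a.name for a in sorted(assets, key=lambda a: a.downtime_cost_per_hour, reverse=True)]


if __name__ == "__main__":
    register = load_register(SAMPLE_REGISTER)
    print("Overdue:", overdue_for_review(register, date(2024, 6, 30)))
    print("Review order:", by_downtime_cost(register))
```

Sorting by downtime cost puts the most expensive outages at the top of the review agenda, which is where the conversation should start.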

The teams that consistently get it wrong default to one of two patterns. Either they auto-renew OEM contracts because the renewal matches last year's invoice and finance doesn't ask questions. Or they refresh everything on a fixed 5-year cycle whether the workload needs it or not, because that's what the vendor's sales rep told them is "best practice."

Neither pattern engages with the actual question. Both are expensive.

The honest counsel

End-of-support is not end-of-usefulness. The vendor's product roadmap is not your operations roadmap. The discipline of the operator is in knowing which gear to retire, which to keep on a quiet contract, and which to keep running on a shelf of spare parts and a clean restore plan.

The Zen of TPM is not in the contract. It is in the clarity of the decision before the contract.