About the author

Hi, I'm Ben. I've been trying to make computers do things I want them to do for years.

Over the last decade people have been paying me to get computers to do things they want them to do as well.

Want to get in touch? You can reach me on hi.ben@dartoxia.com (...wtf is a dartoxia?)

How to enable preemption in Nomad

Created on: 2021-11-04 12:09

I needed preemption enabled for services in my Nomad cluster so that the scheduler could shuffle some existing services around and make room for a new, larger multi-service job. Specifically, the services in my job had the distinct_hosts constraint, and no nodes were immediately available to allocate them to.

Annoyingly, if you don't enable preemption during server bootstrapping, you have to enable it through the API directly. The scheduler stanza in the server configuration isn't used to set the scheduler config after bootstrapping, and the CLI doesn't seem to support changing the scheduler settings.

From a machine that can reach a Nomad server, you can fetch the current scheduler config with the following (this is the default scheduler config for Nomad 1.1.6):

-> % curl -s http://localhost:4646/v1/operator/scheduler/configuration | jq .
{
  "SchedulerConfig": {
    "SchedulerAlgorithm": "binpack",
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": false
    },
    "MemoryOversubscriptionEnabled": false,
    "CreateIndex": 5,
    "ModifyIndex": 5
  },
  "Index": 5,
  "LastContact": 0,
  "KnownLeader": true
}

Specify your desired config in schedulerUpdate.json. I wanted to switch to the spread scheduling algorithm and enable preemption for services, so my config looked like:

{
  "SchedulerAlgorithm": "spread",
  "MemoryOversubscriptionEnabled": false,
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}

Then apply it with:

-> % curl -X PUT http://localhost:4646/v1/operator/scheduler/configuration \
     --data @schedulerUpdate.json
{"Updated":true,"Index":38465}
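If you would rather not hand-write the JSON file, the payload can be derived from whatever the server currently reports. This is a rough sketch, not something from my original setup: the function names `build_update` and `apply_update` are made up, `NOMAD_ADDR` defaults to the localhost address used above, and the `cas` query parameter is my understanding of how the endpoint does check-and-set, so verify it against the API docs for your Nomad version.

```shell
#!/bin/sh
# Sketch: derive the scheduler update from the live config, then apply it.
# Function names are hypothetical; jq and curl must be installed.

NOMAD_ADDR="${NOMAD_ADDR:-http://localhost:4646}"

# Read the GET response on stdin and print an update payload: keep the
# current settings, drop the read-only index fields, switch the algorithm
# to spread, and turn on preemption for service jobs.
build_update() {
  jq '.SchedulerConfig
      | del(.CreateIndex, .ModifyIndex)
      | .SchedulerAlgorithm = "spread"
      | .PreemptionConfig.ServiceSchedulerEnabled = true'
}

# Fetch the current config, rewrite it, and PUT it back.
# Assumption: ?cas=<ModifyIndex> makes the write conditional on the config
# not having changed since it was read.
apply_update() {
  current=$(curl -sf "$NOMAD_ADDR/v1/operator/scheduler/configuration") || return 1
  index=$(printf '%s' "$current" | jq -r '.SchedulerConfig.ModifyIndex')
  printf '%s' "$current" | build_update |
    curl -sf -X PUT --data @- \
      "$NOMAD_ADDR/v1/operator/scheduler/configuration?cas=$index"
}
```

Running `apply_update` after sourcing the file prints the API response. If your cluster has ACLs enabled, you would also need to pass a token with operator write access, for example via the `X-Nomad-Token` header.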

Now if you check the config, it has been updated:

-> % curl -s http://localhost:4646/v1/operator/scheduler/configuration | jq .
{
  "SchedulerConfig": {
    "SchedulerAlgorithm": "spread",
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": true
    },
    "MemoryOversubscriptionEnabled": false,
    "CreateIndex": 5,
    "ModifyIndex": 38465
  },
  "Index": 38465,
  "LastContact": 0,
  "KnownLeader": true
}
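If you want to check this from a script rather than by eye, something like the following should work. It is only a sketch: `service_preemption_enabled` is a name I made up, and it relies on `jq -e` exiting non-zero when the expression evaluates to false or null.

```shell
#!/bin/sh
# Sketch: succeed only if the scheduler config on stdin has preemption
# enabled for service jobs. The function name is hypothetical.
service_preemption_enabled() {
  jq -e '.SchedulerConfig.PreemptionConfig.ServiceSchedulerEnabled == true' >/dev/null
}

# Usage against a live server (address taken from the examples above):
#   curl -sf http://localhost:4646/v1/operator/scheduler/configuration \
#     | service_preemption_enabled && echo "service preemption is on"
```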

After applying this change, I restarted the job that was failing to allocate. Nomad then preempted lower-priority jobs and was able to allocate all of my services.