Introduction
Problem
When trying to change the list of allowed workspaces on an agent pool, upon saving you get the error:
Error saving agent pool
Agent pool is still being used by workspaces in your organization
This even occurs when the UI does not show any workspaces connected.
If you try to update the agent pool via the API or the TFE provider, you get the following error:
The remote server returned an error: (422) Unprocessable Entity.
Cause
Workspaces connected to an agent pool were deleted in TFE version v202208-1 or older.
The workspace is marked for deletion and should be removed by an automated job after 30 minutes.
However, in these versions the job does not fully remove the workspace, and remnants are left behind that still point to the agent pool.
TFE versions after v202208-1 delete the workspaces properly.
However, through TFE version v202308-1 the agent pool still cannot be updated while remnants of old workspaces are present.
An API call to show the agent pool and its connected workspaces does not show these remnant workspaces either.
Sample output:
{
"data":{
"id":"apool-pSJHZQ3TDq6qiC14","type":"agent-pools",
"attributes":{"name":"agent_pool",
...
},
"workspaces":{
"data":[]},
},
...
}
(This sample output is of an agent-pool that does not have any connected workspaces in the UI, but does have deleted workspaces connected.)
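The show call described above can be sketched in Ruby as follows. This is a minimal sketch, assuming the agent pool is read with GET /api/v2/agent-pools/:id and a bearer token. The hostname tfe.example.com, the TFE_TOKEN environment variable, and the helper names agent_pool_request and workspace_refs are placeholders that do not appear in this article. The "relationships" nesting is also an assumption about the untrimmed response, so the helper falls back to the trimmed layout shown in the sample.

```ruby
require "net/http"
require "uri"
require "json"

# Build (but do not send) a GET request for a single agent pool.
# host and token are placeholders supplied by the caller.
def agent_pool_request(host, pool_id, token)
  uri = URI("https://#{host}/api/v2/agent-pools/#{pool_id}")
  req = Net::HTTP::Get.new(uri)
  req["Authorization"] = "Bearer #{token}"
  req["Content-Type"] = "application/vnd.api+json"
  [uri, req]
end

# Return the workspace entries listed in a response body.
# Assumption: the full response nests them under "relationships";
# the trimmed sample in this article shows them directly under "data".
def workspace_refs(json)
  data = JSON.parse(json).fetch("data", {})
  data.dig("relationships", "workspaces", "data") ||
    data.dig("workspaces", "data") ||
    []
end

# Example call (performs a real HTTPS request, so it is left commented out):
# uri, req = agent_pool_request("tfe.example.com", "apool-pSJHZQ3TDq6qiC14", ENV.fetch("TFE_TOKEN"))
# res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
# p workspace_refs(res.body)
```

An empty list from this call does not rule out the problem, because the remnant workspaces are not included in the response.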
Overview of possible solutions
Solution:
- Upgrade to TFE v202309-01 (733) or higher.
The remnant workspaces are still present after the upgrade, but saving an agent pool skips over them and should return success.
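To check whether a given release predates the fix, the release strings can be compared as in the sketch below. This is an illustration only. It assumes releases follow the v<YYYYMM>-<n> pattern used in this article, and the helper names parse_tfe_release and needs_upgrade? are hypothetical.

```ruby
# Parse a TFE release string such as "v202308-1" into [202308, 1].
def parse_tfe_release(release)
  match = /\Av(\d{6})-(\d+)\z/.match(release)
  raise ArgumentError, "unrecognized release: #{release.inspect}" unless match

  [match[1].to_i, match[2].to_i]
end

# True when the release is older than the release that contains the fix.
def needs_upgrade?(release, fixed_in = "v202309-01")
  (parse_tfe_release(release) <=> parse_tfe_release(fixed_in)) < 0
end

puts needs_upgrade?("v202308-1")
puts needs_upgrade?("v202309-1")
```

The comparison is done on the numeric parts, so "v202309-1" and "v202309-01" are treated as the same release.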
Workaround:
- Create another agent-pool and connect the workspaces to this agent-pool until you are able to upgrade to version v202309-01 (733) or higher.
The following is a workaround if you are not able to upgrade or use another agent-pool. Please read carefully and create a ticket with HashiCorp support for verification.
- Disconnect the workspace(s) from the agent pool via Rails.
Please contact HashiCorp support before implementing step 5 of this workaround.
Share the output of the commands in steps 1 to 4 for review.
1) SSH into your TFE host.
2) Connect to the rails console:
sudo docker exec -it tfe-atlas /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
3) Show your agent pool (replace the apool-id with the apool-id of your corrupt agent pool):
AgentPool.find_by_external_id("apool-pSJHZQ3TDq6qiC14")
Sample output:
=> #<AgentPool id: 5,
external_id: "apool-pSJHZQ3TDq6qiC14",
...
organization_scoped: false>
Here the agent pool id is "5". Use this id in the next query.
4) Get all workspaces connected to the agent pool (Substitute the value of 'agent_pool_id' with the value found in the previous query):
Workspace.where('agent_pool_id = 5')
Sample output:
[#<Workspace:0x00007f625dd7e5d0
id: 602,
name: "test-ws-xtp9x",
external_id: "ws-hbbTzPogiyopdwGR",
organization_id: 2,
...
agent_pool_id: 5,
execution_mode: "agent",
...
discarded_at: Mon, 31 Jul 2023 16:07:38.883200000 UTC +00:00,
...
Compare the workspace names of the list obtained via Rails with the connected workspaces in the UI. Remove matching workspaces so you are left with a list of workspaces that are found via Rails, but do not show in the UI.
For the workspaces that are left, make sure the 'discarded_at' date is set and older than 30 minutes.
Also note that the original name of the workspace was 'test-ws'; after it was deleted, TFE changed the name to 'test-ws-*****'.
Write down the id of each workspace; here it is 602.
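The comparison described above can be sketched in plain Ruby, outside the Rails console. This is a minimal sketch: the WorkspaceRow struct, the leftover_workspaces helper, and the "live-ws" row are hypothetical stand-ins for the rows printed in step 4 and the names visible in the UI.

```ruby
# Lightweight stand-in for a workspace row printed by the Rails console in step 4.
WorkspaceRow = Struct.new(:id, :name, :discarded_at)

GRACE_PERIOD = 30 * 60 # 30 minutes, in seconds

# Keep rows that are absent from the UI and were discarded more than 30 minutes ago.
def leftover_workspaces(rows, ui_names, now: Time.now.utc)
  rows.reject { |row| ui_names.include?(row.name) }
      .select { |row| row.discarded_at && (now - row.discarded_at) > GRACE_PERIOD }
end

rows = [
  WorkspaceRow.new(602, "test-ws-xtp9x", Time.utc(2023, 7, 31, 16, 7, 38)),
  WorkspaceRow.new(610, "live-ws", nil) # hypothetical workspace still shown in the UI
]
ui_names = ["live-ws"]

candidates = leftover_workspaces(rows, ui_names, now: Time.utc(2023, 8, 1))
puts candidates.map(&:id).inspect # prints [602]
```

Rows that are missing from the UI but have no 'discarded_at' value, or a recent one, are deliberately left out of the result and are worth raising with support rather than updating.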
5) Update a workspace to disconnect it from the agent pool:
Workspace.update(602,"agent_pool_id" => nil, "execution_mode" => "remote")
Do this for all the leftover workspaces.
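When several workspaces are left over, it can help to write out the exact console commands first so they can be shared with HashiCorp support before anything is run. The sketch below only builds the command strings and changes nothing; the helper name disconnect_commands is hypothetical, and the generated text mirrors the Workspace.update call shown in step 5.

```ruby
# Build one Rails console command per leftover workspace id.
# Integer() rejects anything that is not a whole number, so a stray
# name or external id cannot end up inside a command.
def disconnect_commands(workspace_ids)
  workspace_ids.map do |id|
    %(Workspace.update(#{Integer(id)}, "agent_pool_id" => nil, "execution_mode" => "remote"))
  end
end

puts disconnect_commands([602])
```

Each printed line can then be pasted into the Rails console once support has confirmed the list.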
Outcome
You will be able to update the list of allowed workspaces of the agent pool again.