Introduction
Sometimes a workspace appears on the TFE overview page but crashes when selected or produces the error `SIC-0001`. At the same time, that workspace name or ID cannot be found when querying any API endpoint. This happens when the connection breaks while the workspace state is being saved: the lock is not fully written, and the workspace is left in a semi-locked state.
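For example, querying the workspace through the standard "Show workspace" API endpoint returns an error even though the workspace is visible in the UI. A hypothetical check (hostname, organization name, and token are placeholders):

curl -s \
  --header "Authorization: Bearer $TFE_TOKEN" \
  --header "Content-Type: application/vnd.api+json" \
  "https://tfe.example.com/api/v2/organizations/my-org/workspaces/aws-ws-dev-01"
# Returns a JSON:API error document, e.g. {"errors":[{"status":"404","title":"not found"}]}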
Symptoms
Note: For installations using Terraform Enterprise v202205-1 through v202308-1, all container names now follow the naming convention `tfe-<service>`.
Example:
ptfe_atlas > tfe-atlas
ptfe_archivist > tfe-archivist

Note: older versions can still have the "ptfe" prefix.
More information can be found in the release notes describing this change.
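To see which convention your installation uses, you can list the running container names, for example:

docker ps --format '{{.Names}}' | grep atlas
# tfe-atlas on newer releases, ptfe_atlas on older ones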
There are often no real symptoms in the logs, though sometimes you can catch a glimpse of the `SIC-0001` error in the Atlas logs, with seemingly no real reason behind it. The UI, however, crashes or shows an inaccessible page with an `HTTP 500` error.
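If you want to check for the error explicitly, grepping the Atlas container logs is a quick way to do it (substitute the container name that matches your release):

docker logs tfe-atlas 2>&1 | grep -i 'SIC-0001'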
To double-check, retrieve the workspace object in the Rails console of the TFE instance like this:
1. Log in to the TFE instance shell using your preferred method
2. As a user with rights to execute Docker commands, run:
docker exec -it ptfe_atlas bash /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
3. Now that you are in the Rails console, execute the following (the name "aws-ws-dev-01" is used here and later as an example):
Workspace.find_by(name: 'aws-ws-dev-01')
4. Observe the workspace state in the output:
irb(main):003:0> Workspace.find_by(name: 'aws-ws-dev-01')
=> #<Workspace id: 257, name: "aws-ws-dev-01", external_id: "ws-7HABBJu62a23234",
organization_id: 2, created_at: "2020-10-30 10:31:58",
updated_at: "2020-01-10 12:03:26", archived_at: nil, auto_apply: false, periodic_run: 0,
queue_run_on_artifact_upload: false, terraform_version: "0.12.14",
variable_set_id: nil, trace_resource_id: "c84251b4-8d0a-5167-9e4f-ae73bd961d2b",
locked_by_id: 57, locked_by_type: "User", locked_reason: nil,
environment: "default", current_state_version_id: [FILTERED], working_directory: "",
ssh_key_id: nil, current_run_id: 4939, queue_all_runs: false,
file_triggers_enabled: true, trigger_prefixes: [], speculative_enabled: true,
source: "tfe-api", source_name: nil, source_url: nil, description: nil,
allow_destroy_plan: true, auto_destroy_at: nil, agent_pool_id: nil,
execution_mode: "remote", readme_id: nil, structured_run_output_enabled: nil,
global_remote_state: [FILTERED], apply_duration_average: nil,
plan_duration_average: nil, policy_check_failures: nil, run_failures: nil,
workspace_kpis_runs_count: nil, remote_state_access: nil, runs_not_plan_only_count: 21,
tag_list: nil>
The most important fields here are `locked_by_id: 57` together with `locked_reason: nil`: the workspace is locked by a user, yet no lock reason was recorded, which indicates the inconsistent semi-locked state. Note also `source: "tfe-api"`, which suggests that a failure during an API call left the workspace locked this way.
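If you only want to inspect the lock-related attributes rather than the full record, a minimal Rails console sketch (same example workspace name):

ws = Workspace.find_by(name: 'aws-ws-dev-01')
ws.locked_by_id    # => 57    (a lock holder is recorded...)
ws.locked_by_type  # => "User"
ws.locked_reason   # => nil   (...but no reason was written: the inconsistent state)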
Steps to resolve
We are going to force-unlock it via the Rails console, using the same approach as above (a combined sketch follows these steps):
- Log in to the instance using your preferred method
- Open the Rails console:
docker exec -it ptfe_atlas bash /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
- Find the workspace (the name from the example above is used):
ws = Workspace.find_by(name: 'aws-ws-dev-01')
- Unlock it:
ws.unlock(nil)
- Log back in to the UI and confirm that the workspace is now accessible
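For convenience, the resolution steps can be combined into a single guarded snippet. This is a minimal sketch, assuming the same example workspace name and the `unlock` call shown above; it only forces the unlock when the lock looks inconsistent:

ws = Workspace.find_by(name: 'aws-ws-dev-01')
if ws && ws.locked_by_id && ws.locked_reason.nil?
  ws.unlock(nil)  # force-unlock, same call as in the step above
  puts "Unlocked workspace #{ws.name}"
else
  puts "Workspace not found or not in the semi-locked state"
end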