We have seen cases where Dask workers get stuck in a zombie state: despite having a nanny process, the worker is dead, but the Fargate task keeps running and drives up costs.
Would this project accept a patch that enables a health check for ECS/Fargate tasks? The second part of the solution (stopping tasks that are in an unhealthy state) is probably out of scope for this project.
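
To make the proposal concrete, here is a rough sketch of what such a health check could look like. It only builds the `healthCheck` mapping that ECS accepts inside a container definition; it is not an existing API of this project. The helper name `build_health_check`, the fixed port `8787`, and the use of `curl` against the worker's `/health` HTTP route are all assumptions: the worker would need to be started with a fixed dashboard address, and `curl` would need to be present in the image.

```python
def build_health_check(port=8787, interval=30, timeout=5, retries=3, start_period=60):
    """Build an ECS container-level ``healthCheck`` mapping (hypothetical helper).

    Assumes the Dask worker's HTTP server listens on ``port`` inside the
    container and that ``curl`` is installed in the image.
    """
    # Ranges documented by ECS for container health checks.
    if not 5 <= interval <= 300:
        raise ValueError("interval must be between 5 and 300 seconds")
    if not 2 <= timeout <= 60:
        raise ValueError("timeout must be between 2 and 60 seconds")
    if not 1 <= retries <= 10:
        raise ValueError("retries must be between 1 and 10")
    if not 0 <= start_period <= 300:
        raise ValueError("start_period must be between 0 and 300 seconds")

    return {
        # CMD-SHELL runs the string with the container's default shell;
        # a non-zero exit code marks the check as failed.
        "command": [
            "CMD-SHELL",
            f"curl -fsS http://localhost:{port}/health || exit 1",
        ],
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }


# Example: attach the check to a worker container definition (names are illustrative).
worker_container = {
    "name": "dask-worker",
    "essential": True,
    "healthCheck": build_health_check(),
}
```

The resulting mapping could then be passed as part of `containerDefinitions` when the task definition is registered. This only covers reporting health; acting on an unhealthy task would still be a separate step.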