AWS re:Invent 2017

Amazon ECS (EC2 Container Service) is a container management service in AWS that removes the need to run your own cluster management infrastructure.

Amazon EKS

Amazon EKS is a managed Kubernetes service in AWS. EKS runs 3 Kubernetes masters across 3 AZs and manages their software upgrades. It integrates with Elastic Load Balancing, IAM and VPC. EKS is in preview and is expected to be GA in early 2018.

AWS Fargate

AWS Fargate allows running containers without managing servers or clusters.

Fargate mode: Package your containers, specify the container specs, IAM and networking, and AWS manages the containers for you.

Traditional ECS (and EKS) run in EC2 mode: you use ECS or EKS to manage a cluster of servers and schedule tasks, and you are responsible for managing the server lifecycle. The sections below cover Traditional ECS in detail. In the future, I will cover EKS and Fargate.

Key facts and Glossary

  • Cost: Amazon ECS itself has no additional charge; you pay only for the resources you use (EC2 instances, EBS volumes, etc.)
  • Service Scope: ECS is a regional service that can be deployed across multiple availability zones within a VPC
  • Containers: ECS supports Docker containers. A Docker container packages everything an application needs to run, including code, runtime, system tools and libraries.
  • Image: Containers are created from a read-only template called an image. Images are typically built from a Dockerfile
  • Registry: Images are stored in a Docker registry. You can use your own registry, Docker Hub, or the AWS-provided ECR (EC2 Container Registry)
  • Task Definitions: Task Definitions are blueprints that define which containers to use, their resource specifications (memory/CPU), the ports to expose, the volumes to mount, the IAM role that grants the containers their permissions, and the networking details.
  • Task: A Task is an instantiation of a Task Definition. Tasks can be run individually or as part of a Service
  • Cluster: An ECS Cluster is a logical grouping of container instances. ECS downloads the container images onto the EC2 instances and runs the containers as described by Task Definitions
  • Container Instances: Container Instances are EC2 instances that are part of an ECS Cluster. They can belong to one or more ASGs or be individual EC2 instances. Container Instances must have Docker and the ECS Agent installed, or can use the ECS Optimized AMI from Amazon, which includes both. Container Instances must have an IAM role with ecsInstanceRole permissions. A registered Container Instance cannot be moved to a different cluster, and its instance type cannot be changed.
  • ECS Service: An ECS Service runs a specific version of a Task Definition with a specified number of tasks and a deployment plan
  • Container Agent: The ECS Container Agent is a Docker container that runs on every ECS Container Instance. The agent syncs with the ECS service to run tasks as requested and to report status. It relies on the Container Instance IAM role for its permissions and requires connectivity to the ECS API endpoints.
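To make the Task Definition entry above more concrete, here is a minimal sketch of one expressed as a Python dict and printed as JSON. The field names follow the ECS task definition format; the family, role ARN, account ID and image are hypothetical placeholders, not values from any real setup.

```python
import json

# All names, ARNs and the image below are hypothetical placeholders.
task_definition = {
    "family": "web-app",
    "networkMode": "bridge",
    "taskRoleArn": "arn:aws:iam::123456789012:role/webAppTaskRole",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",
            "cpu": 256,       # CPU units (1024 units = 1 vCPU)
            "memory": 512,    # hard memory limit in MiB
            "essential": True,
            "portMappings": [
                {"containerPort": 80, "hostPort": 0, "protocol": "tcp"}
            ],
        }
    ],
    "volumes": [],
}


def container_names(definition):
    """Return the names of all containers declared in a task definition."""
    return [c["name"] for c in definition["containerDefinitions"]]


print(json.dumps(task_definition, indent=2))
```

A Task is then one running copy of this blueprint, and a Service keeps a chosen number of such copies running.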

ECS Capabilities

Scheduling Tasks

  • Tasks can be run manually using RunTask API
  • Tasks can be run on a schedule using CloudWatch Events (like cron jobs)
  • Custom schedulers like Blox can be plugged in
  • Tasks can be run using Services, which keep a specified number of tasks running and use a deployment configuration to control how new versions are rolled out. Check Services

Task Placement Strategies

When placing tasks, certain techniques can be applied to achieve desired results

  • binpack places tasks on the instance with the least available CPU or memory that can still fit them, minimizing the number of container instances in use
  • random places tasks randomly
  • spread places tasks evenly based on a specified field; for example, you can spread by AZ and then by instanceId

Task Placement Constraints

When placing tasks, certain constraints can be observed.

  • distinctInstance: Places each task on a different instance
  • memberOf: Places a task on instances that satisfy an expression; for example, place only on t2 instances or on instances launched from a specific AMI. Refer to the link for details
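As a rough illustration, constraints and strategies are passed to the RunTask API as lists of small dicts. The sketch below only assembles those parameters; the cluster name and task definition are hypothetical, and the expression uses the cluster query language to match t2 instance types.

```python
def build_run_task_params(cluster, task_definition, count=1):
    """Assemble keyword arguments in the shape the ECS RunTask API accepts."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": count,
        "placementConstraints": [
            # Each task must land on a different container instance.
            {"type": "distinctInstance"},
            # Only consider instances whose type starts with "t2".
            {"type": "memberOf", "expression": "attribute:ecs.instance-type =~ t2.*"},
        ],
        "placementStrategy": [
            # Spread across AZs first, then across instances within an AZ.
            {"type": "spread", "field": "attribute:ecs.availability-zone"},
            {"type": "spread", "field": "instanceId"},
        ],
    }


# "demo-cluster" and "web-app:1" are made-up names.
params = build_run_task_params("demo-cluster", "web-app:1", count=2)
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("ecs").run_task(**params)
```

Keeping the parameters in a plain dict makes them easy to inspect or unit test before any API call is made.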

Task level IAM permissions

Each Task Definition can be associated with an IAM role for fine-grained permissions

Task level Networking

  • bridge mode: Take advantage of dynamic port mappings
  • host mode: high performance, but container and host ports have to match, no dynamic ports
  • awsvpc mode: Each task gets its own ENI and private IP. Security Groups can be associated with each task, providing fine-grained security. The instance type limits how many such tasks can run on an instance because of ENI limits
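The practical difference between the modes shows up in the portMappings entry of a container definition. Here is a small sketch, assuming TCP only; the helper name port_mapping is made up for illustration.

```python
def port_mapping(network_mode, container_port):
    """Build a portMappings entry appropriate for the given network mode."""
    if network_mode == "bridge":
        # hostPort 0 requests a dynamically assigned host port.
        return {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
    if network_mode in ("host", "awsvpc"):
        # In these modes the host port has to match the container port.
        return {
            "containerPort": container_port,
            "hostPort": container_port,
            "protocol": "tcp",
        }
    raise ValueError(f"network mode not covered by this sketch: {network_mode}")
```

For example, port_mapping("bridge", 8080) leaves the host port to be chosen at run time, which is what lets several copies of the same task share one instance.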

Data Volumes

  • Using the sourcePath attribute, containers can share a persistent volume backed by a path on the host
  • Using an empty host parameter (no sourcePath), containers can share a scratch volume that’s not persisted across task stop/start
  • You can mount a read-only volume (like docroot) across many containers
  • You can mount volumes from other containers in same Task Definition using volumesFrom
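The volume options above map onto three task definition fields: volumes, mountPoints and volumesFrom. A minimal sketch follows, with hypothetical volume names, container names and paths, plus a small made-up helper that checks every mount point refers to a declared volume.

```python
# Volume names, container names and paths are hypothetical.
volumes = [
    {"name": "app-data", "host": {"sourcePath": "/ecs/app-data"}},  # persistent host path
    {"name": "scratch", "host": {}},                                 # scratch volume
]

containers = [
    {
        "name": "web",
        "mountPoints": [
            {"sourceVolume": "app-data", "containerPath": "/var/www", "readOnly": True},
            {"sourceVolume": "scratch", "containerPath": "/tmp/work"},
        ],
    },
    {
        "name": "log-shipper",
        # Reuse every volume mounted by the "web" container, read-only.
        "volumesFrom": [{"sourceContainer": "web", "readOnly": True}],
    },
]


def undefined_volumes(volumes, containers):
    """Return mount point volume names that are not declared in `volumes`."""
    declared = {v["name"] for v in volumes}
    used = {
        m["sourceVolume"]
        for c in containers
        for m in c.get("mountPoints", [])
    }
    return sorted(used - declared)
```

A check like undefined_volumes(volumes, containers) can catch a misspelled volume name before the task definition is registered.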

Service Load balancing

  • An ECS Service can be load balanced with an ALB, NLB or Classic ELB. Each Service can only be attached to 1 load balancer
  • Application Load Balancer (ALB): Supports Application Layer (HTTP/HTTPS), Dynamic Ports, Path-based Routing, Priority rules and SSL Termination. No TCP load balancing
  • Network Load Balancer (NLB): Supports transport layer (TCP/SSL). High throughput. Supports dynamic ports
  • Classic Load Balancer (ELB): Supports both HTTP/HTTPS and TCP/SSL. Doesn’t support dynamic ports.
  • If a task fails the load balancer health check, it is killed and the Service starts a replacement

Service Autoscaling

  • Service Autoscaling adjusts desired count within the boundaries of Min/Max Capacity
  • Uses CloudWatch alarms to autoscale.
  • Both ECS metrics based CW alarms and Custom CW alarms can be used as triggers
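The "within the boundaries" rule in the first bullet boils down to clamping. The function below is a toy model of that rule only, not the actual autoscaling implementation, and its name and parameters are made up for illustration.

```python
def next_desired_count(current, adjustment, min_capacity, max_capacity):
    """Apply a scaling adjustment and clamp the result to [min, max]."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return max(min_capacity, min(max_capacity, current + adjustment))
```

So an alarm asking for 3 more tasks when 4 are running and the maximum is 6 only moves the desired count to 6, and a scale-in request never takes it below the minimum.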

Scaling Container Instances

  • If Container Instances are part of an ASG, they can be scaled using the ECS Console or by modifying the ASG desired capacity
  • ASG scaling policies can be driven by CloudWatch alarms on the ECS Reservation and Utilization metrics

Container Registry (ECR)

Amazon ECR is a managed Docker registry service.

  • ECR is an account-level registry and a regional service.
  • The EC2 Container Instance should have IAM permissions to access ECR
  • ECR only supports private images and needs authentication from an AWS account

Logging

  • ECS Container Agent logs can be shipped to CloudWatch Logs.
  • The Container Instances will need appropriate IAM permissions
  • Container logs can be sent to CloudWatch using awslogs Log driver
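For the awslogs driver, the setting lives in the logConfiguration block of each container definition. A minimal sketch, where the log group, region, stream prefix and container details are placeholder values:

```python
def awslogs_configuration(group, region, stream_prefix):
    """Build a logConfiguration entry that uses the awslogs log driver."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


# Hypothetical container definition that sends its output to CloudWatch Logs.
container = {
    "name": "web",
    "image": "nginx:latest",
    "memory": 256,
    "logConfiguration": awslogs_configuration("/ecs/web-app", "us-east-1", "web"),
}
```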

Draining Container Instances

You can prevent new tasks from being scheduled on a Container Instance by changing its status to DRAINING using the ECS Console or an ECS API call. This capability can be used to roll out AMI updates.
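As a sketch of the API route, the helper below wraps the UpdateContainerInstancesState call. It takes the client as a parameter so it can be tried without AWS access; RecordingClient is a made-up stand-in for boto3.client("ecs"), and the cluster name and ARN are placeholders.

```python
def drain_instances(ecs_client, cluster, instance_arns):
    """Set the given container instances to DRAINING and return the API response."""
    return ecs_client.update_container_instances_state(
        cluster=cluster,
        containerInstances=list(instance_arns),
        status="DRAINING",
    )


class RecordingClient:
    """Stand-in for boto3.client("ecs") that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def update_container_instances_state(self, **kwargs):
        self.calls.append(kwargs)
        return {"containerInstances": [], "failures": []}


client = RecordingClient()
drain_instances(
    client,
    "demo-cluster",
    ["arn:aws:ecs:us-east-1:123456789012:container-instance/example"],
)
```

Against a real account you would pass boto3.client("ecs") instead of the stand-in, wait for the running tasks to move elsewhere, and then replace the instance.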

Remote Management of Container Instances

  • You can use EC2 Systems Manager to remotely perform tasks such as cleaning up Docker images, applying security updates and viewing logs
  • Run Command will need appropriate IAM policy

Running Containers at startup time

  • Often we need system containers that must run exactly once on every instance, e.g. security/monitoring agents. Running them from startup scripts won’t give ECS visibility into the resources they use.
  • You can run ECS managed tasks at startup by using a special User data section as described in this link. You will need to add a RunTask IAM policy to ecsInstanceRole to accomplish this

Private Registry Authentication

Private Docker registries can be authenticated as described here

Image and task clean up

Unused images and finished tasks can be cleaned up using ECS agent settings as described here

Access Container and Agent Metadata

  • Container metadata can be read from the file whose path is stored in the environment variable ECS_CONTAINER_METADATA_FILE. The file holds details about the container such as its image and port mappings
  • ECS agent provides API access for introspection using http://localhost:51678/v1/metadata
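Inside a container, reading the metadata file takes only a few lines. This is a minimal sketch; the key names ImageName and PortMappings are assumptions about the file layout, so check them against a real file before relying on them.

```python
import json
import os


def read_container_metadata(environ=os.environ):
    """Load the container metadata file, or return None if it is not available."""
    path = environ.get("ECS_CONTAINER_METADATA_FILE")
    if not path or not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


metadata = read_container_metadata()
if metadata is not None:
    # Key names here are assumptions about the file layout.
    print(metadata.get("ImageName"), metadata.get("PortMappings"))
```

Outside ECS the variable is not set, so the function simply returns None.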

Proxy Configuration

  • Many corporate environments sit behind firewalls and need a proxy configured
  • ECS Agent can be configured with an HTTP proxy as described in the link

ECS Available Metrics

  • CPUReservation
  • MemoryReservation
  • CPUUtilization
  • MemoryUtilization

CW Event integration

ECS publishes events that can be matched by CloudWatch Events rules, which can in turn invoke Lambda functions as targets to take action.
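A Lambda target for such a rule could look roughly like the sketch below, which reacts to stopped tasks. The sample event is trimmed down and hand-written for illustration (the ARN and reason are made up); real events carry many more fields.

```python
def handler(event, context=None):
    """Summarise a stopped ECS task from a CloudWatch Events payload."""
    if event.get("source") != "aws.ecs":
        return None
    if event.get("detail-type") != "ECS Task State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return None
    return {
        "taskArn": detail.get("taskArn"),
        "reason": detail.get("stoppedReason", "unknown"),
    }


# Hand-written sample event with placeholder values.
sample_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {
        "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/example",
        "lastStatus": "STOPPED",
        "stoppedReason": "Essential container in task exited",
    },
}

print(handler(sample_event))
```

From here the function could post to a chat channel or publish to SNS; that part is left out to keep the sketch self-contained.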

Best Practices

  • Task Definitions should group containers that share a common purpose. Arbitrarily grouping containers makes scheduling difficult
  • If the ECS Agent is disconnected, make sure that you deregister the Container Instance to prevent corrupted state
  • Always stay up-to-date with ECS Container Agent versions
  • Validate that your Docker version is compatible with your ECS Container Agent version