Manual AWS Batch configuration

This page describes how to set up AWS roles and Batch queues manually for the deployment of Nextflow workloads with Seqera Platform.

Tip: Manual AWS Batch configuration is only necessary if you don't use Batch Forge.

Batch Forge automatically creates the AWS Batch queues required for your workflow executions.

Complete the following procedures to configure AWS Batch manually:

Create a user policy.
Create the instance role policy.
Create the AWS Batch service role.
Create an EC2 instance role.
Create an EC2 SpotFleet role.
Create a launch template.
Create the AWS Batch compute environments.
Create the AWS Batch queue.

Create a user policy

Create the policy for the user launching Nextflow jobs:

In the IAM Console, select Create policy from the Policies page.

Create a new policy with the following content:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1530313170000",
      "Effect": "Allow",
      "Action": [
        "batch:CancelJob",
        "batch:RegisterJobDefinition",
        "batch:DescribeComputeEnvironments",
        "batch:DescribeJobDefinitions",
        "batch:DescribeJobQueues",
        "batch:DescribeJobs",
        "batch:ListJobs",
        "batch:SubmitJob",
        "batch:TerminateJob"
      ],
      "Resource": ["*"]
    }
  ]
}

Save it with the name seqera-user.

Create the instance role policy

Create the policy for the instance role that allows Seqera to submit Batch jobs from your EC2 instances:

In the IAM Console, select Create policy from the Policies page.

Create a new policy with the following content:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "batch:DescribeJobQueues",
                "batch:CancelJob",
                "batch:SubmitJob",
                "batch:ListJobs",
                "batch:DescribeComputeEnvironments",
                "batch:TerminateJob",
                "batch:DescribeJobs",
                "batch:RegisterJobDefinition",
                "batch:DescribeJobDefinitions",
                "batch:TagResource",
                "ecs:DescribeTasks",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceAttribute",
                "ecs:DescribeContainerInstances",
                "ec2:DescribeInstanceStatus",
                "logs:Describe*",
                "logs:Get*",
                "logs:List*",
                "logs:Create*",
                "logs:Put*",
                "logs:StartQuery",
                "logs:StopQuery",
                "logs:TestMetricFilter",
                "logs:FilterLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Save it with the name seqera-batchjob.
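
If you prefer the AWS CLI to the IAM Console, both policies above can also be created with aws iam create-policy. The following is a minimal sketch, not part of the official procedure. It assumes each JSON document has been saved to a local file (the file names seqera-user.json and seqera-batchjob.json are hypothetical) and that your CLI credentials are allowed to manage IAM. The function name is illustrative.

```shell
# Create an IAM managed policy from a local JSON policy document.
# Assumptions: $1 is the policy name, $2 is the path to the saved JSON file.
create_iam_policy() {
  policy_name="$1"
  policy_file="$2"
  aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document "file://$policy_file"
}

# Examples, using the policy names from this page:
# create_iam_policy seqera-user seqera-user.json
# create_iam_policy seqera-batchjob seqera-batchjob.json
```

The command output includes the ARN of the new policy, which you need when attaching a customer managed policy to a role from the CLI.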

Create the Batch Service role

Create a service role used by AWS Batch to launch EC2 instances on your behalf:

In the IAM Console, select Create role from the Roles page.
Select AWS service as the trusted entity type, and Batch as the service.
On the next page, the AWSBatchServiceRole policy is already attached. No further permissions are needed for this role.
Enter seqera-servicerole as the role name and add an optional description and tags if needed, then select Create.

Create an EC2 instance role

Create a role that controls which AWS resources the EC2 instances launched by AWS Batch can access:

In the IAM Console, select Create role from the Roles page.
Select AWS service as the trusted entity type, EC2 as the service, and EC2 - Allows EC2 instances to call AWS services on your behalf as the use case.
Select Next: Permissions. Search for the following policies to attach to the role:
- AmazonEC2ContainerServiceforEC2Role
- AmazonS3FullAccess (you may want to use a custom policy to allow access only on specific S3 buckets)
- seqera-batchjob (the instance role policy created above)
Enter seqera-instancerole as the role name and add an optional description and tags if needed, then select Create.
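
As noted in the list above, a custom policy can replace AmazonS3FullAccess to limit access to specific buckets. The sketch below prints one possible policy document for a single bucket. The action list is an assumption rather than a list confirmed by Seqera, so adjust it to what your pipelines read and write. The function name and bucket name are hypothetical.

```shell
# Print an IAM policy document that limits S3 access to one bucket.
# Assumption: the listed actions are illustrative, not an official minimum set.
s3_bucket_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

# Example: s3_bucket_policy my-work-bucket > seqera-s3-policy.json
```

Bucket-level actions such as s3:ListBucket apply to the bucket ARN, while object-level actions apply to the /* resource, which is why the sketch uses two statements.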

Create an EC2 SpotFleet role

The EC2 SpotFleet role allows you to use Spot instances when you run jobs in AWS Batch. Create a role for the creation and launch of Spot fleets, which are groups of Spot instances with similar compute capabilities (i.e., vCPUs and RAM):

In the IAM Console, select Create role from the Roles page.
Select AWS service as the trusted entity type, EC2 as the service, and EC2 - Spot Fleet Tagging as the use case.
On the next page, the AmazonEC2SpotFleetTaggingRole policy is already attached. No further permissions are needed for this role.
Enter seqera-fleetrole as the role name and add an optional description and tags if needed, then select Create.
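
The role steps above follow one pattern: choose the service that may assume the role, then attach managed policies. As a rough CLI equivalent, the sketch below wraps aws iam create-role and aws iam attach-role-policy. The function name is illustrative, and the service principals shown in the examples (batch.amazonaws.com and spotfleet.amazonaws.com) are the standard ones for these AWS managed policies rather than values stated on this page.

```shell
# Create an IAM role that a given AWS service can assume, then attach
# one or more managed policies to it.
# Usage: create_role_with_policies ROLE_NAME SERVICE_PRINCIPAL POLICY_ARN...
create_role_with_policies() {
  role_name="$1"
  service_principal="$2"
  shift 2
  # Trust policy that lets the named service assume the role.
  trust_policy=$(printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"%s"},"Action":"sts:AssumeRole"}]}' "$service_principal")
  aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document "$trust_policy" || return 1
  # Attach every remaining argument as a managed policy ARN.
  for policy_arn in "$@"; do
    aws iam attach-role-policy \
      --role-name "$role_name" \
      --policy-arn "$policy_arn" || return 1
  done
}

# Examples, using the role names from this page:
# create_role_with_policies seqera-servicerole batch.amazonaws.com \
#   arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole
# create_role_with_policies seqera-fleetrole spotfleet.amazonaws.com \
#   arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole
```

The same function can create the EC2 instance role with ec2.amazonaws.com as the principal. The console creates an instance profile for EC2 roles automatically, so a role created from the CLI also needs aws iam create-instance-profile and aws iam add-role-to-instance-profile before AWS Batch can use it.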

Create a launch template

Create a launch template to configure the EC2 instances deployed by Batch jobs:

AWS Batch with Fusion v2
AWS Batch without Fusion v2

For AWS Batch with Fusion v2: In the EC2 Console, select Create launch template from the Launch templates page.

Scroll down to Advanced details and paste the following in the User data field:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
write_files:
  - path: /root/custom-ce.sh
    permissions: 0744
    owner: root
    content: |
      #!/usr/bin/env bash
      yum install -q -y jq sed wget unzip nvme-cli lvm2
      wget -q https://amazoncloudwatch-agent.s3.amazonaws.com/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
      rpm -U ./amazon-cloudwatch-agent.rpm
      rm -f ./amazon-cloudwatch-agent.rpm
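      curl -s https://nf-xpack.seqera.io/amazon-cloudwatch-agent/custom-v0.1.json \
        > /opt/aws/amazon-cloudwatch-agent/bin/config.json
      # Optional: uncomment the next line to replace the default custom-id prefix.
      # sed -i 's/custom-id/<your custom ID>/g' /opt/aws/amazon-cloudwatch-agent/bin/config.json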
      /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
        -a fetch-config \
        -m ec2 \
        -s \
        -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json          
      mkdir -p /scratch/fusion
      NVME_DISKS=($(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }'))
      NUM_DISKS=${#NVME_DISKS[@]}
      if (( NUM_DISKS > 0 )); then
        if (( NUM_DISKS == 1 )); then
          mkfs -t xfs ${NVME_DISKS[0]}
          mount ${NVME_DISKS[0]} /scratch/fusion
        else
          pvcreate ${NVME_DISKS[@]}
          vgcreate scratch_fusion ${NVME_DISKS[@]}
          lvcreate -l 100%FREE -n volume scratch_fusion
          mkfs -t xfs /dev/mapper/scratch_fusion-volume
          mount /dev/mapper/scratch_fusion-volume /scratch/fusion
        fi
      fi
      chmod a+w /scratch/fusion
      mkdir -p /etc/ecs
      echo ECS_IMAGE_PULL_BEHAVIOR=once >> /etc/ecs/ecs.config
      echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config
      systemctl stop docker
      ## install AWS CLI
      mkdir -p /home/ec2-user
      curl -s https://nf-xpack.seqera.io/miniconda-awscli/miniconda-awscli.tar.gz \
      | tar xz -C /home/ec2-user
      export PATH=$PATH:/home/ec2-user/miniconda/bin
      ln -s /home/ec2-user/miniconda/bin/aws /usr/bin/aws
      systemctl start docker
      systemctl enable --now --no-block ecs
      echo "1258291200" > /proc/sys/vm/dirty_bytes
      echo "629145600" > /proc/sys/vm/dirty_background_bytes

runcmd:
  - bash /root/custom-ce.sh

--//--

To prepend a custom identifier to the CloudWatch log streams for AWS resources created by your manual compute environment, uncomment the sed 's/custom-id/<your custom ID>/g' line in the script above and replace <your custom ID> with your custom ID. If omitted, the identifier defaults to custom-id.
Save the template with the name seqera-launchtemplate.
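
The last two lines of the script write byte values to the kernel's vm.dirty_bytes and vm.dirty_background_bytes settings, which control how much modified data can accumulate in memory before it is written back to disk. As a quick check of those numbers, the sketch below shows that they correspond to 1200 MiB and 600 MiB. The helper name is illustrative.

```shell
# Convert mebibytes to bytes, as used for vm.dirty_bytes and
# vm.dirty_background_bytes in the user data script.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 1200   # 1258291200, the value written to vm.dirty_bytes
mib_to_bytes 600    # 629145600, the value written to vm.dirty_background_bytes
```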

For AWS Batch without Fusion v2: In the EC2 Console, select Create launch template from the Launch templates page.

Scroll down to Advanced details and paste the following in the User data field:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
write_files:
  - path: /root/custom-ce.sh
    permissions: 0744
    owner: root
    content: |
      #!/usr/bin/env bash
      yum install -q -y jq sed wget unzip nvme-cli lvm2
      wget -q https://amazoncloudwatch-agent.s3.amazonaws.com/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
      rpm -U ./amazon-cloudwatch-agent.rpm
      rm -f ./amazon-cloudwatch-agent.rpm
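      curl -s https://nf-xpack.seqera.io/amazon-cloudwatch-agent/custom-v0.1.json \
        > /opt/aws/amazon-cloudwatch-agent/bin/config.json
      # Optional: uncomment the next line to replace the default custom-id prefix.
      # sed -i 's/custom-id/<your custom ID>/g' /opt/aws/amazon-cloudwatch-agent/bin/config.json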
      /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
        -a fetch-config \
        -m ec2 \
        -s \
        -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
      mkdir -p /etc/ecs
      echo ECS_IMAGE_PULL_BEHAVIOR=once >> /etc/ecs/ecs.config
      echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config
      systemctl stop docker
      ## install AWS CLI v2
      mkdir -p /home/ec2-user
      curl -s https://nf-xpack.seqera.io/miniconda-awscli/miniconda-awscli.tar.gz \
      | tar xz -C /home/ec2-user
      export PATH=$PATH:/home/ec2-user/miniconda/bin
      ln -s /home/ec2-user/miniconda/bin/aws /usr/bin/aws
      systemctl start docker
      systemctl enable --now --no-block ecs
      echo "1258291200" > /proc/sys/vm/dirty_bytes
      echo "629145600" > /proc/sys/vm/dirty_background_bytes

runcmd:
  - bash /root/custom-ce.sh

--//--

To prepend a custom identifier to the CloudWatch log streams for AWS resources created by your manual compute environment, uncomment the sed 's/custom-id/<your custom ID>/g' line in the script above and replace <your custom ID> with your custom ID. If omitted, the identifier defaults to custom-id.
Save the template with the name seqera-launchtemplate.
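
The launch template can also be created from the command line with aws ec2 create-launch-template. This is a minimal sketch that assumes the user data shown above has been saved to a local file (the path is passed as an argument and any file name is hypothetical). The function name is illustrative. The EC2 API expects user data to be base64-encoded, so the sketch encodes the file first.

```shell
# Create an EC2 launch template whose user data is read from a local file.
# Assumptions: $1 is the template name, $2 is the path to the saved user data.
create_launch_template() {
  template_name="$1"
  user_data_file="$2"
  # Encode the user data as a single-line base64 string.
  user_data_b64=$(base64 < "$user_data_file" | tr -d '\n')
  aws ec2 create-launch-template \
    --launch-template-name "$template_name" \
    --launch-template-data "{\"UserData\":\"${user_data_b64}\"}"
}

# Example: create_launch_template seqera-launchtemplate user-data.txt
```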

Create the Batch compute environments

Caution: AWS Graviton instances (ARM64 CPU architecture) are not supported in manual compute environments. To use Graviton instances, create your AWS Batch compute environment with Batch Forge.

Nextflow makes use of two job queues during workflow execution:

A head queue to run the Nextflow application
A compute queue where Nextflow will submit job executions

While the compute queue can use a compute environment with Spot instances, the head queue requires an on-demand compute environment. If you intend to use an on-demand compute environment for compute jobs, the same job queue can be used for both head and compute.

Note: Spot instances can significantly reduce your AWS compute costs, provided your workflow compute tasks can run on ephemeral instances.

Create a compute environment for each queue in the AWS Batch console:

Head queue with on-demand instances
Compute queue with Spot instances

The head queue requires an on-demand compute environment. Do not select Use Spot instances during compute environment creation.

In the Batch Console, select Create on the Compute environments page.
Select Amazon EC2 as the compute environment configuration.
Note: Seqera AWS Batch compute environments created with Batch Forge support Fargate for the head job, but manual compute environments must use EC2.
Enter a name of your choice, and apply the seqera-servicerole and seqera-instancerole.
Enter vCPU limits and instance types, if needed.
Note: To use the same queue for both head and compute tasks, you must assign sufficient resources to your compute environment.
Expand Additional configuration and select the seqera-launchtemplate from the Launch template dropdown.
Configure VPCs, subnets, and security groups on the next page as needed.
Review your configuration and select Create compute environment.

Create this compute environment to use Spot instances for your workflow compute tasks. This compute environment cannot be assigned to the Nextflow head job queue.

In the Batch Console, select Create on the Compute environments page.
Select Amazon EC2 as the compute environment configuration.
Enter a name of your choice, and apply the seqera-servicerole and seqera-instancerole.
Select Enable using Spot instances to use Spot instances and save computing costs.
Select the seqera-fleetrole and enter vCPU limits and instance types, if needed.
Expand Additional configuration and select the seqera-launchtemplate from the Launch template dropdown.
Configure VPCs, subnets, and security groups on the next page as needed.
Review your configuration and select Create compute environment.
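
Both compute environments above can also be sketched as a single AWS CLI call that differs only in the instance type (EC2 for on-demand, SPOT for Spot) and in whether the Spot Fleet role is supplied. The example below is illustrative only. The function name, the maxvCpus value of 256, and the optimal instance type selection are assumptions to adjust, and the subnet, security group, and role values are placeholders you must supply.

```shell
# Create a managed AWS Batch compute environment on EC2.
# Usage: create_compute_env NAME TYPE SUBNET_ID SECURITY_GROUP_ID \
#          SERVICE_ROLE_ARN INSTANCE_PROFILE_ARN [SPOT_FLEET_ROLE_ARN]
# TYPE is EC2 (on-demand) or SPOT.
create_compute_env() {
  ce_name="$1"; ce_type="$2"; subnet_id="$3"; sg_id="$4"
  service_role="$5"; instance_role="$6"; fleet_role="${7:-}"
  # Only Spot environments carry the Spot Fleet role.
  spot_field=""
  if [ "$ce_type" = "SPOT" ]; then
    spot_field=$(printf ',"spotIamFleetRole":"%s"' "$fleet_role")
  fi
  # Assumed limits: 0 to 256 vCPUs and the "optimal" instance type selection.
  resources=$(printf '{"type":"%s","minvCpus":0,"maxvCpus":256,"instanceTypes":["optimal"],"subnets":["%s"],"securityGroupIds":["%s"],"instanceRole":"%s","launchTemplate":{"launchTemplateName":"seqera-launchtemplate"}%s}' \
    "$ce_type" "$subnet_id" "$sg_id" "$instance_role" "$spot_field")
  aws batch create-compute-environment \
    --compute-environment-name "$ce_name" \
    --type MANAGED \
    --state ENABLED \
    --service-role "$service_role" \
    --compute-resources "$resources"
}
```

Call the function once with EC2 for the head queue environment and, if you use Spot instances, once more with SPOT and the seqera-fleetrole ARN for the compute queue environment.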

Create the Batch queue

Create a Batch queue to be associated with each compute environment.

Note: You only need to create one queue if you intend to use on-demand instances for your workflow compute tasks. Compute environments with Spot instances require separate queues for the head and compute tasks.

Head queue
Compute queue

Go to the Batch Console.
Create a new queue.
Associate the queue with the head queue compute environment created in the previous section. For the compute queue, associate it with the Spot compute environment instead.
Save it with a name of your choice.
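
The queue steps above map to one aws batch create-job-queue call per queue. The sketch below is an illustrative equivalent. The function name, the queue and compute environment names in the examples, and the priority value of 1 are assumptions.

```shell
# Create an AWS Batch job queue backed by a single compute environment.
# Usage: create_job_queue QUEUE_NAME COMPUTE_ENVIRONMENT_NAME
create_job_queue() {
  queue_name="$1"
  compute_env="$2"
  aws batch create-job-queue \
    --job-queue-name "$queue_name" \
    --state ENABLED \
    --priority 1 \
    --compute-environment-order "order=1,computeEnvironment=$compute_env"
}

# Examples with hypothetical names:
# create_job_queue seqera-head-queue seqera-head-ce
# create_job_queue seqera-compute-queue seqera-spot-ce
```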

Use the AWS resources created on this page to create your manual AWS Batch compute environment.