Workflows and Jobs
Overview
In ezbatch, a job is a basic unit of work that can be run on AWS Batch.
A job is defined by a Docker container image to run, as well as the resources required to run the job.
A workflow is a collection of jobs together with the dependencies between them. The dependencies determine the order in which the jobs run.
Creating a Job
To create a job in ezbatch, you need to define the following:
Docker Image: Specify the Docker container image that contains the code and dependencies for the job.
Command: Provide the command to execute within the container.
Resources: Define the CPU, memory, and storage requirements for the job.
Environment Variables: Set any environment variables required by the job.
S3 Mounts (optional): Specify S3 paths to read from or write to during the job execution.
Example Python code to define a job:
from ezbatch.s3 import S3Mounts
from ezbatch.workflow import EZBatchJob

job = EZBatchJob(
    image="public.ecr.aws/ubuntu/ubuntu:22.04",
    command="echo hello, world!",
    environment={"EXAMPLE_VAR": "value"},
    mounts=S3Mounts(
        read=[
            {"source": "s3://my-bucket/input", "destination": "/mnt/input", "recursive": True}
        ],
        write=[
            {"source": "/mnt/output", "destination": "s3://my-bucket/output", "recursive": True}
        ],
    ),
    vcpus=2,
    memory=4096,
    platform="FARGATE",
)
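The mount dictionaries in the job above follow a simple pattern: read mounts copy from an s3:// source to a container path, and write mounts copy from a container path to an s3:// destination. A minimal sketch of a pre-flight check for that pattern follows. The check_mounts helper is hypothetical and not part of ezbatch; it uses only the standard library and assumes the dictionary keys shown in the example.

```python
from urllib.parse import urlparse


def check_mounts(read, write):
    """Return a list of problems found in mount dicts (hypothetical helper, not an ezbatch API)."""
    problems = []
    # For read mounts the S3 side is "source"; for write mounts it is "destination".
    for kind, entries, s3_key, local_key in (
        ("read", read, "source", "destination"),
        ("write", write, "destination", "source"),
    ):
        for i, entry in enumerate(entries):
            s3_url = urlparse(entry.get(s3_key, ""))
            # The S3 side must be an s3:// URL that names a bucket.
            if s3_url.scheme != "s3" or not s3_url.netloc:
                problems.append(f"{kind}[{i}]: {s3_key} is not an s3:// URL")
            # The container side must be an absolute path.
            if not entry.get(local_key, "").startswith("/"):
                problems.append(f"{kind}[{i}]: {local_key} is not an absolute path")
    return problems


read = [{"source": "s3://my-bucket/input", "destination": "/mnt/input", "recursive": True}]
write = [{"source": "/mnt/output", "destination": "s3://my-bucket/output", "recursive": True}]
print(check_mounts(read, write))  # prints []
```

Running a check like this before constructing the job can surface a swapped source and destination locally, before anything is submitted to AWS Batch.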
Creating a Workflow
To create a workflow, give it a name, a mapping of job names to jobs, and a mapping from each dependent job name to the names of the jobs it depends on.
Example Python code to define a workflow:
from ezbatch.workflow import EZBatchJob, EZBatchWorkflow

workflow = EZBatchWorkflow(
    name="example-workflow",
    jobs={
        "job1": EZBatchJob(
            image="public.ecr.aws/ubuntu/ubuntu:22.04",
            command="echo Job 1",
            vcpus=1,
            memory=2048,
        ),
        "job2": EZBatchJob(
            image="public.ecr.aws/ubuntu/ubuntu:22.04",
            command="echo Job 2",
            vcpus=1,
            memory=2048,
        ),
    },
    dependencies={
        "job2": ["job1"],  # job2 depends on job1
    },
)
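The dependencies mapping above (each job name mapped to the jobs it depends on) has the same shape that Python's standard graphlib.TopologicalSorter accepts. The sketch below uses it to preview a valid run order and to catch cycles or misspelled job names before submitting. It illustrates the idea only and is not necessarily how ezbatch resolves dependencies; run_order is a hypothetical helper, and graphlib requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(job_names, dependencies):
    """Return job names in an order that respects dependencies (hypothetical helper)."""
    # Every name mentioned in the dependencies must be a defined job.
    unknown = {
        name
        for job, parents in dependencies.items()
        for name in (job, *parents)
        if name not in job_names
    }
    if unknown:
        raise ValueError(f"dependencies reference unknown jobs: {sorted(unknown)}")

    sorter = TopologicalSorter(dependencies)
    for name in job_names:
        sorter.add(name)  # also include jobs that have no dependencies
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


print(run_order(["job1", "job2"], {"job2": ["job1"]}))  # prints ['job1', 'job2']
```

With the example workflow, the only valid order is job1 followed by job2. A mapping such as {"job1": ["job2"], "job2": ["job1"]} would raise ValueError instead.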
Running a Workflow
To run a workflow, use the submit method of the EZBatchWorkflow class. You can optionally specify a job queue.
Example Python code to run a workflow:
workflow.submit(queue="my-job-queue")
You can also save the workflow to a file for reuse:
workflow.save("workflow.json")
To load and run a saved workflow:
from ezbatch.workflow import EZBatchWorkflow
loaded_workflow = EZBatchWorkflow.load("workflow.json")
loaded_workflow.submit(queue="my-job-queue")