Pipeline Execution

Config file controlled execution

The SemiF-Annotation pipeline uses Hydra to manage the configuration used during execution. Each task is structured as a Python module containing a main function. The following fields in the conf/config.yaml file control which tasks are executed:

general:
  batch_id: ??? # Control the batch_id through the command line
  multitask: False
  multitasks: [develop_images, copy_from_blob, auto_sfm, localize_plants, remap_labels, assign_species, segment_vegetation, copy_products_to_blobs]
  task: develop_images # auto_sfm, localize_plants, remap_labels, segment_vegetation

  • general.batch_id: The batch ID to be processed

  • general.multitask: Set this to True to execute multiple tasks in sequence

  • general.multitasks: Lists the tasks to be executed. The order of the tasks matters; the default config lists them in the order in which they must be executed

  • general.task: The single task to execute when multitask is False

Note that each task name is the same as the name of the Python file that contains the code for that task. For simplicity, the bash script scripts/execute.sh wraps the end-to-end execution and needs only general.batch_id and autosfm.metashape_key.
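The selection logic described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names resolve_tasks and run_tasks are hypothetical, the config is passed as a plain dict rather than a Hydra config object, and it assumes that each task module is importable by its task name and that its main function accepts the config as its only argument.

```python
import importlib


def resolve_tasks(general):
    """Return the ordered list of task names selected by the `general` config."""
    if general.get("multitask", False):
        # Order matters: tasks run in the order in which they are listed.
        return list(general["multitasks"])
    # Single-task mode: only `general.task` is executed.
    return [general["task"]]


def run_tasks(cfg, loader=importlib.import_module):
    """Import each selected task module by name and call its main function.

    Assumption: every task module exposes `main(cfg)`.
    `loader` is injectable so the dispatch can be exercised without the real modules.
    """
    for name in resolve_tasks(cfg["general"]):
        module = loader(name)
        module.main(cfg)
```

Keeping the task resolution separate from the dispatch makes it possible to inspect which tasks a given config would run before anything is executed.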

 

Description of tasks

The pipeline is divided into separate tasks for modular processing. The tasks are as follows:

develop_images

Pre-processes images using RawTherapee.

copy_from_blob

Copy the developed images from the blobs to local storage for processing. The following command invokes this task:

python SEMIF.py general.batch_id=<batch_id> \
    general.task=copy_from_blob \
    autosfm.autosfm_config.use_masking=<True/False>

Set the use_masking flag to True only if masks have been generated for all the images for autoSfM to use. The flag must be passed to the copy_from_blob task so that the masks are also copied to local storage.

auto_sfm

This task runs autoSfM on a given batch_id. The following command invokes this task:

python SEMIF.py general.batch_id=<batch_id> \
    general.task=auto_sfm \
    autosfm.autosfm_config.use_masking=<True/False> \
    autosfm.metashape_key=<METASHAPE_KEY>

The use_masking flag controls whether to use masks for autoSfM. Note that when this flag is set to True, the masks must be present in data/semifield-developed-images/<batch_id>/masks.
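A pre-flight check for the masks directory could look like the sketch below. The helper name masks_available is hypothetical and not part of the pipeline; the directory layout is the one stated above. The check only confirms that the directory exists and is not empty; it does not verify that every image has a mask.

```python
from pathlib import Path


def masks_available(batch_id, data_root="data/semifield-developed-images"):
    """Return True if the masks directory for the batch exists and is non-empty.

    Hypothetical helper: it does not check for one mask per image.
    """
    masks_dir = Path(data_root) / batch_id / "masks"
    return masks_dir.is_dir() and any(masks_dir.iterdir())
```

Running such a check before invoking auto_sfm with use_masking set to True can surface a missing copy step early instead of partway through the autoSfM run.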

localize_plants

This task runs the detection model on all the images to generate bounding boxes over the plants.

remap_labels

This task maps the bounding boxes from image coordinates to the bench (orthomosaic) coordinate system.

Note that both auto_sfm and localize_plants must be run before remap_labels.
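The ordering constraint can be checked against a task list before execution, as in the sketch below. The names PREREQUISITES and missing_prerequisites are hypothetical, and only the one dependency stated above is encoded. The check looks at a single ordered task list, so it cannot know whether a prerequisite was already completed in an earlier run.

```python
# Only the dependency documented above is encoded here.
PREREQUISITES = {"remap_labels": {"auto_sfm", "localize_plants"}}


def missing_prerequisites(tasks):
    """Map each task to the prerequisites that do not appear before it in `tasks`."""
    seen = set()
    problems = {}
    for task in tasks:
        missing = PREREQUISITES.get(task, set()) - seen
        if missing:
            problems[task] = sorted(missing)
        seen.add(task)
    return problems
```

An empty result means the list satisfies the known ordering; a non-empty result names the tasks that would have to run first.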

assign_species

This task assigns species to the bounding boxes based on the shapefiles.

segment_vegetation

Segments the vegetation and generates cutouts of the individual plants from the bounding boxes.

copy_products_to_blobs

Copies all the products of the processing from local storage to the blobs.

Maintenance

The maintenance script scripts/maintenance.sh automates updates to the codebase by pulling the latest code and building the autoSfM Docker container. It is meant to be run periodically as a cron job. Note that the pipeline cannot be executed while maintenance is in progress.

During execution, the pipeline copies data from the blob storage onto the VM to make execution faster and avoid data corruption. The products of the execution (autoSfM files, plant detection metadata, cutouts, etc.) are stored in temporary directories on the VM and then copied to the blob storage. However, the duplicate files on the VM (developed images and the products) are not removed right after execution, to avoid accidental removal. Another task (maintenance) in the config handles this by removing the data for multiple batch IDs in bulk, once the user has made sure that the data is securely copied onto the blob storage. For now, this is a manual task: it can be executed by listing the batch IDs to be removed in the batch_ids field of the conf/maintenance/maintenance.yaml file, or by passing them on the command line.
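The idea of confirming the blob copy before deleting local data can be sketched as below. This is not the pipeline's maintenance task: the function names, the per-batch directory layout under local_root and blob_root, and the size-based comparison are all assumptions made for illustration. A real check might compare checksums instead of file sizes.

```python
import shutil
from pathlib import Path


def files_missing_from_blob(local_dir, blob_dir):
    """List files under local_dir that are absent from blob_dir or differ in size."""
    local_dir, blob_dir = Path(local_dir), Path(blob_dir)
    missing = []
    for path in sorted(local_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(local_dir)
            target = blob_dir / rel
            if not target.is_file() or target.stat().st_size != path.stat().st_size:
                missing.append(str(rel))
    return missing


def remove_batches(batch_ids, local_root, blob_root):
    """Remove local batch directories whose files are all present in blob storage.

    Assumed layout (hypothetical): <local_root>/<batch_id> and <blob_root>/<batch_id>.
    Returns the batch IDs that were actually removed.
    """
    removed = []
    for batch_id in batch_ids:
        local_dir = Path(local_root) / batch_id
        if not local_dir.is_dir():
            continue  # nothing to remove for this batch
        if files_missing_from_blob(local_dir, Path(blob_root) / batch_id):
            continue  # blob copy is incomplete, so keep the local data
        shutil.rmtree(local_dir)
        removed.append(batch_id)
    return removed
```

Skipping a batch whenever any file is unaccounted for keeps the removal conservative, in line with the stated goal of avoiding accidental data loss.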

 
