The SemiF-Annotation pipeline uses Hydra to manage the configuration used during execution. Each task is structured as a Python module containing a main function. The following fields in the conf/config.yaml file control task execution:

general:
  batch_id: ??? # Control the batch_id through the command line
  multitask: False
  multitasks: [develop_images, copy_from_blob, auto_sfm, localize_plants, remap_labels, assign_species, segment_vegetation, copy_products_to_blobs]
  task: develop_images  # auto_sfm, localize_plants, remap_labels, segment_vegetation

general.batch_id: The batch ID to be processed. The ??? value marks the field as mandatory, so it must be supplied on the command line.
general.multitask: Set this to True to execute multiple tasks in sequence.
general.multitasks: Lists the tasks to be executed when multitask is True. The order of the tasks matters; the default config lists them in the order in which they need to be executed.
general.task: The single task to execute when multitask is False.
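The interaction between multitask, multitasks and task can be sketched as a small dispatcher. This is a minimal illustration, not the pipeline's actual entry point: it assumes each task module is importable under its task name and exposes main(cfg), and the helper names tasks_to_run and run_pipeline are hypothetical. A plain mapping stands in for the Hydra config object.

```python
import importlib
from typing import Any, Callable, List, Mapping


def tasks_to_run(general: Mapping[str, Any]) -> List[str]:
    """Return the task names to execute, in order, from the `general` section."""
    if general["multitask"]:
        # Order matters: tasks run exactly in the order listed in `multitasks`.
        return list(general["multitasks"])
    # Single-task mode: only `general.task` runs.
    return [general["task"]]


def run_pipeline(
    cfg: Mapping[str, Any],
    load_module: Callable[[str], Any] = importlib.import_module,
) -> None:
    """Import each task module by name and call its main(cfg).

    Hypothetical: assumes the module name equals the task name.
    """
    for name in tasks_to_run(cfg["general"]):
        module = load_module(name)
        module.main(cfg)
```

Injecting load_module keeps the sketch testable without the real task modules; by default it falls back to importlib.import_module.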

Note that each task name matches the name of the Python file that contains the execution code for that task. For convenience, the bash script scripts/execute.sh wraps the pipeline: it needs only general.batch_id and autosfm.metashape_key and handles end-to-end execution.
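The contents of scripts/execute.sh are not reproduced here. As a rough, hypothetical equivalent, the sketch below assembles a SemiF.py invocation from the two values the wrapper needs, passing them as Hydra command-line overrides. It assumes an end-to-end run means multitask mode with the default task order; the real script may set different overrides. The function names are illustrative only.

```python
import subprocess
from typing import List


def build_command(batch_id: str, metashape_key: str) -> List[str]:
    """Assemble a SEMIF.py call with Hydra overrides (hypothetical wrapper logic)."""
    return [
        "python",
        "SEMIF.py",
        f"general.batch_id={batch_id}",            # mandatory (??? in config.yaml)
        "general.multitask=True",                  # assumption: run all tasks in order
        f"autosfm.metashape_key={metashape_key}",
    ]


def run_end_to_end(batch_id: str, metashape_key: str) -> None:
    # A list argument (no shell) keeps each override as a single argv entry.
    subprocess.run(build_command(batch_id, metashape_key), check=True)
```

Avoid printing the assembled command in logs, since it contains the Metashape key.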

Maintenance

The maintenance script scripts/maintenance.sh automates updates to the codebase by pulling the latest code and building the autoSfM Docker container. It is meant to be run periodically as a cron job. Note that the pipeline cannot be executed while maintenance is in progress.

During execution, the pipeline copies data from the blob storage onto the VM to speed up execution and avoid data corruption. The products of the execution (autoSfM files, plant detection metadata, cutouts, etc.) are written to temporary directories on the VM and then copied to the blob storage. However, the duplicate files on the VM (developed images and products) are not removed right after execution, to avoid accidental data loss. A separate task in the config, maintenance, handles cleanup by removing the data for multiple batch IDs in bulk once the user has confirmed that the data has been securely copied to the blob storage. For now, this is a manual task. To run it, list the batch IDs to be removed in the batch_ids field of conf/maintenance/maintenance.yaml, or pass them on the command line:

python SEMIF.py general.multitask=False general.task=maintenance 'maintenance.batch_ids=[batch_id1,batch_id2]'
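The bulk cleanup can be pictured as deleting one sub-directory per batch ID under each local working directory. This is a minimal sketch, not the maintenance task's actual code: the directory layout (root/batch_id) and the function name remove_local_batches are hypothetical, and it defaults to a dry run so nothing is deleted until the blob copies have been confirmed.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def remove_local_batches(
    batch_ids: Iterable[str],
    local_roots: Iterable[Path],
    dry_run: bool = True,
) -> List[Path]:
    """Remove per-batch directories and return what was (or would be) removed.

    Hypothetical layout: each root holds one sub-directory per batch ID.
    Only call with dry_run=False after verifying the blob storage copies.
    """
    roots = list(local_roots)  # iterated once per batch ID
    removed: List[Path] = []
    for batch_id in batch_ids:
        for root in roots:
            target = root / batch_id
            if target.is_dir():
                removed.append(target)
                if not dry_run:
                    shutil.rmtree(target)
    return removed
```

Running it first with the default dry_run=True lists the directories that would be deleted, which gives the user a chance to check them against the blob storage before the real removal.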
