Config File Controlled Execution
The SemiF-Annotation pipeline uses Hydra to handle the configurations used during execution. Each task is structured as a Python module containing a main function. The following fields in the conf/config.yaml file control the execution of the tasks:
```yaml
general:
  batch_id: ???  # Control the batch_id through the command line
  multitask: False
  multitasks: [develop_images, copy_from_blob, auto_sfm, localize_plants, remap_labels, assign_species, segment_vegetation, copy_products_to_blobs]
  task: develop_images  # auto_sfm, localize_plants, remap_labels, segment_vegetation
```
- `general.batch_id`: The batch ID to be processed.
- `general.multitask`: Set this to True to execute multiple tasks in sequence.
- `general.multitasks`: Lists all the tasks to be executed. Note that the order of the tasks matters; the default config lists them in the order in which they need to be executed.
- `general.task`: The single task to be executed when `general.multitask` is False.
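The interplay between these fields can be sketched as follows. This is an illustrative sketch, not the actual SEMIF.py dispatch code; the `select_tasks` function and the plain-dict config layout are assumptions made for the example.

```python
# Hypothetical sketch of how the general.* fields could drive task
# selection: when multitask is True, every entry in multitasks runs in
# order; otherwise only the single task named in `task` runs.
def select_tasks(general: dict) -> list:
    """Return the task names to execute, in execution order."""
    if general["multitask"]:
        return list(general["multitasks"])
    return [general["task"]]

general = {
    "batch_id": "???",  # supplied on the command line in a real run
    "multitask": False,
    "multitasks": ["develop_images", "copy_from_blob", "auto_sfm",
                   "localize_plants", "remap_labels", "assign_species",
                   "segment_vegetation", "copy_products_to_blobs"],
    "task": "develop_images",
}
print(select_tasks(general))  # → ['develop_images']
```

Flipping `multitask` to True would instead return the full ordered `multitasks` list.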
Note that each task name matches the name of the Python file containing the execution code for that task. For simplicity, the bash script scripts/execute.sh provides a wrapper that needs only general.batch_id and autosfm.metashape_key and handles the end-to-end execution.
Maintenance
The maintenance script scripts/maintenance.sh handles automated updates to the codebase by pulling the latest code and rebuilding the autoSfM Docker container. It is meant to be run as a periodic cron job. Note that the pipeline cannot be executed while maintenance is in progress.
During execution, the pipeline copies data from blob storage onto the VM to make execution faster and avoid data corruption. The products of the execution (autoSfM files, plant detection metadata, cutouts, etc.) are stored in temporary directories on the VM and then copied to blob storage. However, the duplicate files on the VM (developed images and the products) are not removed right after execution, to avoid accidental removal. A separate task (maintenance) in the config handles this by removing the data for multiple batch IDs in bulk, once the user has made sure the data is securely copied to blob storage. For now, this is a manual task and can be executed by listing the batch IDs to be removed in the batch_ids field of the conf/maintenance/maintenance.yaml file, or by passing them through the command line as:
```shell
python SEMIF.py general.multitask=False general.task=maintenance maintenance.batch_ids=[batch_id1,batch_id2]
```

(Note: the list override must not contain spaces, or the shell will split it into separate arguments.)
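For the file-based route, conf/maintenance/maintenance.yaml might look like the following sketch; only the batch_ids field is documented above, and the batch ID values are placeholders:

```yaml
# Hypothetical contents of conf/maintenance/maintenance.yaml;
# replace the placeholder IDs with the batches to be cleaned up
batch_ids:
  - batch_id1
  - batch_id2
```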