Pipeline Execution
Config file controlled execution
The SemiF-Annotation pipeline uses Hydra to manage the configuration used during execution. Each task is structured as a Python module containing a main function. The following fields in the conf/config.yaml file control the execution of the tasks:
general:
  batch_id: ??? # Control the batch_id through the command line
  multitask: False
  multitasks: [develop_images, copy_from_blob, auto_sfm, localize_plants, remap_labels, assign_species, segment_vegetation, copy_products_to_blobs]
  task: develop_images # auto_sfm, localize_plants, remap_labels, segment_vegetation
general.batch_id
: The batch ID to be processed.
general.multitask
: Set this to True to execute multiple tasks in sequence.
general.multitasks
: Lists all the tasks that are to be executed. The order of the tasks matters; the default config lists them in the order in which they need to be executed.
general.task
: The single task to execute when multitask is False.
Note that each task name matches the name of the Python file that contains the execution code for that task. For simplicity, the bash script scripts/execute.sh wraps the end-to-end execution and needs only general.batch_id and autosfm.metashape_key.
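The selection rule described above (run multitasks in order when multitask is True, otherwise run only task) can be sketched as follows. This is a minimal illustration on a plain dictionary, not the code in SEMIF.py; the function name resolve_tasks and the batch ID are hypothetical.

```python
from typing import Any, Dict, List


def resolve_tasks(general: Dict[str, Any]) -> List[str]:
    """Return the task names to run, in order, for a `general` config section.

    Hypothetical helper: it mirrors the documented behaviour of the
    `multitask`, `multitasks` and `task` fields, nothing more.
    """
    if general.get("multitask", False):
        # Multi-task mode: tasks run exactly in the order they are listed.
        return list(general["multitasks"])
    # Single-task mode: only the task named in `task` is executed.
    return [general["task"]]


# Example config section as a plain dict (the batch ID is a made-up placeholder).
general = {
    "batch_id": "EXAMPLE_BATCH",
    "multitask": False,
    "multitasks": ["develop_images", "copy_from_blob", "auto_sfm"],
    "task": "develop_images",
}

print(resolve_tasks(general))
```

Because each task name matches a Python file, a dispatcher could then import the module of the same name and call its main function; how the modules are packaged is not stated here, so that step is left out.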
Description of tasks
The pipeline is divided into separate tasks for modular processing. The tasks are as follows:
develop_images
Pre-process images using RawTherapee.
copy_from_blob
Copy the developed images from the blobs to local storage for processing. The following command invokes this task:
python SEMIF.py general.batch_id=<batch_id> \
general.task=copy_from_blob \
autosfm.autosfm_config.use_masking=<True/False>
Set the use_masking flag to True only if masks have been generated for all the images for autoSfM to use. The flag must be passed to the copy_from_blob task so that the masks are also copied to local storage.
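When the pipeline is driven from another script, the command above can be assembled programmatically. The sketch below only builds the argument list in the key=value override form shown in this section; the helper name build_command and the batch ID are hypothetical, and nothing is executed.

```python
import shlex
from typing import Dict, List, Optional


def build_command(
    batch_id: str,
    task: str,
    overrides: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Build the `python SEMIF.py ...` argument list for a single task.

    Hypothetical helper: it only formats the key=value overrides in the
    same way as the commands shown in this document.
    """
    cmd = [
        "python",
        "SEMIF.py",
        f"general.batch_id={batch_id}",
        f"general.task={task}",
    ]
    # Extra overrides are appended in insertion order.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_command(
    "EXAMPLE_BATCH",  # made-up batch ID
    "copy_from_blob",
    {"autosfm.autosfm_config.use_masking": True},
)
# Print a shell-ready string; pass `cmd` to subprocess.run to actually run it.
print(shlex.join(cmd))
```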
auto_sfm
This task runs autoSfM on a given batch_id. The following command invokes this task:
python SEMIF.py general.batch_id=<batch_id> \
general.task=auto_sfm \
autosfm.autosfm_config.use_masking=<True/False> \
autosfm.metashape_key=<METASHAPE_KEY>
The use_masking flag controls whether autoSfM uses masks. Note that when this flag is set to True, the masks must be present in data/semifield-developed-images/<batch_id>/masks.
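A pre-flight check for the mask requirement might look like the sketch below. It is not part of the pipeline: the function names are hypothetical, the data root is assumed to be relative to the working directory, and treating an empty masks directory as missing is an assumption rather than documented behaviour.

```python
from pathlib import Path


def masks_dir(batch_id: str, data_root: Path = Path("data")) -> Path:
    """Return the masks directory for a batch, following the documented layout."""
    return data_root / "semifield-developed-images" / batch_id / "masks"


def check_masks(batch_id: str, use_masking: bool, data_root: Path = Path("data")) -> None:
    """Raise FileNotFoundError if masking is requested but no masks are found.

    Hypothetical helper. Assumption: an existing but empty masks directory
    counts as "masks not present".
    """
    if not use_masking:
        # Masks are not needed, so there is nothing to verify.
        return
    path = masks_dir(batch_id, data_root)
    if not path.is_dir() or not any(path.iterdir()):
        raise FileNotFoundError(f"use_masking=True but no masks found in {path}")
```

Calling check_masks(batch_id, use_masking=True) before launching auto_sfm would surface a missing masks directory early instead of part-way through the run.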
localize_plants
This task runs the detection model on all the images to generate bounding boxes over the plants.
remap_labels
This task maps the bounding boxes from image coordinates to the bench (orthomosaic) coordinate system.
Note that both auto_sfm and localize_plants have to be run before running remap_labels.
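The ordering requirement can be checked before a multi-task run with a sketch like the one below. It encodes only the prerequisite stated here (auto_sfm and localize_plants before remap_labels); other tasks may have ordering constraints that this table does not capture. The names PREREQUISITES and missing_prerequisites are hypothetical, and the check looks at a single task list, so it cannot see tasks that were completed in an earlier invocation.

```python
from typing import Dict, List, Sequence

# Only the prerequisite stated in the text is listed (assumption: no others are checked).
PREREQUISITES: Dict[str, List[str]] = {
    "remap_labels": ["auto_sfm", "localize_plants"],
}


def missing_prerequisites(tasks: Sequence[str]) -> Dict[str, List[str]]:
    """Map each task to the prerequisites that do not appear before it in `tasks`."""
    problems: Dict[str, List[str]] = {}
    seen = set()
    for task in tasks:
        # A prerequisite is satisfied only if it occurs earlier in the list.
        missing = [dep for dep in PREREQUISITES.get(task, []) if dep not in seen]
        if missing:
            problems[task] = missing
        seen.add(task)
    return problems


# The default order from conf/config.yaml satisfies the requirement.
default_order = [
    "develop_images", "copy_from_blob", "auto_sfm", "localize_plants",
    "remap_labels", "assign_species", "segment_vegetation", "copy_products_to_blobs",
]
print(missing_prerequisites(default_order))
```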
assign_species
This task assigns species to the bounding boxes based on the shapefiles.
segment_vegetation
Segments the vegetation and generates cutouts for individual plants from the bounding boxes.
copy_products_to_blobs
Copies all the products of the processing from local storage to the blobs.
Maintenance
The maintenance script scripts/maintenance.sh handles automated updates to the codebase by pulling the latest code and building the autoSfM Docker container. It is meant to be run as a periodic cron job. Note that the pipeline cannot be executed while maintenance is in progress.
During execution, the pipeline copies data from the blob storage onto the VM to make the execution faster and avoid data corruption. The products of the execution (autoSfM files, plant detection metadata, cutouts, etc.) are stored in temporary directories on the VM and then copied to the blob storage. However, the duplicate files on the VM (developed images and the products) are not removed right after execution, to avoid accidental removal. Another task (maintenance) in the config handles this by removing the data for multiple batch IDs in bulk once the user has made sure that the data is securely copied onto the blob storage. For now, this is a manual task: put the batch IDs to be removed in the batch_ids field of the conf/maintenance/maintenance.yaml file, or pass them through the command line as: