Synthetic Configuration
Explanation of Individual Pipeline Tasks
Create Recipes
Purpose:
Generates synthetic image recipes by selecting plant cutouts based on filtering criteria from MongoDB and associating them with background images. These recipes specify how synthetic images will be created.
Key Features:
MongoDB Integration: Retrieves cutouts using specific filters from the MongoDB collection.
Randomized Synthetic Images: Associates cutouts with randomly selected background images for diverse datasets.
Cutout Usage Control: Configurable to reuse cutouts across multiple synthetic images or ensure unique usage.
JSON Output: Saves generated recipes in JSON format for further processing.
Output:
JSON files containing the list of synthetic image recipes, saved in the <repo_root>/projects/<project_name>/<sub_project_name>/recipes directory.
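The recipe-building step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the recipe fields (`image_id`, `background`, `cutouts`), and the plain lists standing in for MongoDB query results and background files are all assumptions.

```python
import json
import random


def build_recipes(cutout_ids, backgrounds, total_images,
                  min_cuts, max_cuts, reuse_cutouts=True, seed=None):
    """Return one recipe dict per synthetic image (hypothetical layout)."""
    rng = random.Random(seed)
    all_cutouts = list(cutout_ids)
    unused = list(cutout_ids)
    rng.shuffle(unused)
    recipes = []
    for index in range(total_images):
        n_cuts = rng.randint(min_cuts, max_cuts)
        if reuse_cutouts:
            # Draw from the full pool each time, so a cutout may appear in several images.
            chosen = rng.sample(all_cutouts, min(n_cuts, len(all_cutouts)))
        else:
            if len(unused) < n_cuts:
                break  # not enough unused cutouts left for another image
            # Popping removes each cutout from the pool, so it is used at most once.
            chosen = [unused.pop() for _ in range(n_cuts)]
        recipes.append({
            "image_id": f"synthetic_{index:05d}",
            "background": rng.choice(backgrounds),  # random background per image
            "cutouts": chosen,
        })
    return recipes


def recipes_to_json(recipes):
    """Serialize recipes to a JSON string ready to be written to disk."""
    return json.dumps(recipes, indent=2)
```

With `reuse_cutouts=False` the sketch stops early once the pool runs out, which is one possible policy; the real task may handle that case differently.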
Move Cutouts
Purpose:
Downloads plant cutout images from an S3 bucket to a local directory so they can be used to generate synthetic data.
Output:
Cutout images in .png format, stored locally in <repo_root>/data/cutouts for further use.
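A download step of this kind might look like the sketch below. It assumes a boto3 S3 client is passed in (for example `boto3.client("s3")`) and uses its `download_file(Bucket, Key, Filename)` method; the function name `download_cutouts`, the key list, and the skip-if-present behavior are illustrative assumptions, not the pipeline's confirmed logic.

```python
from pathlib import Path


def download_cutouts(s3_client, bucket, keys, dest_dir):
    """Download each .png object in `keys` into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for key in keys:
        if not key.lower().endswith(".png"):
            continue  # cutouts are stored as .png, so ignore other objects
        local_path = dest / Path(key).name
        if not local_path.exists():
            # Assumed optimization: skip files that were already downloaded.
            s3_client.download_file(bucket, key, str(local_path))
        saved.append(local_path)
    return saved
```

Passing the client as an argument keeps credentials and region handling outside the function and makes it easy to substitute a stub during testing.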
Synthesize
Purpose:
Generates synthetic images by placing plant cutouts onto background images with transformations to simulate real-world variability.
Key Features:
Parallelism: Uses concurrent.futures.ProcessPoolExecutor to generate images in parallel.
Transformations: Applies rotations, flips, and other transformations via Albumentations to enhance diversity.
YOLO Annotations: Generates bounding box or contour annotations in YOLO format.
Configuration Options:
Resize Factor: Control the scaling of cutouts (resize_factor: 0.35).
Parallel Processing: Enable parallel synthesis (parallel: true or false).
Mask Generation: Control the output of instance masks and YOLO labels.
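How the parallel switch could drive `concurrent.futures.ProcessPoolExecutor` is sketched below. The worker `synthesize_one`, the recipe field `image_id`, and the function `run_synthesis` are hypothetical placeholders; only the executor and the `parallel` and `parallel_workers` settings come from this document.

```python
from concurrent.futures import ProcessPoolExecutor


def synthesize_one(recipe):
    """Placeholder for the per-image work; returns the output file name."""
    return f"{recipe['image_id']}.jpg"


def run_synthesis(recipes, parallel=True, parallel_workers=4):
    """Process recipes serially or in a process pool, preserving input order."""
    if not parallel:
        return [synthesize_one(recipe) for recipe in recipes]
    with ProcessPoolExecutor(max_workers=parallel_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(synthesize_one, recipes))
```

On platforms that start worker processes with spawn (Windows, macOS), call `run_synthesis(..., parallel=True)` from under an `if __name__ == "__main__":` guard, and keep the worker a top-level function so it can be pickled.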
Output:
Images: Synthetic images saved as .jpg.
YOLO Labels: Bounding box or contour labels for object detection.
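For reference, a YOLO bounding box label is one text line per object: a class index followed by the box center and size, all normalized by the image dimensions. Contour (segmentation) labels instead list normalized polygon points after the class index. The helpers below are a small sketch of that conversion; the function names and the six-decimal formatting are assumptions, not the pipeline's own code.

```python
def bbox_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a YOLO 'class xc yc w h' line (normalized)."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def contour_to_yolo(class_id, points, img_w, img_h):
    """Convert pixel-space polygon points to a YOLO segmentation line."""
    coords = " ".join(f"{x / img_w:.6f} {y / img_h:.6f}" for x, y in points)
    return f"{class_id} {coords}"
```

For example, a box from (100, 50) to (300, 250) in a 1000x500 image becomes `0 0.200000 0.300000 0.200000 0.400000` for class 0.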
Main Configuration Explanation Table
Parameter | Description | Possible Values / Example |
---|---|---|
general.project_name | The name of the main project. | String (e.g., |
general.sub_project_name | The name of the sub-project or task. | String (e.g., |
tasks | List of individual tasks to execute as part of the pipeline. | Example: |
synthesize | Configuration settings specific to the synthesize task. | |
synthesize.resize_factor | Resize factor for cutouts; controls cutout scaling during synthesis. | Example: |
synthesize.parallel | Enables or disables parallel processing for synthesis. | true or false |
synthesize.parallel_workers | Number of workers to use for parallel processing. | Integer (e.g., |
yolo_bbox_labels | Whether to generate YOLO-format bounding box labels. | |
aws.s3_bucket | The name of the AWS S3 bucket where data is stored. | Example: |
mongodb.host | Host address of the MongoDB database. | Example: |
mongodb.port | Port number for MongoDB. | Example: |
mongodb.db | MongoDB database name. | Example: |
mongodb.collection | MongoDB collection name for cutouts. | Example: |
mongodb.auth_mechanism | MongoDB authentication mechanism and encryption method | Example: |
mongodb.auth_source | MongoDB database that holds the user credentials used for authentication. | Example: |
mongodb.username | MongoDB username | Example: |
mongodb.password | MongoDB password | Example: |
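The mongodb.* settings in the table map naturally onto a standard MongoDB connection string. The sketch below shows one way to assemble it; the function name `build_mongo_uri` is hypothetical, and whether the pipeline builds a URI or passes these values to its client separately is not stated here.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(host, port, username, password, auth_source, auth_mechanism):
    """Assemble a mongodb:// URI from the mongodb.* configuration values."""
    # Credentials must be percent-encoded so characters like '@' or '/' stay valid.
    user = quote_plus(username)
    pwd = quote_plus(password)
    options = urlencode({
        "authSource": auth_source,
        "authMechanism": auth_mechanism,
    })
    return f"mongodb://{user}:{pwd}@{host}:{port}/?{options}"
```

The resulting string can be handed to a MongoDB client; the database and collection named by mongodb.db and mongodb.collection are then selected on that client.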
Cutout Filters Configuration Table
Parameter | Description | Possible Values / Example |
---|---|---|
total_images | Total number of synthetic images to generate. | Integer (e.g., |
cuts_n_image | Number of plant cutouts to include in each image. | min: Minimum cutouts (e.g., max: Maximum cutouts (e.g., |
reuse_cutouts | Whether the same cutouts can be reused in multiple images. | |
morphological.area | Range of the cutout area in pixels. | min: |
morphological.blur_effect | Threshold range for cutout blur effect. | min: optional, max: optional |
morphological.eccentricity | Shape eccentricity range. | min: optional, max: optional |
morphological.extends_border | Whether to include only cutouts that extend beyond the image border. | |
morphological.num_components | Range for the number of components in a cutout. | min: optional, max: optional |
morphological.solidity | Threshold for solidity, indicating shape compactness. | min: |
morphological.green_sum | Range for the sum of green pixel values in the cutout. | min: optional, max: optional |
morphological.is_primary | Whether to include only primary cutouts. | |
morphological.perimeter | Range for cutout perimeter length. | min: optional, max: optional |
category.family | Exact family name to filter by. | String (e.g., |
category.genus | Exact genus name to filter by. | String (e.g., |
category.group | Plant group to filter by. | String (e.g., |
category.duration | Plant lifecycle duration to include. | List (e.g., |
category.growth_habit | Filter based on growth habit of plants. | List (e.g., |
category.species | Exact species name to filter by. | String (e.g., |
category.subclass | Exact subclass to filter by. | String (e.g., |
category.common_name | List of plant common names to include. | Example: |
common_name_weights | Assign weights to plant common names for frequency control. | Example: |
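To make the filter semantics concrete, the sketch below turns a filter configuration into a MongoDB query document using the standard `$gte`, `$lte`, and `$in` operators. The function name and the document field paths (`morphological.<name>`, `category.<name>`) are assumptions; the real collection schema may differ.

```python
def build_cutout_query(filters):
    """Translate min/max ranges, booleans, and category values into a query dict."""
    query = {}
    for name, spec in filters.get("morphological", {}).items():
        field = f"morphological.{name}"  # hypothetical field path
        if isinstance(spec, bool):
            query[field] = spec  # e.g. is_primary, extends_border
        elif isinstance(spec, dict):
            bounds = {}
            if spec.get("min") is not None:
                bounds["$gte"] = spec["min"]
            if spec.get("max") is not None:
                bounds["$lte"] = spec["max"]
            if bounds:  # skip ranges where both ends are left unset
                query[field] = bounds
    for name, value in filters.get("category", {}).items():
        field = f"category.{name}"  # hypothetical field path
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            if value:
                query[field] = {"$in": list(value)}  # match any listed value
        else:
            query[field] = value  # exact match
    return query
```

Optional bounds that are left unset simply drop out of the query, which matches the "min: optional, max: optional" entries in the table.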
Scripts/Tasks and How to Run Them in the SemiF-SyntheticPipeline
The SemiF-SyntheticPipeline offers flexible ways to run tasks by either adjusting the configuration files or overriding specific parameters from the command line. Below is an explanation of the tasks, how they work, and examples of how to run them.
How to Run the Pipeline
Using Configuration Files:
Adjust the configuration settings in the provided YAML files, then run the entire pipeline using the following command:
python main.py
Directly from the Command Line:
Override individual parameters while running specific tasks. This allows quick modifications without editing the configuration files.
Example:
python main.py synthesize.resize_factor=0.5 synthesize.parallel=true
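Conceptually, each `key=value` argument replaces one entry in the nested configuration loaded from YAML. This document does not say which library performs that merge, so the following is only a plain-Python illustration of the idea, with a hypothetical function name.

```python
import ast


def apply_overrides(config, overrides):
    """Apply dotted key=value overrides to a nested config dict, in place."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        try:
            value = ast.literal_eval(raw)  # numbers, quoted strings, lists
        except (ValueError, SyntaxError):
            # Fall back to YAML-style booleans, otherwise keep the raw string.
            value = {"true": True, "false": False}.get(raw.lower(), raw)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = value
    return config
```

Applied to the example command, `synthesize.resize_factor` becomes the float 0.5 and `synthesize.parallel` becomes the boolean True, while all other settings keep their YAML values.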
Example Workflows
Full Pipeline Execution:
Adjust settings in the YAML configuration files and run the entire pipeline:
python main.py
Run Only the Synthesize Task with Custom Parameters:
Override specific settings (e.g., resize factor) on the command line instead of editing the config file.
Generate Recipes Only:
Run only the recipe-creation task when recipes are all you need.
Move Cutouts and Synthesize Images Sequentially:
Run the move_cutouts and synthesize tasks in sequence with specific parameters.
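Running a chosen subset of tasks in order amounts to walking the configured tasks list and calling one function per name. The sketch below illustrates that dispatch pattern; the task names come from this document, but `run_pipeline`, the registry dict, and the single-argument task signature are hypothetical.

```python
def run_pipeline(task_names, registry, config):
    """Run the named tasks in the given order and return the names completed."""
    unknown = [name for name in task_names if name not in registry]
    if unknown:
        # Fail before any work starts if the task list contains a typo.
        raise ValueError(f"Unknown tasks: {unknown}")
    completed = []
    for name in task_names:
        registry[name](config)  # each task receives the shared config
        completed.append(name)
    return completed
```

A registry such as `{"move_cutouts": move_cutouts, "synthesize": synthesize}` would then run the two tasks back to back when the task list names them in that order.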
Conclusion
The SemiF-SyntheticPipeline lets participants adjust configurations through YAML files or directly from the command line, making it easy to experiment with different setups during the hackathon. Its modular design lets you focus on specific tasks such as generating recipes, moving cutouts, or synthesizing images, while relying on parallel processing and the AWS and MongoDB integrations for performance and data access.