Synthetic Configuration

Explanation of Individual Pipeline Tasks

Create Recipes

Purpose:
Generates synthetic image recipes by selecting plant cutouts based on filtering criteria from MongoDB and associating them with background images. These recipes specify how synthetic images will be created.

  • Key Features:

    • MongoDB Integration: Retrieves cutouts using specific filters from the MongoDB collection.

    • Randomized Synthetic Images: Associates cutouts with randomly selected background images for diverse datasets.

    • Cutout Usage Control: Configurable to reuse cutouts across multiple synthetic images or ensure unique usage.

    • JSON Output: Saves generated recipes in JSON format for further processing.

  • Output:

    • JSON files containing the list of synthetic image recipes, saved in the <repo_root>/projects/<project_name>/<sub_project_name>/recipes directory.
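The recipe-building logic described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the recipe field names (synthetic_image_id, background_id, cutout_ids), and the output file name recipes.json are all assumptions, and the cutout IDs are passed in directly rather than fetched from MongoDB.

```python
import json
import random
from pathlib import Path


def create_recipes(cutout_ids, background_ids, total_images, cuts_per_image,
                   reuse_cutouts=True, seed=None):
    """Build recipe dicts that pair a set of cutouts with a random background.

    All field names here are hypothetical; the real recipe schema may differ.
    """
    rng = random.Random(seed)
    pool = list(cutout_ids)
    min_cuts, max_cuts = cuts_per_image
    recipes = []
    for index in range(total_images):
        n_cuts = rng.randint(min_cuts, max_cuts)
        if reuse_cutouts:
            # Cutouts may repeat across images: sample from the full pool.
            chosen = rng.sample(pool, min(n_cuts, len(pool)))
        else:
            # Unique usage: draw cutouts and remove them from the pool.
            if len(pool) < n_cuts:
                break  # not enough unused cutouts left for another image
            rng.shuffle(pool)
            chosen = [pool.pop() for _ in range(n_cuts)]
        recipes.append({
            "synthetic_image_id": f"synthetic_{index:05d}",
            "background_id": rng.choice(background_ids),
            "cutout_ids": chosen,
        })
    return recipes


def save_recipes(recipes, out_dir):
    """Write the recipes to <out_dir>/recipes.json (file name is assumed)."""
    out_path = Path(out_dir) / "recipes.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(recipes, indent=2))
    return out_path
```

With reuse_cutouts set to False, the sketch stops early once the pool of unused cutouts runs out, so fewer than total_images recipes may be produced.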

Move Cutouts

Purpose:
Downloads plant cutout images from an S3 bucket to a local directory so they can be used to generate synthetic data.

  • Output:

    • Cutout images in .png format, stored locally in <repo_root>/data/cutouts.
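A download step like this could look roughly like the sketch below. It assumes a boto3-style S3 client is passed in (only its download_file method is used); the function name download_cutouts and the skip-if-present behaviour are illustrative assumptions, not the pipeline's confirmed implementation.

```python
from pathlib import Path


def download_cutouts(s3_client, bucket, keys, dest_dir):
    """Download .png cutouts from an S3 bucket into a local directory.

    s3_client is expected to behave like boto3.client("s3").
    Files that already exist locally are not downloaded again.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for key in keys:
        if not key.lower().endswith(".png"):
            continue  # only cutout images are of interest here
        target = dest / Path(key).name
        if not target.exists():
            s3_client.download_file(bucket, key, str(target))
        downloaded.append(target)
    return downloaded


# Usage with boto3 (needs AWS credentials; bucket and key are placeholders):
#   import boto3
#   download_cutouts(boto3.client("s3"), "my-bucket",
#                    ["cutouts/example.png"], "data/cutouts")
```

Passing the client in as an argument keeps the function easy to exercise with a stub when no AWS access is available.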

Synthesize

Purpose:
Generates synthetic images by placing plant cutouts onto background images to simulate real-world variability.

  • Key Features:

    • Parallelism: Uses concurrent.futures.ProcessPoolExecutor to generate images in parallel.

    • Transformations: Applies rotations, flips, and other transformations via Albumentations to enhance diversity.

    • YOLO Annotations: Generates bounding box or contour annotations in YOLO format.

  • Configuration Options:

    • Resize Factor: Control the scaling of cutouts (resize_factor: 0.35).

    • Parallel Processing: Enable parallel synthesis (parallel: true or false).

    • Mask Generation: Control the output of instance masks and YOLO labels.

  • Output:

    • Images: Synthetic images saved as .jpg.

    • YOLO Labels: Bounding box or contour labels for object detection.
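For reference, YOLO label lines store a class index followed by coordinates normalized to the image size. The sketch below shows one way to produce such lines from pixel coordinates; the function names are illustrative and are not taken from the pipeline.

```python
def bbox_to_yolo(class_id, x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert a pixel bounding box to a YOLO line:
    "<class> <x_center> <y_center> <width> <height>", all normalized to 0..1.
    """
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def contour_to_yolo(class_id, points, image_width, image_height):
    """Convert pixel contour points [(x, y), ...] to a YOLO segmentation line:
    "<class> <x1> <y1> <x2> <y2> ...", all normalized to 0..1.
    """
    coords = " ".join(
        f"{x / image_width:.6f} {y / image_height:.6f}" for x, y in points
    )
    return f"{class_id} {coords}"


if __name__ == "__main__":
    # A 200x100 px box centered in a 400x200 px image.
    print(bbox_to_yolo(0, 100, 50, 300, 150, 400, 200))
```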

Main Configuration Explanation Table

| Parameter | Description | Possible Values / Example |
| --- | --- | --- |
| general.project_name | The name of the main project. | String (e.g., my_test) |
| general.sub_project_name | The name of the sub-project or task. | String (e.g., my_sub_test) |
| tasks | List of individual tasks to execute as part of the pipeline. | Example: create_recipes, move_cutouts, synthesize |
| synthesize | Configuration settings specific to the synthesize task. | - |
| synthesize.resize_factor | Resize factor for cutouts; controls cutout scaling during synthesis. | Example: 0.35 (values < 0.15 may cause issues with the RandomScale transformation) |
| synthesize.parallel | Enables or disables parallel processing for synthesis. | true or false |
| synthesize.parallel_workers | Number of workers to use for parallel processing. | Integer (e.g., 4) |
| yolo_bbox_labels | Whether to generate YOLO-format bounding box labels. | true or false |
| aws.s3_bucket | The name of the AWS S3 bucket where data is stored. | Example: psi-hackathon |
| mongodb.host | Host address of the MongoDB database. | Example: 40.76.253.124 |
| mongodb.port | Port number for MongoDB. | Example: 27017 |
| mongodb.db | MongoDB database name. | Example: hackathon_db |
| mongodb.collection | MongoDB collection name for cutouts. | Example: cutouts |
| mongodb.auth_mechanism | MongoDB authentication mechanism. | Example: SCRAM-SHA-1 |
| mongodb.auth_source | MongoDB database that contains the collection with user credentials. | Example: hackathon_db |
| mongodb.username | MongoDB username. | Example: test_user |
| mongodb.password | MongoDB password. | Example: pass123 |
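To illustrate how synthesize.parallel and synthesize.parallel_workers might drive the synthesis step, here is a minimal sketch built on concurrent.futures.ProcessPoolExecutor. The config dictionary mirrors the parameter names in the table, but synthesize_one and run_synthesis are hypothetical stand-ins for the real per-image work.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical subset of the configuration, using the parameter names above.
config = {
    "synthesize": {
        "resize_factor": 0.35,
        "parallel": True,
        "parallel_workers": 4,
    }
}


def synthesize_one(recipe_id):
    """Placeholder for the per-image work; returns the output file name."""
    return f"{recipe_id}.jpg"


def run_synthesis(recipe_ids, parallel, workers):
    """Process recipes serially, or across worker processes when parallel is set."""
    if not parallel:
        return [synthesize_one(recipe_id) for recipe_id in recipe_ids]
    # The worker function must be defined at module level so it can be pickled.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize_one, recipe_ids))
```

A call such as run_synthesis(ids, config["synthesize"]["parallel"], config["synthesize"]["parallel_workers"]) returns results in the same order as the input, since pool.map preserves ordering. On platforms that spawn worker processes, the call should sit under an if __name__ == "__main__": guard.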


Cutout Filters Configuration Table

| Parameter | Description | Possible Values / Example |
| --- | --- | --- |
| total_images | Total number of synthetic images to generate. | Integer (e.g., 1000) |
| cuts_n_image | Number of plant cutouts to include in each image. | min: Minimum cutouts (e.g., 50); max: Maximum cutouts (e.g., 200) |
| reuse_cutouts | Whether the same cutouts can be reused in multiple images. | true or false |
| morphological.area | Range of the cutout area in pixels. | min: 10, max: 10000000000 |
| morphological.blur_effect | Threshold range for cutout blur effect. | min: optional, max: optional |
| morphological.eccentricity | Shape eccentricity range. | min: optional, max: optional |
| morphological.extends_border | Whether to include only cutouts that extend to the image border. | true or false |
| morphological.num_components | Range for the number of components in a cutout. | min: optional, max: optional |
| morphological.solidity | Threshold for solidity, indicating shape compactness. | min: 0.7 (example), max: optional |
| morphological.green_sum | Range for the sum of green pixel values in the cutout. | min: optional, max: optional |
| morphological.is_primary | Whether to include only primary cutouts. | true or false (if used) |
| morphological.perimeter | Range for cutout perimeter length. | min: optional, max: optional |
| category.family | Exact family name to filter by. | String (e.g., Poaceae) |
| category.genus | Exact genus name to filter by. | String (e.g., Setaria) |
| category.group | Plant group to filter by. | String (e.g., Cover crop) |
| category.duration | Plant lifecycle duration to include. | List (e.g., ["annual", "biennial"]) |
| category.growth_habit | Filter based on growth habit of plants. | List (e.g., ["graminoid", "forb/herb"]) |
| category.species | Exact species name to filter by. | String (e.g., Setaria faberi) |
| category.subclass | Exact subclass to filter by. | String (e.g., Magnoliidae) |
| category.common_name | List of plant common names to include. | Example: ["Black oats", "Goosegrass", "Maize"] |
| common_name_weights | Assign weights to plant common names for frequency control. | Example: Maize: 0.1, Goosegrass: 0.1 |
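One way to picture how these filters translate into a MongoDB query is the sketch below, which maps min/max ranges to $gte/$lte, lists to $in, and single values to exact matches. It assumes the cutout documents store their fields under the same dotted names as the configuration (for example morphological.area), which may not match the actual collection schema; build_cutout_query is a hypothetical helper.

```python
def build_cutout_query(filters):
    """Translate a filter configuration into a MongoDB query dictionary.

    Unset (None) values are skipped so that optional filters add no constraint.
    Document field paths are assumed to match the configuration keys.
    """
    query = {}

    for field, spec in filters.get("morphological", {}).items():
        key = f"morphological.{field}"
        if isinstance(spec, dict):
            # Range filter: include only the bounds that are actually set.
            bounds = {}
            if spec.get("min") is not None:
                bounds["$gte"] = spec["min"]
            if spec.get("max") is not None:
                bounds["$lte"] = spec["max"]
            if bounds:
                query[key] = bounds
        elif spec is not None:
            query[key] = spec  # boolean flags such as extends_border

    for field, spec in filters.get("category", {}).items():
        key = f"category.{field}"
        if isinstance(spec, (list, tuple)):
            if spec:
                query[key] = {"$in": list(spec)}  # match any listed value
        elif spec is not None:
            query[key] = spec  # exact match on a single value

    return query
```

The resulting dictionary could then be passed to a collection's find method, for example with pymongo.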

 

Scripts/Tasks and How to Run Them in the SemiF-SyntheticPipeline

The SemiF-SyntheticPipeline offers flexible ways to run tasks by either adjusting the configuration files or overriding specific parameters from the command line. Below is an explanation of the tasks, how they work, and examples of how to run them.


How to Run the Pipeline

  1. Using Configuration Files:
    Adjust the configuration settings in the provided YAML files, then run the entire pipeline using the following command:

    python main.py
  2. Directly from the Command Line:
    Override individual parameters while running specific tasks. This allows quick modifications without editing the configuration files.
    Example:

    python main.py synthesize.resize_factor=0.5 synthesize.parallel=true
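Conceptually, each key=value argument replaces one entry in the nested YAML configuration. The sketch below is a simplified illustration of that merge and is not how the pipeline itself parses arguments; parse_value and apply_overrides are hypothetical helpers.

```python
def parse_value(text):
    """Interpret a command-line value as bool, int, float, or plain string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Apply dotted key=value overrides to a nested dict, modifying it in place."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = parse_value(raw_value)
    return config


if __name__ == "__main__":
    cfg = {"synthesize": {"resize_factor": 0.35, "parallel": False}}
    print(apply_overrides(cfg, ["synthesize.resize_factor=0.5",
                                "synthesize.parallel=true"]))
```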

Example Workflows

  1. Full Pipeline Execution:
    Adjust settings in the YAML configuration files and run the entire pipeline:

    python main.py
  2. Run Only the Synthesize Task with Custom Parameters:
    If you need to modify specific settings (e.g., resize factor) without editing the config file:

  3. Generate Recipes Only:
    If you only need to create recipes:

  4. Move Cutouts and Synthesize Images Sequentially:
    Run move_cutouts and synthesize tasks in sequence with specific parameters:
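The commands for workflows 2 to 4 are not reproduced here. As a hedged sketch only, the helper below assembles such command lines under the assumption that the tasks list can be overridden with a bracketed list (tasks=[synthesize]), in the same key=value style as the other overrides. That syntax is an assumption and should be checked against the pipeline's own documentation before use; build_command is a hypothetical helper.

```python
import shlex


def build_command(tasks=None, overrides=None):
    """Assemble a `python main.py` argument list with optional overrides.

    ASSUMPTION: the task list is overridden as tasks=[a,b]; this is not
    confirmed by the pipeline documentation.
    """
    args = ["python", "main.py"]
    if tasks:
        args.append("tasks=[" + ",".join(tasks) + "]")
    for key, value in (overrides or {}).items():
        if isinstance(value, bool):
            value = str(value).lower()  # YAML-style true/false
        args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    # Workflow 2: synthesize only, with a custom resize factor.
    print(shlex.join(build_command(["synthesize"],
                                   {"synthesize.resize_factor": 0.5})))
    # Workflow 3: recipes only.
    print(shlex.join(build_command(["create_recipes"])))
    # Workflow 4: move cutouts, then synthesize in parallel.
    print(shlex.join(build_command(["move_cutouts", "synthesize"],
                                   {"synthesize.parallel": True})))
```

shlex.join quotes the bracketed argument so that shells which treat square brackets specially pass it through unchanged.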


Conclusion

The SemiF-SyntheticPipeline offers participants flexibility to adjust configurations through YAML files or directly from the command line, making it easy to experiment with different setups during the hackathon. This modular design ensures you can focus on specific tasks like generating recipes, moving cutouts, or synthesizing images, while optimizing performance using parallel processing and seamless AWS and MongoDB integrations.

Figure: Synthetic image
