Table of Contents

...
Pipeline
The pipeline includes seven main processes and places data in five blob containers.
...
Upload and Preprocessing
BenchBot operators manually upload batches to the “upload” blob container. Images are then processed using a color calibration card.
Upload batches include images and metadata from a single location and capture period. Batches are named using the location ID and date, for example, NC_2022-03-22.
Metadata is made up of:
Ground control point locations (.csv)
Species map (.csv)
...
Mapping and Detection
AutoSfM
...
Data Flow
Preprocessing
AutoSfM
The AutoSfM process takes in developed images and ground control point metadata to create a global coordinate reference system (CRS). An orthomosaic, or collage of stitched images, and detailed camera reference information are generated; the latter is used to convert local image coordinates into global potting area locations.
For example, an image 2000 pixels high and 4000 pixels wide has a local center point at (1000, 2000), half its height and half its width, measured in pixels. Camera reference information allows us to project this local center point to a geographical potting area location in meters, (1.23m, 4.56m) for example.
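A minimal sketch of this projection, assuming the camera reference information reduces to a simple per-image 2x3 affine transform in (x, y) = (column, row) pixel order; the matrix values and function name are hypothetical, and the real pipeline projects through the full AutoSfM camera model:

```python
import numpy as np

def pixel_to_potting_area(px, py, affine):
    """Map a local pixel coordinate to global potting-area meters."""
    x, y = affine @ np.array([px, py, 1.0])
    return float(x), float(y)

# Hypothetical affine: 0.5 mm per pixel plus an image-origin offset (meters).
affine = np.array([[0.0005, 0.0,    0.23],
                   [0.0,    0.0005, 4.06]])

# Center of a 4000x2000 (width x height) image, in (x, y) pixels.
print(pixel_to_potting_area(2000, 1000, affine))  # -> (1.23, 4.56)
```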
...
Detection
Object detection is performed to identify plant locations and create local bounding box coordinates.
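For illustration only, a generic pretrained detector can produce this kind of local bounding box output; the pipeline uses its own trained plant detection model, and the image path below is a stand-in:

```python
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # generic stand-in model
model.eval()

img = to_tensor(Image.open("plant_image.jpg").convert("RGB"))  # hypothetical file
with torch.no_grad():
    out = model([img])[0]

keep = out["scores"] > 0.5
for box in out["boxes"][keep]:  # boxes are in local pixel coordinates
    xmin, ymin, xmax, ymax = box.tolist()
    print(xmin, ymin, xmax, ymax)
```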
Gallery

Detection results from 2022-03-11
Remap
The Remap process infers global bounding box positions using AutoSfM camera reference information.
...
WHY?
Species mapping: Our object detection model only detects plants, not species. Species-level detection for this project (24 species) is unrealistic at this early stage. When a user-defined species map and geospatial data are applied, AutoSfM results can provide species-level information. If we know what row or general geographic area each species is located in, then we can label each bounding box appropriately.
...
Unique detection result: Provides unique (primary) bounding box information. The BenchBot takes six images along a single row of four pots. These images overlap considerably, and the same plant is often detected, and thus segmented, multiple times at different angles. While multiple angles are useful, it's important to identify the unique, or primary, detection result (when the camera is directly over the plant). Doing so allows us to:
Maximize synthetic image diversity by avoiding the use of the same plant segment (albeit at slightly different angles) multiple times.
Monitor and understand the distribution of primary vs. non-primary data for training models. A dataset with many non-unique duplicates, while large, will not be diverse and will lead to poor model performance.
Monitor individual plants throughout their growth by identifying unique plant/pot positions. A sketch of one way to select primary detections follows this list.
Monitoring: Monitor for inconsistencies and errors in image capture across sites using detailed reporting of camera reference information.
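A minimal sketch of one way to select primary detections among overlapping remapped boxes, assuming boxes are grouped by IoU in global coordinates and the detection nearest its image center wins; the field names and threshold are illustrative:

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes in meters."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def mark_primary(detections, iou_thresh=0.5):
    """detections: dicts with 'global_box' and 'center_dist', the plant's
    pixel distance from its image center. Closest to center wins."""
    primaries = []
    for det in sorted(detections, key=lambda d: d["center_dist"]):
        det["is_primary"] = all(
            iou(det["global_box"], p["global_box"]) < iou_thresh
            for p in primaries)
        if det["is_primary"]:
            primaries.append(det)
    return detections
```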
Segment Vegetation and Cutout Data
...
Semi-Field Trial
Trial data was collected and used to develop the annotation pipeline and test camera settings. The trial was performed between early March and late April 2022 in NC under indoor conditions. Details about the data and results follow.
Setup
7 batches of images
1 batch = 18-48 images
sunflower and cereal rye(?) at early growth stages
| Date | Images |
| --- | --- |
| 03/04 | 18 |
| 03/11 | 48 |
| 03/22 | 30 |
| 03/29 | 36 |
| 04/05 | 36 |
| 04/12 | 36 |
| 04/26 | 36 |
| TOTAL | 240 |
Results
| Date | Images | Plants | Unique Cutouts | Total Cutouts |
| --- | --- | --- | --- | --- |
| 03/04 | 18 | | | |
| 03/11 | 48 | | | |
| 03/22 | 30 | | | |
| 03/29 | 36 | | | |
| 04/05 | 36 | | | |
| 04/12 | 36 | | | |
| 04/26 | | | | |
Knowing the general location of species pot groups, we assign species labels to each general “vegetation” detection result. We relate the global potting area locations to local image bounding box coordinates, allowing us to fill in the missing species label.
Inputs:
Developed images
Masks
Ground control point information (.csv)
Outputs:
Metashape (psx) project for projecting image coordinates to real-world 3D coordinates
Remap
Local plant detection results are remapped to global potting area coordinates.
Inputs:
Images
Camera reference information (.csv)
Outputs:
Detailed metadata with camera information and detection results (.json)
Assign Species
At the start of each "season," shapefiles are generated to delineate the boundaries of the different species potting groups. These shapefiles are then used to assign specific species labels to individual bounding boxes based on their intersection with the shapefile features. If a bounding box's global coordinates overlap a designated shapefile feature (illustrated as rectangular features below), the label of the overlapping feature is attributed to that bounding box. This assignment process follows the "Remapping" phase, where bounding box coordinates are transformed from pixel representations to real-world global coordinates.
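A minimal sketch of this intersection test, assuming the shapefile is read with geopandas and each feature carries a species attribute; the path and attribute name are hypothetical:

```python
import geopandas as gpd
from shapely.geometry import box

species_zones = gpd.read_file("season_2022/species_map.shp")  # hypothetical path

def assign_species(bbox_global):
    """bbox_global: (xmin, ymin, xmax, ymax) in global potting-area meters."""
    geom = box(*bbox_global)
    for _, zone in species_zones.iterrows():
        if geom.intersects(zone.geometry):
            return zone["species"]  # assumed attribute name
    return "unknown"

print(assign_species((1.10, 4.40, 1.36, 4.72)))
```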
Vegetation Segmentation
Plant Cutout Generation
Digital image processing techniques like index thresholding, unsupervised classification, and morphological operations are used to separate vegetation from the background within pre-identified bounding boxes. The specific segmentation approach might vary depending on the plant species and the size of the bounding box. These extracted plant regions, or "cutouts," serve as building blocks for creating synthetic data.
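A minimal sketch of one such approach, using the excess-green (ExG) index with Otsu thresholding and morphological cleanup; the pipeline's actual choice of index, threshold, and kernel sizes may differ:

```python
import cv2
import numpy as np

def segment_vegetation(bgr_crop):
    """Return a binary vegetation mask for a cropped bounding-box image."""
    img = bgr_crop.astype(np.float32) / 255.0
    b, g, r = cv2.split(img)
    exg = 2.0 * g - r - b                                   # excess green index
    exg_u8 = (np.clip(exg, 0.0, 1.0) * 255).astype(np.uint8)
    _, mask = cv2.threshold(exg_u8, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop speckle
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return mask

crop = cv2.imread("bbox_crop.png")            # hypothetical cropped bbox image
mask = segment_vegetation(crop)
cutout = cv2.bitwise_and(crop, crop, mask=mask)
```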
Sub-Image Data (Cutouts): Details and Metadata
Cutouts are cropped sections from full-sized images, each containing a single plant instance. Each cutout's metadata includes:
Parent image ID
Unique cutout ID and number
Species classification
Primary status
Whether the cutout extends beyond the cropped image border
Camera position
Bounding box from which the cutout originated within the full-sized image
Specific cutout properties like area, perimeter, and color statistics, which are valuable for in-depth analysis.
Furthermore, the metadata inherits EXIF data from the parent image and incorporates additional details. A sketch of a full metadata record follows.
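A minimal sketch of what a single cutout's metadata record might look like as JSON; the keys mirror the list above, but all values and exact field names here are illustrative:

```python
import json

cutout_meta = {
    "parent_image_id": "NC_2022-03-22_row1_img3",        # hypothetical ID
    "cutout_id": "NC_2022-03-22_row1_img3_007",
    "cutout_num": 7,
    "species": "sunflower",
    "is_primary": True,
    "extends_border": False,
    "camera_position": {"x_m": 1.23, "y_m": 4.56, "z_m": 2.10},
    "bbox_local": {"xmin": 512, "ymin": 804, "xmax": 930, "ymax": 1310},
    "properties": {"area_px": 21450, "perimeter_px": 1240,
                   "mean_rgb": [87, 132, 64]},
    "exif": {"Make": "SONY", "ExposureTime": "1/200"},   # inherited, illustrative
}

print(json.dumps(cutout_meta, indent=2))
```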
Review
For each batch, a weed scientist or agronomic expert selects a random set of 50 images for visual examination. During this process, they review the species detection outcomes, along with the semantic and instance masks. Should any discrepancies arise, the batch's log files are scrutinized for any errors or anomalies. A batch that successfully clears the inspection stage, free of mislabeling or significant errors, proceeds to the next step. Upon approval, the full-sized images and their corresponding sub-images are transferred to the semif-developed and semif-cutouts blob containers in Azure. Additionally, all data products are securely backed up to the NCSU storage facilities, ensuring data integrity and availability.