Data products and structure

1. Overview

This document explains how the dataset output by the Field-Annotation Pipeline is structured, formatted, and organized. Processed data is grouped into batches, where a batch is all images taken on a single day at a single location. Each batch has a unique batch ID, which links its set of images to the corresponding metadata.


2. Directory Structure

[dataset_root]/
├── [batch_id]/
│   ├── cutouts/                # Cropped plant images and masks
│   │   ├── *.json              # Metadata for each cutout
│   │   ├── *_mask.png          # Segmentation mask
│   │   ├── *.jpg               # Raw cropout image
│   │   └── *.png               # Processed cutout with mask applied
│   │
│   └── developed-images/       # Full-size pre-processed images
│       ├── *.jpg               # Pre-processed field image
│       └── *.jpg.pp3           # RawTherapee processing settings
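As a rough illustration of this layout, the sketch below builds the four expected file paths for a single cutout. The function name `cutout_paths` and the batch ID `example_batch` are hypothetical; only the `cutouts` folder name and the file suffixes are taken from the tree.

```python
from pathlib import Path


def cutout_paths(dataset_root, batch_id, cutout_stem):
    """Return the expected files for one cutout, keyed by product type."""
    cutout_dir = Path(dataset_root) / batch_id / "cutouts"
    return {
        "metadata": cutout_dir / f"{cutout_stem}.json",
        "mask": cutout_dir / f"{cutout_stem}_mask.png",
        "cropout": cutout_dir / f"{cutout_stem}.jpg",
        "cutout": cutout_dir / f"{cutout_stem}.png",
    }


# "example_batch" is a placeholder, not a real batch ID.
paths = cutout_paths("dataset_root", "example_batch", "NCA03585_0")
```

The paths are only constructed, not checked; calling `.exists()` on each value is one way to spot cutouts with missing products.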

3. Data Product Details

Cutouts

Contains cropped plant images, masks, and associated metadata.

| File Type | Format | Description | Example Filename |
|---|---|---|---|
| Metadata | .json | Plant-specific information, including bounding boxes, class, and environmental metadata. | NCA03585_0.json |
| Cutout Mask | .png | Binary mask generated by the segmentation model. Background pixels have the value 0 and mask pixels have the value of class_id. See the class_id key in the "category" properties table. | NCA03585_0_mask.png |
| Cropout Image | .jpg | Cropped image of the targeted plant, based on bounding box coordinates. | NCA03585_0.jpg |
| Cutout Image | .png | Final processed image with transparent background. | NCA03585_0.png |
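To make the mask convention concrete, here is a minimal sketch that turns a class-valued mask into an 8-bit alpha channel. Nested lists stand in for a decoded image so that the example needs no imaging library, and the helper name `mask_to_alpha` is hypothetical. Whether the pipeline derives the transparent cutout this way is an assumption.

```python
def mask_to_alpha(mask, class_id):
    """Map a class-valued mask to alpha: 255 for plant pixels, 0 for background."""
    return [[255 if value == class_id else 0 for value in row] for row in mask]


# Tiny 2x3 mask for a plant with class_id 25 (0 = background).
mask = [
    [0, 25, 25],
    [0, 0, 25],
]
alpha = mask_to_alpha(mask, class_id=25)
```

Comparing against `class_id` rather than testing for any non-zero value keeps the sketch faithful to the stated convention and would ignore stray values, should any occur.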


Full-sized Images

Contains full-size pre-processed images and associated processing profiles.

| File Type | Format | Description | Example Filename |
|---|---|---|---|
| Full-size image | .jpg | Full-size image after color correction, sharpening, and exposure adjustment. | NCA03585.jpg |
| Processing Profile | .pp3 | RawTherapee sidecar file with applied image adjustments. | NCA03585.jpg.pp3 |
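The example filenames suggest that a cutout name is the full-size image name plus a trailing `_<index>` (for example `NCA03585_0` from `NCA03585`). That reading is an assumption based on the examples, not a documented rule. Under it, a cutout can be traced back to its full-size image and processing profile roughly as follows; both function names and the `example_batch` folder are hypothetical.

```python
from pathlib import Path


def split_cutout_stem(cutout_stem):
    """Split 'NCA03585_0' into ('NCA03585', 0), assuming a trailing '_<index>'."""
    image_name, _, index = cutout_stem.rpartition("_")
    if not image_name or not index.isdigit():
        raise ValueError(f"unexpected cutout name: {cutout_stem!r}")
    return image_name, int(index)


def developed_image_paths(batch_dir, cutout_stem):
    """Return the full-size image and its .pp3 sidecar for a cutout."""
    image_name, _ = split_cutout_stem(cutout_stem)
    image = Path(batch_dir) / "developed-images" / f"{image_name}.jpg"
    # The sidecar keeps the ".jpg" and appends ".pp3", as in NCA03585.jpg.pp3.
    return image, image.with_name(image.name + ".pp3")


image, profile = developed_image_paths("example_batch", "NCA03585_0")
```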


4. Data Examples

Full-size image

[Image: MDA02196_full.jpg (full-size image)]

Cutouts

[Image: MDA02196.jpg (cropout)]

Metadata

{
  "image_info": {
    "Name": "MDA02196",
    "Extension": "jpg",
    "ImageURL": "https://weedsimagerepo.blob.core.windows.net/weedsimagerepo/MDA02196.JPG",
    "UploadDateTimeUTC": "2022-06-23 19:03:31+00:00",
    "CameraInfo_DateTime": "2022-06-22 08:37:07",
    "SizeMiB": 0.7987775802612305,
    "HasMatchingJpgAndRaw": true,
    "ImageIndex": 3.0,
    "UsState": "MD"
  },
  "plant_field_info": {
    "PlantType": "WEEDS",
    "CloudCover": "Completely Obscured",
    "GroundResidue": "Corn",
    "GroundCover": "51 - 75",
    "CoverCropFamily": null,
    "GrowthStage": null,
    "CottonVariety": null,
    "CropOrFallow": "Fallow",
    "CropTypeSecondary": "Corn",
    "Species": "horseweed",
    "Height": "0.61 - 0.9m",
    "SizeClass": "MEDIUM",
    "FlowerFruitOrSeeds": false
  },
  "annotation": {
    "bbox_xywh": [3813, 830, 2323, 4152]
  },
  "category": {
    "class_id": 25,
    "USDA_symbol": "ERCA20",
    "EPPO": "ERICA",
    "group": "dicot",
    "class": "Magnoliopsida",
    "subclass": "Asteridae",
    "order": "Asterales",
    "family": "Asteraceae",
    "genus": "Erigeron",
    "species": "canadensis",
    "common_name": "Horseweed",
    "authority": "Linnaeus",
    "growth_habit": "forb/herb",
    "duration": "annual biennial",
    "category": "warm season weed",
    "multi_species_USDA_symbol": null,
    "link": null,
    "note": "Name change",
    "hex": "#5aedc3",
    "rgb": [90, 237, 195]
  },
  "exif_meta": {
    "Make": "SONY",
    "Model": "ILCE-7RM4A",
    "Software": "RawTherapee 5.10",
    "ExposureTime": "1/200",
    "FNumber": "11",
    "ISOSpeedRatings": 100,
    "Flash": 15,
    "FocalLength": "55",
    "LensModel": "FE 55mm F1.8 ZA"
  },
  "version": 1.0
}
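A record like the one above can be read with a standard JSON parser. The sketch below uses an abbreviated copy of the example record and a hypothetical helper, `summarize`, to pull out a few commonly used fields; it assumes those keys are present in every record.

```python
import json

# Abbreviated copy of the example record; real files contain more keys.
record_text = """
{
  "image_info": {"Name": "MDA02196", "UsState": "MD"},
  "plant_field_info": {"PlantType": "WEEDS", "Species": "horseweed"},
  "annotation": {"bbox_xywh": [3813, 830, 2323, 4152]},
  "category": {"class_id": 25, "common_name": "Horseweed"}
}
"""


def summarize(record):
    """Pull a few commonly used fields out of one cutout metadata record."""
    return {
        "image": record["image_info"]["Name"],
        "state": record["image_info"]["UsState"],
        "class_id": record["category"]["class_id"],
        "common_name": record["category"]["common_name"],
        "bbox_xywh": tuple(record["annotation"]["bbox_xywh"]),
    }


summary = summarize(json.loads(record_text))
```

For a file on disk, `json.load` on an open file handle would replace `json.loads(record_text)`.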

5. Metadata Details

The tables below define the metadata fields for the image cutouts output by the Field-Annotation Pipeline.

Properties Table

image_info

| Property | Type | Description | Options | Example Value |
|---|---|---|---|---|
| Name | string | Name of image (without extension) | | "NCA03587" |
| Extension | string | Extension of image | | "jpg" |
| ImageURL | string | Link to the image | | |
| UploadDateTimeUTC | string (date-time) | Date and time that the image was uploaded | | "2023-05-10 22:49:00+00:00" |
| CameraInfo_DateTime | string (date-time) | Date and time that the image was taken | | "2023-05-09 13:41:08" |
| SizeMiB | number | Size of image in MiB | | 0.9428424835205078 |
| HasMatchingJpgAndRaw | boolean | Whether the image has a matching raw image | true; false | true |
| ImageIndex | number | Index within the group of images that make up a sample of a unique plant | | 0.0 |
| UsState | string | Abbreviation of the state in which the image was collected | | "AL" |
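The two date-time fields use different formats: the example UploadDateTimeUTC value carries a UTC offset, while the example CameraInfo_DateTime value has none. A minimal parsing sketch follows, assuming every record matches the example formats; the function names are hypothetical.

```python
from datetime import datetime


def parse_upload_time(value):
    """Parse UploadDateTimeUTC, which carries an explicit UTC offset."""
    return datetime.fromisoformat(value)


def parse_camera_time(value):
    """Parse CameraInfo_DateTime, which has no offset (result is timezone-naive)."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


uploaded = parse_upload_time("2023-05-10 22:49:00+00:00")
captured = parse_camera_time("2023-05-09 13:41:08")
```

Because one value is timezone-aware and the other is naive, Python raises a `TypeError` if the two are subtracted or compared directly. The source does not say which time zone the camera clock uses, so a time zone would have to be chosen for the camera time before the two can be compared.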

plant_field_info

| Property | Type | Description | Options | Example Value |
|---|---|---|---|---|
| PlantType | string | Plant type | weeds, cash crop, or cover crops | "COVERCROPS" |
| CloudCover | string | Cloud cover at the time the image was captured | Scattered, Clear, Completely Obscured, Few Clouds | "Clear" |
| GroundResidue | string | Describes the layer of vegetation covering the soil | grass, broadleaf, cotton, corn, soybean, others (add text) | "Grass" |
| GroundCover | string | One of four ranges representing the percentage of ground cover remaining on the soil surface | 0 - 25; 26 - 50; 51 - 75; 76 - 100; nan | "51 - 75" |
| CoverCropFamily | string | Family of cover crop | Brassicas; Grass; Legume; nan | "Grass" |
| GrowthStage | string | Growth stage of plant (cash crop or cover crop); null for weeds | Vegetative; Flowering; Squaring; Open Boll; Post Defoliation; First Flower; nan | "Vegetative" |
| CottonVariety | string | Variety of cotton; null if not cotton | FM Hairy, DP 2038, UA 107 Okra, DG 3528 B3XF, PHY 415 W3FE, DP 2038 B3XF, ST 5091 B3XF, PHY 443 W3FE, PHY 411 W3FE, ST 5707 B2XF | "FM Hairy" |
| CropOrFallow | string | Weeds only; whether the weed is in a crop field or a fallow field | Crop; Fallow | "Fallow" |
| CropTypeSecondary | string | Only for field; type of crop in which the weed is present. CropOrFallow must be "Crop" or null. | Corn, cotton, soybean, nan | "Cotton" |
| Species | string | Common name of the plant | see species info list | "barley" |
| Height | string | Height of cover crop or cash crop; null for weeds | 0.3 - 0.6m; 0.61 - 0.9m; 0.91 - 1.2m; 1.21 - 1.5m; nan | "0.3 - 0.6m" |
| SizeClass | string | Small, medium, or large for weeds; null if not a weed | SMALL; MEDIUM; LARGE; nan | "SMALL" |
| FlowerFruitOrSeeds | boolean | Presence of flower, fruit, or seeds | true; false | true |
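GroundCover and Height are stored as range strings rather than numbers, so numeric work needs a parsing step. The sketch below is one possible approach, with a hypothetical helper `parse_range`. It accepts either a hyphen or an en dash between the bounds, since both spellings may turn up, and treats null, "nan", or anything else it cannot parse as missing.

```python
import re

# Matches "51 - 75", "51 – 75", and "0.61 - 0.9m" (hyphen or en dash, optional "m").
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[-\u2013]\s*(\d+(?:\.\d+)?)\s*m?\s*$")


def parse_range(value):
    """Return (low, high) as floats, or None for null, "nan", or unparseable values."""
    if value is None:
        return None
    match = _RANGE.match(str(value))
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


ground_cover = parse_range("51 - 75")   # percent
height = parse_range("0.61 - 0.9m")     # metres
```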

annotation

| Property | Type | Description | Example Value |
|---|---|---|---|
| bbox_xywh | list of numbers | Bounding box as [xmin, ymin, width, height]: the top-left corner of the box (xmin, ymin) followed by the box width and height, all in pixels | [3238, 20, 6354, 6136] |
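Many tools expect corner coordinates rather than a corner plus a size. The conversion is a pair of additions, sketched here with a hypothetical helper name and the bounding box from the example record in section 4.

```python
def xywh_to_xyxy(bbox_xywh):
    """Convert [xmin, ymin, width, height] to (xmin, ymin, xmax, ymax) in pixels."""
    xmin, ymin, width, height = bbox_xywh
    return xmin, ymin, xmin + width, ymin + height


corners = xywh_to_xyxy([3813, 830, 2323, 4152])
# corners == (3813, 830, 6136, 4982)
```

Here xmax and ymax are treated as exclusive bounds, which suits array slicing. Whether the pipeline itself treats them as inclusive or exclusive is not stated in the source.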

category

| Property | Type | Description | Example Value |
|---|---|---|---|
| class_id | number | Class ID of the plant | 39 |
| USDA_symbol | string | USDA symbol for the plant | "HORDE" |
| EPPO | string | European and Mediterranean Plant Protection Organization code for the plant | "1HORG" |
| group | string | Group of plant | "monocot" |
| class | string | Class of plant | "Liliopsida" |
| subclass | string | Subclass of plant | "Commelinidae" |
| order | string | Order of plant | "Cyperales" |
| family | string | Family of plant | "Poaceae" |
| genus | string | Genus of plant | "Hordeum" |
| species | string | Species of plant | "vulgare" |
| common_name | string | Common name of plant | "Barley" |
| authority | string | Person who first named the plant scientifically | "Linnaeus" |
| growth_habit | string | Growth habit of plant | "graminoid" |
| duration | string | Plant life: annual, biennial, or perennial | "annual perenial" |
| category | string | Category of plant, such as warm season weed or cool season cover crop | "cool season cover crop" |
| multi_species_USDA_symbol | string | If the plant has multiple USDA symbols | null |
| link | string | Link to the USDA database | "https://plants.usda.gov/home/plantProfile?symbol=HORDE" |
| note | string | Any note to add to the metadata | null |
| hex | string | hex color is the average color of the | "#e172d8" |
| rgb | list of numbers | color in rgb for the color of | [225, 114, 216] |
| alias | list of strings | Common name aliases | ["cereal barley"] |
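In both example records the hex and rgb values encode the same color (for example "#e172d8" and [225, 114, 216]). A small consistency check along those lines might look like this; the function names are hypothetical, and the check assumes hex is always written as "#rrggbb".

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to [r, g, b] integers in the range 0-255."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected '#rrggbb', got {hex_color!r}")
    return [int(digits[i:i + 2], 16) for i in (0, 2, 4)]


def colors_agree(category):
    """True when the record's hex and rgb fields encode the same color."""
    return hex_to_rgb(category["hex"]) == list(category["rgb"])


ok = colors_agree({"hex": "#5aedc3", "rgb": [90, 237, 195]})
```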


Schema


6. Data Usage

The dataset can be used for:

  • Training machine learning models for plant detection and classification.

  • Generating synthetic datasets using cutout images and masks.

  • Analyzing plant health, biomass estimation, and phenotyping.


7. Storage and Access

The data is not currently publicly accessible.

The dataset is stored in the shared [Storage Location] and can be accessed via [Access Method]. Ensure you have the appropriate credentials before accessing the data.
