Data products and structure
1. Overview
This document explains how the dataset produced by the Field-Annotation Pipeline is structured, formatted, and organized. Processed data is grouped into batches: each batch contains all images taken on a single day at a single location and is identified by a unique batch ID, which links that set of images to its corresponding metadata.
2. Directory Structure
[dataset_root]/
│
├── [batch_id]/
│   ├── cutouts/                  # Cropped plant images and masks
│   │   ├── *.json                # Metadata for each cutout
│   │   ├── *_mask.png            # Segmentation mask
│   │   ├── *.jpg                 # Cropout image (no mask applied)
│   │   └── *.png                 # Processed cutout with mask applied
│   │
│   └── developed-images/         # Full-size pre-processed images
│       ├── *.jpg                 # Pre-processed field image
│       └── *.jpg.pp3             # RawTherapee processing settings
│
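To work with a batch programmatically, it helps to pair up the files that belong to each cutout. The Python sketch below assumes, without confirmation from the pipeline, that all files for one cutout share a base name (for example <stem>.json, <stem>_mask.png, <stem>.jpg, and <stem>.png); index_cutouts is a hypothetical helper, not part of the pipeline.

```python
from pathlib import Path

# Hypothetical naming scheme (an assumption, not confirmed by the pipeline):
# every cutout's files share one base name <stem>.
# "_mask.png" is listed before ".png" so masks are matched first.
SUFFIX_ROLES = {
    "_mask.png": "mask",
    ".json": "metadata",
    ".jpg": "cropout",
    ".png": "cutout",
}


def index_cutouts(batch_dir: Path) -> dict[str, dict[str, Path]]:
    """Group the files in <batch_dir>/cutouts/ by their assumed shared stem."""
    index: dict[str, dict[str, Path]] = {}
    for path in sorted((batch_dir / "cutouts").iterdir()):
        if not path.is_file():
            continue
        for suffix, role in SUFFIX_ROLES.items():
            if path.name.endswith(suffix):
                stem = path.name[: -len(suffix)]
                index.setdefault(stem, {})[role] = path
                break  # each file gets exactly one role
    return index
```

Under this assumed naming scheme, an entry that lacks one of the four roles (for example, no "mask" key) points to an incomplete cutout. Matching the longer _mask.png suffix first keeps masks from being indexed as processed cutouts, since both end in .png.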
3. Data Product Details
Cutouts
Contains cropped plant images, masks, and associated metadata.
File Type | Format | Description | Filename Pattern |
---|---|---|---|
Metadata | JSON | Plant-specific information, including bounding boxes, class, and environmental metadata. | *.json |
Cutout Mask | PNG | Binary mask generated by the segmentation model. Background pixels have the value '0' and mask pixels have the value 'class_id'. See the | *_mask.png |
Cropout Image | JPEG | Cropped image of the targeted plant, based on its bounding box coordinates. | *.jpg |
Cutout Image | PNG | Final processed image: the cropout with the mask applied, leaving a transparent background. | *.png |
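The relationship between the cropout, the mask, and the final cutout can be sketched in Python. This illustrates the semantics described in the table and is not the pipeline's own code: apply_mask and mask_area are hypothetical helpers, and images are represented as nested lists of pixels so the example has no dependencies. It assumes the cutout is produced by making every pixel whose mask value is 0 fully transparent and every other pixel fully opaque.

```python
# Pixel types for this dependency-free sketch; real code would load the
# JPEG/PNG files with an imaging library and use array operations instead.
RGB = tuple[int, int, int]
RGBA = tuple[int, int, int, int]


def apply_mask(cropout: list[list[RGB]], mask: list[list[int]]) -> list[list[RGBA]]:
    """Return RGBA pixels: opaque where the mask holds a class_id, transparent where it is 0."""
    if len(cropout) != len(mask) or any(len(r) != len(m) for r, m in zip(cropout, mask)):
        raise ValueError("cropout and mask must have the same dimensions")
    return [
        [(r, g, b, 255 if m != 0 else 0) for (r, g, b), m in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(cropout, mask)
    ]


def mask_area(mask: list[list[int]], class_id: int) -> int:
    """Count the pixels labelled with class_id (0 is background)."""
    return sum(value == class_id for row in mask for value in row)
```

mask_area yields a per-plant pixel count; turning that into a physical area would additionally require the image's ground resolution, which this sketch does not cover.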
Full-sized Images
Contains full-size pre-processed images and associated processing profiles.
File Type | Format | Description | Filename Pattern |
---|---|---|---|
Full-size Image | JPEG | Full-size image after color correction, sharpening, and exposure adjustment. | *.jpg |
Processing Profile | PP3 | RawTherapee sidecar file recording the image adjustments that were applied. | *.jpg.pp3 |
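RawTherapee .pp3 sidecars are plain-text files organized as [Section] headers followed by Key=Value lines, so Python's standard configparser module can usually read them. The sketch below is a convenience under that assumption: read_pp3 is a hypothetical helper, and any section or key names used with it here are illustrative rather than taken from this dataset.

```python
import configparser


def read_pp3(text: str) -> configparser.ConfigParser:
    """Parse the text of a RawTherapee .pp3 sidecar into sections of key/value strings."""
    parser = configparser.ConfigParser(
        interpolation=None,  # read values literally; '%' is not a substitution marker
        strict=False,        # tolerate repeated sections or keys instead of raising
    )
    parser.optionxform = str  # keep key case as written instead of lowercasing
    parser.read_string(text)
    return parser
```

For a file on disk, pass its contents, for example read_pp3(Path(path).read_text(encoding="utf-8")). All values come back as strings; helpers such as getboolean and getfloat convert them when needed.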
4. Data Examples
Full-size image
Cutouts
Metadata
5. Metadata Details
The tables below define the metadata structure for the image cutouts produced by the Field-Annotation Pipeline.
Properties Table
Schema
6. Data Usage
The dataset can be used for:
Training machine learning models for plant detection and classification.
Generating synthetic datasets using cutout images and masks.
Analyzing plant health, estimating biomass, and phenotyping.
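As an illustration of the synthetic-data use case, a cutout can be alpha-composited onto a background image while its mask stamps the class_id into a matching label map. The sketch below is a hypothetical helper, not part of the pipeline: it assumes the cutout's alpha channel and mask are aligned pixel for pixel, uses nested lists instead of an imaging library, and clips any part of the cutout that falls outside the background.

```python
RGB = tuple[int, int, int]
RGBA = tuple[int, int, int, int]


def paste_cutout(
    background: list[list[RGB]],
    labels: list[list[int]],
    cutout: list[list[RGBA]],
    mask: list[list[int]],
    top: int,
    left: int,
) -> None:
    """Alpha-composite cutout onto background in place and write its class_id into labels."""
    for dy, (cut_row, mask_row) in enumerate(zip(cutout, mask)):
        y = top + dy
        if not 0 <= y < len(background):
            continue  # clip rows outside the canvas
        for dx, ((r, g, b, a), m) in enumerate(zip(cut_row, mask_row)):
            x = left + dx
            if not 0 <= x < len(background[y]):
                continue  # clip columns outside the canvas
            alpha = a / 255
            br, bg, bb = background[y][x]
            # Standard "over" blend: out = alpha * foreground + (1 - alpha) * background
            background[y][x] = (
                round(alpha * r + (1 - alpha) * br),
                round(alpha * g + (1 - alpha) * bg),
                round(alpha * b + (1 - alpha) * bb),
            )
            if m != 0:
                labels[y][x] = m  # later pastes overwrite earlier labels
```

Calling paste_cutout repeatedly with different cutouts and offsets builds a composite image and its segmentation label map together; because later pastes overwrite earlier ones, overlapping plants behave roughly like occlusion.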
7. Storage and Access
The data is not currently publicly accessible.
The dataset is stored in the shared [Storage Location] and can be accessed via [Access Method]. Make sure you have the appropriate credentials before accessing the data.