Overview

[Image: cutout-overlay.gif (Synthetic image generation)]

[Image: final_bounding_boxes.png (Bounding box labels)]

Hackathon Objective

Participants will use the Hackathon2024-SyntheticPipeline to generate synthetic agricultural images, train a detection model, and test the model on real-world data. The challenge emphasizes creativity and technical skills in leveraging synthetic data to solve real-world agricultural problems, such as plant detection and classification.


Why Use Synthetic Images?

1. Bridge the Data Gap

In agricultural machine learning, real-world annotated data is often scarce or expensive to collect. Creating large, labeled datasets manually is time-consuming and requires expert knowledge.

  • Synthetic images provide a cost-effective and scalable solution by generating datasets with realistic variability, saving time and resources.

2. Data Augmentation for Robust Models

Synthetic data allows for controlled variability by adjusting the placement, orientation, and appearance of cutouts. This helps:

  • Improve model generalization by exposing it to diverse conditions (e.g., lighting changes, rotations).

  • Compensate for imbalanced datasets by generating more images of underrepresented classes.
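
The kind of controlled variability described above can be sketched as a simple parameter sampler. The parameter names and ranges here are illustrative only, not the pipeline's actual configuration:

```python
import random

def sample_cutout_params(rng: random.Random) -> dict:
    """Draw illustrative placement/appearance parameters for one cutout."""
    return {
        "x": rng.uniform(0.0, 1.0),           # normalized position on the background
        "y": rng.uniform(0.0, 1.0),
        "rotation_deg": rng.uniform(0, 360),  # random orientation
        "scale": rng.uniform(0.5, 1.5),       # size jitter
        "brightness": rng.uniform(0.8, 1.2),  # simulate lighting changes
    }

rng = random.Random(42)  # fixed seed so a generated dataset is reproducible
params = [sample_cutout_params(rng) for _ in range(5)]  # e.g., five cutouts per image
```

Seeding the generator makes a synthetic dataset reproducible, which matters when comparing training runs.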

3. Prepare for Edge Cases

With synthetic images, participants can create complex scenarios (e.g., occlusions or overlapping plants) that may rarely appear in real-world datasets but are crucial for robust detection models. This improves the model’s ability to handle unexpected situations during real-world testing.

4. Rapid Prototyping

Synthetic pipelines enable quick iterations during model development. Participants can generate new datasets on-demand to explore what-if scenarios, refine models, and experiment with hyperparameters before testing on real-world data.


Hackathon Workflow

Step 1: Set Up Environment on Amazon SageMaker Notebooks

  • Access SageMaker Notebooks:

    • For this hackathon, you will use AWS Workshop Studio to access an AWS account. Instructions for accessing Workshop Studio are available here: AWS Workshop Studio Lab Guide

    • Once you are logged into the AWS account, open the Amazon SageMaker console here: Amazon SageMaker Console

    • In the left-hand menu, click Notebooks.

    • A Jupyter notebook instance has already been created for you; click Open JupyterLab.

    • From the JupyterLab environment, open a Terminal.

  • From within the terminal, navigate to the SageMaker folder:

    cd SageMaker

  • Clone the Hackathon2024-SyntheticPipeline repository into your notebook instance:

    git clone https://github.com/precision-sustainable-ag/Hackathon2024-SyntheticPipeline.git

Step 2: Download Background Images

  • Use the following AWS S3 command to manually download background images to the correct folder:

    aws s3 cp s3://psi-hackathon/advanced_track/backgrounds/ /path/to/repo_root/data/backgrounds/ --recursive --no-sign-request

    Note: Replace /path/to/repo_root with the actual path to the cloned repository.

 

Step 3: Configure MongoDB

In the main configuration file (conf/config.yaml), set the host, username, and password, which will be provided to you via Discord.
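
A minimal sketch of what the MongoDB block in conf/config.yaml might look like. The key names here are placeholders; use whatever keys the repository's config schema actually defines:

```yaml
# Hypothetical structure -- match it to the keys already present in conf/config.yaml.
mongodb:
  host: <provided-via-discord>
  username: <provided-via-discord>
  password: <provided-via-discord>
```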

Step 4: Configure and Generate Synthetic Images

  • Set the configuration files as needed to control:

    • Cutout selection criteria

    • Number of cutouts per image

    (A detailed table of configuration settings will be provided here.)

  • Run the pipeline to generate your synthetic images.


Training and Model Development

Step 5: Train Your Model

  • You are free to use any model framework for training. However, we suggest YOLOv8 by Ultralytics for ease of use.

  • Pre-provided training scripts are available in a separate repository:
    Training Example Repository.
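
If you follow the YOLOv8 suggestion, training can be launched with the Ultralytics CLI roughly as follows. The dataset YAML path and hyperparameters are placeholders that you would point at your generated synthetic dataset:

```shell
# Train a small YOLOv8 detection model on the synthetic dataset.
# data=... must point to a dataset YAML listing your train/val image paths and class names.
yolo detect train model=yolov8n.pt data=path/to/synthetic_data.yaml epochs=100 imgsz=640
```

Starting from the pretrained `yolov8n.pt` weights usually converges much faster than training from scratch.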


Validation Process

Step 6: Download Validation Images

  • Use this command to download the 10 validation images from S3:

  • Note: These 10 images are for validation only—do not use them for training.

  • Validate your model’s performance on this mini test set to fine-tune your approach.
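
When fine-tuning against the 10 validation images, a standard way to score a predicted box against a ground-truth label is intersection-over-union (IoU). A minimal stdlib sketch, with boxes given as (x_min, y_min, x_max, y_max):

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A perfect overlap scores 1.0 and disjoint boxes score 0.0; detection benchmarks commonly count a prediction as correct when IoU exceeds a threshold such as 0.5.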


Final Inference and Evaluation

Step 7: Download Final Test Images

  • In the last 30-60 minutes of the competition, we’ll provide the following command to download the final 100 test images:

Step 8: Perform Inference and Submit Results

  • Use your trained model to run inference on the final test images.

  • Format the results following the instructions in the Training Example Repository.

  • Submit your formatted results to us for evaluation.

  • All compute will end on Sunday at 11:00 am sharp.
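
The authoritative submission schema is defined in the Training Example Repository. Purely as an illustration (the column names here are hypothetical), writing detection results to a CSV with Python's stdlib might look like this:

```python
import csv

# Hypothetical prediction rows: (image_name, class_id, confidence, x_min, y_min, x_max, y_max).
# Check the Training Example Repository for the required columns and their order.
predictions = [
    ("img_001.jpg", 0, 0.91, 12.0, 34.0, 120.0, 210.0),
    ("img_002.jpg", 1, 0.78, 45.0, 60.0, 98.0, 150.0),
]

with open("team1_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_name", "class_id", "confidence",
                     "x_min", "y_min", "x_max", "y_max"])
    writer.writerows(predictions)
```

Remember to rename the file to match your own team name (`<team-name>_predictions.csv`).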

Step 9: Present Results

  • Teams have 5 minutes to present their findings.

  • Presentations start at 1:00 PM on Sunday.


Key Resources Available to Teams

  • ~1 Million Plant Cutouts: Use these cutouts to generate diverse synthetic datasets.

  • 50 Background Images: Add variety to your training data by experimenting with backgrounds.

  • 10 Validation Images: Real-world images and labels pulled from the final test set for validation.

  • YOLOv11 and Other Models: Feel free to explore other model frameworks beyond YOLOv11.

  • Amazon SageMaker: Provides powerful computing resources to build and train your models.


Winning Strategy Tips

  • Create Diverse Synthetic Data: Use transformations and shadow effects creatively.

  • Train Effectively: Leverage pre-trained models for faster convergence if needed.

  • Validate Carefully: Use the 10 validation images wisely to fine-tune your models.

  • Inference Efficiency: Ensure your model performs well on the final test images under time constraints.


What does a valid submission look like?

1. Deadline and Timing:

  • Submission Deadline: The link to submit your predictions will close on Sunday at 11 am. This means that teams have until that time to upload their prediction files. Any submissions made after this time will not be considered for grading or ranking.

  • Each team will also make a small presentation.

2. File Naming Convention:

  • File Name Format: Each team needs to submit one file only. The file must follow a strict naming format: <team-name>_predictions.csv.

  • Example: If your team name is team1, the file should be named team1_predictions.csv. Failure to follow this format (e.g., team_1_predictions.csv or team1_preds.csv) may result in disqualification or penalties.

A valid submission file looks like:
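
(The example below is illustrative only; the column names are hypothetical, and the authoritative format is described in the Training Example Repository.)

```csv
image_name,class_id,confidence,x_min,y_min,x_max,y_max
img_001.jpg,0,0.91,12.0,34.0,120.0,210.0
img_002.jpg,1,0.78,45.0,60.0,98.0,150.0
```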

3. Where to submit?

  • The organizers will post a Google Form link in Discord on the last day.


Rules

  1. No outside data can be used

  2. No misuse of AWS resources (including no crypto mining)

  3. No cross-team collaboration

Breaking rules will result in an automatic disqualification for that entire team (no questions asked).


Takeaways

This hackathon is designed to demonstrate the power of synthetic images in bridging the gap between limited real-world datasets and robust machine learning solutions. It challenges participants to think creatively, leverage synthetic data effectively, and build models that thrive in real-world agricultural scenarios.

By combining synthetic data generation, machine learning techniques, and AWS infrastructure, participants will tackle real-world agricultural problems in an innovative way.

Good luck, and may the best team win! 🚀


 
