Overview
Hackathon Objective
Participants will use the Hackathon2024-SyntheticPipeline to generate synthetic agricultural images, train a detection model, and test the model on real-world data. The challenge emphasizes creativity and technical skills in leveraging synthetic data to solve real-world agricultural problems, such as plant detection and classification.
Why Use Synthetic Images?
1. Bridge the Data Gap
In agricultural machine learning, real-world annotated data is often scarce or expensive to collect. Creating large, labeled datasets manually is time-consuming and requires expert knowledge.
Synthetic images provide a cost-effective and scalable solution by generating datasets with realistic variability, saving time and resources.
2. Data Augmentation for Robust Models
Synthetic data allows for controlled variability by adjusting the placement, orientation, and appearance of cutouts. This helps:
Improve model generalization by exposing it to diverse conditions (e.g., lighting changes, rotations).
Compensate for imbalanced datasets by generating more images of underrepresented classes.
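As a sketch of what controlled variability looks like in code, the snippet below composites a cutout onto a background with randomized rotation, brightness, and placement. It uses stand-in images and a hypothetical helper name; the actual pipeline drives these knobs through its configuration files.

```python
# Sketch of controlled variability when compositing cutouts.
# place_cutout is a hypothetical helper, not part of the pipeline's API.
import random
from PIL import Image, ImageEnhance

def place_cutout(background: Image.Image, cutout: Image.Image) -> Image.Image:
    """Paste a cutout onto a background with randomized placement,
    rotation, and brightness, mimicking the kinds of knobs the
    synthetic pipeline exposes through its config files."""
    composite = background.copy()
    # Random rotation keeps plant orientation diverse.
    rotated = cutout.rotate(random.uniform(0, 360), expand=True)
    # Random brightness simulates lighting changes.
    rotated = ImageEnhance.Brightness(rotated).enhance(random.uniform(0.7, 1.3))
    # Random position within the frame.
    x = random.randint(0, max(0, composite.width - rotated.width))
    y = random.randint(0, max(0, composite.height - rotated.height))
    composite.paste(rotated, (x, y), rotated if rotated.mode == "RGBA" else None)
    return composite

background = Image.new("RGB", (640, 640), "tan")          # stand-in soil background
cutout = Image.new("RGBA", (100, 100), (0, 128, 0, 255))  # stand-in plant cutout
synthetic = place_cutout(background, cutout)
```

Looping this over many cutouts and backgrounds, and varying the random ranges per run, is how a single set of source images becomes a large, diverse training set.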
3. Prepare for Edge Cases
With synthetic images, participants can create complex scenarios (e.g., occlusions or overlapping plants) that may rarely appear in real-world datasets but are crucial for robust detection models. This improves the model’s ability to handle unexpected situations during real-world testing.
4. Rapid Prototyping
Synthetic pipelines enable quick iterations during model development. Participants can generate new datasets on-demand to explore what-if scenarios, refine models, and experiment with hyperparameters before testing on real-world data.
Hackathon Workflow
Step 1: Set Up Environment on Amazon SageMaker Notebooks
Access SageMaker Notebooks:
For this hackathon, you will use AWS Workshop Studio to access an AWS account. Instructions for accessing Workshop Studio can be found here: AWS Workshop Studio Lab Guide
Once you are logged into the AWS account, open the Amazon SageMaker Console here: Amazon SageMaker Console
In the left-hand menu, click Notebooks.
A Jupyter notebook instance has already been created for you; click Open JupyterLab.
From the JupyterLab environment, open a Terminal.
From within the terminal, navigate to the SageMaker folder:
cd SageMaker
Clone the Hackathon2024-SyntheticPipeline repository into your notebook instance:
git clone https://github.com/precision-sustainable-ag/Hackathon2024-SyntheticPipeline.git
Step 2: Download Background Images
Use the following AWS S3 command to manually download background images to the correct folder:
aws s3 cp s3://psi-hackathon/advanced_track/backgrounds/ /path/to/repo_root/data/backgrounds/ --recursive --no-sign-request
Note: Replace /path/to/repo_root with the actual path to the cloned repository.
Step 3: Configure MongoDB
In the main configuration file (conf/config.yaml), set the host, username, and password, which will be provided to you via Discord.
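For illustration, the MongoDB block in conf/config.yaml might look like the following. The key names here are an assumption, so check the repository's actual config for the real layout:

```yaml
# Hypothetical key names; match them to the actual conf/config.yaml.
mongodb:
  host: <host-from-discord>
  username: <username-from-discord>
  password: <password-from-discord>
```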
Step 4: Configure and Generate Synthetic Images
Set the configuration files as needed to control:
Cutout selection criteria
Number of cutouts per image
(A detailed table of configuration settings is provided here.)
Run the pipeline to generate your synthetic images:
Training and Model Development
Step 5: Train Your Model
You are free to use any model framework for training. However, we suggest YOLOv8 by Ultralytics for ease of use.
Pre-provided training scripts are available in a separate repository:
Training Example Repository.
Validation Process
Step 6: Download Validation Images
Use this command to download the 10 validation images from S3:
Note: These 10 images are for validation only—do not use them for training.
Validate your model’s performance on this mini test set to fine-tune your approach.
Final Inference and Evaluation
Step 7: Download Final Test Images
In the last 30-60 minutes of the competition, we’ll provide the following command to download the final 100 test images:
Step 8: Perform Inference and Submit Results
Use your trained model to run inference on the final test images.
Format the results following the instructions in the Training Example Repository.
Submit your formatted results to us for evaluation.
All compute will end on Sunday at 11:00 am sharp.
Step 9: Present Results
Teams will have 5 minutes to present their findings. Presentations start at 1:00 PM on Sunday.
Key Resources Available to Teams
~1 Million Plant Cutouts: Use these cutouts to generate diverse synthetic datasets.
50 Background Images: Add variety to your training data by experimenting with backgrounds.
10 Real-World Validation Images: Real images and labels pulled from the final test set, for validation.
YOLOv11 and Other Models: Feel free to explore other model frameworks beyond YOLOv11.
Amazon SageMaker: Provides powerful computing resources to build and train your models.
Winning Strategy Tips
Create Diverse Synthetic Data: Use transformations and shadow effects creatively.
Train Effectively: Leverage pre-trained models for faster convergence if needed.
Validate Carefully: Use the 10 validation images wisely to fine-tune your models.
Inference Efficiency: Ensure your model performs well on the final test images under time constraints.
What does a valid submission look like?
1. Deadline and Timing:
Submission Deadline: The link to submit your predictions will close on Sunday at 11:00 am. Teams have until that time to upload their prediction files; any submissions made after the deadline will not be considered for grading or ranking.
Each team will also make a small presentation.
2. File Naming Convention:
File Name Format: Each team must submit one file only, and it must follow a strict naming format: <team-name>_predictions.csv
Example: If your team name is team1, the file should be named team1_predictions.csv. Failure to follow this format (e.g., team_1_predictions.csv or team1_preds.csv) may result in disqualification or penalties.
A valid submission file looks like:
3. Where to submit?
The organizers will post a Google Form link in Discord on the last day.
Rules
No outside data may be used
No misuse of AWS resources (including no crypto mining)
No collaboration across teams
Breaking rules will result in an automatic disqualification for that entire team (no questions asked).
Takeaways
This hackathon is designed to demonstrate the power of synthetic images in bridging the gap between limited real-world datasets and robust machine learning solutions. It challenges participants to think creatively, leverage synthetic data effectively, and build models that thrive in real-world agricultural scenarios.
By combining synthetic data generation, machine learning techniques, and AWS infrastructure, participants will tackle real-world agricultural problems in an innovative way.
Good luck, and may the best team win! 🚀