Overview
Hackathon Objective
Participants will use the Hackathon2024-SyntheticPipeline to generate synthetic agricultural images, train a detection model, and test the model on real-world data. The challenge emphasizes creativity and technical skills in leveraging synthetic data to solve real-world agricultural problems, such as plant detection and classification.
Why Use Synthetic Images?
1. Bridge the Data Gap
In agricultural machine learning, real-world annotated data is often scarce or expensive to collect. Creating large, labeled datasets manually is time-consuming and requires expert knowledge.
Synthetic images provide a cost-effective and scalable solution by generating datasets with realistic variability, saving time and resources.
2. Data Augmentation for Robust Models
Synthetic data allows for controlled variability by adjusting the placement, orientation, and appearance of cutouts. This helps:
Improve model generalization by exposing it to diverse conditions (e.g., lighting changes, rotations).
Compensate for imbalanced datasets by generating more images of underrepresented classes.
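As a sketch of what controlled variability looks like in code, the snippet below composites a cutout onto a background with randomized rotation, brightness, and placement. It uses stand-in images and a hypothetical helper name; the actual pipeline drives these knobs through its configuration files.

```python
# Sketch of controlled variability when compositing cutouts.
# place_cutout is a hypothetical helper, not part of the pipeline's API.
import random
from PIL import Image, ImageEnhance

def place_cutout(background: Image.Image, cutout: Image.Image) -> Image.Image:
    """Paste a cutout onto a background with randomized placement,
    rotation, and brightness, mimicking the kinds of knobs the
    synthetic pipeline exposes through its config files."""
    composite = background.copy()
    # Random rotation keeps plant orientation diverse.
    rotated = cutout.rotate(random.uniform(0, 360), expand=True)
    # Random brightness simulates lighting changes.
    rotated = ImageEnhance.Brightness(rotated).enhance(random.uniform(0.7, 1.3))
    # Random position within the frame.
    x = random.randint(0, max(0, composite.width - rotated.width))
    y = random.randint(0, max(0, composite.height - rotated.height))
    composite.paste(rotated, (x, y), rotated if rotated.mode == "RGBA" else None)
    return composite

background = Image.new("RGB", (640, 640), "tan")          # stand-in soil background
cutout = Image.new("RGBA", (100, 100), (0, 128, 0, 255))  # stand-in plant cutout
synthetic = place_cutout(background, cutout)
```

Looping this over many cutouts and backgrounds, and varying the random ranges per run, is how a single set of source images becomes a large, diverse training set.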
3. Prepare for Edge Cases
With synthetic images, participants can create complex scenarios (e.g., occlusions or overlapping plants) that may rarely appear in real-world datasets but are crucial for robust detection models. This improves the model’s ability to handle unexpected situations during real-world testing.
4. Rapid Prototyping
Synthetic pipelines enable quick iterations during model development. Participants can generate new datasets on-demand to explore what-if scenarios, refine models, and experiment with hyperparameters before testing on real-world data.
Hackathon Workflow
Step 1: Set Up Environment on Amazon SageMaker Notebooks
Access SageMaker Notebooks:
For this hackathon, you will use AWS Workshop Studio to access an AWS account. Instructions for accessing Workshop Studio can be found here: AWS Workshop Studio Lab Guide
Once you are logged into the AWS account, open the Amazon SageMaker Console here: Amazon SageMaker Console
In the left-hand menu, click Notebooks.
A Jupyter notebook instance has already been created for you; click Open JupyterLab.
From the JupyterLab environment, open a Terminal.
From within the terminal, navigate to the SageMaker folder:
cd SageMaker
Clone the Hackathon2024-SyntheticPipeline repository into your notebook instance:
git clone https://github.com/precision-sustainable-ag/Hackathon2024-SyntheticPipeline.git
Step 2: Download Background Images
Use the following AWS S3 command to manually download background images to the correct folder:
aws s3 cp s3://psi-hackathon/advanced_track/backgrounds/ /path/to/repo_root/data/backgrounds/ --recursive --no-sign-request
Note: Replace /path/to/repo_root with the actual path to the cloned repository.
Step 3: Configure MongoDB
In the main configuration file (conf/config.yaml), set the host, username, and password, which will be provided to you via Discord.
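For illustration, the MongoDB block in conf/config.yaml might look like the following. The key names here are an assumption, so check the repository's actual config for the real layout:

```yaml
# Hypothetical key names; match them to the actual conf/config.yaml.
mongodb:
  host: <host-from-discord>
  username: <username-from-discord>
  password: <password-from-discord>
```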
Step 4: Configure and Generate Synthetic Images
Set the configuration files as needed to control:
Cutout selection criteria
Number of cutouts per image
(A detailed table of configuration settings is provided here.)
Run the pipeline to generate your synthetic images:
Training and Model Development
Step 5: Train Your Model
You are free to use any model framework for training. However, we suggest YOLOv8 by Ultralytics for ease of use.
Pre-provided training scripts are available in a separate repository:
Training Example Repository.
Validation Process
Step 6: Download Validation Images
Use this command to download the 10 validation images from S3:
Note: These 10 images are for validation only—do not use them for training.
Validate your model’s performance on this mini test set to fine-tune your approach.
Final Inference and Evaluation
Step 7: Download Final Test Images
In the last 30-60 minutes of the competition, we’ll provide the following command to download the final 100 test images:
Step 8: Perform Inference and Submit Results
Use your trained model to run inference on the final test images.
Format the results following the instructions in the Training Example Repository.
Submit your formatted results to us for evaluation.
All compute will end on Sunday at 11:00 am sharp.
Step 9: Present Results
Teams will have 5 minutes to present their findings. Presentations start at 1:00 PM on Sunday.
Key Resources Available to Teams
~1 Million Plant Cutouts: Use these cutouts to generate diverse synthetic datasets.
50 Background Images: Add variety to your training data by experimenting with backgrounds.
10 Real-World Validation Images: Real images and labels pulled from the final test set, for validation.
YOLOv11 and Other Models: Feel free to explore other model frameworks beyond YOLOv11.
Amazon SageMaker: Provides powerful computing resources to build and train your models.
Winning Strategy Tips
Create Diverse Synthetic Data: Use transformations and shadow effects creatively.
Train Effectively: Leverage pre-trained models for faster convergence if needed.
Validate Carefully: Use the 10 validation images wisely to fine-tune your models.
Inference Efficiency: Ensure your model performs well on the final test images under time constraints.
What does a valid submission look like?
1. Deadline and Timing:
Submission Deadline: The link to submit your predictions will close on Sunday at 11:00 am. Teams have until that time to upload their prediction files; any submissions made after the deadline will not be considered for grading or ranking.
Each team will also make a small presentation.
2. File Naming Convention:
File Name Format: Each team must submit one file only, and it must follow a strict naming format: <team-name>_predictions.csv
Example: If your team name is team1, the file should be named team1_predictions.csv. Failure to follow this format (e.g., team_1_predictions.csv or team1_preds.csv) may result in disqualification or penalties.
A valid submission file looks like:
3. Where to submit?
The organizers will post a Google Form link in Discord on the last day.
Rules
No outside data may be used
No misuse of AWS resources (including no crypto mining)
No collaboration across teams
Breaking rules will result in an automatic disqualification for that entire team (no questions asked).
Takeaways
This hackathon is designed to demonstrate the power of synthetic images in bridging the gap between limited real-world datasets and robust machine learning solutions. It challenges participants to think creatively, leverage synthetic data effectively, and build models that thrive in real-world agricultural scenarios.
By combining synthetic data generation, machine learning techniques, and AWS infrastructure, participants will tackle real-world agricultural problems in an innovative way.
Good luck, and may the best team win! 🚀