“The scarcity of public image datasets remains a key bottleneck in developing next-generation computer vision and intelligent systems for precision agriculture.” (Lu and Young 2020)
Why?
Developing image datasets is difficult, especially for agricultural applications. Collecting and annotating images pose major challenges.
Image Collection is Difficult
Data-hungry deep learning models require hundreds of thousands, sometimes millions, of images to train and become invariant to changes in location, lighting, background, and other factors. Capturing that many images of weeds at various growth stages, while also capturing the effects of field-like conditions, requires high-throughput systems. A network of agronomists, computer hardware engineers, and software engineers is needed to automate large-scale image collection.
Agricultural Scenes are Diverse
...
Plants look different depending on growth stage
...
Biotic and abiotic factors affect morphology
...
Weeds are both diverse and similar
Intra- and inter-species variation is high, yet different species can look alike, such as Palmer amaranth and waterhemp, or johnsongrass and corn.
Agricultural scenes are complex and dense
...
While the potential of computer vision and artificial intelligence to revolutionize precision agriculture is immense, the lack of accessible and comprehensive image datasets is a significant roadblock. Why is this the case?
The Challenges of Image Collection and Annotation
Image Collection is Demanding:
Deep learning models thrive on massive amounts of data, often requiring hundreds of thousands or even millions of images to learn effectively. This demands high-throughput systems that can efficiently capture the diversity of agricultural scenes, from various growth stages to varying field conditions.
Creating such systems requires a multidisciplinary effort, involving agronomists, hardware engineers, and software developers.
Agricultural Scenes are Inherently Diverse:
Plants undergo significant changes in appearance throughout their growth cycle.
Environmental factors like pests, diseases, and weather can drastically alter plant morphology.
Weeds present a particular challenge due to both high intra-species (within a species) and inter-species (between species) variations.
Agricultural fields are complex environments, influenced by climate, soil, ground residue, and the presence of other plant species.
Labeling is a Time-Consuming and Expensive Bottleneck:
Pixel-wise annotations, essential for training segmentation models, are especially labor-intensive. Even with specialized tools, manually labeling a single image can take minutes to hours.
The need for high accuracy, coupled with the complexity of plant structures, adds further to the time each image requires.
| Source | Annotation technique | Scene type | Time per image |
|---|---|---|---|
| ... | Manual segmentation of cutouts | Simple | 5-30 min |
| Skovsen et al. (2019) (in conversation) | Manual segmentation of cutouts | Very complex (field) | hour(s) |
| ... | Manual segmentation of drone images of 3 classes | Complex (field) | 60 min |
| ... | Manual segmentation, 3 classes | Complex (field) | 2-4 hrs |
Labeling is Expensive
Texas A&M Case Study
TAMU used a third-party labeling service to semantically label images of agricultural weeds and plants at early growth stages roughly 2-6 weeks after emergence. Details are as follows:
Company: Precise BPO Solution
Images: 1,000
Total segments: 11,726
Cost per segment: $0.125*
*Although TAMU was given a discount of $0.095 per segment, the non-discounted price of $0.125 is used here. The company is unlikely to offer the same discount for orders of more than 1,000 images.
Labeling time: 2.5 weeks
Number of workers: unknown
Total cost: $1,465.75
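The total above is simply the segment count multiplied by the per-segment price. A minimal sketch of that arithmetic, using only the figures reported in the case study (the function and variable names are illustrative, not part of any labeling service's API):

```python
# Figures reported in the Texas A&M case study.
IMAGES = 1_000
SEGMENTS = 11_726
PRICE_PER_SEGMENT = 0.125  # USD, non-discounted rate


def labeling_cost(segments: int, price_per_segment: float) -> float:
    """Total labeling cost in USD for a flat per-segment price."""
    return round(segments * price_per_segment, 2)


total = labeling_cost(SEGMENTS, PRICE_PER_SEGMENT)
segments_per_image = SEGMENTS / IMAGES
cost_per_image = total / IMAGES

print(f"Total cost:         ${total:,.2f}")           # $1,465.75
print(f"Segments per image: {segments_per_image}")    # 11.726
print(f"Cost per image:     ${cost_per_image:.2f}")   # $1.47
```

The per-image figures are derived here for convenience; the case study itself reports only the totals.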
Other Labeling Services
Precise BPO Solution was relatively inexpensive compared with better-known labeling services. However, its turnaround time, 2.5 weeks for 11,726 segments, is not scalable.
Other third-party services, such as Google AI Platform and Amazon SageMaker, let you choose the number of workers to shorten the turnaround time on labels. However, using more workers significantly increases costs.
Google Cloud (AI Platform)
Prices start at $870 for 1,000 units (e.g., 2 workers × 3 segments = 6 units)
Image segmentation is their most expensive labeling task
Amazon SageMaker (Mechanical Turk)
$0.08 per segment review + ($0.84 per segment × number of workers)
For example, 1,000 segments and 2 workers = (0.08 × 1,000) + (0.84 × 1,000 × 2) = $1,760
Semantic segmentation is their most expensive labeling task
Amazon SageMaker (vendor: Cogito)
Amazon lets you choose other vendors, which set their own pricing schedules
Highest-rated labeling vendor in the Amazon marketplace
V7Labs
Found through a general Google search
| Company | Workers | Segments | Worker hours | Cost |
|---|---|---|---|---|
| ... | 1 | 1000 | | $870 |
| Amazon SageMaker (Mechanical Turk) | 1 | 1 | | $0.08 + $0.84 |
| Amazon SageMaker (vendor: Cogito) | 1 | | | $5.04 |
| ... | | | 3,600 hrs or 360,000 annotations | $5400 / year |
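The two pricing rules quoted above for Google Cloud and Amazon SageMaker (Mechanical Turk) can be written as small functions. This is a sketch under stated assumptions: it treats the $870 starting price as a flat rate for every 1,000 units (real pricing may be tiered), and the function names are hypothetical rather than part of either vendor's API.

```python
def google_cost(segments: int, workers: int,
                price_per_1000_units: float = 870.0) -> float:
    """Cost in USD, assuming one unit per (worker, segment) pair
    and a flat rate equal to the quoted starting price."""
    units = segments * workers
    return round(units * price_per_1000_units / 1000, 2)


def mturk_cost(segments: int, workers: int,
               review_fee: float = 0.08,
               price_per_worker: float = 0.84) -> float:
    """Cost in USD: a per-segment review fee plus a per-segment,
    per-worker labeling price."""
    return round(segments * (review_fee + price_per_worker * workers), 2)


# The worked example from the text: 1,000 segments and 2 workers.
print(mturk_cost(1_000, 2))   # 1760.0
# 2 workers x 3 segments = 6 units at $0.87 per unit.
print(google_cost(3, 2))      # 5.22
```

Because both rules scale linearly with the number of workers, adding workers to shorten turnaround raises the cost roughly in proportion.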
Labeling Cost Projections for SemiField
We can estimate the expected costs and time for SemiField data collection using the various third-party labeling options.
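The estimates rest on the following assumptions:
An average of 8 segments per image (lower than the 11.726 segments per image in the TAMU case study)
$0.125 per segment (the non-discounted Precise BPO Solution price)
32 seconds per segment (very conservative), derived from the TAMU turnaround of 2.5 weeks for 11,726 segments
2.5 weeks = 13 working days (8 am to 5 pm) = ~104 worker hours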
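The 32-second figure follows from dividing the TAMU turnaround by the number of segments. A minimal sketch of that arithmetic, assuming a single worker and 8 working hours per day (the case study does not report the number of workers, so the true per-worker rate may be slower); the names are illustrative:

```python
# Turnaround reported for the TAMU labeling job.
WORKING_DAYS = 13          # 2.5 weeks, as counted in the text
HOURS_PER_DAY = 8          # assumption: 8 am to 5 pm with a one-hour break
SEGMENTS_LABELED = 11_726


def seconds_per_segment(days: int, hours_per_day: int, segments: int) -> float:
    """Average labeling time per segment, in seconds."""
    worker_hours = days * hours_per_day
    return worker_hours * 3600 / segments


def worker_hours_needed(segments: int, secs_per_segment: float) -> float:
    """Worker hours required to label `segments` at the given rate."""
    return segments * secs_per_segment / 3600


rate = seconds_per_segment(WORKING_DAYS, HOURS_PER_DAY, SEGMENTS_LABELED)
print(f"{rate:.1f} s per segment")  # about 31.9, rounded to 32 in the text
print(f"{worker_hours_needed(8_000, 32):.1f} worker hours for 8,000 segments")
```

At this rate, 8,000 segments correspond to roughly 71 worker hours, or about 4,267 worker minutes.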
| Images | Segments | Worker time estimate (minutes) | Precise BPO Solution | Google Cloud AI Platform** | Amazon Turk** |
|---|---|---|---|---|---|
| 1,000* | 11,726 | 6,249.958 | $1,465.75 | $20,403 | $20,638 |
| 1,000 | 8,000 | 4,264 | $1,000 | $13,920 | $14,080 |
| 10,000 | 80,000 | 42,640 | $10,000 | $139,200 | $140,800 |
| 25,000 | 200,000 | 106,600 | $25,000 | $348,000 | $352,000 |
| 50,000 | 400,000 | 213,200 | $50,000 | $696,000 | $704,000 |
| 100,000 | 800,000 | 426,400 | $100,000 | $1,392,000 | $1,408,000 |
| 250,000 | 2,000,000 | 1,066,000 | $250,000 | $3,480,000 | $3,520,000 |
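The projection figures can be reproduced from the stated assumptions. The sketch below is a reconstruction, not the authors' original calculation: the Google and Amazon columns match only when 2 workers per segment are assumed, and the worker-time column matches 0.533 minutes (32 seconds) per segment, so both values are inferred from the numbers rather than stated in the text. All names are hypothetical.

```python
SEGMENTS_PER_IMAGE = 8
MINUTES_PER_SEGMENT = 0.533   # 32 s / 60, rounded as the table appears to use
BPO_PRICE = 0.125             # USD per segment
GOOGLE_PRICE_PER_UNIT = 0.87  # USD, from $870 per 1,000 units
MTURK_REVIEW_FEE = 0.08       # USD per segment
MTURK_WORKER_PRICE = 0.84     # USD per segment per worker
WORKERS = 2                   # assumption inferred from the table figures


def project(images: int, segments_per_image: float = SEGMENTS_PER_IMAGE) -> dict:
    """Projected labeling time and cost for a given number of images."""
    segments = images * segments_per_image
    return {
        "segments": segments,
        "worker_minutes": round(segments * MINUTES_PER_SEGMENT, 3),
        "bpo_usd": round(segments * BPO_PRICE, 2),
        "google_usd": round(segments * WORKERS * GOOGLE_PRICE_PER_UNIT, 2),
        "mturk_usd": round(
            segments * (MTURK_REVIEW_FEE + MTURK_WORKER_PRICE * WORKERS), 2
        ),
    }


for n in (1_000, 10_000, 25_000, 50_000, 100_000, 250_000):
    row = project(n)
    print(f"{n:>8,} images: {row['segments']:>10,} segments, "
          f"{row['worker_minutes']:>12,.0f} min, "
          f"BPO ${row['bpo_usd']:,.0f}, "
          f"Google ${row['google_usd']:,.0f}, "
          f"MTurk ${row['mturk_usd']:,.0f}")
```

Dividing the worker-minute values by 60 gives worker hours; for the TAMU row (11,726 segments) that is about 104 hours, which agrees with the 2.5-week turnaround.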
Taken together, these time and cost figures make labeling agricultural images a particularly demanding task.
Numerous studies have documented the significant time investment required for labeling weed images.
High Costs: Third-party labeling services, while offering a solution, can be prohibitively expensive, especially for large datasets. The Texas A&M case study illustrates this, with costs exceeding $1,400 for labeling just 1,000 images.
AgIR: A Solution to the Data Challenge
The Agricultural Image Repository (AgIR) addresses these challenges by providing a comprehensive, well-annotated dataset that combines:
High-throughput Semi-Field Data: Collected under semi-controlled conditions for scalable and automated annotation.
Real-World Field Data: Capturing the complexity and diversity of agricultural environments.
By combining these two data sources and utilizing innovative annotation strategies, AgIR aims to accelerate research and development in precision agriculture, ultimately leading to more efficient, sustainable, and productive farming practices.