
“The scarcity of public image datasets remains a key bottleneck in developing next-generation computer vision and intelligent systems for precision agriculture.” (Lu and Young 2020)

Why?

Developing image datasets is difficult, especially for agricultural applications. Collecting and annotating images pose major challenges.

Labeling is Time Consuming

  • Training data-hungry deep learning models requires tens to hundreds of thousands, sometimes millions, of images.

  • Others have noted that manually labeling a single image can take minutes to hours (Table 1).

Table 1. Labeling time reported in the literature

| Source | Annotation technique | Scene type | Time |
|---|---|---|---|
| Cicco et al. (2017) | Manual segmentation of cutouts | Simple | 5-30 min / real image |
| Skovsen et al. (2019) (in conversation) | Manual segmentation of cutouts | Complex (field) | hour(s) / real image |
| Sa et al. (2017) | Manual segmentation of drone images of 3 classes | Complex (field) | 60 min / image |
| Bosilj et al. (2020) | Manual segmentation of 3 classes | Complex (field) | 2-4 hrs / image |

Labeling is Expensive

Texas A&M Case Study

Texas A&M (TAMU) used a third-party labeling service to semantically label images of agricultural weeds and plants at early growth stages, roughly 2-6 weeks after emergence. Details are as follows:

  • Company: Precise BPO Solution

  • 1000 images

  • 11726 segments

  • $0.125 per segment*

    • *TAMU was given a discounted price of $0.095 per segment, but the non-discounted price of $0.125 is used here because the company is unlikely to offer the same discount for orders of more than 1,000 images.

  • Time to label all images - 2.5 weeks

  • $1465.75 total cost
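The case-study total follows directly from the segment count and the per-segment price. A minimal sketch of that arithmetic is shown here; the variable names are illustrative and do not come from the case study itself:

```python
# Figures quoted in the Texas A&M case study.
num_images = 1000
num_segments = 11726
price_per_segment = 0.125  # USD, non-discounted price

# Derived quantities.
total_cost = num_segments * price_per_segment
segments_per_image = num_segments / num_images
cost_per_image = total_cost / num_images

print(f"Total cost: ${total_cost:,.2f}")
print(f"Segments per image: {segments_per_image:.3f}")
print(f"Cost per image: ${cost_per_image:.2f}")
```

This works out to roughly 11.7 segments and a little under $1.50 per image at the non-discounted price.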

Agricultural Scenes are Diverse

...

High individual diversity from differences in growth stages and effects of biotic and abiotic factors on morphology

...

Large intra-species variation and similarity with other species

...

While the potential of computer vision and artificial intelligence to revolutionize precision agriculture is immense, the lack of accessible and comprehensive image datasets is a significant roadblock. Why is this the case?

The Challenges of Image Collection and Annotation

  1. Image Collection is Demanding:

    • Deep learning models thrive on massive amounts of data, often requiring hundreds of thousands or even millions of images to learn effectively. This demands high-throughput systems that can efficiently capture the diversity of agricultural scenes, from various growth stages to varying field conditions.

    • Creating such systems requires a multidisciplinary effort, involving agronomists, hardware engineers, and software developers.

  2. Agricultural Scenes are Inherently Diverse:

    • Plants undergo significant changes in appearance throughout their growth cycle.

    • Environmental factors like pests, diseases, and weather can drastically alter plant morphology.

    • Weeds present a particular challenge due to both high intra-species (within a species) and inter-species (between species) variations.

    • Agricultural fields are complex environments, influenced by climate, soil, ground residue, and the presence of other plant species.

  3. Labeling is a Time-Consuming and Expensive Bottleneck:

    • Pixel-wise annotations, essential for training segmentation models, are especially labor-intensive. Even with specialized tools, manually labeling a single image can take minutes to hours.

    • The need for high accuracy, coupled with the complexity of plant structures, makes labeling agricultural images a particularly demanding task.

    • Numerous studies have documented the significant time investment required for labeling weed images.

    • High Costs: Third-party labeling services, while offering a solution, can be prohibitively expensive, especially for large datasets. The Texas A&M case study illustrates this, with costs exceeding $1,400 for labeling just 1,000 images.

We can estimate the expected costs and time for SemiField data collection using the Texas A&M numbers.

  • Here we use a smaller average of 8 segments per image (instead of 11.726, as in the TAMU case)

  • $0.125 per segment

  • 2.5 weeks (13 working days, 8am - 5pm, at 8 working hours per day) = ~104 worker hours

  • 104 hours over 11,726 segments translates to about 32 seconds (roughly 0.533 minutes) per segment (104 × 3600 / 11,726)

The first row reproduces the TAMU case (11.726 segments per image); the remaining rows assume 8 segments per image. Worker time is given in minutes, at 0.533 minutes per segment.

| Images | Worker time estimate (minutes) | Total cost ($) |
|---|---|---|
| 1,000 (TAMU) | 6,249.958 | $1,465.75 |
| 1,000 | 4,264 | $1,000.00 |
| 10,000 | 42,640 | $10,000.00 |
| 25,000 | 106,600 | $25,000.00 |
| 100,000 | 426,400 | $100,000.00 |
| 250,000 | 1,066,000 | $250,000.00 |
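The SemiField cost and time estimate above can be reproduced with a short script. This is a sketch under the stated assumptions (8 segments per image, $0.125 per segment, about 32 seconds per segment); the function and class names are hypothetical and chosen only for illustration. Note that the tabulated worker-time values correspond to minutes (about 0.533 minutes per segment), so the hours printed here match those values divided by 60, up to rounding.

```python
from dataclasses import dataclass

# Figures taken from the Texas A&M case study and the estimate above.
PRICE_PER_SEGMENT_USD = 0.125   # non-discounted price
SECONDS_PER_SEGMENT = 32        # ~104 worker hours / 11,726 segments
SEGMENTS_PER_IMAGE = 8          # assumed SemiField average


@dataclass
class LabelingEstimate:
    images: int
    segments: float
    worker_hours: float
    cost_usd: float


def estimate_labeling(images: int,
                      segments_per_image: float = SEGMENTS_PER_IMAGE) -> LabelingEstimate:
    """Project labeling time and cost for a given number of images."""
    segments = images * segments_per_image
    worker_hours = segments * SECONDS_PER_SEGMENT / 3600
    cost_usd = segments * PRICE_PER_SEGMENT_USD
    return LabelingEstimate(images, segments, worker_hours, cost_usd)


# Same dataset sizes as in the estimate above.
for n in (1_000, 10_000, 25_000, 100_000, 250_000):
    est = estimate_labeling(n)
    print(f"{est.images:>7,} images: {est.worker_hours:>8,.1f} h, ${est.cost_usd:>10,.2f}")
```

Passing `segments_per_image=11.726` approximates the TAMU case instead of the SemiField assumption. Because cost and time both scale linearly with the number of segments, lowering the average segment count per image is the main lever this model exposes.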

AgIR: A Solution to the Data Challenge

The Agricultural Image Repository (AgIR) addresses these challenges by providing a comprehensive, well-annotated dataset that combines:

  • High-throughput Semi-Field Data: Collected under semi-controlled conditions for scalable and automated annotation.

  • Real-World Field Data: Capturing the complexity and diversity of agricultural environments.

By combining these two data sources and utilizing innovative annotation strategies, AgIR aims to accelerate research and development in precision agriculture, ultimately leading to more efficient, sustainable, and productive farming practices.