Developing image datasets is difficult, especially for agricultural applications. Collecting and annotating images pose major challenges.

Image Collection is Difficult

Data-hungry deep learning models require hundreds of thousands, sometimes millions, of images to train and become invariant to changes in location, lighting, backgrounds, and other factors. High-throughput systems are necessary to capture hundreds of thousands of images of weeds at various growth stages while also capturing the effects of field-like conditions. A network of agronomists and computer hardware and software engineers is needed to automate large-scale image collection.

Agricultural Scenes are Diverse

  • Plants look different depending on growth stage

  • Biotic and abiotic factors affect morphology

  • Weeds are both diverse and similar

    • High intra- and inter-species variation, but

    • Different species can look similar, such as Palmer amaranth and waterhemp, or Johnsongrass and corn

  • Agricultural scenes are complex and dense

    • Climate, soil, ground residue, and other weed populations

Labeling is Time Consuming

Pixel-wise labels must be accurate. Labeling by hand can take minutes to hours per image, even when using third-party annotation tools. Labels for semantic segmentation are the most time-consuming. The need for high accuracy, complex leaf structures, and amorphous shapes makes labeling plants one of the most difficult labeling tasks. Many have reported the high time demands of labeling images of weeds.

...

| Source | Annotation technique | Scene type | Time per image |
| --- | --- | --- | --- |
| Cicco et al. (2017) | manual segmentation of cutouts | simple | 5-30 min / real image |
| Skovsen et al. (2019) (in conversation) | manual segmentation of cutouts | very complex (field) | hour(s) / real image |
| Sa et al. (2017) | manual segmentation of drone images of 3 classes | complex (field) | 60 min / image |
| Bosilj et al. (2020) | manual segmentation of 3 classes | complex (field) | 2-4 hrs / image |

Labeling is Expensive

Texas A&M Case Study

TAMU used a third-party labeling service to semantically label images of agricultural weeds and plants at early growth stages, roughly 2-6 weeks after emergence. Details are as follows:

  • Company: Precise BPO Solution

  • Images: 1,000

  • Total segments: 11,726

  • Cost per segment: $0.125*

    • *While TAMU was given a discounted rate of $0.095 per segment, the non-discounted price of $0.125 is used here. It is unlikely the company will provide the same discount for images over 1,000.

  • Labeling time: 2.5 weeks

  • Number of workers: unknown

  • Total cost: $1,465.75
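
As a quick check on the figures above, the total follows directly from the segment count and the per-segment price. This is a minimal sketch: the function name `labeling_cost` is hypothetical, and the discounted total is derived here from the quoted $0.095 rate rather than stated in the source.

```python
def labeling_cost(segments: int, price_per_segment: float) -> float:
    """Flat per-segment pricing: total = segments * price, rounded to cents."""
    return round(segments * price_per_segment, 2)


# Figures from the TAMU case study above.
TAMU_SEGMENTS = 11_726

list_price_total = labeling_cost(TAMU_SEGMENTS, 0.125)   # non-discounted rate
discounted_total = labeling_cost(TAMU_SEGMENTS, 0.095)   # rate TAMU actually received

print(f"List price total: ${list_price_total:,.2f}")
print(f"Discounted total: ${discounted_total:,.2f}")
```

The list-price total matches the $1,465.75 reported above, which is the figure carried forward in the rest of this page.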

Other Labeling Services

Precise BPO Solutions was relatively inexpensive compared to other, better-known labeling services. However, its turn-around time (2.5 weeks for 11,726 segments) is not scalable.

Other third-party labeling services like Google AI Platform and Amazon SageMaker let you choose the number of workers to decrease turn-around time on labels. However, using more workers significantly increases costs.

Google Cloud (AI Platform)

  • Uses “unit” pricing. For example, 2 segments x 2 workers = 4 units.

  • Prices start at $870 for 1,000 units (e.g., 2 workers x 3 segments = 6 units)

  • Image segmentation is their most expensive labeling task
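
The unit pricing described above can be sketched as follows. The function names are hypothetical, and the code assumes the $870 per 1,000 units starting price scales linearly with volume, which the source does not confirm.

```python
PRICE_PER_1000_UNITS = 870.0  # starting price quoted above; linear scaling is an assumption


def google_units(segments: int, workers: int) -> int:
    """One unit is one segment labeled by one worker."""
    return segments * workers


def google_cost(segments: int, workers: int,
                price_per_1000_units: float = PRICE_PER_1000_UNITS) -> float:
    """Estimated cost in dollars under the linear-scaling assumption."""
    return google_units(segments, workers) * price_per_1000_units / 1000


print(google_units(2, 2))        # the 2 segments x 2 workers example
print(google_cost(1000, 1))      # 1,000 segments, single worker
```

Doubling the number of workers doubles the unit count, which is why faster turn-around translates directly into higher cost.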

Amazon SageMaker (Mechanical Turk)

...

  • $0.08 per segment review + ($0.84 per segment x workers)

  • For example, 1,000 segments and 2 workers = (0.08 x 1000) + (0.84 x 1000 x 2) = $1,760

  • Semantic segmentation is the most expensive labeling task
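
The worked example above can be expressed as a small function. This is a sketch based only on the example's arithmetic (a one-time $0.08 review fee per segment plus $0.84 per segment per worker); the name `sagemaker_cost` is hypothetical and actual pricing may include other components.

```python
def sagemaker_cost(segments: int, workers: int,
                   review_price: float = 0.08,
                   labeling_price: float = 0.84) -> float:
    """Review fee is charged once per segment; labeling fee once per segment per worker."""
    review = review_price * segments
    labeling = labeling_price * segments * workers
    return review + labeling


example_total = sagemaker_cost(segments=1000, workers=2)
print(f"${example_total:,.2f}")
```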

Amazon SageMaker (vendor: Cogito)

  • Amazon allows you to go with other vendors that set their own pricing schedule

  • Highest-rated labeling vendor in the Amazon marketplace

V7Labs

  • Found through a Google search

| Company | Workers | Segments | Worker hours | Cost |
| --- | --- | --- | --- | --- |
| Google Cloud | 1 | 1000 | | $870 |
| Amazon SageMaker (Mechanical Turk) | 1 | 1 | | $0.08 + $0.84 |
| Amazon SageMaker (vendor: Cogito) | 1 | | | $5.04 |
| V7Labs | | | 3,600 hrs or 360,000 annotations | $5400 / year |

Labeling Cost Projections for SemiField

Precise BPO Solutions

We can estimate the expected costs and time for SemiField data collection using the Texas A&M numbers and various third-party labeling options.

  • We use a smaller average of 8 segments per image (instead of 11.726, as in the TAMU data)

  • $0.125 per segment

  • 32 seconds per segment (very conservative)

    • 2.5 weeks / 11,726 segments

    • 2.5 weeks = 13 working days (from 8am - 5pm) = ~104 worker hours

    • 104 hours translates to 32 seconds per segment (104 hours x 3,600 s / 11,726 segments)
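
The per-segment time estimate can be reproduced as below. The sketch assumes 8 paid hours per working day within the 8am - 5pm window and a single worker, since the actual number of workers is unknown; both are assumptions, not facts from the labeling vendor.

```python
WORKING_DAYS = 13          # 2.5 weeks, rounded up to whole working days
HOURS_PER_DAY = 8          # assumption: 8 paid hours between 8am and 5pm
TAMU_SEGMENTS = 11_726

worker_hours = WORKING_DAYS * HOURS_PER_DAY
seconds_per_segment = worker_hours * 3600 / TAMU_SEGMENTS

print(f"{worker_hours} worker hours")
print(f"{seconds_per_segment:.1f} seconds per segment")
```

If several workers labeled in parallel, the true labor time per segment would be proportionally higher than this estimate.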

| Images | Segments | Worker minutes (estimate) | BPO Solutions | Google Cloud AI Platform** | Amazon Turk** |
| --- | --- | --- | --- | --- | --- |
| 1,000* | 11,726 | 6,249.958 | $1,465.75 | $20,403 | $20,638 |
| 1,000 | 8,000 | 4,264 | $1,000 | $13,920 | $14,080 |
| 10,000 | 80,000 | 42,640 | $10,000 | $139,200 | $140,800 |
| 25,000 | 200,000 | 106,600 | $25,000 | $348,000 | $352,000 |
| 50,000 | 400,000 | 213,200 | $50,000 | $696,000 | $704,000 |
| 100,000 | 800,000 | 426,400 | $100,000 | $1,392,000 | $1,408,000 |
| 250,000 | 2,000,000 | 1,066,000 | $250,000 | $3,480,000 | $3,520,000 |
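
The cost columns in the table can be reproduced from the per-segment prices quoted earlier. The sketch below assumes 2 workers per segment for Google Cloud and Amazon (an inference: it is the value that makes the table's numbers come out), linear scaling of Google's $870 per 1,000 units starting price, and a hypothetical helper named `project_costs`.

```python
def project_costs(images: int, segments_per_image: int = 8, workers: int = 2) -> dict:
    """Projected labeling cost in dollars for each service at a given dataset size."""
    segments = images * segments_per_image
    return {
        "segments": segments,
        # Precise BPO: flat $0.125 per segment (non-discounted rate)
        "bpo": segments * 0.125,
        # Google Cloud: $870 per 1,000 units, unit = segment x worker (assumed linear)
        "google": segments * workers * 870 / 1000,
        # Amazon Mechanical Turk: $0.08 review per segment + $0.84 per segment per worker
        "amazon": 0.08 * segments + 0.84 * segments * workers,
    }


for images in (1_000, 10_000, 25_000, 50_000, 100_000, 250_000):
    row = project_costs(images)
    print(f"{images:>8,} images  {row['segments']:>10,} segments  "
          f"BPO ${row['bpo']:>10,.0f}  Google ${row['google']:>12,.0f}  "
          f"Amazon ${row['amazon']:>12,.0f}")
```

Under these assumptions the per-segment service is roughly 14 times cheaper than the two cloud platforms at every dataset size, since all three cost models are linear in the number of segments.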

...