Github repo: https://github.com/precision-sustainable-ag/SemiF-DataReporting

Azure blob containers are read using a pre-authorized URL, which is set to expire on 2025-06-30.

The SemiField data reporting tool scans SemiField data in the Azure blob containers and in the long-term storage (LTS) locations (longterm_images, longterm_images2, and GROW_DATA) to gather details about the files stored.

...

All gathered information is sent to the #semifield-datareports Slack channel as it is generated.

Reports generated

Data report

  • storage used by each category of data (in TB)

    • uploaded images

    • cutouts

    • developed images

  • Graph depicting number of processed vs unprocessed batches grouped by location
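To make the per-category TB figures concrete, a minimal sketch of the conversion, using made-up byte totals (the assumption here is decimal terabytes, 10**12 bytes, rather than TiB):

```python
# Hypothetical byte totals per data category; real values come from the blob scan
category_bytes = {
    "uploaded images": 12_500_000_000_000,
    "cutouts": 3_200_000_000_000,
    "developed images": 8_900_000_000_000,
}

def bytes_to_tb(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return n_bytes / 1e12

# Per-category storage rounded to two decimals, as it might appear in the report
storage_tb = {cat: round(bytes_to_tb(b), 2) for cat, b in category_bytes.items()}
```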

Actionable items report

  • Number of batches not preprocessed yet

  • Number of batches not processed yet

  • Number of batches present in Azure but not in long-term storage

    • semifield-uploads

    • semifield-developed-images

    • semifield-cutouts

  • Number of uploaded and developed images

  • batch_details.csv: Master table showing batch details across LTS and Azure (includes “deduplicated batch”)

  • semif_developed_duplicates_lts.csv: Details about batches present in multiple LTS locations
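The "present in Azure but not in long-term storage" items above amount to a set difference over batch names. A minimal sketch, with hypothetical batch names (the real naming convention may differ):

```python
def missing_from_lts(azure_batches: set[str], lts_batches: set[str]) -> set[str]:
    """Batches found in an Azure container but absent from every LTS location."""
    return azure_batches - lts_batches

# Hypothetical batch names for illustration only
azure = {"MD_2023-07-01", "NC_2023-07-02", "TX_2023-07-03"}
lts = {"MD_2023-07-01", "TX_2023-07-03"}
```

Running this per container (semifield-uploads, semifield-developed-images, semifield-cutouts) against the union of the LTS locations yields the three counts listed above.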

General distribution stats

  • general_dist_stats.txt

    • developed image stats (total processed images): total count, by common name, by category, by location, by year

    • cutouts stats: total count, by common name, by category, by location, by year, area buckets grouped by common name

    • <TODO>
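The area-bucket stat above can be sketched as a nested count: cutouts bucketed by area, grouped by common name. The bucket edges and species names below are assumptions for illustration; the real report's buckets may differ:

```python
from collections import defaultdict

# Hypothetical area bucket edges; the report's actual edges may differ
BUCKET_EDGES = [0, 1_000, 10_000, 100_000, float("inf")]

def bucket_label(area: float) -> str:
    """Return the half-open interval label containing the given area."""
    for lo, hi in zip(BUCKET_EDGES, BUCKET_EDGES[1:]):
        if lo <= area < hi:
            return f"[{lo}, {hi})"
    return "unknown"

def area_buckets_by_common_name(cutouts):
    """cutouts: iterable of (common_name, area) pairs -> {name: {bucket: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for name, area in cutouts:
        counts[name][bucket_label(area)] += 1
    return {name: dict(buckets) for name, buckets in counts.items()}
```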

Configuration

Code usage

The tool runs automatically via a cron job on the SUNNY server.

The current cron job is configured in the jbshah user’s crontab and runs at 9 AM every Monday, executing the following scripts:

  • list_blob_contents.py

  • local_batch_table_generator.py

  • report.py
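A crontab entry of this shape would match the schedule described above (the repository path, Python environment, and log file location are assumptions, not taken from the server):

```shell
# min hour day-of-month month day-of-week   command  (9:00 AM every Monday)
0 9 * * 1 cd /path/to/SemiF-DataReporting && python list_blob_contents.py && python local_batch_table_generator.py && python report.py >> /path/to/semif_report.log 2>&1
```

Redirecting stdout and stderr to a log file preserves output between runs, since cron otherwise discards (or mails) it.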

<TODO>