GitHub repo: https://github.com/precision-sustainable-ag/SemiF-DataReporting
Azure blob containers are read using a pre-authorized URL, which is set to expire on 2025-06-30.
The semifield data reporting tool goes through semifield data in the Azure blob containers and in the long-term storage (LTS) locations longterm_images, longterm_images2, and GROW_DATA to gather details about the files stored there.
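The per-location storage totals that the scan produces can be sketched as a small aggregation step. The records below are hypothetical; the real tool derives them from Azure blob listings and LTS directory walks.

```python
from collections import defaultdict

# Hypothetical (location, size-in-bytes) records standing in for the scan output.
files = [
    ("longterm_images", 4 * 1024**3),
    ("longterm_images2", 2 * 1024**3),
    ("GROW_DATA", 1 * 1024**3),
    ("longterm_images", 3 * 1024**3),
]

def storage_by_location(records):
    """Sum file sizes per storage location and convert bytes to TB (TiB)."""
    totals = defaultdict(int)
    for location, size in records:
        totals[location] += size
    return {loc: total / 1024**4 for loc, total in totals.items()}
```

The same pattern extends to grouping by data category (uploads, cutouts, developed images) instead of location.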
...
All the information gathered is sent to Slack as it is generated, posted to the #semifield-datareports channel.
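Posting a generated report to the channel might look like the following sketch, which uses Slack's chat.postMessage Web API directly via urllib; the message formatting and error handling here are illustrative assumptions, not the tool's actual implementation.

```python
import json
import urllib.request

SLACK_API_URL = "https://slack.com/api/chat.postMessage"  # Slack Web API endpoint

def build_report_message(channel, report_name, body):
    """Assemble the JSON payload for a chat.postMessage call."""
    return {
        "channel": channel,
        "text": f"*{report_name}*\n{body}",
    }

def post_to_slack(payload, token):
    """Send the payload to Slack; requires a bot token with the chat:write scope."""
    req = urllib.request.Request(
        SLACK_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (not executed here; token comes from the bot configuration):
# post_to_slack(build_report_message("#semifield-datareports", "Data report", "..."), "<SLACK_BOT_TOKEN>")
```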
Reports generated
- Data report
  - Storage used by each category of data (in TB):
    - uploaded images
    - cutouts
    - developed images
  - Graph depicting the number of processed vs. unprocessed batches, grouped by location
- Actionable items report
  - Number of batches not preprocessed yet
  - Number of batches not processed yet
  - Number of batches present in Azure but not in long-term storage, for:
    - semifield-uploads
    - semifield-developed-images
    - semifield-cutouts
  - Number of uploaded and developed images
- batch_details.csv: master table showing batch details across LTS and Azure (includes "deduplicated batch")
- semif_developed_duplicates_lts.csv: details about batches present in multiple LTS locations
- General distribution stats (general_dist_stats.txt)
  - Developed image stats (total processed images): total count, by common name, by category, by location, by year
  - Cutout stats: total count, by common name, by category, by location, by year, and area buckets grouped by common name
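The Azure-vs-LTS comparison behind the actionable-items report can be sketched as a simple set difference; the batch names below are hypothetical examples, not real batches.

```python
def batches_missing_from_lts(azure_batches, lts_batches):
    """Return batch names found in an Azure container but absent from LTS."""
    return sorted(set(azure_batches) - set(lts_batches))

# Hypothetical batch names in an Azure container vs. long-term storage.
azure = ["MD_2024-06-01", "NC_2024-06-02", "TX_2024-06-03"]
lts = ["MD_2024-06-01", "TX_2024-06-03"]
```

Running the same comparison once per container (semifield-uploads, semifield-developed-images, semifield-cutouts) yields the three "present in Azure but not in long-term storage" counts.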
<TODO>
Configuration
Slack bot configuration: https://crownteamworkspace.slack.com/marketplace/A0831MD2TPZ-semif-datareporting?settings=1&tab=settings
Code usage
The tool runs automatically via a cron job on the SUNNY server. The current cron job is installed in the jbshah user's crontab and runs at 9 AM every Monday.
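A 9 AM every-Monday schedule in crontab form looks like the fragment below; the working directory, entry-point command, and log path are assumptions, not the actual entry in the jbshah crontab.

```shell
# m h dom mon dow  command
# Run every Monday at 09:00; path and entry point are hypothetical.
0 9 * * 1 cd /home/jbshah/SemiF-DataReporting && python report.py >> semif_report.log 2>&1
```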
- list_blob_contents.py
- local_batch_table_generator.py
- report.py
<TODO>