...
general_dist_stats.txt:
- developed image stats (total processed images): total count, by common name, by category, by location, by year
- cutouts stats: total count, by common name, by category, by location, by year
- primary cutouts stats: total count, by area bucket, by common name, by category, by location, by year
area_by_common_name.csv (TODO): counts grouped by common name, primary vs non-primary cutouts, bucketed by bbox area (see the sketch below).
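A minimal sketch of how the area_by_common_name counts could be pulled from agir.db, assuming a hypothetical `cutouts` table with `common_name`, `is_primary`, and `bbox_area` columns and illustrative bucket edges (the real schema and buckets may differ):

```python
import csv
import sqlite3

BUCKET_EDGES = [0, 10_000, 100_000]  # assumed edges, ascending

def bucket_label(area: float) -> str:
    # Return the half-open bucket [lo, hi) that contains `area`.
    for lo, hi in zip(BUCKET_EDGES, BUCKET_EDGES[1:] + [float("inf")]):
        if lo <= area < hi:
            return f"{lo}-{hi}"
    return "unknown"

conn = sqlite3.connect("agir.db")
counts: dict[tuple, int] = {}
for name, is_primary, area in conn.execute(
    "SELECT common_name, is_primary, bbox_area FROM cutouts"  # assumed table/columns
):
    key = (name, bool(is_primary), bucket_label(area))
    counts[key] = counts.get(key, 0) + 1
conn.close()

with open("area_by_common_name.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["common_name", "is_primary", "area_bucket", "count"])
    for (name, primary, bucket), n in sorted(counts.items()):
        writer.writerow([name, primary, bucket, n])
```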
Configuration
- Azure blob containers are read using a pre-authorized URL, which is set to expire on 2025-06-30 (see the sketch after this list).
- Slack bot configuration: https://crownteamworkspace.slack.com/marketplace/A0831MD2TPZ-semif-datareporting?settings=1&tab=settings
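A minimal sketch of reading a container through the pre-authorized URL, assuming the `azure-storage-blob` package; the URL below is a placeholder, not the real token:

```python
from azure.storage.blob import ContainerClient

# Placeholder pre-authorized (SAS) container URL; the real one expires 2025-06-30.
SAS_URL = "https://<account>.blob.core.windows.net/<container>?<sas-token>"

container = ContainerClient.from_container_url(SAS_URL)
for blob in container.list_blobs():
    print(blob.name, blob.size)
```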
...
The current cron job runs from the jbshah user’s crontab at 9 am every Monday.
list_blob_contents.py:
- goes through azure blob containers
- categorizes batches into processed, unprocessed, preprocessed, and unpreprocessed (sketched below)
- adds batch stats
- creates separate csvs
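A hedged sketch of the categorization step; the marker prefixes below, and the reading of processed/unprocessed and preprocessed/unpreprocessed as two independent axes, are illustrative assumptions rather than the actual rules in list_blob_contents.py:

```python
from collections import defaultdict

def categorize_batches(batch_blobs: dict[str, set[str]]) -> dict[str, list[str]]:
    """Sort batches into the four categories, keyed off which blob
    prefixes exist under each batch (prefix names are assumptions)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for batch, paths in batch_blobs.items():
        processed = any(p.startswith("metadata/") for p in paths)    # assumed marker
        preprocessed = any(p.startswith("images/") for p in paths)   # assumed marker
        groups["processed" if processed else "unprocessed"].append(batch)
        groups["preprocessed" if preprocessed else "unpreprocessed"].append(batch)
    return dict(groups)
```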
local_batch_table_generator.py:
- goes through lts locations
- categorizes batches into different types
- generates batch stats
- creates separate csvs (see the sketch below)
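A short sketch of the LTS walk and per-location csv output; the mount points, directory layout, and image format are placeholders, not the script's real configuration:

```python
import csv
from pathlib import Path

LTS_LOCATIONS = [Path("/mnt/lts0"), Path("/mnt/lts1")]  # assumed mount points

for location in LTS_LOCATIONS:
    rows = []
    # Treat each top-level directory under the location as one batch.
    for batch_dir in sorted(p for p in location.iterdir() if p.is_dir()):
        image_count = sum(1 for _ in batch_dir.rglob("*.jpg"))  # assumed format
        rows.append([batch_dir.name, image_count])
    with open(f"lts_batches_{location.name}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["batch", "image_count"])
        writer.writerows(rows)
```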
report.py:
- copies the separate csvs generated from list_blob_contents.py and local_batch_table_generator.py into a report folder
- combines azure and lts csvs for uploads, developed images, and cutouts
- performs deduplication between batches when they are present in multiple lts locations
- generates the summary report and actionable report messages and sends them to slack
- queries the agir.db SQLite database (expected in the code folder) to get general distribution stats for developed images, cutouts, and primary cutouts
- generates the area_by_common_name csv after fetching the results from the database
- sends the general distribution stats to slack (a deduplication/Slack sketch follows this list)
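A hedged sketch of the deduplication and Slack steps, assuming pandas and slack_sdk; the csv names, the `batch` key column, and the channel are placeholders:

```python
import os
import pandas as pd
from slack_sdk import WebClient

# Combine the per-location csvs, then keep one row per batch so batches that
# live in multiple LTS locations are only counted once.
combined = pd.concat(
    [pd.read_csv(p) for p in ["lts_batches_lts0.csv", "lts_batches_lts1.csv"]],
    ignore_index=True,
)
deduped = combined.drop_duplicates(subset=["batch"], keep="first")

summary = f"Weekly report: {len(deduped)} unique batches across all LTS locations."

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
client.chat_postMessage(channel="#semif-datareporting", text=summary)  # assumed channel
```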
cronjob.sh:
- cron shell script triggered from crontab; currently cds into the jbshah user’s codebase
- copies the database into the codebase to reduce network latency
- triggers the pipeline
crontab:
0 9 * * 1 /home/jbshah/SemiF-DataReporting/cronjob.sh
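To inspect or change the schedule, run `crontab -l` or `crontab -e` as the jbshah user.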