...

  • general_dist_stats.txt

    • developed image stats (total processed images): total count, by common name, by category, by location, by year

    • cutouts stats: total count, by common name, by category, by location, by year

    • primary cutouts stats: total count, area bucket by common name, by category, by location, by year

  • area_by_common_name.csv: counts grouped by common name, primary vs non-primary cutouts, bucketed by bbox area.
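
As an illustration of the grouping behind area_by_common_name.csv, here is a minimal sketch. It is not the code in report.py: the table name `cutouts`, its columns (`common_name`, `is_primary`, `bbox_area`), the bucket edges and the toy rows are all assumptions made for the example.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical bucket edges (exclusive upper bounds, in pixels^2).
AREA_BUCKETS = [(1_000, "<1k"), (10_000, "1k-10k")]
LAST_BUCKET = ">=10k"


def area_bucket(area):
    """Return the label of the first bucket whose upper bound exceeds area."""
    for upper, label in AREA_BUCKETS:
        if area < upper:
            return label
    return LAST_BUCKET


def count_by_common_name(conn):
    """Count cutouts per (common name, primary flag, area bucket)."""
    counts = Counter()
    # Assumed schema: cutouts(common_name, is_primary, bbox_area).
    rows = conn.execute("SELECT common_name, is_primary, bbox_area FROM cutouts")
    for common_name, is_primary, bbox_area in rows:
        counts[(common_name, bool(is_primary), area_bucket(bbox_area))] += 1
    return counts


def write_csv(counts, fileobj):
    """Write one row per (common name, primary flag, bucket) combination."""
    writer = csv.writer(fileobj)
    writer.writerow(["common_name", "is_primary", "area_bucket", "count"])
    for (name, primary, bucket), n in sorted(counts.items()):
        writer.writerow([name, primary, bucket, n])


# Toy in-memory database standing in for agir.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cutouts (common_name TEXT, is_primary INTEGER, bbox_area REAL)"
)
conn.executemany(
    "INSERT INTO cutouts VALUES (?, ?, ?)",
    [
        ("species_a", 1, 500),
        ("species_a", 1, 2500),
        ("species_a", 1, 3000),
        ("species_a", 0, 2500),
        ("species_b", 1, 20000),
    ],
)
demo_counts = count_by_common_name(conn)
demo_csv = io.StringIO()
write_csv(demo_counts, demo_csv)
```

Bucketing in Python rather than in SQL keeps the bucket edges in one place, so they can be changed without touching the query.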

Configuration

...

The current cron job runs from the jbshah user’s crontab at 9 am every Monday. The pipeline consists of the following scripts:

  • list_blob_contents.py:

    • goes through azure blob containers

    • categorizes into processed, unprocessed, preprocessed, unpreprocessed

    • adds batch stats

    • creates separate csvs

  • local_batch_table_generator.py:

    • goes through lts locations

    • categorizes into different types

    • generates batch stats

    • creates separate csvs

  • report.py:

    • copies the separate csvs generated from list_blob_contents.py and local_batch_table_generator.py into a report folder

    • combines azure and lts csvs for uploads, developed images and cutouts

    • performs deduplication between batches when they are present in multiple lts locations

    • generates a summary report and actionable report messages, and sends them to slack

    • queries the agir.db SQLite database (expected in the code folder) to get general distribution stats for developed images, cutouts and primary cutouts

    • generates area_by_common_name csv after fetching the results from the database

    • sends general distribution stats to slack.

  • cronjob.sh:

    • shell script that crontab triggers; currently points to the jbshah user’s codebase

    • copies the database into the codebase to reduce network latency

    • triggers the pipeline

    • crontab: 0 9 * * 1 /home/jbshah/SemiF-DataReporting/cronjob.sh
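
The deduplication step in report.py (batches present in multiple lts locations) can be sketched as follows. The page does not say which copy is kept, so the priority rule here is an assumption, as are the field names `batch` and `location` and the toy location names.

```python
def dedupe_batches(rows, location_priority):
    """Keep one row per batch, preferring locations earlier in location_priority.

    rows: list of dicts with hypothetical keys "batch" and "location".
    location_priority: list of location names, most preferred first.
    Locations not in the list rank last.
    """
    rank = {loc: i for i, loc in enumerate(location_priority)}
    unknown = len(rank)
    best = {}
    for row in rows:
        current = best.get(row["batch"])
        if current is None or (
            rank.get(row["location"], unknown)
            < rank.get(current["location"], unknown)
        ):
            best[row["batch"]] = row
    return list(best.values())


# Toy input: batch_A appears in two locations, batch_B in one.
demo_rows = [
    {"batch": "batch_A", "location": "lts_2"},
    {"batch": "batch_A", "location": "lts_1"},
    {"batch": "batch_B", "location": "lts_2"},
]
deduped = dedupe_batches(demo_rows, ["lts_1", "lts_2"])
```

With this rule each batch is counted once in the combined stats, no matter how many locations hold a copy.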