The Semifield data reporting tool scans semifield data in the Azure blob containers and in the long-term storage (LTS) locations longterm_images, longterm_images2, and GROW_DATA to collect details about the stored files.

The tool then runs a report over these file details to generate “batch stats”, and queries the SQLite database to generate “general distribution stats” on cutouts and developed images.

All of this information is posted to the #semifield-datareports Slack channel as it is generated.
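As an illustration of posting a report line to Slack, here is a minimal sketch that uses a Slack incoming webhook and only the Python standard library. Whether the tool uses a webhook or another Slack client is not stated on this page, so the webhook approach, the function names, and the placeholder URL are all assumptions.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack message."""
    # Incoming webhooks accept a JSON body with a "text" field.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical usage; the URL is a placeholder, not a real webhook:
# post_to_slack("https://hooks.slack.com/services/XXX/YYY/ZZZ", "Data report ready")
```

Separating request construction from sending keeps the payload easy to inspect without touching the network.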

Reports generated

Data report

  • storage used by each category of data (in TB)

    • uploaded images

    • cutouts

    • developed images

  • Graph depicting number of processed vs unprocessed batches grouped by location
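The storage figures in the data report amount to summing file sizes per category and converting the totals to TB. A minimal sketch of that calculation follows; the function name, the input shape, and the choice of decimal terabytes (10^12 bytes) are assumptions, not details taken from the tool.

```python
from collections import defaultdict

# Assumption: decimal terabytes. Use 1024 ** 4 instead for TiB.
BYTES_PER_TB = 1000 ** 4


def storage_by_category_tb(files):
    """Sum sizes per category and convert to TB.

    files: iterable of (category, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for category, size_bytes in files:
        totals[category] += size_bytes
    return {cat: round(total / BYTES_PER_TB, 3) for cat, total in totals.items()}


# Made-up sizes, for illustration only.
example = [
    ("uploaded images", 2 * 10**12),
    ("cutouts", 5 * 10**11),
    ("uploaded images", 10**12),
]
usage = storage_by_category_tb(example)
```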

Actionable items report

  • Number of batches not preprocessed yet

  • Number of batches not processed yet

  • Number of batches present in Azure but not in long-term storage, per container

    • semifield-uploads

    • semifield-developed-images

    • semifield-cutouts

  • Number of uploaded and developed images

  • batch_details.csv: Master table showing batch details across LTS and Azure (includes “deduplicated batch”)

  • semif_developed_duplicates_lts.csv: Details about batches present in multiple LTS locations
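Two of the actionable items above are set comparisons on batch names: batches in Azure that are missing from every LTS location, and batches that appear in more than one LTS location. The sketch below shows one way to compute both; the function names and the batch names are hypothetical, and the tool's actual logic may differ.

```python
def missing_from_lts(azure_batches, lts_batches_by_location):
    """Return batches present in Azure but absent from every LTS location."""
    in_lts = set().union(*lts_batches_by_location.values())
    return sorted(set(azure_batches) - in_lts)


def duplicated_in_lts(lts_batches_by_location):
    """Map each batch found in more than one LTS location to those locations."""
    locations = {}
    for location, batches in lts_batches_by_location.items():
        for batch in batches:
            locations.setdefault(batch, []).append(location)
    return {batch: locs for batch, locs in locations.items() if len(locs) > 1}


# Made-up batch names; the location names are the ones listed on this page.
lts = {
    "longterm_images": ["batch_a", "batch_b"],
    "longterm_images2": ["batch_b"],
    "GROW_DATA": [],
}
missing = missing_from_lts(["batch_a", "batch_c"], lts)
duplicates = duplicated_in_lts(lts)
```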

General distribution stats

  • general_dist_stats.txt

    • developed image stats (total processed images): total count, by common name, by category, by location, by year

    • cutouts stats: total count, by common name, by category, by location, by year, area bucket grouped by common name

    • <TODO>
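The per-group counts listed above map naturally onto SQL GROUP BY queries against the SQLite database. The following sketch runs against an in-memory database; the table name, column names, and sample rows are hypothetical, since the real schema is not described on this page.

```python
import sqlite3


def count_by(conn, table, column):
    """Return {value: row count} for one grouping column.

    table and column are interpolated into the query, so they must come
    from trusted code, never from user input.
    """
    query = f'SELECT "{column}", COUNT(*) FROM "{table}" GROUP BY "{column}"'
    return dict(conn.execute(query).fetchall())


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cutouts (common_name TEXT, location TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO cutouts VALUES (?, ?, ?)",
    [
        ("plant_a", "site_1", 2023),
        ("plant_a", "site_2", 2024),
        ("plant_b", "site_1", 2024),
    ],
)
by_name = count_by(conn, "cutouts", "common_name")
by_year = count_by(conn, "cutouts", "year")
total = conn.execute("SELECT COUNT(*) FROM cutouts").fetchone()[0]
```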

Configuration

Code usage

GitHub repo: https://github.com/precision-sustainable-ag/SemiF-DataReporting

The tool runs automatically as a cron job on the SUNNY server.

The cron job currently lives in the jbshah user’s crontab and runs at 9 am every Monday.

  • list_blob_contents.py

  • local_batch_table_generator.py

  • report.py
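In standard five-field cron syntax, 9 am every Monday is written as `0 9 * * 1`. The sketch below shows one way such a job could run the three scripts in sequence and stop at the first failure. The run order (taken from the listing above), the direct invocation of each file, and the absence of arguments are assumptions; the repository may well use a different entry point.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the scripts run in the order they are listed on this page.
SCRIPTS = [
    "list_blob_contents.py",
    "local_batch_table_generator.py",
    "report.py",
]


def build_commands(repo_dir):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, str(Path(repo_dir) / name)] for name in SCRIPTS]


def run_pipeline(repo_dir):
    """Run each script in turn; check=True raises on the first non-zero exit."""
    for command in build_commands(repo_dir):
        subprocess.run(command, check=True, cwd=repo_dir)


# Hypothetical usage; the path is a placeholder:
# run_pipeline("/path/to/SemiF-DataReporting")
```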

<TODO>
