SemiField Data Reporting

The SemiField data reporting tool scans semifield data in the Azure blob containers and in the long-term storage (LTS) locations (longterm_images, longterm_images2, GROW_DATA) to collect details about the stored files.

The tool then generates “batch stats” from those file listings and queries the SQLite database to produce “general distribution stats” on cutouts and developed images.

All gathered information is posted to the #semifield-datareports Slack channel as it is generated.
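A minimal sketch of how a report message might be posted to Slack, assuming an incoming-webhook URL (the real tool may use a bot token or the Slack SDK instead):

```python
import json
import urllib.request

def build_slack_payload(text: str) -> bytes:
    """Serialize a report message into the JSON body Slack webhooks expect."""
    return json.dumps({"text": text}).encode("utf-8")

def post_to_slack(webhook_url: str, text: str) -> None:
    """POST a message to a Slack incoming webhook (URL is a placeholder)."""
    req = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(text),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on non-2xx responses
```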

 

Reports generated

Data report

  • Storage used by each category of data (in TB)

    • uploaded images

    • cutouts

    • developed images

  • Graph depicting number of processed vs unprocessed batches grouped by location
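The counts behind such a graph reduce to a simple grouping by location and processing state. A sketch follows; the assumption that a batch name encodes its location as a prefix (e.g. NC_2023-06-01) is illustrative only:

```python
from collections import Counter

def state_counts_by_location(batches):
    """batches: iterable of (batch_name, state) pairs.
    Returns a Counter keyed by (location, state), i.e. the numbers
    behind a processed-vs-unprocessed bar chart grouped by location."""
    return Counter((name.split("_")[0], state) for name, state in batches)

# Toy input; real batch names/states come from the generated CSVs
data = [("NC_2023-06-01", "processed"), ("NC_2023-06-02", "unprocessed"),
        ("MD_2023-06-02", "processed")]
counts = state_counts_by_location(data)
```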

Actionable items report

  • Number of batches not preprocessed yet

  • Number of batches not processed yet

  • Number of batches present in Azure but not in long-term storage

    • semifield-uploads

    • semifield-developed-images

    • semifield-cutouts

  • Number of uploaded and developed images

  • batch_details.csv: Master table showing batch details across LTS and Azure (includes “deduplicated batch”)

  • semif_developed_duplicates_lts.csv: Details about batches present in multiple LTS locations

General distribution stats

  • general_dist_stats.txt

    • developed image stats (total processed images): total count, by common name, by category, by location, by year

    • cutouts stats: total count, by common name, by category, by location, by year

    • primary cutouts stats: total count, by common name, by category, by location, by year

  • area_by_common_name.csv: Counts grouped by common name, primary vs non-primary cutouts, bucketed by bbox area.
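The per-column breakdowns (by common name, category, location, year) boil down to GROUP BY queries against the SQLite database. Below is a toy sketch using an in-memory database; the table and column names (cutouts, common_name) are assumptions, not the real agir.db schema:

```python
import sqlite3

def counts_by_column(conn, table, column):
    """Return {value: count} for one grouping column, the shape of the
    by-common-name / by-category / by-location / by-year stats."""
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column}"
    )
    return dict(rows.fetchall())

# Toy data standing in for agir.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cutouts (common_name TEXT, is_primary INTEGER)")
conn.executemany("INSERT INTO cutouts VALUES (?, ?)",
                 [("ragweed", 1), ("ragweed", 0), ("palmer amaranth", 1)])
result = counts_by_column(conn, "cutouts", "common_name")
```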

 

Configuration

 

Code usage

GitHub repo: https://github.com/precision-sustainable-ag/SemiF-DataReporting

The tool runs automatically via a cron job on the SUNNY server.

The current cron job is defined in the jbshah user’s crontab and runs at 9 a.m. every Monday.

 

  • list_blob_contents.py:

    • lists the contents of the Azure blob containers

    • categorizes batches as processed, unprocessed, preprocessed, or unpreprocessed

    • computes batch stats

    • writes the results to separate CSVs
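The categorization step above can be sketched as follows. In the real tool the paths would come from the Azure SDK (e.g. ContainerClient.list_blobs in azure-storage-blob); the “contains a .jpg means processed” heuristic here is a stand-in for the actual rules:

```python
from collections import defaultdict

def categorize_batches(blob_paths):
    """Group blob paths by their batch prefix and flag a batch as
    'processed' when it contains developed (.jpg) images.
    The .jpg marker is an assumed heuristic, not the tool's real rule."""
    batches = defaultdict(list)
    for path in blob_paths:
        batch, _, rest = path.partition("/")
        batches[batch].append(rest)
    return {batch: "processed" if any(f.endswith(".jpg") for f in files)
            else "unprocessed"
            for batch, files in batches.items()}

# Toy listing standing in for a container's blob paths
paths = ["NC_2023-06-01/images/a.raw", "NC_2023-06-01/developed/a.jpg",
         "MD_2023-06-02/images/b.raw"]
states = categorize_batches(paths)
```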

  • local_batch_table_generator.py:

    • walks the LTS locations

    • categorizes batches by type

    • generates batch stats

    • writes the results to separate CSVs
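A hedged sketch of one of the stats gathered while walking an LTS location: on-disk size per batch, as in the storage-used (TB) figures. The flat <root>/<batch>/... layout is an assumption about how longterm_images, longterm_images2 and GROW_DATA are organized:

```python
import tempfile
from pathlib import Path

def batch_sizes_tb(lts_root):
    """Return {batch_name: size_in_TB} for each top-level batch folder
    under an LTS root (layout assumed, not verified against the tool)."""
    sizes = {}
    for batch_dir in Path(lts_root).iterdir():
        if batch_dir.is_dir():
            total = sum(f.stat().st_size
                        for f in batch_dir.rglob("*") if f.is_file())
            sizes[batch_dir.name] = total / 1e12  # bytes -> TB
    return sizes

# Demo against a throwaway directory tree
root = tempfile.mkdtemp()
(Path(root) / "NC_2023-06-01").mkdir()
(Path(root) / "NC_2023-06-01" / "img.raw").write_bytes(b"x" * 1000)
sizes = batch_sizes_tb(root)
```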

  • report.py:

    • copies the CSVs generated by list_blob_contents.py and local_batch_table_generator.py into a report folder

    • combines the Azure and LTS CSVs for uploads, developed images, and cutouts

    • deduplicates batches that appear in multiple LTS locations

    • generates the summary and actionable-items report messages and sends them to Slack

    • queries the agir.db SQLite database (expected in the code folder) for general distribution stats on developed images, cutouts, and primary cutouts

    • generates the area_by_common_name CSV from the database results

    • sends the general distribution stats to Slack
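The deduplication step might look like the following sketch; the field names ("batch", "location") and the keep-first-but-record-all-locations policy are assumptions for illustration, not the real CSV schema:

```python
def dedupe_batches(rows):
    """Keep one entry per batch name; record every LTS location it was
    found in and mark duplicates, mirroring the 'deduplicated batch' idea."""
    seen = {}
    for row in rows:
        name = row["batch"]
        if name in seen:
            seen[name]["locations"].append(row["location"])
            seen[name]["deduplicated"] = True
        else:
            seen[name] = {"batch": name,
                          "locations": [row["location"]],
                          "deduplicated": False}
    return list(seen.values())

# Toy rows standing in for the combined LTS batch tables
rows = [{"batch": "NC_2023-06-01", "location": "longterm_images"},
        {"batch": "NC_2023-06-01", "location": "longterm_images2"},
        {"batch": "MD_2023-06-02", "location": "GROW_DATA"}]
deduped = dedupe_batches(rows)
```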

  • cronjob.sh:

    • shell script triggered from the crontab; currently points at the jbshah user’s codebase

    • copies the database into the codebase to reduce network latency

    • triggers the pipeline

    • crontab: 0 9 * * 1 /home/jbshah/SemiF-DataReporting/cronjob.sh

Related content

  • Azure weedsimagerepo Storage

  • SemiField Preprocessing

  • Overview

  • Field data shepherding

  • Data Examples

  • Semi-Field Data Overview