Drone pilot project software requirements
Project Description
Design and build a data flow and data analytics pipeline that meets the scientific objectives of the Plant Science Initiative Breeders project. The pipeline should ultimately serve as a template for other interdisciplinary science initiative projects. (Feel free to alter)
Data Source
Please indicate your data source and format.
Batch data: Data collected on a recurring basis
Only batch data during the drone piloting phase
Scheduled data collection on a weekly basis
Mostly images, 2.35 to 30.4 MB in size
Number of snapshots depends on the altitude of the flight
Estimated coverage of 100 acres a day (roughly 7000 images)
About 163 GB per flight
Stream data: Data continuously generated
None during the drone phase. The pipeline can be extended in future years to IoT devices, rovers, or human-collected data.
Data Ingestion
Here we describe the process(es) and the tools available to move the data from the point of creation to the point of storage.
Tools:
UAV USB (holds data from the drone)
Globus Application to transfer data
Processes:
Catalog data offloaded from the UAV USB (in consultation with PIs)
Need to define how offloaded data is labeled
Metadata associated with the entire flight
<flight#>: f1, f2, f3, etc.
<date>: mmddyyyy
<platform>: p4p, mat3, m210, I2, M210, M300, etc.
<sensor>: rgb, re, re-p, reduel, altum, altump, etc.
<gcp_available>
Metadata associated with each image
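Because the labeling scheme is still to be defined, the following is only a minimal sketch of how the flight-level fields listed above could be validated and joined into a label. The class name `FlightMetadata`, the underscore-separated label format, and the lowercasing of platform and sensor codes are assumptions made for illustration, not agreed conventions.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlightMetadata:
    """Flight-level metadata fields named in this section (hypothetical container)."""
    flight: str           # <flight#>, e.g. "f1"
    date: str             # <date>, mmddyyyy
    platform: str         # <platform>, e.g. "p4p"
    sensor: str           # <sensor>, e.g. "rgb"
    gcp_available: bool   # <gcp_available>

    def __post_init__(self):
        if not re.fullmatch(r"f\d+", self.flight):
            raise ValueError(f"flight must look like f1, f2, ...: {self.flight!r}")
        if not re.fullmatch(r"\d{8}", self.date):
            raise ValueError(f"date must be 8 digits (mmddyyyy): {self.date!r}")
        # strptime raises ValueError for impossible dates such as month 13.
        datetime.strptime(self.date, "%m%d%Y")
        for name in ("platform", "sensor"):
            value = getattr(self, name)
            # Underscores are reserved as the label separator (assumption).
            if not re.fullmatch(r"[A-Za-z0-9-]+", value):
                raise ValueError(f"{name} may only contain letters, digits, '-': {value!r}")

    def label(self) -> str:
        """Assumed label format: <flight#>_<date>_<platform>_<sensor>, lowercased codes."""
        return "_".join(
            [self.flight, self.date, self.platform.lower(), self.sensor.lower()]
        )


meta = FlightMetadata("f1", "06152023", "M300", "altum", gcp_available=True)
print(meta.label())
```

Lowercasing in `label()` is one way to keep codes such as m210 and M210 from producing two different labels for the same platform; whether that is desirable is a decision for the PIs.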
Policies on source (original) data
No analysis or data processing in source folder
Create a workspace, move the needed data into it, process, and clean up when done.
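The source-data policy above can be sketched as a small context manager: needed files are copied into a temporary workspace, processing happens there, and the workspace is removed afterwards. The function name `workspace` and its parameters are hypothetical, and the sketch copies rather than moves files so that the source folder is never modified.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def workspace(source_dir, filenames, scratch_root=None):
    """Copy the named files from source_dir into a temporary work folder.

    The work folder is deleted on exit, even if processing raises an error.
    Nothing is ever written to source_dir.
    """
    work = Path(tempfile.mkdtemp(prefix="uav_work_", dir=scratch_root))
    try:
        for name in filenames:
            shutil.copy2(Path(source_dir) / name, work / name)
        yield work
    finally:
        shutil.rmtree(work, ignore_errors=True)


# Usage with a throwaway folder standing in for the source folder.
source = Path(tempfile.mkdtemp(prefix="source_"))
(source / "img_0001.jpg").write_bytes(b"raw bytes")

with workspace(source, ["img_0001.jpg"]) as work:
    # Any processing happens on the copy only.
    (work / "img_0001.jpg").write_bytes(b"processed bytes")

print((source / "img_0001.jpg").read_bytes())  # source copy is unchanged
```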
Transfer data or not:
If everything is processed onsite, we will end up with about 18 servers at 18 locations to manage, multiplying the administrative effort.
If we make 100 flights per year, we will generate on the order of 20 TB of data per year (100 × 163 GB ≈ 16.3 TB). We can easily transfer 163 GB of data after each flight to research storage on campus.
Data collection is independent of the analytics. Data can be collected without knowing what it would be used for.
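The volume figures quoted in this document (about 7000 images and 163 GB per flight, 100 flights per year) can be checked with a few lines of arithmetic. The sketch below uses decimal units (1 GB = 1000 MB), and the 1000 Mbit/s link speed is purely an assumed value for illustration, not a measured station bandwidth.

```python
GB_PER_FLIGHT = 163       # from this document
IMAGES_PER_FLIGHT = 7000  # from this document
FLIGHTS_PER_YEAR = 100    # from this document


def mean_image_mb(gb_per_flight=GB_PER_FLIGHT, images=IMAGES_PER_FLIGHT):
    """Average image size in MB implied by the per-flight totals."""
    return gb_per_flight * 1000 / images


def yearly_tb(gb_per_flight=GB_PER_FLIGHT, flights=FLIGHTS_PER_YEAR):
    """Raw data generated per year, in TB."""
    return gb_per_flight * flights / 1000


def transfer_hours(gb, megabits_per_second):
    """Ideal transfer time in hours, ignoring protocol overhead."""
    seconds = gb * 8000 / megabits_per_second  # 1 GB = 8000 megabits
    return seconds / 3600


print(f"mean image size: {mean_image_mb():.1f} MB")
print(f"raw data per year: {yearly_tb():.1f} TB")
print(f"one flight at an assumed 1000 Mbit/s: {transfer_hours(GB_PER_FLIGHT, 1000):.2f} h")
```

The implied mean image size of about 23 MB sits inside the 2.35 to 30.4 MB range given under Data Source, and the yearly raw volume of about 16.3 TB is consistent with planning for roughly 20 TB.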
Data Processing
ETL (Extract, Transform, and Load) is the main process on the UAV USB
ELT (Extract, Load, and Transform) is the main process carried out on stored data
Onsite Processing: We have agreed that a desktop at the station will be used to offload and minimally preprocess data.
Metadata shall be stored in a database and associated with raw image files
Filenames shall be rewritten to prevent duplicate filenames
Filenames should serve as a secondary check that metadata was associated with the correct images
Identified applications are Metashape (commercial) and OpenDroneMap (open source). Hardware requirements for Metashape have a good discussion here
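The filename requirements above (rewrite names to prevent duplicates, and let the name double as a check on the metadata) could look roughly like the following. The name pattern `<flight#>_<date>_<platform>_<sensor>_<sequence>` and the function names are assumptions for illustration; the sketch also assumes no field contains an underscore.

```python
from pathlib import Path

FIELDS = ("flight", "date", "platform", "sensor")


def unique_name(original, flight_label, sequence):
    """Build a new filename from the flight label and a per-flight sequence number.

    The camera-assigned stem (for example IMG_0001) is dropped because such
    names can repeat across flights; only the extension is kept.
    """
    suffix = Path(original).suffix.lower()
    return f"{flight_label}_{sequence:05d}{suffix}"


def parse_name(filename):
    """Recover the metadata fields encoded in a rewritten filename."""
    parts = Path(filename).stem.split("_")
    if len(parts) != len(FIELDS) + 1:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    parsed = dict(zip(FIELDS, parts))
    parsed["sequence"] = int(parts[-1])
    return parsed


def matches_metadata(filename, record):
    """Secondary check: do the fields in the filename agree with the database record?"""
    parsed = parse_name(filename)
    return all(parsed[field] == record[field] for field in FIELDS)


name = unique_name("IMG_0001.JPG", "f1_06152023_p4p_rgb", 1)
record = {"flight": "f1", "date": "06152023", "platform": "p4p", "sensor": "rgb"}
print(name, matches_metadata(name, record))
```

A mismatch between `parse_name` and the database record would flag an image whose metadata was attached to the wrong file.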
Station Desktop:
Disk type and size: 20 TB of attached storage and a 2 TB OS disk
Processor type and size: 8 cores Intel(R) Xeon(R) W-2275 CPU @ 3.30GHz & NVIDIA RTX A6000
Operating System: Ubuntu 20.04.6 LTS
Data Repository
Unstructured Data:
All raw data from the sensors SHALL be copied from the research stations to RC storage on campus into a read-only folder titled ‘RAW_DATA’. These files will keep the filenames assigned by the renaming application on the research station workstation, which builds each name from a unique combination of metadata.
Most data products will be automatically created for each date. Products SHALL be stored in a folder titled ‘DATA_PRODUCTS’:
RGB orthomosaic
Point clouds and DSMs created from SfM algorithms applied to RGB data
Point clouds and DSMs created from the LiDAR sensor
Thermal infrared overlay when utilized by flight crew
Multispectral overlay when utilized by flight crew
Other future, unanticipated sensors
Structured Data:
Metadata associated with flights and images (see data ingestion section) SHALL be recorded in a database with pointers to the files stored in ‘RAW_DATA’ and ‘DATA_PRODUCTS’.
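As a minimal sketch of the structured-data requirement above, the following uses SQLite to hold flight metadata with pointers to files in ‘RAW_DATA’ and ‘DATA_PRODUCTS’. The table names, column names, and example paths are assumptions for illustration; the production database engine and schema are not decided in this document.

```python
import sqlite3

SCHEMA = """
CREATE TABLE flight (
    flight_id     INTEGER PRIMARY KEY,
    flight_label  TEXT NOT NULL,      -- <flight#>, e.g. f1
    flight_date   TEXT NOT NULL,      -- <date>, mmddyyyy
    platform      TEXT NOT NULL,
    sensor        TEXT NOT NULL,
    gcp_available INTEGER NOT NULL,   -- 0 or 1
    UNIQUE (flight_label, flight_date, platform, sensor)
);
CREATE TABLE raw_image (
    image_id  INTEGER PRIMARY KEY,
    flight_id INTEGER NOT NULL REFERENCES flight(flight_id),
    path      TEXT NOT NULL UNIQUE    -- pointer into RAW_DATA
);
CREATE TABLE data_product (
    product_id   INTEGER PRIMARY KEY,
    flight_id    INTEGER NOT NULL REFERENCES flight(flight_id),
    product_type TEXT NOT NULL,       -- e.g. rgb_orthomosaic, dsm
    path         TEXT NOT NULL UNIQUE -- pointer into DATA_PRODUCTS
);
"""


def create_db(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def files_for_flight(conn, flight_id):
    """Return every stored file path (raw and derived) for one flight."""
    raw = conn.execute(
        "SELECT path FROM raw_image WHERE flight_id = ?", (flight_id,)
    ).fetchall()
    products = conn.execute(
        "SELECT path FROM data_product WHERE flight_id = ?", (flight_id,)
    ).fetchall()
    return sorted(row[0] for row in raw + products)


conn = create_db()
cur = conn.execute(
    "INSERT INTO flight (flight_label, flight_date, platform, sensor, gcp_available) "
    "VALUES (?, ?, ?, ?, ?)",
    ("f1", "06152023", "p4p", "rgb", 1),
)
flight_id = cur.lastrowid
conn.execute(
    "INSERT INTO raw_image (flight_id, path) VALUES (?, ?)",
    (flight_id, "RAW_DATA/f1_06152023_p4p_rgb_00001.jpg"),
)
conn.execute(
    "INSERT INTO data_product (flight_id, product_type, path) VALUES (?, ?, ?)",
    (flight_id, "rgb_orthomosaic", "DATA_PRODUCTS/f1_06152023_p4p_rgb_ortho.tif"),
)
print(files_for_flight(conn, flight_id))
```

The `UNIQUE` constraints on `path` mean the database itself rejects a second record pointing at the same file.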
PLOT LEVEL DATA
An application will be created, with as much synergy with ImageBreed as possible, to allow researchers to associate their plot maps with UAV collected data. Key features SHALL include:
A method for uploading a complete plot map with GPS coordinates of all plots already embedded
A method for deriving such a map through a GUI in which users define plots with a mouse cursor.
The ability to query the metadata database for any data associated with the coordinates in the plot maps defined in the preceding bullets.
The ability to download any ‘RAW_DATA’ or ‘DATA_PRODUCTS’ from RC storage that were found in the preceding query.
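The plot-based query described in the features above can be illustrated with a simple bounding-box test. This sketch assumes each catalogued file carries a geographic footprint stored as (min_lon, min_lat, max_lon, max_lat) and that a plot is a list of (lon, lat) vertices; neither representation is specified in this document, and a production system would more likely use a spatial database index.

```python
def bounding_box(polygon):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of a list of (lon, lat) points."""
    lons = [point[0] for point in polygon]
    lats = [point[1] for point in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


def boxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def files_for_plot(plot_polygon, catalog):
    """Return paths of catalog entries whose footprint overlaps the plot's bounding box."""
    plot_box = bounding_box(plot_polygon)
    return [
        entry["path"]
        for entry in catalog
        if boxes_overlap(plot_box, entry["footprint"])
    ]


# Hypothetical plot and catalog entries, coordinates in decimal degrees.
plot = [(-84.4801, 42.7101), (-84.4799, 42.7101), (-84.4799, 42.7103), (-84.4801, 42.7103)]
catalog = [
    {"path": "RAW_DATA/img_a.jpg", "footprint": (-84.4805, 42.7100, -84.4800, 42.7102)},
    {"path": "RAW_DATA/img_b.jpg", "footprint": (-84.4700, 42.7200, -84.4690, 42.7210)},
]
print(files_for_plot(plot, catalog))
```

A bounding-box test can return files that only touch the corner of an irregular plot, so it is best read as a coarse first filter before any exact polygon intersection.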
An API will be developed to support the features in this application. That API SHALL comply with the BrAPI-Core and BrAPI-Phenotyping specifications. Any additional features of our API should be submitted as pull requests to the ImageBreed repository.
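BrAPI v2 list responses are wrapped in a common envelope with a `metadata` object (pagination, status, datafiles) and a `result.data` list. The sketch below builds that shape for a hypothetical list of image records. The helper name `brapi_response` and the record fields are placeholders, and the envelope field names should be verified against the published BrAPI specification before implementation.

```python
import math


def brapi_response(items, page=0, page_size=1000):
    """Wrap one page of items in a BrAPI-style response envelope (sketch only)."""
    total = len(items)
    total_pages = math.ceil(total / page_size) if total else 0
    start = page * page_size
    return {
        "metadata": {
            "datafiles": [],
            "status": [],
            "pagination": {
                "currentPage": page,      # zero-based page index
                "pageSize": page_size,
                "totalCount": total,
                "totalPages": total_pages,
            },
        },
        "result": {"data": items[start:start + page_size]},
    }


# Placeholder records; real field names would come from the BrAPI specification.
records = [{"path": f"RAW_DATA/img_{i:05d}.jpg"} for i in range(5)]
response = brapi_response(records, page=1, page_size=2)
print(response["metadata"]["pagination"])
print(response["result"]["data"])
```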