1.Extract_HLS_L30 .R | 1.Extract_HLS_L30 .py |
---|
Code Block |
---|
| library(dplyr)
library(terra)
library(sf)
library(exactextractr)
library(glue) |
| Code Block |
---|
import pandas as pd
import rasterio
import geopandas as gpd
import os
import glob
import numpy as np
from rasterstats import zonal_stats |
|
Explained previously.Syntax is easy. Knowing which libraries to use – not so much! | Code Block |
---|
| setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) |
| Here’s a quick description of each Python library used for this program: pandas: Provides data structures such as DataFrames. Aliased as pd . rasterio: Provides functions for processing geospatial raster data. geopandas: Extends pandas to support spatial data. Aliased as gpd . os: Provides operating system features such as changing directories. glob: Provides functions for listing files that match specific patterns. numpy: Provides functions for multi-dimensional arrays and matrices. Aliased as np . rasterstats: Summarizes geospatial raster datasets based on vector geometries. We only need the zonal_stats function from this library.
|
Code Block |
---|
| process_shapefile <- function(sensor, NIR_band) { |
| Code Block |
---|
def process_shapefile(sensor, NIR_band): |
|
Defines a function that takes the sensor (L30 or S30) and NIR_band ( B05 or B8A) as parameters. |
Code Block |
---|
setwd(base_dir)
tiles <- list.files() |
| Code Block |
---|
os.chdir(base_dir)
tiles = os.listdir() |
|
Changes to the base directory and get a list of the tiles. | |
Code Block |
---|
extract = fields %>% #set up the output data frame extractions
st_drop_geometry() %>%
dplyr::select() |
| Code Block |
---|
extract = fields.iloc[:, 0:0] |
|
Creates a dataframe with the number of rows in fields but with no columns. Note the simple syntax in Python. The single colon : means “all rows.” 0:0 means “no columns.” |
Code Block |
---|
for(i in 1:length(tiles)) { |
| Code Block |
---|
for i, tile in enumerate(tiles): |
|
Creates a loop to iterate through the tiles. Note that the Python code could be written as for tile in tiles . However, the i variable gives us the opportunity of showing the status of the program, as described later. |
Code Block |
---|
tile_dir <- file.path(base_dir, tiles[i])
setwd(tile_dir) |
| Code Block |
---|
tile_dir = os.path.join(base_dir, tile)
os.chdir(tile_dir) |
|
Changes to the directory of the current tile. Note that the Python enumerate function allows us to grab both the tile and its index into the tiles list. Therefore, we don’t need to refer to tiles[i] , although that would work just as well. |
Code Block |
---|
scenes <- list.files(pattern = glob2rx(glue("HLS.{sensor}*"))) |
| Code Block |
---|
scenes = glob.glob(f"HLS.{sensor}*") |
|
Grabs the files that match the pattern HLS.{sensor} . Python’s f strings work similarly to R’s glue function. |
Code Block |
---|
for (j in 1:length(scenes)) { |
| Code Block |
---|
for j, scene in enumerate(scenes): |
|
Creates a loop to iterate through each tile's scenes. |
Code Block |
---|
print(glue("Processing scene {j} of {length(scenes)} in tile {i} of {length(tiles)}")) |
| Code Block |
---|
print(f"Processing scene {j + 1} of {len(scenes)} in tile {i + 1} of {len(tiles)}") |
|
Shows the current status. |
Code Block |
---|
scene_dir <- file.path(tile_dir, scenes[j])
setwd(scene_dir) |
| Code Block |
---|
scene_dir = os.path.join(tile_dir, scene)
os.chdir(scene_dir) |
|
Changes to the scene/date folder. |
Code Block |
---|
bands = list.files() #list rasters in the scene/date folder
list_NIR <- bands[substring(bands, 36, 38) == NIR_band]
list_B04 <- bands[substring(bands, 36, 38) == "B04"] |
| Code Block |
---|
list_NIR = glob.glob(f"*.{NIR_band}.tif")[0]
list_B04 = glob.glob("*.B04.tif")[0] |
|
Grabs the filenames of the NIR and B04 bands. Note how Python’s glob.glob function simplifies this. |
Code Block |
---|
rast.stk <- rast(c(list_B04, list_NIR)) |
| Code Block |
---|
rast_stk = [rasterio.open(list_B04), rasterio.open(list_NIR)] |
|
Loads the raster stack of the B04 and NIR bands. |
Code Block |
---|
names(rast.stk) <- c(substring(list_B04, 1, 38), substring(list_NIR, 1, 38)) |
| Code Block |
---|
rast_stk_named = {name: rast for name, rast in zip(band_names, rast_stk)} |
|
Sets the names of the raster stack based on file names. |
Code Block |
---|
list_cloud_mask <- which(substring(bands, 36, 38) == "Fma")
cloud_mask <- rast(bands[list_cloud_mask]) |
| Code Block |
---|
list_cloud_mask = glob.glob("*.Fmask.tif")[0]
with rasterio.open(list_cloud_mask) as cloud_mask_src:
cloud_mask = cloud_mask_src.read(1).astype(float) |
|
|
Code Block |
---|
cloud_mask0 <- cloud_mask
cloud_mask64 <- cloud_mask
cloud_mask128 <- cloud_mask |
| Code Block |
---|
cloud_mask0 = np.copy(cloud_mask)
cloud_mask64 = np.copy(cloud_mask)
cloud_mask128 = np.copy(cloud_mask) |
|
The R code creates references to cloud_mask . So cloud_mask0 , cloud_mask64 , and cloud_mask128 essentially become “aliases” of cloud_mask . This is an optimization technique, which prevents R from copying data that might not change. However, the data does change in the following code. When that happens, R will automatically change the references to copies of cloud_mask . The Python code makes this explicit by creating copies of cloud_mask to begin with. |
Code Block |
---|
cloud_mask0[cloud_mask0 != 0] <- NA
cloud_mask64[cloud_mask64 != 64] <- NA
cloud_mask128[cloud_mask128 != 128] <- NA |
| Code Block |
---|
cloud_mask0[cloud_mask0 != 0] = np.nan
cloud_mask64[cloud_mask64 != 64] = np.nan
cloud_mask128[cloud_mask128 != 128] = np.nan |
|
Sets values not matching the conditions to NaN. |
Code Block |
---|
cloud_mask_merge <- terra::merge(cloud_mask0, cloud_mask64, cloud_mask128, na.rm = TRUE) |
| Code Block |
---|
stacked_masks = np.stack([cloud_mask128, cloud_mask64, cloud_mask0])
cloud_mask_merge = np.nanmax(stacked_masks, axis=0) |
|
Merges the cloud mask layers for 0, 64, and 128. |
Code Block |
---|
stk.mask <- mask(rast.stk, mask = cloud_mask_merge) |
| Code Block |
---|
stk_mask = [np.where(~np.isnan(cloud_mask_merge), band, np.nan) for band in rast_stk] |
|
Applies the cloud mask to the raster stack |
Code Block |
---|
ext <- exact_extract(stk.mask,fields %>% st_transform(., crs = crs(stk.mask)),'median') |
| Code Block |
---|
fields_transformed = fields.to_crs(crs)
ext = zonal_stats(fields_transformed, stk_mask.read(1), stats=['median']) |
|
CONTINUE FROM HERE |
|
|
Code Block |
---|
| fields = st_read("MDAEnrollment2023-2024_Buff30m.shp") |
| Code Block |
---|
| fields = gpd.read_file("MDAEnrollment2023-2024_Buff30m.shp") |
|
Reads the buffered shapefile. |
Code Block |
---|
plot(fields[1]) |
| Code Block |
---|
fields.plot(edgecolor='black') |
|
Plots the shapefile. edgecolor isn’t required in the Python code. It simply makes the output consistent with the R output.
|