Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Example conversions

The following examples demonstrate how to convert the R code at https://github.com/jjennewein/MDA_2023-24 into equivalent Python code.

...

0.field buffer.R

...

0.field buffer.py

...

Code Block
languager
library(sf)

...

Code Block
import geopandas as gpd

...

Loads a library for handling geospatial data, including working with shapefiles.

In the Python code, gpd is an alias for the geopandas library Using the alias simply reduces typing and improves code readability.

...

Code Block
languager
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

...

RStudio defaults to a global working directory, which means file paths must be specified explicitly.

Python defaults to the folder where the source code is located.

Jupyter Notebook defaults to where you launch the notebook server.

  • You could launch it from the command line within the source folder:

    Code Block
    cd \psa\MDA_2023-24\python
    jupyter notebook
  • Or you could specify the folder directly when launching Jupyter:

    Code Block
    jupyter notebook \psa\MDA_2023-24\python

The original file path was "E:/Eastern_Shore/MD_enrollment_data_2023_24/fields/", but I don't have access to that location.

To avoid referencing "E:/Eastern_Shore..." explicitly, I saved all the necessary project files to the same folder as the source code, and I added a setwd() statement to the R source. I then removed all references to "E:/Eastern_Shore..."

This way, the R program can run without requiring the full path in filenames.

...

Code Block
languager
mda = st_read("MDAEnrollment2023-2024.shp") %>%
  st_transform(., crs = 32618) %>%
  st_buffer(., dist = -30) %>%
  dplyr::filter(!st_is_empty(.))

...

Code Block
languagepy
mda = gpd.read_file("MDAEnrollment2023-2024.shp") \
      .to_crs(epsg=32618)
mda["geometry"] = mda.geometry.buffer(-30)
mda = mda[~mda.geometry.is_empty]

...

This code:

  • Reads a shapefile into a spatial data frame (R) or a GeoDataFrame (Python).

  • Converts the coordinate reference system (CRS) to EPSG 32618 (WGS 84).

  • Applies a negative buffer of 30 meters to shrink the geometries.
    Note that sf objects in R have a designated geometry column, but in GeoPandas you need to explicitly reference the geometry column.

  • Filters out any empty geometries resulting from the buffering.
    In Python, ~ is a bitwise NOT operator.

R uses %>% for chaining operations, while Python uses . (dot notation).

In Python, a backslash (\) means line continuation. I used it here to align the code with the R version. But I could have also written it as:
mda = gpd.read_file("MDAEnrollment2023-2024.shp").to_crs(epsg=32618)

Also, note that the dots in st_transform(., crs = 32618) and st_buffer(., dist = -30) are optional but may improve readability.

...

Code Block
st_write(mda, "MDAEnrollment2023-2024_Buff30m.shp", append=FALSE)

...

Code Block
mda.to_file("MDAEnrollment2023-2024_Buff30m.shp")

...

Writes the buffered data frame to a new shapefile.

I added append=FALSE to the original source so the file would be overwritten if it exists. GeoPandas automatically overwrites existing files.

...

1.Extract_HLS_L30 .R

...

1.Extract_HLS_L30 .py

...

Code Block
languager
library(dplyr)
library(terra)
library(sf)
library(exactextractr)
library(glue)

...

Code Block
import pandas as pd
import rasterio
import geopandas as gpd
import os
import glob
import numpy as np
from rasterstats import zonal_stats

...

Syntax is easy. Knowing which libraries to use – not so much!

Here’s a quick description of each Python library used for this program:

  • pandas: Provides data structures such as DataFrames. Aliased as pd .

  • rasterio: Provides functions for processing geospatial raster data.

  • geopandas: Extends pandas to support spatial data. Aliased as gpd.

  • os: Provides operating system features such as changing directories.

  • glob: Provides functions for listing files that match specific patterns.

  • numpy: Provides functions for multi-dimensional arrays and matrices. Aliased as np.

  • rasterstats: Summarizes geospatial raster datasets based on vector geometries. We only need the zonal_stats function from this library.

...

Code Block
languager
process_shapefile <- function(sensor, NIR_band) {

...

Code Block
def process_shapefile(sensor, NIR_band):

...

Defines a function that takes the sensor (L30 or S30) and NIR_band ( B05 or B8A) as parameters.

...

Code Block
  setwd(base_dir)  
  tiles <- list.files()

...

Code Block
    os.chdir(base_dir)
    tiles = os.listdir()

...

Changes to the base directory and get a list of the tiles.

...

Code Block
  extract = fields %>% #set up the output data frame extractions
    st_drop_geometry() %>%
    dplyr::select()

...

Code Block
    extract = fields.iloc[:, 0:0]

...

Creates a dataframe with the number of rows in fields but with no columns.

Note the simple syntax in Python. The single colon : means “all rows.” 0:0 means “no columns.”

...

Code Block
  for(i in 1:length(tiles)) {

...

Code Block
    for i, tile in enumerate(tiles):

...

Creates a loop to iterate through the tiles.

Note that the Python code could be written as for tile in tiles. However, the i variable gives us the opportunity of showing the status of the program, as described later.

...

Code Block
    tile_dir <- file.path(base_dir, tiles[i])
    setwd(tile_dir)

...

Code Block
        tile_dir = os.path.join(base_dir, tile)
        os.chdir(tile_dir)

...

Changes to the directory of the current tile.
Note that the Python enumerate function allows us to grab both the tile and its index into the tiles list.
Therefore, we don’t need to refer to tiles[i], although that would work just as well.

...

Code Block
    scenes <- list.files(pattern = glob2rx(glue("HLS.{sensor}*")))

...

Code Block
        scenes = glob.glob(f"HLS.{sensor}*")

...

Grabs the files that match the pattern HLS.{sensor}.

Python’s f strings work similarly to R’s glue function.

...

Code Block
    for (j in 1:length(scenes)) {

...

Code Block
        for j, scene in enumerate(scenes):

...

Creates a loop to iterate through each tile's scenes.

...

Code Block
      print(glue("Processing scene {j} of {length(scenes)} in tile {i} of {length(tiles)}"))

...

Code Block
            print(f"Processing scene {j + 1} of {len(scenes)} in tile {i + 1} of {len(tiles)}")

...

Shows the current status.

...

Code Block
      scene_dir <- file.path(tile_dir, scenes[j])
      setwd(scene_dir)

...

Code Block
            scene_dir = os.path.join(tile_dir, scene)
            os.chdir(scene_dir)

...

Changes to the scene/date folder.

...

Code Block
      bands = list.files() #list rasters in the scene/date folder
      list_NIR <- bands[substring(bands, 36, 38) == NIR_band]
      list_B04 <- bands[substring(bands, 36, 38) == "B04"]

...

Code Block
            list_NIR = glob.glob(f"*.{NIR_band}.tif")[0]
            list_B04 = glob.glob("*.B04.tif")[0]

...

Grabs the filenames of the NIR and B04 bands.

Note how Python’s glob.glob function simplifies this.

...

Code Block
      rast.stk <- rast(c(list_B04, list_NIR))

...

Code Block
            rast_stk = [rasterio.open(list_B04), rasterio.open(list_NIR)]

...

Loads the raster stack of the B04 and NIR bands.

...

Code Block
      names(rast.stk) <- c(substring(list_B04, 1, 38), substring(list_NIR, 1, 38))

...

Code Block
            rast_stk_named = {name: rast for name, rast in zip(band_names, rast_stk)}

...

Sets the names of the raster stack based on file names.

...

Code Block
      list_cloud_mask <- which(substring(bands, 36, 38) == "Fma")
      cloud_mask <- rast(bands[list_cloud_mask])

...

Code Block
            list_cloud_mask = glob.glob("*.Fmask.tif")[0]
            with rasterio.open(list_cloud_mask) as cloud_mask_src:
                cloud_mask = cloud_mask_src.read(1).astype(float)

...

Code Block
      cloud_mask0 <- cloud_mask
      cloud_mask64 <- cloud_mask
      cloud_mask128 <- cloud_mask

...

Code Block
            cloud_mask0 = np.copy(cloud_mask)
            cloud_mask64 = np.copy(cloud_mask)
            cloud_mask128 = np.copy(cloud_mask)

...

The R code creates references to cloud_mask. So cloud_mask0, cloud_mask64, and cloud_mask128 essentially become “aliases” of cloud_mask.

This is an optimization technique, which prevents R from copying data that might not change.

However, the data does change in the following code. When that happens, R will automatically change the references to copies of cloud_mask.

The Python code makes this explicit by creating copies of cloud_mask to begin with.

...

Code Block
      cloud_mask0[cloud_mask0 != 0] <- NA
      cloud_mask64[cloud_mask64 != 64] <- NA
      cloud_mask128[cloud_mask128 != 128] <- NA

...

Code Block
            cloud_mask0[cloud_mask0 != 0] = np.nan
            cloud_mask64[cloud_mask64 != 64] = np.nan
            cloud_mask128[cloud_mask128 != 128] = np.nan

...

Sets values not matching the conditions to NaN.

...

Code Block
      cloud_mask_merge <- terra::merge(cloud_mask0, cloud_mask64, cloud_mask128, na.rm = TRUE)

...

Code Block
            stacked_masks = np.stack([cloud_mask128, cloud_mask64, cloud_mask0])
            cloud_mask_merge = np.nanmax(stacked_masks, axis=0)

...

Merges the cloud mask layers for 0, 64, and 128.

...

Code Block
      stk.mask <- mask(rast.stk, mask = cloud_mask_merge)

...

Code Block
            stk_mask = [np.where(~np.isnan(cloud_mask_merge), band, np.nan) for band in rast_stk]

...

Applies the cloud mask to the raster stack

...

Code Block
      ext <- exact_extract(stk.mask,fields %>% st_transform(., crs = crs(stk.mask)),'median')

...

Code Block
            fields_transformed = fields.to_crs(crs)
            ext = zonal_stats(fields_transformed, stk_mask.read(1), stats=['median'])

...

CONTINUE FROM HERE

...

Code Block
languager
fields = st_read("MDAEnrollment2023-2024_Buff30m.shp")

...

Code Block
languagepy
fields = gpd.read_file("MDAEnrollment2023-2024_Buff30m.shp")

...

Reads the buffered shapefile.

...

Code Block
plot(fields[1])

...

Code Block
fields.plot(edgecolor='black')

Plots the shapefile.

...

This section provides guidance for converting R programs to Python.

It assumes the user is already familiar with R syntax.