Installing Python and Jupyter Notebook
You can download and install Python from https://www.python.org/downloads/
Check Add python.exe to PATH, then Install Now.
To confirm installation, from a command prompt enter:
py --version python --version
Both commands should show the version that you downloaded.
Jupyter Notebook is a popular tool for running Python scripts, offering the added benefit of displaying plots directly within the interface. You can install it by running:
pip install notebook
You can then run:
jupyter lab /path/to/folder
Basic conversions
R | Python | |
---|---|---|
Arithmetic operators | x + y x - y x * y x / y x ^ y | x + y x - y x * y x / y x ** y |
The only difference is that R uses | ||
Comparison operators | # R and Python x == y x != y x > y x < y x >= y x <= y | |
No differences here. | ||
Logical operators | x && y x || y !x v1 & v2 v1 | v2 | import numpy as np x and y x or y not x np.logical_and(v1, v2) np.logical_or(v1, v2) |
Python doesn’t have built-in logical operations for vectors. numpy is a popular library for this purpose, but it’s not as clean as R’s syntax. | ||
Assignment operator |
|
|
R does accept the For example, my_function <- function(y = 3, x = 5) { print(paste(y, x)) } my_function(x = 10) # Passes 10 to the x parameter, and outputs "3 10" my_function(x <- 20) # Assigns 20 to the global x variable, which it passes to the y parameter. Outputs "20 5" To avoid confusion, always use | ||
Ranges | 1:10 seq(1, 20, 2) print(1:10) | range(1, 11) range(1, 21, 2) print(list(range(1, 11))) |
Ouch. Python’s code is not only longer, but it excludes the final number.
| ||
Conditional Statements | if (x > 5) { print("x is greater than 5") } else if (x >= 2) { print("x is between 2 and 5") } else { print("x is less than 2") } | if x > 5: print("x is greater than 5") elif x >= 2: print("x is between 2 and 5") else: print("x is less than 2") |
R requires the condition to be in parentheses. That would also work in Python, but it’s not required. R begins a block statement with an opening curly brace ( Python is the oddball here, using a colon ( This can take some getting used to, because very few computer languages have similar syntax. Note that the recommended indentation for Python is 4 spaces, but consistency is all that matters. | ||
For Loops | for (i in 1:5) { print(i) } | for i in range(1, 6): print(i) |
Note the similarity to the | ||
Functions | my_function <- function(x, y) { return(x + y) } | def my_function(x, y): return x + y |
In Python, the | ||
Indexing | my_list[1] | my_list[0] |
R uses 1-based indexing. Python uses 0-based indexing. R is actually the oddball here. Most modern languages use 0-based indexing. | ||
Slicing | my_list <- c(1, 2, 3, 4, 5) print(my_list[2:4]) # Outputs 2 3 4 | my_list = [1, 2, 3, 4, 5] print(my_list[1:4]) # Outputs [2, 3, 4] |
Because Python uses 0-based indexing, the second element is at position 1 instead of 2. Python supports similar | ||
Comments | # This is a comment | |
No difference. | ||
Libraries | library(dplyr) library(terra) library(sf) library(exactextractr) | import pandas as pd import rasterio import geopandas as gpd from rasterstats import zonal_stats |
For data manipulation and geospatial analysis, R and Python rely on external libraries. You can alias libraries in Python using You can selectively import functions from a Python library using |
Example conversions
The following examples demonstrate how to convert the R code at https://github.com/jjennewein/MDA_2023-24 into equivalent Python code.
0.field buffer.R | 0.field buffer.py |
---|---|
library(sf) | import geopandas as gpd |
Loads a library for handling geospatial data, including working with shapefiles. In the Python code, gpd is an alias for the geopandas library Using the alias simply reduces typing and improves code readability. | |
setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) | |
RStudio defaults to a global working directory, which means file paths must be specified explicitly. Python defaults to the folder where the source code is located. Jupyter Notebook defaults to where you launch the notebook server.
The original file path was To avoid referencing This way, the R program can run without requiring the full path in filenames. | |
mda = st_read("MDAEnrollment2023-2024.shp") %>% st_transform(., crs = 32618) %>% st_buffer(., dist = -30) %>% dplyr::filter(!st_is_empty(.)) | mda = gpd.read_file("MDAEnrollment2023-2024.shp") \ .to_crs(epsg=32618) mda["geometry"] = mda.geometry.buffer(-30) mda = mda[~mda.geometry.is_empty] |
This code:
R uses In Python, a backslash ( Also, note that the dots in | |
st_write(mda, "MDAEnrollment2023-2024_Buff30m.shp", append=FALSE) | mda.to_file("MDAEnrollment2023-2024_Buff30m.shp") |
Writes the buffered data frame to a new shapefile. I added |
1.Extract_HLS_L30 .R | 1.Extract_HLS_L30 .py |
---|---|
library(dplyr) library(terra) library(sf) library(exactextractr) library(glue) | import pandas as pd import rasterio import geopandas as gpd import os import glob import numpy as np from rasterstats import zonal_stats |
Syntax is easy. Knowing which libraries to use – not so much! Here’s a quick description of each Python library used for this program:
| |
process_shapefile <- function(sensor, NIR_band) { | def process_shapefile(sensor, NIR_band): |
Defines a function that takes the | |
setwd(base_dir) tiles <- list.files() | os.chdir(base_dir) tiles = os.listdir() |
Changes to the base directory and get a list of the tiles. | |
extract = fields %>% #set up the output data frame extractions st_drop_geometry() %>% dplyr::select() | extract = fields.iloc[:, 0:0] |
Creates a dataframe with the number of rows in fields but with no columns. Note the simple syntax in Python. The single colon | |
for(i in 1:length(tiles)) { | for i, tile in enumerate(tiles): |
Creates a loop to iterate through the tiles. Note that the Python code could be written as | |
tile_dir <- file.path(base_dir, tiles[i]) setwd(tile_dir) | tile_dir = os.path.join(base_dir, tile) os.chdir(tile_dir) |
Changes to the directory of the current tile. | |
scenes <- list.files(pattern = glob2rx(glue("HLS.{sensor}*"))) | scenes = glob.glob(f"HLS.{sensor}*") |
Grabs the files that match the pattern Python’s f strings work similarly to R’s glue function. | |
for (j in 1:length(scenes)) { | for j, scene in enumerate(scenes): |
Creates a loop to iterate through each tile's scenes. | |
print(glue("Processing scene {j} of {length(scenes)} in tile {i} of {length(tiles)}")) | print(f"Processing scene {j + 1} of {len(scenes)} in tile {i + 1} of {len(tiles)}") |
Shows the current status. | |
scene_dir <- file.path(tile_dir, scenes[j]) setwd(scene_dir) | scene_dir = os.path.join(tile_dir, scene) os.chdir(scene_dir) |
Changes to the scene/date folder. | |
bands = list.files() #list rasters in the scene/date folder list_NIR <- bands[substring(bands, 36, 38) == NIR_band] list_B04 <- bands[substring(bands, 36, 38) == "B04"] | list_NIR = glob.glob(f"*.{NIR_band}.tif")[0] list_B04 = glob.glob("*.B04.tif")[0] |
Grabs the filenames of the NIR and B04 bands. Note how Python’s | |
rast.stk <- rast(c(list_B04, list_NIR)) | rast_stk = [rasterio.open(list_B04), rasterio.open(list_NIR)] |
Loads the raster stack of the B04 and NIR bands. | |
names(rast.stk) <- c(substring(list_B04, 1, 38), substring(list_NIR, 1, 38)) | rast_stk_named = {name: rast for name, rast in zip(band_names, rast_stk)} |
Sets the names of the raster stack based on file names. | |
list_cloud_mask <- which(substring(bands, 36, 38) == "Fma") cloud_mask <- rast(bands[list_cloud_mask]) | list_cloud_mask = glob.glob("*.Fmask.tif")[0] with rasterio.open(list_cloud_mask) as cloud_mask_src: cloud_mask = cloud_mask_src.read(1).astype(float) |
cloud_mask0 <- cloud_mask cloud_mask64 <- cloud_mask cloud_mask128 <- cloud_mask | cloud_mask0 = np.copy(cloud_mask) cloud_mask64 = np.copy(cloud_mask) cloud_mask128 = np.copy(cloud_mask) |
The R code creates references to This is an optimization technique, which prevents R from copying data that might not change. However, the data does change in the following code. When that happens, R will automatically change the references to copies of The Python code makes this explicit by creating copies of | |
cloud_mask0[cloud_mask0 != 0] <- NA cloud_mask64[cloud_mask64 != 64] <- NA cloud_mask128[cloud_mask128 != 128] <- NA | cloud_mask0[cloud_mask0 != 0] = np.nan cloud_mask64[cloud_mask64 != 64] = np.nan cloud_mask128[cloud_mask128 != 128] = np.nan |
Sets values not matching the conditions to NaN. | |
cloud_mask_merge <- terra::merge(cloud_mask0, cloud_mask64, cloud_mask128, na.rm = TRUE) | stacked_masks = np.stack([cloud_mask128, cloud_mask64, cloud_mask0]) cloud_mask_merge = np.nanmax(stacked_masks, axis=0) |
Merges the cloud mask layers for 0, 64, and 128. | |
stk.mask <- mask(rast.stk, mask = cloud_mask_merge) | stk_mask = [np.where(~np.isnan(cloud_mask_merge), band, np.nan) for band in rast_stk] |
Applies the cloud mask to the raster stack | |
ext <- exact_extract(stk.mask,fields %>% st_transform(., crs = crs(stk.mask)),'median') | fields_transformed = fields.to_crs(crs) ext = zonal_stats(fields_transformed, stk_mask.read(1), stats=['median']) |
CONTINUE FROM HERE | |
fields = st_read("MDAEnrollment2023-2024_Buff30m.shp") | fields = gpd.read_file("MDAEnrollment2023-2024_Buff30m.shp") |
Reads the buffered shapefile. | |
plot(fields[1]) | fields.plot(edgecolor='black') |
Plots the shapefile.
|
Add Comment