Installing Python and Jupyter Notebook
You can download and install Python from https://www.python.org/downloads/
Check Add python.exe to PATH, then Install Now.
To confirm installation, from a command prompt enter:
py --version python --version
Both commands should show the version that you downloaded.
Jupyter Notebook is a popular tool for running Python scripts, offering the added benefit of displaying plots directly within the interface. You can install it by running:
pip install notebook
You can then run:
jupyter lab /path/to/folder
Basic conversions
R | Python | |
---|---|---|
Arithmetic operators | x + y x - y x * y x / y x ^ y | x + y x - y x * y x / y x ** y |
The only difference is that R uses | ||
Comparison operators | # R and Python x == y x != y x > y x < y x >= y x <= y | |
No differences here. | ||
Logical operators | x && y x || y !x v1 & v2 v1 | v2 | import numpy as np x and y x or y not x np.logical_and(v1, v2) np.logical_or(v1, v2) |
Python doesn’t have built-in logical operations for vectors. numpy is a popular library for this purpose, but it’s not as clean as R’s syntax. | ||
Assignment operator |
|
|
R does accept the For example, my_function <- function(y = 3, x = 5) { print(paste(y, x)) } my_function(x = 10) # Passes 10 to the x parameter, and outputs "3 10" my_function(x <- 20) # Assigns 20 to the global x variable, which it passes to the y parameter. Outputs "20 5" To avoid confusion, always use | ||
Ranges | 1:10 seq(1, 20, 2) print(1:10) | range(1, 11) range(1, 21, 2) print(list(range(1, 11))) |
Ouch. Python’s code is not only longer, but it excludes the final number.
| ||
Conditional Statements | if (x > 5) { print("x is greater than 5") } else if (x >= 2) { print("x is between 2 and 5") } else { print("x is less than 2") } | if x > 5: print("x is greater than 5") elif x >= 2: print("x is between 2 and 5") else: print("x is less than 2") |
R requires the condition to be in parentheses. That would also work in Python, but it’s not required. R begins a block statement with an opening curly brace ( Python is the oddball here, using a colon ( This can take some getting used to, because very few computer languages have similar syntax. Note that the recommended indentation for Python is 4 spaces, but consistency is all that matters. | ||
For Loops | for (i in 1:5) { print(i) } | for i in range(1, 6): print(i) |
Note the similarity to the | ||
Functions | my_function <- function(x, y) { return(x + y) } | def my_function(x, y): return x + y |
In Python, the | ||
Indexing | my_list[1] | my_list[0] |
R uses 1-based indexing. Python uses 0-based indexing. R is actually the oddball here. Most modern languages use 0-based indexing. | ||
Slicing | my_list <- c(1, 2, 3, 4, 5) print(my_list[2:4]) # Outputs 2 3 4 | my_list = [1, 2, 3, 4, 5] print(my_list[1:4]) # Outputs [2, 3, 4] |
Because Python uses 0-based indexing, the second element is at position 1 instead of 2. Python supports similar | ||
Comments | # This is a comment | |
No difference. | ||
Libraries | library(dplyr) library(terra) library(sf) library(exactextractr) | import pandas as pd import rasterio import geopandas as gpd from rasterstats import zonal_stats |
For data manipulation and geospatial analysis, R and Python rely on external libraries. You can alias libraries in Python using You can selectively import functions from a Python library using |
Example conversions
The following examples demonstrate how to convert the R code at https://github.com/jjennewein/MDA_2023-24 into equivalent Python code.
0.field buffer.R | 0.field buffer.py |
---|---|
library(sf) | import geopandas as gpd |
Loads a library for handling geospatial data, including working with shapefiles. In the Python code, gpd is an alias for the geopandas library Using the alias simply reduces typing and improves code readability. | |
setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) | |
RStudio defaults to a global working directory, which means file paths must be specified explicitly. Python defaults to the folder where the source code is located. Jupyter Notebook defaults to where you launch the notebook server.
The original file path was To avoid referencing This way, the R program can run without requiring the full path in filenames. | |
mda = st_read("MDAEnrollment2023-2024.shp") %>% st_transform(., crs = 32618) %>% st_buffer(., dist = -30) %>% dplyr::filter(!st_is_empty(.)) | mda = gpd.read_file("MDAEnrollment2023-2024.shp") \ .to_crs(epsg=32618) mda["geometry"] = mda.geometry.buffer(-30) mda = mda[~mda.geometry.is_empty] |
This code:
R uses In Python, a backslash ( Also, note that the dots in | |
st_write(mda, "MDAEnrollment2023-2024_Buff30m.shp", append=FALSE) | mda.to_file("MDAEnrollment2023-2024_Buff30m.shp") |
Writes the buffered data frame to a new shapefile. I added |
1.Extract_HLS_L30 .R | 1.Extract_HLS_L30 .py |
---|---|
library(dplyr) library(terra) library(sf) library(exactextractr) | import pandas as pd import rasterio import geopandas as gpd from rasterstats import zonal_stats |
Syntax is easy. Knowing which libraries to use – not so much! | |
setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) | |
Explained previously. | |
fields = st_read("MDAEnrollment2023-2024_Buff30m.shp") | fields = gpd.read_file("MDAEnrollment2023-2024_Buff30m.shp") |
Reads the buffered shapefile. | |
plot(fields[1]) | fields.plot(edgecolor='black') |
Plots the shapefile.
|
Add Comment