Database

The PLANTS database

On Apr 14, 2023, Gerry Moore emailed the PLANTS 3 schema (dated Oct 17, 2018), along with an Excel output of the plant_characteristic table.

A few of the column names were different – for example, the schema had "anerobic_tolerance_extent_cd" while the Excel had "ANERB_TOLR_CD".

This wasn't surprising, because he had written,

It needs to be updated, as we have added some new data types (e.g., pollinator, ethnobotany) and made a few other minor tweaks.

This wasn't a problem at the time – I simply built the database based on the column names in the spreadsheet. However, there were a number of issues – specifically the lack of cultivars.

In September, we determined that we actually did have cultivar information – we just had to pull the data from three Excel files:

PLANTS_Characteristics_Data_2022.xlsx (column SYMBOL)
plant master.xlsx (columns plant_symbol and plant_master_id)
plant_growth_requirements.xlsx (columns plant_master_id and cultivar_name)

This was especially tricky, because PLANTS_Characteristics_Data_2022.xlsx had separate rows for each cultivar, but without the cultivar name.

So not only did I need to match SYMBOL → plant_symbol → plant_master_id → cultivar_name, but I also needed to match on the following columns – and even that wasn't straightforward:

PLANTS_Characteristics_Data_2022	plant_growth_requirements	Notes

PLANTS_Characteristics_Data_2022	plant_growth_requirements	Notes
TEMP_TOLR_MIN	temperature_tolerance_min
PRECIP_TOLR_MIN	precipitation_tolerance_min
PRECIP_TOLR_MAX	precipitation_tolerance_max
FROST_FREE_DAY_MIN	frost_free_days_min
PLNT_DEN_LOW	planting_density_min
ROOT_DPTH_MIN	root_depth_min
SOIL_PH_TOLR_MIN	soil_ph_tolerance_min
SOIL_ADP_C_TXT_IND	coarse_texture_soil_adaptable_ind	['No', 'Yes'] in PLANTS_Characteristics_Data_2022 corresponds to [0, 1] in plant_growth_requirements.
SOIL_ADP_F_TXT_IND	fine_texture_soil_adaptable_ind	”
SOIL_ADP_M_TXT_IND	medium_texture_soil_adaptable_ind	”
FIRE_TOLR_CD	fire_tolerance_id	['High', 'Low', 'Medium', 'None'] in PLANTS_Characteristics_Data_2022 corresponds to [1, 2, 3, 4] in plant_growth_requirements.
HEDG_TOLR_CD	hedge_tolerance_id	”
CACO3_TOLR_CD	caco3_tolerance_id	”
DRGHT_TOLR_CD	drought_tolerance_id	”
ANERB_TOLR_CD	anaerobic_tolerance_id	”
SLIN_TOLR_CD	salinity_tolerance_id	”

In addition, multiple cultivars would occasionally match on all columns. Monica Pokorny went to the trouble of differentiating them.

I now had everything needed to build VegSpec, except there was the nagging issue of how to get updates when the PLANTS database changed, and how to get our work back into the PLANTS database.

We eventually got a SQL dump of the database on Sep 27, 2023.

Unfortunately, everything changed. None of the columns are named the same. For example, what was originally grwth_prd_actv_cd was now active_growth, but that was based on a view that included the d_season table's season_id column.

To replicate the existing functionality of VegSpec, I now needed to do 42 JOINs on 18 tables: plant_master_tbl, plant_classifications_tbl, plant_duration, d_plant_duration, plant_growth_habit, d_plant_growth_habit, plant_morphology_physiology, plant_growth_requirements, plant_reproduction, plant_suitability_use, plant_location_characteristic, plant_location, d_plant_nativity, d_season, d_rate, d_extent, d_color, and d_foliage_porosity.

This is the code I came up with, which gives us this output.

Database transition challenges

The PLANTS database is in SQL Server. The schema and data were exported as .sql files, which were then imported into a PostgreSQL database.

SQL Server has unique SQL syntax, so I wrote a program to translate it into PostgreSQL syntax.

Some of the differences between the database systems are listed below.

CREATE TABLE differences

SQL Server	PostgreSQL	Notes

SQL Server	PostgreSQL	Notes
CREATE TABLE [dbo].[d_plant_reference_type](	CREATE TABLE plants3.d_plant_reference_type(	SQL Server uses square brackets to delimit columns. PostgreSQL uses double quotes, which is the ANSI standard. But quotes aren't necessary unless the columns have special characters such as spaces.
[plant_reference_type_id] [int] IDENTITY(1,1) NOT NULL,	plant_reference_type_id SERIAL PRIMARY KEY NOT NULL	In SQL Server, [int] IDENTITY(1,1) creates an auto-incrementing column. The PostgreSQL equivalent is SERIAL.
[plant_reference_media_type] [varchar](10) NULL,	plant_reference_media_type VARCHAR(10) NULL,
[plant_reference_source_type] [varchar](25) NULL,	plant_reference_source_type VARCHAR(25) NULL,
[active_record_ind] [bit] NOT NULL,	active_record_ind BOOLEAN NOT NULL,	In SQL Server, [bit] creates a Boolean (true/false) column. The PostgreSQL equivalent is BOOLEAN.
[creation_date] [datetime2](7) NOT NULL,	creation_date TIMESTAMP(6) NOT NULL,	In SQL Server, [datetime2](7) specifies a date/time column with 7 digits of fractional seconds (100 nanoseconds). The PostgreSQL equivalent to DATETIME2 is TIMESTAMP. PostgreSQL is limited to 6 digits of precision (microseconds), but I can't imagine why we'd need to go below that.
[created_by] [varchar](23) NULL,	created_by VARCHAR(23) NULL,
[last_change_date] [datetime2](7) NULL,	last_change_date TIMESTAMP(6) NULL,
[last_changed_by] [varchar](23) NULL,	last_changed_by VARCHAR(23) NULL);
CONSTRAINT [PK_REFERENCE_TYPE_ID1] PRIMARY KEY CLUSTERED ( [plant_reference_type_id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]		Constraints aren't necessary for VegSpec purposes.

INSERT differences

SQL Server	PostgreSQL	Notes

SQL Server	PostgreSQL	Notes
INSERT [dbo].[d_plant_reference_type] (	INSERT INTO plants3.d_plant_reference_type (
[plant_reference_type_id],	plant_reference_type_id,
[plant_reference_media_type],	plant_reference_media_type,
[plant_reference_source_type],	plant_reference_source_type,
[active_record_ind],	active_record_ind,
[creation_date],	creation_date,
[created_by],	created_by,
[last_change_date],	last_change_date,
[last_changed_by]	last_changed_by
)	)
VALUES (	VALUES (
1,	1,
N'CD-ROM',	'CD-ROM',	"N" is required in SQL Server to specify a Unicode string. It's not needed in PostgreSQL.
N'database',	'database',
1,	true,	active_record_ind is a BIT column in SQL Server. BIT columns take on the values 1 (true) or 0 (false). PostgreSQL doesn't have BIT columns. The equivalent is Boolean. The translator determines if a column is a BIT column and changes it to true or false as appropriate. Note that MS Access uses Yes/No for a Boolean column, where negative 1 represents true and 0 represents false. Don't know why Microsoft doesn't like the word "Boolean," or why they can't be consistent across their own products.
CAST(N'2018-08-29T20:20:28.4733333' AS DateTime2),	'2018-08-29T20:20:28.4733333',	PostgreSQL doesn't have a DateTime2 type. But casting isn't even necessary here.
N'Original 3.0 Migration ',	'Original 3.0 Migration ',
CAST(N'2021-01-15T13:31:32.2716512' AS DateTime2),	'2021-01-15T13:31:32.2716512',
N'SQL ETL'	'SQL ETL'
)	);	Semicolons are needed at the end of PostgreSQL statements.

Decision Support Tools

Database

The PLANTS database

Database transition challenges

CREATE TABLE differences

INSERT differences

Related content