Dynamic SQL queries with R

Dynamic SQL queries with R

Programmatic querying with dynamic schema, table and column names.

I’m looking into creating some functions to make it easier to carry out quality checks on our database tables. This involves a lot of similar queries across tables and I want the flexibility to change the column and table names, as well as modifying the WHEREclause.
I’m using SQL Server, where tables are referred to in a [database].[schema].[tablename] format.

Let’s say you want to run a simple query:

SELECT columns,
FROM yourschema.yourtablename;

Easy enough, but you might want to run it over several tables, and vary the columns.
You could write dynamic SQL, which can be messy, and usually frowned upon.
Or you could use R, and write code that is less messy, and feels a bit magical (even though it may still be frowned upon).

I’ve seen several examples of setting dynamic WHERE clauses, but I want dynamic schema names, table names, and column names too.

How do we do it?

First, you specify your SQL Server connection:

con <- DBI::dbConnect(odbc::odbc(), 
                      Driver = "SQL Server", 
                      Server = "SERVER", 
                      Database = "DB_NAME", 
                      Trusted_Connection = "True", 
                      Port = 1433)

Now let’s set up some variables, for the columns we need to SELECT, and some filtering variables for future use

vars <- c("columns", "you", "want", "to", "select")
date_var <- 'date_col'

start_date <- as.Date('2022-01-01')
today <- Sys.Date()

Now the crux of the problem is the SQL Server naming convention. If you are querying a schema other than dbo, then you may have problems with your queries parsing correctly.

So what can you do?

Option 1 - You can define the schema and table names separately and refer to them using glue_sql from the glue package

tablename <- "yourtablename"
schema_name <- "yourschema"

Your query now looks like this

query <- glue_sql(.con = con, "SELECT TOP(10) {`vars`*} FROM {`schema_name`}.{`tablename`} ")
DBI::dbGetQuery(con, query)

Note the use of {`vars`*}, which collapses the vars vector, separated by commas, so that it resembles the SELECT statement we would normally write in SQL.

Option 2 - Alternately, we can use DBI::SQL

tbl2 <- DBI::SQL("yourschema.yourtablename")
test <- glue::glue_sql(.con = con, "SELECT TOP(10) {`vars`*} FROM {`tbl2`}")
DBI::dbGetQuery(con, test)

The difference, with option 1, we refer to the variable (e.g. schema_name) whereas in option 2 we refer to the value of the variable (e.g. yourschema)

Both work, so take your pick.

With these basic queries working, we can continue to experiment, with the aim of using dplyr and purrr to query, wrangle and visualise the data all from the same tool.

As a further example, here’s how to add a WHERE clause:

query <- glue_sql(.con = con, "SELECT TOP (10) {`col_vars`*} FROM {`schema_name`}.{`tablename`}
                  WHERE {`schema_name`}.{`tablename`}.{`date_var`} BETWEEN {start_date} AND {today} ")  
data <- DBI::dbGetQuery(con, query)                  

Note that start_date and today do not have back ticks, e.g. we use {start_date} rather than {`start_date`}

Or we can assign another date variable, with DBI, and use that instead :

date_var2 <- DBI::SQL("date_col")

query2 <- glue_sql(.con = con, "SELECT TOP (10) {`col_vars`*} FROM {`schema_name`}.{`tablename`}
                  WHERE {`tbl2`}.{`date_var2`} BETWEEN {start_date} AND {today} ")

We can also jazz up the SQL itself - here I CAST an imaginary datetime column to date

query <- glue_sql(.con = con, "SELECT {`col_vars`*} 
                  FROM {`schema_name`}.{`tablename`}
                  WHERE CAST({`schema_name`}.{`tablename`}.{`date_var`} AS DATE) BETWEEN {start_date} AND {today}")

© 2016 - 2022. All rights reserved.