
BigQuery: Table with Constraints Custom Materialization

6 mins

GitHub Repo: https://gist.github.com/sungchun12/f7ea081773ae824a83294649530d6e41


Transcript


All right, hey folks, this is Sung speaking here. What I'm going to do today is demo how to create required, NOT NULL columns within BigQuery using a custom materialization called table_with_constraints.

And so let's just take a step back: what is the problem I'm trying to solve here in the first place?

One: how do I have dbt automatically create a table with the right DDL to determine whether, say, two out of the ten columns should be REQUIRED in a BigQuery context, or, in plainer language, NOT NULL?

And there are a couple of reasons why we care about this. One: it proactively builds data integrity into your BigQuery tables.

Two: apparently you can get more performant joins and query explain plans by having these required, NOT NULL columns within BigQuery.

And so with that in mind, I'll just walk you through exactly how this works. I'm just going to run this.

I'll explain what's happening behind the scenes. Essentially, this is picking up this respective SQL file, and I set it to table_with_constraints, which is the custom materialization I made, and I'm just making a really generic table.
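As a rough sketch, the model file probably looks something like this; the model name and columns here are illustrative placeholders, not taken from the gist:

```sql
-- models/my_generic_table.sql (hypothetical file name)
-- The only special part is pointing at the custom materialization.
{{ config(materialized='table_with_constraints') }}

select
    1 as id,
    cast('hello' as string) as some_string_col
```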

Okay. In addition to that, I add in some configs here within your YAML file, the same one you use for testing and documentation, and add this thing called has_constraints.

And that's where it picks up the data type and the constraint itself. So that will append the NOT NULL logic into the DDL.
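For illustration, the YAML config could look roughly like this. The exact key names and where they nest (has_constraints, data_type, constraint) are my assumptions, so check the linked gist for the real spelling:

```yaml
# models/schema.yml -- hypothetical sketch, not copied from the gist
version: 2

models:
  - name: my_generic_table
    meta:
      has_constraints: true
    columns:
      - name: id
        meta:
          data_type: int64
          constraint: not null
      - name: some_string_col
        meta:
          data_type: string
```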

And that eventually runs code that looks like this, where it's going to create or replace this target project, dataset, and table name.

And then from there, it will include all the DDL as appropriate, and then use the SQL SELECT statement as the logic to actually create the table.
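The generated statement has roughly this shape in BigQuery DDL terms; the project, dataset, table, and column names are placeholders:

```sql
create or replace table `my-project.my_dataset.my_generic_table`
(
    id int64 not null,   -- column flagged with a constraint in the YAML
    some_string_col string
)
as (
    select
        1 as id,
        cast('hello' as string) as some_string_col
);
```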

And I'm going to toggle to BigQuery itself to show you that it does exactly that. I can even rerun it a dozen times over to make sure that this works. I'll even do a plus sign to run the things downstream of it.

Okay. And it looks like everything worked as expected. But I'm going to go over here, and you'll notice the schema for this one has REQUIRED columns. But for the same table, which is essentially just a copy and paste of the first example I showed you, they're NULLABLE.

And that's because I didn't add any YAML config for this to tell dbt that those columns should be required in the first place.

So keep in mind, the trade-off of having this functionality is that you must hardcode the columns in your schema file. Here's another trade-off to consider:

If I add another generic column to this, that won't work. I'm going to do a full refresh to drop this table and do it again, and this should error out as expected, because you need to include that new column within your schema file in order to make sure this works.

So I'll add this column, I'll make it a string, and heck, I'll even add a constraint to it. I'm going to do this again.

This time I won't even do the full refresh. Let's see what happens from here. Oh, hm, I need a comma. Let's try this again.

Okay. Let's go back. And shazam, we added the other column. Remember, keep in mind that you need to hardcode everything in the schema to make sure this works properly, so that dbt can pick up which columns are NOT NULL versus nullable.

Other thing to note: let me explain this macro materialization logic, since I made it specific to BigQuery.

What I do is say: pick up this old relation, which is the database, schema, and identifier; in the context of BigQuery, that's project, dataset, and table name or view name. Then I drop it and replace it with the new target.

I set up some partition, cluster, and SQL header logic, and all of this is consistent with what's in the existing BigQuery table materialization.

I just did a bunch of copying and pasting there. And then from there, big shout out to Jeremy Cohen for writing a lot of the code that makes this work in the first place; it's within this call statement.

This is all the dynamic logic to create the DDL that you saw over here. Okay, I'm going to go back to the macro.

And from here we create or replace the target table, and then we get the config in scope. Actually, let me do a side-by-side. We create or replace the table in scope.

We add in the DDL, the options get populated based off partitions and clusters as needed, and then we inject the SQL.

That's part of the core dbt logic, and then you just return the relation so dbt knows where it was created. And that's it.
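Putting the narration together, the materialization macro has roughly this skeleton. This is a simplified sketch, not the real code from the gist: build_columns_ddl is a hypothetical stand-in for the DDL-building logic Jeremy Cohen wrote, and the partition/cluster handling is abbreviated.

```sql
{% materialization table_with_constraints, adapter='bigquery' %}
  {#- Sketch only: pick up the old relation, define the new target -#}
  {%- set old_relation = adapter.get_relation(
        database=this.database,
        schema=this.schema,
        identifier=this.identifier) -%}
  {%- set target_relation = this.incorporate(type='table') -%}

  {{ run_hooks(pre_hooks) }}

  {#- if the existing relation is a view, drop it before replacing -#}
  {%- if old_relation is not none and old_relation.is_view -%}
    {{ adapter.drop_relation(old_relation) }}
  {%- endif -%}

  {%- call statement('main') -%}
    create or replace table {{ target_relation }}
    (
      {#- hypothetical helper: builds the column list, data types,
          and NOT NULL constraints from the YAML config -#}
      {{ build_columns_ddl(model) }}
    )
    {#- partition/cluster/sql_header options populated as configured,
        mirroring the stock BigQuery table materialization -#}
    as (
      {{ sql }}
    )
  {%- endcall -%}

  {{ run_hooks(post_hooks) }}
  {{ return({'relations': [target_relation]}) }}
{% endmaterialization %}
```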

