Tuesday, November 28, 2023
AI CRYPTO BUZZ

How to Scale Your Data Pipelines and Data Products with Contract Testing and dbt

October 25, 2023
in Artificial Intelligence
Reading Time: 7 mins read


First, we have to add two new dbt packages, dbt-expectations and dbt-utils, which let us make assertions on the schema of our sources and on their accepted values.

# packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: calogica/dbt_expectations
    version: 0.8.5

Testing the data sources

Let’s begin by defining a contract test for our first source. We pull data from raw_height, a table that contains height data from the users of the gym app.

We agree with our data producers that we will receive the height measurement, the units for the measurements, and the user ID. We agree on the data types and that only ‘cm’ and ‘inches’ are supported as units. With all this, we can define our first contract in the dbt source YAML file.
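A minimal sketch of what such a source contract could look like. Only raw_height, user_id, and the "contract-test-source" tag come from this article; the source name gym_app, the column names height and height_unit, and the column types are assumptions for illustration, and the type names in particular depend on your warehouse.

```yaml
# models/staging/gym_app/sources.yml (hypothetical path)
version: 2

sources:
  - name: gym_app                 # assumed source name
    tables:
      - name: raw_height
        columns:
          - name: user_id
            tests:
              - not_null:
                  config:
                    tags: ["contract-test-source"]
              - dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: integer          # assumed type
                  config:
                    tags: ["contract-test-source"]
          - name: height                        # assumed column name
            tests:
              - dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: numeric          # assumed type
                  config:
                    tags: ["contract-test-source"]
              - dbt_utils.accepted_range:
                  min_value: 0                  # height must not be negative
                  config:
                    tags: ["contract-test-source"]
          - name: height_unit                   # assumed column name
            tests:
              - accepted_values:
                  values: ["cm", "inches"]
                  config:
                    tags: ["contract-test-source"]
```

Tagging each test individually keeps the contract checks selectable on their own, separate from any other tests defined on the same source.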

The building blocks

Looking at the previous test, we can see several generic tests from dbt-expectations, dbt-utils, and dbt itself in use:

  • dbt_expectations.expect_column_values_to_be_of_type: This assertion lets us define the expected column data type.
  • accepted_values: This assertion lets us define a list of the accepted values for a specific column.
  • dbt_utils.accepted_range: This assertion lets us define a numerical range for a given column. In the example, we expected the column’s value not to be less than 0.
  • not_null: Finally, built-in assertions like not_null let us define column constraints.

Using these building blocks, we added several tests to define the contract expectations described above. Notice also how we have tagged the tests as “contract-test-source”. This tag lets us run all contract tests in isolation, both locally and, as we’ll see later, in the CI/CD pipeline:

dbt test --select tag:contract-test-source

We have seen how quickly we can create contract tests for the sources of our dbt app, but what about the public interfaces of our data pipeline or data product?

As data producers, we want to make sure we are producing data according to the expectations of our data consumers, so we can fulfill the contract we have with them and make our data pipeline or data product trustworthy and reliable.

A simple way to ensure that we are meeting our obligations to our data consumers is to add contract testing for our public interfaces.

dbt recently released a new feature for SQL models, model contracts, that lets us define the contract for a dbt model. While building your model, dbt will verify that your model’s transformation produces a dataset matching its contract, or it will fail to build.

Let’s see it in action. Our mart, body_mass_indexes, produces a BMI metric from the weight and height measurement data we get from our sources. The contract with our supplier establishes the following:

  • Data types for each column.
  • User IDs cannot be null.
  • User IDs are always greater than 0.

Let’s define the contract of the body_mass_indexes model using dbt model contracts:
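A minimal sketch of such a model contract, assuming the mart has five columns. Only body_mass_indexes and user_id are named in this article; the other column names, the data types, and the file path are illustrative assumptions. With an enforced contract, every column the model returns has to be listed with its data_type, so adapt the list to your actual model.

```yaml
# models/marts/body_mass_indexes.yml (hypothetical path)
version: 2

models:
  - name: body_mass_indexes
    config:
      contract:
        enforced: true            # dbt checks the contract on every build
    columns:
      - name: user_id
        data_type: int
        constraints:
          - type: not_null        # user IDs cannot be null
          - type: check           # user IDs are always greater than 0
            expression: "user_id > 0"
      - name: created_date        # assumed column name
        data_type: date
      - name: weight              # assumed column name
        data_type: float
      - name: height              # assumed column name
        data_type: float
      - name: bmi                 # assumed column name
        data_type: float
```

Whether a given constraint type is actually enforced depends on the data platform, so it is worth checking the constraint support of your adapter before relying on it.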

The building blocks

Looking at the previous model specification file, we can see several pieces of metadata that let us define the contract.

  • contract.enforced: This configuration tells dbt that we want to enforce the contract every time the model is run.
  • data_type: This assertion lets us define the column type we expect to produce once the model runs.
  • constraints: Finally, the constraints block gives us the chance to define useful constraints, such as that a column cannot be null, primary keys, and custom expressions. In the example above, we defined a constraint to tell dbt that user_id must always be greater than 0. You can see all the available constraints in the dbt documentation.

A difference between the contract tests we defined for our sources and the ones defined for our marts or output ports is when the contracts are verified and enforced.

Model contracts are enforced when the model is being built by dbt run, whereas contracts based on dbt tests are enforced when the dbt tests run.

If one of the model contracts is not satisfied, you will see an error when you execute ‘dbt run’, with specific details on the failure. You can see an example in the following dbt run console output.

1 of 4 START sql table model dbt_testing_example.stg_gym_app__height ........... [RUN]
2 of 4 START sql table model dbt_testing_example.stg_gym_app__weight ........... [RUN]
2 of 4 OK created sql table model dbt_testing_example.stg_gym_app__weight ...... [SELECT 4 in 0.88s]
1 of 4 OK created sql table model dbt_testing_example.stg_gym_app__height ...... [SELECT 4 in 0.92s]
3 of 4 START sql table model dbt_testing_example.int_weight_measurements_with_latest_height [RUN]
3 of 4 OK created sql table model dbt_testing_example.int_weight_measurements_with_latest_height [SELECT 4 in 0.96s]
4 of 4 START sql table model dbt_testing_example.body_mass_indexes ............. [RUN]
4 of 4 ERROR creating sql table model dbt_testing_example.body_mass_indexes .... [ERROR in 0.77s]

Finished running 4 table models in 0 hours 0 minutes and 6.28 seconds (6.28s).

Completed with 1 error and 0 warnings:

Database Error in model body_mass_indexes (models/marts/body_mass_indexes.sql)
  new row for relation "body_mass_indexes__dbt_tmp" violates check constraint "body_mass_indexes__dbt_tmp_user_id_check1"
  DETAIL: Failing row contains (1, 2009-07-01, 82.5, null, null).
  compiled Code at target/run/dbt_testing_example/models/marts/body_mass_indexes.sql

So far we have a test suite of powerful contract tests, but how and when do we run them?

We can run contract tests in two types of pipelines.

  • CI/CD pipelines
  • Data pipelines

For example, you can execute the source contract tests on a schedule in a CI/CD pipeline targeting the data sources available in lower environments like test or staging. You can set the pipeline to fail every time the contract is not met.

These failures provide valuable information about contract-breaking changes introduced by other teams before those changes reach production.
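As one possible illustration of a scheduled run, here is a sketch of a GitHub Actions workflow. The article does not name a CI/CD platform, so GitHub Actions, the workflow file name, the dbt-postgres adapter, the staging target name, and the location of profiles.yml in the repository root are all assumptions; warehouse credentials would also need to be supplied, for example through repository secrets.

```yaml
# .github/workflows/source-contract-tests.yml (hypothetical file name)
name: source-contract-tests

on:
  schedule:
    - cron: "0 6 * * *"        # once a day at 06:00 UTC; any schedule works
  workflow_dispatch: {}        # also allow manual runs

jobs:
  contract-tests:
    runs-on: ubuntu-latest
    env:
      DBT_PROFILES_DIR: .      # assumes profiles.yml sits in the repo root
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt (adapter is an assumption)
        run: pip install dbt-postgres
      - name: Install dbt packages from packages.yml
        run: dbt deps
      - name: Run source contract tests against staging
        run: dbt test --select tag:contract-test-source --target staging
```

Because dbt test exits with a non-zero status when a test fails, the job fails whenever a source no longer meets the contract, which is the signal the team needs before the change reaches production.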


