Configuring the Adapters and Seeds
Core
and implememented by the cssXX.dasboards_store
project. If you don't implement them, the ETL will fail. Please refer to the Marts documentation to know which seeds and adapters are required by each one.Both the human_resources
and the education_services
marts use adapters and seeds to configure the ETL. As explained in the "Architecture" page, both are used to provided Schoolboards-specific business rules to the ETL. Seeds are csv files used to create dimensions later used to filter or map data. Adapters are SQL files with arbitrary SQL used to create dimensions or facts. We tend to prefer seeds over adapters as they are easier to maintain and to understand. But sometimes, you just need to write some SQL like it's the 80's again.
Seeds
Implementing a seed
Core
provide you with the seed's definition. It's up to you to populate-it. The seed definition is located in the core.dashboards_store/seeds/**/*/schema.yml
project, in the data
folder.To implement a seed, you have to create a csv
, with columns as per the seed's definition. The csv
file must be located in the cssXX.dashboards_store/seeds/**/*/data
folder. The csv
file must be named as per the seed's definition. The definition file gives a lot of information about the expected types of the seed's column, as well as a description of each field to help you populating it. Seeds schould be put under the same path as the one used by the seeds's definition yaml.
Once populated, you can load in the database and test the seed with the command :
dbt build --full-refresh --select <the name of the seed>
Exemple :
Let's say I want to implement a seed named exemple_seed
and defined by core.dashboards_store/seeds/marts/exemple/schema.yml
.
The content of the schema.yml
file is :
seeds:
- name: exemple_seed
description: >
This is the exemples seed.
This seed is a completely useless seed. It consists of a set any integer you can think of.
# The fields' description
columns:
- name: seq_value
description: The random integer
tests:
- not_null # Null values are not allowed
- unique # The integer must be unique
# The seed's data type
config:
column_types:
value: int
I first create an empty csv
and save-it under cssXX.dashboards_store/seeds/marts/exemple/exemple_seed.csv
. Note that the seed is saved in the cssXX
project folder, not in the core
project. The name of the file is the one indicated in the second line of the seed's definition (exemple_seed).
From the seed definition, I know that my implementation schould have only one column named seq_value
filled with a set of unique, not null, integers. I can now populate the csv
file with the following content :
seq_value
1
2
3
4
6
7
9
I then load, and test, my seed in the database. A table will be automatically created and populated with the content of the csv
file. The table will be named exemple_seed
. To do so, I just enter the following command :
dbt build --full-refresh --select exemple_seed
Adatpters
Mandatory adapters
The next table show the required adapters's definition and the path where the sql
schould be put.
Implementing an adapter
Core
can only provide you with the adapter's definition only. The concrete implementation is usually quite specific to your schoolboard.Adatpers are described in the adapters.yml
files. Thoose files are located in the core.dashboards_store/adapters/**/*/adapters.yml
project.
To implement an adapter, you have to create a sql
file, with arbitrary SQL. The sql
file schould be saved under in cssXX.dashboards_store/models/**/*/<adapter_name>.sql
folder. The exact path file's path schould be the one from the definition yml
suffixed by the name of the adapter. The sql
file must be named as per the adapter's definition.
The definition file gives a lot of information about the expected types of the seed's column, as well as a description of each field to help you populating it. Please refer to the definition file to get the column specification and description.
We try to add a dbt analysis
as an exemple of how to implement an adapter. When available, the analysis is saved under core.dashboards_store/analysis/**/*/<adapter_name>.sql
where **/*/<adapter_name>.sql
is the path from the adapter's definition file. The analysis is named as per the adapter's definition.
Exemple :
Let's say I have to implement the stg_ele_prescolaiore
adapter. This adapter is defined in core.dashboard_store/models/marts/education_services/adapters.yml
. The adapter's definition is :
# core.dashboards_store/models/marts/education_services/adapters.yml
sources:
- name: populations
description: >
The populations identify groups of students with consistent profiles, such as regular students, students in special education or students in general adult education.
schema: "{{ target.schema }}_educ_serv_staging"
tables:
- name: stg_ele_prescolaire
description: >
Identification of the population in 4-year-old kindergarten / passe partout / 5-year-old kindergarten.
columns:
- name: code_perm
description: The specific unique identifier for a student
- name: id_eco
description: The specific unique identifier for a school by year
- name: annee
description: The years the student was active
models:
- name: stg_ele_prescolaire
config:
schema: "educ_serv_staging"
<<: *x-common-tags
description: >
Identification of the population in 4-year-old kindergarten / passe partout / 5-year-old kindergarten.
tests:
- resolution:
combination_of_columns:
- code_perm
- id_eco
- annee
- I first create a file under
cssXX.dashboards_store/models/marts/education_services/stg_ele_prescolaire.sql
. Note that the file is created incssXX
with the same name as the one in the definition file. - I then check for the definition :
- From the definition (
source
key), I know that the adapter schould have the three columns :code_perm
,id_eco
,annee
- From the definition (
models
key), I know the table schould pass a resolution test on the three columns, meaning that my table schouln't have any duplicate on the three columns.
- From the definition (
- I then check under the
core.dashboards_store/analysis/marts/education_services/stg_ele_prescolaire.sql
file to see if there's an analysis available. In this case, there's one. I copy-paste the analysis in mysql
file, to bootstrap the adapter's implementation.
-- cssXX.dashboards_store/models/marts/education_services/stg_ele_prescolaire.sql
select distinct ele.code_perm, eco.id_eco, eco.annee
from {{ ref("i_gpm_e_dan") }} as eledan
left join {{ ref("i_gpm_t_eco") }} as eco on eledan.id_eco = eco.id_eco
left join
{{ ref("i_gpm_e_ele") }} as ele on eledan.fiche = ele.fiche
/*WHERE
eledan.statut_don_an = 'A' AND (
(
eledan.ordre_ens = '1'
AND eledan.grp_rep IN ('MA4','MA5','M41','M42')
)
OR (
eledan.ordre_ens = '2'
AND (eledan.grp_rep NOT LIKE '9%' OR eledan.grp_rep IS NULL)
)
)*/
- I then fine tune (or completely rewrite) the code I have just copy past to match my Schoolboard's reality.