Adapters and seeds
Configuring the Adapters and Seeds
Core and implemented by the cssXX.dashboards_store project. If you don't implement them, the ETL will fail. Please refer to the Marts documentation to know which seeds and adapters are required by each one.Both the human_resources and the educ_serv marts use adapters and seeds to configure the ETL. As explained in the "Architecture" page, both are used to provide school board-specific business rules to the ETL. Seeds are csv files used to create dimensions later used to filter or map data. Adapters are SQL files with arbitrary SQL used to create dimensions or facts. We tend to prefer seeds over adapters as they are easier to maintain and understand. But sometimes, you just need to write SQL.
Seeds
Implementing a seed
Core provides the seed definition. It is up to you to populate it. The seed definition is located in core.dashboards_store/seeds/**/*/schema.yml.To implement a seed, create a csv file with the columns defined by the seed contract. The csv file must be located in the matching path under cssXX.dashboards_store/seeds/**/*/ and must use the seed name defined in the YAML file. The definition file gives information about the expected column types and describes each field to help you populate it.
Once populated, you can load in the database and test the seed with the command :
dbt build --full-refresh --select <the name of the seed>
Example:
Let's say I want to implement a seed named example_seed and defined by core.dashboards_store/seeds/marts/example/schema.yml.
The content of the schema.yml file is :
seeds:
- name: example_seed
description: >
This is the example seed.
This seed is a deliberately useless seed. It consists of any set of integers you can think of.
# The fields' description
columns:
- name: seq_value
description: The random integer
tests:
- not_null # Null values are not allowed
- unique # The integer must be unique
# The seed's data type
config:
column_types:
value: int
I first create an empty csv and save it under cssXX.dashboards_store/seeds/marts/example/example_seed.csv. Note that the seed is saved in the cssXX project folder, not in the core project. The name of the file is the one indicated in the second line of the seed definition: example_seed.
From the seed definition, I know that my implementation should have only one column named seq_value filled with a set of unique, not null, integers. I can now populate the csv file with the following content :
seq_value
1
2
3
4
6
7
9
I then load, and test, my seed in the database. A table will be automatically created and populated with the content of the csv file. The table will be named exemple_seed. To do so, I just enter the following command :
dbt build --full-refresh --select example_seed
Adapters
Mandatory adapters
The next table shows the required adapter definitions and the paths where the sql files should be placed.
Implementing an adapter
Core can only provide you with the adapter definition. The concrete implementation is usually quite specific to your school board.Adapters are described in adapters.yml files. Those files are located under core.dashboards_store/models/**/adapters.yml.
To implement an adapter, create a sql file with arbitrary SQL. The sql file should be saved under cssXX.dashboards_store/models/**/*/<adapter_name>.sql. The exact file path should match the path from the YAML definition, suffixed by the name of the adapter.
The definition file gives information about the expected column types and describes each field. Please refer to the definition file for the column specification and description.
We try to add a dbt analysis as an example of how to implement an adapter. When available, the analysis is saved under core.dashboards_store/analyses/**/*/<adapter_name>.sql, where **/*/<adapter_name>.sql is the path from the adapter definition file.
Example:
Let's say I have to implement the stg_ele_prescolaire adapter. This adapter is defined in core.dashboards_store/models/marts/educ_serv/adapters.yml. The adapter's definition is :
# core.dashboards_store/models/marts/educ_serv/adapters.yml
sources:
- name: populations
description: >
The populations identify groups of students with consistent profiles, such as regular students, students in special education or students in general adult education.
schema: "{{ target.schema }}_educ_serv_staging"
tables:
- name: stg_ele_prescolaire
description: >
Identification of the population in 4-year-old kindergarten / passe partout / 5-year-old kindergarten.
columns:
- name: code_perm
description: The specific unique identifier for a student
- name: id_eco
description: The specific unique identifier for a school by year
- name: annee
description: The years the student was active
models:
- name: stg_ele_prescolaire
config:
schema: "educ_serv_staging"
<<: *x-common-tags
description: >
Identification of the population in 4-year-old kindergarten / passe partout / 5-year-old kindergarten.
tests:
- resolution:
combination_of_columns:
- code_perm
- id_eco
- annee
- I first create a file under
cssXX.dashboards_store/models/marts/educ_serv/stg_ele_prescolaire.sql. Note that the file is created incssXXwith the same name as the one in the definition file. - I then check the definition:
- From the definition (
sourcekey), I know that the adapter should have the three columns:code_perm,id_eco,annee. - From the definition (
modelskey), I know the table should pass a resolution test on the three columns, meaning that my table should not have duplicates on the three columns.
- From the definition (
- I then check under
core.dashboards_store/analyses/marts/educ_serv/stg_ele_prescolaire.sqlto see whether an analysis is available. In this case, there is one. I copy and adapt the analysis in mysqlfile to bootstrap the adapter implementation.
-- cssXX.dashboards_store/models/marts/educ_serv/stg_ele_prescolaire.sql
select distinct ele.code_perm, eco.id_eco, eco.annee
from {{ ref("i_gpm_e_dan") }} as eledan
left join {{ ref("i_gpm_t_eco") }} as eco on eledan.id_eco = eco.id_eco
left join
{{ ref("i_gpm_e_ele") }} as ele on eledan.fiche = ele.fiche
/*WHERE
eledan.statut_don_an = 'A' AND (
(
eledan.ordre_ens = '1'
AND eledan.grp_rep IN ('MA4','MA5','M41','M42')
)
OR (
eledan.ordre_ens = '2'
AND (eledan.grp_rep NOT LIKE '9%' OR eledan.grp_rep IS NULL)
)
)*/
- I then fine-tune or completely rewrite the code I have just copied to match my school board's reality.
Linking the databases
Enabling a resource
By default, the Store does not materialize anything. You need to enable the models you want to materialize. We do this to avoid materializing data you do not need: if you are interested in only one dashboard, then you do not need to materialize the whole core_dashboards_store.
