Theme
Description
A specific issue participants mentioned is the interpreter blocks in Jayvee data pipeline models, such as TextFileInterpreter and CSVInterpreter, as seen here in task 2:
pipeline RescueStationPipeline {
    HttpDataSource
        -> TextInterpreter
        -> CSVFileInterpreter
        -> ValuetypeValidator
        -> IsPubliclyFundedColumnAdder
        -> SQLiteSink;

    block TextInterpreter oftype TextFileInterpreter {}

    block CSVFileInterpreter oftype CSVInterpreter {
        // properties omitted
    }

    // more block definitions
}
These blocks are described as confusing and decreasing understanding because more than one block is needed to implement “one task” (S30).
A good balance regarding the level of abstraction for understanding seems to be one block per task, with a task being defined as what users expect in their mental model of a data pipeline (see also HU2.1 - Mental model of an ETL pipeline aligns with Jayvee). Technically, users want to download binary data, interpret that data as a text file, and then treat the content of the text file as formatted according to the CSV standard. In practice, they do not think about these very low-level steps; they think of downloading a CSV file and using the data inside immediately.
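To make the gap between the technical steps and the user's mental model concrete, here is a minimal sketch in plain Python rather than Jayvee. It is an illustration only and does not show how Jayvee implements its blocks. The function names, the sample payload, and the column names are all hypothetical, and the download step is replaced by a hard-coded byte string so the example runs without network access.

```python
import csv
import io

# Hypothetical payload standing in for the bytes an HTTP download would return.
raw_bytes = b"name,city\nStation A,Berlin\nStation B,Hamburg\n"


def interpret_as_text(data: bytes, encoding: str = "utf-8") -> str:
    """Low-level step: treat the binary payload as a text file."""
    return data.decode(encoding)


def interpret_as_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Low-level step: treat the text as CSV and split it into rows of cells."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def load_csv(data: bytes) -> list[list[str]]:
    """The single task users have in mind: get the rows of the CSV file."""
    return interpret_as_csv(interpret_as_text(data))


rows = load_csv(raw_bytes)
print(rows)
```

The two `interpret_*` functions loosely correspond to the separate interpreter steps in the pipeline, while `load_csv` corresponds to the single step participants expected.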
Interpreter blocks are a negative example of how a low density of functionality affects understanding. The lessons here are also relevant to PL3.4 - Scoped code (blocks, functions) should execute one operation, not multiple, clarifying that “one operation” means one task as users expect it.
Representative Quotes
I had some issues creating a jayvee pipeline at first, because of the fact that sometimes multiple pipeline steps are needed to execute ‘just’ one task (file interpreter etc).
- S30
Sometimes the blocks look similar, (extracting, interpreting) which is not quite intuitive what might be happening.
- S31