Overview
The Transposition User-Defined Function (UDF) is a tabular function that enables transposing data within ksqlDB. This function is versatile, providing two different variations to suit a range of data transformation needs.
Regarding the behavior of the UDF, for both variations, it’s important to be mindful of any additional columns in the initial row. You may find a detailed example here (example 3).
To get more details about this UDF directly within ksqlDB, you can use the command:
DESCRIBE FUNCTION IGRAFX_TRANSPOSITION;
To ensure any modifications to a STREAM are applied to all previously inserted data, set the offset configuration to the earliest position with this command:
SET 'auto.offset.reset'='earliest';
Variation 1¶
UDF Signature :
igrafxTransposition(input: util.List[Struct]): util.List[Struct]
Both the input and output structures are formatted as follows:
"STRUCT<TASK VARCHAR(STRING), TIME VARCHAR(STRING)>"
This variation is designed to explode a row’s columns, transforming each into multiple rows where each row contains the Task and its associated Timestamp.
You may check out the example to see how the UDF works.
Variation 2¶
UDF Signature :
igrafxTransposition(input: util.List[Struct], dateFormat: String, isStartInformation: Boolean, isTaskNameAscending: Boolean): util.List[Struct]
The input structure is formatted as follows:
"STRUCT<TASK VARCHAR(STRING), TIME VARCHAR(STRING)>"
The output structure is formatted as follows:
"STRUCT<TASK VARCHAR(STRING), START VARCHAR(STRING), STOP VARCHAR(STRING)>"
This function is designed to explode a row with multiple columns into multiple rows, each containing four columns: the case, the activity, the starting date, and the ending date.
The UDF requires the following parameters: * input : corresponds as for the first variation to the input row we want to explode * dateFormat : corresponds to the date format (for instance : for an activity having for date 12/01/2020, the date format is "dd/MM/yyyy" ) * isStartInformation : true indicates that the date associated to the activity corresponds to the beginning of the activity, and that we hence need to calculate the end of the activity. false indicates that the date corresponds to the end of the activity meaning we have to calculate its start date (calculations are made when possible in function of the dates of the other activities) * isTaskNameAscending : true indicates that in case of identical dates for two (or more) rows, the order of the rows is determined in an ascending manner according to the activity's name, while false means that the order is determined in a descending manner according to the activity's name
You may check out the example to see how the UDF works.