Overview
The iGrafx Sessions UDF is a tabular user-defined function that divides a collection of ordered lines into sessions. Each session is assigned a unique ID and represents a grouping of lines sharing a common attribute. Regular expressions (regex) are used to determine which lines belong to the same session, which lines start and end sessions, and which lines should be ignored.
To retrieve information about this UDF directly in ksqlDB, use the following command:
DESCRIBE FUNCTION IGRAFX_SESSIONS;
The UDF requires the following parameters:
- inputLines: The initial collection of rows.
- ignorePattern: Regex describing the rows to ignore. Rows matching this pattern are not used to create sessions and are not returned by the function.
- groupSessionPattern: Regex used to group lines that have the same values in the specified columns. Sessions are then determined within each of these groups. For instance, for lines with the following format:
    timeStamp;userID;targetApp;eventType
  and the following pattern:
    .*;(.*);.*;(.*)
  the group of a row is determined by concatenating the values of its userID and eventType columns, because those columns are enclosed in parentheses (capturing groups) in the regex.
- startSessionPattern: Regex describing the lines that can be considered as the start of a session.
- endSessionPattern: Regex describing the lines that can be considered as the end of a session.
- sessionIdPattern: Regex indicating which parts of a line are used to create the sessionId. For instance, for lines with the following format:
    timeStamp;userID;targetApp;eventType
  and the following pattern:
    .*;(.*);(.*);.*
  the sessionId is created by concatenating the userID and targetApp columns, which are enclosed in parentheses in the regex.
- isSessionIdHash: Boolean indicating whether the sessionId is hashed. The sessionId is built from the columns specified in sessionIdPattern. If isSessionIdHash is false, the sessionId is simply the concatenation of the values of those columns. If it is true, this concatenation is hashed with MD5 to produce the sessionId.
- isIgnoreIfNoStart: Boolean indicating whether sessions without a line matching startSessionPattern are discarded. If true, such sessions are not returned; if false, they are returned.
- isIgnoreIfNoEnd: Boolean indicating whether sessions without a line matching endSessionPattern are discarded. If true, such sessions are not returned; if false, they are returned.
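The grouping rule described for groupSessionPattern can be illustrated with a short Python sketch. This is not the iGrafx implementation: the helper name group_key is hypothetical, and it assumes that the pattern must match the whole line and that the captured values are concatenated without any separator.

```python
import re
from typing import Optional


def group_key(line: str, group_session_pattern: str) -> Optional[str]:
    """Hypothetical helper: build a line's group key from its capturing groups.

    Assumptions: the pattern must match the whole line, and the captured
    values are joined with no separator. Returns None if the line does not match.
    """
    match = re.fullmatch(group_session_pattern, line)
    if match is None:
        return None
    # groups(default="") turns a group that did not participate into ""
    return "".join(match.groups(default=""))


line = "2021-01-01T10:00:00;alice;crm;click"
print(group_key(line, r".*;(.*);.*;(.*)"))  # aliceclick
```

Because every `.*` in the pattern must be followed by a `;`, the three separators pin each capturing group to exactly one column, as long as the column values themselves contain no `;`.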
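The interaction between sessionIdPattern and isSessionIdHash can be sketched the same way. The helper make_session_id is hypothetical; whole-line matching, the absence of a separator between captured values, and rendering the MD5 digest as a lowercase hexadecimal string are all assumptions rather than documented behavior of the UDF.

```python
import hashlib
import re
from typing import Optional


def make_session_id(line: str, session_id_pattern: str,
                    is_session_id_hash: bool) -> Optional[str]:
    """Hypothetical helper mirroring sessionIdPattern and isSessionIdHash.

    Assumptions: whole-line matching, no separator between captured values,
    and the MD5 digest rendered as a lowercase hex string.
    """
    match = re.fullmatch(session_id_pattern, line)
    if match is None:
        return None
    raw_id = "".join(match.groups(default=""))
    if is_session_id_hash:
        # MD5 over the UTF-8 bytes of the concatenated column values
        return hashlib.md5(raw_id.encode("utf-8")).hexdigest()
    return raw_id


line = "2021-01-01T10:00:00;alice;crm;click"
print(make_session_id(line, r".*;(.*);(.*);.*", False))  # alicecrm
print(make_session_id(line, r".*;(.*);(.*);.*", True))   # 32-character hex digest
```

Hashing keeps the sessionId at a fixed length regardless of how many or how long the selected columns are, at the cost of no longer being human-readable.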
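To show how the patterns and the isIgnoreIfNoStart / isIgnoreIfNoEnd flags could interact, here is a simplified Python sketch of the session splitting. It is not the iGrafx UDF code, and the function name split_sessions is hypothetical. Several rules are assumptions: patterns match whole lines, lines that do not match groupSessionPattern are dropped, a start line closes any session still open in its group, an end line closes the session it belongs to, and a line arriving when no session is open starts a session that has no start line. Session ID generation is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    lines: List[str] = field(default_factory=list)
    has_start: bool = False
    has_end: bool = False


def split_sessions(
    input_lines: List[str],
    ignore_pattern: str,
    group_session_pattern: str,
    start_session_pattern: str,
    end_session_pattern: str,
    is_ignore_if_no_start: bool,
    is_ignore_if_no_end: bool,
) -> List[List[str]]:
    """Hypothetical sketch of session splitting under assumed rules."""
    ignore = re.compile(ignore_pattern)
    grouping = re.compile(group_session_pattern)
    start = re.compile(start_session_pattern)
    end = re.compile(end_session_pattern)

    sessions: List[Session] = []              # all sessions, in creation order
    open_by_group: Dict[str, Session] = {}    # group key -> currently open session

    for line in input_lines:
        if ignore.fullmatch(line):
            continue  # ignored lines are neither used nor returned
        group_match = grouping.fullmatch(line)
        if group_match is None:
            continue  # assumption: lines outside every group are dropped
        key = "".join(group_match.groups(default=""))
        is_start = start.fullmatch(line) is not None

        current = open_by_group.get(key)
        if is_start or current is None:
            # A start line (or a line with no open session) opens a new session;
            # a session still open in this group is left without an end.
            current = Session(has_start=is_start)
            sessions.append(current)
            open_by_group[key] = current
        current.lines.append(line)

        if end.fullmatch(line):
            current.has_end = True
            del open_by_group[key]  # the next line of this group opens a new session

    # Apply isIgnoreIfNoStart / isIgnoreIfNoEnd
    return [
        s.lines
        for s in sessions
        if (s.has_start or not is_ignore_if_no_start)
        and (s.has_end or not is_ignore_if_no_end)
    ]


lines = [
    "10:00;alice;crm;login",
    "10:01;alice;crm;click",
    "10:02;bob;crm;login",
    "10:03;alice;crm;heartbeat",
    "10:04;alice;crm;logout",
    "10:05;bob;crm;click",
    "10:06;alice;crm;click",
]
for session in split_sessions(lines, r".*;heartbeat", r".*;(.*);.*;.*",
                              r".*;login", r".*;logout", True, False):
    print(session)
```

With isIgnoreIfNoStart set to true, the final alice click (which opens a session with no login line) is dropped, while bob's session is kept even though it has no logout, because isIgnoreIfNoEnd is false. The heartbeat line never appears in any session because it matches the ignore pattern.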
For more information about regular expressions, follow this link.