Grouped Tasks Example with the AggregationMain Connector
When defining the Column Mapping (that is, when columnMapping.create equals true),
we can optionally use the columnMapping.groupedTasksColumns property to define
a set of columns used to group together events that share the same values in those columns.
The events are grouped within their case, and grouped tasks aggregations need to be defined for the dimensions and metrics.
The purpose of this feature is to merge several similar events into a single event.
Let's take the following events as an example:
| CaseId | Activity | StartDate | EndDate | Country | City | Price |
|---|---|---|---|---|---|---|
| 1 | A | 10/10/10 08:38 | 11/10/10 08:38 | France | Paris | 10 |
| 1 | B | 10/10/10 09:40 | 11/10/10 09:40 | Germany | Berlin | 20 |
| 1 | A | 10/10/10 10:42 | 11/10/10 10:42 | France | Toulouse | 30 |
| 1 | C | 10/10/10 11:50 | 11/10/10 11:50 | Germany | Munich | 10 |
| 1 | C | 10/10/10 12:50 | 11/10/10 12:50 | Germany | Hamburg | 20 |
| 2 | A | 10/10/10 08:20 | 11/10/10 08:20 | France | Rennes | 5 |
| 2 | B | 10/10/10 09:30 | 11/10/10 09:30 | Germany | Berlin | 10 |
| 2 | A | 10/10/10 10:40 | 11/10/10 10:40 | France | Bordeaux | 25 |
| 2 | A | 10/10/10 11:50 | 11/10/10 11:50 | USA | New York | 10 |
Let's say that the column mapping properties of the connector look like this:
columnMapping.create = "true",
columnMapping.caseIdColumnIndex = "0",
columnMapping.activityColumnIndex = "1",
columnMapping.timeInformationList = "{2;dd/MM/yy HH:mm},{3;dd/MM/yy HH:mm}",
columnMapping.dimensionsInformationList = "[{"columnIndex": 4, "name": "Country", "isCaseScope": false, "groupedTasksAggregation": "FIRST"},{"columnIndex": 5, "name": "City", "isCaseScope": false, "groupedTasksAggregation": "FIRST"}]",
columnMapping.metricsInformationList = "[{"columnIndex": 6, "name": "Price", "unit": "Euros", "isCaseScope": true, "aggregation": "MIN", "groupedTasksAggregation": "AVG"}]",
columnMapping.groupedTasksColumns = "[1, 4]"
Here the columns to use for grouping are the ones matching the indexes 1 and 4 which are respectively the Activity and Country columns.
They are defined through the columnMapping.groupedTasksColumns property.
When columnMapping.groupedTasksColumns is defined, we also need to define the groupedTasksAggregation argument for each dimension/metric.
With this example, here are the grouped tasks aggregations defined for the dimension and metric columns:
- FIRST for the Country dimension column
- FIRST for the City dimension column
- AVG for the Price metric column

For the dimension columns, the valid grouped tasks aggregation values are:
- FIRST
- LAST

For the metric columns, the valid grouped tasks aggregation values are:
- FIRST
- LAST
- MIN
- MAX
- SUM
- AVG
- MEDIAN

Consequently, within a case, all the events that have the same values for the Activity and Country columns will be grouped together,
and the new values for the dimension and metric columns are computed according to their related groupedTasksAggregation.
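The aggregation names listed above can be pictured as plain functions applied to the values of a group. The Python sketch below is only an illustration: the names `DIMENSION_AGGREGATIONS`, `METRIC_AGGREGATIONS` and `check_aggregation` are hypothetical and are not part of the connector, whose actual implementation may differ (for example in rounding).

```python
from statistics import mean, median

# Aggregations accepted for dimension columns (hypothetical mapping).
DIMENSION_AGGREGATIONS = {
    "FIRST": lambda values: values[0],
    "LAST": lambda values: values[-1],
}

# Metric columns accept the two above plus the numeric aggregations.
METRIC_AGGREGATIONS = {
    **DIMENSION_AGGREGATIONS,
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "AVG": mean,
    "MEDIAN": median,
}


def check_aggregation(name, kind):
    """Return the function for `name`, or raise ValueError if it is not
    valid for `kind` ("dimension" or "metric")."""
    allowed = DIMENSION_AGGREGATIONS if kind == "dimension" else METRIC_AGGREGATIONS
    if name not in allowed:
        raise ValueError(f"{name} is not a valid grouped tasks aggregation for a {kind}")
    return allowed[name]
```

For instance, `check_aggregation("AVG", "metric")([10, 30])` averages the two Price values of a group, while `check_aggregation("SUM", "dimension")` raises an error because SUM is only valid for metrics.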
If the timestamp columns are not among the columns to use for grouping (here columns 2 and 3 are not listed in the columnMapping.groupedTasksColumns property),
no aggregation needs to be defined for them, unlike for the dimensions and metrics:
* The lowest timestamp of all the events of a group will be used as the new start timestamp of the new single event.
* The highest timestamp of all the events of a group will be used as the new end timestamp of the new single event.
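As a minimal sketch of this timestamp rule, the Python snippet below takes the lowest and highest timestamps of a group. The function name `merged_time_range` is hypothetical, and `%d/%m/%y %H:%M` is the Python equivalent of the `dd/MM/yy HH:mm` pattern used in the column mapping.

```python
from datetime import datetime

# Python form of the dd/MM/yy HH:mm pattern from timeInformationList.
FORMAT = "%d/%m/%y %H:%M"


def merged_time_range(events):
    """Return (start, end) for a group given as (StartDate, EndDate) string pairs.

    The lowest timestamp of the group becomes the start and the highest
    becomes the end.
    """
    timestamps = [datetime.strptime(t, FORMAT) for pair in events for t in pair]
    return min(timestamps).strftime(FORMAT), max(timestamps).strftime(FORMAT)


# First and third events of CaseId 1 in the example table.
group = [("10/10/10 08:38", "11/10/10 08:38"), ("10/10/10 10:42", "11/10/10 10:42")]
print(merged_time_range(group))
```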
After the creation of the connector, a Mining project that has the column mapping defined above will receive those events and will group some of them in the following way:
For CaseId 1:
- The first and third events of this case have the same values for their Activity (A) and Country (France) columns. Consequently, they are grouped together to only make one event of activity A and of country France.
- The second event is not grouped, as no other event in this case has an Activity named B and a Country named Germany.
- The fourth and fifth events of this case have the same values for their Activity (C) and Country (Germany) columns. Consequently, they are grouped together to only make one event of activity C and of country Germany.

For CaseId 2:
- The first and third events of this case have the same values for their Activity (A) and Country (France) columns. Consequently, they are grouped together to only make one event of activity A and of country France.
- The second event is not grouped, as no other event in this case has an Activity named B and a Country named Germany.
- The fourth event is not grouped: it has the same Activity (A) as the first and third events, but its Country (USA) is different.

Grouping the similar events together gives the following list of events:
| CaseId | Activity | StartDate | EndDate | Country | City | Price |
|---|---|---|---|---|---|---|
| 1 | A | 10/10/10 08:38 | 11/10/10 10:42 | France | Paris | 20 |
| 1 | B | 10/10/10 09:40 | 11/10/10 09:40 | Germany | Berlin | 20 |
| 1 | C | 10/10/10 11:50 | 11/10/10 12:50 | Germany | Munich | 15 |
| 2 | A | 10/10/10 08:20 | 11/10/10 10:40 | France | Rennes | 15 |
| 2 | B | 10/10/10 09:30 | 11/10/10 09:30 | Germany | Berlin | 10 |
| 2 | A | 10/10/10 11:50 | 11/10/10 11:50 | USA | New York | 10 |
For CaseId 1:
- The first event of this case in the new list of events was created by grouping the first and third events of this case in the initial list of events (before grouping).
  - CaseId was 1 for the two events that were grouped, so it stays at 1 for the new single event.
  - Activity was A for the two events that were grouped, so it stays at A for the new single event.
  - StartDate was 10/10/10 08:38 for the first event that was grouped, and 10/10/10 10:42 for the second one. The lowest timestamp (10/10/10 08:38) is used as the start timestamp of the new single event.
  - EndDate was 11/10/10 08:38 for the first event that was grouped, and 11/10/10 10:42 for the second one. The highest timestamp (11/10/10 10:42) is used as the end timestamp of the new single event.
  - Country was France for the two events that were grouped, so it stays at France for the new single event.
  - City was Paris for the first event that was grouped, and Toulouse for the second one. In the column mapping, FIRST was defined as the groupedTasksAggregation for this dimension, consequently, as Paris is the first value to come, it is the one used for the new single event.
  - Price was 10 for the first event that was grouped, and 30 for the second one. In the column mapping, AVG was defined as the groupedTasksAggregation for this metric, consequently, 20 is the value of this metric for the new single event (20 being the result of the average of 10 and 30).
- The second event of this case in the new list of events is identical to the second event of this case in the initial list of events (before grouping), as we couldn't group it with other events.
- The third event of this case in the new list of events was created by grouping the fourth and fifth events of this case in the initial list of events (before grouping).
  - CaseId was 1 for the two events that were grouped, so it stays at 1 for the new single event.
  - Activity was C for the two events that were grouped, so it stays at C for the new single event.
  - StartDate was 10/10/10 11:50 for the first event that was grouped, and 10/10/10 12:50 for the second one. The lowest timestamp (10/10/10 11:50) is used as the start timestamp of the new single event.
  - EndDate was 11/10/10 11:50 for the first event that was grouped, and 11/10/10 12:50 for the second one. The highest timestamp (11/10/10 12:50) is used as the end timestamp of the new single event.
  - Country was Germany for the two events that were grouped, so it stays at Germany for the new single event.
  - City was Munich for the first event that was grouped, and Hamburg for the second one. In the column mapping, FIRST was defined as the groupedTasksAggregation for this dimension, consequently, as Munich is the first value to come, it is the one used for the new single event.
  - Price was 10 for the first event that was grouped, and 20 for the second one. In the column mapping, AVG was defined as the groupedTasksAggregation for this metric, consequently, 15 is the value of this metric for the new single event (15 being the result of the average of 10 and 20).

For CaseId 2:
- The first event of this case in the new list of events was created by grouping the first and third events of this case in the initial list of events (before grouping).
  - CaseId was 2 for the two events that were grouped, so it stays at 2 for the new single event.
  - Activity was A for the two events that were grouped, so it stays at A for the new single event.
  - StartDate was 10/10/10 08:20 for the first event that was grouped, and 10/10/10 10:40 for the second one. The lowest timestamp (10/10/10 08:20) is used as the start timestamp of the new single event.
  - EndDate was 11/10/10 08:20 for the first event that was grouped, and 11/10/10 10:40 for the second one. The highest timestamp (11/10/10 10:40) is used as the end timestamp of the new single event.
  - Country was France for the two events that were grouped, so it stays at France for the new single event.
  - City was Rennes for the first event that was grouped, and Bordeaux for the second one. In the column mapping, FIRST was defined as the groupedTasksAggregation for this dimension, consequently, as Rennes is the first value to come, it is the one used for the new single event.
  - Price was 5 for the first event that was grouped, and 25 for the second one. In the column mapping, AVG was defined as the groupedTasksAggregation for this metric, consequently, 15 is the value of this metric for the new single event (15 being the result of the average of 5 and 25).
- The second event of this case in the new list of events is identical to the second event of this case in the initial list of events (before grouping), as we couldn't group it with other events.
- The third event of this case in the new list of events is identical to the fourth event of this case in the initial list of events (before grouping), as we couldn't group it with other events.

This new list of events is then used as the data in the Mining project.
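The whole example can be reproduced with a short Python sketch. It is not the connector's or the Mining's implementation: `EVENTS` and `group_events` are hypothetical names, the grouping columns (Activity, Country) and the aggregations (FIRST for City, AVG for Price) are hard-coded to match the column mapping above, and the output is assumed to keep the order in which each group first appears.

```python
from datetime import datetime
from statistics import mean

FORMAT = "%d/%m/%y %H:%M"  # Python form of dd/MM/yy HH:mm

# (CaseId, Activity, StartDate, EndDate, Country, City, Price)
EVENTS = [
    ("1", "A", "10/10/10 08:38", "11/10/10 08:38", "France", "Paris", 10),
    ("1", "B", "10/10/10 09:40", "11/10/10 09:40", "Germany", "Berlin", 20),
    ("1", "A", "10/10/10 10:42", "11/10/10 10:42", "France", "Toulouse", 30),
    ("1", "C", "10/10/10 11:50", "11/10/10 11:50", "Germany", "Munich", 10),
    ("1", "C", "10/10/10 12:50", "11/10/10 12:50", "Germany", "Hamburg", 20),
    ("2", "A", "10/10/10 08:20", "11/10/10 08:20", "France", "Rennes", 5),
    ("2", "B", "10/10/10 09:30", "11/10/10 09:30", "Germany", "Berlin", 10),
    ("2", "A", "10/10/10 10:40", "11/10/10 10:40", "France", "Bordeaux", 25),
    ("2", "A", "10/10/10 11:50", "11/10/10 11:50", "USA", "New York", 10),
]


def group_events(events):
    """Group events within each case on (Activity, Country)."""
    groups = {}  # dicts keep insertion order, so groups stay in first-seen order
    for event in events:
        case_id, activity, _, _, country, _, _ = event
        groups.setdefault((case_id, activity, country), []).append(event)

    result = []
    for (case_id, activity, country), members in groups.items():
        # Lowest and highest timestamps of the group become start and end.
        times = [datetime.strptime(t, FORMAT) for m in members for t in (m[2], m[3])]
        result.append((
            case_id,
            activity,
            min(times).strftime(FORMAT),
            max(times).strftime(FORMAT),
            country,
            members[0][5],                   # FIRST on the City dimension
            mean(m[6] for m in members),     # AVG on the Price metric
        ))
    return result


for row in group_events(EVENTS):
    print(row)
```

Each printed row corresponds to one line of the grouped table, so the sketch can be used to check a column mapping by hand on a small sample before creating the connector.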
As a side note, if for the same initial list of events we don't want to group any events together, the column mapping should be:
columnMapping.create = "true",
columnMapping.caseIdColumnIndex = "0",
columnMapping.activityColumnIndex = "1",
columnMapping.timeInformationList = "{2;dd/MM/yy HH:mm},{3;dd/MM/yy HH:mm}",
columnMapping.dimensionsInformationList = "[{"columnIndex": 4, "name": "Country", "isCaseScope": false},{"columnIndex": 5, "name": "City", "isCaseScope": false}]",
columnMapping.metricsInformationList = "[{"columnIndex": 6, "name": "Price", "unit": "Euros", "isCaseScope": true, "aggregation": "MIN"}]"
Remarks regarding the Column Mapping for grouped tasks
- An error will appear at the creation of the connector if the columnMapping.groupedTasksColumns property is defined but doesn't contain at least one column index of a time or dimension or metric column.
- An error will appear at the creation of the connector if the columnMapping.groupedTasksColumns property is defined but not the groupedTasksAggregation argument of all the dimensions and/or metrics.
- An error will appear at the creation of the connector if the columnMapping.groupedTasksColumns property is not defined but at least one dimension/metric defined its groupedTasksAggregation argument.
- If the columnMapping.groupedTasksColumns property is defined without the column index of the activity column, the connector will automatically add it to the set of grouped tasks columns indexes that is sent to the Mining.
- If the columnMapping.groupedTasksColumns property is defined with the column index of the caseId column, the connector will automatically remove it from the set of grouped tasks columns indexes that is sent to the Mining.
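
These remarks can be summarised as a small validation routine. The Python sketch below is an assumption-laden illustration, not the connector's code: the function `normalize_grouped_tasks_columns`, its parameters, the error messages and the order of the checks are all hypothetical.

```python
def normalize_grouped_tasks_columns(grouped, case_id_index, activity_index,
                                    time_indexes, dimensions, metrics):
    """Validate and normalize a groupedTasksColumns-like list of indexes.

    `dimensions` and `metrics` are lists of dicts with a "columnIndex" key and
    an optional "groupedTasksAggregation" key. `grouped` is None when
    columnMapping.groupedTasksColumns is not defined.
    """
    columns = dimensions + metrics
    with_aggregation = [c for c in columns if "groupedTasksAggregation" in c]

    if grouped is None:
        # Aggregations without grouping columns are rejected.
        if with_aggregation:
            raise ValueError("groupedTasksAggregation set without groupedTasksColumns")
        return None

    # At least one time, dimension or metric index is required.
    allowed = set(time_indexes) | {c["columnIndex"] for c in columns}
    if not allowed & set(grouped):
        raise ValueError("groupedTasksColumns needs a time, dimension or metric index")

    # Every dimension and metric must define its aggregation.
    if len(with_aggregation) != len(columns):
        raise ValueError("every dimension and metric needs a groupedTasksAggregation")

    # The activity index is always added and the caseId index always removed.
    return sorted((set(grouped) | {activity_index}) - {case_id_index})
```

With the column mapping of the example (caseId at index 0, activity at index 1, time columns 2 and 3), passing `[0, 4]` as the grouped columns would yield `[1, 4]` under these assumptions: the caseId index is dropped and the activity index is added.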