Statistics

Group by

Description

Group by is a step in the Statistics Plugin for Process Studio Workflows. This step groups rows over a specified field or a group of fields. The Group by step requires sorted input. If the input is not sorted on the grouping fields, only consecutive rows with the same value in the grouping fields are grouped together, so the same group can appear more than once in the output.

Common use cases include calculating the total sales per region or counting the students with 75% marks.
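The effect of unsorted input can be illustrated outside Process Studio. The following is a minimal Python sketch, not the plugin's implementation: `itertools.groupby` also groups only consecutive rows, so it behaves analogously. The function name `group_sum` and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_sum(rows, key_field, value_field):
    """Sum value_field over each run of consecutive rows sharing key_field."""
    return [
        (key, sum(row[value_field] for row in run))
        for key, run in groupby(rows, key=itemgetter(key_field))
    ]


# Hypothetical sample data: total sales per region.
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 50},
    {"region": "East", "amount": 25},
]

# Unsorted input: "East" is split into two separate groups.
unsorted_result = group_sum(sales, "region", "amount")

# Sorting on the group field first yields one row per region.
sorted_result = group_sum(sorted(sales, key=itemgetter("region")), "region", "amount")

print(unsorted_result)  # [('East', 100), ('West', 50), ('East', 25)]
print(sorted_result)    # [('East', 125), ('West', 50)]
```

In a workflow, the equivalent of the `sorted(...)` call is to sort the rows on the group fields before they reach the Group by step.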

Configurations

No.

Field Name

Description

1

Step name

Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow.

2

Include all rows?

Enable this option to include all rows in the output, not just the aggregated rows. To differentiate between the two types of rows, a flag field is required in the output; you must specify the name of this flag field (its type is Boolean).

3

Temporary files directory

Specify the directory in which the temporary files are stored. Temporary files are needed when the Include all rows option is enabled and the number of grouped rows exceeds 5000. The default is the standard temporary directory for the system.

4

TMP-file prefix

Specify the file prefix used when naming temporary files

5

Add line number, restart in each group

Enable this checkbox to add a line number that restarts at 1 in each group

6

Line number field name

Specify the name of the field that contains the line number. This applies when Add line number, restart in each group is enabled.

7

Always give back a row

If you enable this option, the Group By step will always give back a result row, even if there is no input row.  
This can be useful if you want to count the number of rows.  Without this option you would never get a count of zero (0). 

8

Group fields table

Click Get Fields to add all fields from the input stream(s).

  1. Group field: Specify the fields over which you want to group.

9

Aggregates table

Specify the fields that must be aggregated, the method and the name of the resulting new field. 

  1. Name: Specify the name of the new field on the stream.
  2. Subject: Specify the field that you want to aggregate.
  3. Type: Select the aggregation method. The available types are:
     1. Sum
     2. Average (Mean)
     3. Median
     4. Percentile
     5. Minimum
     6. Maximum
     7. Number of values (N)
     8. Concatenate strings separated by , (comma)
     9. First non-null value
     10. Last non-null value
     11. First value (including null)
     12. Last value (including null)
     13. Cumulative sum (all rows option only!)
     14. Cumulative average (all rows option only!)
     15. Standard deviation
     16. Concatenate strings separated by <Value>: specify the separator in the Value column
     17. Number of distinct values
     18. Number of rows (without field argument)
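To make some of the aggregation types and the per-group line number concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the step's actual code: the output field names, the functions `aggregate` and `with_line_numbers`, and the sample data are hypothetical. How the real step treats null values is not described here; the sketch simply skips them.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean, median


def _or_none(fn, values):
    """Apply fn to values, or return None when there is nothing to aggregate."""
    return fn(values) if values else None


def aggregate(rows, group_field, subject):
    """Return one summary row per group; rows must be sorted by group_field."""
    result = []
    for key, run in groupby(rows, key=itemgetter(group_field)):
        values = [row[subject] for row in run]
        non_null = [v for v in values if v is not None]  # assumption: nulls skipped
        result.append({
            group_field: key,
            "sum": sum(non_null),
            "average": _or_none(mean, non_null),
            "median": _or_none(median, non_null),
            "minimum": _or_none(min, non_null),
            "maximum": _or_none(max, non_null),
            "n_values": len(non_null),
            "n_distinct": len(set(non_null)),
            "n_rows": len(values),
            "first_non_null": non_null[0] if non_null else None,
            "last_value": values[-1],
            "concatenated": ",".join(str(v) for v in non_null),
        })
    return result


def with_line_numbers(rows, group_field, subject):
    """Pass every row through with a per-group line number and cumulative sum."""
    out = []
    for _, run in groupby(rows, key=itemgetter(group_field)):
        running = 0
        for line_nr, row in enumerate(run, start=1):  # restarts at 1 per group
            running += row[subject] or 0  # assumption: null counts as 0
            out.append({**row, "line_nr": line_nr, "cumulative_sum": running})
    return out


# Hypothetical sample data, already sorted on the group field.
rows = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": None},
    {"region": "East", "amount": 25},
    {"region": "West", "amount": 50},
    {"region": "West", "amount": 50},
]

summary = aggregate(rows, "region", "amount")
detail = with_line_numbers(rows, "region", "amount")

print(summary[0]["sum"], summary[0]["n_values"], summary[0]["n_rows"])  # 125 2 3
print([(r["line_nr"], r["cumulative_sum"]) for r in detail])
# [(1, 100), (2, 100), (3, 125), (1, 50), (2, 100)]
```

The second function mirrors the combination of Include all rows, a cumulative sum aggregate, and Add line number, restart in each group: every input row is kept and the counters reset whenever the group changes.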

Memory Group by

Description

Memory Group by is a step in the Statistics Plugin for Process Studio Workflows. This step groups rows based on specified fields and builds aggregates in the same way as the Group by step. However, it does not require sorted input because it processes all rows in memory. When the number of rows is too large to fit into memory, use the combination of the Sort rows and Group by steps instead.
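The difference from the Group by step can be sketched in a few lines of Python. This is only an illustration of in-memory grouping, not the plugin's code; the function name `memory_group_sum` and the sample data are hypothetical. Because every group's running total is held in a dictionary, input order does not matter, but memory use grows with the number of distinct groups.

```python
from collections import defaultdict


def memory_group_sum(rows, group_fields, subject):
    """Sum `subject` per distinct combination of group_fields, in any input order."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in group_fields)
        totals[key] += row[subject]
    return dict(totals)


# Hypothetical, deliberately unsorted sample data.
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 50},
    {"region": "East", "amount": 25},
]

totals = memory_group_sum(sales, ["region"], "amount")
print(totals)  # {('East',): 125, ('West',): 50}
```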

Configurations

No.

Field Name

Description

1

Step name

Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow.

2

Always give back a result row

If you enable this option, the Memory Group by step always gives back a result row, even if there is no input row.
This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0).

3

The fields that make up the group

Click Get Fields to add all fields from the input stream(s).

  1. Group field: Specify the fields over which you want to group.

4

Aggregates

Specify the fields that must be aggregated, the method and the name of the resulting new field. 

  1. Name: Specify the name of the new field on the stream.
  2. Subject: Specify the field that you want to aggregate.
  3. Type: Select the aggregation method. The available types are:
     1. Sum
     2. Average (Mean)
     3. Median
     4. Percentile
     5. Minimum
     6. Maximum
     7. Number of values (N)
     8. Concatenate strings separated by , (comma)
     9. First non-null value
     10. Last non-null value
     11. First value (including null)
     12. Last value (including null)
     13. Cumulative sum (all rows option only!)
     14. Cumulative average (all rows option only!)
     15. Standard deviation
     16. Concatenate strings separated by <Value>: specify the separator in the Value column
     17. Number of distinct values
     18. Number of rows (without field argument)
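The purpose of the Always give back a result row option is easiest to see with a row count over empty input. The sketch below is a simplified Python model and not the step's implementation: it assumes no group fields and a single Number of rows aggregate, and the names `count_rows` and `row_count` are hypothetical.

```python
def count_rows(rows, always_give_back_row=False):
    """Count input rows with no group fields; return a list of result rows."""
    count = sum(1 for _ in rows)
    if count == 0 and not always_give_back_row:
        return []  # no input row, so no output row at all
    return [{"row_count": count}]


without_option = count_rows([])
with_option = count_rows([], always_give_back_row=True)

print(without_option)  # []
print(with_option)     # [{'row_count': 0}]
```

With the option disabled, downstream steps receive nothing when the input is empty; with it enabled, they receive one row carrying a count of zero.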

Output steps metrics

Description

‘Output steps metrics’ is a step in the Statistics Plugin for Process Studio Workflows. This step returns metrics of one or several steps in a workflow.

Configurations

General Tab: 

No.

Field Name

Description

1

Step name

Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow.

2

Copy Nr

Specify the copy number of the step. Copy numbers start at 0, so a step that runs with the default of one copy has a CopyNr of 0. Leave this at the default value of 0.

3

Required

Specify whether the step is required. Select Y or N.


Fields Tab: 

No.

Field Name

Description

1

Step name

Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow.

2

Step id

Specify the step instance ID

3

Lines input

Specify the fieldname to store the number of rows input

4

Lines output

Specify the fieldname to store the number of rows output

5

Lines read

Specify the fieldname to store the number of rows read

6

Lines updated

Specify the fieldname to store the number of rows updated

7

Lines written

Specify the fieldname to store the number of rows written

8

Lines rejected

Specify the fieldname to store the number of rows rejected

9

Duration(ms)

Specify the fieldname to store the execution time(ms)


Sample rows

Description

Sample rows is a step in the Statistics Plugin for Process Studio Workflows. This step samples rows based on their row numbers. You can specify one or more individual row numbers or ranges.

Configurations

No.

Field Name

Description

1

Step name

Name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow.

2

The lines range

Specify the range or ranges of row numbers. Separate ranges or individual row numbers with commas. A range is written with two dots between the row numbers, for example: 5..10

3

Line nr fieldname

Specify the name of the output field that will contain the line number.
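The lines range syntax can be illustrated with a short Python sketch. It is not the step's parser, and it rests on two assumptions that are not stated in this page: row numbers start at 1 and both ends of a range are inclusive. The function names and the default field name `line_nr` are hypothetical.

```python
def parse_line_ranges(spec):
    """Turn a spec such as "1,5..10" into a sorted list of (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ".." in part:
            start, end = part.split("..", 1)
            ranges.append((int(start), int(end)))
        else:
            ranges.append((int(part), int(part)))  # single row number
    return sorted(ranges)


def sample_rows(rows, spec, line_nr_field="line_nr"):
    """Yield only the rows whose 1-based line number falls in one of the ranges."""
    ranges = parse_line_ranges(spec)
    for line_nr, row in enumerate(rows, start=1):  # assumption: numbering starts at 1
        if any(start <= line_nr <= end for start, end in ranges):
            yield {**row, line_nr_field: line_nr}


# Hypothetical input: twelve rows with values "a" to "l".
rows = [{"value": chr(ord("a") + i)} for i in range(12)]
picked = list(sample_rows(rows, "1,5..7"))

print([r["line_nr"] for r in picked])  # [1, 5, 6, 7]
print([r["value"] for r in picked])    # ['a', 'e', 'f', 'g']
```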







