Machine Learning

This section covers the following four machine learning steps:


  1. Classification Model Builder
  2. Prediction
  3. Build Model For Intent Classification And Entity Extraction
  4. Intent Classification And Entity Extraction

The first two Machine Learning plugin steps, 'Classification Model Builder' and 'Prediction', let you build a classification model and then use that model for prediction. These steps solve classification problems, where the value to be predicted takes one of a set of discrete values, as opposed to continuous values (when the value being predicted is a continuous variable, the problem is called a regression problem). Below are a few examples where these steps can be used:
  1. Predict support group based on issue description
  2. Predict customer churn based on past customer data 
  3. Predict occupational class of the person being insured based on various attributes of the person
  4. Predict customers from your customer list which are likely to show interest in your new promotions

The last two Machine Learning plugin steps, 'Build Model for Intent Classification and Entity Extraction' and 'Intent Classification and Entity Extraction', let you build a model for intent classification and entity extraction and then use that model to classify intents and extract entities. Identifying intents and entities has a wide variety of industrial use cases wherever there is a need to understand the intention behind user utterances and automate processes accordingly.

Prerequisites:

  1. Get the Python setup zip file (Python36.zip) from AutomationEdge.
  2. Extract Python36.
  3. Add the following file paths to the PATH environment variable: <path_till_python_directory>\Python36;<path_till_python_directory>\Python36\Scripts;
  4. Create a symbolic link for the spaCy en_core_web_sm model:
     a. Traverse to ..\Python36\lib\site-packages\spacy\data
     b. Delete the directory named en.
     c. On the command line, execute the following command to create the symbolic link: python -m spacy link en_core_web_sm en --force
  5. For older-generation Pentium machines, hardware-specific Tensorflow libraries are required.
  6. Install the Microsoft Visual Studio C++ redistributable specific to your Windows OS version.
  7. The steps 'Intent Entity Model Builder' and 'Intent Entity Prediction' use Tensorflow libraries, which require AVX instruction set extension support on the processing machine's processor. Refer to your processor manual to check for AVX support (e.g., for Intel processors, details are available at https://ark.intel.com/content/www/us/en/ark.html#@Processors).

Classification Model Builder

Description

This step lets you build a classification model from training data. One column or attribute of your data set can typically be considered one feature. Features should ideally be independent; they are also referred to as dimensions. The value you want to predict is called the label. This step can build a model when the features are of Number type, String type, or a mix of both.
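The algorithm names this step supports match scikit-learn's classifiers; as a minimal sketch of the features-and-label idea under that assumption (hypothetical data, not the plugin's internal code):

    from sklearn.ensemble import RandomForestClassifier

    # Each row is one training example; each column is one feature (dimension).
    X = [[34, 52000],  # e.g. age, annual income
         [45, 81000],
         [23, 31000],
         [51, 99000]]
    y = ["stays", "churns", "stays", "churns"]  # the label to predict

    model = RandomForestClassifier(random_state=0).fit(X, y)
    print(model.predict([[40, 60000]]))  # predicted label for a new, unseen row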

Configurations

Configuration Tab

Row Handling

  1. Step name: Used to specify the name of the step. The step name should be unique within the workflow.
  2. Number of Rows to Process: Can be either All or Batch. Governs whether all the rows of the dataset are passed in one shot or in batches. Typically, if you are building a model on a very large dataset, use Batch row processing.
  3. Size: Meaningful only when Batch is selected for 'Number of Rows to Process'. If your dataset has 50,000 rows, 1,000 can be a good batch size candidate.

Data Model Location

  4. File name: Used to specify the name and location of the file which will contain the model.

Algorithm

  5. Algorithm: Used to specify the algorithm to be used for building the model. The step supports the following algorithms:
     1. Linear SVC
     2. SVC
     3. Decision Tree Classifier
     4. Random Forest Classifier
     5. Logistic Regression
     6. Multinomial NB
     7. SGD Classifier
     8. K Neighbors Classifier
  6. Algorithm Parameters*: Based on the algorithm selected, the corresponding algorithm parameters are shown. These are described in the algorithm list at the end of this plugin description.



Fields Tab

Fields

  1. Name: Name of the field.
  2. Incoming Type: Used to specify the data type of the field. It can be either Number or String.
  3. Text Processing: All the classification algorithms work on vectors of numbers, so fields of type String need to be converted internally to numeric vectors; this cell lets you specify all the text processing attributes for that field. The cell can be clicked only for fields with the String data type. The dialog that opens has two tabs:
     1. The first tab lets you specify one or more text processing options (see the sketch after this list):
        1. Remove Punctuation: removes standard punctuation marks from the text.
        2. Remove Stop Words: removes stop words like 'the', 'as', 'in', etc.
        3. Additional Stop Words: lets you choose a plain text file listing one additional stop word per line. These are your domain-specific stop words.
        4. Lemmatization: converts words like mice to mouse, houses to house, etc.
        5. Stemming: reduces a word to its stem no matter what word form is used in the text, so going, went, goes, etc. would be converted to go.
     2. The second tab lets you test your text processing options. Type any text in the text box next to 'Value:'; clicking the 'Test' button shows, in the text box next to 'Result:', the result of applying your selected text processing options.
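The plugin's internal implementation is not documented here; as a minimal sketch of what these options do, using the NLTK library purely for illustration:

    import string

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("stopwords")  # one-time setup
    nltk.download("wordnet")

    text = "The mice were going into the houses!"
    text = text.translate(str.maketrans("", "", string.punctuation))  # Remove Punctuation
    tokens = [t for t in text.lower().split()
              if t not in stopwords.words("english")]                 # Remove Stop Words
    print([WordNetLemmatizer().lemmatize(t) for t in tokens])  # Lemmatization: ['mouse', 'going', 'house']
    print([PorterStemmer().stem(t) for t in tokens])           # Stemming: ['mice', 'go', 'hous']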

When you are processing a feature of type String, as mentioned in the 'Text Processing' row of the table above, the feature needs to be converted into numeric features. The Text Vectorization tab governs how all String features get converted into numeric features. An n-gram is a contiguous sequence of n items from a given sample of text or speech. The table below shows how a string gets tokenized internally for different n-gram ranges.

  1. String: "Weather today is good"; N Gram Start/End: 1-1; Tokens: 'Weather', 'today', 'good'
  2. String: "Weather today is good"; N Gram Start/End: 1-2; Tokens: 'Weather', 'today', 'good', 'Weather today', 'today good'
  3. String: "Weather today is good"; N Gram Start/End: 1-3; Tokens: 'Weather', 'today', 'good', 'Weather today', 'today good', 'Weather today good'
  4. String: "Weather today is good"; N Gram Start/End: 2-3; Tokens: 'Weather today', 'today good', 'Weather today good'

* The word 'is' is treated as a stop word and is not considered.
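The same tokenization can be reproduced with scikit-learn's CountVectorizer, whose ngram_range parameter plays the role of N Gram Start/End (an illustration of the concept, not the plugin's code; note it lowercases by default):

    from sklearn.feature_extraction.text import CountVectorizer

    # ngram_range=(1, 2) corresponds to N Gram Start=1, End=2 in the table above.
    vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
    vec.fit(["Weather today is good"])
    print(vec.get_feature_names_out())
    # ['good' 'today' 'today good' 'weather' 'weather today']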


Text Vectorization Tab

  1. N Gram Start: Should be a numeric value with a minimum of 1.
  2. N Gram End: Should be a numeric value greater than or equal to N Gram Start.
  3. Vectorization: The n-gram operation tokenizes the input String feature; vectorization is the operation where these tokens are converted to the numeric features needed by the algorithms. Three types of vectorizers are supported (see the sketch after this list):
     1. Count Vectorizer: counts the number of times a token shows up in the document and uses this value as its weight.
     2. Tfidf Vectorizer: TF-IDF stands for "term frequency-inverse document frequency", meaning the weight assigned to each token depends not only on its frequency in a document but also on how recurrent that term is in the entire corpus.
     3. Hashing Vectorizer: designed to be as memory-efficient as possible. Instead of storing the tokens as strings, the vectorizer applies the hashing trick to encode them as numerical indexes. The downside of this method is that once vectorized, the features' names can no longer be retrieved.
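scikit-learn ships vectorizers under these exact names; a minimal sketch contrasting the three (illustration only):

    from sklearn.feature_extraction.text import (CountVectorizer,
                                                 HashingVectorizer,
                                                 TfidfVectorizer)

    docs = ["weather today is good", "weather today is bad"]

    counts = CountVectorizer().fit_transform(docs)  # raw token counts as weights
    tfidf = TfidfVectorizer().fit_transform(docs)   # counts reweighted by corpus rarity
    hashed = HashingVectorizer(n_features=16).fit_transform(docs)  # tokens hashed to indexes

    print(counts.toarray())  # integers; shared and distinctive words weigh the same
    print(tfidf.toarray())   # floats; 'good'/'bad' outweigh words shared by both docs
    print(hashed.shape)      # (2, 16); memory-efficient, but feature names are lost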


Evaluation Tab

  1. Evaluation Type: Choose an evaluation algorithm type from the drop-down list (see the sketch after this list):
     1. None: choose None if evaluation is not needed.
     2. Train/Test Split: splits the data into train and test sets as per the parameters specified below. The training set contains a known output, and the model learns on this data in order to generalize to other data later on. The test dataset (or subset) is held back in order to test the model's predictions.
     3. Stratified k-Fold Cross-Validation: splits the data into k different subsets (or folds). In each round, k-1 subsets are used to train the model and the remaining fold serves as test data; the scores from the k rounds are then averaged to evaluate the model.
  2. Test Percentage: For Train/Test Split. Data types allowed: float, int or None, optional (default=None).
     1. If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split.
     2. If int, represents the absolute number of test samples.
     3. If None, it will be set to 0.25.
  3. Number of Folds: For Stratified k-Fold Cross-Validation. Data type allowed: int (default=3). Must be at least 2.
  4. Random State: For Train/Test Split. Data types allowed: int, RandomState instance or None, optional (default=None).
     1. If int, random_state is the seed used by the random number generator.
     2. If RandomState instance, random_state is the random number generator.
     3. If None, the random number generator is the RandomState instance used by np.random.
  5. Shuffle: For Stratified k-Fold Cross-Validation. Data type allowed: boolean, optional (default=True). Whether to shuffle each class's samples before splitting into batches.
  6. Evaluation Output File Name: Absolute file path for the HTML evaluation report, generated for both Train/Test Split and Stratified k-Fold Cross-Validation.
  7. Add output filename to result: Enable this checkbox to display a downloadable link to the HTML report output file on the AE portal.
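These evaluation parameters mirror scikit-learn's train_test_split and StratifiedKFold. A minimal sketch of both evaluation styles, using hypothetical toy data (illustration only, not the plugin's internal code):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                         train_test_split)

    X, y = make_classification(n_samples=200, random_state=42)  # toy data
    model = LogisticRegression(max_iter=1000)

    # Train/Test Split: hold out 25% of the rows for testing (test_size=0.25).
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
    print(model.fit(X_tr, y_tr).score(X_te, y_te))  # accuracy on the held-out set

    # Stratified k-Fold: 3 folds, shuffled; each fold is used once as test data.
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    print(cross_val_score(model, X, y, cv=skf).mean())  # average accuracy over folds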


*The following entries list the algorithms, each with a description and a description of its corresponding parameters.

1. Linear SVC

If the data is linearly separable in any dimension(s) of the feature space, you should choose Linear SVM or Logistic Regression. Even though you might achieve similar results with more complex algorithms, they are not recommended, for two reasons: 1) complexity often leads to more computation time, and 2) overfitting. Linear SVM is an extremely fast machine learning (data mining) algorithm for solving multiclass classification problems on ultra-large datasets.

Parameters:

  loss: Specifies the loss function. 'hinge' is the standard SVM loss (used e.g. by the SVC class), while 'squared_hinge' is the square of the hinge loss. (In machine learning, a loss function measures the quality of your solution, while a penalty function imposes constraints on the solution for regularization.)

  C: The penalty parameter of the error term. It maximizes the kernel margin while keeping the misclassification error minimal. C is 1 by default, a reasonable default choice that works well for the majority of common datasets. If you have a lot of noisy observations in the dataset, decrease it: the lower the C value, the better the results on noisy data, and exactly the opposite on clean data.

  max_iter (int, default=1000): The maximum number of iterations to be run for convergence.

2. SVC

The objective of an SVC (Support Vector Classifier) is to fit the training data provided, returning a "best fit" hyperplane that divides, or categorizes, your training data. After obtaining the hyperplane, you can feed features to the classifier to see what the "predicted" class is.

SVC use cases:

  1. Predicting whether a student passes or fails based on previous exam scores.
  2. Used mainly with numerical data and when there is a need for high classification accuracy without compromising efficiency.

Parameters:

  kernel (string, optional (default='rbf')): Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable. If none is given, 'rbf' is used. If a callable is given, it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples). Currently, the plugin supports 'linear', 'poly' and 'rbf':
     1. The linear kernel works well only when the data is linearly separable (in some dimension of the feature space). The learned hyperplane can be used for prediction.
     2. The RBF kernel is the most widely used kernel for non-linear datasets and does a decent job on most of them.
     3. The poly kernel is suitable if the data is separable by higher-order functions. Its practical benefits are limited, so it is not the most commonly used kernel.

  C: The penalty parameter. It maximizes the margin while keeping the misclassification error minimal. C is 1 by default, a reasonable default choice that works well for the majority of common datasets. If you have a lot of noisy observations in the dataset, decrease it: the lower the C value, the better the results on noisy data, and exactly the opposite on clean data.

  probability (boolean, optional (default=False)): Choose True or False from the drop-down list. Controls whether to enable probability estimates. This must be enabled prior to calling fit (fitting the SVM model to the given training data).

3. Decision Tree Classifier

One of the predictive modeling approaches used in machine learning. Decision tree learning uses a decision tree to go from observations about an item to conclusions about the item's target value.

Decision Tree Classifier use cases:

  1. Decision Tree Classifier and Random Forest Classifier are predominantly used in recommendation systems.

Parameters:

  max_depth (int or None, optional (default=None)): The maximum depth of the tree. If None, nodes are expanded until all leaves are pure.

4. Random Forest Classifier

Random Forest Classifier is an ensemble algorithm. Ensemble algorithms combine more than one algorithm of the same or different kind for classifying objects.

Random Forest is a flexible, easy-to-use machine learning algorithm that produces a great result most of the time, even without hyper-parameter tuning. It is also one of the most used algorithms because of its simplicity and the fact that it can be used for both classification and regression tasks.

RFC mainly overcomes some of the limitations of Decision Tree Classifiers:

  1. Overfitting caused by a single tree making one decision for the entire data and feature set.
  2. Computational efficiency (not in all cases).
  3. Improper decision rules (in some cases).

Random Forest Classifier use cases:

  1. Decision Tree Classifier and Random Forest Classifier are predominantly used in recommendation systems.
  2. Predicting the risk (high/low/medium) of a loan application.
  3. Predicting social media share scores, etc.

Parameters:

  max_depth (int or None, optional (default=None)): The maximum depth of each tree in the Random Forest. If None, nodes are expanded until all leaves are pure.

5. Logistic Regression

A classification model that uses a sigmoid function to convert a linear model's raw prediction into a value between 0 and 1. You can interpret this value in either of the following two ways:

  1. As the probability that the example belongs to the positive class in a binary classification problem.
  2. As a value to be compared against a classification threshold. If the value is equal to or above the classification threshold, the system classifies the example as the positive class; conversely, if the value is below the given threshold, the system classifies the example as the negative class.

Logistic Regression use cases:

  1. Classifying words as nouns, pronouns, and verbs.
  2. Weather forecasting applications for predicting rainfall and weather conditions.

Parameters:

  C: The penalty parameter. It maximizes the margin while keeping the misclassification error minimal. C is 1 by default, a reasonable default choice that works well for the majority of common datasets. If you have a lot of noisy observations in the dataset, decrease it: the lower the C value, the better the results on noisy data, and exactly the opposite on clean data.

  max_iter (int, default=1000): The maximum number of iterations to be run for convergence.

6. Multinomial NB

Naive Bayes: a simple probabilistic classifier based on Bayes' theorem with strong, naive independence assumptions.

Multinomial NB: a variant of Naive Bayes mainly used for text classification. This variant estimates the conditional probability of a particular word/term/token given a class as the relative frequency of term t in documents belonging to class c. The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts; however, in practice, fractional counts such as TF-IDF (term frequency-inverse document frequency) may also work.

Multinomial NB use cases:

  1. Illness forecasting.
  2. Grouping information (blog posts, etc.).

Parameters:

  alpha (float, optional (default=1.0)): Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

7. SGD Classifier

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function f that minimize a cost function. It is best used when the parameters cannot be calculated analytically (e.g., using linear algebra) and must be searched for by an optimization algorithm. When you have large amounts of data, you can use a variation of gradient descent called stochastic gradient descent.

Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to discriminative learning of linear classifiers under convex loss functions, such as (linear) Support Vector Machines and Logistic Regression.

SGD Classifier use cases:

SGD has been successfully applied to the large-scale and sparse machine learning problems often encountered in text classification and natural language processing.

Parameters:

  max_iter (int, default=1000): The maximum number of iterations to be run for convergence.

  penalty (string, 'l1' or 'l2' (default='l2')): Specifies the norm used in the penalization. The 'l2' penalty is the standard used in SVC; 'l1' leads to coef_ vectors that are sparse. (In machine learning, a loss function measures the quality of your solution, while a penalty function imposes constraints on the solution for regularization.)

  loss: Specifies the loss function. Options are hinge, log, modified_huber, squared_hinge, and perceptron.

8. K Neighbors Classifier

KNN is not really a training algorithm: k-nearest neighbors simply stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression; in both cases, the input consists of the k closest training examples in the feature space.

K Neighbors Classifier use cases:

  1. Retail analytics (finding a similar product which a customer is likely to buy or put in the basket).

Parameters:

  n_neighbors: Defines the number of nearest neighbors to be considered for prediction based on distance.
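The algorithm and parameter names above match scikit-learn's estimators. As an illustration only (not the plugin's internal code), the documented parameters map to constructions like these, shown with their default values:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC, LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    # One instance per supported algorithm, using the parameters described above.
    classifiers = {
        "Linear SVC": LinearSVC(loss="squared_hinge", C=1.0, max_iter=1000),
        "SVC": SVC(kernel="rbf", C=1.0, probability=False),
        "Decision Tree Classifier": DecisionTreeClassifier(max_depth=None),
        "Random Forest Classifier": RandomForestClassifier(max_depth=None),
        "Logistic Regression": LogisticRegression(C=1.0, max_iter=1000),
        "Multinomial NB": MultinomialNB(alpha=1.0),
        "SGD Classifier": SGDClassifier(loss="hinge", penalty="l2", max_iter=1000),
        "K Neighbors Classifier": KNeighborsClassifier(n_neighbors=5),
    }
    # Every classifier exposes the same interface:
    #   classifiers["Linear SVC"].fit(X, y) and .predict(X_new)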





Glossary:

  1. Loss: A measure of how far a model's predictions are from its label or, to phrase it more pessimistically, a measure of how bad the model is. To determine this value, a model must define a loss function. For example, linear regression models typically use mean squared error as the loss function, while logistic regression models use log loss (see the formulas below).
  2. Penalty: A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights.
  3. Kernel: A kernel support vector machine (KSVM) is a classification algorithm that seeks to maximize the margin between positive and negative classes by mapping input data vectors to a higher-dimensional space. For example, consider a classification problem in which the input dataset has a hundred features; to maximize the margin between positive and negative classes, a KSVM could internally map those features into a million-dimension space. KSVMs use a loss function called hinge loss.
  4. Convergence: The convergence of a model's predictions to its labels; a model has converged when further training changes its predictions very little.
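For reference, the two loss functions named in the Loss entry have these standard definitions (general machine learning formulas, not specific to this plugin), where $y_i$ is the true label and $\hat{y}_i$ the prediction for example $i$ over $N$ examples:

    \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2

    \mathrm{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]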


Prediction

Description

The Prediction step lets you predict the label based on the model built in the 'Classification Model Builder' step.


Configurations

Model Tab

  1. Model File: Used to specify the path of the model file built with the 'Classification Model Builder' step.
  2. Load Model: Used to load the model and show all its relevant information, such as the algorithm, vectorization algorithm, n-gram range, and model parameters. All these values are read-only and show the values you selected in the 'Classification Model Builder' step.


Field Mapping Tab

  1. Feature: Feature name used during the model building step.
  2. Type: Type of the feature; it can be either String or Number.
  3. Field: Field name you want to map to the corresponding feature. It is important to map the right field to each feature.
  4. Text Preprocessing: If the type is String, the preprocessing options used to process the string. This is explained in detail in the 'Classification Model Builder' step.
  5. Target Field: Used to specify the field name where the value of the predicted label will be put.
  6. Prediction Confidence: Used to indicate whether you also want the prediction confidence. This field is clickable only when the algorithm used for model building supports prediction confidence.
  7. Prediction Confidence for all classes: Used to indicate whether you would also like the prediction confidence for all the classes. Say the possible prediction values are 'A', 'B' and 'C'; clicking this field will give you the prediction confidence for each of these labels/classes. This field is clickable only when the algorithm used for model building supports prediction confidence. (See the sketch after this list.)
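The model file format is not documented here; assuming it is a pickled scikit-learn estimator (an assumption for illustration only), prediction with confidence would look like:

    import pickle

    # Hypothetical model file name; assumes a pickled scikit-learn estimator.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    row = [[40, 60000]]                  # hypothetical feature values for one record
    print(model.predict(row))            # predicted label -> Target Field
    if hasattr(model, "predict_proba"):  # only some algorithms support confidence
        print(model.predict_proba(row))  # confidence for all classes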



Intent Entity Model Builder


Introduction:

Identifying intents and entities has a wide variety of industrial use cases wherever there is a need to understand the intention behind user utterances and automate processes accordingly.

The following terminology is used in this plugin:

Utterance: Anything the user says. For example, if a user types "What's the weather outside today in San Francisco", the entire sentence is the utterance.

Intent: An intent is the user's intention. For example, if a user types "What's the weather outside today in San Francisco", the user's intent is to get the weather report. Intents are given a name, often a verb and a noun, such as "getWeather".

Entity: An entity modifies an intent. For example, if a user types "What's the weather outside today in San Francisco", the entities are "today" and "San Francisco". Entities are given a name, such as "dateTime" and "location". Entities are sometimes referred to as slots.


Description

This step builds a model for Intent Classification and Entity Extraction.


Configurations

  1. Step name: Specify the name of the step. Step names should be unique within a workflow.

Input Fields:

  2. Use custom configuration file to build model?: Select this checkbox to enable the 'Custom Configuration FileName' field below, which lets you provide a custom configuration file for building the model.
  3. Custom Configuration FileName: Editable only if the 'Use custom configuration file to build model?' checkbox is selected. A default configuration file is used to build the intent entity model; however, you may specify the path of a custom configuration file (.yml) here to build the model.
  4. JSON Filename: Specify the path of a JSON file containing intent and entity data (see the model-building sketch after this configuration list). Sample JSON file contents:

{
  "nlu_data": {
    "common_examples": [
      {
        "text": "i'm looking for a place to eat",
        "intent": "restaurant_search",
        "entities": []
      },
      {
        "text": "i'm looking for a place in the north of town",
        "intent": "restaurant_search",
        "entities": [
          {
            "start": 31,
            "end": 36,
            "value": "north",
            "entity": "location"
          }
        ]
      }
    ]
  }
}

  5. Button: Browse: Click to browse for a JSON file.
  6. Model Directory Name: Specify or browse for a directory to hold the built model file.
  7. Button: Browse: Click to browse for a model directory.

Output Field:

  8. Model Directory Field Name: Specify a field name to hold the complete path of the model (including the directory and model filename). The default value is outputModelDirectoryFieldName.
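The nlu_data layout above matches the Rasa NLU training data format, and the spaCy prerequisites suggest that stack; purely as a hedged illustration (an assumption, not the plugin's documented internals), building such a model with Rasa NLU 0.x would look like:

    # Assumption: the step wraps Rasa NLU 0.x; file paths below are hypothetical.
    from rasa_nlu import config
    from rasa_nlu.model import Trainer
    from rasa_nlu.training_data import load_data

    training_data = load_data("intents.json")      # the JSON format shown above
    trainer = Trainer(config.load("config.yml"))   # default or custom .yml configuration
    trainer.train(training_data)
    model_directory = trainer.persist("./models")  # directory holding the built model
    print(model_directory)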


Common Buttons:

  1. OK: Validates the field values. If any required field values are missing, a validation error message is displayed. If all the required field values are provided, the field values are saved.
  2. Cancel: Closes the window without saving any values.



Intent Entity Prediction

Description

This step performs intent classification and entity extraction based on the model built in the 'Build Model for Intent Classification and Entity Extraction' step.


Configurations


Model Tab

  1. Step name: Specify the name of the step. Step names should be unique within a workflow.

Input Fields:

  2. Model Directory Name: Specify the path of the model directory built with the 'Build Model for Intent Classification and Entity Extraction' step.
  3. Button: Browse: Click to browse for a model directory.
  4. Input Data to Parse: Specify the input data (string) to be parsed for intent classification and entity extraction.


Output Fields:

  5. Intent Field Name: Specify a field name to hold the predicted intent. The default value is intent.
  6. Show intent confidence?: Enable this checkbox to enable the 'Intent Confidence Field Name' field below.
  7. Intent Confidence Field Name: Specify a field name to hold the intent confidence. The default value is intentConfidence.
  8. Show Entities (in JSON format)?: Enable this checkbox to enable the 'Entities Field Name' field below.
  9. Entities Field Name: Specify a field name to hold the entities in JSON format. The default value is jsonEntities.
  10. Show Intent Ranking (in JSON format)?: Enable this checkbox to enable the 'Intent Ranking Field Name' field below.
  11. Intent Ranking Field Name: Specify a field name to hold the intent ranking in JSON format. All probable intents, with confidence values between 0 and 1, are included. The default value is jsonIntentRanking. (See the parse sketch after this list.)
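Under the same Rasa NLU assumption as the model-builder sketch (an illustration, not the plugin's documented internals), parsing an utterance against the persisted model yields the values these output fields expose:

    # Assumption: Rasa NLU 0.x, as in the model-builder sketch; values shown are examples.
    from rasa_nlu.model import Interpreter

    model_directory = "./models/nlu"  # hypothetical path produced by the builder step
    interpreter = Interpreter.load(model_directory)
    result = interpreter.parse("i'm looking for a place in the north of town")

    print(result["intent"])          # e.g. {"name": "restaurant_search", "confidence": 0.93}
    print(result["entities"])        # e.g. [{"value": "north", "entity": "location", ...}]
    print(result["intent_ranking"])  # all probable intents with confidence values in [0, 1]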


Common Buttons:

  1. OK: Validates the field values. If any required field values are missing, a validation error message is displayed. If all the required field values are provided, the field values are saved.
  2. Cancel: Closes the window without saving any values.









