Deep Learning - A hyperparameter tuning game

Imagine that one day you could work like an alchemist performing magic, or like a player in a ball-balancing game setting off on an adventure.

To better understand the main point of this essay, it will be helpful to brush up on some basics from another essay, Machine learning made easy. A machine learning workflow usually includes six steps, whether the model is for regression or classification:

  1. Loading the data
  2. Preprocessing the data
  3. Splitting the data into training and test sets
  4. Creating and training your model
  5. Evaluating your model's performance
  6. Tuning your model
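The six steps above can be sketched end to end. The snippet below is a minimal illustration using scikit-learn; the iris dataset and the kNN model are illustrative assumptions, not choices made by this essay.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# 1. Load the data
X, y = load_iris(return_X_y=True)

# 2. Preprocess the data (scale features to zero mean, unit variance)
X = StandardScaler().fit_transform(X)

# 3. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 4. Create and train the model
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 5. Evaluate performance on held-out data
test_accuracy = model.score(X_test, y_test)

# 6. Tune the model (here, a grid search over the hyperparameter k)
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)
best_k = search.best_params_["n_neighbors"]
```

Step 6 is the hyperparameter tuning this essay focuses on: the same pipeline is re-run with different settings until the best one is found.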

Deep learning, meanwhile, follows a similar workflow but involves additional, more complex steps.

Whether you use machine learning or more advanced deep learning algorithms, a necessary step is re-tuning the hyperparameters of the model being used. Hyperparameters are the human-tunable parameters of a model, and they differ across machine learning and artificial intelligence strategies. For example:

  1. In the Random Forest (RF) classifier, each tree considers only a randomly chosen subset of features at each decision split. Users can specify the number of decision trees in the ensemble, the number of features to consider at each split, and the minimum number of instances per leaf region;
  2. In the Support Vector Machine (SVM) classifier, the binary classification is determined by the widest possible boundary between classes. Users can tune the penalty parameter C, which controls how strongly misclassified instances are penalized when maximizing the margin;
  3. In the k-nearest neighbours (kNN) classifier, the label of a given instance is predicted from the majority label of its k nearest neighbours. Users can tune the value of k to obtain the best results.
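The three hyperparameters described above map directly onto constructor arguments in scikit-learn. A minimal sketch, with the specific values chosen purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1. RF: number of trees, features considered per split, min instances per leaf
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            min_samples_leaf=2, random_state=0).fit(X, y)

# 2. SVM: C sets the penalty for misclassified instances while maximizing the margin
svm = SVC(C=1.0).fit(X, y)

# 3. kNN: k, the number of neighbours whose majority label is used
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Compare training accuracy of the three tuned classifiers
scores = {name: m.score(X, y)
          for name, m in [("rf", rf), ("svm", svm), ("knn", knn)]}
```

Changing any of these arguments and re-fitting is exactly the "re-tune" loop described in the text.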

More advanced deep learning models involve even more hyperparameters. For example, an artificial neural network (ANN) consists of an input layer, hidden layers, and an output layer connected by weighted links, with each layer containing a certain number of nodes, resembling a biological neural network. Deep learning is a type of machine learning based on neural networks that extracts features automatically, without the need to prepare features manually for training. If the output is incorrect, the network re-adjusts the weights of its nodes to improve performance (back-propagation). Users have many hyperparameters with which to re-tune such a model, such as the number of hidden layers, the number of nodes per layer, the learning rate, and the batch size.
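As a hedged sketch of these ANN hyperparameters, the snippet below uses scikit-learn's MLPClassifier (Keras/TensorFlow expose the same kinds of knobs); the layer sizes, learning rate, and batch size are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Typical ANN hyperparameters: sizes of the hidden layers, the learning
# rate, the batch size, and the number of training epochs (max_iter).
# During fit(), the weights are adjusted by back-propagation.
ann = MLPClassifier(hidden_layer_sizes=(16, 8),
                    learning_rate_init=0.01,
                    batch_size=32,
                    max_iter=500,
                    random_state=0)
ann.fit(X, y)
train_accuracy = ann.score(X, y)
```

Each of these arguments is a candidate for the tuning loop: change one value, re-train, and compare the resulting scores.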

Sometimes there are architecture-specific hyperparameters, as in the convolutional neural network (CNN), recurrent neural network (RNN), and probabilistic neural network (PNN). CNNs are often used for image classification; they use a convolution layer that preserves positional relationships between inputs (e.g., pixels in an image), thus capturing local dependencies among them. RNNs discover conditional dependencies by feeding the output computed on previous inputs back in as features, which helps model long-range interactions. PNNs estimate the probability distribution of each class and assign each input the class with the highest posterior probability.
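To make concrete how a convolution layer preserves positional relationships, here is a minimal NumPy sketch of the sliding-window operation (as in most CNN libraries, it is technically cross-correlation); the tiny image and edge-detecting kernel are invented for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: the kernel slides over the image, so each
    output value depends only on a local neighbourhood of pixels,
    preserving positional relationships between inputs."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a tiny image whose left half is dark
# (0) and right half is bright (1): the response is non-zero only where
# the window straddles the edge, i.e., the edge's position is preserved.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)
edges = conv2d(image, kernel)
```

The kernel's weights, size, and stride are exactly the kind of architecture-specific hyperparameters a CNN user tunes.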

Beyond this, the choices of activation function, optimizer, and loss function can also affect parameter selection in different machine learning and deep learning models. The activation function decides whether a neuron should be activated by computing the weighted sum of its inputs and adding a bias; examples include the rectified linear unit ('relu') and 'softmax'. The optimizer, such as 'adam' or 'sgd', adjusts the model's weights to minimize the loss function. The loss function is a mathematical function that quantifies the difference between predicted and actual values, such as 'mean_squared_error' and 'categorical_crossentropy'.
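The named activation and loss functions are simple enough to write out directly. A minimal NumPy sketch of each (the example inputs are invented for illustration):

```python
import numpy as np

def relu(x):
    # Activation: passes positive values through, zeroes out negatives
    return np.maximum(0, x)

def softmax(x):
    # Activation: turns raw scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mean_squared_error(y_true, y_pred):
    # Loss: average squared difference between predictions and targets
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Loss: y_true holds one-hot labels, y_pred predicted probabilities
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(np.asarray(y_true) * np.log(y_pred), axis=1))

mse = mean_squared_error([1.0, 2.0], [1.5, 2.5])  # -> 0.25
```

The optimizer's job is then to change the model's weights in the direction that shrinks whichever of these losses was chosen.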

Fortunately, newly developed platforms (e.g., DataCamp) and tools such as TensorFlow and Keras, which ship as Python libraries, make deep learning algorithms much easier to use. Although the barrier to entry has been lowered, it is still challenging to optimize and re-tune a model to obtain the best predictions. Deep learning requires larger amounts of training data and more computational power than typical machine learning methods (such as random forests (RF), support vector machines (SVM), and k-nearest neighbours (kNN)), but the accuracy evaluation methods are similar. For a classification model, whether deep learning or machine learning, overall performance can be quantified by different metrics: the true negative rate (TNR, also called specificity), the true positive rate (TPR, also called sensitivity or recall), precision, accuracy, the area under the receiver operating characteristic curve (AUROC), and the precision-recall curve (PRC). For all of these metrics, values closer to 1 indicate increasingly optimal performance. They are computed from counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
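The metrics just listed all derive from the four TP/FP/TN/FN counts. A minimal sketch, using an invented set of binary predictions for illustration:

```python
import numpy as np

# Illustrative true vs. predicted labels for a binary classifier
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

tpr = tp / (tp + fn)              # sensitivity / recall
tnr = tn / (tn + fp)              # specificity
precision = tp / (tp + fp)        # how many predicted positives are real
accuracy = (tp + tn) / len(y_true)
```

AUROC and the PRC extend this idea by sweeping the classifier's decision threshold and tracing how these counts change.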

The input data set may be split into a training set (usually 75–90% of the data) and a test set. However, the particular data points that land in the test set may not be representative of the model's ability to generalize to unseen data, so the performance estimate depends on an essentially arbitrary split. To combat this, we use a technique called cross-validation (CV), which is also sufficient to estimate the algorithm's error on test instances when a separate test set cannot be made. In five-fold CV, for example, each fold randomly designates 20% of the input training data as the validation set; the model is trained on the remaining data, and its performance on new instances is measured on the validation set. Once the benchmark data set has been split into training and test data, different models can be fitted and their performance compared.
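The five-fold CV procedure described above is a one-liner in scikit-learn; the dataset and classifier here are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Five-fold CV: the data is split into 5 parts; each part serves once as
# the validation set while the model trains on the remaining 80%.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
mean_score = scores.mean()
```

Averaging the five validation scores gives a performance estimate that no longer hinges on one arbitrary split.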

After learning these performance formulas and cross-validation methods, it is important to understand the concepts of overfitting and underfitting. Overfitting means your model makes accurate predictions on the training data but inaccurate predictions on validation data and new data sets. Underfitting is the opposite: your model fails to find the important predictive patterns even in the training data. Increasing the number of nodes in a hidden layer, or adding layers, increases the model's capacity and can help address underfitting.
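Both failure modes can be seen by comparing training and validation accuracy as model capacity changes. A minimal sketch using decision trees of different depths as a stand-in for network capacity (the dataset and depths are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for depth in (1, None):  # low capacity vs. unbounded capacity
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # (training accuracy, validation accuracy)
    results[depth] = (tree.score(X_train, y_train),
                      tree.score(X_val, y_val))

# depth=1 underfits: one split cannot separate three classes, so even the
# training accuracy stays low. The unbounded tree fits the training data
# (almost) perfectly; a large train/validation gap would signal overfitting.
```

The same diagnostic applies to neural networks: watch both curves, and grow or shrink capacity accordingly.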

References:

  1. Li, R., Li, L., Xu, Y. and Yang, J., 2022. Machine learning meets omics: applications and perspectives. Briefings in Bioinformatics, 23(1), p.bbab460.
  2. Mahood, E.H., Kruse, L.H. and Moghe, G.D., 2020. Machine learning: A powerful tool for gene function prediction in plants. Applications in Plant Sciences, 8(7), p.e11376.
  3. Soltis, P.S., Nelson, G., Zare, A. and Meineke, E.K., 2020. Plants meet machines: Prospects in machine learning for plant biology. Applications in Plant Sciences, 8(6), p.e11371.

<Last updated by Xi Zhang on Aug 14th, 2023>