Problem: Train a model to: 1) distinguish between different species of the Iris flower based on four features; and 2) predict which passengers survived on the Titanic based on eight different features, in both cases using the decision tree method.
Context: The decision tree is a simple yet powerful machine learning algorithm. It is easy to understand and has been in use for a long time.
Model type: Decision tree
What we did: In developing the machine learning model, we started with the k-fold cross-validation process, in which we first split the training dataset into k chunks. We then trained the model on k-1 of the chunks and evaluated it on the remaining chunk. We repeated this process k times, so that each chunk served once as the evaluation set, and computed the average accuracy across the k runs. When the cross-validation results were satisfactory, we then trained the model on the whole training dataset.
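The cross-validation loop described above can be sketched in plain Python. This is a minimal illustration, not the pipeline behind the demo: the function names (k_fold_indices, cross_validate), the majority-label stand-in learner, and the six toy rows are all hypothetical and exist only to show the mechanics of splitting, holding out one chunk, and averaging accuracy.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (features, label)


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split the indices 0..n-1 into k contiguous, near-equal chunks."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    chunks, start = [], 0
    for size in sizes:
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def cross_validate(rows: List[Row], k: int,
                   train: Callable, predict: Callable) -> float:
    """Train on k-1 chunks, evaluate on the held-out chunk, average over k runs."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train_rows = [r for i, r in enumerate(rows) if i not in held]
        test_rows = [rows[i] for i in held_out]
        model = train(train_rows)
        correct = sum(predict(model, x) == y for x, y in test_rows)
        accuracies.append(correct / len(test_rows))
    return sum(accuracies) / k


# Hypothetical stand-in learner: always predicts the most common training label.
# A real decision tree learner would be plugged in through the same two functions.
def train_majority(rows: List[Row]) -> str:
    return Counter(label for _, label in rows).most_common(1)[0][0]


def predict_majority(model: str, features: Sequence[float]) -> str:
    return model


# Toy rows invented for illustration only (not taken from either dataset).
toy_rows: List[Row] = [
    ([0.1], "a"), ([0.2], "a"), ([0.3], "a"),
    ([0.9], "b"), ([0.4], "a"), ([0.5], "a"),
]
score = cross_validate(toy_rows, k=3, train=train_majority, predict=predict_majority)
print(f"mean accuracy over 3 folds: {score:.3f}")
```

In practice the rows are usually shuffled before splitting so that each chunk has a similar label mix; the contiguous split here just keeps the example easy to follow.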
In this demo, we have two datasets: Iris flower and Titanic. For the Iris flower dataset, the model reads four flower measurements (also called inputs or features) to determine which species of Iris a given flower belongs to. The four inputs are: sepal length, sepal width, petal length, and petal width.
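To make the idea of a decision tree over these four inputs concrete, here is a small hand-written tree in Python. It is only a sketch: the Iris tuple, the classify function, and both thresholds are assumptions chosen for illustration. A trained model learns its own split points from the data, so the tree in the demo may look different.

```python
from typing import NamedTuple


class Iris(NamedTuple):
    """The four input features, in centimetres."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


def classify(flower: Iris) -> str:
    """Walk a two-level tree: each 'if' is one decision node, each return is a leaf."""
    # Illustrative thresholds (assumed, not learned from the dataset).
    if flower.petal_length < 2.5:
        return "setosa"
    if flower.petal_width < 1.75:
        return "versicolor"
    return "virginica"


# Example measurements used purely for illustration.
samples = [
    Iris(5.1, 3.5, 1.4, 0.2),
    Iris(7.0, 3.2, 4.7, 1.4),
    Iris(6.3, 3.3, 6.0, 2.5),
]
predictions = [classify(s) for s in samples]
print(predictions)
```

Note that this sketch never looks at the sepal measurements. A learned tree may likewise end up using only some of the available features if the others do not improve the splits.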
For the Titanic dataset, the model reads eight features about each passenger to determine whether a given passenger did or did not survive the sinking of the Titanic.
Choose a dataset below and then try cross-validating, training, and/or training and testing the decision tree model.