Limited Time OfferFLAT 20% off & $20 bonus sign up. Order Now
New! Hire Essay Assignment Writer Online and Get Flat 20% Discount!!Order Now
To create imputation indicators for all imputed inputs, Indicator Variable Type has been set as Unique, and Indicator Variable Role has been set to Input (shown in Fig. 20)
The average squared error (ASE) of prediction is used to estimate the error in prediction of a model fitted using the training data shown below in Equation 1 (Atkinson, 1980)
…………………………………………Equation 1
Here is the ith observation in the validation data set and is its predicted value using the fitted model and n is the validation sample size. The ASE output SAS has been shown in Fig. 26. ASE for this model is 0.138587 (train data) and 0.137156 (validation data). In a modeling context, a good predictive model produces values that are close to these ASE values. An overfit model produces a smaller ASE on the training data but higher values on the validation and test data. An underfit model exhibits higher values for all data roles.
As per Fit statistics (Fig. 31), Tree 1 is the selected model, valid misclassification rate is lowest in Tree 1, i.e. 0.185. From Table 1, it can be denoted that, Kolmogorov-Smirnov Statistic and ROC Index area under the curve are effectively same for both the decision Tree 1 and Tree 2 model, and perform slightly better than regression model. In case of average squared error, Tree 1 and Tree 2 are also effectively same, and performs better than Regression model. Hence, it can be concluded that Tree 1 is better performer than other two models.
While the regression model estimates relationship among variables, it identifies key patterns in large data sets and is often used to determine how independent variables are related to the dependent variables, and to explore the forms of the relationships.
High dimension increases the risk of overfitting due to correlations between redundant variables, increases computation time, increases the cost and practicality of data collection for the model, and makes interpretation of the model difficult. For the organic data, misclassification rate is lowest in decision tree, and The ROC chart window shows that the both decision tree and regression model have good predictive accuracy. In this case, Decision Trees to consumer loyalty analysis will be valuable for predictive modeling to understand the consumer segments.
The supermarket’s objective is to develop loyalty model by whether customers have purchased any of the organic products. Hence the model needs to be fit in the real world.
Just getting things wrong Problem should be identified, without clear objective, model will be fail. In this business case, there were two target variables, TargetBuy, and TargetAmt. TargetAmt was just a product of TargetBuy, and TargetAmt was not a binary variable. Hence selection of target variable is one of the most important thing, the model could go wrong if TargetAmt be selected as Target Variable.
Overfitting For the training data, when model becomes more complex, with more leaves of the decision tree, due to more iterations of training for a neural network, it appears to be fit the training data. But, in actual scenario, it fits noise as well as signal. In this business case, the decision tree, Tree 2, misclassification rate (Train:0.1848) of the model is very marginally lower than the model with the decision tree, Tree 1 (Train: 0.1851) and average square error of the model with the decision tree, Tree 1 (Train: 0.1329) is lower than the model with the decision tree, Tree 2 (Train: 0.1330). Hence it can be said that tree with 29 leaves performs marginally better in terms of average square error and tree with 33 leaves performs marginally better in terms of misclassification rate. But as with the higher number of leaves complexity increases, a less complex and reliable tree i.e. Tree 1 will be more suitable for the model.
Sample bias For this analysis, this sample covers across 5 geographical regions and 13 television regions across the world. Hence, there will be different set of consumers and sample bias will not be present in the data.
Future not being like the past In this business case, the model has been created using the past data of consumers of super market, it will not always be true, that the consumer who purchased the product will buy in the future. Various extraneous may affect the loyalty and consumer’s purchase.
No matter how close the deadline is, you will find quick solutions for your urgent assignments.
All assessments are written by experts based on research and credible sources. It also quality-approved by editors and proofreaders.
Our team consists of writers and PhD scholars with profound knowledge in their subject of study and deliver A+ quality solution.
We offer academic help services for a wide array of subjects.
We care about our students and guarantee the best price in the market to help them avail top academic services that fit any budget.
15,000+ happy customers and counting!