Which of the following metrics measures the ‘goodness of fit’ of a regression model?
Mean absolute deviation
Root mean squared error
R-squared
The total sum of squared errors
R-squared
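As a rough illustration of why R-squared is read as goodness of fit, the sketch below computes it as one minus the ratio of the sum of squared errors to the total sum of squares. The function name `r_squared` and the sample values are made up for this example.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    # Sum of squared errors between observed and predicted values
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares around the mean of the observed values
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Hypothetical observed and predicted values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

A value near 1 means the model explains most of the variation in the observed values; a perfect fit gives exactly 1.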
Which statement about outliers is true?
Outliers should be identified and removed from a dataset.
Outliers should be part of the training dataset but should not be present in the test data.
Outliers should be part of the test dataset but should not be present in the training data.
The nature of the problem determines how outliers are used.
The nature of the problem determines how outliers are used.
Which metric is the average positive difference between computed and desired outcome values?
Root mean squared error
Mean squared error
Mean absolute error
Mean positive error
Mean absolute error
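To make the difference between these error metrics concrete, here is a minimal sketch that computes mean absolute error, mean squared error, and root mean squared error on the same data. The function names and the sample values are assumptions made for this example.

```python
import math


def mean_absolute_error(actual, predicted):
    # Average of the absolute (always non-negative) differences
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    # Average of the squared differences
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    # Square root of the mean squared error
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical desired and computed outcome values
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 40]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, mse, round(rmse, 4))
```

Because squaring penalizes large errors more heavily, RMSE is never smaller than MAE on the same data.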
Which of the following is a common use of unsupervised clustering?
Detect outliers
Determine a best set of input attributes for supervised learning
Determine if meaningful relationships can be found in a dataset
All of a, b, and c are common uses of unsupervised clustering.
All of a, b, and c are common uses of unsupervised clustering.
What are the axes of an ROC curve?
Vertical axis: % of true negatives; Horizontal axis: % of false negatives
Vertical axis: % of true positives; Horizontal axis: % of false positives
Vertical axis: % of false negatives; Horizontal axis: % of false positives
Vertical axis: % of false positives; Horizontal axis: % of true negatives
Vertical axis: % of true positives; Horizontal axis: % of false positives
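The points of an ROC curve can be sketched by sweeping a cutoff over the classifier scores and recording the false positive rate (horizontal axis) and true positive rate (vertical axis) at each cutoff. The function name `roc_points`, the labels, and the scores below are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # Use each distinct score as a cutoff, from strictest to most lenient
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical true labels (1 = positive) and classifier scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the cutoff moves the point up and to the right, ending at (1, 1) when every record is classified as positive.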
When or why should we use oversampling?
When the cost of failing to identify rare events is low.
To de-emphasize rare events to the learning algorithm.
When a dataset used for learning a predictive model for a binary response variable includes significantly more items with one response than the other, and we seek to accurately predict both responses.
When a dataset used for learning a predictive model for a binary response variable includes roughly the same number of items for each response.
When a dataset used for learning a predictive model for a binary response variable includes significantly more items with one response than the other, and we seek to accurately predict both responses.
In a military application domain, suppose we build a classifier to screen for terrorists (True means action has to be taken), and the confusion matrix comes from running the classifier on some test data. Which of the following situations would you want your classifier to have?
FP >> FN
FN >> FP
FN = FP × TP
TN >> FP
FP >> FN
Which of the following is not a step when partitioning is done with oversampling?
Randomly select class 0 records for the validation partition so as to maintain the original ratio of class 0 to class 1 records.
Randomly select class 0 records for the training partition, equal in number to the class 1 records.
Half the records from the class 1 stratum are randomly selected into the training partition.
Validate the models using the training set of class 0 to class 1 to maintain the accuracy of class 0 to class 1.
Validate the models using the training set of class 0 to class 1 to maintain the accuracy of class 0 to class 1.
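The three genuine steps listed above can be sketched roughly as follows. This is only one possible reading of those steps: the function name `partition_with_oversampling`, the `(id, label)` record layout, and the class sizes are assumptions made for the example.

```python
import random


def partition_with_oversampling(class1, class0, seed=0):
    """Split records into an oversampled training set and a validation set."""
    rng = random.Random(seed)
    class1, class0 = class1[:], class0[:]
    rng.shuffle(class1)
    rng.shuffle(class0)

    # Half of the class 1 stratum goes to the training partition
    half = len(class1) // 2
    train1, valid1 = class1[:half], class1[half:]

    # Training partition gets as many class 0 records as class 1 records
    train0 = class0[:half]

    # Validation partition keeps the original class 0 to class 1 ratio
    ratio = len(class0) / len(class1)
    n_valid0 = round(len(valid1) * ratio)
    valid0 = class0[half:half + n_valid0]

    return train1 + train0, valid1 + valid0


# Hypothetical data: 20 rare (class 1) and 180 common (class 0) records
class1 = [(i, 1) for i in range(20)]
class0 = [(i, 0) for i in range(20, 200)]
train, valid = partition_with_oversampling(class1, class0)
print(len(train), len(valid))
```

The training partition ends up balanced, while the validation partition reflects the original class mix, so performance is measured on realistic proportions.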
Which statement is true about prediction problems?
The output attribute must be categorical.
The output attribute must be numeric.
The resultant model is designed to determine future outcomes.
The resultant model is designed to classify current behavior.
The resultant model is designed to determine future outcomes.
Why is asymmetric misclassification error considered as a part of performance metrics?
It changes the classification rule, so that the cutoff value in every field can be synchronised.
It accounts for the costs of different misclassifications when analysing data, which helps minimize loss and maximize profit.
It enhances the values for prediction, so accuracy increases by averaging the error pruned through the datasets.
It represents the processed data that can be collected, to which models and various data visualization techniques can be applied.
It accounts for the costs of different misclassifications when analysing data, which helps minimize loss and maximize profit.
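A small worked example may help show why asymmetric costs matter. The sketch below compares two hypothetical classifiers by total misclassification cost, assuming a false negative costs 100 times as much as a false positive. All counts and costs are invented for illustration.

```python
def total_misclassification_cost(fp, fn, cost_fp, cost_fn):
    # Weight each kind of error by its own cost
    return fp * cost_fp + fn * cost_fn


# Assumed costs: missing a true case is far more expensive than a false alarm
COST_FP = 1
COST_FN = 100

# Hypothetical error counts for two classifiers on the same test data
cost_a = total_misclassification_cost(fp=50, fn=5, cost_fp=COST_FP, cost_fn=COST_FN)
cost_b = total_misclassification_cost(fp=10, fn=20, cost_fp=COST_FP, cost_fn=COST_FN)
print(cost_a, cost_b)
```

Classifier B makes fewer errors overall (30 against 55), yet classifier A is much cheaper under these costs, which is why accuracy alone can mislead when errors are not equally costly.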