Structured Data Classification MCQ solution | TCS Fresco Play | Fresco Play
Disclaimer: The primary purpose of providing this solution is to assist and support anyone who is unable to complete these courses due to a technical issue or a lack of expertise. The information and data on this website are solely for the purpose of knowledge and education.
Make an effort to understand these solutions and apply them to your Hands-On difficulties. (Copying and pasting these solutions is not advisable.)
All the MCQ questions are listed below. For ease, use Ctrl + F with the question text to find a question. All the best!
If you find that the answer to any of the questions is wrong, please mention it in the comment section; it could be useful for others. Thanks!
_________________________
Structured Data Classification Hands-on Solution
Which of the given hyperparameter(s), when increased, may cause a random forest to overfit the data?
Answer : Depth of Tree
To view the first 3 rows of the dataset, which of the following commands are used? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : iris.head(3)
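As a minimal sketch of what `head(3)` returns, the snippet below builds a small stand-in frame instead of downloading the file. The rows and the variable name `iris` are illustrative assumptions; the column names are meant to mimic the layout of the linked CSV.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded file; with network access one
# could instead pass the URL above to pd.read_csv.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# head(n) returns the first n rows; with no argument it returns 5.
first_three = iris.head(3)
print(first_three)
```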
Pruning is a technique associated with
Answer : Decision tree
High classification accuracy always indicates a good classifier.
Answer : False
Categorical variables have
Answer : no logical order
The cross-validation technique will provide accurate results when the training set and the testing set are from two different populations.
Answer : False
Let's assume you are solving a classification problem with a highly imbalanced class. The majority class is observed 99% of the time in the training data. Which of the following is true when your model has 99% accuracy after taking the predictions on test data?
Answer : For imbalanced class problems, the accuracy metric is not a good idea.
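A small sketch of why accuracy misleads here, using synthetic labels (an assumption, not course data): a model that always predicts the majority class reaches 99% accuracy while finding none of the minority samples.

```python
# Synthetic labels: 99 majority-class (0) samples and 1 minority-class (1) sample.
y_true = [0] * 99 + [1]

# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
minority_recall = true_positives / actual_positives

print(accuracy)         # 0.99
print(minority_recall)  # 0.0
```

Metrics such as per-class recall or precision expose the failure that the single accuracy figure hides.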
Email spam detection is an example of
Answer : supervised classification
A technique used to depict the performance in a tabular form that has 2 dimensions, namely “actual” and “predicted” sets of data.
Answer : Confusion Matrix
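The table can be sketched in plain Python by counting (actual, predicted) pairs. The labels and predictions below are made up for illustration.

```python
from collections import Counter

actual    = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "ham", "spam"]

# Count how often each (actual, predicted) pair occurs.
counts = Counter(zip(actual, predicted))

labels = ["spam", "ham"]
# Rows are actual classes, columns are predicted classes.
matrix = [[counts[(a, p)] for p in labels] for a in labels]

for label, row in zip(labels, matrix):
    print(label, row)
```

Treating "spam" as the positive class, the top-left cell holds the true positives and the bottom-right cell the true negatives; the off-diagonal cells are the false negatives and false positives.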
Choose the correct sequence for classifier building from the following:
Answer : Initialize -> Train -> Predict -> Evaluate
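The four steps can be sketched with scikit-learn as follows. This is one possible illustration, not the course's own solution: the decision tree, `max_depth=3`, the 70/30 split, and the use of scikit-learn's bundled copy of the iris data (rather than the CSV linked elsewhere on this page) are all assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features and labels, then hold out 30% of the rows for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # 1. Initialize
clf.fit(X_train, y_train)                                  # 2. Train
y_pred = clf.predict(X_test)                               # 3. Predict
accuracy = accuracy_score(y_test, y_pred)                  # 4. Evaluate
print(f"test accuracy: {accuracy:.2f}")
```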
The commonly used package for machine learning in python is
Answer : sklearn
A classifier that can compute using numeric as well as categorical values is
Answer : Decision Tree Classifier
Can we consider sentiment classification as a text classification problem?
Answer : yes
What kind of classification is the given case study (Iris dataset)? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : Multi class classification
Ensemble learning is used when you build component classifiers that are more accurate and
independent from each other.
Answer : true
Clustering is an example of
Answer : unsupervised classification
Model Tuning helps to increase the accuracy
Answer : True
Images and documents are examples of _________
Answer : Unstructured Data
Ordinal variables have
Answer : clear logical order
Which command is used to select all NUMERIC types in the dataset? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : iris_num = iris_data.select_dtypes(include=[numpy.number])
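A minimal sketch of this call on a made-up three-row frame (the rows are an assumption; only the `select_dtypes` call comes from the answer above):

```python
import numpy
import pandas as pd

# Hypothetical stand-in frame with two numeric columns and one text column.
iris_data = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["setosa", "versicolor", "virginica"],
})

# Keep only the columns whose dtype is a NumPy numeric type.
iris_num = iris_data.select_dtypes(include=[numpy.number])
print(list(iris_num.columns))
```

The text column is dropped because its dtype is not numeric.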
The number of categorical attributes in the original dataset. Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : 3
Which classifier converges easily with less training data?
Answer : Naive Bayes Classifier
Imputing is a strategy to handle
Answer : Missing Values
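As a rough illustration of imputing, the sketch below fills missing entries with the mean of the observed values. The column and its values are hypothetical; libraries offer the same idea ready-made, for example scikit-learn's `SimpleImputer`.

```python
from statistics import mean

# Hypothetical feature column; None marks a missing measurement.
petal_length = [1.4, None, 4.7, 6.0, None]

# Mean of the values that are actually present.
observed = [v for v in petal_length if v is not None]
fill_value = mean(observed)

# Replace every missing entry with that mean; keep observed values unchanged.
imputed = [fill_value if v is None else v for v in petal_length]
print(imputed)
```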
Classification where each data is mapped to more than one class is called
Answer : Multi Label Classification
The fit(X, y) is used to
Answer : Train the Classifier
Supervised learning differs from unsupervised learning as supervised learning requires __________
Answer : Labeled data
Clustering is a supervised classification.
Answer : False
Select the correct option which directly achieves multi-class classification (without support of binary classifiers).
Answer : K Nearest Neighbor
The classification where each data is mapped to more than one class is called ___________
Answer : Multi Label Classification
Email spam data is an example of __________
Answer : Unstructured Data
The most widely used package for machine learning in Python is _________
Answer : sklearn
Pruning is a technique associated with __________
Answer : Decision tree
What does the command sentiment_analysis_data['label'].value_counts() return?
Answer : counts of unique values in the 'label' column
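A short sketch of `value_counts()` on a made-up frame (the texts and labels are assumptions standing in for the course's sentiment data):

```python
import pandas as pd

# Hypothetical stand-in for the course's sentiment data.
sentiment_analysis_data = pd.DataFrame({
    "text": ["great", "awful", "fine", "loved it", "terrible"],
    "label": ["pos", "neg", "pos", "pos", "neg"],
})

# One entry per distinct label, sorted from most to least frequent.
label_counts = sentiment_analysis_data["label"].value_counts()
print(label_counts)
```

Comparing these counts is also a quick way to check for class imbalance.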
Select the pre-processing technique(s) from the following.
Answer : all
Which of the given hyperparameters, when increased, may cause a random forest to overfit the data?
Answer : depth of tree
Select the correct statement about Nonlinear classification.
Answer : Kernel tricks are used by Nonlinear classifiers to achieve maximum-margin hyperplanes.
Choose the correct sequence for classifier building from the following.
Answer : Initialize -> Train -> Predict -> Evaluate
What command should be given to tokenize a sentence into words?
Answer : from nltk.tokenize import word_tokenize; word_tokens = word_tokenize(sentence)
Choose the correct sequence from the following.
Answer : Data Analysis -> PreProcessing -> Model Building -> Predict
The following are all classification techniques, except ___________
Answer : StratifiedShuffleSplit
The commonly used package for machine learning in python is
Answer : sklearn
How many new columns does the following command return? iris_series = pd.get_dummies(iris['Species'])
Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : 3
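A minimal sketch of why one-hot encoding a three-class column yields three columns. The frame below is a hypothetical stand-in with one row per species, and the column name `Species` follows the capitalization used in the question.

```python
import pandas as pd

# Hypothetical frame holding one row for each of the three species.
iris = pd.DataFrame({"Species": ["setosa", "versicolor", "virginica"]})

# get_dummies creates one indicator column per distinct value.
iris_series = pd.get_dummies(iris["Species"])
print(iris_series.shape[1])
```

Despite the variable name, the result is a DataFrame with one indicator column per class.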
Identify the command used to view the dataset SIZE and what is the value returned? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : iris.shape, (150,6) (Incorrect)
Which type of cross-validation is used for an imbalanced dataset?
Answer : K fold
To view the first 3 rows of the dataset, which of the following commands are used? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : iris.head(3)
Naive Bayes Algorithm is useful for :
Answer : in-depth analysis
A process used to identify data points that are simply unusual
Answer : Anomaly Detection
Is there a class imbalance problem in the given data set?
Answer : no
Which of the following is not a technique to process missing values?
Answer : One hot encoding
Images and documents are examples of
Answer : Unstructured Data
Email spam detection is an example of
Answer : supervised classification
Choose the correct sequence for classifier building from the following:
Answer : Initialize -> Train -> Predict -> Evaluate
Imagine you have just finished training a decision tree for spam classification and it is showing abnormally bad performance on both your training and test sets. Assume that your implementation has no bugs. What could be the reason for this problem?
Answer : All
Identify the structured data from the following.
Answer : Data from MySQL DB and Excel
True Negative is when the predicted instance and the actual instance are positive.
Answer : False
What does the command iris['species'].value_counts() return? Download the dataset from
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : The count with unique values in the iris['species'] column
A process used to identify unusual data points is _________
Answer : Anomaly Detection
The following are techniques to process missing values, except _______
Answer : of the options
How many classes will the following command return? (target classes in the dataset): classes=list(iris['species'].unique())
Download the dataset from
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : 3
Cross-validation causes over-fitting.
Answer : False
True Positive is when the predicted instance and the actual instance are not negative.
Answer : True
What kind of classification is our case study 'Churn Analysis'?
Answer : Binary
Which command is used to identify the unique values of a column?
Answer : unique()
Which preprocessing technique is used to make the data gaussian with zero mean and unit variance?
Answer : Standardization
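Standardization (z-scoring) can be sketched in plain Python as below. The input values are made up. Note that the transform only shifts and rescales the data to zero mean and unit variance; it does not change the shape of the distribution.

```python
from statistics import mean, pstdev

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(values)       # 5.0
sigma = pstdev(values)  # population standard deviation, 2.0

# z-score: subtract the mean, then divide by the standard deviation.
standardized = [(v - mu) / sigma for v in values]
print(standardized)
```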
Cross-validation technique is used to evaluate a classifier by dividing the data set into training set to
train the classifier and testing set to test the same.
Answer : True
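The train/test division behind k-fold cross-validation can be sketched without any library. The helper name `k_fold_indices` is hypothetical, and this version uses contiguous, unshuffled folds for readability; practical use would typically shuffle or stratify first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in the test set exactly once, and the classifier would be trained and scored once per fold.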
What are the advantages of Naive Bayes?
Answer : Both the options
What kind of classification is the given case study (Iris dataset)? Download the dataset from
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.
Answer : Binary classification (Incorrect)
Let's assume you are solving a classification problem with a highly imbalanced class. The majority
class is observed 99% of the time in the training data. Which of the following is true when your model
has 99% accuracy after taking the predictions on test data?
Answer : For imbalanced class problems, the accuracy metric is not a good idea.
The cross-validation technique will provide accurate results when the training set and the testing set
are from two different populations.
Answer : False