
Structured Data Classification MCQ solution | TCS Fresco Play | Fresco Play


Disclaimer: The primary purpose of providing this solution is to assist and support anyone who is unable to complete these courses due to a technical issue or a lack of expertise. The information on this website is provided solely for the purpose of knowledge and education.

Make an effort to understand these solutions and apply them to your Hands-On difficulties. (Copying and pasting these solutions is not advisable.)

All the MCQ questions are listed below. For ease of use, press Ctrl + F and search for the question text. All the best!

If you find that the answer to any question is wrong, please mention it in the comment section; it could be useful for others. Thanks!

_________________________

Structured Data Classification Hands-on Solution


Which of the given hyperparameter(s), when increased, may cause random forest to overfit the data?

Answer : Depth of Tree

To view the first 3 rows of the dataset, which of the following commands is used? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : iris.head(3)

Pruning is a technique associated with

Answer : Decision tree

High classification accuracy always indicates a good classifier.

Answer : False

Categorical variables have

Answer : no logical order

The cross-validation technique will provide accurate results when the training set and the testing set are from two different populations.

Answer : False

Let's assume you are solving a classification problem with a highly imbalanced class. The majority class is observed 99% of the time in the training data. Which of the following is true when your model has 99% accuracy after taking the predictions on test data?

Answer : For imbalanced class problems, the accuracy metric is not a good idea.
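To see why accuracy misleads here, the following minimal sketch uses made-up labels (990 majority, 10 minority) and a hypothetical "model" that always predicts the majority class. It reaches 99% accuracy while finding none of the minority cases.

```python
# Hypothetical test set: 990 majority-class (0) and 10 minority-class (1) labels.
y_true = [0] * 990 + [1] * 10

# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
# prints: accuracy=0.99, minority recall=0.00
```

Metrics such as recall, precision or the confusion matrix expose this failure, whereas accuracy alone does not.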

Email spam detection is an example of

Answer : supervised classification

A technique used to depict the performance in a tabular form that has 2 dimensions, namely “actual” and “predicted” sets of data.

Answer : Confusion Matrix
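As an illustration, a 2x2 confusion matrix can be tallied by hand from two label lists. The labels below are invented for the example (1 = spam, 0 = not spam), and the row/column ordering is just one common convention.

```python
from collections import Counter

# Hypothetical labels: 1 = spam, 0 = not spam.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count how often each (actual, predicted) pair occurs.
counts = Counter(zip(actual, predicted))

# Rows are actual classes, columns are predicted classes, both ordered [0, 1].
matrix = [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]

tn, fp = matrix[0]  # actual 0: true negatives, false positives
fn, tp = matrix[1]  # actual 1: false negatives, true positives
print(matrix)
# prints: [[3, 1], [1, 3]]
```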

Choose the correct sequence for classifier building from the following:

Answer : Initialize -> Train -> Predict -> Evaluate
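The four steps map directly onto sklearn calls. The sketch below is one possible instance, assuming sklearn's bundled copy of the iris data and a decision tree; the classifier choice, the split size and the `max_depth` value are illustrative assumptions, not part of the question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features and labels, then hold out 30% of the rows for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # Initialize
clf.fit(X_train, y_train)                                  # Train
y_pred = clf.predict(X_test)                               # Predict
accuracy = accuracy_score(y_test, y_pred)                  # Evaluate
print(f"test accuracy: {accuracy:.2f}")
```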

The commonly used package for machine learning in Python is

Answer : sklearn

A classifier that can compute using numeric as well as categorical values is

Answer : Decision Tree Classifier

Can we consider sentiment classification as a text classification problem?

Answer : yes

What kind of classification is the given case study (Iris dataset)? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : Multi class classification




Ensemble learning is used when you build component classifiers that are more accurate and independent from each other.

Answer : true

Clustering is an example of

Answer : unsupervised classification

Model Tuning helps to increase the accuracy

Answer : True

Images and documents are examples of _________

Answer : Unstructured Data

Ordinal variables have

Answer : clear logical order

Which command is used to select all NUMERIC types in the dataset? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : iris_num = iris_data.select_dtypes(include=[numpy.number])
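The pandas commands that appear in these questions can be tried on a small frame. The sketch below uses a four-row, hand-typed stand-in with the same column layout as the iris CSV (an assumption made so the example runs without downloading anything); with the real file, `pd.read_csv` on the URL above would take the place of the literal.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris CSV: four numeric columns plus one label column.
iris_data = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
    "sepal_width": [3.5, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 4.7, 6.0, 1.4],
    "petal_width": [0.2, 1.4, 2.5, 0.2],
    "species": ["setosa", "versicolor", "virginica", "setosa"],
})

print(iris_data.head(3))  # first 3 rows
print(iris_data.shape)    # (rows, columns)

# Keep only numeric columns; the string column "species" is dropped.
iris_num = iris_data.select_dtypes(include=[np.number])
print(list(iris_num.columns))

# Count how often each label occurs, and list the distinct labels.
print(iris_data["species"].value_counts())
print(list(iris_data["species"].unique()))
```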

The number of categorical attributes in the original dataset. Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : 1

Which classifier converges easily with less training data?

Answer : Naive Bayes Classifier

Imputing is a strategy to handle 

Answer : Missing Values

Classification where each data point is mapped to more than one class is called

Answer : Multi Label Classification

The fit(X, y) is used to

Answer : Train the Classifier

Supervised learning differs from unsupervised learning as supervised learning requires __________

Answer : Labeled data

Clustering is a supervised classification.

Answer : False

Select the correct option that directly achieves multi-class classification (without the support of binary classifiers).

Answer : K Nearest Neighbor





The classification where each data is mapped to more than one class is called ___________

Answer : Multi Label Classification

Email spam data is an example of __________

Answer : Unstructured Data

The most widely used package for machine learning in Python is _________

Answer : sklearn

Pruning is a technique associated with __________

Answer : Decision tree

What does the command sentiment_analysis_data['label'].value_counts() return?

Answer : counts of unique values in the 'label' column

Select the pre-processing technique(s) from the following.

Answer : all

Which of the given hyperparameters, when increased, may cause random forest to overfit the data?

Answer : depth of tree

Select the correct statement about Nonlinear classification.

Answer : Kernel tricks are used by Nonlinear classifiers to achieve maximum-margin hyperplanes.

Choose the correct sequence for classifier building from the following.

Answer : Initialize -> Train -> Predict -> Evaluate

What command should be given to tokenize a sentence into words?

Answer : from nltk.tokenize import word_tokenize; word_tokens = word_tokenize(sentence)

Choose the correct sequence from the following.

Answer : Data Analysis -> PreProcessing -> Model Building -> Predict

The following are all classification techniques, except ___________

Answer : StratifiedShuffleSplit

The commonly used package for machine learning in Python is

Answer : sklearn

How many new columns does the following command return?

iris_series = pd.get_dummies(iris['Species'])

Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : 3
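`pd.get_dummies` creates one indicator column per distinct value, so a column with three species yields three new columns. A minimal sketch on a hand-typed stand-in for the species column (the four sample values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the species column of the iris dataset.
species = pd.Series(["setosa", "versicolor", "virginica", "setosa"], name="species")

# One-hot encode: one indicator column per distinct label.
dummies = pd.get_dummies(species)
print(list(dummies.columns))
print(dummies.shape[1])  # number of new columns
```

Each row has exactly one indicator set, marking which species that row belongs to.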

Identify the command used to view the dataset SIZE and what is the value returned? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : iris.shape, (150, 5)

Which type of cross validation is used for imbalanced dataset?

Answer : Stratified Shuffle Split
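For imbalanced labels, a stratified splitter keeps the class ratio roughly the same in every split, which a plain K fold split does not guarantee. The sketch below uses sklearn's `StratifiedShuffleSplit` on made-up labels with a 9:1 imbalance; the sample sizes and the placeholder features are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical imbalanced labels: 90 samples of class 0 and 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))

# Per-class sample counts in each split; both should stay close to 9:1.
train_counts = np.bincount(y[train_idx])
test_counts = np.bincount(y[test_idx])
print(train_counts, test_counts)
```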





To view the first 3 rows of the dataset, which of the following commands is used? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : iris.head(3)

Naive Bayes Algorithm is useful for :

Answer : indepth analysis 

A process used to identify data points that are simply unusual

Answer : Anomaly Detection

Is there a class imbalance problem in the given data set?

Answer : no

Which of the following is not a technique to process missing values?

Answer : One hot encoding

Images and documents are examples of

Answer : Unstructured Data

Email spam detection is an example of

Answer : supervised classification

Choose the correct sequence for classifier building from the following:

Answer : Initialize -> Train -> Predict -> Evaluate

Imagine you have just finished training a decision tree for spam classification and it is showing abnormally bad performance on both your training and test sets. Assume that your implementation has no bugs. What could be the reason for this problem?

Answer : All

Identify the structured data from the following.

Answer : Data from mySQL DB and Excel

True Negative is when the predicted instance and the actual is positive.

Answer : False

What does the command iris['species'].value_counts() return? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : The count of each unique value in the iris['species'] column

A process used to identify unusual data points is _________

Answer : Anomaly Detection

The following are techniques to process missing values, except _______

Answer : of the options

How many classes will the following command return (target classes in the dataset)? classes=list(iris['species'].unique()) Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : 3

Cross-validation causes over-fitting.

Answer : False

True Positive is when the predicted instance and the actual instance is not negative.

Answer : True

What kind of classification is our case study 'Churn Analysis'?

Answer : Binary

Which command is used to identify the unique values of a column?

Answer : unique()

Which preprocessing technique is used to make the data gaussian with zero mean and unit variance?

Answer : Standardization
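Standardization rescales a feature by subtracting its mean and dividing by its standard deviation, so the result has zero mean and unit variance. A minimal sketch in plain Python, with invented feature values:

```python
from statistics import fmean, pstdev

# Hypothetical feature values, e.g. petal lengths in cm.
values = [1.4, 4.7, 6.0, 1.4, 5.1]

mean = fmean(values)
std = pstdev(values)  # population standard deviation

# z-score: subtract the mean, then divide by the standard deviation.
standardized = [(v - mean) / std for v in values]

# Mean is (numerically) 0 and standard deviation is 1 after rescaling.
print(round(fmean(standardized), 6), round(pstdev(standardized), 6))
```

In sklearn, `StandardScaler` applies the same rescaling column by column.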


Structured Data Classification Hands-on Solution


The cross-validation technique is used to evaluate a classifier by dividing the data set into a training set to train the classifier and a testing set to test the same.

Answer : True

What are the advantages of Naive Bayes?

Answer : Both the options

What kind of classification is the given case study (Iris dataset)? Download the dataset from: https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv to answer the question.

Answer : Multi class classification

Let's assume you are solving a classification problem with a highly imbalanced class. The majority class is observed 99% of the time in the training data. Which of the following is true when your model has 99% accuracy after taking the predictions on test data?

Answer : For imbalanced class problems, the accuracy metric is not a good idea.

The cross-validation technique will provide accurate results when the training set and the testing set are from two different populations.

Answer : False 


