Unstructured Data Classification MCQ solution | TCS Fresco Play

Disclaimer: The primary purpose of providing this solution is to assist and support anyone who is unable to complete these courses due to a technical issue or a lack of expertise. The information on this website is solely for knowledge and education.

Make an effort to understand these solutions and apply them to your Hands-On difficulties. (It is not advisable to copy and paste these solutions.)

All the MCQ questions are listed below. For ease of use, press Ctrl + F and search for the question text. All the best!

If you find that the answer to any question is wrong, please mention it in the comment section; it could be useful for others. Thanks!

_________________________

Unstructured Data Classification Hands-on Solution

Unstructured Data Classification MCQ solution

Identify the unstructured data from the following.

Answer : image

What kind of classification is our case study 'Spam Detection'?

Answer : Binary

Which pre-processing technique is used to remove the most commonly used words?

Answer : Stop word removal
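Stop word removal can be sketched in plain Python as below. The stop-word list here is a small hypothetical one chosen for illustration; libraries such as NLTK and scikit-learn ship much fuller English lists.

```python
# A tiny, hypothetical stop-word list (real lists contain hundreds of words)
STOP_WORDS = {"the", "it", "is", "a", "an", "and", "of", "to", "i"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop any token found in STOP_WORDS."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

print(remove_stop_words("The movie is great and I loved it"))
# → ['movie', 'great', 'loved']
```

Words such as "the" and "it" carry little meaning for classification, so dropping them shrinks the vocabulary without losing much signal.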

The cross-validation technique is used to evaluate a classifier by dividing the data set into a training set to train the classifier and a testing set to test the same.

Answer : True

True Positive is when the predicted instance and the actual instance are not negative.

Answer : True

True Negative is when the predicted instance and the actual instance are positive.

Answer : False
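The true/false positive and negative terms above can be made concrete with a short sketch that counts them for a binary problem and prints the usual 2 x 2 layout (rows = actual class, columns = predicted class, the same arrangement scikit-learn's confusion_matrix uses for labels 0 and 1). The labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem (positive class = `positive`)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual and predicted are both positive
        elif a != positive and p != positive:
            tn += 1  # actual and predicted are both negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn

actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)

# Confusion matrix: rows = actual class, columns = predicted class
print("          pred 0  pred 1")
print(f"actual 0  {tn:>6}  {fp:>6}")
print(f"actual 1  {fn:>6}  {tp:>6}")
```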

An algorithm that counts how many times a word appears in a document is __________

Answer : Bag-of-Words (BOW)
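A Bag-of-Words representation simply counts how often each word occurs, ignoring word order. A minimal sketch with the standard library, assuming plain whitespace tokenization:

```python
from collections import Counter

def bag_of_words(document):
    """Map each lowercase, whitespace-separated word to its count in the document."""
    return Counter(document.lower().split())

bow = bag_of_words("free offer free money click now free")
print(bow["free"], bow["money"])  # → 3 1
```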

Pruning is a technique associated with __________

Answer : Decision tree

Select the correct statement about Nonlinear classification.

Answer : Kernel tricks are used by Nonlinear classifiers to achieve maximum-margin hyperplanes (this option was reported as incorrect)

Stemming and lemmatization give the same result.

Answer : False
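Why stemming and lemmatization give different results can be seen with a toy example. Both functions below are hypothetical simplifications: the stemmer only strips a few suffixes (real stemmers such as Porter's apply many more rules), and the lemmatizer uses a small hand-written dictionary in place of a lexical knowledge base like WordNet.

```python
def crude_stem(word):
    """Toy suffix-stripping stemmer: chops common endings without checking a dictionary."""
    for suffix, replacement in (("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")):
        # Only strip when a reasonably long stem would remain
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

# Hypothetical mini lexicon standing in for a real lexical knowledge base
LEMMAS = {"studies": "study", "studying": "study", "better": "good", "mice": "mouse"}

def lemmatize(word):
    """Look the word up in the lexicon; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["studies", "studying", "better"]:
    print(w, crude_stem(w), lemmatize(w))
# studies studi study
# studying study study
# better better good
```

The stemmer can produce non-words such as "studi", while the lemmatizer returns a proper base form, which is why lemmatization is usually described as more precise.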

Question Type: Single-Select

a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

What is the output of the following command: print(sentiment_analysis_data['label'].unique())

Answer : [1 0]
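The dataset questions can be tried out with pandas as sketched below. To keep the example self-contained, it reads a small hypothetical sample in a "label<TAB>message" layout instead of downloading training.txt; the tab separator is an assumption about the real file's format, and the row counts shown here will differ from the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for training.txt: one "label<TAB>message" pair per line
sample = io.StringIO(
    "1\tI loved this book.\n"
    "0\tThis film was boring.\n"
    "1\tWhat a great story.\n"
    "1\tBrilliant acting.\n"
)

# names= supplies the column names, so no header row is expected in the file
sentiment_analysis_data = pd.read_csv(sample, sep="\t", names=["label", "message"])

print(sentiment_analysis_data["label"].unique())        # distinct labels, in order of first appearance
print(sentiment_analysis_data["label"].value_counts())  # how many rows carry each label
print(sentiment_analysis_data.head(3))                  # first 3 rows
print(sentiment_analysis_data.shape)                    # (number of rows, number of columns)
```

With the real file, the same four calls answer the unique-labels, value_counts, head(3) and shape questions in this quiz.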

The most widely used package for machine learning in Python is _________

Answer : sklearn

In Supervised learning, class labels of the training samples are ____________

Answer : Known

Select the pre-processing technique(s) from the following.

Answer : All the options

Model Tuning helps to increase accuracy.

Answer : Cannot say ("True" was reported as incorrect)

Question Type: Single-Select

a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

What command should be given to tokenize a sentence into words?

Answer : from nltk.tokenize import word_tokenize; Word_tokens = word_tokenize(sentence)

Identify the stop word(s) from the following.

Answer : Both "the" and "it"

The following are performance evaluation measures, except __________

Answer : Decision Tree

Images and documents are examples of ___________

Answer : Unstructured data

Choose the correct sequence for classifier building from the following.

Answer : Initialize -> Train -> Predict -> Evaluate

Which of the given hyperparameters, when increased, may cause the random forest to overfit the data?

Answer : Depth of Tree

The fit(X, y) method is used to __________

Answer : Train the classifier
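The Initialize -> Train -> Predict -> Evaluate sequence and the role of fit(X, y) can be illustrated with a hypothetical, minimal k-nearest-neighbour classifier that mimics the scikit-learn style of interface. The class, data and helper names are made up for this sketch; in practice you would use a library classifier.

```python
import math
from collections import Counter

class KNearestNeighbors:
    """Hypothetical minimal k-NN classifier with a fit/predict interface."""

    def __init__(self, k=3):  # 1. Initialize
        self.k = k

    def fit(self, X, y):  # 2. Train: k-NN simply memorizes the training data
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):  # 3. Predict: majority vote among the k nearest training points
        predictions = []
        for point in X:
            nearest = sorted(range(len(self.X)), key=lambda i: math.dist(point, self.X[i]))[: self.k]
            votes = Counter(self.y[i] for i in nearest)
            predictions.append(votes.most_common(1)[0][0])
        return predictions

def accuracy(actual, predicted):  # 4. Evaluate
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = [0, 0, 0, 1, 1, 1]
clf = KNearestNeighbors(k=3).fit(X_train, y_train)
y_pred = clf.predict([(0.5, 0.5), (5.5, 5.5)])
print(y_pred, accuracy([0, 1], y_pred))  # → [0, 1] 1.0
```

Because the vote works over any set of labels, the same code handles more than two classes, which is why k-NN is described as directly multi-class.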

Question Type: Single-Select

a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

What does the command sentiment_analysis_data['label'].value_counts() return?

Answer : The count of unique values in the 'label' column

What is the purpose of lemmatization?

Answer : To convert words into a proper base form

Clustering is supervised classification.

Answer : False

Supervised learning differs from unsupervised learning as supervised learning requires __________

Answer : Labeled data

Set 2:

To view the first 3 rows of the dataset, which of the following commands is used?

Answer : sentiment_analysis_data.head(3)

Inverse Document frequency is used in the term-document matrix.

Answer : True

Can we consider sentiment classification as a text classification problem?

Answer : Yes

In document classification, each document has to be converted from full text to a document vector.

Answer : True

A technique used to depict the performance in a tabular form that has two dimensions, namely actual and predicted sets of data, is ___________

Answer : Confusion Matrix

Which NLP technique uses a lexical knowledge base to obtain the correct base form of the words?

Answer : Lemmatization

Which numerical statistic is used to identify the importance of a rare word in a document?

Answer : TF-IDF
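TF-IDF can be computed by hand as below, using the plain textbook weighting tf × log(N / df), where N is the number of documents and df is the number of documents containing the word. Words that appear in few documents (such as "win") get higher weights than words shared across documents (such as "free"). Libraries differ in the exact formula; scikit-learn's TfidfVectorizer, for instance, smooths the IDF by default. The documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # df: number of documents in which each word appears at least once
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # term frequency (share of the document's words) times inverse document frequency
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

docs = ["win a free prize", "meeting at noon", "free lunch at noon"]
for row in tf_idf(docs):
    print(row)
```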

Which type of cross-validation is used for an imbalanced dataset?

Answer : K-Fold
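For an imbalanced dataset, k-fold cross-validation is commonly run in its stratified form, which keeps the class proportions of the full dataset in every fold (scikit-learn offers this as StratifiedKFold). The function below is a hypothetical, simplified sketch of that idea that deals each class's samples round-robin across the folds, without shuffling.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Split sample indices into k folds, keeping each class's share roughly equal per fold."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's indices round-robin across the folds
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

labels = [0] * 8 + [1] * 4  # imbalanced: 8 negatives, 4 positives
for fold_number, test_indices in enumerate(stratified_k_fold(labels, k=4)):
    train_indices = [i for i in range(len(labels)) if i not in test_indices]
    positives = sum(labels[i] for i in test_indices)
    print(fold_number, "train size:", len(train_indices), "test:", test_indices, "positives:", positives)
```

Each test fold here gets two negatives and one positive, so no fold ends up with zero examples of the minority class.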

Cross-validation causes over-fitting.

Answer : False

a) Download the dataset from https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

Is there a class imbalance problem in the given data set?

Answer : Yes

SVM is a _____________

Answer : Supervised learning algorithm

In a Term Document Matrix (TDM), each row represents ____________

Answer : TF-IDF value

Imagine you have just finished training a decision tree for spam classification, and it is showing abnormally bad performance on both your training and test sets. Assume that your implementation has no bugs. What could be the reason for this problem?

Answer : All the options

In a Document Term Matrix (DTM), each row represents ____________

Answer : TF-IDF value
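The layout of the two matrices can be sketched as below: in a document-term matrix each row corresponds to a document and each column to a term, and the term-document matrix is simply its transpose. The cells here hold raw counts for simplicity; they could equally hold TF-IDF values. The documents are made up for illustration.

```python
def document_term_matrix(documents):
    """Build a count-based DTM: one row per document, one column per vocabulary term."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = [[tokens.count(term) for term in vocabulary] for tokens in tokenized]
    return vocabulary, matrix

docs = ["free money", "free free offer"]
vocabulary, dtm = document_term_matrix(docs)
tdm = [list(column) for column in zip(*dtm)]  # transpose: one row per term

print(vocabulary)  # → ['free', 'money', 'offer']
print(dtm)         # → [[1, 1, 0], [2, 0, 1]]
print(tdm)         # → [[1, 2], [1, 0], [0, 1]]
```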

Email spam data is an example of __________

Answer : Unstructured data

Choose the correct sequence from the following.

Answer : Data Analysis -> Pre-Processing -> Model Building -> Predict

High classification accuracy always indicates a good classifier.

Answer : False
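Why high accuracy alone can be misleading is easy to show on an imbalanced set. In this hypothetical example, a "classifier" that labels every message as ham scores 95% accuracy yet catches no spam at all, which recall (the share of actual positives that were predicted positive) exposes immediately.

```python
# Hypothetical imbalanced test set: 95 ham (0) messages and 5 spam (1) messages
actual = [0] * 95 + [1] * 5
# A useless "classifier" that labels every message as ham
predicted = [0] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(actual)  # share of spam messages actually caught

print(accuracy, recall)  # → 0.95 0.0
```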

_______ directly achieves multi-class classification (without the support of binary classifiers).

Answer : K Nearest Neighbor

A classifier that can compute using numeric as well as categorical values is __________

Answer : Random Forest Classifier

Lemmatization offers better precision than stemming.

Answer : True

The following are pre-processing methods used for unstructured data classification, except _________

Answer : Confusion_matrix

TF-IDF is a feature extraction technique.

Answer : True

A higher value of which of the following hyperparameters is better for the decision tree algorithm?

Answer : Cannot say

a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

What kind of classification is the given case study (Sentiment Analysis dataset)?

Answer : Binary classification

a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

Which of the following commands is used to view the dataset SIZE, and what is the value returned?

Answer : sentiment_analysis_data.shape, (6918, 2)

______________________________

If you have any queries, please feel free to ask in the comment section.

If you want MCQs and Hands-On solutions for any other courses, please feel free to ask in the comment section too.

Please share and support our page!

MNC Answers
