Unstructured Data Classification MCQ solution | TCS Fresco Play
Disclaimer: The primary purpose of providing this solution is to assist and support anyone who is unable to complete these courses due to a technical issue or a lack of expertise. The information and data on this website are solely for the purpose of knowledge and education.
Make an effort to understand these solutions and apply them to your Hands-On difficulties. (Copying and pasting these solutions is not advisable.)
All the MCQ questions are listed below. For ease of use, press Ctrl + F and search for the question text to find a question. All the best!
If you find that the answer to any of the questions is wrong, please mention it in the comment section; it could be useful for others. Thanks!
_________________________
Unstructured Data Classification Hands-on Solution
Unstructured Data Classification MCQ solution
Identify the unstructured data from the following.
Answer : image
What kind of classification is our case study 'Spam Detection'?
Answer : Binary
Which pre-processing technique is used to remove the most commonly used words?
Answer : Stop word removal
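As a minimal sketch of stop word removal, the snippet below filters a sentence against a tiny hand-written stop list. The list `STOP_WORDS` and the helper `remove_stop_words` are assumptions made for illustration; libraries such as NLTK ship much longer lists.

```python
# Illustrative stop-word list (an assumption); real libraries ship longer ones.
STOP_WORDS = {"the", "it", "is", "a", "an", "and", "to", "of"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


filtered = remove_stop_words("The movie is a delight and it moved me")
print(filtered)
```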
The cross-validation technique is used to evaluate a classifier by dividing the data set into a training set to train the classifier and a testing set to test the same.
Answer : True
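One way to picture the split is k-fold cross-validation, where every fold takes a turn as the testing set. The helper `k_fold_indices` below is a hypothetical, unshuffled sketch written with the standard library only; it is not the implementation used by any particular package.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds, without shuffling."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in the testing set exactly once, so the classifier is always scored on data it was not trained on.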
True Positive is when the predicted instance and the actual instance are not negative.
Answer : True
True Negative is when the predicted instance and the actual instance are positive.
Answer : False
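The two definitions above can be checked with a small counting sketch. The function `confusion_counts` and the sample label lists are made up for illustration, with 1 treated as the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts_by_outcome = confusion_counts(actual, predicted)
print(counts_by_outcome)
```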
An algorithm that counts how many times a word appears in a document is __________
Answer : Bag-of-Words (BOW)
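A rough sketch of the Bag-of-Words idea, using only `collections.Counter` from the standard library. The helper name `bag_of_words` and the sample sentence are assumptions; real pipelines usually add tokenization and a shared vocabulary across documents.

```python
from collections import Counter


def bag_of_words(document):
    """Count how many times each lowercase, whitespace-separated word appears."""
    return Counter(document.lower().split())


word_counts = bag_of_words("free offer free prize claim your free prize")
print(word_counts.most_common(2))
```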
Pruning is a technique associated with __________
Answer : Decision tree
Select the correct statement about Nonlinear classification.
Answer : Kernel tricks are used by nonlinear classifiers to achieve maximum-margin hyperplanes (this option was marked incorrect)
Stemming and lemmatization give the same result.
Answer : False
Question Type: Single-Select
a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
What is the output of the following command: print(sentiment_analysis_data['label'].unique())
Answer : [1 0]
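The loading step in (a) and (b) might look like the sketch below. It assumes the file is tab-separated with no header row, and it reads a few made-up rows from memory instead of the real URL so that it runs offline; passing the URL to `pd.read_csv` in place of the `StringIO` object would be the equivalent call for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in rows; the real training.txt is assumed to be
# tab-separated as "label<TAB>message" with no header line.
sample = (
    "1\tI love this movie\n"
    "0\tThis film was awful\n"
    "1\tBrilliant acting\n"
    "0\tNot worth watching\n"
)

sentiment_analysis_data = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None, names=["label", "message"]
)

print(sentiment_analysis_data.head(3))                    # first 3 rows
print(sentiment_analysis_data.shape)                      # (rows, columns)
print(sentiment_analysis_data["label"].unique())          # distinct labels
print(sentiment_analysis_data["label"].value_counts())    # rows per label
```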
The most widely used package for machine learning in Python is _________
Answer : sklearn
In Supervised learning, class labels of the training samples are ____________
Answer : Known
Select the pre-processing technique(s) from the following.
Answer : All the options
Model Tuning helps to increase accuracy.
Answer : Cannot say ("True" was marked incorrect)
Question Type: Single-Select
a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
What command should be given to tokenize a sentence into words?
Answer : from nltk.tokenize import word_tokenize, Word_tokens =word_tokenize(sentence)
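NLTK's `word_tokenize` needs the `nltk` package and its tokenizer data to be installed. For a quick feel of what tokenization produces, here is a rough standard-library stand-in based on a regular expression. It is not NLTK's algorithm, and the name `simple_word_tokenize` is hypothetical.

```python
import re


def simple_word_tokenize(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


tokens = simple_word_tokenize("Hello, world! It works.")
print(tokens)
```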
Identify the stop word(s) from the following.
Answer : Both "the" and "it"
The following are performance evaluation measures, except __________
Answer : Decision Tree
Images and documents are examples of ___________
Answer : Unstructured data
Choose the correct sequence for classifier building from the following.
Answer : Initialize -> Train -> Predict -> Evaluate
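The four steps can be sketched with a toy classifier. The class `NearestNeighbourClassifier` below is a hypothetical 1-nearest-neighbour model written for illustration, with `fit` and `predict` methods named after the scikit-learn convention; the data points are invented.

```python
class NearestNeighbourClassifier:
    """Toy 1-nearest-neighbour classifier (hypothetical, for illustration only)."""

    def fit(self, X, y):
        # "Training" a nearest-neighbour model just stores the labelled samples.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        return [self._nearest_label(x) for x in X]

    def _nearest_label(self, x):
        # Squared Euclidean distance to every stored sample.
        distances = [sum((a - b) ** 2 for a, b in zip(x, seen)) for seen in self.X_]
        return self.y_[distances.index(min(distances))]


X_train = [[0, 0], [0, 1], [5, 5], [6, 5], [10, 0]]
y_train = ["a", "a", "b", "b", "c"]
X_test = [[1, 0], [5, 6], [9, 1]]
y_test = ["a", "b", "c"]

clf = NearestNeighbourClassifier()      # Initialize
clf.fit(X_train, y_train)               # Train
y_pred = clf.predict(X_test)            # Predict
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)  # Evaluate
print(y_pred, accuracy)
```

The example uses three classes, which a nearest-neighbour model handles without being decomposed into binary classifiers.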
Which of the given hyperparameters, when increased, may cause the random forest to overfit the data?
Answer : Depth of Tree
The fit (X, y) is used to __________
Answer : Train the classifier
Question Type: Single-Select
a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
What does the command sentiment_analysis_data['label'].value_counts() return?
Answer : The count of unique values in the 'label' column
What is the purpose of lemmatization?
Answer : To convert words into a proper base form
Clustering is supervised classification.
Answer : False
Supervised learning differs from unsupervised learning as supervised learning requires __________
Answer : Labeled data
Set2:
To view the first 3 rows of the dataset, which of the following commands is used?
Answer : sentiment_analysis_data.head(3)
Inverse Document frequency is used in the term-document matrix.
Answer : True
Can we consider sentiment classification as a text classification problem?
Answer : Yes
In document classification, each document has to be converted from full text to a document vector.
Answer : True
A technique used to depict the performance in a tabular form that has 2 dimensions namely actual and predicted sets of data is ___________
Answer : Confusion Matrix
Which NLP technique uses a lexical knowledge base to obtain the correct base form of the words?
Answer : Lemmatization
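To illustrate the difference between rule-based stemming and lookup-based lemmatization, here is a deliberately tiny toy. The suffix rules in `toy_stem` and the three-entry `LEMMA_LEXICON` are assumptions standing in for a real stemmer and a real lexical knowledge base such as WordNet.

```python
def toy_stem(word):
    """Crude suffix stripping; real stemmers use many more rules."""
    for suffix in ("ies", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical miniature lexicon standing in for a lexical knowledge base.
LEMMA_LEXICON = {"studies": "study", "better": "good", "was": "be"}


def toy_lemmatize(word):
    """Look the word up in the lexicon; fall back to the word itself."""
    return LEMMA_LEXICON.get(word, word)


for word in ("studies", "better"):
    print(word, "->", toy_stem(word), "/", toy_lemmatize(word))
```

The stem of "studies" is a chopped string that is not a dictionary word, while the lemma is a proper base form, which is why the two techniques do not give the same result in general.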
Which numerical statistic is used to identify the importance of a rare word in a document?
Answer : TF-IDF
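A small worked sketch of why TF-IDF favours rare words. It assumes one common textbook variant, tf = count / document length and idf = log(N / df); libraries often use smoothed variants, so their numbers will differ. The function `tf_idf` and the three-document corpus are made up for illustration.

```python
import math


def tf_idf(term, document, corpus):
    """TF-IDF with tf = count / doc length and idf = log(N / df)."""
    tf = document.count(term) / len(document)
    df = sum(1 for doc in corpus if term in doc)  # documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = [
    "the offer is free".split(),
    "the meeting is today".split(),
    "the lottery prize is free".split(),
]

common = tf_idf("the", corpus[2], corpus)      # appears in every document
rare = tf_idf("lottery", corpus[2], corpus)    # appears in one document only
print(common, rare)
```

A word that occurs in every document gets an idf of log(1) = 0, so its score vanishes, while a word confined to one document keeps a positive weight.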
Which type of cross-validation is used for an imbalanced dataset?
Answer : K-Fold
Cross-validation causes over-fitting.
Answer : False
a) Download the dataset from https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
Is there a class imbalance problem in the given data set?
Answer : Yes
SVM is a _____________
Answer : Supervised learning algorithm
In a Term Document Matrix (TDM), each row represents ____________
Answer : TF-IDF value
Imagine you have just finished training a decision tree for spam classification, and it is showing abnormally bad performance on both your training and test sets. Assume that your implementation has no bugs. What could be the reason for this problem?
Answer : All the options
In a Document Term Matrix (DTM), each row represents
Answer : TF-IDF value
Email spam data is an example of __________
Answer : Unstructured data
Choose the correct sequence from the following.
Answer : Data Analysis -> Pre-Processing -> Model Building -> Predict
High classification accuracy always indicates a good classifier.
Answer : False
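A quick made-up example of why accuracy alone can mislead: on an imbalanced data set, a classifier that always predicts the majority class scores high accuracy while missing every positive case. The label counts below are invented for illustration.

```python
# 5 positive samples and 95 negative samples (invented, imbalanced data).
actual = [1] * 5 + [0] * 95
# A "classifier" that always predicts the majority class.
predicted = [0] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(a == 1 for a in actual)

print("accuracy:", accuracy)
print("recall:", recall)
```

The accuracy is 95 percent, yet the recall for the positive class is zero, so measures such as recall or a confusion matrix are needed alongside accuracy.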
_______ directly achieves multi-class classification (without the support of binary classifiers).
Answer : K Nearest Neighbor
A classifier that can compute using numeric as well as categorical values is __________
Answer : Random Forest Classifier
Lemmatization offers better precision than stemming.
Answer : True
The following are pre-processing methods used for unstructured data classification, except _________
Answer : Confusion_matrix
TF-IDF is a feature extraction technique.
Answer : True
The higher value of which of the following hyperparameters is better for the decision tree algorithm?
Answer : Cannot say
a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
What kind of classification is the given case study (Sentiment Analysis dataset)?
Answer : Binary classification
a) Download the dataset from https://hrcdn.net/s3_pub/istreetassets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
Which of the following commands is used to view the dataset SIZE, and what is the value returned?
Answer : sentiment_analysis_data.shape, (6918, 2)