Checking & Handling Duplicate Values Quiz
Question 1
What is the main reason duplicates can negatively impact machine learning models?
They increase storage space usage
They cause overfitting by repeating the same data points
They make the dataset too small for training
They make models run faster
Question 2
Which pandas function is used to identify duplicate rows in a DataFrame?
dropna()
duplicated()
fillna()
replace()
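For reference, a minimal sketch of row-level duplicate detection in pandas. The DataFrame, its column names, and its values are made up for illustration and are not part of the quiz.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cy"],
    "score": [90, 85, 90, 70],
})

# duplicated() returns a boolean Series that is True for every row
# repeating an earlier row (the default is keep='first').
mask = df.duplicated()

print(mask.tolist())    # [False, False, True, False]
print(int(mask.sum()))  # 1 duplicate row
```

Indexing with the mask, as in df[mask], shows the repeated rows themselves, which is a useful check before deleting anything.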
Question 3
What happens when you use df.drop_duplicates() without any parameters?
It removes all rows from the DataFrame
It removes duplicate rows, keeping the first occurrence of each
It removes duplicate rows while keeping the last occurrence
It removes duplicate values only from numeric columns
Question 4
Which of the following is NOT a reason why duplicates should be removed?
They can lead to incorrect statistical analysis
They make machine learning models more robust
They increase computational costs
They introduce complexity in data management
Question 5
How can you remove duplicates while keeping only the last occurrence of each row?
df.drop_duplicates(keep='first')
df.drop_duplicates(keep=False)
df.drop_duplicates(keep='last')
df.drop_duplicates(subset=['column1', 'column2'])
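For reference, a small sketch comparing the keep and subset options of drop_duplicates(). The column names column1 and column2 come from the options above. The note column and all values are assumptions chosen so that each call returns a different result.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are full duplicates;
# row 3 matches them only on column1 and column2.
df = pd.DataFrame({
    "column1": ["a", "a", "b", "a"],
    "column2": [1, 1, 2, 1],
    "note": ["x", "x", "y", "z"],
})

first = df.drop_duplicates()                 # default keep='first'
last = df.drop_duplicates(keep="last")       # keep the last copy instead
none_kept = df.drop_duplicates(keep=False)   # drop every copy of a duplicated row
by_subset = df.drop_duplicates(subset=["column1", "column2"])  # compare only these columns

print(first.index.tolist())      # [0, 2, 3]
print(last.index.tolist())       # [1, 2, 3]
print(none_kept.index.tolist())  # [2, 3]
print(by_subset.index.tolist())  # [0, 2]
```

The original index labels are preserved in each result, so a call to reset_index(drop=True) may be wanted afterwards if a contiguous index matters.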