Temperature (in Celsius)	Sales
20	2,000
25	2,100
26	2,300
28	2,400
30	2,600
36	3,100
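The table above pairs temperature with sales, so a least-squares line is one way to summarize how the two move together. The sketch below fits that line by hand in plain Python. Treating the relationship as linear is an assumption made for illustration, and the variable names are hypothetical.

```python
# Temperature (Celsius) and sales figures copied from the table above.
temperature = [20, 25, 26, 28, 30, 36]
sales = [2000, 2100, 2300, 2400, 2600, 3100]

n = len(temperature)
mean_t = sum(temperature) / n
mean_s = sum(sales) / n

# Sums of squares and cross-products around the means.
s_tt = sum((t - mean_t) ** 2 for t in temperature)
s_ss = sum((s - mean_s) ** 2 for s in sales)
s_ts = sum((t - mean_t) * (s - mean_s) for t, s in zip(temperature, sales))

slope = s_ts / s_tt                   # change in sales per degree Celsius
intercept = mean_s - slope * mean_t   # fitted sales at 0 degrees Celsius
r = s_ts / (s_tt * s_ss) ** 0.5       # Pearson correlation coefficient

print(f"sales ~ {intercept:.1f} + {slope:.1f} * temperature (r = {r:.3f})")
```

With only six rows, the fitted line is a description of this sample rather than a reliable forecast.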

No. of rooms	Floors	Area (sq ft)	Price
2	0	900	$400,000
3	2	1,100	$600,000
3.5	5	1,500	$900,000
4	3	2,100	$1,200,000
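With several variables per row, a quick first look is to check how strongly each one tracks the price. The sketch below computes a Pearson correlation for each column in plain Python. It reads the first price as $400,000, which is an assumption, and the `pearson` helper and dictionary keys are hypothetical names.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

# Columns copied from the table above.
features = {
    "rooms": [2, 3, 3.5, 4],
    "floors": [0, 2, 5, 3],
    "area_sq_ft": [900, 1100, 1500, 2100],
}
# Prices in dollars; the first row is read as 400,000 (assumption).
price = [400_000, 600_000, 900_000, 1_200_000]

correlations = {name: pearson(values, price) for name, values in features.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

Four rows are far too few to draw conclusions from; the point is only the mechanics of comparing several variables against one target.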

Error	Residual Error
The difference between the actual value and the predicted value is called an error. Popular error metrics in data science include Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Squared Error (MSE).	The difference between an observed value and the value estimated from the sample, such as the arithmetic mean of the observed group of values, is called a residual error.
An error is generally unobservable.	A residual error can be represented using a graph.
An error shows how the actual population data and the observed data differ from each other.	A residual error shows how the sample population data and the observed data differ from each other.
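The three metrics named in the table can be computed directly from the differences between actual and predicted values. The sketch below is a minimal plain-Python version; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MSE, RMSE, MAE) for paired actual and predicted values."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r ** 2 for r in residuals) / len(residuals)   # mean of squared residuals
    rmse = math.sqrt(mse)                                   # same units as the target
    mae = sum(abs(r) for r in residuals) / len(residuals)   # mean of absolute residuals
    return mse, rmse, mae

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

mse, rmse, mae = regression_metrics(actual, predicted)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

Because MSE and RMSE square each residual, the single miss of 2 dominates them, while MAE weights every miss in proportion to its size.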

Standardization	Normalization
The technique of rescaling data so that it has a mean of 0 and a standard deviation of 1.	The technique of rescaling all data values to lie between 0 and 1 is known as Normalization. This is also known as min-max scaling.
Standardization centers the data on its mean and scales it to unit standard deviation; it does not change the shape of the distribution.	Normalization maps the data into the 0 to 1 range.
Standardization formula: X’ = (X - μ) / σ, where μ is the feature’s mean and σ is its standard deviation.	Normalization formula: X’ = (X - Xmin) / (Xmax - Xmin), where Xmin is the feature’s minimum value and Xmax is the feature’s maximum value.
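Both formulas in the table translate into a few lines of code. The sketch below uses only the Python standard library; the function names and sample values are hypothetical, and using the population standard deviation for σ is an assumption (some tools use the sample standard deviation instead).

```python
import statistics

def min_max_normalize(values):
    """Rescale values to the 0 to 1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    # A constant feature (hi == lo) would divide by zero and needs special handling.
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1: (x - mean) / std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mu) / sigma for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical sample

normalized = min_max_normalize(heights)
standardized = standardize(heights)

print(normalized)
print([round(z, 3) for z in standardized])
```

Min-max scaling is sensitive to extreme values because a single outlier sets the range; standardization is affected by outliers too, but the result is not confined to a fixed interval.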

NAME	ATTRIBUTE	VALUE
RAMA	HEIGHT	182
SITA	HEIGHT	160

NAME	HEIGHT
RAMA	182
SITA	160
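The two tables above appear to hold the same records in long form (one row per name and attribute pair) and in wide form (one row per name, one column per attribute). The sketch below shows one way to convert between them in plain Python; the function name `long_to_wide` is hypothetical.

```python
# Long format: one (name, attribute, value) row per measurement.
long_rows = [
    ("RAMA", "HEIGHT", 182),
    ("SITA", "HEIGHT", 160),
]

def long_to_wide(rows):
    """Group long-format rows into one record per name, keyed by attribute."""
    wide = {}
    for name, attribute, value in rows:
        wide.setdefault(name, {})[attribute] = value
    return wide

wide = long_to_wide(long_rows)
print(wide)  # {'RAMA': {'HEIGHT': 182}, 'SITA': {'HEIGHT': 160}}
```

Adding another attribute, such as a weight row per person, would add a row per person in the long form but only a new column in the wide form.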

Data Science Interview Questions and Answers

Table of Contents

What is Data Science?

10 Most Asked Data Science Interview Questions

Basic and Advanced Data Science Interview Questions

1. What are the differences between supervised and unsupervised learning?

2. How is logistic regression done?

3. Explain the steps in making a decision tree.

4. How do you build a random forest model?

Steps to build a random forest model:

5. How can you avoid overfitting your model?

6. Differentiate between univariate, bivariate, and multivariate analysis.

Univariate

Bivariate

Multivariate

7. What are the feature selection methods used to select the right variables?

Filter Methods

Wrapper Methods

8. In your choice of language, write a program that prints the numbers ranging from one to 50.

9. You are given a data set consisting of variables with more than 30 percent missing values. How will you deal with them?

10. For the given points, how will you calculate the Euclidean distance in Python?

11. What is dimensionality reduction, and what are its benefits?

12. How will you calculate eigenvalues and eigenvectors of the following 3x3 matrix?

13. How should you maintain a deployed model?

Monitor

Evaluate

Compare

Rebuild

14. What are recommender systems?

Collaborative Filtering

Content-based Filtering

15. How do you find RMSE and MSE in a linear regression model?

16. How can you select k for k-means?

17. What is the significance of p-value?

18. How can outlier values be treated?

19. How can time-series data be declared as stationary?

20. How can you calculate accuracy using a confusion matrix?

21. Write the equation and calculate the precision and recall rate.

22. 'People who bought this also bought…' recommendations seen on Amazon are a result of which algorithm?

23. Write a basic SQL query that lists all orders with customer information.

24. You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn't you be happy with your model performance? What can you do about it?

25. Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?