Machine Learning and Statistics Solutions
This document provides answers and explanations for the multiple-choice questions related to Machine Learning and Regression Analysis.
1. Deep Learning Definition
Question: What is deep learning? Answer: c. All of the other options
- Explanation: Deep learning is a subset of machine learning that utilizes neural networks with multiple hidden layers. Both machine learning and deep learning are fields within Artificial Intelligence.
2. Discrete Variables
Question: ______ can only take a countable number of values. Answer: d. Discrete variables
- Explanation: Discrete variables are countable, typically represented by integers (e.g., number of students, result of a die roll). Continuous variables can take any value within a range (e.g., height, temperature).
3. Logistic Regression Prediction
Question: When a trained logistic regression classifier gives a prediction for a new example 'x', it means... Answer: a. Our estimate for \(P(y=1 \mid x)\)
- Explanation: Logistic regression is a probabilistic model that estimates the conditional probability of the positive class (\(Y=1\)) given the input features \(X=x\).
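As a concrete illustration, here is a minimal sketch assuming scikit-learn and NumPy are available (the toy data is invented): `predict_proba` returns exactly this conditional probability estimate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy one-feature dataset with binary labels.
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

# Each row of predict_proba is [P(y=0 | x), P(y=1 | x)].
print(clf.predict_proba(np.array([[2.0]])))
```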
4. Linear Regression and Outliers
Question: Which of the following statements is true about outliers in linear regression? Answer: c. Linear regression is sensitive to outliers
- Explanation: Standard linear regression minimizes the Mean Squared Error (MSE). Because the error is squared, outliers (which have large errors) are heavily penalized, making the model highly sensitive to them.
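A quick sketch of this sensitivity (NumPy assumed; the data points are invented): a single extreme point pulls the least-squares line well away from the trend of the other five.

```python
import numpy as np

# Five points on the line y = 2x, plus one extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 40.0])  # last point is the outlier

slope_with, _ = np.polyfit(x, y, 1)               # least-squares fit, outlier included
slope_without, _ = np.polyfit(x[:-1], y[:-1], 1)  # same fit without it
print(slope_with, slope_without)  # the outlier drags the slope far from 2.0
```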
5. Logistic Regression Model Type
Question: State true or false: logistic regression is a discriminative model. Answer: a. True
- Explanation: Discriminative models directly model the conditional probability \(P(Y|X)\) to distinguish between classes, which is exactly what Logistic Regression does.
6. Gradient Descent Convergence
Question: You run gradient descent for 15 iterations with \(\alpha=0.3\) and compute \(J(\theta)\) after each iteration. You find that the value of \(J(\theta)\) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible? Answer: d. \(\alpha=0.3\) is an effective choice of learning rate
- Explanation: When the cost function decreases quickly and then levels off (converges), it indicates the learning rate (\(\alpha\)) is well-tuned—it was large enough for speed and small enough for stable convergence.
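The pattern is easy to reproduce on a toy cost function (a sketch, not the question's actual \(J\)): gradient descent on \(J(\theta) = \theta^2\) with \(\alpha = 0.3\) shows exactly this fast-drop-then-plateau behavior.

```python
# Gradient descent on J(theta) = theta**2 with alpha = 0.3.
theta, alpha = 5.0, 0.3
for i in range(15):
    grad = 2 * theta          # dJ/dtheta
    theta -= alpha * grad     # gradient descent update
    print(i + 1, theta ** 2)  # J drops fast, then levels off near 0
```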
7. Evaluation Metric for a Continuous Value
Question: If \(y\) is a continuous predicted value, which of the following evaluation metrics is correct? Answer: c. Sum of Squares Error
- Explanation: A continuous predicted value is the output of a regression problem. Sum of Squares Error (SSE) is a core regression metric. Accuracy, Recall, and Precision are classification metrics.
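For reference, SSE is just the sum of squared residuals, e.g. (NumPy assumed; invented numbers):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])  # observed values
y_pred = np.array([2.5, 5.5, 8.0])  # regression predictions

sse = np.sum((y_true - y_pred) ** 2)  # Sum of Squares Error
print(sse)  # 0.5**2 + 0.5**2 + 1.0**2 = 1.5
```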
8. Regression Metric for Outliers
Question: Which regression evaluation metric should you consider when you have many outliers in your data and you do not want to account for them? Answer: d. Mean Absolute Error (MAE)
- Explanation: Mean Absolute Error (MAE) is robust to outliers because it penalizes errors linearly, unlike MSE and RMSE, which amplify outlier penalties by squaring the error.
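A small comparison makes the point (NumPy assumed; data invented): one bad prediction inflates MSE far more than MAE.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 2.9, 14.0])  # the last prediction is an outlier

mae = np.mean(np.abs(y_true - y_pred))  # linear penalty: 2.575
mse = np.mean((y_true - y_pred) ** 2)   # squared penalty: ~25.0
print(mae, mse)
```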
9. Regression and Causality
Question: State whether the statement is true or false: "Regression analysis discovers causal relationships from large data sets." Answer: a. False
- Explanation: Regression analysis only measures correlation (association). Correlation does not imply causation; establishing causality requires controlled experiments.
10. Adjusted \(R^2\) Property
Question: Which of the following is true about the adjusted value of the coefficient of determination? Answer: c. Is always less than the proportion of the sum of the squared deviations of the dependent variable from its mean that is explained by the regression model.
- Explanation: The "proportion of the sum of the squared deviations explained" is the standard \(R^2\). The Adjusted \(R^2\) penalizes model complexity (extra predictors) and is mathematically less than or equal to the standard \(R^2\).
11. Parameters for a Quadratic Curve
Question: What are the minimum and maximum numbers of parameters needed to fit a quadratic curve to data? Answer: b. 1 and 3
- Explanation: The general quadratic (\(y = w_0 + w_1 x + w_2 x^2\)) has at most three parameters. If the curve is constrained, e.g. \(y = w_2 x^2\), a single parameter suffices, so the minimum is one.
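A sketch of the three-parameter case (NumPy assumed; the data is generated from known coefficients): `np.polyfit` with degree 2 recovers all three parameters.

```python
import numpy as np

# Data generated from y = 1 + 2x + 0.5x**2 (three known parameters).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2

w2, w1, w0 = np.polyfit(x, y, 2)  # coefficients, highest degree first
print(w0, w1, w2)                 # ~1.0, 2.0, 0.5
```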
12. Vectorization Preference
Question: Vectorization is not preferred for ______ values of \(n\) due to ______. Answer: a. None of these options
- Explanation: Vectorization (using parallel matrix operations) is the preferred computational method in machine learning for all dataset sizes (\(n\)) due to significant speed advantages on modern hardware.
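The speed gap is easy to measure (a rough sketch assuming NumPy; exact timings vary by machine):

```python
import time
import numpy as np

n = 1_000_000
a, b = np.random.rand(n), np.random.rand(n)

t0 = time.perf_counter()
loop_dot = sum(a[i] * b[i] for i in range(n))  # explicit Python loop
t1 = time.perf_counter()
vec_dot = a @ b                                # vectorized dot product
t2 = time.perf_counter()

# Same result, vastly different runtimes.
print(np.allclose(loop_dot, vec_dot), t1 - t0, t2 - t1)
```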
13. Gradient Descent and Differentiability
Question: Can gradient descent be used for non-differentiable functions? Answer: b. False
- Explanation: Standard gradient descent requires the cost function to be differentiable so that the gradient (derivative) can be computed to find the direction toward the minimum.
14. One-vs-All in Logistic Regression
Question: With respect to the One-vs-All method in logistic regression, we need to: Answer: a. Fit \(n\) models in an \(n\)-class classification problem
- Explanation: The One-vs-All strategy handles an \(n\)-class problem by training \(n\) separate binary classifiers, each distinguishing one class from all the rest.
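A minimal from-scratch sketch of the strategy (scikit-learn and NumPy assumed; the 3-class toy data is invented; scikit-learn also ships this built in as `OneVsRestClassifier`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 3-class problem: one binary classifier per class.
X = np.array([[0.0], [0.2], [1.0], [1.2], [2.0], [2.2]])
y = np.array([0, 0, 1, 1, 2, 2])

# Train class c vs. all the rest, for each class c.
models = [LogisticRegression().fit(X, y == c) for c in np.unique(y)]

# Predict with the class whose classifier is most confident.
x_new = np.array([[1.1]])
scores = [m.predict_proba(x_new)[0, 1] for m in models]
print(int(np.argmax(scores)))  # expected: class 1
```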
15. Cross-Entropy Loss Behavior
Question: What happens to cross-entropy loss as the predicted probability diverges from the actual label? Answer: c. Increases
- Explanation: As the prediction becomes less accurate (diverges) from the true label, the error increases, causing the cross-entropy loss to increase.
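A quick numeric check for an actual label \(y = 1\) (NumPy assumed): as the predicted probability falls away from 1, the binary cross-entropy grows without bound. The same effect underlies question 20 below, where the predicted probability is driven all the way to 0.

```python
import numpy as np

y = 1  # actual label
for p in [0.9, 0.5, 0.1, 0.01]:  # predicted P(y=1 | x), diverging from the label
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy
    print(p, round(loss, 3))  # 0.105, 0.693, 2.303, 4.605 -- strictly increasing
```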
16. Regression Relationship
Question: Regression is the process of finding the relationship between: Answer: d. Several predictors and one response variable
- Explanation: The most common form of regression, Multiple Linear Regression, uses several independent variables (predictors) to model a single dependent variable (response).
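A minimal sketch of several predictors mapping to one response (scikit-learn and NumPy assumed; the coefficients are chosen for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two predictors (columns of X), one response y = 3*x1 + 1.5*x2 + 2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 2.0

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # ~[3.0, 1.5] and ~2.0
```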
17. Linear Basis Function
Question: Which of these is a linear basis function? Answer: b. Polynomial
- Explanation: A model using a Polynomial basis (e.g., \(x, x^2\)) is the standard example of a model that is non-linear in the input feature \(x\) but linear in the parameters (\(\mathbf{w}\)), fitting the definition of a linear basis function model.
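To see "linear in the parameters" concretely (NumPy assumed; data generated from known weights): the design matrix \(\Phi\) stacks the basis functions \(1, x, x^2\), and ordinary least squares then solves a purely linear problem in \(\mathbf{w}\).

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2  # generated from weights [1.0, 2.0, 0.5]

# Polynomial basis: non-linear in x, but Phi @ w is linear in w.
Phi = np.column_stack([np.ones_like(x), x, x ** 2])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # ordinary least squares
print(w)  # ~[1.0, 2.0, 0.5]
```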
18. Sigmoid Function Properties
Question: Choose the option that indicates the correct properties of a sigmoid function. Answer: c. It acts as a squashing function because it maps the whole real axis into a finite interval.
- Explanation: The sigmoid function maps any input \(z \in (-\infty, \infty)\) to the finite range \((0, 1)\). This is the squashing property.
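A short check of the squashing behavior (NumPy assumed):

```python
import numpy as np

def sigmoid(z):
    """Map any real z into the finite interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-100.0, -1.0, 0.0, 1.0, 100.0])))
# ~[0.000, 0.269, 0.500, 0.731, 1.000] -- the whole real axis lands in (0, 1)
```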
19. Learning Rate Consequences
Question: If the learning rate is not selected properly in the gradient descent algorithm, what consequences may arise? Answer: a. All the given options
- Explanation: An improper learning rate (\(\alpha\)) can lead to oscillation or divergence (if \(\alpha\) is too high), slow convergence (if \(\alpha\) is too low), and failure to find the global minimum.
20. Logistic Regression Cross-Entropy
Question: Choose the appropriate statement for logistic regression. Answer: d. If \(P(Y=1 \mid X, \theta) = 0\) and the actual class label is \(1\), then the cross-entropy cost will be very high for logistic regression
- Explanation: This is a confident but incorrect prediction (\(\hat{y}=0\) when \(y=1\)). The cross-entropy cost \(J = -\log(\hat{y}) = -\log(0)\) diverges to positive infinity, i.e. the cost becomes arbitrarily high.