In *classification*, we try to predict the output for a given sample, just like in regression. However, unlike regression, the output is not a continuous number but one of a limited number of **categories/classes**, e.g. **binary classification**: the output is either *yes* or *no*.

### Logistic Regression

Despite its name, logistic regression is a **classification algorithm**. It performs binary classification by predicting the probability that a sample belongs to the "yes" class, e.g. a purchase-decision variable with the two values *buys the product* and *does not buy the product*.

The model is based on the *logistic* (sigmoid) function. For any input between minus and plus infinity, it always returns a number between 0 and 1:

$$g(z) = \frac{1}{1 + e^{-z}}$$

![[Logistic-function.png]]

where $z$ is a linear function of the parameters:

$$z = \hat{w} \cdot \hat{x} + b$$

![[Pasted image 20250322154114.png]]

### Decision Boundary

As we know, if the logistic function $f_{(\hat{w}, b)}$ outputs more than $0.5$, the prediction becomes 1, and if less, it becomes 0. So there must be a middle line, determined by the parameters, where $f$ outputs exactly $0.5$. That is called the *decision boundary*.

![[Pasted image 20250322154714.png]]

Just like in linear regression, the features of $f$ can be polynomial. This also affects what the decision boundary looks like:

![[Pasted image 20250322154943.png]]

### Cost Function

If we used the squared error for logistic regression as well, its cost function would not be convex. That means there would be many local minima that gradient descent could get stuck in. Hence, a different cost function is used that is convex for this scenario:

![[Pasted image 20250322161617.png]]

![[Pasted image 20250322161652.png]]

When the true label is 1, the loss pushes the parameters to adjust so that the prediction approaches 1. Conversely, when the label is 0, the other branch of the loss is used, which pushes the prediction toward 0. This way gradient descent can nicely reach a global minimum.

> [!info]
> The loss function can be *simplified* as follows:
> $- y \log{(f_{(\hat{w}, b)}(\hat{x}))} - (1 - y) \log{(1 - f_{(\hat{w}, b)}(\hat{x}))}$

> [!tip]
> The cost function is derived from *maximum likelihood* in math.
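To make the logistic function concrete, here is a minimal NumPy sketch (the names `sigmoid`, `predict_proba`, and `predict` are illustrative, not from the note):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability of the 'yes' class: f(x) = sigmoid(w . x + b)."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Class prediction: 1 if the probability exceeds the threshold, else 0."""
    return (predict_proba(X, w, b) >= threshold).astype(int)
```

For example, `predict(np.array([[2.0, 3.0]]), np.array([1.0, 1.0]), -3.0)` returns `array([1])`, since $z = 2$ and $g(2) \approx 0.88 > 0.5$.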
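The decision boundary itself can be computed directly: since $g(z) = 0.5$ exactly when $z = 0$, a linear model with two features places the boundary on the line $w_1 x_1 + w_2 x_2 + b = 0$. A small sketch with made-up parameter values:

```python
import numpy as np

# Assumed example parameters (not from the note)
w = np.array([1.0, 1.0])
b = -3.0

# Decision boundary: w1*x1 + w2*x2 + b = 0  ->  x2 = -(w1*x1 + b) / w2
x1 = np.linspace(0.0, 5.0, 6)
x2_boundary = -(w[0] * x1 + b) / w[1]

for a, c in zip(x1, x2_boundary):
    print(f"boundary point: x1={a:.1f}, x2={c:.1f}")  # points where f outputs exactly 0.5
```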
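A sketch of the simplified loss from the info box above, averaged into the cost $J$ over a training set (the small `eps` clip inside the logs is my own addition for numerical stability, not part of the formula):

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    """Simplified logistic loss: -y*log(f) - (1-y)*log(1-f)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -y_true * np.log(y_prob) - (1.0 - y_true) * np.log(1.0 - y_prob)

def cost(y_true, y_prob):
    """Cost J(w, b): average loss over all training samples."""
    return np.mean(log_loss(y_true, y_prob))
```

Note how each branch behaves as described above: for $y = 1$ only the $-\log(f)$ term survives, which grows as $f$ moves away from 1; for $y = 0$ only $-\log(1 - f)$ survives.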
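Finally, a minimal gradient descent loop over this cost. The gradient expressions are the standard ones for logistic regression; the learning rate and iteration count here are arbitrary placeholders:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, iters=1000):
    """Fit logistic regression by gradient descent on the convex log-loss cost."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        error = f - y
        w -= lr * (X.T @ error) / m   # dJ/dw
        b -= lr * error.mean()        # dJ/db
    return w, b
```

Because the cost is convex, this loop converges toward the single global minimum regardless of the starting point.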