3.04 How the coefficients are adjusted; the error (loss) function

3.06 Entropy, Gini (splitting criteria)
What subsample = 0.8 helps with
F score for a variable (feature importance)
ROC curve


https://mlu-explain.github.io/logistic-regression/
A common way to estimate coefficients is to use gradient descent. In gradient descent, the goal is to minimize the Log-Loss cost function over all samples. This method involves selecting initial parameter values, and then updating them incrementally by moving them in the direction that decreases the loss. At each iteration, the parameter value is updated by the gradient, scaled by the step size (otherwise known as the learning rate). The gradient is the vector encompassing the direction and rate of the fastest increase of a function, which can be calculated using partial derivatives. The parameters are updated in the opposite direction of the gradient by the step size in an attempt to find the parameter values that minimize the Log-Loss.
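
Written out (standard notation, not taken from the linked page): for $n$ samples with labels $y_i \in \{0, 1\}$ and predictions $\hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i)$, the Log-Loss and the gradient-descent update are

$$
J(\mathbf{w}) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right],
\qquad
\mathbf{w} \leftarrow \mathbf{w} - \alpha \, \nabla J(\mathbf{w}),
$$

where $\alpha$ is the learning rate and the gradient works out to $\nabla J(\mathbf{w}) = \frac{1}{n} X^\top (\hat{\mathbf{y}} - \mathbf{y})$.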

Because the gradient points in the direction in which the function increases fastest, moving in the opposite direction leads us toward the minimum. In this manner, we can repeatedly update the model's coefficients until we reach (approximately) the minimum of the error function and obtain a sigmoid curve that fits the data well.
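
A minimal NumPy sketch of this procedure (function names and the toy data are illustrative, not from the linked page):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Fit logistic regression by gradient descent on the Log-Loss."""
    w = np.zeros(X.shape[1])  # initial parameter values
    b = 0.0
    n = len(y)
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)       # current predictions
        grad_w = X.T @ (y_hat - y) / n   # gradient of Log-Loss w.r.t. w
        grad_b = np.mean(y_hat - y)      # gradient w.r.t. the intercept
        w -= lr * grad_w                 # step in the opposite direction
        b -= lr * grad_b
    return w, b

# Toy 1-D example: class 1 becomes likelier as x grows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(float)
w, b = fit_logistic(X, y)
print(w, b)  # positive slope -> the sigmoid rises with x
```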

https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf


https://medium.com/data-science/optimization-loss-function-under-the-hood-part-ii-d20a239cde11






https://xgboost.readthedocs.io/en/stable/parameter.html
Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost would randomly sample half of the training data prior to growing trees, which helps prevent overfitting. Subsampling occurs once in every boosting iteration.
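
A short sketch of setting `subsample` (the 0.8 value from the notes above; the dataset and the other parameters are illustrative):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# subsample=0.8: each boosting round grows its tree on a random 80% of rows,
# adding randomness between rounds and helping prevent overfitting.
model = xgb.XGBClassifier(
    n_estimators=200,
    subsample=0.8,
    eval_metric="logloss",
)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```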


ROC (Receiver Operating Characteristic)  
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
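
A quick sketch of computing a ROC curve and AUC with scikit-learn (the model and data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

# FPR/TPR traced over all classification thresholds
fpr, tpr, thresholds = roc_curve(y_te, scores)
print("AUC:", roc_auc_score(y_te, scores))
```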



https://www.learndatasci.com/glossary/gini-impurity/
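
For a node with class proportions $p_i$, Gini impurity is $G = 1 - \sum_i p_i^2$ and entropy is $H = -\sum_i p_i \log_2 p_i$ (the two splitting criteria from note 3.06). A small sketch of both (function names are mine):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy of a node's class distribution (log base 2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(gini_impurity([0, 0, 0, 0]))  # 0.0   (pure node)
print(gini_impurity([0, 0, 1, 1]))  # 0.5   (maximally mixed, 2 classes)
print(entropy([0, 0, 1, 1]))        # 1.0   (1 bit of uncertainty)
```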