
14. GBoost 02

HicKee 2023. 3. 10. 15:36

loss : the cost function that gradient descent minimizes (log loss by default for GradientBoostingClassifier; MSE applies to the regressor)

 

learning_rate : learning rate; shrinks each tree's contribution (default 0.1)

 

n_estimators : number of weak learners (default 100)

 

subsample : the fraction of the training data sampled to fit each weak learner (default 1)

> use a value smaller than 1 if overfitting is a concern (see the sketch below)
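
For reference, these options map directly onto the GradientBoostingClassifier constructor. A minimal sketch, assuming scikit-learn >= 1.1 (where the classifier's loss is named 'log_loss'); the values are illustrative, not recommendations:

from sklearn.ensemble import GradientBoostingClassifier

# illustrative values only; the defaults are loss='log_loss',
# learning_rate=0.1, n_estimators=100, subsample=1.0
gb = GradientBoostingClassifier(
    loss='log_loss',     # cost function minimized by gradient descent
    learning_rate=0.1,   # shrinks each tree's contribution
    n_estimators=100,    # number of weak learners
    subsample=0.8        # < 1: each tree sees a random 80% of the rows
)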

 

from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
# accuracy, confusion matrix (true vs. predicted), classification report
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

dt_iris = datasets.load_iris()
X = dt_iris.data
y = dt_iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=124)

sc = StandardScaler()

X_train_ss = sc.fit_transform(X_train)
X_test_ss = sc.transform(X_test)

params_boost = {
    'n_estimators': [100, 150, 200, 250, 400],
    'learning_rate': [0.01, 0.02, 0.03, 0.05, 0.1],
    'max_depth': [3, 4, 5, 7, 10],
    'subsample': [0.9, 0.7, 0.5, 0.3, 0.2]
}
model = GradientBoostingClassifier(random_state=1234)
gboost_cv = GridSearchCV(model, param_grid=params_boost, cv=3, verbose=2)
gboost_cv.fit(X_train_ss, y_train)
print(gboost_cv.best_params_)
print(gboost_cv.best_score_)

[CV] END learning_rate=0.1, max_depth=10, n_estimators=400, subsample=0.3; total time=   1.2s
[CV] END learning_rate=0.1, max_depth=10, n_estimators=400, subsample=0.3; total time=   1.2s
[CV] END learning_rate=0.1, max_depth=10, n_estimators=400, subsample=0.2; total time=   1.2s
[CV] END learning_rate=0.1, max_depth=10, n_estimators=400, subsample=0.2; total time=   1.2s
[CV] END learning_rate=0.1, max_depth=10, n_estimators=400, subsample=0.2; total time=   1.2s
{'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 400, 'subsample': 0.3}
1.0
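
The grid search reports a perfect 3-fold CV score here, while the held-out test accuracy below turns out lower, so it is worth looking past the single best combination. A minimal sketch, assuming the fitted gboost_cv from above, that ranks the searched candidates via cv_results_ (pandas is an extra dependency here):

import pandas as pd

# rank the searched combinations by mean cross-validation score
cv_df = pd.DataFrame(gboost_cv.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(cv_df[cols].sort_values('rank_test_score').head())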

# model = GradientBoostingClassifier(max_depth=4, learning_rate=0.01, random_state=1234, subsample=0.8)
# model.fit(X_train_ss, y_train)
# instead of fitting by hand, reuse the refit best model from the grid search
model = gboost_cv.best_estimator_


y_pred = model.predict(X_test_ss)

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

0.9111111111111111
[[14  0  0]
 [ 0 11  3]
 [ 0  1 16]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       0.92      0.79      0.85        14
           2       0.84      0.94      0.89        17

    accuracy                           0.91        45
   macro avg       0.92      0.91      0.91        45
weighted avg       0.91      0.91      0.91        45
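
Because the best estimator is a tree ensemble, it also exposes feature_importances_. A short sketch, assuming the fitted model and dt_iris from above, to see which iris measurements the trees relied on:

# per-feature importance of the fitted gradient boosting model
for name, imp in zip(dt_iris.feature_names, model.feature_importances_):
    print(f'{name}: {imp:.3f}')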

# best_estimator_ returns the fitted model itself, not predictions
best_model = gboost_cv.best_estimator_
best_model

GradientBoostingClassifier(learning_rate=0.01, n_estimators=400,
                           random_state=1234, subsample=0.3)
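
To reuse the tuned model later without re-running the grid search, it can be saved with joblib; the file name 'gboost_iris.joblib' is just an example:

import joblib

# persist the tuned model (file name is arbitrary)
joblib.dump(gboost_cv.best_estimator_, 'gboost_iris.joblib')
# reload later with: joblib.load('gboost_iris.joblib')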
