
Decision Trees: Information Gain, Gini Index, Pruning, Random Forest

Posted on 2024-9-24 14:54:28
Decision Trees in Data Mining

What is a decision tree?

A decision tree is a common machine learning algorithm that builds a tree structure for classification or regression tasks by learning the features in the data. Each internal node of the tree tests a feature, each edge represents a value of that feature, and each leaf node gives the final class label or a predicted continuous value. Figuratively speaking, a decision tree is like a flowchart for making decisions. For example, to decide what clothes to wear today, we could build a decision tree that judges based on features such as weather, temperature, and occasion.

How decision trees work

1. Feature selection: from all the features, select the one that best distinguishes the samples to serve as a node.
2. Node splitting: split the dataset into sub-datasets according to the values of the selected feature.
3. Recursive construction: repeat steps 1 and 2 on each sub-dataset until a stopping condition is met, such as all samples in a node belonging to the same class or the maximum depth being reached (see the sketch below).
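To make these three steps concrete, here is a minimal Python sketch of the construction loop. It is an illustration rather than a production implementation: the entropy-based feature scoring follows ID3 (discussed below), and the data format (rows of categorical feature values plus a parallel list of labels) is assumed for the example.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_feature(rows, labels):
    """Step 1: pick the feature index with the highest information gain."""
    base = entropy(labels)
    def gain(i):
        parts = {}
        for row, label in zip(rows, labels):
            parts.setdefault(row[i], []).append(label)
        return base - sum(len(p) / len(labels) * entropy(p)
                          for p in parts.values())
    return max(range(len(rows[0])), key=gain)

def build_tree(rows, labels, depth=0, max_depth=3):
    """Steps 2 and 3: split on the best feature, recurse until a stop condition."""
    # Stopping conditions: all samples share one class, or maximum depth reached.
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    i = best_feature(rows, labels)
    branches = {}
    for value in set(row[i] for row in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[i] == value]
        branches[value] = build_tree([r for r, _ in sub], [l for _, l in sub],
                                     depth + 1, max_depth)
    return {"feature": i, "branches": branches}

# Toy run, echoing the clothing example: features are (weather, temperature).
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["t-shirt", "t-shirt", "jacket", "jacket"]
print(build_tree(rows, labels))  # splits on feature 0 (weather); children are pure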

Advantages of decision trees

- Strong interpretability: the structure of a decision tree is intuitive and easy to understand.
- Works well with both categorical and numerical data.
- Requires little data preprocessing.

Disadvantages of decision trees

- Overfitting: when a decision tree grows too deep, it easily overfits the training data and performs poorly on the test set.
- Instability: decision trees are sensitive to small changes in the data.

Common algorithms for decision trees

- ID3: selects features based on information gain.
- C4.5: selects features based on the information gain ratio, improving on ID3 (the sketch below contrasts the two criteria).
- CART: can be used for both classification and regression, using the Gini index (classification) or variance reduction (regression) as the splitting criterion.
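The practical difference between ID3 and C4.5 is the scoring function. Plain information gain favors features with many distinct values (an ID-like column can score perfectly yet generalize terribly), so C4.5 divides the gain by the entropy of the split itself, the "split information". A minimal sketch of both criteria, with the same assumed data format as above:

```python
from collections import Counter
import math

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_and_ratio(rows, labels, i):
    """Information gain (ID3) and gain ratio (C4.5) for feature index i."""
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[i], []).append(label)
    weights = [len(p) / len(labels) for p in parts.values()]
    gain = entropy(labels) - sum(w * entropy(p)
                                 for w, p in zip(weights, parts.values()))
    # Split information: entropy of the partition itself. Many-valued
    # features split the data finely and inflate this denominator.
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain, (gain / split_info if split_info > 0 else 0.0)
```

On a feature where every row takes a unique value, the gain is maximal but the split information is log2(n), so the gain ratio penalizes it; this is the sense in which C4.5 improves on ID3.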



Application scenarios of decision trees

- Classification problems: customer classification, credit scoring, disease diagnosis, etc.
- Regression problems: predicting house prices, stock prices, etc.
- Anomaly detection: discovering abnormal data.

Pruning of decision trees

To prevent overfitting, a decision tree usually needs to be pruned. Pruning comes in two styles:

- Pre-pruning: impose restrictions during tree growth to stop the tree from growing early.
- Post-pruning: let the tree grow fully first, then prune subtrees from the bottom up.

A scikit-learn sketch of both styles follows.
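Both pruning styles are easy to demonstrate with scikit-learn's DecisionTreeClassifier, which also uses the Gini index as its default splitting criterion. The sketch below assumes scikit-learn is installed and uses the bundled iris dataset: pre-pruning is expressed through growth limits such as max_depth, and post-pruning through cost-complexity pruning (ccp_alpha), which trims subtrees from a fully grown tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: restrictions set before training stop growth early.
pre = DecisionTreeClassifier(criterion="gini", max_depth=3,
                             min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: compute the cost-complexity pruning path of a fully grown
# tree, then refit with a chosen alpha; larger alphas prune more subtrees.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # mid-range, for illustration
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
post.fit(X_train, y_train)

print("pre-pruned:  depth", pre.get_depth(), "test acc", pre.score(X_test, y_test))
print("post-pruned: depth", post.get_depth(), "test acc", post.score(X_test, y_test))
```

In practice the alpha for post-pruning is chosen by cross-validation over path.ccp_alphas rather than picked arbitrarily, as here.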

