Decision Tree Induction



  • A decision tree is a tree-structured model that supports decision making. Decision trees build classification or regression models in the form of a tree structure.
  • A decision tree breaks a data set down into smaller and smaller subsets while, at the same time, the tree itself is incrementally developed. A decision node has two or more branches, and leaf nodes represent a classification or decision. Decision trees can handle both categorical and numerical data.

Key factors

Entropy

  • Entropy is a common way to measure impurity: it measures the randomness or impurity in a data set. For a set whose classes occur with proportions p_i, the entropy is -Σ p_i log2(p_i), as sketched below.
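
A minimal Python sketch of this measure (the function name and the example labels are illustrative, not taken from the original):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy: H(S) = -sum(p_i * log2(p_i)) over class proportions p_i."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

# 9 "yes" and 5 "no" labels (the classic play-tennis class distribution):
print(entropy(["yes"] * 9 + ["no"] * 5))  # ~0.940 bits
```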


Information Gain

  • Information gain refers to the decline in entropy after the dataset is split on an attribute. It is also called entropy reduction; see the sketch after this list.
  • A decision tree is much like a flowchart diagram, with terminal (leaf) nodes representing decisions.
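
Here is a minimal Python sketch of information gain (the toy windy/play data and all names are illustrative assumptions, not from the original):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain = H(parent) - weighted average entropy of the children after the split."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy split: does the attribute "windy" reduce entropy of "play"?
windy = [False, True, False, False, True, True]
play  = ["yes", "no", "yes", "yes", "no", "yes"]
print(information_gain(windy, play))  # ~0.459 bits
```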

Why are decision trees useful?

  • It enables us to analyze the possible consequences of a decision.
  • It provides a framework to measure the values of outcomes.
  • It helps us make the best decisions based on existing data.
  • The decision tree model comprises a set of rules for partitioning a large, heterogeneous population into smaller, more homogeneous (mutually exclusive) classes. Given data on attributes together with their classes, a decision tree produces a set of rules that can be used to identify the class. One rule is applied after another, resulting in a hierarchy of segments within segments.
  • The hierarchy is known as the tree, and each segment is called a node. With each progressive division, the members of the resulting subsets become more and more similar to one another. The procedure used to build a decision tree is referred to as recursive partitioning; a well-known instance is CART (Classification and Regression Trees).
  • Consider the example of a factory whose management team must make a data-driven decision on whether or not to expand, given the probabilities and payoffs below.
  • Net Expand = (0.6 * 8 + 0.4 * 6) - 3 = $4.2M
    Net Not Expand = (0.6 * 4 + 0.4 * 2) - 0 = $3.2M
    Since $4.2M > $3.2M, the factory should be expanded.
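
The same arithmetic as a small Python sketch (the probabilities and payoffs are those assumed in the example above; the function name is my own):

```python
def expected_net(prob_good, payoff_good, payoff_bad, cost):
    """Expected payoff of an option minus its upfront cost, in $M."""
    return prob_good * payoff_good + (1 - prob_good) * payoff_bad - cost

expand     = expected_net(0.6, 8, 6, cost=3)  # 4.2
not_expand = expected_net(0.6, 4, 2, cost=0)  # 3.2
print("expand" if expand > not_expand else "do not expand")  # expand
```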

Decision Tree Algorithm

  • The algorithm takes three parameters: D, attribute_list, and Attribute_selection_method; D is referred to as a data partition.
    • D - The complete set of training tuples and their associated class labels.
    • attribute_list - The set of attributes describing the tuples.
    • Attribute_selection_method - A heuristic procedure for selecting the attribute that "best" discriminates the given tuples according to class. It applies an attribute selection measure such as information gain; a sketch of the whole recursive procedure follows this list.
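
Below is a minimal, hedged sketch of this recursive procedure in Python, using information gain as the Attribute_selection_method (an ID3-style simplification; all identifiers are illustrative, and tuples are represented as dictionaries):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_attribute(D, attribute_list, class_key):
    """Pick the attribute with the highest information gain on partition D."""
    base = entropy([row[class_key] for row in D])

    def gain(attr):
        groups = defaultdict(list)
        for row in D:
            groups[row[attr]].append(row[class_key])
        return base - sum(len(g) / len(D) * entropy(g) for g in groups.values())

    return max(attribute_list, key=gain)

def generate_tree(D, attribute_list, class_key="class"):
    labels = [row[class_key] for row in D]
    if len(set(labels)) == 1:          # all tuples share one class -> leaf
        return labels[0]
    if not attribute_list:             # no attributes left -> majority class leaf
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(D, attribute_list, class_key)
    node = {attr: {}}
    for value in {row[attr] for row in D}:     # one branch per attribute value
        subset = [row for row in D if row[attr] == value]
        remaining = [a for a in attribute_list if a != attr]
        node[attr][value] = generate_tree(subset, remaining, class_key)
    return node

# Hypothetical toy data: predict "play" from "outlook" and "windy".
D = [
    {"outlook": "sunny",    "windy": False, "class": "no"},
    {"outlook": "overcast", "windy": True,  "class": "yes"},
    {"outlook": "sunny",    "windy": True,  "class": "no"},
    {"outlook": "rainy",    "windy": False, "class": "yes"},
]
print(generate_tree(D, ["outlook", "windy"]))
# -> a nested dict splitting on "outlook", e.g. {'outlook': {'sunny': 'no', ...}}
```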

Advantages of using Decision Trees

  • Missing values in the data do not influence the process of building a decision tree to any considerable extent.
  • A decision tree does not require scaling of the features.
  • A decision tree does not require normalization of the data.
  • A decision tree model is intuitive and easy to explain to technical teams as well as stakeholders.
  • Compared to other algorithms, decision trees require less effort for data preparation during pre-processing, as the example below illustrates.
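
As a quick demonstration of the low pre-processing burden, here is a minimal scikit-learn example (assuming scikit-learn is installed); note that the raw iris features are used with no scaling or normalization step:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No StandardScaler / MinMaxScaler needed: splits compare raw feature values.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
print(export_text(clf, feature_names=list(load_iris().feature_names)))
```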

