Sometimes, some condition attributes cannot be used to distinguish objects; they are redundant. In rough set theory, the set of condition attributes that remains once the redundant attributes are excluded is called a reduct. A reduct is the essential part of an information table: it can discern all objects that are discernible by the original table. The performance of the specified condition attributes can be described with two indicators:
accuracy of approximation and quality of approximation. Accuracy of approximation represents the percentage of the associated objects that are definable with the specified condition attributes. It is defined as follows:

$$\alpha_P(X) = \frac{\operatorname{card}(\underline{A}X)}{\operatorname{card}(\overline{A}X)} \qquad (2)$$

where card refers to cardinality, and $\underline{A}X$ and $\overline{A}X$ are the lower and upper approximations of $X$. The value of accuracy ranges from 0 to 1. The closer the accuracy is to 1, the more precisely the set $X$, that is, the travel mode, can be discerned with the condition attributes. An accuracy of 1 implies that the associated travel mode is defined unambiguously. Quality of approximation, on the other hand, represents what percentage of the universe is definable. Let $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$ be a classification of $U$; that is to say, $X_i \cap X_j = \emptyset$ for all $i, j \le n$ with $i \ne j$, and $\bigcup_{i=1}^{n} X_i = U$. Each $X_i$ is called a class of $\mathbf{X}$. The quality of approximation of classification $\mathbf{X}$ by a set of attributes can be defined as follows:

$$\gamma_P(\mathbf{X}) = \frac{\sum_{i=1}^{n} \operatorname{card}(\underline{A}X_i)}{\operatorname{card}(U)} \qquad (3)$$

The value of quality ranges from 0 to 1. The closer the quality is to 1, the more objects of the universe clearly belong to a single class of $\mathbf{X}$; a quality of 1 implies that all travel modes can be clearly identified. To recognize further details of mode choices, rules need to be extracted. Using the reduced information table (without redundant attributes), rules can be found by determining the value of the decision attribute from the values of the condition attributes. The rules are presented in an “IF condition(s) THEN decision(s)” format: if the condition(s) in the IF part match the given fact(s), the decision(s) in the THEN part are performed. Unlike mathematical functions or statistical models in traditional travel demand forecasting analysis, decision rules induced from a set of raw data can capture and represent both numeric and
nonnumeric variables. In addition, the modular nature of decision rules makes it easy for researchers to insert new decision rules or to modify or delete existing decision rules without affecting the overall system. Once a set of rules has been derived, the training stage of the knowledge discovery process is complete, and the rules are then tested.

4.2. Theory of Testing

The testing stage is relatively straightforward: it involves applying the rules to a previously unseen set of data in order to predict mode choice. Because the actual mode choice for these data is also known, the predictive ability of the rules can be evaluated. This information is usually presented in a confusion matrix [26], which contains the actual mode choices as rows and the predicted mode choices as columns.
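The measures in equations (2) and (3), the IF-THEN rules, and the confusion matrix used in testing can be tied together in a small Python sketch. Everything in it is an illustrative assumption rather than the study's data or algorithm: the attribute names `car_owner` and `distance`, the toy training and test tables, and all helper functions are hypothetical. Rule induction is simplified to turning each elementary set that lies wholly inside one decision class (i.e., inside a lower approximation) into a certain rule; the reduct step is skipped by assuming the attribute list is already reduced, and test objects matched by no rule are labelled `unclassified`.

```python
from collections import Counter, defaultdict

# Hypothetical toy information tables (not data from the study).
# "mode" is the decision attribute; the other keys are condition attributes.
TRAIN = [
    {"car_owner": "yes", "distance": "long", "mode": "car"},
    {"car_owner": "yes", "distance": "long", "mode": "car"},
    {"car_owner": "no", "distance": "long", "mode": "bus"},
    {"car_owner": "no", "distance": "short", "mode": "walk"},
    {"car_owner": "yes", "distance": "short", "mode": "walk"},
    {"car_owner": "yes", "distance": "short", "mode": "car"},
]
TEST = [
    {"car_owner": "yes", "distance": "long", "mode": "car"},
    {"car_owner": "no", "distance": "long", "mode": "car"},
    {"car_owner": "no", "distance": "short", "mode": "walk"},
    {"car_owner": "yes", "distance": "short", "mode": "walk"},
]
ATTRS = ["car_owner", "distance"]  # assumed already reduced (no reduct step here)


def elementary_sets(table, attrs):
    """Group row indices that are indiscernible on attrs."""
    groups = defaultdict(set)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())


def approximations(table, attrs, target):
    """Lower and upper approximations of the row-index set `target`."""
    lower, upper = set(), set()
    for block in elementary_sets(table, attrs):
        if block <= target:  # block lies wholly inside target
            lower |= block
        if block & target:  # block overlaps target
            upper |= block
    return lower, upper


def accuracy(table, attrs, target):
    """Equation (2): card(lower) / card(upper); target must be non-empty."""
    lower, upper = approximations(table, attrs, target)
    return len(lower) / len(upper)


def decision_classes(table, decision):
    """Classification X = {X_1, ..., X_n}: row indices grouped by decision value."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[row[decision]].add(i)
    return dict(classes)


def quality(table, attrs, decision):
    """Equation (3): sum of lower-approximation sizes divided by card(U)."""
    classes = decision_classes(table, decision).values()
    covered = sum(len(approximations(table, attrs, x)[0]) for x in classes)
    return covered / len(table)


def certain_rules(table, attrs, decision):
    """Simplified induction: one IF-THEN rule per elementary set inside one class."""
    rules = []
    for block in elementary_sets(table, attrs):
        outcomes = {table[i][decision] for i in block}
        if len(outcomes) == 1:
            first = table[next(iter(block))]
            rules.append(({a: first[a] for a in attrs}, outcomes.pop()))
    return rules


def predict(rules, row):
    """Return the THEN part of the first rule whose IF part matches the row."""
    for condition, outcome in rules:
        if all(row[a] == v for a, v in condition.items()):
            return outcome
    return "unclassified"


def confusion_matrix(rules, table, decision):
    """Rows: actual mode; columns: predicted mode."""
    matrix = defaultdict(Counter)
    for row in table:
        matrix[row[decision]][predict(rules, row)] += 1
    return {actual: dict(predicted) for actual, predicted in matrix.items()}


if __name__ == "__main__":
    car = decision_classes(TRAIN, "mode")["car"]
    print(accuracy(TRAIN, ATTRS, car))  # 0.5
    print(round(quality(TRAIN, ATTRS, "mode"), 3))  # 0.667
    rules = certain_rules(TRAIN, ATTRS, "mode")
    print(confusion_matrix(rules, TEST, "mode"))
```

With these toy tables, the car class has an accuracy of 0.5 (two objects in its lower approximation, four in its upper approximation), and the quality of approximation is 4/6 because the mixed elementary set {car_owner = yes, distance = short} cannot be assigned to a single mode. That same set produces no certain rule, so the matching test object ends up in the `unclassified` column of the confusion matrix.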