To evaluate the mode choice modeling performance of the rough sets, two prediction indicators are defined: accuracy of prediction and coverage of prediction. They, respectively,
reflect the modeling performance at the individual and aggregate levels. Accuracy of prediction ($r_i$), or hit ratio, is the ratio of the number of correctly predicted individual observations for one mode ($N_{pi}$) to the total number of actual observations choosing this mode ($N_a$), expressed as

$$r_i = \frac{N_{pi}}{N_a}. \tag{4}$$

Coverage of prediction ($r_a$) reflects the prediction accuracy at the mode aggregate level, defined as the ratio of the number of predicted observations (including correctly and incorrectly predicted observations) for one mode ($N_{pa}$) to the number of the actual observations
choosing this mode ($N_a$), expressed as

$$r_a = \frac{N_{pa}}{N_a}. \tag{5}$$

The accuracy never exceeds 1, whereas the coverage may be greater or less than 1. Because the correctly predicted observations are a subset of all observations predicted for a mode ($N_{pi} \le N_{pa}$), the accuracy is never greater than the coverage. In the context of rough sets classification, accuracy alone is not a meaningful measure, since the coverage affects how many classification attempts are made. Therefore, in this paper, both accuracy and coverage are used as performance measures.

5. Applications to Travel Diary Survey

The software used to produce the results in this study is Rosetta [27]. When knowledge discovery procedures are applied to datasets, it is important to guard against overfitting. This requires that the data used to derive the knowledge during the training stage are not the same as those used to test it. There are standard procedures for ensuring this. Where there is a limited amount of data, a k-fold procedure is adopted
where the data are split into k mutually exclusive parts and k training and testing procedures are conducted; in each procedure, one of the k parts is excluded from training and held back for testing. An alternative, where there is sufficient data, is to partition the data into two parts, one used exclusively for training and the other exclusively for testing. Since the travel dataset available in this study is large, this partition approach has been adopted here. The data have been randomly split into two halves, one for model estimation and the other for the subsequent validation test. The actual mode split proportions in the total database, as well as in the training set and testing set, are shown in Table 3.

Table 3: Summary of the mode splits in the datasets.

5.1. Approximation and Reduct

The accuracy of approximation describes the completeness of knowledge about the decision attribute (travel mode) that can be obtained from the condition attributes. As shown in Table 4, the foot mode has the highest accuracy value, 91.9%. The other modes also have relatively good accuracy.
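
To make the accuracy of approximation concrete, the following is a minimal sketch. It assumes the standard rough-set definition, in which the accuracy for a mode is the size of its lower approximation divided by the size of its upper approximation. The function name, the attribute names (`distance`, `car`), and the six toy records are hypothetical illustrations, not data or code from this study or from Rosetta.

```python
from collections import defaultdict


def approximation_accuracy(rows, condition_keys, decision_key, target):
    """Accuracy of approximation for the set of rows whose decision equals `target`.

    Assumes the standard rough-set definition: |lower approximation| / |upper approximation|.
    """
    # Group row indices into indiscernibility classes, i.e. rows that
    # share identical values on all condition attributes.
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        signature = tuple(row[key] for key in condition_keys)
        classes[signature].append(idx)

    lower = 0  # rows in classes whose members all choose `target`
    upper = 0  # rows in classes with at least one member choosing `target`
    for members in classes.values():
        hits = sum(1 for i in members if rows[i][decision_key] == target)
        if hits == len(members):
            lower += len(members)
        if hits > 0:
            upper += len(members)

    if upper == 0:
        raise ValueError("no observation chooses the target mode")
    return lower / upper


# Hypothetical toy records (not from the travel diary survey).
rows = [
    {"distance": "short", "car": "no", "mode": "foot"},
    {"distance": "short", "car": "no", "mode": "foot"},
    {"distance": "short", "car": "yes", "mode": "foot"},
    {"distance": "short", "car": "yes", "mode": "car"},
    {"distance": "long", "car": "yes", "mode": "car"},
    {"distance": "long", "car": "no", "mode": "bus"},
]

foot_accuracy = approximation_accuracy(rows, ["distance", "car"], "mode", "foot")
print(foot_accuracy)  # 0.5: two rows are certainly foot, four are possibly foot
```

In this toy example, the two records with a short distance and no car are indiscernible and both choose foot, so they form the lower approximation. The two records with a short distance and a car are split between foot and car, so they belong only to the upper approximation, which lowers the accuracy for both modes.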