International Journal of Computational Intelligence Systems

Volume 9, Issue 6, December 2016, Pages 1118 - 1132

Novel Approach to Tourism Analysis with Multiple Outcome Capability Using Rough Set Theory

Authors
Chun-Che Huang1, cchuang@ncnu.edu.tw, Tzu-Liang (Bill) Tseng†2, btseng@utep.edu, Kun-Cheng Chen3, stylishjackal@gmail.com
1Department of Information Management, National Chi Nan University, University Road, Pu-Li, Nantau 545, Taiwan
2Department of Industrial, Manufacturing and Systems Engineering, The University of Texas at El Paso El Paso, Texas 79968
3Department of Information Management, National Chi Nan University, University Road, Pu-Li, Nantau 545, Taiwan
Received 11 April 2015, Accepted 19 December 2015, Available Online 1 December 2016.
DOI
10.1080/18756891.2016.1256574
Keywords
Rough Sets; Tourism; Customer Satisfaction; Decision Making; Rule Induction
Abstract

Exploring the relationship between tourist characteristics and decision-making outcomes is critical to keeping a tourism business competitive. In the investigation of tourism development, most existing studies lack a systematic approach to analyze qualitative data. Although the traditional Rough Set (RS) based approach is an excellent classification method in qualitative modeling, it cannot deal with the case of multiple outcomes, which is a common situation in tourism. Consequently, the Multiple Outcome Reduct Generation (MORG) and Multiple Outcome Rule Extraction (MORE) approaches, based on RS and able to handle multiple outcomes, are proposed. This study proposes a ranking based approach to induce meaningful reducts and ensure the strength and robustness of decision rules, which helps decision makers understand tourists' characteristics in a tourism case.

Copyright
© 2016, the authors. Co-published by Atlantis Press and Taylor & Francis
Open Access
This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

1. Introduction

As tourism becomes the largest industry in the world, its impact on the global economy is widely recognized. To keep a business competitive, one of the key preliminary elements in the planning process is to study the characteristics of customers in order to understand the relationship between tourist tasks and decisions (Zalatan, 1998), that is, to customize product offerings and strengthen marketing policies to an appropriate degree in order to respond to the shifting importance and growing complexity of customer choice. To date, most existing causal studies in tourism research apply quantitative models that use mathematical functions. These models are established on the basis of many statistical assumptions and limitations. However, it is unrealistic to assume that all tourism data are numeric. The tourism industry often possesses numerous qualitative data in flight scheduling databases, retail store employee files, and airline reservation repositories (Law & Au, 2000). The qualitative nature of the data makes them difficult to analyze by standard statistical techniques.

The rough set approach has been developed as an interesting technique for discovering knowledge from data (Pawlak, 1991a). It is suitable for processing qualitative information that is difficult to analyze by standard statistical techniques (Heckerman, Mannila, Pregibon, & Uthurusamy, 1997). Moreover, it provides an effective means for analysis of data by synthesizing or constructing approximations (upper and lower) of set concepts from the acquired data (Pal & Mitra, 2003). One of the reasons for using the rough set as an approach to mine data is that the qualitative nature of the data (e.g., gender and vegetarian status) being analyzed makes it difficult to apply standard statistical techniques (Heckerman et al., 1997; Simoudis, Han & Fayyad, 1996).

Traditional rough set theory has no ability to consider multiple outcomes: it cannot directly analyze a data set with more than a single outcome, and instead must segment the data and generate reducts separately, providing one single consideration for each outcome. Therefore, this study proposes an approach that applies rough set theory to solve multiple outcome situations. Moreover, the direct use of the result provided by the reduct generation procedure may lead to many reducts that contain features that are not meaningful. In order to identify meaningful reducts and ensure the strength and stability of rules, this study proposes a ranking based approach that applies each induced reduct to the test data set corresponding to the training set and collects information regarding the support and accuracy of the reduct in new data for rule extraction.

The objective of this study is to propose a novel reduct extraction approach for multiple outcomes in a tourism case. The aim is to understand the preferences of customers and thereby explore the relationship between tourist tasks and decisions. The remainder of this study is organized as follows: Section 2 presents a literature review related to tourism and several rough set based approaches. Section 3 defines the multiple outcome problem, and Section 4 develops the solution approach, while the tourism related case study and computational results based on the proposed approach are depicted in Section 5. Finally, Section 6 concludes the findings and provides suggestions for future research.

2. Literature Review

In this section, literature review related to tourism and rough set based approaches is presented.

2.1 Tourism

Nowadays, the tremendous growth of tourism as a field of study, coupled with increasing demand for tourism education, has led to a heightened focus on research and publications (Jogaratnam, Chon, McCleary, Mena, & Yoo, 2005). This field has seen the development of numerous associations, journals and research consortia, and is linked to groups of researchers in other domains such as business and information technology (Racherla & Hu, 2010). Research topics in the tourism domain are various, for example, sustainable tourism (Divino & McAleer, 2009; Miller & Twining-Ward, 2006), marketing (Butcher, 2010; Frochot, 2010), data mining and knowledge discovery (Buultjens, Brereton, Memmott, Reser, Thomson & O'Rourke, 2010; Liao, Chen & Deng, 2010), and quantitative approaches (Enright & Newton, 2004; Jang, 2004) such as rule induction (Au & Law, 2002; Cardoso, Themido & Pires, 1999).

Walle (1997) argued that formal scientific methods are needed in tourism research, to ensure the generality of empirical outcomes as well as to maintain the quality and credibility of research findings. Ever since the introduction of formal statistical techniques to the tourism field, researchers have focused on the application of mathematical functions to model the quantitative relationships between phenomena in tourism data. At present, tourism demand modeling techniques consist primarily of time series models (Nelson, Dickey & Smith, 2009), needs-functions models that consider more fully the activities occurring in different settings from the market through to the destination and from one sector to another (Pearce, 2008), dynamic models (Garín-Muñoz & Montero-Martín, 2007), and Bayesian models (Wang, Hsieh, Yeh & Tsai, 2004). The use of statistical techniques to model the phenomena of numeric tourism oriented data in mathematical functions has achieved a certain degree of success in relationship modeling and forecasting (Witt & Witt, 1995). Moreover, most of the existing studies on tourism demand forecasting apply economic models that use mathematical functions, which require many statistical assumptions and limitations (Law, Goh, & Pine, 2004).

However, it is unrealistic to assume that all tourism data are numeric. The tourism industry often possesses numerous qualitative data in flight scheduling databases, retail store employee files, and airline reservation repositories (Law & Au, 2000). The qualitative nature of the data makes them difficult to analyze by standard statistical techniques. In addition, due to the quick change of information technology (IT) today, some techniques (i.e., rough sets and data mining tools) have become important research trends for both practitioners and academicians (Chen & Cheng, 2009). One of the solutions to resolve these issues is the rough set approach. A rough set approach can capture useful information from a set of mixed data and output this information in the form of decision rules (Pawlak, 1984). This makes the rough set approach an excellent rule induction method in qualitative modeling. Therefore, this study proposes a new approach that applies rough set theory. Information systems (IS), information reduction and decision rule induction are discussed in the next section.

2.2 Rough Set Approach

The Rough Set Model (RSM) was proposed by Pawlak in the early 1980s (Pawlak, 1982) as a powerful mathematical tool to deal with imprecise or vague concepts. It allows users to classify objects into sets of equivalent members based on their attributes (Dasgupta, 1988). In recent years, a rapid growth of interest in rough set theory and its applications has been witnessed (Swiniarski & Skowron, 2003), demonstrating that the rough set model is applicable to a wide range of practical problems pertaining to supplier selection (Bai & Sarkis, 2010), investment portfolios (Shyng, Wang, Tzeng & Wu, 2010), marketing (Cheng, Chen, & Lin, 2010), EEG analysis (Ningler, Stockmanns, Schneider, Kochs & Kochs, 2009), multiple criteria classification (Zhang, Shi & Gao, 2009; Dembczyński, Greco & Słowiński, 2009), expert systems (Shao, Chu, Qiu, Gao & Yan, 2009), time series analysis (Yao & Herbert, 2009; Teoh, Cheng, Chu & Chen, 2008; Sarkar, 2006), image compression and segmentation (Mushrif & Ray, 2008; Petrosino & Ferone, 2008), and Boolean reasoning (Pawlak & Skowron, 2007).

The rough set theory offers a large collection of tools for knowledge discovery from data. Many of these methods, like decision rule induction, classifier construction, discretization, decision tree construction, and representative association rule induction, are based on computing the most relevant sets of attributes, called reducts (Nguyen, 2003). A reduct is defined as a minimal sufficient subset of a set of attributes, which has the same ability to discern concepts as when the full set of attributes is used (Pawlak, 1991a; Ziarko, 1994). Basically, the reducts represent the necessary condition attributes in decision making. Furthermore, a set of attributes may have more than one reduct; hence the simplification of reducts (called reduct rules) required to extract significant rules (called decision rules) does not yield unique results. Therefore, a rough set based approach often contains two parts: reduct generation and decision rule extraction.

2.2.1 Reduct Generation

According to the rough set theory, I = (U, A) is an information system, where U is a finite set of objects and A is a finite set of attributes. With every attribute a ∈ A, a set of its values Va is associated. Assume A = C ∪ D and B ⊆ C, where B is a subset of the condition attributes C; the positive region POSB(D) = {x ∈ U: [x]B ⊆ D} can be defined. The positive region POSB(D) includes all objects in U which can be classified into classes of D using the knowledge B. The degree of dependency between B and D is defined as:

K(B, D) = card(POSB(D)) / card(POSC(D)),
where card yields the set cardinality. In general, if K(B, D) = K(C, D) and K(B, D) ≠ K(B−{a}, D) for any a ∈ B, then B is a reduct of C. Since a reduct B preserves the degree of dependency with respect to D and is a minimal subset, any further removal of condition attributes changes the degree of dependency. A useful procedure for determining the reducts can be found in Pawlak (1991b).
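As a concrete illustration, the positive region and the dependency degree K(B, D) can be sketched in a few lines of Python. The table encoding (a list of dicts) and the function names are assumptions of this sketch, not part of the original formulation:

```python
from collections import defaultdict

def positive_region(objects, B, D):
    """POS_B(D): objects whose B-equivalence class lies inside one D-class."""
    classes = defaultdict(list)
    for x in objects:
        classes[tuple(x[a] for a in B)].append(x)
    pos = []
    for members in classes.values():
        # The class belongs to POS_B(D) iff all members share one decision.
        if len({tuple(x[a] for a in D) for x in members}) == 1:
            pos.extend(members)
    return pos

def dependency(objects, B, C, D):
    """K(B, D) = card(POS_B(D)) / card(POS_C(D))."""
    denom = len(positive_region(objects, C, D))
    return len(positive_region(objects, B, D)) / denom if denom else 0.0

# Toy decision table: two condition features, one decision feature.
table = [
    {"F1": "L", "F2": 1, "d": "yes"},
    {"F1": "L", "F2": 2, "d": "no"},
    {"F1": "H", "F2": 1, "d": "yes"},
]
C, D = ["F1", "F2"], ["d"]
print(dependency(table, ["F1"], C, D))  # 1/3: only the F1 = H class is pure
```

Here K({F1}, D) < K(C, D), so F1 alone is not a reduct of C.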

In order to find dispensable attributes, an examination of each attribute of the object is required. One may drop one attribute at a time and check whether the intersection of the remaining attributes is still included in the decision attribute. In most cases, once a reduct of one or two features is formed, the procedure stops, because a reduct rule formed from too many features loses the advantage of the rough set theory; the reduct rule would be useless without identifying the critical features. However, the case of multiple outcomes, fairly common in the real world, has not been seriously considered in traditional rough set theory. Bai and Sarkis (2010), for example, define a data set with ten condition attributes and two decision attributes for their illustrative application.
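The drop-one-attribute search described above can be sketched as a brute-force reduct enumeration. For simplicity the sketch assumes a consistent decision table, and all names (`discerns`, `reducts`) are illustrative:

```python
from itertools import combinations

def discerns(objects, B, D):
    """True if every B-equivalence class maps to a single decision value."""
    seen = {}
    for x in objects:
        key = tuple(x[a] for a in B)
        dec = tuple(x[a] for a in D)
        if seen.setdefault(key, dec) != dec:
            return False
    return True

def reducts(objects, C, D):
    """All minimal subsets B of C discerning D as well as C does (brute force)."""
    found = []
    for size in range(1, len(C) + 1):
        for B in combinations(C, size):
            # Keep B only if it discerns D and no smaller reduct is inside it.
            if discerns(objects, list(B), D) and \
                    not any(set(r) <= set(B) for r in found):
                found.append(B)
    return found

table = [
    {"F1": "L", "F2": 1, "F3": 0, "d": "y"},
    {"F1": "L", "F2": 2, "F3": 0, "d": "n"},
    {"F1": "H", "F2": 1, "F3": 1, "d": "n"},
]
print(reducts(table, ["F1", "F2", "F3"], ["d"]))  # [('F1', 'F2'), ('F2', 'F3')]
```

Enumerating by subset size guarantees minimality, at exponential cost, which is consistent with the NP-hardness of finding all reducts noted later.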

The reduct generation procedure requires improvement, evolving into a powerful tool for decision makers and researchers, especially in complex decision environments associated with multiple outcomes. For example, the multiple outcomes table in Table 1 represents the final classifications. Objects 1 and 2 have the same feature values but different outcomes ({X1} = [R, 3, 0]o1,o2,o3 ≠ {X2} = [A, 3, 0]o1,o2,o3), and so do objects 3 and 4 in Table 1. After performing the reduct search procedure, no reduct is generated.

Table 1.

The multiple outcomes.

2.2.2 Rule Induction

The direct use of the results provided by the reduct generation procedure may lead to many reducts that contain features that are not meaningful. Another challenge of rough set based rule induction is the fact that finding all reducts is NP-hard (Ziarko, 1990). Law and Au (2000) present a decision rule induction approach that incorporates rough set theory to model the relations that exist among a set of mixed numeric and non-numeric tourism shopping data. In this approach a decision rule takes the form IF condition(s) THEN decision(s). Without consideration of a threshold (e.g., a strength index), however, it may generate redundant rules. Furthermore, the relationship among the reducts is not explored. In order to identify meaningful reducts and ensure the strength and stability of rules, a comparison between the rules and how they react with data must be carried out. In previous studies, a rule extraction algorithm (Tseng and Huang, 2007) has been developed for discovering preference-based rules according to the reducts which contain the maximum strength index, with a relatively small number of induced rules and reasonable domain interpretation. However, the desired reducts may not be unique, and an extension of the strength index is required for improvement.

In addition, the RS approach has to encode the outcomes of the decision table to distinguish the individual outcome features; for example, the outcome (1, 2) is different from (1, 3), but both are included in (1, *). This requires a coding tool to represent the hierarchy of the outcomes. From these points of view, this study proposes a multiple outcome reduct generation algorithm and a multiple outcome rule extraction algorithm based on rough set theory to generate multiple outcome rules.
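The hierarchical outcome coding mentioned above, where (1, 2) and (1, 3) are distinct but both fall under (1, *), can be illustrated with a small wildcard-matching helper; the function name and the tuple encoding are hypothetical:

```python
def outcome_matches(pattern, outcome):
    """True if an outcome tuple is covered by a pattern; '*' matches any value."""
    return all(p == "*" or p == o for p, o in zip(pattern, outcome))

print(outcome_matches(("1", "*"), ("1", "2")))  # True
print(outcome_matches(("1", "*"), ("1", "3")))  # True
print(outcome_matches(("1", "2"), ("1", "3")))  # False
```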

Based on the literature review, the motivations for developing the multiple outcome reduct generation algorithm and the multiple outcome rule extraction algorithm based on rough sets are as follows:

  1. The tourism industry often possesses numerous qualitative data in flight scheduling databases, retail store employee files, and airline reservation repositories (Law & Au, 2000). The qualitative nature of the data makes them difficult to analyze by standard statistical techniques. One of the solutions to resolve these issues is the rough set approach, which can capture useful information from a set of mixed data and output this information in the form of decision rules (Pawlak, 1984).

  2. Although numerous machine learning approaches have considered multiple outcomes (Richard et al., 2013; Thomas et al., 2008), most rough set approaches unfortunately do not consider the issue of multiple outcomes. For example, if there is a data set with three features and two outcomes, the traditional reduct generation procedure has to be performed twice, once for each outcome. In addition, a decision rule always consists of a single outcome, so it is not easy to understand the commonality of rules across the two outcomes.

Therefore, next, the multiple outcomes problem is defined.

3. Multiple Outcome Problem

In this section, a multiple outcome table is used for the representation of the relationship between condition features and outcome features. The terms feature and attribute are used interchangeably due to their similarity. In Table 2, the element (eij) denotes the value of feature (Fj) (j = 1 to m) that an object (tuple) (Xi) (i = 1 to n) contains, while Oj (j = 1 to p) depicts the different outcome set of the corresponding tuple. In addition, the number of values is incorporated in this table.

Object (Xi)   Condition features (Fj)            Outcome features (Op)
              F1     F2     …     Fm             O1     O2     …     Op
X1            e11    e12    …     e1m            o11    o12    …     o1p
X2            e21    e22    …     e2m            o21    o22    …     o2p
⋮
Xn            en1    en2    …     enm            on1    on2    …     onp
Table 2.

Fundamental structure of a multiple outcome table.

In this study, the RS multiple outcomes problem is defined as:

Given

  • An information set I = (U, A), where U is a finite set of objects and A is a finite set of features. The elements of A are called condition features.

  • A decision table I = (U, A ∪ {d1, d2,…, dn}), where each di in {d1, d2,…, dn}, with di ∉ A, is a distinguished outcome feature.

Objective

Minimize the subset of U, in which the specific outcome feature of decision rules (ri) has the highest Strong Index (SI) and the maximal level number.

Subject to the following constraints:

  1. A = C ∪ D,

  2. B ⊆ C,

  3. POSB(D) = {x ∈ U: [x]B ⊆ D},

  4. For any a ∈ B, K(B, D) = K(C, D) and K(B, D) ≠ K(B−{a}, D),

where, C is the condition domain and D is the outcome domain.

POSB(D) is the positive region and includes all objects in U which can be classified into classes of D, in the knowledge B.

K(B, D) = card(POSB(D)) / card(POSC(D)) is the degree of dependency between B and D.

With every attribute a ∈ A, a set of its values Va is associated.
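One possible encoding of the decision table I = (U, A ∪ {d1, …, dn}) defined above is sketched below; the class and field names are illustrative assumptions, not part of the problem statement:

```python
from dataclasses import dataclass, field

@dataclass
class MultipleOutcomeTable:
    """Decision table I = (U, A ∪ {d1, ..., dn}) with several outcome features."""
    condition_features: list   # the set A of condition features F1..Fm
    outcome_features: list     # the distinguished outcome features d1..dn
    rows: list = field(default_factory=list)

    def add(self, row):
        # Every row must assign a value to each condition and outcome feature.
        expected = set(self.condition_features) | set(self.outcome_features)
        assert set(row) == expected, "row must cover all features"
        self.rows.append(row)

t = MultipleOutcomeTable(["F1", "F2"], ["O1", "O2"])
t.add({"F1": "L", "F2": 1, "O1": "S", "O2": "Y"})
print(len(t.rows))  # 1
```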

4. Solution Approach

The solution approach includes two algorithms: The Multiple Outcome Reduct Generation (MORG) algorithm and Multiple Outcome Rule Extraction (MORE) algorithm.

4.1 The MORG Algorithm

The MORG algorithm generates reducts with multiple outcomes. The steps are described as follows: (1) Input the training data set and initialize each object; (2) Generate the interaction set and the combination sets of outcomes; (3) Check all subsets of the outcome value sets; if the interaction set is a subset of an outcome value set, then generate the combination sets of features; (4) Check all subsets of the outcome value sets; if a feature value set is a subset of an outcome value set, then output the result into the reduct table; (5) Stop the multiple outcome reduct generation procedure for the object and repeat the above steps until all objects have been processed. Figure 1 illustrates the step by step instructions of the MORG algorithm.

Fig. 1.

The flow chart to describe the MORG algorithm.

Notations:
Ta:

The training data set used for training rules.

Treduct:

The ad hoc data set used to store the results.

Out_N:

The total number of outcomes.

Fea_N:

The total number of features.

Fi,j:

The objects whose j-th feature value equals Fi,j (object Xi's value of feature j).

Oi,p:

The objects whose p-th outcome value equals Oi,p (object Xi's value of outcome p).

F_Interaction():

A function generating the interaction (intersection) set of an object's features. E.g., F_Interaction(i) = {[Fi,1]} ∩ {[Fi,2]} ∩ {[Fi,3]} ∩ … ∩ {[Fi,final]} = [Fi] ∩ Ta.

F_Combination():

A combination set generation function. E.g., F_Combination(3, 2) generates all two-element subsets of the set {1, 2, 3}, and the result is {12, 13, 23}; i.e., F_Combination(3, 2) corresponds to C(3, 2).
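The two helper functions can be mimicked with Python's standard library; `f_combination` and `f_interaction` below are illustrative stand-ins for F_Combination() and F_Interaction(), operating on explicit object-index sets:

```python
from itertools import combinations

def f_combination(n, k):
    """All k-element subsets of {1, ..., n}: an illustrative F_Combination(n, k)."""
    return [set(c) for c in combinations(range(1, n + 1), k)]

def f_interaction(object_sets):
    """Intersection of a family of object sets: an illustrative F_Interaction()."""
    result = set(object_sets[0])
    for s in object_sets[1:]:
        result &= set(s)
    return result

print(f_combination(3, 2))                            # [{1, 2}, {1, 3}, {2, 3}]
print(f_interaction([{1, 2, 3}, {2, 3}, {2, 3, 4}]))  # {2, 3}
```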

The Multiple Outcomes Reduct Generation (MORG) Procedure

Input: Ta.

Output: Treduct.

Step 1. Initialization: List all objects in Ta.

Step 2. Generate the reducts for each object.

Step 2.1. for i = 1 to n do

      Loop:

Step 2.2. Set new_Out_Set = ∅

Step 2.3. Set mini_Set = F_Interaction(i)

Step 2.4. for out = Out_N to 1

Step 2.5. Set comb_ Out_Set = F_Combination(Out_N ,out)

Step 2.6. for each p in comb_ Out_Set

Step 2.7. if mini_Set ⊂ {[O,p] ∩ [Oi,p]} then Add p into new_Out_Set

        endfor

Step 2.8. if new_Out_Set = ∅ then Go to Step2.4

Step 2.9. for fea = 1 to Fea_N − 1

Step 2.10. Set comb_ Fea_Set = F_Combination(Fea_N ,fea)

Step 2.11. for each p in new_Out_Set

Step 2.12. for each j in comb_Fea_Set

Step 2.13. if {[F,j] ∩ [Fi,j]} ⊂ {[O,p] ∩ [Oi,p]}

Step 2.14. then the reducts for Xi is formed

Step 2.15. else the reducts for Xi is not formed

  endfor

  endfor

if any reducts for Xi is formed

then break Loop

else the reducts for Xi is not formed

  endfor

  endfor

  endfor

Step 3. Termination: Stop and output the results into Treduct.
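One plausible reading of the MORG procedure is sketched below. It is a simplified interpretation, not the authors' implementation: equivalence classes are computed by direct value matching, and the helper names are this sketch's own.

```python
from itertools import combinations

def morg(rows, features, outcomes):
    """Simplified MORG sketch: for each object, find minimal feature subsets
    whose matching objects all agree with the object on some maximal subset
    of outcome features (largest outcome subsets tried first, as in Step 2.4)."""
    def matches(i, attrs):
        x = rows[i]
        return {j for j, y in enumerate(rows) if all(y[a] == x[a] for a in attrs)}

    reducts = []
    for i in range(len(rows)):
        for out_size in range(len(outcomes), 0, -1):          # Step 2.4
            for O in combinations(outcomes, out_size):        # Steps 2.5-2.7
                same_out = matches(i, O)
                for fea_size in range(1, len(features)):      # Steps 2.9-2.13
                    found = [F for F in combinations(features, fea_size)
                             if matches(i, F) <= same_out]
                    if found:
                        reducts += [(i, F, O) for F in found]
                        break  # keep only minimal feature subsets for this O
            if any(r[0] == i for r in reducts):
                break  # reducts formed for X_i (Step 2.14); next object
    return reducts

rows = [
    {"F1": "A", "F2": 1, "O1": "S", "O2": "Y"},
    {"F1": "A", "F2": 2, "O1": "S", "O2": "N"},
    {"F1": "B", "F2": 1, "O1": "M", "O2": "Y"},
]
for r in morg(rows, ["F1", "F2"], ["O1", "O2"]):
    print(r)
```

On this toy table, object X2 obtains a reduct {F2} covering both outcomes at once, while X1 only obtains single-outcome reducts, which mirrors the fall-back from Out_N down to 1 in Step 2.4.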

4.2 The MORE Algorithm

The MORE algorithm takes as input: (1) the training data set, (2) the test data set, (3) the reduct table, and (4) the rule table, with initialization at the beginning. For each reduct, four indices (support and accuracy on the training data set, and support and accuracy on the test data set) are calculated into the information table. If any index value is lower than the threshold value determined by the expert, the reduct is removed from the information table. The decision maker then sorts the ranks of the four indices and the ranking table is obtained. Figure 2 presents the step by step procedure of the MORE algorithm. In order to determine which rules are the strongest, a comparison between the rules and how they react with data must be made (Yao & Herbert, 2009). Individual rules are ranked according to their strength (support and accuracy) and stability, which refers to the changes in support and accuracy from the training data set to the test data set. An overall rank can be determined based on the support and accuracy rankings.

Fig. 2.

The flow chart of the MORE algorithm.

Notations:
Tv:

The test data set used to validate rules.

Tinfo:

Information table (there are six columns in this table: support and accuracy of the training data set, support and accuracy of the test data set, change in support (%), and change in accuracy (%)).

TRank:

Ranking table (there are five columns in this table: total support, average accuracy, change in support, change in accuracy, and final rank for each rule).

F_CompareF():

A function used to compare the feature values of a particular rule with the feature values in Ta and Tv.

F_CompareO():

A function used to compare the outcome values of a particular rule with the outcome values in Ta and Tv.

F_Threshold():

A function used to extract the desired rules.

Sup_N:

The total number of objects in the data set that match the condition part of a specific rule.

Acc_N:

The total number of objects that match both the condition and outcome parts of a specific rule.

The Multiple Outcome Rule Extract (MORE) Procedure

Input: Ta,Tv, Treduct and Trule.

Output: Tinfo and TRank.

Step 1. Initialization: List all reducts from Treduct and combine duplicate rules into Trule.

Step 2. Calculate four indices value into Tinfo for each reduct.

Step 2.1. for each row i in Trule

Step 2.2. Set Sup_N = 0 and Acc_N = 0

Step 2.3. for each row j in (Ta) & (Tv)

Step 2.4. if F_CompareF(i, j) = 1

Step 2.5. Set Sup_N += number of objects of row j

Step 2.6. if F_CompareO(i, j) = 1

Step 2.7. Set Acc_N += number of objects of row j

end for

Step 2.8. Set Support = Sup_N / total no. of objects and Accuracy = Acc_N / Sup_N

Step 2.9. Assign the values of support and accuracy into Tinfo

end for

Step 2.10. Calculate the change in support (%) and change in accuracy (%) for each rule.

Step 3. Extract the rule by F_Threshold().

Step 4. Sort the ranks of the four indices and finalize the ranking from Tinfo into TRank.

Step 5. Output Tinfo and TRank.
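A minimal sketch of the support/accuracy computation in Step 2 follows. So that accuracy stays within [0, 1], the sketch divides the count of objects matching both parts of a rule by the count matching its condition part. The rule encoding (dicts with '*' wildcards) is an assumption of this sketch:

```python
def rule_stats(rule_fea, rule_out, rows):
    """Support and accuracy of one rule over a data set.  Support counts the
    objects matching the rule's condition part; accuracy is the fraction of
    those that also match its outcome part ('*' means don't care)."""
    sup = [r for r in rows
           if all(v == "*" or r[f] == v for f, v in rule_fea.items())]
    acc = [r for r in sup
           if all(v == "*" or r[o] == v for o, v in rule_out.items())]
    support = len(sup) / len(rows) if rows else 0.0
    accuracy = len(acc) / len(sup) if sup else 0.0
    return support, accuracy

rows = [
    {"F6": "Y", "O3": "C", "O4": "Y"},
    {"F6": "Y", "O3": "C", "O4": "N"},
    {"F6": "N", "O3": "T", "O4": "Y"},
]
print(rule_stats({"F6": "Y"}, {"O3": "C", "O4": "Y"}, rows))  # ≈ (0.667, 0.5)
```

Running this over the training and test data sets separately yields the four indices stored in Tinfo.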

5. Case study

The objective of this case study is to extract rules that reveal tourists' characteristics. The experimental results focus on the use of rules to classify the different features of the tourists and find out what they really care about, to help tourism service providers satisfy tourists with the desired travel package.

Middelkoop (2004) argues that tourists do not necessarily maximize their utility in selecting a travel mode; rather, their choice behavior is context dependent. Given particular conditions related to their family, the environment, and other aspects of the tourist experience, they are assumed to use different heuristics. Taking a vacation involves a decision-making process which may be simple or complex. Therefore, to study the characteristics of tourists and to understand the relationship between tourism and decisions, it is necessary to induce decision rules from historical data sets. In short, different characteristics of a tourist might influence decision-making: household income, number of trips yearly, presence of children, tourist's age, vegetarianism, pet ownership, etc., all have a certain influence on the decision-making process (Harcar, Spillan & Kucukemiroglu, 2005; Zalatan, 1998). Moreover, Zalatan (1998) segments tourism decision-making tasks into 17 categories (see Table 3). In general, arranging transportation, selecting restaurants and accommodation, purchasing insurance, choosing the destination, purchasing tickets, etc., are included in a tourist package deal. Thus six characteristics are used as features and six tasks are used as outcomes in this study. Tables 4 and 5 present the description and domain range of each input feature and of the final outcomes, respectively. Two data sets are used for validating the solution approach: a training data set (20 objects) and a testing data set (10 objects).

1. Initial Trip Tasks

  I. Determining Vacation Date
  II. Selecting Destination
  III. Collecting Information

2. Financing Tasks

  I. Arranging Financing
  II. Tourists Cheques
  III. Purchasing Travel Package
  IV. Purchasing Tickets

3. Pre-departure Tasks

  I. Transportation Arrangements
  II. Accommodation Arrangements
  III. Preparing Luggage
  IV. Other Tasks (e.g., medical, insurance)

4. Destination Tasks

  I. Selecting Tourist Sites
  II. Length of Time at each Site
  III. Selecting Restaurants
  IV. Selecting Accommodation
  V. Shopping Expenditures
  VI. Other Decisions (e.g., special events)

Table 3.

The tourism decision-making tasks list.

No.   F1 Household income (a)   F2 No. of trips (b)   F3 Number of kids   F4 Age        F5 Vegetarian   F6 Pet ownership
1     < 100 (L)                 < 1 (N)               < 1 (N)             < 25 (YO)     Yes (Y)         Yes (Y)
2     100∼200 (M)               1∼2 (L)               1∼2 (L)             25∼50 (MA)    No (N)          No (N)
3     > 200 (H)                 3∼4 (M)               3∼4 (M)             > 50 (OA)
4                               > 4 (H)               > 4 (H)

Note:

a:

F1 unit is thousand US dollars.

b:

F2 unit is trips per year.

Table 4.

Categorization of the features.

No.   O1 Selecting Destination   O2 Purchasing Tickets   O3 Arrangements Transportation   O4 Arrangements Accommodation   O5 Insurance   O6 Selecting Restaurants
1     Short (S)                  Yes (Y)                 KTX (K)                          Yes (Y)                         Yes (Y)        Yes (Y)
2     Medium (M)                 No (N)                  Bus (B)                          No (N)                          No (N)         No (N)
3     Long (L)                                           Train (T)
4                                                        Car (C)
5                                                        Other (O)
Table 5.

Categorization of the outcomes.

After performing the reduct generation procedure using the training data set, the list of resulting reducts in Table 6 is obtained. A heuristic procedure is applied to determine the reducts.

Object No. Reduct No. Rule No. F1 F2 F3 F4 F5 F6 O1 O2 O3 O4 O5 O6
X1 & X2 & X3 1 1 * * * * * Y * * C Y * Y

X4 & X5 1 2 M * M * * * S N C Y * Y
2 3 * M * * * Y S N C Y * Y
3 4 * * M MA * * S N C Y * Y
4 5 * * M * * Y S N C Y * Y
X6 1 6 L H * * * * M Y C Y Y Y
2 7 L * H * * * M Y C Y Y Y
3 8 L * * * * Y M Y C Y Y Y
4 9 * H H * * * M Y C Y Y Y
5 10 * H * * * Y M Y C Y Y Y
6 11 * * H YO * * M Y C Y Y Y
7 12 * * H * * Y M Y C Y Y Y
8 13 * * * YO * Y M Y C Y Y Y

X7 & X8 1 14 M * H * * * S * T Y Y *
2 15 * L H * * * S * T Y Y *
3 16 * L * * * N S * T Y Y *
4 17 * * H MA * * S * T Y Y *
5 18 * * H * * N S * T Y Y *

X9 & X16 1 19 L N * * * * * N * * Y *
2 20 L * L * * * * N * * Y *
3 21 L * * * * N * N * * Y *
4 22 * N L * * * * N * * Y *
5 23 * * L YO * * * N * * Y *
X10 & X11 & X14 1 24 * * N * * * L * O * * *

X12 1 25 H * L * * * M Y B N N Y
2 26 * H L * * * M Y B N N Y
3 27 * H * OA * * M Y B N N Y
4 28 * * L OA * * M Y B N N Y
5 29 * * * OA N * M Y B N N Y

X13 1 30 H N * * * * S N T N N N
2 31 * N M * * * S N T N N N
3 32 * * M YO * * S N T N N N

X15 1 33 H * N * * * L N O N Y Y
2 34 * H N * * * L N O N Y Y
3 35 * * N YO * * L N O N Y Y

X17 & X18 & X19 & X20 1 36 * * * * Y * * * * Y * Y
Table 6.

The list of resulting reducts (Treduct).

Next the MORE algorithm is applied. The information in Table 7 is computed and the ranking of the rules in Table 8 is obtained. Table 7 clearly shows the strength or weakness of each rule through its relative strength and stability measures. Therefore, through the threshold function, weak and unstable rules are not extracted and incorporated into the ranking table. Note that the unselected rules are marked in gray in Table 7. Moreover, Table 8 shows the ranking of the rules in detail. Basically, a heuristic procedure is applied to extract the rules. For example, some rules show a significant discrepancy (e.g., a 20% difference) in the change of support and accuracy from the training data to the test data. These rules, for example, #2, #3, #6, etc. (in gray), are concluded to be unstable and are excluded from the decision rules, since a margin of error of 20% is considered high from a professional perspective in the field. Other rules, for example #1, #9, #12, etc. in Table 7, show a low percentage of change in support and no change in accuracy. Basically, they are highly stable and are included in the final decision rules. Furthermore, a threshold function, F_Threshold(), to extract the rules is defined as follows: if a candidate rule's accuracy is higher than 0.7, then the rule is extracted as a final decision rule.

Table 7.

The information table (Tinfo).

Rule no. Total support Total accuracy Change in support Change in accuracy Final rank
1 1 1 7 1 4
4 6 10 4 10 10
5 6 10 4 10 10
9 11 1 8 1 6
12 11 1 8 1 6
14 4 1 1 1 1
15 4 1 1 1 1
21 9 1 6 1 5
24 3 1 3 1 3
25 8 13 11 13 13
32 10 1 10 1 8
33 11 1 12 1 9
36 2 12 13 12 12
Weight 1 0.8 0.9 0.7
Table 8.

The ranking of the rules (TRank).
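The threshold-and-stability filtering performed by F_Threshold() can be sketched as follows, assuming a 0.7 accuracy floor and a 20% stability margin as in the case study; the record layout (`no`, `accuracy`, `d_support`, `d_accuracy`) is this sketch's own:

```python
def f_threshold(rules, acc_min=0.7, max_change=20.0):
    """Keep rules whose accuracy exceeds acc_min and whose change in support
    and accuracy (%) between training and test data stays within max_change."""
    kept = []
    for r in rules:  # r: dict with accuracy and percentage changes
        stable = (abs(r["d_support"]) <= max_change
                  and abs(r["d_accuracy"]) <= max_change)
        if r["accuracy"] > acc_min and stable:
            kept.append(r["no"])
    return kept

rules = [
    {"no": 14, "accuracy": 1.0, "d_support": 5.0,  "d_accuracy": 0.0},
    {"no": 2,  "accuracy": 0.9, "d_support": 35.0, "d_accuracy": 0.0},   # unstable
    {"no": 7,  "accuracy": 0.6, "d_support": 2.0,  "d_accuracy": 1.0},   # weak
]
print(f_threshold(rules))  # [14]
```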

According to the final rankings, Table 9 shows the first five rules with their rankings, presenting the relationships between tourists' characteristics and tourism decision-making tasks. A rule is interpreted as: if a tourist conforms to the listed features, then he/she is inferred to be more concerned about the listed travel arrangement tasks. For example, the fourteenth and the fifteenth rules are the most accurate and stable. The fourteenth rule shows that if a tourist has a household income of $100∼200K and has more than four kids, then he/she might care about tourism tasks such as taking a short destination trip by train and purchasing pre-arranged accommodation and insurance.

Rule No. Rank Contents of rule
14 #1 IF (Household income = 100∼200) AND (No. of kids = > 4)
THEN (Selecting Destination = Short) AND (Arrangements Transportation = Train) AND (Arrangements Accommodation = Yes) AND (Insurance = Yes)

15 #1 IF (No. of trips = 1∼2) AND (No. of kids = > 4)
THEN (Selecting Destination = Short) AND (Arrangements Transportation = Train) AND (Arrangements Accommodation = Yes) AND (Insurance = Yes)

24 #3 IF (Number of kids = < 1)
THEN (Selecting Destination = Long) AND (Arrangements Transportation = Other)

1 #4 IF (Pet ownership = Yes)
THEN (Arrangements Transportation = Car) AND (Arrangements Accommodation = Yes) AND (Selecting Restaurants = Yes)

21 #5 IF (Household income = < 100) AND (Pet ownership = No)
THEN (Purchasing Tickets = No) AND (Insurance = Yes)
Table 9. Induced decision rules.
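A rule from Table 9 can be read as a pair of condition and outcome sets. The following is a minimal sketch of applying rule #14 to a tourist profile; the dictionary field names are illustrative, not taken from the paper.

```python
# A multiple-outcome decision rule as a condition/outcome pair.
# Values mirror rule #14 from Table 9; field names are hypothetical.
RULE_14 = {
    "if": {"household_income": "100~200", "no_of_kids": ">4"},
    "then": {"selecting_destination": "Short",
             "arrangements_transportation": "Train",
             "arrangements_accommodation": "Yes",
             "insurance": "Yes"},
}

def apply_rule(rule, profile):
    """Return the predicted outcomes if every condition matches, else None."""
    if all(profile.get(feature) == value
           for feature, value in rule["if"].items()):
        return rule["then"]
    return None

tourist = {"household_income": "100~200", "no_of_kids": ">4",
           "pet_ownership": "Yes"}
outcomes = apply_rule(RULE_14, tourist)
```

Because a rule carries its full outcome set, one match predicts all of the tourist's decision-making concerns at once rather than a single decision feature.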

Table 10 depicts a comparison of coverage between the MORE algorithm and the traditional rough set reduct procedure in this case study.

Algorithm                                   | Number of reducts | Number of objects that can generate a reduct | Total number of objects | Coverage
The MORE                                    | 36                | 20                                           | 121                     | 100%
The reduct generation procedure through RST | 19                | 4                                            | 16                      | 13%
Table 10. Comparison of coverage between the MORE algorithm and the traditional rough set reduct procedure in this case study.

Algorithm                                  | 10^3 objects (4 / 10 outcomes) | 10^6 objects (4 / 10 outcomes) | 10^9 objects (4 / 10 outcomes)
The MORE                                   | 83.3% / 81.5%                  | 78.5% / 76.0%                  | 72.3% / 71.5%
The traditional reduct generation procedure | 15% / 14.7%                   | 14.3% / 13.1%                  | 13.8% / 13.7%
Table 11. Comparison of coverage between the MORE algorithm and the traditional rough set reduct procedure on the random data set.

Using the traditional RS approach, only four objects can generate reducts. Although this procedure generates nineteen reducts, each with 100% accuracy, the sixteen objects they cover represent only 13% (16 divided by 121) of the training data. That means these reducts capture only a few special cases in the training data set.
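The coverage figures above reduce to a simple ratio. A sketch, under the assumption that coverage is the share of training objects covered by at least one reduct:

```python
def coverage(covered_objects, total_objects):
    """Coverage = objects covered by at least one reduct / total objects."""
    return covered_objects / total_objects

# Figures from Table 10: the traditional RS procedure covers 16 of the
# 121 training objects, while the MORE algorithm covers all of them.
traditional = coverage(16, 121)   # roughly 0.13, i.e., 13%
more = coverage(121, 121)         # 1.0, i.e., 100%
```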

In addition, to validate the practicability of the proposed approach, an experiment with 23 conditional features on different data sets was conducted. The benefit of the proposed approach is that it extracts more rules, so its coverage is higher than that of the traditional RS approach, which cannot distinguish individual outcome features. For example, the outcome (1, 2) is different from (1, 3), but both cannot be represented by (1, *) with one unique entry for the outcome feature. The comparison of coverage between the MORE algorithm and the traditional rough set reduct procedure is presented in Table 11.
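The point about outcome tuples can be illustrated by partitioning objects on the full outcome tuple rather than collapsing it into a wildcard entry. The object identifiers below are made up for illustration.

```python
from collections import defaultdict

def decision_classes(objects):
    """Partition objects by their full outcome tuple, so (1, 2) and (1, 3)
    land in different decision classes instead of merging into (1, *)."""
    classes = defaultdict(list)
    for obj_id, outcome in objects:
        classes[outcome].append(obj_id)
    return dict(classes)

# Hypothetical objects with two outcome features each.
objs = [("x1", (1, 2)), ("x2", (1, 3)), ("x3", (1, 2))]
parts = decision_classes(objs)
# parts holds two classes: (1, 2) -> [x1, x3] and (1, 3) -> [x2]
```

A single-outcome procedure forced to encode both decisions in one entry would put x1, x2, and x3 into the same class and lose the distinction.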

6. Conclusion

In tourism, most existing studies lack a method to analyze the qualitative nature of the data. Although the traditional rough set approach is an excellent classification method in qualitative modeling, it does not consider multiple-outcome situations. This study presents a novel solution approach, namely the MORG and MORE algorithms, for reduct generation and rule extraction. Moreover, a tourism case study with multiple outcomes is used as an example, and the results are compared with a traditional rough set and rule extraction algorithm. The contributions of this paper are summarized as follows: (1) the MORG algorithm overcomes the traditional rough set's drawback of being limited to data sets with a single outcome; (2) the MORE algorithm has stronger rule extraction capability: it extracts decision rules based on both strength and stability, and determines by rank which rules are the strongest.

To date, there is very little work on multiple-outcome rule induction; future research could focus on the following: (1) using a sensitivity methodology to assess the effects of features on tourism decision-making tasks; (2) having domain experts define the threshold values and weights used to extract and rank rules; (3) pre-processing the data set, since unknown or incorrect feature and outcome values often occur in real-world applications; (4) identifying 'infeasible' rules when reducing the rule set, since even the initial rules generated may miss some lower-level performance results that cause infeasibility to occur (Bai & Sarkis, 2010); (5) measuring the performance of the proposed approach with precision, recall, and the Receiver Operating Characteristic (ROC) curve; (6) validating the model with more data sets and larger sample sizes.

Acknowledgments

This work was partially supported by the Ministry of Science and Technology in Taiwan (NSC 102-2410-H-260-045-MY3, MOST 105-2410-H-260 -019 -MY2) and the National Science Foundation (DUE-TUES-1246050). The authors wish to express sincere gratitude for their financial support.

