Attribute reduction in decision systems is the process of choosing the minimum set of the conditional attribute set, preserving classified information df the decision systems

(1)

Phimg Thi Thn Hien va Dtg Tjp chi KHOA HQC & CONG NGHE 173(13): 51-57

FINDING KNOWLEDGE ACCORDING TO ROUGH SET THEORY

Phung Thi Thu Hien', Ninh Van Tho University of Economic and Technical Industries ABSTRACT ,

Attribute reduction is a core issue of rough set theory and also an essential pre-processing step in dita mining. In recent years, there have been many papers about attribute reduction methods based on different views, and generally can be classified as attribute reduction method based on positive region, attribute reduction method based on disceraibility matrix, attribute reduction method used information entropy. However, most of attribute reduction methods are performed on single- valued decision system decision table. In this paper, we propose methods for attribute reduction in set-valued decision systems. Next, based on some results in the relational database, this article proposes an algorithm building a relationship scheme from the decision table.

Keywords: Relational database, rough set, relational scheme, decision tgblem, keys INTRODUCTION

The theory of conventional rough set initiated by Pawlak [4] is an effective tool to solve attribute reduction problems and to extract rules in information systems. Attribute reduction in decision systems is the process of choosing the minimum set of the conditional attribute set, preserving classified information df the decision systems. In decision systertis, computer scientists have provided several attribute reduction methods based on model of conventional rough set, summarized ^by Shifei D et. al. in ref. [10]. In set-valued information system, Guan Y. Y. Wang et. al.

[6] expanded equivalent relation in conventional rough set to tolerance relation and developed model tolerance-based rough set by expanding lower approximation, upper approximation, positive domain, etc. based on tolerance relation. There are remarkable reports about attribute reduction in decision system and ordered decision system in model of tolerance-based rough set approach in ref. [2], [9], [13]. In ref. [15], the authors using matrix method studied the altering of approximation sets with and without attribute set.

In this paper, section 2 describes the results of set-valued decision system and definitions of reduct and basic concepts in relational databases. In section 3, the author demonstrate attribute reduction method. In

" Tel 0914 770070. Email. Thuhiencnl@gmail.com

section 4, the author provides some algorithms in relation database. In section 5, 'the author discuss about the overall results

and future study.

BASIC DEFINITIONS Basic definitions in rough set

A decision table is defined as DX = {U,C'u{d})'m which, (7 = {«|,U2,...,w„} is the finite & non-empty set of objects C = {ci,CT,...,c,„}the set of condition attributes, D is the set of decision attributes and

Cr\D = 0,V^ n K„ where Fo is the value range of attribute a, f:Ux{C^D)-^Vis an information function, where Vcr e C u Z), w e [/,

f{u,a)^V^ hold.

Set-valued decision systems were proposed as a tool to characterize the data sets with incomplete or uncertain information [9].

Formally set-values decision table is a tuple DT = (u,A^{d}), where U is a finite set of objects, A is a finite set of set-valued attributes, i,e the functions of form a:U —» 2'^^

for a e A, and d g A is a distinguished attribute called decision. The set Va is called the domein of attribute a, and a(x) £ v^ for each a e A and x G U. In the case, when |a(x)|

= 1 for any a G A and x 6 U we have a standard single valued decision table, in Table 1 [9] we have an example of a set- valued decision system.

(2)

Phiing Thi Thu Hien vi Big Tap chf KHOA HQC & CONG NGHE 173(13): 51-57

Table 1. An example of a set-valued decision table V

n

i ) 11 K ir, K,

a

IS

n,

IM AiiditKJiilA]

|£l i £ , f , c | ISO) l E f l (KOI

n

|I,F,C|

lE.'l ( f , q lE.f)

ipukuL Luii;ikigc(S)

|E| ItF.C) l E f ) lE.C) [f.OI W |i-,f.c)

|r,G|

|0) (E.Cl

KiuidHiglR)

{f.G}

{f.oi

|F,C]

IF.O)

|f,c) (EF) (EC) ( E F C) (F.C) IF.Cl

v\Tit»e(Wi

|F,t) CEFC) (F.C) (F|

IF) (EFl (EEC) (EC) (CC) (EF)

dlL

\ 0

Vo Su

\n

\ r i

^E.

V«

\ a

YK Yts

Let DT = iu,A\^[d])he a set-valued decision table. Any reflexive and symmetric relation T

£ U X U is called a tolerance relation defined on U. A tolerance relation TB related to a set of attributes B Q A can be defined by:

TB (X, y) <=> VbGB la(x) H a(y)| ^ 0 (1) For any Ba,A we denote by

[x]Ts={y^U:(x,y)eT^}the tolerance class related to object x e U. We also denote by the family U/Tg={[x]j- :xeU} of all tolerance classes of TB .

Basic coocepts in relatioQal databases n],[4], [12]. Let R = {ai,...,a„}he a nonempty finite set of atributes, each attribute has a domain value of D(a,). A relation r on S as a set of tuples {A,,.,.,A„,},

hj-.R-i' U D(a,), \<.j&m is a fimction such that hj{a,)eD{a,).

Let /• = {ft],...,ft„,}be a relation over ff={ai,...,a„}. A functional dependency (FD for short) over Ti is a statement of form

A->B,whereA,BcR./'Z) A->B holds in a relation r over R if

({^a^A){h,[a) = h,{a)) = [{VbsB){h,{b) = hj{b))

Let F, = [{A, B):A,BCR,A~)-B], F, is called the full family of functional dependencies in r. Let ^ be a finite set and denote P(R) its power set, we say that F is an /-family over R iif for all A,B,C,D^R:

(1) (A.A)eF

(2) (A, B) e F, (B, C) e F =>(A.C) EF (3) (A, B) €F, AaC, DaB => (C, D) e F 52

(VA„A,€

(4) (A, B) eF, (C. D) eF=>(AU:, BuD) EF Clearly, F, is an/family over Jt It is known [1] that if F is an arbitrary /family over R, then there is a relation r such that F^=F.

F* is the set of al! FDs which can be derived from F by the rules (1) - (4).

A relation schema j is a pair <R,F>, where i? is a set of attributes and F is a set of FDs on R. Denote J* = la eR\ A-»• {a] sF^\, A* \s called the closure of A on 5.

It is clear that A ^ BeF* iifBQA".

According to [1], ifs = <R, F> is a relational schemes r over R, such a relation is called an Armstrong relation of .y.

Let /- be a relation, s=<R,F> be a relation scheme and AQ,R. Then .^ is a key of r (a key of s) if A^RiA^R&F*\. ^ is a minimal key of?- {s) if A is a key of r{s) and any proper subset of A is not a key of r (s).

Denote A'^(/irj) the set of all minimal keys of r {s). K^P[R) is a Spemer system if for any K^.K^eK implies Ki<tK2. Clearly, K,{K,) are Spemer systems.

Let .ff be a Spemer-system over R as the set of all minimal keys of 5 . We defined the set of antikeys ofK, denoted by K'^, as follows:

K-^ ={Atz R:{Be K)=>{B(Z A)} and if {A^C}^{3BeK){B^C).

It is easy to see that fC' is also a Spemer system over R. By definition, if K is the minimum set of keys of a FD then IC' is the set of all set not the biggest key.

Let r be a relation over R. Denote

£, ={£,j:l£/<7<|^|j, where E,j={a^R:h,{a) = hj{a)].ThGn £, is called

the equality set of r. It is known [2] that for -^r e ^ , A^ = r^Sy if there existe F,j e £^ : ^ c fiy, Otherwise A^ =:R. In next content, we introduce some definitions about the family of all minimal sets of an attribute over a relation and a relation scheme.

(3)

Phiing Thi Thu Hiln vd Dtg Tap chi KHOA HQC & CONG NGHE 173(13): 51-57 Definition 2.[4] Let S = {R,F) be a relation

scheme over Raxyd aeR Set

K = [AcR:A^{a},^B:{B~>{a}){BczA)].

Kl is called the family of minimal sets of the attribute a over s.

Similarly, we define the family of minimal sets of an attribute over a relation Definition 3. Let r be a relation over R and a^R.

Set

K=[A^R:A^ {a],^B ^R:[B^ {a}){B c A)]

K^ is called the family of minimal sets of the attribute a over r. It is clear that R€Kl,Rf£K^a}eKl,{a}eK;, and Kl,K^

are Spemer systems over R.

ATTRIBUTE REDUCTION IN SET- VALUED DECISION SYSTEM Attribute reduction in decision systems is the process of choosing the minimum set of the conditional attribute set, preserving classified information of the decision syste Definition 4. (Decision relative reduct) Given a set-valued decision table DT = {U,A\u{d]'\ the decision relative reduct of DT is the minimal set of attribute R £ A, which satisfying the following conditions:

1. for any pair (x, y) E U, if d(x) ^ d(y) and (x, y) S TA then (x, y) a TR;

2. no proper subset R'of R satisfies the previous condition.

The reduct R is optimal if H consists of the smallest number of attributes.

Discernibility Function

Definition 5. (Basic discernibility measure) [11]

Let DT = [u,A\.j{d}) be a single-valued decision table. The discernibility measure for a set of attributes B ^ A is defined by:

disc(B)=\{{x,y)eUxUmx)'^d{y))A3,,,ib(x)tb(y))}\

Definition 6. (Generalized discernibility function). Let DT = {u,A\.j{d)) be a set-

valued decision table with tolerance relations Ta (for all a G A). The mapping discern : 2^:

R" u {0}, defined by

where B E A is set of attributes, is called the generalized discemibility funcfion.

Below we list some properties of the generalized function:

Property 1. For any attribute a 6 A, the value discem(a) is equal to frequency of occurrence of attribute a in the discemibility matrix MDT- Property 2. Discemibility function is increasing. For any set B £ A and C £ A, if B e C then discem(B) < discem(C ).

Contingency Table and Tolerance-Based Contingency Table

Contingency Table.

Let Vd be the set of decision values in decision table DT = (y,A^{d}), and let

partition of U defined by indiscemibility relation IND(B) for B Q A . Contingency table CTg related to B is a two dimensional table C7;=[CT,[,-,y]]i'',';i»

where: CTB[i,j]=\{xeU:xe[x,]s ^d(x)=j}\

The local discemibili^ measure related to indiscemibility class [x,] is defined as follows:

5{[x.l) = \{ixuX2)^hl-{u\[x.l:d(x,)^d(x,)}\

S CT[i.J^].CT[k,J,]

= Z CT[i,j,\(\D^^\]-CT[i,J,]

where || D^ jdenotes cardinality of decision class Dj for j = \,...\Vj\

Hence the basic disceraibili^ measure of attribute set B is defined as the number of pairs of discernible objects, i.e.

disd,B)=2_Jl

4 2 X!^'-'''"'^''-'^'^ (2)

(4)

Phiing Thi Thu Hifin vd Dig Tap chi KHOA HQC & CONG NGHE 173(13): 51-57 Table 2. The contingency tables for single

attributes and values of the discern function of spoken language attribute Values

E F G E,F

Spoken No

language 1 0 0 1

Yes 0 1 1 0 E,G 1 1 F.G 1 1 E,F,G 1 1

discern 1 (S) - 22

The summation is taken over the disjoint subsets induced by IND(B) and over all j„J2s{i,...|y,|),j,*j,.

Table 2 presents the contingency table and the values of the discemibility function for each attribute from Table 1. We remind that the cardinality of each decision class is equal to 5. The contingency table with the indiscemibility relation is further called the basic contingency table.

Proposition 1. Let DT = {V,A^{d]) be a decision table. Let IND(B) be a indiscemibility relation related to B c A. Let UB denotes a number of indiscemibility classes defined by INB(B). Given a contingency table CTB. The value discem(B) can be determined in time 0(dnB ), which is bounded by 0(dn), where n = |U | and d is a number of decision classes.

Tolerance-Based Contingency Table. For a decision table DT-'[u,A<j{d]), let TB be a tolerance relation for B c A and let t;/WO(«) = {h],,[,,]^ [,„]^) be the partition of U defined by indiscemibility relation IND(B). The tolerance based contingency table is a two-dimensional table TCTB=[TCT[i,j]^^^^''^^f which is defined as follows:

TCTs[i,j] = \{uBU\u^{u,]^ vdd{u) = j ' \ Intuitively, tolerance-based contingency table stores the decision distributions inside each tolerance class. One can observe that the

tolerance classes are not disjoint in general.

To compute the value of disceraibility function we modify the concept of a local discemibility measure.

For a tolerance class [x,].j. , the local discemibility measure related to [\'\^ is

defined by:

5fl([x,]7i,)=K(î.^2)eKlr,x(t/\[x,]r,):Ûi)*''(ii) 2 CTSVLMVCTÎKJ^]

^C7-«[/,y,](|D,J~7-C7-,[/,j,l) The generalized discemibility measure can be calculated as follows:

(3) where B e A . We denote by CTB ® TCTB the operation in Equation 3. The summation is taken over a disjoint subsets induced by IND(B) and over all Ji,J2 e{l,...|V^|},j| ^ j ^ . Algorithm attribute reduction in set-valued decision tables

Algorithm 1. Generalized Maximal Discemibility heuristic for setvalued decision tables with tolerance relation.

1: Input: Set-valued decision table D = (U, A Ud).

2: Output: Attribute reduction R, 3: Generate a set of lattices Latt(A);

4:R-<-0;

5: discern(R) • - 0;

6: while (discem(R) < discern(A)) do 7: max discern <— 0;

8: for (ai G A) do 9: B < - R U { a i } ; 10: Create C T B ; 11: Create TCTB using CTB;

12: Determine discem(B) = CTB ® TCTB using Equation (3);

13: if (discem(B) > max dicern) then 14: max discern <— discem(B);

54

(5)

Phung Thi Thu Hi6n vd Dtg Tap chi KHOA HQC & CONG NGHE 173(13): 51-57 15: best attribute •«— ai;

16: end if 17: end for

18: ÂA\{bestatttibute};

19: R * - R U {bestattribute};

20: end while

The time complexity of Algorithm 3.3 is Oijc^n^), where k is a number of attributes, n is the number of objects.

BASIC ALGORITHMS IN RELATION DATABASE

Finding a minimal key is one of the most important problems in the field of knowledge discovery and data mining.

Algorithm 2. [3] Finding a minimal key from the set of antikeys.

Input: Let Â be a Spemer-system over R as the set of antikeys, C = [}}^,...,b„,}^R and//is a Spemer-system as the set of minimal keys

[K = //"') such that 3 5 e A: : 5 c C Output: O e / /

5/ep7: WesetT(0) = C;

Step i+I; We set

T{i + \) = T{l)-b,^.^ if VBeK. there is not T^B

T(i +1) = T[i) otherwise Finally, we set D = T(m);

Algorithm 3. [3] Finding the set of minimal keys from the set antikeys.

Input: Let K = {Bi,...,Bf^] be a Spemer- system over R.

Output: /fwhere//"' -A"

We construct Hby inducfion.

Step I: We construct an Ai,(AjeH) using Algorithm 2 We set AT, = /l,.

Step i+I: If there is a fieẤsuch that ff^S,(y/:l^y<m),then by algorithm which finds a minimal key (Algorithm 2) we determine an 4ti > where 4^1 s H, 4+] c B.

After that, let /^,+, = K^ u 4^,. In the converse case we set H = K,.

From definition 3, the article builds the algorithm for finding the minimal set of attributes over relation.

Algorithm 4. Algorithm finds the minimal set of attributes over relation

Input: r = {uj,u2,...,u,„] is the relation over fl and a ER.

Output: K^.

Step I: From r we calculate the equality system Er =^ {£y : 1 < (• < j" < rfi} , where E,={aER:u,ia) = Uj{a)].

Step 2: From Er we construct the set M^=^AeÊ:a€ ÂB e Ê:asB,A<=.B].

Step 3: Compute K from the set M^

("A:"' = M„)(By Algorithm 3.)

In the worst case, the complexity of the algorithm is not greater than the exponent n in which n is the number of elements of R.

Algorithms to construct relation scheme from decision tables

TJie problem; Given a decision table DS = ^U,C^{dyjas a relation r over an attribute R = Cu{d], we have to construct the relation scheme sj =< fl, F >, where F is the, set of fimctional dependencies^,->{rf} for A,QC,\<i<t, such that Kl = K^=-RED{C)^{d}, whereA:; is the set of all minimal keys of S^ , K^ is the family of all minimal sets of the attribute d over the relation r and RED(C) is the set of all reducts ofDS.

Algorithm 5. Construct a relation scheme from a decision tablẹ

Input: Let DS = {u,Cy-j{d]) be a decision table, where POS^^^ [{d}) = U . Output: Sj =<R,F> such that A"^ - {d} ^ R^.

(6)

Phiing Thi Thu Hi6n vd Dtg Tap chi KHOA HQC & C 6 N G NGHE Let us consider the relation r over the set of

attributes R = C^{d].

Step I: Using Algorithm 3 we obtain Kj.

Assume that K'^ ={K^,K2^,...,K,], according to definition K^^ is a Spemer-system over C.

Step 2: For each K,eKa,\<i<t,K, ^[d], we construct the functional dependency K, -> [d].

The relation scheme Sj=<R,F>, where R = Cu{d]and F = [K,^{d]:K,^K'^], is the one w e have to consttuct.

The c o m p l e x ! ^ of the algorithm is polynomial according to the size of r.

Proof K'j-{d] = Ra first of all. I prove

1) For any KeK^ we have K-^{d} and there does not exist K't^K such that

K'-^{d]. Hence, according to the method to constmct s^=<R,F> w e conclude K is a minimal key of 5^, that is KeKj.

2) Conversely, assume that there exists A" e A:^ such that KiKj, then we have K^[d] and there does not exist K'cK such that K'-^{d]. It is easy to see that for any K,BK^,\&i<t, K<zK, (i) because if K(zK, then K, is not a reduct of C in DS.

Moreover, for any Ki£K^,\<i<,t,K,-^K (ii) because if K,czK then K is not a minimal key of Kj. From (i), (ii) we can conclude K-{A',Ar|,A'2,...,A:,}is a Spemer-system and for any./4ciyC we have A-^{d]. According to the definition, AT is the family of all minimal sets of attributed, so /C = K'j,Ke.Kj This is in contradiction with the condifion K^Kj. Therefore w e have AT e A ^ . From 1) and 2) we conclude K^=K^.

CONCLUSION

In this paper, based on indiscemibility matrix and indiscemibility ftincfion in traditional rough set theory [11], the author proposed

173(13): 3 1 - 3 1 contingency tables and discernibility function in order to find reduct of set-valued decision system. Based on some results of J„

Demetrovics and Thi V.D concerning keys,,, the article building algorithm relation scheme from a consistent decision table, it has important implications in knowledge discovery and data mining. In next papers we will show that the proposed solution can be also modified to manage with dominance based rough sets approach to set-valued decision table.

R E F E R E N C E S

1. Armsttxjng W. W. (1974), "Dependency structures of database relationships", Information Processing, 74, 580-583.

2. Demetrovics J., Thi V. D. (1987), "Keys, antikeys and prime attributes". Ann. Univ. Scien.

Budapest Sect Compul., 8, pp. 37-54 3. Demetrovics J., Thi V. D. (1998), "Relations and minimal keys", Acta Cybernetica 8, 3, pp.

279-285.

4. Demetrovics J., Thi V. D. (1995), "Some remarks on generating Armstrong and inferring functional dependencies relation". Acta Cybernetica 12, pp. 167-180.

5. Guan Y. Y., Wang H. K., (2006), "Set-valued information systems", Information Sciences, 176, pp.2507-2525.

6. Kryszkiewicz M., (1998), "Rough set approach to incomplete information systems ", Information Science, Vol. 112, pp. 3 9 ^ 9 .

7. Pawlak Z., (1982), Rough sets. International Journal of Information and Computer Sciences, 11(5), pp. 341-356.

8. Pawlak Z. (1991), Rough sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers.

9. Qian Y. H., Dang C. Y., Liang J. Y., Tang D.

W. (2009), "Set-valued ordered information' systems". Information Sciences, 179, pp. 2809- 2832.

10. Shifei D., Hao D. (2010), "Research and Development of Attribute Reduction Algorithm Based on Rough Set", IEEE, CCDC2010, pp.648- 653.

11. Skowron A., Rauszer C. (1992), "The Discemibility Matrices and Functions in Information systems, Interiligent Decision Support, Handbook of Applications and Advances

56

(7)

Phiing Thi Thu HiSn vd Dtg Tap chi KHOA HQC & CONG NGHE 173(13): 51 - 57 of the Rough Sets Theory", Kluwer, Dordrecht, 14. Yao Y. Y., Zhao Y., Wang J., (2006), "On pp. 331-362. reduct construction algorithms", Proceedings of 12. Thi V.D, (1986), "Minimal keys and International Conference on Rough Sets and Antikeys". Acta Cybernetica 7, 4 361-371. Knowledge Technology, pp. 297-304.

13. Y. H. Qian Y. H. , Liang J. Y.,(2010), "On 15. Zhang J. B., Li T. R., Ruan D., Liu D. (2012), Dominance Relations in Disjunctive Set-Valued "Rough sets based matrix approaches with Ordered Information Systems", International dynamic attribute variation in set-valued Journal of Information Technology & Decision information systems". International Journal of MakingVol 9, No. 1, pp. 9-33. Approximate Reasoning 53, pp. 620-635.

T O M T A T

P H A T H I E N T R I T H l T C T H E O H U 6 N G T I E P C A N T A P T H O

Phiing Thi Thu Hien', Ninh Van Thg Trudng Dai hpc Kinh le Ky ihudi Cong nghiep Riit gon thuoc tinh la bai toan quan trpng nhat trong ly thuyet tap tho. Trong nhung nam gan dSy, cac phuong phap nit gon thuoc tinh da thu hiit su chii y va quan tam cua nhiSu nha nghien cihi.

Dang chii y la phuong phap dua tren mien dirong, phirong phap sii dung ma tran phan biet, phuong phap sii dung entropy thong tin ...w. Tuy nhien, hau het cac phuong phap nay deu thi^c hien tren cac he thdng tin don trj. Trong b^i b^o nay, tac gia dua ra phuong phap nit gon thuoc linh trong bang quyet djnh da tri. Dong thoi, dya IrSn mdt s6 k6t qua nghiSn cuu trong co so dO lieu quan he bai bao trinh bay thuat toan xay dung so d6 quan he tir bang quyet dinh don tri, Tii' khoa: Co sd dir lieu guan he, tap thd. so do quan hi, bdng quyet d/nh, khda.

Ngay nhdn bdi: 30/8/2017; Ngay phdn bi^n: 08/9/2017; Ngay duyitdang: 30/11/2017 Tel: 0914 770070, Email: Thuhiencnl@gmail.com