

Fairness-aware Learning through Regularization Approach

The 3rd IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2011)

Dec. 11, 2011 @ Vancouver, Canada, in conjunction with ICDM2011

Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2011.83

Article @ Personal Site: http://www.kamishima.net/archive/2011-ws-icdm_padm.pdf

Handnote: http://www.kamishima.net/archive/2011-ws-icdm_padm-HN.pdf

Workshop Homepage: http://www.zurich.ibm.com/padm2011/

Abstract:

With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect people’s lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be socially and legally fair from a viewpoint of social responsibility; namely, it must be unbiased and nondiscriminatory in sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to attempt the development of analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. From a privacy-preserving viewpoint, this can be interpreted as hiding sensitive information when classification results are observed. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.


Fairness-aware Learning through Regularization Approach

Toshihiro Kamishima†, Shotaro Akaho†, and Jun Sakuma‡

† National Institute of Advanced Industrial Science and Technology (AIST)

‡ University of Tsukuba & Japan Science and Technology Agency

http://www.kamishima.net/

IEEE Int’l Workshop on Privacy Aspects of Data Mining (PADM 2011)

Dec. 11, 2011 @ Vancouver, Canada, co-located with ICDM2011


1 - Introduction

Due to the spread of data mining technologies...

Data mining is being increasingly applied for serious decisions

ex. credit scoring, insurance rating, employment application

Accumulation of massive data makes it possible to reveal personal information

Fairness-aware / Discrimination-aware Mining

taking care of the following kinds of sensitive information in its decisions or results:

information considered sensitive from the viewpoint of social fairness

ex. gender, religion, race, ethnicity, handicap, political conviction

information restricted by law or contracts

ex. insider information, customers’ private information

2 - Outline

Backgrounds

an example to explain why fairness-aware mining is needed

Causes of Unfairness

prejudice, underestimation, negative legacy

Methods and Experiments

our prejudice removal technique, experimental results

Related Work

finding unfair association rules, situation testing, fairness-aware data

publishing

Conclusion

3 - Backgrounds

4 - Why Fairness-aware Mining?

[Calders+ 10]

US Census Data: predict whether a person's income is high or low

              Male    Female
High-income   3,256      590
Low-income    7,604    4,831

Females are a minority in the high-income class:

the # of High-Male data is 5.5 times the # of High-Female data

while 30% of Male data are High-income, only 11% of Female data are

Occam's Razor: mining techniques prefer simple hypotheses

Minor patterns are frequently ignored, and thus minorities tend to be treated unfairly

5 - Calders-Verwer Discrimination Score

[Calders+ 10]

Calders-Verwer discrimination score (CV score)

Pr[ Y=High-income | S=Male ] - Pr[ Y=High-income | S=Female ]

Y: objective variable,

S: sensitive feature

The conditional probability of the preferred decision

given a sensitive value subtracted that given a non-sensitive value

As the values of Y, the values in sample data are used

The baseline CV score is 0.19

Objective variables, Y, are predicted by a naive-Bayes classifier

trained from data containing all sensitive and non-sensitive features

The CV score increases to 0.34, indicating unfair treatments

Even if sensitive features are excluded in the training of a classifier

improved to 0.28, but still being unfairer than its baseline

Ignoring sensitive features is ineffective against the exclusion of

their indirect influence (red-lining effect)
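The baseline CV score can be reproduced directly from the census counts on slide 4 (a minimal sketch; the counts are copied from that slide, and the function name is ours):

```python
# Calders-Verwer (CV) score from the US Census counts on slide 4:
# CV = Pr[Y=High | S=Male] - Pr[Y=High | S=Female]

def cv_score(high_m, low_m, high_f, low_f):
    p_high_male = high_m / (high_m + low_m)
    p_high_female = high_f / (high_f + low_f)
    return p_high_male - p_high_female

score = cv_score(high_m=3256, low_m=7604, high_f=590, low_f=4831)
print(round(score, 2))  # 0.19 -- the baseline CV score
```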

6 - Social Backgrounds

Equality Laws

Many international laws prohibit discrimination in socially sensitive decision-making tasks [Pedreschi+ 09]

Circulation of Private Information

Apparently irrelevant information can help reveal private information

ex. users’ demographics are predicted from query logs [Jones 09]

Contracts with Customers

Customers’ information must be used only for purposes within the scope of privacy policies

We need sophisticated techniques for data analysis whose outputs are neutral with respect to specific information

7 - Causes of Unfairness

8 - Three Causes of Unfairness

There are at least three causes of unfairness in data analysis

Prejudice

the statistical dependency of sensitive features on an objective variable or on non-sensitive features

Underestimation

incompletely converged predictors due to training on a finite sample

Negative Legacy

training data used for building predictors are unfairly sampled or labeled

9 - Prejudice: Prediction Model

variables

objective variable Y: a binary class representing the result of a social decision, e.g., whether or not to allow credit

sensitive feature S: a discrete variable representing socially sensitive information, such as gender or religion

non-sensitive feature X: continuous variables corresponding to all features other than the sensitive feature

prediction model

Classification model representing Pr[Y | X, S]: M[Y | X, S]

Joint distribution derived by multiplying by a sample distribution:

Pr[Y, X, S] = M[Y | X, S] Pr[X, S]

The independence between these variables is considered over this joint distribution

10 - Prejudice

Prejudice: the statistical dependency of sensitive features on an objective variable or non-sensitive features

Direct Prejudice: Y ⊥̸⊥ S | X

a clearly unfair state in which a prediction model directly depends on a sensitive feature, implying the conditional dependence between Y and S given X

Indirect Prejudice: Y ⊥̸⊥ S | ∅

a sensitive feature depends on an objective variable, bringing about the red-lining effect (unfair treatment caused by information that is non-sensitive but depends on sensitive information)

Latent Prejudice: X ⊥̸⊥ S | ∅

a sensitive feature depends on a non-sensitive feature; removing this dependency completely excludes sensitive information

11 - Relation to PPDM

indirect prejudice: the dependency between an objective variable Y and a sensitive feature S

from the information-theoretic perspective: the mutual information between Y and S is non-zero

from the viewpoint of privacy preservation: leakage of sensitive information when an objective variable is known

different conditions from PPDM:

introducing randomness is occasionally inappropriate for serious decisions, such as job applications

disclosure of identity is generally not problematic
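As a concrete illustration of the information-theoretic view, the mutual information between Y and S can be estimated from the joint counts of the census example on slide 4 (a minimal sketch using only the standard library; the function name is ours):

```python
from math import log

def mutual_information(counts):
    """counts: dict mapping (y, s) -> count; returns I(Y; S) in nats."""
    total = sum(counts.values())
    p_y, p_s = {}, {}
    for (y, s), c in counts.items():
        p_y[y] = p_y.get(y, 0.0) + c / total
        p_s[s] = p_s.get(s, 0.0) + c / total
    mi = 0.0
    for (y, s), c in counts.items():
        p_ys = c / total
        mi += p_ys * log(p_ys / (p_y[y] * p_s[s]))
    return mi

counts = {("High", "M"): 3256, ("High", "F"): 590,
          ("Low", "M"): 7604, ("Low", "F"): 4831}
print(mutual_information(counts))  # > 0: the decision Y leaks information about S
```

A non-zero value here is exactly what "indirect prejudice" means: observing the decision reveals something about the sensitive feature.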

12 - Underestimation

Underestimation: incompletely converged predictors due to training on a finite sample

If the number of training samples is finite, the learned classifier may make more unfair decisions than those observed in the training samples. Though such decisions are not intentional, they might arouse suspicions of unfair treatment.

The notion of asymptotic convergence is mathematically rational, but unfavorable decisions for minorities due to a shortage of training samples might not be socially accepted.

Techniques for anytime algorithms or the class-imbalance problem might help to alleviate this underestimation.

13 - Negative Legacy

Negative Legacy: training data used for building predictors are unfairly sampled or labeled

Unfair Sampling

If people in a protected group have been refused without investigation, those people are less frequently sampled. This can be considered a kind of sample-selection-bias problem, but it is difficult to detect the existence of the sampling bias.

Unfair Labeling

If people in a protected group who should be favorably accepted have been unfavorably rejected, the labels of the training samples become unfair. Transfer learning might help to address this problem if additional information, such as a small number of fairly labeled samples, is available.

14 - Methods and Experiments

15 - Logistic Regression

Logistic Regression : discriminative classification model

A method to remove indirect prejudice from decisions made by logistic regression.

The objective function is built from samples {(y, x, s)} of objective variables, non-sensitive features, and sensitive features, with model parameters Θ:

− ln Pr[{(y, x, s)}; Θ] + η R({(y, x, s)}, Θ) + (λ/2) ‖Θ‖₂²

the first term: negative log-likelihood of the model

R: regularizer for prejudice removal; the larger the regularization parameter η, the more strongly the independence between S and Y is constrained, i.e., the more fairness is enforced

the last term: L2 regularizer with parameter λ, avoiding over-fitting
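The three-term objective on this slide can be sketched in numpy as follows. This is a minimal illustration: the names `nll` and `objective` are ours, and R is just a pluggable fairness regularizer (here a zero stub), not the prejudice remover itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    """Negative log-likelihood of logistic regression: -ln Pr[{(y, x)}; theta]."""
    p = sigmoid(X @ theta)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def objective(theta, X, y, R, eta, lam):
    """-ln Pr + eta * R + (lambda / 2) * ||theta||_2^2"""
    return nll(theta, X, y) + eta * R(theta, X, y) + 0.5 * lam * (theta @ theta)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = (X[:, 0] > 0).astype(float)
theta = np.zeros(3)
# with zero weights and a zero regularizer this is just 8 * ln 2
val = objective(theta, X, y, R=lambda *a: 0.0, eta=1.0, lam=0.1)
print(val)
```

Minimizing this trades off accuracy (first term) against fairness (second term) via η, exactly the tension explored in the experiments later in the deck.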

16 - Prejudice Remover

Prejudice Remover Regularizer

mutual information between S and Y, so as to make Y independent of S:

Σ_{Y,X,S} M[Y|X,S] Pr[X,S] ln ( Pr[Y,S] / (Pr[S] Pr[Y]) )

≈ Σ_{(x,s)∈D} Σ_{y∈{0,1}} M[y|x,s;Θ] ln ( Pr[y|x̄_s, s; Θ] / Σ_{s'} Pr[y|x̄_{s'}, s'; Θ] )

the true distribution is replaced with the sample distribution, and the mutual information is approximated at the means x̄_s of X (per sensitive value s) instead of marginalizing over X

We are currently improving a computation method
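Under the mean approximation above, the regularizer can be sketched as follows. Two illustrative assumptions are ours, not the slide's: a logistic model with one weight vector per sensitive value, and weighting the sum over s by the empirical Pr[s] so that the estimate of Pr[y] is normalized and the regularizer vanishes when Y is independent of S.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prejudice_remover(w, X, s):
    """w: (n_s, d) weights, one row per sensitive value; X: (n, d); s: (n,) ints.

    Approximates the mutual information between Y and S at the means of X.
    """
    n_s = w.shape[0]
    # Pr[y=1 | x_bar_s, s]: the model evaluated at the group means of X
    x_bar = np.stack([X[s == v].mean(axis=0) for v in range(n_s)])
    p1_s = sigmoid(np.einsum("vd,vd->v", x_bar, w))   # (n_s,)
    p_y_s = np.stack([1.0 - p1_s, p1_s])              # (2, n_s) ~ Pr[y|s]
    p_s = np.bincount(s, minlength=n_s) / len(s)      # empirical Pr[s]
    p_y = p_y_s @ p_s                                 # (2,) ~ Pr[y]
    # sum over samples and y of M[y|x,s] * ln( Pr[y|s] / Pr[y] )
    p1 = sigmoid(np.sum(X * w[s], axis=1))            # M[y=1|x,s] per sample
    m = np.stack([1.0 - p1, p1])                      # (2, n)
    return float(np.sum(m * np.log(p_y_s[:, s] / p_y[:, None])))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
s = np.array([0, 1] * 10)
print(prejudice_remover(np.zeros((2, 3)), X, s))  # 0.0: zero weights make Y independent of S
```

With zero weights the model predicts 0.5 everywhere, Y carries no information about S, and the regularizer is exactly zero; non-zero weights that separate the groups drive it up, which is what the η term in the objective penalizes.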

17 - Calders-Verwer Two Naive Bayes

[Calders+ 10]

Naive Bayes: S and the non-sensitive features X are conditionally independent given Y

Calders-Verwer Two Naive Bayes (CV2NB): the non-sensitive features X are mutually conditionally independent given Y and S

Unfair decisions are modeled by introducing the dependency of S on X, as well as that of Y on X

A model representing the joint distribution of Y and S is built so as to enforce fairness in decisions

18 - Experimental Results

[Plots: classification accuracy (0.60–0.85) and fairness (mutual information between S and Y, log scale 10⁻⁶–1) against the parameter η (0.01–10), comparing LR (without S), Prejudice Remover, Naive Bayes (without S), and CV2NB]

Both CV2NB and PR made fairer decisions than their corresponding baseline methods

For larger η, the accuracy of PR tends to decline, but no clear trend is found in terms of fairness

The instability of PR would be due to the influence of the approximation or the non-convexity of the objective function

19 - Related Work

20 - Finding Unfair Association Rules

[Pedreschi+ 08, Ruggieri+ 10]

ex: association rules extracted from the German Credit Data Set

(a) city=NYC ⇒ class=bad (conf = 0.25)
0.25 of NYC residents are denied their credit application

(b) city=NYC ∧ race=African ⇒ class=bad (conf = 0.75)
0.75 of NYC residents whose race is African are denied their credit application

extended lift (elift): elift = conf(A ∧ B ⇒ C) / conf(A ⇒ C)

the ratio of the confidence of a rule with an additional condition to the confidence of the base rule

a-protection: a rule set is considered unfair if there exist association rules whose elift is larger than a

ex: (b) is not a-protected if a = 2, because elift = conf(b) / conf(a) = 3

They proposed an algorithm to enumerate rules that are not a-protected
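For the two rules above, elift and the a-protection check work out as follows (a trivial sketch; the function names are ours):

```python
def elift(conf_extended, conf_base):
    """Ratio of the confidence of the extended rule to that of the base rule."""
    return conf_extended / conf_base

def is_a_protected(conf_extended, conf_base, a):
    """a-protection: the extended rule's elift must not exceed a."""
    return elift(conf_extended, conf_base) <= a

e = elift(0.75, 0.25)  # rules (b) and (a) from this slide
print(e)               # 3.0
print(is_a_protected(0.75, 0.25, a=2))  # False: elift = 3 > 2
```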

21 - Situation Testing

[Luong+ 11]

Situation Testing: when all conditions other than a sensitive condition are the same, people in a protected group are considered unfairly treated if they receive an unfavorable decision

They proposed a method for finding unfair treatments by checking the statistics of decisions among the k-nearest neighbors of data points in a protected group

The condition of situation testing is Pr[ Y | X, S=a ] = Pr[ Y | X, S=b ] ∀ X

This implies the independence between S and Y
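The neighborhood check can be sketched as follows. This is our toy formulation in the spirit of [Luong+ 11], not the paper verbatim: for a member of the protected group, compare the favorable-decision rate among its k nearest neighbors inside the protected group with that among its k nearest neighbors outside it.

```python
import numpy as np

def knn_decision_rate(x, X_group, y_group, k):
    """Mean decision among the k nearest neighbors of x within one group."""
    d = np.linalg.norm(X_group - x, axis=1)
    return y_group[np.argsort(d)[:k]].mean()

def situation_test(x, X, y, s, protected, k):
    """Favorable-decision rate gap around x: protected minus non-protected."""
    prot = s == protected
    return (knn_decision_rate(x, X[prot], y[prot], k)
            - knn_decision_rate(x, X[~prot], y[~prot], k))

# toy data: nearly identical features, but the protected group (s=1)
# is always denied (y=0) while the others are accepted (y=1)
X = np.array([[0.0], [1.0], [2.0], [0.1], [1.1], [2.1]])
s = np.array([1, 1, 1, 0, 0, 0])
y = np.array([0, 0, 0, 1, 1, 1])
gap = situation_test(np.array([1.0]), X, y, s, protected=1, k=2)
print(gap)  # -1.0: protected neighbors fare much worse despite similar features
```

A strongly negative gap for otherwise-similar individuals is the signal of unfair treatment that situation testing looks for.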

22 - Fairness-aware Data Publishing

[Dwork+ 11]

[Diagram: the data owner converts the original data into a data representation (archetypes), given a loss function representing utilities for the vendor (data user); the archetypes are chosen so as to maximize the vendor's utility under constraints that guarantee fairness in analysis and fair decisions]

They show the conditions that these archetypes should satisfy

This condition implies that the probability of receiving a favorable decision is irrelevant to belonging to a protected group

23 - Conclusion

Contributions

three causes of unfairness: prejudice, underestimation, and negative legacy

a prejudice remover regularizer, which enforces a classifier's independence from sensitive information

experimental results of logistic regression with our prejudice remover

Future Work

the computation of the prejudice remover has to be improved

Socially Responsible Mining

Methods of data exploitation that do not damage people's lives, such as fairness-aware mining, PPDM, and adversarial learning, together comprise the notion of socially responsible mining, which should become an important concept in the near future.
