To better understand this issue, we now provide theoretical insights. In what follows, we first model the ID and OOD data distributions and then derive mathematically the output of an invariant classifier, i.e., a model that aims not to rely on the environmental features for prediction.
Setup.
We consider a binary classification task where $y \in \{-1, 1\}$ is drawn according to a fixed probability $\eta := P(y = 1)$. We assume both the invariant features $z_{\text{inv}}$ and the environmental features $z_e$ are drawn from Gaussian distributions:
$$z_{\text{inv}} \mid y \sim \mathcal{N}\!\left(y \cdot \mu_{\text{inv}},\, \sigma^2_{\text{inv}} I\right), \qquad z_e \mid y \sim \mathcal{N}\!\left(y \cdot \mu_e,\, \sigma^2_e I\right),$$
where $\mu_{\text{inv}}$ and $\sigma^2_{\text{inv}}$ are the same for all environments. In contrast, the environmental parameters $\mu_e$ and $\sigma^2_e$ differ across environments, where the subscript is used to indicate the dependence on the environment and the index of the environment. In what follows, we present the results, with detailed proofs deferred to the Appendix.
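As a concrete illustration of this setup, the following sketch samples from the data model for two hypothetical environments. All parameter values here are illustrative assumptions chosen for demonstration, not values from our analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_environment(n, eta, mu_inv, sigma_inv, mu_e, sigma_e):
    """Sample n points from one environment of the Gaussian data model:
    y ~ Bernoulli(eta) over {-1, +1},
    z_inv | y ~ N(y * mu_inv, sigma_inv^2 I),
    z_e   | y ~ N(y * mu_e,   sigma_e^2 I)."""
    y = np.where(rng.random(n) < eta, 1, -1)
    z_inv = y[:, None] * mu_inv + sigma_inv * rng.standard_normal((n, mu_inv.size))
    z_e = y[:, None] * mu_e + sigma_e * rng.standard_normal((n, mu_e.size))
    return y, z_inv, z_e

# Illustrative parameters: (mu_inv, sigma_inv) shared across environments,
# while (mu_e, sigma_e) differ per environment.
eta = 0.5
mu_inv, sigma_inv = np.array([1.0, 0.5]), 1.0
envs = [(np.array([2.0, 0.0]), 1.0),   # environment e_1
        (np.array([0.0, 2.0]), 1.5)]   # environment e_2
data = [sample_environment(1000, eta, mu_inv, sigma_inv, mu_e, s_e)
        for mu_e, s_e in envs]
```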
Lemma 1
Given the feature representation $\phi_e(x) = M_{\text{inv}} z_{\text{inv}} + M_e z_e$, the optimal linear classifier for an environment $e$ has the corresponding coefficient $2\Sigma^{-1}\bar{\mu}_e$, where:
Note that the Bayes optimal classifier uses environmental features which are informative of the label but non-invariant. Instead, we hope to rely only on the invariant features while ignoring the environmental features. Such a predictor is also referred to as the optimal invariant predictor [ rosenfeld2020risks ], which is specified in the following. Note that this is a special case of Lemma 1 with $M_{\text{inv}} = I$ and $M_e = 0$.
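To make the contrast concrete, here is a minimal sketch, continuing the code above and specializing to the concatenated representation $\phi_e(x) = [z_{\text{inv}}, z_e]$. For isotropic class-conditional Gaussians $\mathcal{N}(\pm\mu, \sigma^2 I)$, the Bayes-optimal linear weight on each feature block is $2\mu/\sigma^2$ (a standard LDA computation); the block acting on $z_e$ is nonzero and environment-dependent:

```python
def bayes_optimal_weights(mu_inv, sigma_inv, mu_e, sigma_e):
    """Bayes-optimal linear weights for the concatenated features [z_inv, z_e]
    with z | y ~ N(y * mu, sigma^2 I): each block gets weight 2 * mu / sigma^2
    (the bias log(eta / (1 - eta)) is omitted, as in the text)."""
    return np.concatenate([2 * mu_inv / sigma_inv**2, 2 * mu_e / sigma_e**2])

for mu_e, s_e in envs:
    print(bayes_optimal_weights(mu_inv, sigma_inv, mu_e, s_e))
# The z_e block differs across environments: the Bayes-optimal classifier
# relies on environmental features and is therefore not invariant.
```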
Proposition 1
(Optimal invariant classifier using invariant features) Suppose the featurizer recovers the invariant feature, $\phi_e(x) = [z_{\text{inv}}]\ \forall e \in \mathcal{E}$; then the optimal invariant classifier has the corresponding coefficient $2\mu_{\text{inv}}/\sigma^2_{\text{inv}}$. (Footnote 3: the constant term of the classifier weights is $\log \eta/(1-\eta)$, which we omit here and in the sequel.)
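For completeness, the coefficient in Proposition 1 follows from the standard Gaussian posterior computation: with $z_{\text{inv}} \mid y \sim \mathcal{N}(y\,\mu_{\text{inv}}, \sigma^2_{\text{inv}} I)$ and $P(y = 1) = \eta$,

$$p(y = 1 \mid z_{\text{inv}}) = \sigma\!\left(\log \frac{\eta\,\mathcal{N}(z_{\text{inv}};\, \mu_{\text{inv}}, \sigma^2_{\text{inv}} I)}{(1-\eta)\,\mathcal{N}(z_{\text{inv}};\, -\mu_{\text{inv}}, \sigma^2_{\text{inv}} I)}\right) = \sigma\!\left(\frac{2\,\mu_{\text{inv}}^\top z_{\text{inv}}}{\sigma^2_{\text{inv}}} + \log \frac{\eta}{1-\eta}\right),$$

where $\sigma$ denotes the logistic function; the linear coefficient is $2\mu_{\text{inv}}/\sigma^2_{\text{inv}}$, and the constant term is exactly the $\log \eta/(1-\eta)$ bias noted in the footnote.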
The optimal invariant classifier explicitly ignores the environmental features. However, an invariant classifier learned in practice does not necessarily rely only on the invariant features. The next lemma shows that it is possible to learn an invariant classifier that relies on the environmental features while achieving lower risk than the optimal invariant classifier.
Lemma 2
(Invariant classifier using non-invariant features) Suppose $E \leq d_e$, given a set of environments $\mathcal{E} = \{e_1, \ldots, e_E\}$ such that all environmental means are linearly independent. Then there always exists a unit-norm vector $p$ and a positive fixed scalar $\beta$ such that $\beta = p^\top \mu_e / \sigma^2_e\ \forall e \in \mathcal{E}$. The resulting optimal classifier weights are
$$w = \left[\, 2\mu_{\text{inv}}/\sigma^2_{\text{inv}},\; 2\beta\, p \,\right].$$
Note that the optimal classifier weight $2\beta$ is a constant, which does not depend on the environment (and neither does the optimal coefficient for $z_{\text{inv}}$). The projection vector $p$ acts as a "short-cut" that the learner can exploit to yield an insidious surrogate signal $p^\top z_e$. Like $z_{\text{inv}}$, this insidious signal can also lead to an invariant predictor (across environments) admissible by invariant learning methods. In other words, despite the varying data distributions across environments, the optimal classifier (using non-invariant features) is the same for every environment. We now show our main result, where OOD detection can fail under such an invariant classifier.
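One way to see that such a $p$ exists under the assumptions of Lemma 2 is constructive: solve $\mu_e^\top v / \sigma^2_e = 1$ for all $e \in \mathcal{E}$ and normalize. A minimal sketch, continuing the illustrative environments above:

```python
# Stack the scaled environmental means: row e is mu_e^T / sigma_e^2.
A = np.stack([mu_e / s_e**2 for mu_e, s_e in envs])

# With E <= d_e and linearly independent environmental means, A v = 1 is
# solvable; the pseudoinverse gives one solution.
v = np.linalg.pinv(A) @ np.ones(len(envs))
beta = 1.0 / np.linalg.norm(v)  # positive fixed scalar
p = beta * v                    # unit-norm "short-cut" direction

# Defining property of Lemma 2: p^T mu_e / sigma_e^2 == beta for every e.
print([float(p @ mu_e / s_e**2) for mu_e, s_e in envs], beta)
```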
Theorem 1
(Failure of OOD detection under invariant classifier) Consider an out-of-distribution input which contains the environmental feature: $\phi_{\text{out}}(x) = M_{\text{inv}} z_{\text{out}} + M_e z_e$, where $z_{\text{out}} \perp \mu_{\text{inv}}$. Given the invariant classifier (cf. Lemma 2), the posterior probability for the OOD input is $p(y = 1 \mid \phi_{\text{out}}) = \sigma\!\left(2\beta\, p^\top z_e + \log \frac{\eta}{1-\eta}\right)$, where $\sigma$ is the logistic function. Thus for arbitrary confidence $0 < c := P(y = 1 \mid \phi_{\text{out}}) < 1$, there exists $\phi_{\text{out}}(x)$ with $z_e$ such that $p^\top z_e = \frac{1}{2\beta} \log \frac{c\,(1-\eta)}{\eta\,(1-c)}$.
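Continuing the sketch above, the construction in Theorem 1 can be instantiated directly: choosing $z_e$ along $p$ with the stated magnitude drives the invariant classifier's confidence on an OOD input, which carries no invariant signal, to any target $c$:

```python
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))  # logistic function

c = 0.99  # arbitrary target confidence in (0, 1)
# Theorem 1: pick z_e with p^T z_e = (1 / (2 * beta)) * log(c(1-eta) / (eta(1-c))).
alpha = np.log(c * (1 - eta) / (eta * (1 - c))) / (2 * beta)
z_e = alpha * p

# Any z_out orthogonal to mu_inv: project a random vector onto the complement.
r = rng.standard_normal(mu_inv.size)
z_out = r - (r @ mu_inv) / (mu_inv @ mu_inv) * mu_inv

# Posterior of the invariant classifier from Lemma 2 on phi_out = [z_out, z_e]:
# the z_inv term vanishes because z_out is orthogonal to mu_inv.
logit = (2 * mu_inv @ z_out / sigma_inv**2
         + 2 * beta * (p @ z_e)
         + np.log(eta / (1 - eta)))
print(sigmoid(logit))  # ~0.99: confident prediction despite no invariant evidence
```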