#1: Neural nets and function approximation

Estimating conditional probabilities

A supervised learning problem - learning a mapping from input variables to output variables - can be seen as a conditional probability estimation problem. For instance, consider the toy example below:

| Patient ($x^{(i)}$) | Age | Sex (M=0, W=1) | Outcome ($y^{(i)} = \text{Cancer}$) |
|---|---|---|---|
| 1 | 55 | 0 | 1 |
| 2 | 65 | 1 | 0 |
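
The table above can be written down as arrays - a minimal sketch assuming NumPy, where the names `X` and `y` are illustrative, not from the original:

```python
import numpy as np

# Features from the table: one row per patient, columns (Age, Sex).
X = np.array([[55.0, 0.0],
              [65.0, 1.0]])

# Outcome: 1 if the patient has cancer, 0 otherwise.
y = np.array([1.0, 0.0])
```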

Our training data set consists of two patients, i.e., two examples $(x^{(i)}, y^{(i)})$, $i = 1, 2$, where $x^{(i)}$ collects the features Age and Sex, and $y^{(i)}$ is the binary outcome: 1 if the patient has cancer, and 0 otherwise. Our prediction problem:

  • I want to predict the probability of cancer given the data, i.e., $p(y=\text{Cancer} \mid x)$
  • My ground truth:

    • $p(y=\text{Cancer} \mid x^{(1)} = \{\text{Age} = 55, \text{Sex} = 0\}) = 1$
    • $p(y=\text{Cancer} \mid x^{(2)} = \{\text{Age} = 65, \text{Sex} = 1\}) = 0$
  • I can approximate this conditional probability $p(y=\text{Cancer} \mid x)$ with a function $f_{\Theta}(x)$ parameterized by $\Theta$

Learn $f_{\Theta}(x)$ to approximate $p(y=\text{Cancer} \mid x)$
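
As a concrete sketch of learning $f_{\Theta}(x)$, one simple choice (an assumption here, not the only option) is logistic regression: $f_{\Theta}(x) = \sigma(w^\top x + b)$ with $\Theta = (w, b)$, fit by gradient descent on the cross-entropy loss. On the two-patient toy data:

```python
import numpy as np

# Toy data from the table: features (Age, Sex) and binary outcome.
X = np.array([[55.0, 0.0],
              [65.0, 1.0]])
y = np.array([1.0, 0.0])

# Standardize features so gradient descent behaves well.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Theta = (w, b); f_Theta(x) = sigmoid(w . x + b) approximates p(y=Cancer | x).
w = np.zeros(2)
b = 0.0
lr = 0.5

for _ in range(1000):
    p = sigmoid(Xs @ w + b)            # predicted p(y=Cancer | x) per patient
    grad_w = Xs.T @ (p - y) / len(y)   # gradient of mean cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

preds = sigmoid(Xs @ w + b)
print(preds)  # approaches the "ground truth" probabilities [1, 0]
```

Since the two examples are linearly separable, the fitted probabilities drift toward the ground-truth values 1 and 0; with only two training points this is memorization, which is exactly why the toy example is illustrative rather than a realistic estimator.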