BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:On Selection Bias and Fairness Issues in Machine Learning - Stepha
 n Clémençon (Telecom Paris\, Institut Polytechnique de Paris)
DTSTART:20250708T093000Z
DTEND:20250708T103000Z
UID:TALK233974@talks.cam.ac.uk
DESCRIPTION:With the deluge of digitized information in the Big Data era\,
  massive datasets are becoming increasingly available for learning
  predictive models. However\, in many situations\, poor control of the
  data acquisition process may jeopardize the outputs of machine-learning
  algorithms\, and selection bias issues are now the subject of much
  attention. Recently\, for instance\, the accuracy of facial recognition
  algorithms for biometric applications has been fiercely debated:
  monitoring over time sometimes reveals a predictive performance very far
  from what was expected at the end of the training stage. The use of
  machine-learning methods for designing medical diagnosis/prognosis
  support tools is currently raising the same type of concern. Sustaining
  the enthusiasm for\, and confidence in\, what machine learning can
  accomplish requires revisiting both practice and theory at the same
  time. It is precisely the purpose of this talk to explain and
  illustrate\, through real examples\, how to extend Empirical Risk
  Minimization\, the main paradigm of statistical learning\, when the
  training observations are biased\, i.e. drawn from distributions that
  may significantly differ from that of the data in the test/prediction
  stage. As expected\, there is ‘no free lunch’: practical\, theoretically
  grounded solutions do exist in a variety of contexts (e.g. training
  examples composed of censored/truncated/survey data)\, but their
  implementation crucially depends on the availability of relevant
  auxiliary information about the data acquisition process. One should
  also bear in mind that ‘bias’ in machine learning\, as perceived by the
  general public\, also refers to situations where the predictive error
  exhibits a huge disparity\, i.e. cases where the predictive algorithms
  are much less accurate for certain population segments than for others.
  If certain facial recognition algorithms make more mistakes for certain
  ethnic groups\, for instance\, representativeness issues in the training
  data should not be incriminated alone: the variability in the error
  rates can be due just as much to the intrinsic difficulty of certain
  recognition problems as to the limitations of state-of-the-art
  machine-learning technologies. As will be discussed in this talk\,
  trade-offs between fairness and predictive accuracy then become
  unavoidable.
LOCATION:Seminar Room 2\, Newton Institute
END:VEVENT
END:VCALENDAR
