AI: PBL Problem 3

In PBL-3, we help an oncologist friend who is working with genomics data to cure cancer. Specifically, the scientist wants to analyze gene expression data from breast cancer patients.

  • Gene expression data: link
  • Clinical data: link
  • Python code to read data: link

The scientist wants our help with three tasks:

    • Visualize the data
      • This is to see how the patients are grouped and to find any patient whose gene expression is atypical compared to the group.
    • Find interesting genes
      • This is to discover the genes that contribute most to the clinical outcome.
      • The scientist wants to investigate our findings further. We cannot perform biomedical experiments, so search for your findings (for example, on Google) to check whether the genes you found are plausibly interesting.
    • Build an effective classifier for grades 1, 2, and 3 (grade and stage)
      • This classifier will help determine whether a new patient is high risk (grade 3) or low risk (grade 1).
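
One possible starting point for the visualization task is a PCA projection followed by a simple per-group outlier check. The sketch below is only an illustration under stated assumptions: it uses synthetic data instead of the linked files, it assumes a matrix with one row per patient and one column per gene, and the names pca_project, flag_atypical, plot_projection, and z_cutoff are hypothetical, not part of the provided reading code.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center every gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def flag_atypical(coords, labels, z_cutoff=3.5):
    """Flag samples unusually far from their own group's median position."""
    flags = np.zeros(len(coords), dtype=bool)
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        center = np.median(coords[idx], axis=0)
        dist = np.linalg.norm(coords[idx] - center, axis=1)
        med = np.median(dist)
        mad = max(np.median(np.abs(dist - med)), 1e-12)  # avoid division by zero
        flags[idx] = (dist - med) / (1.4826 * mad) > z_cutoff  # robust z-score
    return flags

def plot_projection(coords, labels, flags):
    """Scatter plot colored by group; flagged samples are circled in red."""
    import matplotlib.pyplot as plt              # only needed for plotting
    fig, ax = plt.subplots()
    sc = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis")
    ax.scatter(coords[flags, 0], coords[flags, 1],
               facecolors="none", edgecolors="red", s=120)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    fig.colorbar(sc, ax=ax, label="grade")
    return fig

# Synthetic stand-in for the real data (assumption: 60 patients x 200 genes).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 200))
grades_demo = np.repeat([1, 2, 3], 20)
X_demo[grades_demo == 3, :10] += 3.0             # grade 3 differs in 10 genes
X_demo[5] += 8.0                                 # one deliberately atypical patient

coords_demo = pca_project(X_demo)
flags_demo = flag_atypical(coords_demo, grades_demo)
print("flagged patient indices:", np.where(flags_demo)[0])
```

In a notebook, plot_projection(coords_demo, grades_demo, flags_demo) would draw the figure. PCA is just one choice here; other projections (for example t-SNE) may show the grouping differently, and the cutoff for "atypical" is a judgment call.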

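For the "interesting genes" task, one simple option is to rank genes by a one-way ANOVA F-statistic across the grades. This is a minimal sketch, not the required method: the data are synthetic, the layout (patients in rows, genes in columns) is an assumption, and anova_f, top_genes, and the GENE_i names are hypothetical placeholders.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F-statistic per gene (column) across the classes in y."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    ss_within = np.maximum(ss_within, 1e-12)     # guard against constant genes
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_genes(X, y, gene_names, top_k=10):
    """Return the top_k (name, F) pairs, highest F first."""
    f = anova_f(X, y)
    order = np.argsort(f)[::-1][:top_k]
    return [(gene_names[i], float(f[i])) for i in order]

# Synthetic stand-in (assumption: 90 patients x 500 genes, 5 informative genes).
rng = np.random.default_rng(1)
grades_demo = np.repeat([1, 2, 3], 30)
X_demo = rng.normal(size=(90, 500))
X_demo[:, :5] += 2.0 * grades_demo[:, None]      # expression rises with grade
names_demo = np.array([f"GENE_{i}" for i in range(500)])  # hypothetical names

for name, score in top_genes(X_demo, grades_demo, names_demo, top_k=5):
    print(f"{name}\tF = {score:.1f}")
```

A high F-statistic only says that a gene's mean expression differs between grades; it does not show that the gene drives the outcome, which is why the top-ranked names still need to be checked against what is known about them.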

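For the classifier task, a baseline worth trying first is a nearest-centroid classifier evaluated with stratified cross-validation. The sketch below is an assumption-laden illustration: it runs on synthetic data, implements the baseline in NumPy rather than using a library, and the names NearestCentroidBaseline, stratified_folds, cross_validate, and confusion_matrix are hypothetical helpers defined here.

```python
import numpy as np

class NearestCentroidBaseline:
    """Assign each sample to the class whose mean profile is closest."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # squared Euclidean distance from every sample to every centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def stratified_folds(y, n_folds, rng):
    """Give every sample a fold id, keeping class proportions similar per fold."""
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

def cross_validate(X, y, n_folds=5, seed=0):
    """Return out-of-fold predictions for every sample."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, n_folds, rng)
    pred = np.empty_like(y)
    for f in range(n_folds):
        test = folds == f
        # standardize with training statistics only, to avoid leaking test data
        mu = X[~test].mean(axis=0)
        sd = X[~test].std(axis=0) + 1e-12
        model = NearestCentroidBaseline().fit((X[~test] - mu) / sd, y[~test])
        pred[test] = model.predict((X[test] - mu) / sd)
    return pred

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for i, t in enumerate(classes):
        for j, p in enumerate(classes):
            cm[i, j] = np.sum((y_true == t) & (y_pred == p))
    return cm

# Synthetic stand-in (assumption: 90 patients x 300 genes, 30 informative genes).
rng = np.random.default_rng(2)
grades_demo = np.repeat([1, 2, 3], 30)
X_demo = rng.normal(size=(90, 300))
X_demo[:, :30] += 2.0 * grades_demo[:, None]

pred_demo = cross_validate(X_demo, grades_demo)
print("accuracy:", np.mean(pred_demo == grades_demo))
print(confusion_matrix(grades_demo, pred_demo, np.array([1, 2, 3])))
```

The confusion matrix is more informative than accuracy alone here, because mistaking a grade 3 patient for grade 1 is a much worse error than confusing neighboring grades. Any gene selection should also happen inside each training fold; otherwise the cross-validated score will look better than it really is.
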
    • Python notebook
    • Individual submission
    • Replaces the final exam (grading of the final exam will be 'relative')
    • Due Dec 18 (class 1), Dec 20 (class 2)