Thesis Projects
PhD Thesis
Silander Lab, Massey University, Auckland, New Zealand & Jones Lab, Genome Sciences Centre, Vancouver, BC, Canada September 2018 – August 2022
Machine Learning Methods for Colorectal Cancer Dataset Analysis
- Developed a rank-based permutation approach to assigning p-values to features and feature sets from importance scores in random forest models
- Created the R package Rf2pval to make the aforementioned statistical method user-friendly and easy to implement, and to simplify visualization of its results
- Created an RShiny app for visualizing colorectal microbiome dataset results from a colleague’s high-throughput microbial data analysis pipeline (MetaFunc)
- Developed a multi-language bioinformatics pipeline for extensive machine-learning model development on large RNA-seq datasets using a high-performance cluster computing system
- Performed a feasibility study on combining RNA-seq gene data and microbial data from human-unmapped reads in a random forest model
- Analyzed a novel colorectal cancer RNA-seq dataset using machine learning approaches
- Trained, tested, and validated on an independent dataset three random forest models (genes, genes + microbes, and microbes alone) that differentiate CRC anatomical side with 80 to 90% accuracy
- Associated novel and known biomarkers discovered in the random forest models with either right- or left-sided colorectal cancers
Honours Thesis
Bieda Lab, University of Calgary September 2014 – April 2015
Advanced Computational Approaches to Omics Data Set Analyses
- Created advanced R programs for combining gene expression microarray analysis data with ChIP-seq analysis data for pharmacogenomic applications
- Mined large datasets obtained from NCBI GEO/SRA
- Created scripts to automate graphical display production of downstream ChIP-seq analysis and gene expression microarray analysis results
- Integrated gene expression analysis with a visual pathway analysis by writing R programs that color-code genes within KEGG pathway diagrams based on their expression levels