Shawn Cho

Bioinformatician, Data Scientist, Web Developer · La Jolla, CA


Johns Hopkins University

Baltimore, MD

Bioinformatics MS, GPA: 3.967

Sept. 2014 — May 2016

University of California, San Diego

La Jolla, CA

Bioengineering: Biotech BS, GPA: 3.588

Sept. 2010 — June 2014



Senior Scientist, Bioinformatics

South San Francisco, CA

Loxo Oncology at Lilly

Dec. 2020 - Present

Informatics, Scientist I, II

San Diego, CA


Apr. 2016 - Dec. 2020 (Scientist I: Apr. 2016 - Feb. 2018; Scientist II: Feb. 2018 - Dec. 2020)

Lead Bioinformatics Data Scientist
  • Working cross-functionally with the oncology, neurodegenerative, and osteoarthritis departments to support scientists in designing experiments, creating analysis pipelines, and elucidating biological insight.
  • Statistical modeling and data mining
    • Applying machine learning algorithms suited to low-n, high-p data to find predictive biomarkers. DLDA, Random Forests, VSURF, and SVM algorithms implemented in R.
    • Performing survival analysis using Cox regression models with publicly available data (TCGA) to analyze genes and pathways associated with poorer survival outcome.
    • Multivariate and univariate regression modeling to identify pharmacodynamic trends within biological systems.
    • Sigmoidal curve fitting for EC50 and AUC metrics. Developed algorithms that generate sorting metrics for identifying lead compounds.
    • Clustering data using LDA, PCA, or hierarchical algorithms for QC and for identifying discriminating variables.
  • Bioinformatics pipeline development
    • Elucidating differentially expressed genes, enriched biological pathways, and alternative splicing events. Tools include DESeq2, GSEA, IRFinder, JUM, rMATS, R, Python3, Linux command line, and MySQL. Datasets include RNASeq, qPCR, protein arrays, and NanoString gene panels.
  • Data Wrangling, ETL and Data visualizations
    • Custom extract, transform and load scripts in Python3 for data wrangling.
    • Developed descriptive statistics plots for visual-minded scientists, including heatmaps, 2-way plots, barplots, box-and-whisker plots. R’s tidyverse, ggplot2, ComplexHeatmap.
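As an illustration of the EC50 and AUC metrics mentioned in the curve-fitting bullet above, here is a minimal sketch in Python. It assumes a four-parameter logistic (4PL) model, a common choice for sigmoidal dose-response curves, and a trapezoid-rule AUC over log10 concentration. The function names, dose series, and parameter values are hypothetical and are not taken from the original pipelines.

```python
import math


def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter logistic response at a concentration (conc > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def trapezoid_auc(xs, ys):
    """Area under the curve by the trapezoid rule; xs must be increasing."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Hypothetical dose series (arbitrary units) and curve parameters.
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [four_pl(c, bottom=0.0, top=100.0, ec50=1.0, hill=1.0) for c in concs]

# AUC is computed on the log10 concentration axis, as is usual for dose series.
log_concs = [math.log10(c) for c in concs]
auc = trapezoid_auc(log_concs, responses)
```

By construction, the response at a concentration equal to EC50 is halfway between the bottom and top plateaus. In practice the four parameters would be estimated from measured data with a nonlinear least-squares fit.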
LIMS Manager, Database Engineer & Administrator
  • Head manager and site reliability engineer of company’s R&D LIMS (Dotmatics). Oracle 12g on Windows 2008R2, Tomcat 8.5 on Windows 2012.
  • Support and management role for an R&D team of over 50, helping them use tools and applications effectively to capture, analyze, and report R&D data and to drive decisions.
  • Migration of legacy data, backup and disaster recovery solutions.
  • Identify and implement database-backed solutions to automate scientists’ workflow from chemical synthesis to biological assay metric calculations.

Scientist, Backend Engineer, Graph Consultant

San Diego, CA

i(x) Investments

Mar. 2020 - Sept. 2020

  • Helped an investment company turn data into actionable insights through graphical models.
  • Automated web scraping using Selenium and pyquery for insights in competitive intelligence.
  • Visualizations and graph algorithm implementation (shortest paths, centrality, connectivity) using the Neo4j graph database.
  • Integration of authentication with RESTful service endpoints using Sails.js and Auth0.
  • Authentication security implementation with JSON Web Tokens, with lock-out mechanisms against brute-force attacks.

Founder, Full-Stack Web Developer & Instructor


Mar. 2014 — Present


Sebat Lab

UCSD Dept. of Psychiatry and Cellular & Molecular Medicine

May 2014 — Aug. 2014

  • Analyzed facial features of 16p11.2 CNV duplication and deletion cohorts using 3dMD software tools.
  • Performed pairwise comparison statistical analysis correcting for false discovery rate in R.
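The false discovery rate correction mentioned above can be sketched as follows. The bullet does not say which procedure was used, so this sketch assumes the Benjamini-Hochberg method, and it is written in Python rather than R. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the raw p-value scaled by the number of tests over its rank, then capped so that a smaller raw p-value never receives a larger adjusted value than a bigger one.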

May 2012 — June 2013

  • Used the randomForest R package and the "leave-one-out" method to identify 16p11.2 copy number variant genotypes in autistic adolescents from clinical variables.
  • Analyzed locomotor behavior, pre-pulse inhibition, and fear conditioning data using linear regression models on transgenic mice with varying copy numbers of the VIPR2 gene.
  • Developed scripts in R to select fixed nucleotide differences between human and chimp genomes while masking SNPs, repeats, and DGVs for primer design.
  • Formulated algorithms to calculate silhouette scores for clustering quality control.
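For reference, the silhouette score mentioned in the last bullet can be sketched as below. This is a minimal Python illustration that assumes Euclidean distance; the original work was done in R and its details are not described here. The function name and the toy data are hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette values using Euclidean distance."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(math.dist(p, q))
        same = by_cluster.pop(own, [])
        if not same or not by_cluster:
            # Convention: score 0 for singleton clusters or a single cluster.
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical toy data: two well-separated clusters of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
```

Values near 1 indicate points that sit well inside their own cluster, values near 0 indicate points on a boundary, and negative values suggest a likely misassignment, which is what makes the mean score useful as a clustering QC metric.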

Java Application Developer

Bodmer Lab

Sanford-Burnham Medical Research Institute

Aug. 2013 — Aug. 2014

  • Integrated microscope, camera, and a stage controller to develop a Micro-Manager front-end plugin.
  • End product enabled users to easily detect and record 30-second output videos of the Drosophila melanogaster heart under hypoxia.
  • Used quantitative image analysis algorithm to elucidate cardiac parameters from output video.

Lab Research Assistant

Creel Lab

UCSD Dept. of Cognitive Science

Oct. 2011 — May 2012

  • Assisted in research on the processing of complex acoustic information, especially in speech and language.
  • Developed stimuli for experiments involving language and speech recognition.
  • Administered psychological tests to evaluate auditory perception capabilities.

Skillset Overview

  • Machine Learning: Python: scikit-learn, pandas. R: randomForest, VSURF, dlda.
  • Databases: MySQL, Oracle, Neo4j, MongoDB.
  • Languages: Python3, R, Bash (Awk/Sed), Java, C/C++.
  • Web Frameworks: JavaScript (jQuery, Node.js, AngularJS), HTML5, CSS3/SASS (Twitter Bootstrap, Bourbon), PHP (Laravel MVC).
  • OS Platforms: Linux (Ubuntu Desktop & Server), OS X, Windows.
  • Software Tools: Git, Eclipse, Sublime, Vim, MySQL Workbench, Adobe Photoshop, InDesign & Illustrator.
  • Bioinformatics Tools: Galaxy + Linux command line, SAMtools, BEDtools, Tuxedo Suite, IGV, UCSC Genome Browser. Biological databases: TCGA, dbSNP, NCBI, Ensembl, OMIM.

Honors, Awards & Certs

  • American Association for Cancer Research (AACR)
  • Institute of Electrical and Electronics Engineers (IEEE)
  • Oracle Database 12c: SQL Certificate
  • Linux+ Certificate
  • Coursera Genome Data Science Specialization
  • Ranked in the top 100 worldwide (99.7%) on Rosalind, a site that offers challenging bioinformatics problems, with solutions written in Python 3.4.3. Placed second in Illumina's 2013 Bioinformatics Code Challenge.
  • Received the Amgen 2012 Scholar's award, a competitive and fully-funded program for undergraduates to conduct a summer's worth of research with a UCSD professor.
  • Nominated for MMW15's Writing Showcase Award in Spring of 2014. "The Human Genome Project - a Work of Collaboration or Competition?"


Cho S, Choudhury A, Fleener C, Do L, Bossard C, Chung C, Phalen T, Cha S. (2020). "Transcriptome analysis of TCGA prostate cancer samples identifies an association of poorer survival and aggressive disease biology with CDC-like kinase (CLK) expression and spliceosome regulation." AACR Annual Meeting II. 2020 Jun 22.


In prostate cancer, alternative splicing of mRNA and spliceosome activity are implicated in several areas of disease pathogenesis. This is exemplified by the strong association of androgen receptor splice variants with treatment resistance and poor clinical outcome in castration-resistant disease. Therefore, pharmacologic targeting of spliceosome-regulating proteins such as CLKs and serine/arginine-rich splicing factors (SRSFs) represents a novel treatment approach for prostate cancer. To evaluate the therapeutic potential of inhibiting CLK activity in prostate cancer, the association between splicing-related gene expression and survival was investigated in The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection (N=495).

Survival analysis of RNA-seq data assessed 17,879 genes to measure their association with progression-free interval (PFI). Using transcript per million as the metric for normalized gene expression, age-adjusted Cox proportional hazards regression models were performed for each gene (R v3.6.0, coxph v2.43-3). A total of 3,145 genes significantly correlated with worse prognosis (P-adj<0.10, Cox coefficient >0). CLK1 (P-adj=0.0218, HR=1.5939), CLK2 (P-adj=0.001298, HR=2.1393), and SRSF2 (P-adj=0.00167, HR=3.2917) were found to be positively associated with poorer PFI, ranking 1202, 400, and 437, respectively. Reactome pathway analysis of the significant gene set showed that mRNA splicing and processing accounted for 5 of the 19 pathways that were strongly associated with poorer PFI.

An additional pathway analysis (GSEA v3.0, MSigDB v6.2) of tumors categorized by PTEN status to assess relationship with disease severity showed that mRNA splicing (P-adj=0.0243, NES=1.7714) was enriched in PTEN-null versus PTEN-wt tumors. Other pathways of interest, including Wnt signaling (P-adj=0.0187, NES=1.846), cell cycle (P-adj=0.0124, NES=1.974), chromatin remodeling (P-adj=0.0135, NES=1.901), DNA damage repair (P-adj=0.013974, NES=1.8934), and PTEN regulation (P-adj=0.0230, NES=1.7861), were also enriched in PTEN-null tumors.

Lastly, a survival analysis within all TCGA-PRAD patients showed that low CLK1 (P=0.03) and CLK2 (P=0.0004) expression were individually associated with better prognosis versus their high-expressing counterparts. Analysis of CLK3 and CLK4 expression did not reach statistical significance.

Collectively, these findings revealed an association of spliceosome activity and CLK1/2 expression with aggressive disease biology in prostate cancer. A Phase 1 study of SM08502, a novel, small-molecule pan-CLK inhibitor, in subjects with advanced solid tumors is ongoing (NCT03355066). This analysis nominates prostate cancer as a tumor type worth further exploring for the clinical activity of SM08502.

Tam B, Chiu K, Chung H, Bossard C, Nguyen J.D., Creger E, Eastman B.W., Mak C, Ibanez M, Ghias A, Cahiwat J, Do L, Cho S, Nguyen J, Deshmukh V, Stewart J, Chen C, Barroga C, Dellamary L, KC S, Phalen TJ, Cha S, Yazici Y. "The CLK inhibitor SM08502 induces anti-tumor activity and reduces Wnt pathway gene expression in gastrointestinal cancer models." Cancer Letters. 2019 Sep 9. doi: 10.1016/j.canlet.2019.09.009


The Wnt/β-catenin signaling pathway is aberrantly activated in colorectal (CRC) and many other cancers, and novel strategies for effectively targeting it may be needed due to its complexity. In this report, SM08502, a novel small molecule in clinical development for the treatment of solid tumors, was shown to reduce Wnt pathway signaling and gene expression through potent inhibition of CDC-like kinase (CLK) activity. SM08502 inhibited serine and arginine rich splicing factor (SRSF) phosphorylation and disrupted spliceosome activity, which was associated with inhibition of Wnt pathway-related gene and protein expression. Additionally, SM08502 induced the generation of splicing variants of Wnt pathway genes, suggesting that its mechanism for inhibition of gene expression includes effects on alternative splicing. Orally administered SM08502 significantly inhibited growth of gastrointestinal tumors and decreased SRSF phosphorylation and Wnt pathway gene expression in xenograft mouse models. These data implicate CLKs in the regulation of Wnt signaling and represent a novel strategy for inhibiting Wnt pathway gene expression in cancers. SM08502 is a first-in-class CLK inhibitor being investigated in a Phase 1 clinical trial for subjects with advanced solid tumors (NCT03355066).

Deshmukh V, Hu H, Barroga C, Bossard C, KC S, Dellamary L, Stewart J, Chiu K, Ibanez M, Pedraza M, Seo T, Do L, Cho S, Cahiwat J, Tam B, Tambiah JRS, Hood J, Lane NE, Yazici Y. "Modulation of the Wnt pathway through inhibition of CLK2 and DYRK1A by lorecivivint as a novel, potentially disease-modifying approach for knee osteoarthritis treatment." Osteoarthritis and Cartilage. 2019 May 24. doi: 10.1016/j.joca.2019.05.006


Objectives: Wnt pathway upregulation contributes to knee osteoarthritis (OA) through osteoblast differentiation, increased catabolic enzymes, and inflammation. The small-molecule Wnt pathway inhibitor, lorecivivint (SM04690), which previously demonstrated chondrogenesis and cartilage protection in an animal OA model, was evaluated to elucidate its mechanism of action.

Design: Biochemical assays measured kinase activity. Western blots measured protein phosphorylation in human mesenchymal stem cells (hMSCs), chondrocytes, and synovial fibroblasts. siRNA knockdown effects in hMSCs and BEAS-2B cells on Wnt pathway, chondrogenic genes, and LPS-induced inflammatory cytokines were measured by qPCR. In vivo anti-inflammation, pain, and function were evaluated following single intra-articular (IA) lorecivivint or vehicle injection in the monosodium iodoacetate (MIA)-induced rat OA model.

Results: Lorecivivint inhibited intranuclear kinases CDC-like kinase 2 (CLK2) and dual-specificity tyrosine phosphorylation-regulated kinase 1A (DYRK1A). Lorecivivint inhibited CLK2-mediated phosphorylation of serine/arginine-rich (SR) splicing factors and DYRK1A-mediated phosphorylation of SIRT1 and FOXO1. siRNA knockdowns identified a role for CLK2 and DYRK1A in Wnt pathway modulation without affecting β-catenin with CLK2 inhibition inducing early chondrogenesis and DYRK1A inhibition enhancing mature chondrocyte function. NF-κB and STAT3 inhibition by lorecivivint reduced inflammation. DYRK1A knockdown was sufficient for anti-inflammatory effects, while combined DYRK1A/CLK2 knockdown enhanced this effect. In the MIA model, lorecivivint inhibited production of inflammatory cytokines and cartilage degradative enzymes, resulting in increased joint cartilage, decreased pain, and improved weight-bearing function.

Conclusions: Lorecivivint inhibition of CLK2 and DYRK1A suggested a novel mechanism for Wnt pathway inhibition, enhancing chondrogenesis, chondrocyte function, and anti-inflammation. Lorecivivint shows potential to modify structure and improve symptoms of knee OA.

Deshmukh V, O'Green AL, Bossard C, Seo T, Lamangan L, Ibanez M, Ghias A, Lai C, Do L, Cho S, Cahiwat L, Chiu K, Pedraza M, Anderson S, Harris R, Dellamary L, KC S, Barroga C, Melchior B, Tam B, Kennedy S, Tambiah J, Hood J, Yazici Y. "A small-molecule inhibitor of the Wnt pathway (SM04690) as a potential disease modifying agent for the treatment of osteoarthritis of the knee." Osteoarthritis and Cartilage. 2017 Sep 15. doi: 10.1016/j.joca.2017.08.015


Objectives: Osteoarthritis (OA) is a degenerative disease characterized by loss of cartilage and increased subchondral bone within synovial joints. Wnt signaling affects the pathogenesis of OA as this pathway modulates both the differentiation of osteoblasts and chondrocytes, and production of catabolic proteases. A novel small-molecule Wnt pathway inhibitor, SM04690, was evaluated in a series of in vitro and in vivo animal studies to determine its effects on chondrogenesis, cartilage protection and synovial-lined joint pathology.

Design: A high-throughput screen was performed using a cell-based reporter assay for Wnt pathway activity to develop a small molecule designated SM04690. Its properties were evaluated in bone-marrow-derived human mesenchymal stem cells (hMSCs) to assess chondrocyte differentiation and effects on cartilage catabolism by immunocytochemistry and gene expression, and glycosaminoglycan breakdown. In vivo effects of SM04690 on Wnt signaling, cartilage regeneration and protection were measured using biochemical and histopathological techniques in a rodent anterior cruciate ligament tear and partial medial meniscectomy (ACLT + pMMx) OA model.

Results: SM04690 induced hMSC differentiation into mature, functional chondrocytes and decreased cartilage catabolic marker levels compared to vehicle. A single SM04690 intra-articular (IA) injection was efficacious in a rodent OA model, with increased cartilage thickness, evidence for cartilage regeneration, and protection from cartilage catabolism observed, resulting in significantly improved Osteoarthritis Research Society International (OARSI) histology scores and biomarkers, compared to vehicle.

Conclusions: SM04690 induced chondrogenesis and appeared to inhibit joint destruction in a rat OA model, and is a candidate for a potential disease modifying therapy for OA.

Qiu Y, Arbogast T, Lorenzo SM, Li H, Tang SC, Richardson E, Hong O, Cho S, Shanta O, Pang T, Corsello C, Deutsch CK, Chevalier C, Davis EE, Iakoucheva LM, Herault Y, Katsanis N, Messer K, Sebat J. "Oligogenic Effects of 16p11.2 Copy-Number Variation on Craniofacial Development." Cell Rep. 2019 Sep 24; 28(13):3320-3328.e4. PMID: 31553903.


A copy-number variant (CNV) of 16p11.2 encompassing 30 genes is associated with developmental and psychiatric disorders, head size, and body mass. The genetic mechanisms that underlie these associations are not understood. To determine the influence of 16p11.2 genes on development, we investigated the effects of CNV on craniofacial structure in humans and model organisms. We show that deletion and duplication of 16p11.2 have “mirror” effects on specific craniofacial features that are conserved between human and rodent models of the CNV. By testing dosage effects of individual genes on the shape of the mandible in zebrafish, we identify seven genes with significant effects individually and find evidence for others when genes were tested in combination. The craniofacial phenotypes of 16p11.2 CNVs represent a model for studying the effects of genes on development, and our results suggest that the associated facial gestalts are attributable to the combined effects of multiple genes.

Kusenda M, Vacic V, Malhotra D, Rodgers L, Pavon K, Meth J, Kumar RA, Christian SL, Peeters H, Cho S, Addington A, Rapoport JL, Sebat J. "The Influence of Microdeletions and Microduplications of 16p11.2 on Global Transcription Profiles." Journal of Child Neurology. 2015; 30(14):1947-1953.


Copy number variants (CNVs) of a 600 kb region on 16p11.2 are associated with neurodevelopmental disorders and changes in brain volume. The authors hypothesize that abnormal brain development associated with this CNV can be attributed to changes in transcriptional regulation. The authors determined the effects of 16p11.2 dosage on gene expression by transcription profiling of lymphoblast cell lines derived from 6 microdeletion carriers, 15 microduplication carriers and 15 controls. Gene dosage had a significant influence on the transcript abundance of a majority (20/34) of genes within the CNV region. In addition, a limited number of genes were dysregulated in trans. Genes most strongly correlated with patient head circumference included SULT1A, KCTD13, and TMEM242. Given the modest effect of 16p11.2 copy number on global transcriptional regulation in lymphocytes, larger studies utilizing neuronal cell types may be needed in order to elucidate the signaling pathways that influence brain development in this genetic disorder.


Relating Behavioral Clinical Phenotypes to Genotype in Autism Spectrum Disorder

Nominated to present at the 2013 UCSD Undergraduate Research Conference.

Apr. 2013


As geneticists, we are interested in understanding how genes influence complex traits. We investigated the relationship of genes to neurodevelopmental and behavioral phenotypes, using machine learning algorithms (MLAs) to predict reciprocal genotypes. This study focuses on the 16p11.2 copy number variant (CNV) of the human genome, which is known to contain the gene potassium channel tetramerization domain 13 (KCTD13), a gene that confers risk for several distinguishing features of brain development and psychiatric features. Deletion is associated with larger head size and BMI and with an increased risk of intellectual disability and autism, while duplication is associated with smaller head size and BMI and with a range of adult psychiatric disorders. We hypothesized that it would be possible to differentiate deletions from duplications at an 80% success rate by applying the randomForest MLA to phenotypic data.

The Mouse and the Viper: Implications of the VIPR2 gene on a mouse's phenotype.

Aug. 2013

Nominated to present at the 2012 UCSD Summer Undergraduate Research Conference.


Schizophrenia is a highly heritable neuropsychiatric disorder characterized by social withdrawal, delusions, and hallucinations. Our group has identified a genomic duplication of the neuropeptide receptor gene VIPR2 on chromosome 7 that confers significant risk for schizophrenia. As yet, little is known about the effect of this duplication on brain development and behavior. Here, we investigate this using an animal model. We analyzed the behavior, and the response to various tasks, of mice carrying zero to four copies of the VIPR2 gene. Mice were observed for locomotor and stereotyped behaviors as well as pre-pulse inhibition (PPI), all of which are known to be altered when people start developing schizophrenia. Using this model, we will test the hypothesis that high copy numbers of the VIPR2 gene result in locomotor hyperactivity and a deficit in PPI, behaviors comparable to those found in people with schizophrenia.

RNASeq Analysis of a Yeast Knockout Strain

Performed an RNASeq experiment end to end, from RNA extraction and cDNA library construction to analysis with bioinformatics tools.

June 2013


The purpose of this lab was to use RNASeq and bioinformatics tools to find which gene was knocked out of a Saccharomyces cerevisiae yeast strain. Gene expression was quantified and compared to the corresponding data from two wild-type strains. After thorough pathway analysis using DAVID, the gene knocked out in Knockout strain #2 was found to be YHL009C, as a number of pathways were down-regulated, including oxidative phosphorylation (Benjamini = 3.1E-07), intron homing (7.29E-10), and RNA processing (2.11E-02). No pathways were significantly up-regulated.