Results Plot

Results Table

Download Table as Tab-Delimited File

Sample Rectum Adenocarcinoma (READ) Dataset (.zip)

Results Plot

Results Table

Download Table as Tab-Delimited File

Tumors: Small blue points; Cell Lines: Labeled points; See documentation for more details

Table of Contents

About TumorComparer

Built by Rileen Sinha, Augustin Luna, Nikolaus Schultz, and Chris Sander using R Shiny.

Cell lines derived from human tumors are often used in pre-clinical cancer research, but some cell lines may be too different from tumors to be good models. Genomic and molecular profiles can be used to guide the choice of cell lines suitable for particular investigations, but not all features may be equally relevant. We present TumorComparer, a computational method for comparing cellular profiles with the flexibility to place a higher weight on functional features of interest. In a first pan-cancer application, we compare 600 cell lines and 8323 tumors of 26 cancer types, using weights to emphasize recurrent genomic alterations or expression dysregulation. We characterize the similarity of cell lines and tumors within and across cancers, identifying outlier and mislabelled cell lines as well as good matches, and identify cancers with an unusually high number of good or poor representative cell lines. The weighted similarity method in the future may be useful to assess genomic-molecular patient profiles for stratification in clinical trials and personalized choice of therapy.


The “Pre-Computed” tab allows users to explore the results of our systematic analysis available in the publication, while the “User Analysis” section provides users the option of running TumorComparer with their own data.

Pre-Computed Analysis

All cancer types from the publication are available. Balloon plots show the most relevant cell lines and a corresponding searchable table.

User Data Analysis

This tab allows users to run TumorComparer on their own data. Users will upload a zip file containing the input files with a pre-specified naming convention.

NOTE: For large computational pipelines or large amounts of data, it is suggested that users run TumorComparer locally.


  • Default (Background) Weight: 0.01 (DEFAULT); all other genes

Users are should review the publication for more information about these parameters. The default weights were used as part of the systematic analysis.

Users should rely on their understanding of the problem they are trying to address to select weights. Users can use 1 and 0 as a starting point for known cancer gene and default weights if the relative difference in the magnitude in importance between the sets of genes is unknown.

Data Formats

Input files

NOTE: You do NOT have to provide input files for all data types (expression: exp, mutation: mut, copy number: cna). Make sure either ALL files for a data type are included OR NONE of them are included, based on your available data.

  • tumor_mut.txt: a file with binary mutation data for tumors
  • tumor_cna.txt: a file with GISTIC data for tumors; this can be 5-values (-2, -1, 0, 1, 2) or continuous
  • tumor_exp.txt: a file with gene expression data for tumors
  • cell_line_mut.txt: See corresponding tumor file
  • cell_line_exp.txt: See corresponding tumor file
  • cell_line_cna.txt: See corresponding tumor file
  • genes_and_weights_mut.txt: a file with weights for cancer-specific set of recurrently mutated genes. A tab-delimited file - the first column has the gene names, and the second column specifies the weights.
  • genes_and_weights_cna.txt: See genes_and_weights_mut.txt
  • genes_and_weights_exp.txt: See genes_and_weights_mut.txt
  • default_weights_for_known_cancer_genes_mut.txt: a file with weights for genes known to be recurrently altered/mutated in cancer (e.g. recurrently mutated genes in TCGA pan-cancer analyses). A two-column tab-delimited file - the first column has the gene names and the second column specifies the weights.
  • default_weights_for_known_cancer_genes_exp.txt: See default_weights_for_known_cancer_genes_mut.txt
  • default_weights_for_known_cancer_genes_cna.txt: See default_weights_for_known_cancer_genes_mut.txt

Example Data

Below is example data for the various supported data types.

  • Sample names should match within the tumor data and separately, within the cell line data
  • Analysis is platform agnostic (see publication; i.e., microarray or RNA-seq can be used)

Expression Data

RCM.1 SW837 SW1463 SNU.61 CaR.1
AREG 11.837 9.195 9.018 9.939 10.44
MLXIPL 9.513 8.823 11.749 9.008 8.029
MUC16 3.062 7.281 6.235 2.774 9.931
ADAMTS15 6.177 5.697 5.272 6.198 6.004
HBB 7.088 2.609 1.093 1.379 6.138
COL1A1 6.967 6.647 5.955 5.993 6.891

Mutation Data

RCM.1 SW837 SW1463 SNU.61 CaR.1
ANKRD30A 0 0 0 0 0
APC 1 1 1 1 0
APOB 0 0 0 0 0
ARHGAP35 0 0 0 0 0
ARID1A 0 0 0 0 0
ARID1B 0 0 0 0 0

Copy Number Data

Below is an example of GISTIC discretized

RCM.1 SW837 SW1463 SNU.61 CaR.1
PFN1P2 0 0 0 0 0.667
PDE4DIP 0 0 0 0 0.667
SEC22B 0 0 0 0 0.667
NOTCH2NL 0 0 0 0 0.667
NBPF10 0 0 0 0 0.667
HFE2 0 0 0 0 0.667

Example Datasets

  • Rectal Adenocarcinoma (READ) TCGA/CCLP: Small dataset, Link

Additional Features

The TumorComparer software package contains a number of additional parameters that might be of interest to users.


We appreciate any feedback/suggestions you may have; please send feedback to publication corresponding authors.


TumorComparer: 0.99.14