Extracting a Biologically Relevant Latent Space from Cancer Transcriptomes with Variational Autoencoders | bioRxiv
New Results
doi: https://doi.org/10.1101/174474

Abstract
The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 cancer types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might use such a model to predict a tumor’s response to specific therapies or to characterize complex gene expression activations that exist in different proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether such a VAE would capture biologically relevant features. In this report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method “Tybalt” after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare’s “Romeo and Juliet”. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
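The abstract describes VAEs only at a high level. As a rough illustration of what training such a model optimizes, the sketch below computes the standard per-sample VAE objective: a reconstruction term plus a KL divergence term that pulls the encoded distribution toward a standard normal. It is not taken from the preprint. The function names, the binary cross-entropy reconstruction term, and the assumption that expression values are scaled to [0, 1] are all hypothetical choices made for the example.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise.

    mu and log_var stand in for the encoder outputs for one sample.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Reconstruction error, assuming inputs are scaled to [0, 1]."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def vae_loss(x, x_hat, mu, log_var):
    """Negative evidence lower bound for a single sample."""
    return binary_cross_entropy(x, x_hat) + kl_to_standard_normal(mu, log_var)


# Hypothetical toy values: 4 "genes" encoded into 2 latent features.
rng = random.Random(0)
x = [0.1, 0.8, 0.5, 0.0]          # scaled expression values (assumed)
mu, log_var = [0.3, -0.2], [-1.0, -0.5]
z = reparameterize(mu, log_var, rng)
x_hat = [0.2, 0.7, 0.5, 0.1]      # stand-in for a decoder's output given z
loss = vae_loss(x, x_hat, mu, log_var)
```

In a real model, `mu`, `log_var`, and `x_hat` would come from trained encoder and decoder networks, and the loss would be averaged over batches of tumor profiles rather than computed for one hand-written sample.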
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
It is made available under a CC-BY 4.0 International license.