File Uploads - VarSome Clinical

Written by Jason Armstrong | Sep 22, 2025 11:48:23 AM

VarSome Clinical is designed to process NGS data, from gene panels to whole genomes, and supports FASTQ or VCF files. To effectively use the platform, it's important to understand the data upload methods and the specific requirements for different file types. Here's what you need to know about accepted files.

Getting Your Data into VarSome Clinical

Direct File Upload:

To begin, navigate to "Launch" and then select "1. Upload / View files".
Click "Select File(s)" to choose your files. These do not need to be from the same sample or in the same format.
Once selected, click "Start Upload".

Do not navigate away from the upload page until the upload has finished and VarSome Clinical has started validating the files. You can safely open the "Analyses" table in a new tab to view it while files are uploading.

After upload, each file is analyzed, and details such as the number of reads and bases are calculated and displayed. Files with a dark green status are ready for subsequent analysis.

Files that are uploaded free of charge and remain unused for more than 30 days will be automatically deleted from VarSome Clinical.

Illumina BaseSpace Integration:

This feature allows for direct transfer of FASTQ files from BaseSpace to VarSome Clinical.

Access it by hovering over your user name and selecting the "Illumina BaseSpace" option.
You will be redirected to Illumina’s site to log in, choosing either the EU or US BaseSpace Sequence Hub.
You must grant the requested permissions so that your projects can synchronize with VarSome Clinical and files can be transferred.
Once synchronized, click the "Download" button to start transferring FASTQ files.

Once the download is complete, the file status changes from "Not available" to "Available". Files previously imported from BaseSpace also show as "Available". If a file was already used in an analysis, you can re-use it by finding the sample and selecting "Re-use sample files". You can disconnect your Illumina BaseSpace account from VarSome Clinical at any time. For more information on Illumina BaseSpace integration, click here.

Organizing Your Samples

You can organize your samples using Sample tags. New tags can be created and existing tags edited or deleted via the "Tags" option next to your user name.

Tags can be added to a sample by clicking the tag icon in the Sample Table. Samples can be filtered by selecting specific tags from the "Tags" box at the top of the page.

Accepted File Types and Requirements

VarSome Clinical has specific requirements for input files to ensure accurate analysis:

Accepted Input Files:

FASTQ files: Must originate exclusively from Illumina or MGI sequencers.
VCF files: Must conform to the VCF standard, regardless of the sequencing platform. VCFs intended for SV analysis can contain only CNVs or a mix of CNVs and other structural variants (SVs), including deletions, duplications, insertions, inversions, breakends, and repeat expansions.
Alignment BAM file (Optional): Can be uploaded alongside a VCF sample to visualize variant coverage.

Specific Requirements for FASTQ Files:

VarSome Clinical expects FASTQ files to conform to Illumina's or MGI's naming conventions.
For paired-end FASTQ files, reads must be properly coordinated between them. Paired-end reads provided in a single FASTQ file are not accepted.
Illumina paired-end files are recognized by having the same name except for the read number (e.g., SampleName_S1_L001_R1_001.fastq.gz and SampleName_S1_L001_R2_001.fastq.gz). Read numbers can also be specified as _1.fastq.gz or _R1.fastq.gz.
MGI paired-end files are parsed as [flow cell ID]_[lane ID]_[barcode ID]_(optional_id)_[read 1/2].fastq.gz, with the read number alone or preceded by "R".
For samples with more than two paired-end files (e.g., across multiple lanes), a specific naming structure is required (e.g., E12345_34_4321_L001_R1_001.fastq.gz, E12345_34_4321_L002_R1_001.fastq.gz, etc.).
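
The naming rules above can be checked locally before uploading. The following is a minimal sketch, not VarSome Clinical's own parser: the regular expression and the function name pair_fastq_files are illustrative assumptions that cover only the read-number suffixes mentioned in this section (_R1_001.fastq.gz, _R1.fastq.gz and _1.fastq.gz).

```python
import re
from collections import defaultdict

# Hypothetical pattern, written for this example only. It splits a name into
# the part before the read number, the read number (1 or 2), and an optional
# three-digit tail such as "_001".
READ_SUFFIX = re.compile(
    r"^(?P<stem>.+)_R?(?P<read>[12])(?P<tail>_\d{3})?\.fastq\.gz$"
)


def pair_fastq_files(filenames):
    """Group FASTQ names that differ only in the read number.

    Returns (pairs, unpaired). pairs maps a key (the name without the read
    number) to a tuple (read 1 file, read 2 file). unpaired lists names that
    have no mate or do not match the pattern at all.
    """
    groups = defaultdict(dict)
    unpaired = []
    for name in filenames:
        match = READ_SUFFIX.match(name)
        if match is None:
            unpaired.append(name)
            continue
        key = match["stem"] + (match["tail"] or "")
        groups[key][match["read"]] = name

    pairs = {}
    for key, reads in groups.items():
        if "1" in reads and "2" in reads:
            pairs[key] = (reads["1"], reads["2"])
        else:
            unpaired.extend(reads.values())
    return pairs, unpaired


if __name__ == "__main__":
    pairs, unpaired = pair_fastq_files([
        "E12345_34_4321_L001_R1_001.fastq.gz",
        "E12345_34_4321_L001_R2_001.fastq.gz",
        "E12345_34_4321_L002_R1_001.fastq.gz",
    ])
    print("pairs:", pairs)
    print("unpaired:", unpaired)
```

Because the lane identifier stays in the key, files from different lanes of the same sample are paired lane by lane, and a lane with only one read file is reported as unpaired.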

Specific Requirements for VCF Files:

All VCFs must be compliant with the VCF standard.

For SNP/INDEL annotation (Germline/Somatic analysis from VCF):

Must include only specific SNVs and INDELs; "NON_REF" variants or variants with "N" in the ALT field are not accepted.
Must include a valid genotype (GT) field for each variant entry.
Expected to contain variants from a real human sample, typically with a maximum of around 4 or 5 million variants.
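
As a rough pre-upload check of the first two requirements above, the sketch below scans the records of an uncompressed, single-sample VCF. It is an illustration under stated assumptions, not the validation VarSome Clinical performs: the function name find_small_variant_problems, the messages, and the genotype pattern (a purely syntactic test that also lets missing calls such as ./. through) are all hypothetical.

```python
import re

# Syntactic genotype check only: allele indexes or "." separated by "/" or "|".
GT_PATTERN = re.compile(r"^(\.|\d+)([/|](\.|\d+))*$")


def find_small_variant_problems(lines):
    """Return a list of (line_number, message) for records that break the rules.

    lines is any iterable of VCF text lines, for example an open file.
    Only the first sample column is inspected.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header
        fields = line.split("\t")
        if len(fields) < 10:
            problems.append((number, "missing FORMAT or sample column"))
            continue

        # ALT is column 5; multiple alleles are comma-separated.
        for alt in fields[4].split(","):
            if alt == "<NON_REF>":
                problems.append((number, "NON_REF allele in ALT"))
            elif not alt.startswith("<") and "N" in alt.upper():
                problems.append((number, "N base in ALT"))

        # FORMAT is column 9, the first sample is column 10.
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" not in keys:
            problems.append((number, "no GT key in FORMAT"))
        else:
            index = keys.index("GT")
            if index >= len(values) or not GT_PATTERN.match(values[index]):
                problems.append((number, "malformed GT value"))
    return problems
```

A typical use would be to open the file and print whatever the function returns, for example `with open("file.vcf") as handle: print(find_small_variant_problems(handle))`. An empty list means none of these particular checks failed; it does not guarantee that the file will be accepted.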

For SV annotation (SV sub-analysis from VCF):

Must include duplications and/or deletions where the type is shown in the ALT field (e.g., <DUP>, <DEL>), or other SVs like insertions (<INS>), inversions (<INV>), and breakends (<BND>).
The general <CNV> category is not accepted if a more specific category like <DEL> or <DUP> can be applied.
Must include a valid genotype (GT) field for each variant entry.
Do not include other types of SVs like large chromosomal rearrangements or gene fusions, as these are not currently supported.
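
The ALT rules above can be expressed as a small classifier. This is a sketch under assumptions rather than the platform's actual logic: the function name classify_sv_alt and its return labels are invented for illustration, only symbolic alleles in angle brackets are examined, and treating subtypes such as <DUP:TANDEM> like their base type is an assumption that the text does not confirm.

```python
# SV types named in this section as accepted.
ACCEPTED_SV_TYPES = {"DEL", "DUP", "INS", "INV", "BND"}


def classify_sv_alt(alt):
    """Classify a single ALT value from an SV VCF record.

    Returns one of:
      "accepted"     - a type listed in ACCEPTED_SV_TYPES
      "too_general"  - the generic <CNV> category
      "unrecognized" - any other symbolic allele
      "not_symbolic" - a value not written in angle brackets
    """
    if not (alt.startswith("<") and alt.endswith(">")):
        return "not_symbolic"
    # Keep only the top-level type, e.g. "DUP" from "<DUP:TANDEM>".
    base_type = alt[1:-1].split(":")[0].upper()
    if base_type in ACCEPTED_SV_TYPES:
        return "accepted"
    if base_type == "CNV":
        return "too_general"
    return "unrecognized"


if __name__ == "__main__":
    for example in ["<DEL>", "<DUP:TANDEM>", "<CNV>", "<XYZ>", "ACGT"]:
        print(example, "->", classify_sv_alt(example))
```

Records flagged as "too_general" would need their ALT rewritten to <DEL> or <DUP> where the copy-number change is known; how to derive that is specific to the caller that produced the VCF and is not covered here.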

Tips for VCF Validation:

Before uploading, we recommend checking the format of your VCF file.

One way to do this is to run a bcftools command such as bcftools norm -m -any -NO v file.vcf. bcftools is strict about malformed VCFs: if the file does not conform to the standard, the command fails and prints an error message identifying the incomplete or malformed fields.

Other validation tools like vcf-validator from EBIvariation or VCFtools are also available.
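
For batch use, the bcftools check can be wrapped in a short script. The sketch below assumes bcftools is installed and on the PATH; the function names are hypothetical, and the combined -NO v from the command above is written out as the equivalent separate flags -N -O v.

```python
import shutil
import subprocess


def bcftools_norm_command(vcf_path):
    """Build the argument list for the bcftools check described above."""
    return ["bcftools", "norm", "-m", "-any", "-N", "-O", "v", vcf_path]


def vcf_passes_bcftools(vcf_path):
    """Run bcftools on vcf_path and report whether it exited successfully.

    Returns (ok, stderr_text). The normalized VCF that bcftools writes to
    standard output is discarded, because only the exit status matters here.
    """
    if shutil.which("bcftools") is None:
        raise RuntimeError("bcftools was not found on PATH")
    result = subprocess.run(
        bcftools_norm_command(vcf_path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        ok, message = vcf_passes_bcftools(path)
        print(path, "OK" if ok else "FAILED")
        if message:
            print(message)
```

bcftools can write warnings and summary lines to standard error even when it succeeds, so the exit status, not the presence of messages, is what decides the result.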

Automating Your Workflow with API

For laboratories seeking to streamline and automate their data analysis pipeline, VarSome offers a full API that covers the data analysis process, including data upload. Extensive documentation for the VarSome API is available.

If you are interested in our solutions or want to chat with one of our team members, please get in touch!
