How easy was it to learn to perform RNA Gene expression data analysis?

Mohit Mazumder, Ph.D.
4 min read · May 6, 2021
Data analysis for a deeper understanding of biology.

My first experience with NGS data analysis: from starting with the unknowns to adding new dimensions that complete the biological data analysis.

RNA-Seq is a technique that profiles the transcriptome using next-generation sequencing; comparable expression data can also be generated by microarrays. Success in transcriptome analysis depends largely on the bioinformatics tools developed to support each step of the process.

I learned the Omics “logic” with the help of Omicslogic and the T-Bioinfo Platform. With the ever-increasing complexity of heterogeneous datasets, a one-stop-shop solution for processing and integrating multiple data types is needed. The platform combines the power of High-Performance Computing for processing large-scale datasets, an intuitive interface that eliminates the need for coding, and advanced machine learning algorithms for data integration and mining.

Comprehensive Bioinformatics: Platform for Multi-Omics Integration

The RNA-seq section of T-BioInfo provides a flexible approach to the analysis of transcriptome data with several known and new algorithms (“modules”) included and specially designed analysis features.

The analysis pipelines run across twelve functional sections (analysis stages) shown on the interactive graph. Together they process your data from start to finish using section-specific algorithms (modules).

1. Data Pre-Processing: removal of primers from raw reads and format conversion; Result: cleaned NGS data or array data represented as NGS pseudo-reads.

2. Data Simulation: simulation of the expression of gene isoforms; Result: artificial NGS data, with introduced errors, representing the expression of pre-defined splice variants.

3. Error Correction: correction of sequencing errors; Result: about 75% of the sequencing errors will be corrected.

4. Mapping on Genome or Genes: alignment of reads against reference genome or mRNAs; Result: alignments of reads against references

5. Exon Detection: detection of expected exons in the reference genome; Result: GTF file that annotates predicted exons in the genome

6. Mapping on exon junctions: how exons are linked in isoforms according to NGS data; Result: alignments of reads against exon junctions.

7. Isoform Construction: splice variants are generated based on found exon junctions; Result: GTF file that annotates the predicted splice variants

8. GTF file processing: merging different annotations of the genome; Result: balanced annotation of the genome based on several NGS data sets.

9. Mapping Statistics: selection of the correct mapping for a read; Result: posterior probability for a read to be generated by a specific genome site

10. Expression Table: generation of expression values for genes and isoforms; Result: table of expression values across genes and isoforms.

11. Differential Expression: differential expression according to predefined contrasts between biological conditions; Result: up- and downregulated genes.

12. Mining analysis results: machine learning methods and integration of results for several parallel analysis pipelines; Result: compression of results and comparison of parallel analyses.
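To make stages 10 and 11 more concrete, here is a minimal Python sketch of what an expression table and a differential expression contrast might look like. It is not the platform's own algorithm: the gene names, sample names, read counts, the counts-per-million normalization, and the fold-change threshold are all assumptions chosen for illustration, and a real analysis would add a statistical test across replicates.

```python
import math

# Hypothetical raw read counts: gene -> sample -> number of mapped reads.
counts = {
    "geneA": {"ctrl_1": 100, "ctrl_2": 120, "treat_1": 400, "treat_2": 380},
    "geneB": {"ctrl_1": 600, "ctrl_2": 580, "treat_1": 250, "treat_2": 270},
    "geneC": {"ctrl_1": 300, "ctrl_2": 300, "treat_1": 350, "treat_2": 350},
}

# Hypothetical contrast: which samples belong to which biological condition.
groups = {"control": ["ctrl_1", "ctrl_2"], "treated": ["treat_1", "treat_2"]}


def counts_per_million(raw):
    """Build an expression table by scaling each sample to one million reads."""
    samples = list(next(iter(raw.values())))
    library_size = {s: sum(row[s] for row in raw.values()) for s in samples}
    return {
        gene: {s: 1e6 * n / library_size[s] for s, n in row.items()}
        for gene, row in raw.items()
    }


def log2_fold_change(table, conditions, numerator, denominator, pseudocount=1.0):
    """Compare mean expression between two conditions on a log2 scale."""
    result = {}
    for gene, row in table.items():
        mean_num = sum(row[s] for s in conditions[numerator]) / len(conditions[numerator])
        mean_den = sum(row[s] for s in conditions[denominator]) / len(conditions[denominator])
        # The pseudocount avoids division by zero for unexpressed genes.
        result[gene] = math.log2((mean_num + pseudocount) / (mean_den + pseudocount))
    return result


def call_direction(lfc, threshold=1.0):
    """Label a gene by fold change alone (assumed threshold: 2-fold)."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


cpm = counts_per_million(counts)
lfc = log2_fold_change(cpm, groups, numerator="treated", denominator="control")
calls = {gene: call_direction(value) for gene, value in lfc.items()}

for gene in counts:
    print(f"{gene}\tlog2FC={lfc[gene]:+.2f}\t{calls[gene]}")
```

With these toy counts, geneA is labeled as upregulated, geneB as downregulated, and geneC as unchanged. Fold change on its own says nothing about significance, which is why production pipelines pair it with a statistical model.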

The sections are color-coded to make them easier to navigate, and logically related sections are connected to streamline the analysis and the flow of processed data from one tool to another.

In my opinion, anyone determined to learn can now easily learn and practice with examples drawn from several publications, using Omicslogic and the TBRC T-Bioinfo platform to analyze big NGS data right from their browser :)

Resources

https://learn.omicslogic.com/: Free and hands-on courses (Subscription based)

https://server.t-bio.info/: Student, training, and research licenses for students, researchers, faculty, and scientists in India, Africa, and the USA, and online globally.

https://transcriptomics.omicslogic.com/: Mentor-guided training program with scheduled and one-on-one sessions.

https://www.linkedin.com/in/mohit-mazumder-phd-09b71b11/

Mohit Mazumder, Ph.D.

Ph.D. in Computational Biology | Bioinformatics Project experience ~12 years over 80 projects & Online Tutor | Director of Global Business Development Pine Bio