Here, we describe a detailed protocol for an LC-MS-based sequencing method that can be used as a direct method to sequence short RNA (<35 nt per run) without a cDNA intermediate, and as a general method to sequence different nucleotide modifications in a single study at single-base precision.
Mass spectrometry (MS)-based sequencing approaches have been shown to be useful for directly sequencing RNA without the need for a complementary DNA (cDNA) intermediate. However, such approaches are rarely applied as de novo RNA sequencing methods; they are used mainly as quality-assurance tools for confirming the known sequences of purified single-stranded RNA samples. Recently, we developed a direct RNA sequencing method by integrating a 2-dimensional mass-retention time hydrophobic end-labeling strategy into MS-based sequencing (2D-HELS MS Seq). This method can accurately sequence single RNA sequences as well as mixtures containing up to 12 distinct RNA sequences. In addition to the four canonical ribonucleotides (A, C, G, and U), the method can sequence RNA oligonucleotides containing modified nucleotides. This is possible because each modified nucleobase either has an intrinsically unique mass that helps to identify it and locate it in the RNA sequence, or can be converted into a product with a unique mass. In this study, we used RNAs incorporating two representative modified nucleotides, pseudouridine (Ψ) and 5-methylcytosine (m5C), to illustrate the application of the method to the de novo sequencing of a single RNA oligonucleotide as well as of a mixture of RNA oligonucleotides, each with a different sequence and/or modified nucleotides. The procedures and protocols described here for sequencing these model RNAs will be applicable to other short RNA samples (<35 nt) when a standard high-resolution LC-MS system is used, and can also be used for sequence verification of modified therapeutic RNA oligonucleotides. In the future, with more robust algorithms and better instruments, this method could allow sequencing of more complex biological samples.
Mass spectrometry (MS)-based sequencing methods, including top-down MS and tandem MS1,2,3,4, have been developed for direct sequencing of RNA. However, in situ fragmentation techniques that effectively generate high-quality RNA ladders inside mass spectrometers currently cannot be applied to de novo sequencing5,6. Furthermore, analyzing traditional one-dimensional (1D) MS data for de novo sequencing of even one purified RNA sequence is not trivial, and MS sequencing of mixed RNA samples is even more challenging7,8. Therefore, a two-dimensional (2D) liquid chromatography (LC)-MS-based RNA sequencing method was developed that produces 2D mass-retention time (tR) ladders in place of 1D mass ladders, making it much easier to identify the ladder components needed for de novo sequencing of RNAs8. However, the 2D LC-MS-based RNA sequencing method is mainly limited to purified synthetic short RNA, because it cannot read a complete sequence from a single ladder and must instead rely on two co-existing adjacent ladders (5´- and 3´-ladders)8. More specifically, this approach requires bidirectional paired-end reads to read the terminal nucleobases in the low-mass region8. This added complexity makes the method untenable for sequencing RNA mixtures, because for unknown samples it becomes unclear which ladder fragment belongs to which ladder.
To overcome the abovementioned barriers in MS-based RNA sequencing and to broaden such applications in direct RNA sequencing, two issues must be addressed: 1) how to generate a high-quality mass ladder that can be used to read a complete sequence, from the first nucleotide to the last in an RNA strand, and 2) how to effectively identify each RNA/mass ladder in a complex MS dataset. By combining well-controlled acid degradation with a hydrophobic end-labeling strategy (HELS) in MS-based sequencing, we developed a new sequencing method that addresses both issues by adding a hydrophobic tag to the 5´- and/or 3´-end of the RNAs to be sequenced9. This method creates an “ideal” sequence ladder from RNA: each ladder fragment derives from site-specific RNA cleavage exclusively at a phosphodiester bond, and the mass difference between two adjacent ladder fragments is the exact mass of the nucleotide, canonical or modified, at that position8,9,10. This is possible because a highly controlled acidic hydrolysis step fragments the RNA, on average, once per molecule before it is injected into the instrument. As a result, each degradation product is detected by the mass spectrometer, and all fragments together form a sequencing ladder8,9,10. This new strategy enables complete reading of an RNA sequence from a single ladder without paired-end reading from the other ladder, and additionally allows MS sequencing of RNA mixtures with multiple distinct strands that contain combinatorial nucleotide modifications9. Because of the tag at the 5´- and/or 3´-end, the labeled ladder fragments show a significant delay in tR, which helps to distinguish the two mass ladders from each other and from the noisy low-mass region.
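Because every gap between adjacent fragments of an “ideal” ladder equals one residue mass, a sequence can in principle be read from mass differences alone. The sketch below illustrates that reading step under stated assumptions: the input is a list of neutral monoisotopic masses that all belong to one ladder, and the residue masses are standard monoisotopic in-chain values (nucleoside monophosphate minus water) that this article does not list. `read_ladder` and `build_ladder` are hypothetical helpers, not the authors' software, and the 1000.0 Da starting mass is an arbitrary placeholder.

```python
# Monoisotopic residue masses (Da) of RNA nucleotides inside a chain,
# i.e. nucleoside monophosphate minus H2O. Pseudouridine has the same mass as U.
RESIDUE_MASSES = {
    "A": 329.0525,
    "C": 305.0413,
    "G": 345.0474,
    "U": 306.0253,
    "m5C": 319.0569,
}


def read_ladder(fragment_masses, tolerance=0.02):
    """Call one residue per gap between adjacent fragments of a single ladder.

    fragment_masses: neutral monoisotopic masses (Da), in any order.
    Returns calls in order of increasing fragment mass; '?' marks a gap that
    matches no residue mass within `tolerance` (Da).
    """
    masses = sorted(fragment_masses)
    calls = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        # Nearest residue mass to this gap, accepted only if within tolerance.
        name, ref = min(RESIDUE_MASSES.items(), key=lambda kv: abs(kv[1] - delta))
        calls.append(name if abs(ref - delta) <= tolerance else "?")
    return calls


def build_ladder(start_mass, residues):
    """Hypothetical helper: cumulative masses of a synthetic, error-free ladder."""
    masses = [start_mass]
    for residue in residues:
        masses.append(masses[-1] + RESIDUE_MASSES[residue])
    return masses


# 1000.0 Da is an arbitrary stand-in for the lightest (tag-bearing) fragment.
print(read_ladder(build_ladder(1000.0, ["C", "m5C", "A", "U", "G"])))
```

For a 3´-labeled ladder, increasing mass corresponds to reading from the 3´-end toward the 5´-end; the residue contained in the lightest fragment is not called here and would be obtained by subtracting the known tag mass. Because C and U differ by only about 0.98 Da, the tolerance must stay well below half of that, and because Ψ has exactly the mass of U, a 306.03 Da gap cannot tell them apart, which is why this protocol converts ψ into a CMC-ψ adduct.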
The mass-tR shift caused by the hydrophobic tag facilitates mass ladder identification and simplifies the data analysis needed for sequence generation. Furthermore, because the tag increases both mass and hydrophobicity, it keeps the ladder fragment corresponding to the terminal base out of the noisy low-mass-tR region, so the terminal base can be identified and the complete sequence of an RNA can be read from a single ladder; no paired-end reads are required. As a result, we have previously sequenced a complex mixture of up to 12 distinct RNA strands without any advanced sequencing algorithm9, which opens the door for de novo MS sequencing of RNA containing both canonical and modified nucleotides and makes sequencing of mixed and more complex RNA samples more feasible. Using 2D-HELS MS Seq, we have also sequenced a mixed population of tRNA samples10 and are actively expanding its application to other complex RNA samples.
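In a mixture, the same mass-difference rule can be used to separate several ladders within one peak list. The sketch below rests on several assumptions. It takes deconvoluted (mass, tR) peaks as input. It applies a hypothetical `min_tr` cutoff to discard the noisy low-tR region, which the tag is described above as shifting labeled fragments out of. It then uses a simple longest-chain search that is not the authors' algorithm: peaks are linked when their mass gap equals a residue mass, the longest chain is taken as one ladder, its peaks are removed, and the search repeats. All function names and peak values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Monoisotopic in-chain residue masses (Da); U and pseudouridine share a mass.
RESIDUE_MASSES: Dict[str, float] = {
    "A": 329.0525, "C": 305.0413, "G": 345.0474, "U": 306.0253, "m5C": 319.0569,
}

Peak = Tuple[float, float]  # (neutral monoisotopic mass in Da, retention time in min)


def residue_for(delta: float, tolerance: float) -> Optional[str]:
    """Return the residue whose mass matches a mass gap, or None."""
    for name, mass in RESIDUE_MASSES.items():
        if abs(mass - delta) <= tolerance:
            return name
    return None


def longest_ladder(peaks: List[Peak], tolerance: float) -> List[Peak]:
    """Longest chain of peaks in which every mass gap is one residue mass."""
    if not peaks:
        return []
    peaks = sorted(peaks)  # by mass, so links only point to heavier peaks
    length = [1] * len(peaks)
    prev = [-1] * len(peaks)
    for j in range(len(peaks)):
        for i in range(j):
            gap = peaks[j][0] - peaks[i][0]
            if residue_for(gap, tolerance) and length[i] + 1 > length[j]:
                length[j], prev[j] = length[i] + 1, i
    k = max(range(len(peaks)), key=length.__getitem__)
    chain = []
    while k != -1:  # walk back from the heaviest peak of the best chain
        chain.append(peaks[k])
        k = prev[k]
    return chain[::-1]


def extract_ladders(peaks: List[Peak], min_tr: float, min_length: int = 3,
                    tolerance: float = 0.02) -> List[List[Peak]]:
    """Drop the low-tR region, then peel off ladders longest-first."""
    remaining = [p for p in peaks if p[1] >= min_tr]
    ladders = []
    while True:
        chain = longest_ladder(remaining, tolerance)
        if len(chain) < min_length:
            return ladders
        ladders.append(chain)
        remaining = [p for p in remaining if p not in chain]


def ladder_residues(chain: List[Peak], tolerance: float = 0.02) -> List[Optional[str]]:
    """Residue calls for each gap of a ladder, lightest fragment first."""
    return [residue_for(b[0] - a[0], tolerance) for a, b in zip(chain, chain[1:])]


def synthetic_ladder(start: float, residues: List[str], tr: float) -> List[Peak]:
    """Hypothetical helper: an error-free ladder with a constant tR."""
    peaks, mass = [(start, tr)], start
    for r in residues:
        mass += RESIDUE_MASSES[r]
        peaks.append((mass, tr))
    return peaks


# Hypothetical peak list: two tagged ladders plus one low-tR noise peak that
# sits exactly one C below the first ladder's lightest fragment.
peaks = (synthetic_ladder(1000.0, ["C", "A", "G", "U"], tr=8.0)
         + synthetic_ladder(1500.0, ["G", "G", "m5C"], tr=9.0)
         + [(1000.0 - RESIDUE_MASSES["C"], 1.2)])
for ladder in extract_ladders(peaks, min_tr=5.0):
    print(ladder_residues(ladder))
```

Taking the longest chain first is a greedy choice. With `min_tr=0.0`, the noise peak is absorbed into the first ladder, which illustrates why keeping labeled fragments out of the low-tR region matters. With real data, coincidental mass matches between ladders or missing fragments would need more careful handling.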
To help apply 2D-HELS MS Seq directly to a broader range of RNA samples, here we focus on the technical aspects of this sequencing approach and cover all of the essential steps needed to apply it to direct sequencing of RNA samples. Specific examples illustrate the technique, including synthetic single RNA sequences, mixtures of multiple distinct RNA sequences, and modified RNAs containing both canonical and modified nucleotides such as pseudouridine (ψ) and 5-methylcytosine (m5C). Because all RNAs contain phosphodiester bonds, any type of RNA can, under optimal conditions, be acid-hydrolyzed to generate an ideal sequence ladder for 2D-HELS MS Seq8,9. However, whether all ladder fragments of a given RNA are detected depends on the instrument. On a standard high-resolution LC-MS (40K), the minimal loading amount for sequencing a purified short RNA sample (<35 nt) is 100 pmol per run. More material is required (up to 400 pmol per RNA sample) when additional experiments must be conducted, e.g., to distinguish isomeric base modifications that share identical masses. The protocol used to sequence the model synthetic modified RNAs will also be applicable to broader RNA samples, including biological RNA samples with unknown base modifications. However, sequencing a complete tRNA (~76 nt) with all of its modifications on a standard LC-MS instrument requires an even larger sample amount, such as 1000 pmol, and an advanced algorithm must be developed for its de novo sequencing10.
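The loading amounts above are given in pmol. To estimate the corresponding mass of RNA for weighing out or quantifying material, moles can be multiplied by an approximate molecular weight. This sketch assumes an average in-chain residue mass of about 321.5 Da per nucleotide (roughly the mean of A, C, G, and U) and ignores end groups, counter-ions, and any tag, so the result is an estimate only. `loading_ng` is a hypothetical helper.

```python
# Rough average mass of one RNA nucleotide residue (Da); an assumption that
# ignores 5'/3' end groups, counter-ions, and any hydrophobic tag.
AVG_RESIDUE_MASS_DA = 321.5


def loading_ng(pmol: float, length_nt: int,
               residue_mass: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate mass in ng of `pmol` picomoles of an RNA of `length_nt` nt."""
    mw = length_nt * residue_mass  # approximate molecular weight, g/mol
    return pmol * mw / 1000.0      # pmol x g/mol = pg; divide by 1000 for ng


for label, pmol, nt in [("short RNA, minimum load", 100, 21),
                        ("short RNA, extra experiments", 400, 21),
                        ("tRNA", 1000, 76)]:
    print(f"{label}: {pmol} pmol x {nt} nt ~ {loading_ng(pmol, nt) / 1000:.2f} µg")
```

Under these assumptions, 100 pmol of a 21-nt RNA is about 0.68 µg, and 1000 pmol of a 76-nt tRNA is about 24 µg.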
1. Design RNA oligonucleotides
2. Label the 3´-end of RNAs with biotin
3. Capture biotinylated RNA sample on streptavidin beads
4. Acid hydrolysis of RNA to generate MS ladders for sequencing
5. Convert ψ to CMC-ψ adduct
6. LC-MS measurement
7. Automate RNA sequence generation by a computational algorithm
NOTE: This procedure is shown only for RNA #1 in Figure 1c.
8. Sequencing RNA mixtures
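Step 4 depends on keeping acid hydrolysis mild enough that each molecule is cleaved, on average, about once. A simple model, assumed here and not taken from this protocol, shows why about one cut per molecule is a sensible target. Suppose each of the n phosphodiester bonds is hydrolyzed independently with the same probability p. Then the number of cuts is binomial with mean λ = np, and the fraction of singly cut molecules, np(1 - p)^(n - 1), is maximized at p = 1/n, i.e., λ = 1. Intact molecules contribute no ladder fragments, while molecules cut more than once also yield internal fragments that belong to neither end ladder. Real hydrolysis rates are not identical at every bond, so this only illustrates the trend. `cut_distribution` is a hypothetical helper.

```python
from math import comb


def cut_distribution(n_bonds: int, mean_cuts: float):
    """Fractions of molecules with 0, exactly 1, and 2 or more cuts.

    Assumes every phosphodiester bond is hydrolyzed independently with the same
    probability p = mean_cuts / n_bonds (a simplifying, hypothetical model).
    """
    p = mean_cuts / n_bonds
    intact = (1 - p) ** n_bonds
    single = comb(n_bonds, 1) * p * (1 - p) ** (n_bonds - 1)
    return intact, single, 1 - intact - single


# A 21-nt RNA has 20 phosphodiester bonds between its nucleotides.
for lam in (0.5, 1.0, 2.0):
    intact, single, multiple = cut_distribution(20, lam)
    print(f"mean cuts {lam}: intact {intact:.3f}, single {single:.3f}, "
          f"multiple {multiple:.3f}")
```

For n = 20, the single-cut fraction peaks at about 0.38 when λ = 1.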
Introducing a biotin tag to the 3´-end of RNA to produce easily identifiable mass-tR ladders
The workflow of the 2D-HELS MS Seq approach is demonstrated in Figure 1a. The hydrophobic biotin label introduced to the 3´-end of the RNA (see Section 2) increases the masses and tRs of the 3´-labeled ladder components compared to those of their unlabeled counterparts. Thus, the 3´-ladder curve is shifted to greater y-axis values (due ...
Unlike tandem-based MS fragmentation, highly controlled acidic hydrolysis is used in the 2D-HELS MS Seq approach to fragment the RNA before analysis with a mass spectrometer9,10. As a result, each acid-degraded fragment can be detected by the instrument, forming the equivalent of a sequencing ladder. Under optimal conditions, this method creates an “ideal” sequence ladder from RNA via, on average, one-per-molecule site-specific RNA cleavage e...
The authors have filed a provisional patent related to the technology discussed in this manuscript.
The authors acknowledge the R21 grant from National Institutes of Health (1R21HG009576) to S. Z. and W. L. and New York Institute of Technology (NYIT) Institutional Support for Research and Creativity grants to S. Z., which supported this work. The authors would like to thank PhD student Xuanting Wang (Columbia University) for assisting in figure-making, and thank Prof. Michael Hadjiargyrou (NYIT), Prof. Jingyue Ju (Columbia University), Drs. James Russo, Shiv Kumar, Xiaoxu Li, Steffen Jockusch, and other members of the Ju lab (Columbia University), Dr. Yongdong Wang (Cerno Bioscience), Meina Aziz (NYIT), and Wenhao Ni (NYIT) for helpful discussions and suggestions for our manuscript.
Name | Company | Catalog Number | Comments |
5' DNA Adenylation kit | New England Biolabs | E2610S | 50 µM concentration |
6550 Q-TOF mass spectrometer | Agilent Technologies | 5991-2116EN | Coupled to a 1290 Infinity LC system |
A(5´)pp(5´)Cp-TEG-biotin-3´ | ChemGenes | 91718 | HPLC purified |
ATPγS | Sigma-Aldrich | 11162306001 | Lithium salt |
Bicine | Sigma-Aldrich | B8660 | BioXtra, ≥99% (titration) |
Biotin maleimide | Vector Laboratories | SP-1501 | Long arm |
C18 column | Waters | 186003532 | 50 mm × 2.1 mm XBridge C18 column with a particle size of 1.7 μm |
Centrifugal Vacuum Concentrator | Labconco | Refrig 115 V/60 Hz 7310022 | Labconco CentriVap |
ChemBioDraw | PerkinElmer | ChemDraw Prime | Generates chemical structures and property data for structures and fragments |
CMC (N-cyclohexyl-Nʹ-(2-morpholinoethyl)-carbodiimide metho-p-toluenesulfonate) | Sigma-Aldrich | 2491-17-0 | 95% purity |
Cyanine3 maleimide (Cy3) | Lumiprobe | 11080 | Water insoluble |
DEPC-treated water | Thermo Fisher Scientific | AM9906 | Autoclaved, certified nuclease-free |
Diisopropylamine (DIPA) | Thermo Fisher Scientific | 108-18-9 | 99% Alfa Aesar |
DMSO | Sigma-Aldrich | 276855 | Anhydrous dimethyl sulfoxide, 99.9% |
EDTA | Sigma-Aldrich | E6758 | Anhydrous, crystalline, BioReagent, suitable for cell culture |
Formic acid | Merck | 64-18-6 | 98-100%, ACS reag, Ph Eur |
Hexafluoro-2-propanol (HFIP) | Thermo Fisher Scientific | 920-66-1 | 99% Acros Organics |
LC-MS sample vials | Thermo Fisher Scientific | C4000-11 | Plastic screw thread vials |
LC-MS vial caps | Thermo Fisher Scientific | C5000-54A | Autosampler vial screw thread caps |
Na2CO3 buffer | Sigma-Aldrich | 88975 | BioUltra, >0.1 M Na2CO3, >0.2 M NaHCO3 |
Oligo Clean & Concentrator | Zymo Research | D4060 | Spin column |
OriginLab | OriginLab | OriginPro | Data analysis and graphing software |
pCp-biotin | TriLink BioTechnologies | NU-1706-BIO | 20 µL (1 mM) |
RNA #1-#6 | Integrated DNA Technologies | Custom RNA oligos | 19-21 nt single-stranded RNAs, used without further purification |
Rocking platform shaker | VWR | Orbital Shaker Standard 1000 | Speed Range 40 to 300 rpm |
Streptavidin magnetic beads | Thermo Fisher Scientific | 88816 | Binds approx. 55 µg biotinylated rabbit IgG per mg of beads |
Sulfonated Cyanine3 maleimide | Lumiprobe | 11380 | Water soluble |
T4 DNA ligase 1 | New England Biolabs | M0202S | 400 units/µL |
T4 polynucleotide kinase | Sigma-Aldrich | T4PNK-RO | From phage T4 am N81 pse T1 infected Escherichia coli BB |
Tris-HCl buffer | Sigma-Aldrich | T6455 | Tris-HCl Buffer, pH 10, 10×, Antigen Retriever |
Urea | Sigma-Aldrich | 81871 | Urea for synthesis. CAS No. 57-13-6, EC Number 200-315-5. |
This article has been published.
Copyright © 2025 MyJoVE Corporation. All rights reserved