|--------------------------------------------------------------------------
| Resistance Gene Identifier (RGI) Documentation
|--------------------------------------------------------------------------

Before you run the RGI scripts, make sure you have installed the required external tools:

|--------------------------------------------------------------------------
| Install open reading frame caller: Prodigal
| - http://prodigal.ornl.gov/
| - https://github.com/hyattpd/prodigal/wiki/Installation
|--------------------------------------------------------------------------

    # Install Prodigal - conda should be installed
    $ conda install --channel bioconda prodigal

    # Mac OS X
    $ brew tap homebrew/science
    $ brew install homebrew/science/prodigal

    # Linux Redhat / Centos
    $ sudo yum groupinstall 'Development Tools' && sudo yum install curl git irb python-setuptools ruby
    $ git clone https://github.com/Linuxbrew/brew.git ~/.linuxbrew
    $ export PATH="$HOME/.linuxbrew/bin:$PATH"
    $ export MANPATH="$HOME/.linuxbrew/share/man:$MANPATH"
    $ export INFOPATH="$HOME/.linuxbrew/share/info:$INFOPATH"

    # Linux Debian / Ubuntu
    $ brew tap homebrew/science
    $ brew install homebrew/science/prodigal

|--------------------------------------------------------------------------
| Install alignment software : BLAST and DIAMOND
|--------------------------------------------------------------------------

|--------------------------------------------------------------------------
| BLAST
| - ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
|--------------------------------------------------------------------------

- Tested with BLAST 2.2.28, BLAST 2.2.31+ and 2.5.0+ on 64-bit Linux and Mac OS X

* You can also run the following command to install BLAST.
  This will only install version 2.2.28:

    $ sudo apt-get install ncbi-blast+

- Test the BLAST install with the following command:

    $ makeblastdb

* Biopython http://biopython.org/DIST/docs/install/Installation.html#sec12

* Run the following command to install Biopython:

    $ sudo apt-get install python-biopython

* Download the database - card.json from Downloads on the CARD website (a copy may be included with this release)

|--------------------------------------------------------------------------
| DIAMOND
| - https://ab.inf.uni-tuebingen.de/software/diamond
| - https://github.com/bbuchfink/diamond
| - https://github.com/bbuchfink/diamond/releases
|--------------------------------------------------------------------------

* Install DIAMOND using conda packages:

    $ conda install --channel bioconda diamond

|--------------------------------------------------------------------------
| Running RGI:
|--------------------------------------------------------------------------

Open a terminal, type:

    $ python rgi.py -h

Check software version:

    $ python rgi.py -sv
    or
    $ python rgi.py --software_version

Check data version:

    $ python rgi.py -dv
    or
    $ python rgi.py --data_version

|--------------------------------------------------------------------------
| RGI inputs
|--------------------------------------------------------------------------

    $ python rgi.py -h

    usage: rgi.py [-h] [-t INTYPE] [-i INPUTSEQ] [-n THREADS] [-o OUTPUT]
                  [-e CRITERIA] [-c CLEAN] [-d DATA] [-l VERBOSE] [-a ALIGNER]
                  [-r DATABASE] [-sv] [-dv]

    Resistance Gene Identifier - Version 3.1.2

    optional arguments:
      -h, --help            show this help message and exit
      -t INTYPE, --input_type INTYPE
                            must be one of contig, orf, protein, read (default:
                            contig)
      -i INPUTSEQ, --input_sequence INPUTSEQ
                            input file must be in either FASTA (contig and
                            protein), FASTQ(read) or gzip format!
                            e.g myFile.fasta, myFasta.fasta.gz
      -n THREADS, --num_threads THREADS
                            Number of threads (CPUs) to use in the BLAST search
                            (default=32)
      -o OUTPUT, --output_file OUTPUT
                            Output JSON file (default=Report)
      -e CRITERIA, --loose_criteria CRITERIA
                            The options are YES to include loose hits and NO to
                            exclude loose hits. (default=NO to exclude loose hits)
      -c CLEAN, --clean CLEAN
                            This removes temporary files in the results directory
                            after run. Options are NO or YES (default=YES for
                            remove)
      -d DATA, --data DATA  Specify a data-type, i.e. wgs, chromosome, plasmid,
                            etc. (default = NA)
      -l VERBOSE, --verbose VERBOSE
                            log progress to file. Options are OFF or ON (default =
                            OFF for no logging)
      -a ALIGNER, --alignment_tool ALIGNER
                            choose between BLAST and DIAMOND. Options are BLAST or
                            DIAMOND (default = BLAST)
      -r DATABASE, --db DATABASE
                            specify path to CARD blast databases (default: None)
      -sv, --software_version
                            Prints software number
      -dv, --data_version   Prints data version number

INTYPE can be one of 'contig', 'protein' or 'read'.

1. 'contig' means that the input sequence is a DNA sequence stored in a FASTA file,
   presumably a complete genome or assembly contigs. RGI will predict ORFs de novo
   and predict the resistome using a combination of BLASTP against the CARD data,
   curated cut-offs, and SNP screening.

2. 'protein', as its name suggests, requires a FASTA file with protein sequences.
   As above, RGI predicts the resistome using a combination of BLASTP against the
   CARD data, curated cut-offs, and SNP screening.

3. 'read' expects raw FASTQ format nucleotide data and predicts the resistome using
   a combination of BLASTX against the CARD data, curated cut-offs, and SNP
   screening. This is an experimental tool and we have yet to adjust the CARD
   cut-offs for BLASTX. We will be exploring other metagenomics or FASTQ screening
   methods. Note that RGI does not perform any pre-processing of the FASTQ data
   (linker trimming, etc).
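
The options above can be combined into a single command line. The following is a minimal sketch and is not part of RGI: the helper name build_rgi_command and the file name proteins.fasta are hypothetical, and the flags and allowed values are copied from the help text above.

```python
# Hypothetical helper (not part of RGI): assemble an rgi.py command line
# from the flags documented in the help text above.

# Values listed for -t / --input_type in the help text.
INPUT_TYPES = ("contig", "orf", "protein", "read")
# Values listed for -a / --alignment_tool in the help text.
ALIGNERS = ("BLAST", "DIAMOND")


def build_rgi_command(input_file, input_type="contig", output="Report",
                      aligner="BLAST", threads=32):
    """Return the argument list for one rgi.py run."""
    if input_type not in INPUT_TYPES:
        raise ValueError("unknown input type: %s" % input_type)
    if aligner not in ALIGNERS:
        raise ValueError("unknown aligner: %s" % aligner)
    return [
        "python", "rgi.py",
        "-t", input_type,
        "-i", input_file,
        "-o", output,
        "-a", aligner,
        "-n", str(threads),
    ]


if __name__ == "__main__":
    # Print the command for a (hypothetical) protein FASTA file searched
    # with DIAMOND on 4 threads.
    cmd = build_rgi_command("proteins.fasta", input_type="protein",
                            aligner="DIAMOND", threads=4)
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the rgi directory. Rejecting unknown values before launching avoids starting a run that the argument parser or a later step would refuse.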

|--------------------------------------------------------------------------
| RGI outputs
|--------------------------------------------------------------------------

RGI produces a detailed JSON file, a summary tab-delimited file and a gff3 file (where applicable).

The JSON is as follows (example shows only one hit):

- gene_71|gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence: {
    // Hit 1
    gnl|BL_ORD_ID|39|hsp_num:0: {
        SequenceFromBroadStreet: "MRYIRLCIISLLATLPLAVHASPQPLEQIKQSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGASKRGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAA",
        "orf_start": 67822,
        "ARO_name": "SHV-12",
        "type_match": "Loose",
        "query": "INDWRLDYNECRPHSSLNYLTPAEFAAGWRN",
        "evalue": 3.82304,
        "max-identities": 10,
        "orf_strand": "-",
        "bit-score": 24.6386,
        "cvterm_id": "35914",
        "sequenceFromDB": "LDRWETELNEALPGDARDTTTPASMAATLRK",
        "match": "++ W + NE P + + TPA AA R ",
        "model_id": "103",
        "orf_From": "gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence",
        "pass_evalue": 1e-100,
        "query_end": 68607,
        "ARO_category": {
            "36696": {
                "category_aro_name": "antibiotic inactivation enzyme",
                "category_aro_cvterm_id": "36696",
                "category_aro_accession": "3000557",
                "category_aro_description": "Enzyme that catalyzes the inactivation of an antibiotic resulting in resistance. Inactivation includes chemical modification, destruction, etc."
            },
            "36268": {
                "category_aro_name": "beta-lactam resistance gene",
                "category_aro_cvterm_id": "36268",
                "category_aro_accession": "3000129",
                "category_aro_description": "Genes conferring resistance to beta-lactams."
            }
        },
        "ARO_accession": "3001071",
        "query_start": 68515,
        "model_name": "SHV-12",
        "model_type": "protein homolog model",
        "orf_end": 68646
    },
    ...
    // Hit 2
    ...
    // Hit 3
    ...
}

|--------------------------------------------------------------------------
| Getting Tab Delimited output after running RGI:
|--------------------------------------------------------------------------

Run the following command to get help on how to get the Tab Delimited output:

    $ python convertJsonToTSV.py -h

|--------------------------------------------------------------------------
| convertJsonToTSV inputs
|--------------------------------------------------------------------------

    $ python convertJsonToTSV.py -h

    usage: convertJsonToTSV.py [-h] [-i AFILE] [-o OUTPUT] [-v VERBOSE]

    Convert RGI JSON file to Tab-delimited file

    optional arguments:
      -h, --help            show this help message and exit
      -i AFILE, --afile AFILE
                            must be a json file generated from RGI in JSON or gzip
                            format e.g out.json, out.json.gz
      -o OUTPUT, --out_file OUTPUT
                            Output JSON file (default=dataSummary)
      -v VERBOSE, --verbose VERBOSE
                            include help menu. Options are OFF or ON (default =
                            OFF for no help)

|--------------------------------------------------------------------------
| convertJsonToTSV outputs
|--------------------------------------------------------------------------

This outputs a tab-delimited text file: dataSummary.txt

The tab-output is as follows:

    ---------------------------------------------------------------------
    COLUMN                        HELP_MESSAGE
    ---------------------------------------------------------------------
    ORF_ID                ------- Open Reading Frame identifier (internal to RGI)
    CONTIG                ------- Source Sequence
    START                 ------- Start co-ordinate of ORF
    STOP                  ------- End co-ordinate of ORF
    ORIENTATION           ------- Strand of ORF
    CUT_OFF               ------- RGI Detection Paradigm
    PASS_EVALUE           ------- STRICT detection model Expectation value cut-off
    Best_Hit_evalue       ------- Expectation value of match to top hit in CARD
    Best_Hit_ARO          ------- ARO term of top hit in CARD
    Best_Identities       ------- Percent identity of match to top hit in CARD
    ARO                   ------- ARO accession of top hit in CARD
    ARO_name              ------- ARO term of top hit in CARD
    Model_type            ------- CARD detection model type
    SNP                   ------- Observed mutation (if applicable)
    AR0_category          ------- ARO Categorization
    bit_score             ------- Bitscore of match to top hit in CARD
    Predicted_Protein     ------- ORF predicted protein sequence
    CARD_Protein_Sequence ------- Protein sequence of top hit in CARD
    LABEL                 ------- ORF label (internal to RGI)
    ID                    ------- HSP identifier (internal to RGI)

|--------------------------------------------------------------------------
| Files Structure
|--------------------------------------------------------------------------

    `-- rgi/
        |-- _data/
            |-- card.json
        |-- _db/
            ... (BLAST DBs)
        |-- _docs/
            |-- INSTALL
            |-- README
        |-- _tmp/
        |-- tests/
        |-- __init__.py
        |-- blastnsnp.py
        |-- clean.py
        |-- contigToORF.py
        |-- contigToProteins.py
        |-- convertJsonToTSV.py
        |-- create_gff3_file.py
        |-- filepaths.py
        |-- formatJson.py
        |-- fqToFsa.py
        |-- load.py
        |-- rgi.py
        |-- rrna.py

|--------------------------------------------------------------------------
| Loading new card.json:
|--------------------------------------------------------------------------

* If a new card.json is available, replace card.json in the directory shown above. Use the following command:

    $ python load.py -h

|--------------------------------------------------------------------------
| Load inputs
|--------------------------------------------------------------------------

    $ python load.py -h

    usage: load.py [-h] [-i AFILE]

    Load card database json file

    optional arguments:
      -h, --help            show this help message and exit
      -i AFILE, --afile AFILE
                            must be a card database json file

|--------------------------------------------------------------------------
| Clean databases
|--------------------------------------------------------------------------

* The databases are created once rgi.py is run. Use clean.py to remove the databases after a new card.json is loaded.

* Then run clean.py to clean the directory:

    $ python clean.py -h

|--------------------------------------------------------------------------
| Clean inputs
|--------------------------------------------------------------------------

    $ python clean.py -h

    usage: clean.py [-h]

    Removes BLAST databases created using card.json

    optional arguments:
      -h, --help  show this help message and exit

|--------------------------------------------------------------------------
| Format JSON
|--------------------------------------------------------------------------

    $ python formatJson.py -h

    usage: formatJson.py [-h] [-i IN_FILE] [-o OUT_FILE]

    Convert RGI JSON file to Readable JSON file

    optional arguments:
      -h, --help            show this help message and exit
      -i IN_FILE, --in_file IN_FILE
                            input file must be in JSON format e.g Report.json
      -o OUT_FILE, --out_file OUT_FILE
                            Output JSON file (default=ReportFormatted)

|--------------------------------------------------------------------------
| Contact Us:
|--------------------------------------------------------------------------

For help please contact us at:

* CARD Developers