|--------------------------------------------------------------------------
| Resistance Gene Identifier (RGI) Documentation
|--------------------------------------------------------------------------

	Before you run the RGI scripts, make sure you have installed needed external tools:

|--------------------------------------------------------------------------
| Install open reading frame callers : Prodigal, http://prodigal.ornl.gov/, https://github.com/hyattpd/prodigal/wiki/Installation
|--------------------------------------------------------------------------


	# Install Prodigal - conda should be installed

		$ conda install --channel bioconda prodigal

	# Mac OS X

		$ brew tap homebrew/science

		$ brew install homebrew/science/prodigal

	# Linux Redhat / Centos

		$ sudo yum groupinstall 'Development Tools' && sudo yum install curl git irb python-setuptools ruby

		$ git clone https://github.com/Linuxbrew/brew.git ~/.linuxbrew

		$ export PATH="$HOME/.linuxbrew/bin:$PATH"

		$ export MANPATH="$HOME/.linuxbrew/share/man:$MANPATH"
	
		$ export INFOPATH="$HOME/.linuxbrew/share/info:$INFOPATH"

	# Linux Debian / Ubuntu

		$ brew tap homebrew/science

		$ brew install homebrew/science/prodigal


|--------------------------------------------------------------------------
| Install alignment software : BLAST and DIAMOND
|--------------------------------------------------------------------------

|--------------------------------------------------------------------------
| BLAST ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
|--------------------------------------------------------------------------

	- Tested with BLAST 2.2.28, BLAST 2.2.31+ and 2.5.0+ on linux 64 and Mac OS X

	* You can alson run the following command to install blast. This will only install version 2.2.28

	$ sudo apt-get install ncbi-blast+

	- Test blast install with the following command:

	$ makeblastdb

	* Biopython http://biopython.org/DIST/docs/install/Installation.html#sec12
	* Run the following command to install Bio-python

	$ sudo apt-get install python-biopython

	* Download the database - card.json from Downloads on the CARD website (a copy may be included with this release)


|--------------------------------------------------------------------------
| DIAMOND 
| - https://ab.inf.uni-tuebingen.de/software/diamond
| - https://github.com/bbuchfink/diamond
| - https://github.com/bbuchfink/diamond/releases
|--------------------------------------------------------------------------

	* Install diamond using conda packages

	$ conda install --channel bioconda diamond


|--------------------------------------------------------------------------
| Running RGI:
|--------------------------------------------------------------------------

Open a terminal, type: 

	$ python rgi.py -h 

Check software version:

	$ python rgi.py -sv or python rgi.py --software_version

Check data version:

	$ python rgi.py -dv or python rgi.py --data_version

|--------------------------------------------------------------------------
| RGI inputs
|--------------------------------------------------------------------------

	$ python rgi.py -h

		usage: rgi.py [-h] [-t INTYPE] [-i INPUTSEQ] [-n THREADS] [-o OUTPUT]
		              [-e CRITERIA] [-c CLEAN] [-d DATA] [-l VERBOSE] [-a ALIGNER]
		              [-r DATABASE] [-sv] [-dv]

		Resistance Gene Identifier - Version 3.1.2

		optional arguments:
		  -h, --help            show this help message and exit
		  -t INTYPE, --input_type INTYPE
		                        must be one of contig, orf, protein, read (default:
		                        contig)
		  -i INPUTSEQ, --input_sequence INPUTSEQ
		                        input file must be in either FASTA (contig and
		                        protein), FASTQ(read) or gzip format! e.g
		                        myFile.fasta, myFasta.fasta.gz
		  -n THREADS, --num_threads THREADS
		                        Number of threads (CPUs) to use in the BLAST search
		                        (default=32)
		  -o OUTPUT, --output_file OUTPUT
		                        Output JSON file (default=Report)
		  -e CRITERIA, --loose_criteria CRITERIA
		                        The options are YES to include loose hits and NO to
		                        exclude loose hits. (default=NO to exclude loose hits)
		  -c CLEAN, --clean CLEAN
		                        This removes temporary files in the results directory
		                        after run. Options are NO or YES (default=YES for
		                        remove)
		  -d DATA, --data DATA  Specify a data-type, i.e. wgs, chromosome, plasmid,
		                        etc. (default = NA)
		  -l VERBOSE, --verbose VERBOSE
		                        log progress to file. Options are OFF or ON (default =
		                        OFF for no logging)
		  -a ALIGNER, --alignment_tool ALIGNER
		                        choose between BLAST and DIAMOND. Options are BLAST or
		                        DIAMOND (default = BLAST)
		  -r DATABASE, --db DATABASE
		                        specify path to CARD blast databases (default: None)
		  -sv, --software_version
		                        Prints software number
		  -dv, --data_version   Prints data version number


	INTYPE could be one of 'contig', 'protein' or 'read'.

	1. 'contig' means that inputSequence is a DNA sequence stored in a FASTA file, presumably a complete genome or assembly contigs. RGI will predict ORFs de novo and predict resistome using a combination of BLASTP against the CARD data, curated cut-offs, and SNP screening.

	2. 'protein', as its name suggests, requires a FASTA file with protein sequences. As above, RGI predict resistome using a combination of BLASTP against the CARD data, curated cut-offs, and SNP screening.

	3. 'read' expects raw FASTQ format nucleotide data and predicts resistome using a combination of BLASTX against the CARD data, curated cut-offs, and SNP screening. This is an experimental tool and we have yet to adjust the CARD cut-offs for BLASTX.  We will be exploring other metagenomics or FASTQ screening methods. Note that RGI does not perform any pre-processing of the FASTQ data (linker trimming, etc).


|--------------------------------------------------------------------------
| RGI outputs
|--------------------------------------------------------------------------

	RGI output will produce a detailed JSON file, Summary Tab-delimited file and gff3 (where applicable)

	The JSON is as follows (example shows only one hit):

	- gene_71|gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence: {
		// Hit 1
		gnl|BL_ORD_ID|39|hsp_num:0: {
			SequenceFromBroadStreet: "MRYIRLCIISLLATLPLAVHASPQPLEQIKQSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGASKRGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAA",
			"orf_start": 67822,
			"ARO_name": "SHV-12",
			"type_match": "Loose",
			"query": "INDWRLDYNECRPHSSLNYLTPAEFAAGWRN",
			"evalue": 3.82304,
			"max-identities": 10,
			"orf_strand": "-",
			"bit-score": 24.6386,
			"cvterm_id": "35914",
			"sequenceFromDB": "LDRWETELNEALPGDARDTTTPASMAATLRK",
			"match": "++ W  + NE  P  + +  TPA  AA  R ",
			"model_id": "103",
			"orf_From": "gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence",
			"pass_evalue": 1e-100,
			"query_end": 68607,
			"ARO_category": {
			    "36696": {
				    "category_aro_name": "antibiotic inactivation enzyme",
				    "category_aro_cvterm_id": "36696",
				    "category_aro_accession": "3000557",
				    "category_aro_description": "Enzyme that catalyzes the inactivation of an antibiotic resulting in resistance.  Inactivation includes chemical modification, destruction, etc."
				},
				"36268": {
			        "category_aro_name": "beta-lactam resistance gene",
			        "category_aro_cvterm_id": "36268",
			        "category_aro_accession": "3000129",
			        "category_aro_description": "Genes conferring resistance to beta-lactams."
			    }
			},
			"ARO_accession": "3001071",
			"query_start": 68515,
			"model_name": "SHV-12",
			"model_type": "protein homolog model",
			"orf_end": 68646
		},
		...
		// Hit 2
		...
		// Hit 3
		...
	}

|--------------------------------------------------------------------------
| Getting Tab Delimited output after running RGI:
|--------------------------------------------------------------------------

	Run the following command to get help on how to get the Tab Delimited output

	$ python convertJsonToTSV.py -h

|--------------------------------------------------------------------------
| convertJsonToTSV inputs
|--------------------------------------------------------------------------

	$ python convertJsonToTSV.py -h

		usage: convertJsonToTSV.py [-h] [-i AFILE] [-o OUTPUT] [-v VERBOSE]

		Convert RGI JSON file to Tab-delimited file

		optional arguments:
		  -h, --help            show this help message and exit
		  -i AFILE, --afile AFILE
		                        must be a json file generated from RGI in JSON or gzip
		                        format e.g out.json, out.json.gz
		  -o OUTPUT, --out_file OUTPUT
		                        Output JSON file (default=dataSummary)
		  -v VERBOSE, --verbose VERBOSE
		                        include help menu. Options are OFF or ON (default =
		                        OFF for no help)

|--------------------------------------------------------------------------
| convertJsonToTSV outputs
|--------------------------------------------------------------------------

	This outputs a tab-delimited text file: dataSummary.txt

	The tab-output is as follows:
			
	---------------------------------------------------------------------		
	COLUMN  		HELP_MESSAGE
	---------------------------------------------------------------------
	ORF_ID ------- Open Reading Frame identifier (internal to RGI)
	CONTIG ------- Source Sequence
	START ------- Start co-ordinate of ORF
	STOP ------- End co-ordinate of ORF
	ORIENTATION ------- Strand of ORF
	CUT_OFF ------- RGI Detection Paradigm
	PASS_EVALUE ------- STRICT detection model Expectation value cut-off
	Best_Hit_evalue ------- Expectation value of match to top hit in CARD
	Best_Hit_ARO ------- ARO term of top hit in CARD
	Best_Identities ------- Percent identity of match to top hit in CARD
	ARO ------- ARO accession of top hit in CARD
	ARO_name ------- ARO term of top hit in CARD
	Model_type ------- CARD detection model type
	SNP ------- Observed mutation (if applicable)
	AR0_category ------- ARO Categorization
	bit_score ------- Bitscore of match to top hit in CARD
	Predicted_Protein ------- ORF predicted protein sequence
	CARD_Protein_Sequence ------- Protein sequence of top hit in CARD
	LABEL ------- ORF label (internal to RGI)
	ID ------- HSP identifier (internal to RGI)

|--------------------------------------------------------------------------
| Files Structure
|--------------------------------------------------------------------------


`-- rgi/
   |-- _data/
   	   |-- card.json
   |-- _db/
   		... (BLAST DBs)
   |-- _docs/
   		|-- INSTALL
   		|-- README
   |-- _tmp/
   |-- tests/
   |-- __init__.py
   |-- blastnsnp.py
   |-- clean.py
   |-- contigToORF.py
   |-- contigToProteins.py
   |-- convertJsonToTSV.py
   |-- create_gff3_file.py
   |-- filepaths.py
   |-- formatJson.py
   |-- fqToFsa.py
   |-- load.py
   |-- rgi.py
   |-- rrna.py
   

|--------------------------------------------------------------------------
| Loading new card.json:
|--------------------------------------------------------------------------

	* If new card.json is available. Replace card.json in the directory show above. Use the following command:

	$ python load.py -h


|--------------------------------------------------------------------------
| Load inputs
|--------------------------------------------------------------------------

	$ python load.py -h

		usage: load.py [-h] [-i AFILE]

		Load card database json file

		optional arguments:
		  -h, --help            show this help message and exit
		  -i AFILE, --afile AFILE
		                        must be a card database json file


|--------------------------------------------------------------------------
| Clean databases
|--------------------------------------------------------------------------

	* Database is created once the rgi.py is run. Use clean.py to remove databases after new card.json is loaded.

	* Then run clean.py to clean directory.

	$ python clean.py -h


|--------------------------------------------------------------------------
| Clean inputs
|--------------------------------------------------------------------------

	$ python clean.py -h

		usage: clean.py [-h]

		Removes BLAST databases created using card.json

		optional arguments:
		  -h, --help  show this help message and exit

|--------------------------------------------------------------------------
| Format JSON
|--------------------------------------------------------------------------

	$ python formatJson.py -h 

		usage: formatJson.py [-h] [-i IN_FILE] [-o OUT_FILE]

		Convert RGI JSON file to Readable JSON file

		optional arguments:
		  -h, --help            show this help message and exit
		  -i IN_FILE, --in_file IN_FILE
		                        input file must be in JSON format e.g Report.json
		  -o OUT_FILE, --out_file OUT_FILE
		                        Output JSON file (default=ReportFormatted)	

|--------------------------------------------------------------------------
| Contact Us:
|--------------------------------------------------------------------------

	For help please contact us at:

	* CARD Developers <card@mcmaster.ca>