lja -o lja_ec --reads reads.fa --diploid
00:00:00 2Mb INFO: Hello! You are running La Jolla Assembler (LJA), a tool for genome assembly from PacBio HiFi reads
00:00:00 2Mb INFO: LJA pipeline started
00:00:00 2Mb INFO: Performing initial correction with k = 501
00:00:00 0Mb INFO: Reading reads
00:00:00 0Mb INFO: Extracting minimizers
00:06:51 14.5Gb INFO: Finished read processing
00:06:51 14.5Gb INFO: 27993052 hashs collected. Starting sorting.
00:06:53 14.5Gb INFO: Finished sorting. Total distinct minimizers: 750742
00:06:53 14.5Gb INFO: Starting construction of sparse de Bruijn graph
00:06:53 14.5Gb INFO: Vertex map constructed.
00:06:53 14.5Gb INFO: Filling edge sequences.
00:20:28 16.8Gb INFO: Finished sparse de Bruijn graph construction.
00:20:28 16.8Gb INFO: Collecting tips
00:20:28 16.8Gb INFO: Added 27911 artificial minimizers from tips.
00:20:28 16.8Gb INFO: Collected 1590729 old edges.
00:20:28 16.8Gb INFO: New minimizers added to sparse graph.
00:20:28 16.8Gb INFO: Refilling graph with old edges.
00:21:05 16.8Gb INFO: Filling graph with new edges.
00:21:06 16.8Gb INFO: Finished fixing sparse de Bruijn graph.
00:21:09 16.9Gb INFO: Starting to extract disjointigs.
00:21:12 16.9Gb INFO: Finished extracting 158429 disjointigs of total size 828749357
00:21:23 0Mb INFO: Loading disjointigs from file "lja_ec/k501/disjointigs.fasta"
00:21:37 3.1Gb INFO: Filling bloom filter with k+1-mers.
00:22:32 3.1Gb INFO: Filled 3442374841 bits out of 23980045696
00:22:32 3.1Gb INFO: Finished filling bloom filter. Selecting junctions.
00:23:49 3.2Gb INFO: Collected 480346 junctions.
00:23:53 3.2Gb INFO: Starting DBG construction.
00:23:53 3.2Gb INFO: Vertices created.
00:24:12 3.2Gb INFO: Filled dbg edges. Adding hanging vertices
00:24:12 3.2Gb INFO: Added 6 hanging vertices
00:24:12 3.2Gb INFO: Merging unbranching paths
00:24:13 3.2Gb INFO: Ended merging edges. Resulting size 207989
00:24:25 3.2Gb INFO: Cleaning edge coverages
00:24:25 3.2Gb INFO: Collecting alignments of sequences to the graph
00:29:35 17.3Gb INFO: Alignment collection finished. Total length of alignments is 219169445
00:29:36 17.3Gb INFO: Precorrecting reads
00:29:41 17.3Gb INFO: Applying corrections to reads
00:29:41 17.3Gb INFO: Applied correction to 24851 reads
00:29:41 17.3Gb INFO: Corrected simple errors in 24851 reads
00:29:41 17.3Gb INFO: Applying changes to the graph
00:31:01 18.5Gb INFO: Collecting and storing read suffixes
00:33:15 18.5Gb INFO: Correcting dinucleotide errors in reads
01:05:56 18.5Gb INFO: Applying corrections to reads
01:06:35 18.5Gb INFO: Applied correction to 23059 reads
01:06:35 18.5Gb INFO: Corrected 23059 dinucleotide sequences
01:06:35 18.5Gb INFO: Marking reliable edges
01:06:35 18.5Gb INFO: Marked 3773 edges in 1437 paths as reliable
01:06:35 18.5Gb INFO: Correcting low covered regions in reads with K = 800
01:07:11 18.5Gb INFO: Applying corrections to reads
01:07:48 18.5Gb INFO: Applied correction to 11691 reads
01:07:48 18.5Gb INFO: Corrected low covered regions in 11691 reads with K = 800
01:07:48 18.5Gb INFO: Applying changes to the graph
01:10:22 18.9Gb INFO: Marking reliable edges
01:10:22 18.9Gb INFO: Marked 783 edges in 502 paths as reliable
01:10:22 18.9Gb INFO: Correcting low covered regions in reads with K = 2000
01:10:47 18.9Gb INFO: Applying corrections to reads
01:10:48 18.9Gb INFO: Applied correction to 349 reads
01:10:48 18.9Gb INFO: Corrected low covered regions in 349 reads with K = 2000
01:10:48 18.9Gb INFO: Applying changes to the graph
01:14:12 19Gb INFO: Correcting dinucleotide errors in reads
01:50:50 19Gb INFO: Applying corrections to reads
01:50:51 19Gb INFO: Applied correction to 984 reads
01:50:51 19Gb INFO: Corrected 984 dinucleotide sequences
01:50:51 19Gb INFO: Marking reliable edges
01:50:52 19Gb INFO: Marked 761 edges in 496 paths as reliable
01:50:52 19Gb INFO: Correcting low covered regions in reads
01:52:50 19Gb INFO: Applying corrections to reads
01:53:09 19Gb INFO: Applied correction to 2500 reads
01:53:09 19Gb INFO: Corrected low covered regions in 3377 reads
01:53:09 19Gb INFO: Marking reliable edges
01:53:09 19Gb INFO: Marked 730 edges in 479 paths as reliable
01:53:09 19Gb INFO: Correcting low covered regions in reads with K = 3500
01:53:55 19Gb INFO: Applying corrections to reads
01:53:55 19Gb INFO: Applied correction to 141 reads
01:53:55 19Gb INFO: Corrected low covered regions in 141 reads with K = 3500
01:53:55 19Gb INFO: Applying changes to the graph
01:55:48 19.1Gb INFO: Printing reads to fasta file "lja_ec/k501/corrected.fasta"
02:03:09 2Mb INFO: Initial correction results with k = 501 printed to "lja_ec/k501/corrected.fasta"
02:03:09 2Mb INFO: Performing second phase of error correction using k = 5001
02:03:09 0Mb INFO: Reading reads
02:03:09 0Mb INFO: Extracting minimizers
02:07:22 9.6Gb INFO: Finished read processing
02:07:22 9.6Gb INFO: 66976419 hashs collected. Starting sorting.
02:07:24 11.1Gb INFO: Finished sorting. Total distinct minimizers: 3146794
02:07:25 11.1Gb INFO: Starting construction of sparse de Bruijn graph
02:07:27 11.1Gb INFO: Vertex map constructed.
02:07:27 11.1Gb INFO: Filling edge sequences.
02:17:13 21.8Gb INFO: Finished sparse de Bruijn graph construction.
02:17:13 21.8Gb INFO: Collecting tips
02:17:14 22.3Gb INFO: Added 14638 artificial minimizers from tips.
02:17:14 22.3Gb INFO: Collected 6302810 old edges.
02:17:14 22.3Gb INFO: New minimizers added to sparse graph.
02:17:14 22.3Gb INFO: Refilling graph with old edges.
02:19:10 22.3Gb INFO: Filling graph with new edges.
02:19:10 22.3Gb INFO: Finished fixing sparse de Bruijn graph.
02:19:22 22.6Gb INFO: Starting to extract disjointigs.
02:19:28 22.6Gb INFO: Finished extracting 40566 disjointigs of total size 993515347
02:19:50 0Mb INFO: Loading disjointigs from file "lja_ec/k5001/disjointigs.fasta"
02:20:05 3.3Gb INFO: Filling bloom filter with k+1-mers.
02:21:13 3.3Gb INFO: Filled 3659791109 bits out of 25300632992
02:21:13 3.3Gb INFO: Finished filling bloom filter. Selecting junctions.
02:22:34 3.3Gb INFO: Collected 339193 junctions.
02:22:37 3.3Gb INFO: Starting DBG construction.
02:22:37 3.3Gb INFO: Vertices created.
02:22:55 3.3Gb INFO: Filled dbg edges. Adding hanging vertices
02:22:55 3.3Gb INFO: Added 3 hanging vertices
02:22:55 3.3Gb INFO: Merging unbranching paths
02:22:57 3.3Gb INFO: Ended merging edges. Resulting size 38167
02:23:13 3.3Gb INFO: Cleaning edge coverages
02:23:13 3.3Gb INFO: Collecting alignments of sequences to the graph
02:23:13 3.3Gb INFO: Storing suffixes of read paths of length up to 10000000
02:27:03 13.5Gb INFO: Alignment collection finished. Total length of alignments is 18424956
02:27:03 13.5Gb INFO: Correcting dinucleotide errors in reads
02:27:49 13.5Gb INFO: Applying corrections to reads
02:27:49 13.5Gb INFO: Applied correction to 46 reads
02:27:49 13.5Gb INFO: Corrected 46 dinucleotide sequences
02:27:49 13.5Gb INFO: Marking reliable edges
02:27:49 13.5Gb INFO: Marked 414 edges in 342 paths as reliable
02:27:49 13.5Gb INFO: Correcting low covered regions in reads
02:34:17 13.9Gb INFO: Applying corrections to reads
02:34:18 13.9Gb INFO: Applied correction to 4541 reads
02:34:18 13.9Gb INFO: Corrected low covered regions in 4568 reads
02:34:18 13.9Gb INFO: Collapsing bulges
02:34:19 13.9Gb INFO: Applying corrections to reads
02:34:19 13.9Gb INFO: Applied correction to 7 reads
02:34:19 13.9Gb INFO: Collapsed bulges in 796 reads
02:34:19 13.9Gb INFO: Applying changes to the graph
02:35:14 14.8Gb INFO: Running second round of error correction
02:35:14 14.8Gb INFO: Correcting dinucleotide errors in reads
02:35:22 14.8Gb INFO: Applying corrections to reads
02:35:22 14.8Gb INFO: Applied correction to 3 reads
02:35:22 14.8Gb INFO: Corrected 3 dinucleotide sequences
02:35:22 14.8Gb INFO: Correcting dinucleotide errors in reads
02:35:30 14.8Gb INFO: Applying corrections to reads
02:35:30 14.8Gb INFO: Applied correction to 0 reads
02:35:30 14.8Gb INFO: Corrected 0 dinucleotide sequences
02:35:30 14.8Gb INFO: Marking reliable edges
02:35:30 14.8Gb INFO: Marked 157 edges in 152 paths as reliable
02:35:30 14.8Gb INFO: Correcting low covered regions in reads
02:37:27 14.8Gb INFO: Applying corrections to reads
02:37:27 14.8Gb INFO: Applied correction to 180 reads
02:37:27 14.8Gb INFO: Corrected low covered regions in 197 reads
02:37:27 14.8Gb INFO: Correcting dinucleotide errors in reads
02:37:35 14.8Gb INFO: Applying corrections to reads
02:37:35 14.8Gb INFO: Applied correction to 0 reads
02:37:35 14.8Gb INFO: Corrected 0 dinucleotide sequences
02:37:35 14.8Gb INFO: Remarking reliable edges
02:37:35 14.8Gb INFO: Correcting tips using reliable edge marks
02:37:38 14.8Gb INFO: Applying corrections to reads
02:37:38 14.8Gb INFO: Applied correction to 854 reads
02:37:38 14.8Gb INFO: Collapsing bulges
02:37:39 14.8Gb INFO: Applying corrections to reads
02:37:39 14.8Gb INFO: Applied correction to 8 reads
02:37:39 14.8Gb INFO: Collapsed bulges in 832 reads
02:37:39 14.8Gb INFO: Applying changes to the graph
02:38:32 14.8Gb INFO: Started gap closing procedure
02:38:47 25.1Gb INFO: Found 1044 potential overlaps. Aligning.
02:40:35 25.1Gb INFO: Collected 576 overlaps. Looking for unique overlaps
02:40:46 25.1Gb INFO: Collected 501 unique overlaps.
02:40:46 25.1Gb INFO: Adding new connections to the graph
02:43:32 25.1Gb INFO: Correcting tips using reliable edge marks
02:44:41 25.1Gb INFO: Applying corrections to reads
02:44:41 25.1Gb INFO: Applied correction to 8724 reads
02:44:42 25.1Gb INFO: Applying changes to the graph
02:45:42 25.1Gb INFO: Could not correct 6162 reads. They will be removed.
02:45:43 25.1Gb INFO: Uncorrected reads were removed.
02:45:43 25.1Gb INFO: Applying changes to the graph
02:46:44 25.1Gb INFO: Looking for unique edges
02:46:44 25.1Gb INFO: Marked 1424 long edges as unique
02:46:44 25.1Gb INFO: Marking extra edges as unique based on read paths
02:46:46 25.1Gb INFO: Marked 1643 edges as unique
02:46:46 25.1Gb INFO: Splitting graph with unique edges
02:46:46 25.1Gb INFO: Processing 841 components
02:46:46 25.1Gb INFO: Finished unique edges search. Found 1989 unique edges
02:46:46 25.1Gb INFO: Analysing repeats of multiplicity 2 and looking for additional unique edges
02:46:46 25.1Gb INFO: Finished processing of repeats of multiplicity 2. Found 5 erroneous edges.
02:46:46 25.1Gb INFO: Correcting reads using unique edge extensions
02:46:47 25.1Gb INFO: Applying corrections to reads
02:46:47 25.1Gb INFO: Applied correction to 1298 reads
02:46:47 25.1Gb INFO: Collecting bad edges
02:46:47 25.1Gb INFO: Removed 103 disconnected edges
02:46:49 25.1Gb INFO: Could not correct 264 reads. They will be removed.
02:46:49 25.1Gb INFO: Uncorrected reads were removed.
02:46:49 25.1Gb INFO: Looking for more unique edges
02:46:49 25.1Gb INFO: Finished unique edges search. Found 1192 unique edges
02:46:49 25.1Gb INFO: Looking for more unique edges
02:46:49 25.1Gb INFO: Finished unique edges search. Found 1257 unique edges
02:46:49 25.1Gb INFO: Correcting reads using unique edge extensions
02:46:50 25.1Gb INFO: Applying corrections to reads
02:46:50 25.1Gb INFO: Applied correction to 235 reads
02:46:50 25.1Gb INFO: Collecting bad edges
02:46:50 25.1Gb INFO: Removed 139 disconnected edges
02:46:52 25.1Gb INFO: Could not correct 231 reads. They will be removed.
02:46:52 25.1Gb INFO: Uncorrected reads were removed.
02:46:52 25.1Gb INFO: Attempting to rescue small circular highly covered components
02:46:53 25.1Gb INFO: Could not correct 0 reads. They will be removed.
02:46:53 25.1Gb INFO: Uncorrected reads were removed.
02:46:53 25.1Gb INFO: Rescued 0 circular highly covered components
02:46:53 25.1Gb INFO: Applying changes to the graph
02:47:48 25.1Gb INFO: Started gap closing procedure
02:48:21 38.8Gb INFO: Found 612 potential overlaps. Aligning.
02:49:30 38.8Gb INFO: Collected 91 overlaps. Looking for unique overlaps
02:49:36 38.8Gb INFO: Collected 16 unique overlaps.
02:49:37 38.8Gb INFO: Adding new connections to the graph
02:51:00 38.8Gb INFO: Correcting tips using reliable edge marks
02:51:00 38.8Gb INFO: Applying corrections to reads
02:51:00 38.8Gb INFO: Applied correction to 0 reads
02:51:01 38.8Gb INFO: Applying changes to the graph
02:52:28 38.8Gb INFO: Printing reads to fasta file "lja_ec/k5001/corrected_reads.fasta"
02:59:41 2Mb INFO: Second phase results with k = 5001 printed to "lja_ec/k5001/corrected_reads.fasta"
02:59:41 2Mb INFO: Performing repeat resolution by transforming de Bruijn graph into Multiplex de Bruijn graph
02:59:41 0Mb INFO: Loading graph from fasta
03:00:07 2.2Gb INFO: Finished loading graph
03:00:19 3.1Gb INFO: Looking for unique edges
03:00:19 3.1Gb INFO: Marked 1476 long edges as unique
03:00:19 3.1Gb INFO: Marking extra edges as unique based on read paths
03:00:19 3.1Gb INFO: Marked 1707 edges as unique
03:00:19 3.1Gb INFO: Splitting graph with unique edges
03:00:19 3.1Gb INFO: Processing 828 components
03:00:20 3.1Gb INFO: Finished unique edges search. Found 1878 unique edges
03:00:20 3.1Gb INFO: Analysing repeats of multiplicity 2 and looking for additional unique edges
03:00:20 3.1Gb INFO: Finished processing of repeats of multiplicity 2. Found 0 erroneous edges.
03:00:20 3.1Gb INFO: Resolving repeats
03:00:20 3.1Gb INFO: Constructing paths
03:01:36 5Gb INFO: Building graph
03:01:36 5Gb INFO: Increasing k
03:06:01 5Gb INFO: Finished increasing k
03:06:01 5Gb INFO: Exporting remaining active transitions
03:06:01 5Gb INFO: Export to Dot
03:06:01 5Gb INFO: Export to GFA and compressed contigs
03:06:58 5.7Gb INFO: Finished repeat resolution
03:07:07 2Mb INFO: Performing polishing and homopolymer uncompression
03:07:42 0.5Gb INFO: Aligning reads back to assembly
03:17:28 20.7Gb INFO: Finished alignment.
03:17:28 20.7Gb INFO: Printing alignments to "lja_ec/uncompressing/alignments.txt"
03:18:04 35.3Gb INFO: Reading and processing initial reads from ["reads.fa"]
04:06:51 36.9Gb INFO: Uncompressing homopolymers in contigs
Child process crashed
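
In case it helps with triage, here is a rough Python sketch for pulling the peak reported memory and the last stage reached out of a log like the one above. It is not part of LJA. It assumes each entry sits on its own line in the "HH:MM:SS <memory>Gb|Mb INFO: message" layout seen in this log, and that 1 Gb = 1024 Mb (a guess). The names parse_line and summarize are made up for this example.

```python
import re

# Layout assumed from the log above: "03:18:04 35.3Gb INFO: message"
LINE_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})\s+([\d.]+)(Gb|Mb)\s+INFO:\s+(.*)$")


def parse_line(line):
    """Return (elapsed_seconds, memory_gb, message), or None for non-log lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hours, minutes, seconds, mem, unit, message = m.groups()
    elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    # Assumption: 1 Gb = 1024 Mb; the log itself does not say.
    mem_gb = float(mem) if unit == "Gb" else float(mem) / 1024
    return elapsed, mem_gb, message


def summarize(lines):
    """Report the peak memory entry and the last timestamped entry."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    if not entries:
        return None
    peak = max(entries, key=lambda e: e[1])
    last = entries[-1]
    return {
        "peak_gb": peak[1],
        "peak_message": peak[2],
        "last_seconds": last[0],
        "last_message": last[2],
    }


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "02:48:21 38.8Gb INFO: Found 612 potential overlaps. Aligning.",
        '03:18:04 35.3Gb INFO: Reading and processing initial reads from ["reads.fa"]',
        "04:06:51 36.9Gb INFO: Uncompressing homopolymers in contigs",
        "Child process crashed",
    ]
    print(summarize(sample))
```

On those four lines it reports a peak of 38.8 Gb during the second gap closing step and "Uncompressing homopolymers in contigs" as the last stage logged before the crash. Lines without a timestamp, such as "Child process crashed", are skipped.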