Merge data

Many workflows benefit from running DRAM2 processes separately in a map-reduce or scatter-gather pattern, as is common with Snakemake. In that case, you need to combine the results after the runs complete, which is what the merge command does.
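The following Python sketch gives a rough idea of what the gather step involves for annotations: it concatenates several annotations.tsv tables into one. It is not DRAM2's merge code. The helper name merge_annotation_tables is hypothetical, and the sketch assumes that each table is tab-separated, has a header row, and holds a gene identifier in its first column under the same header name in every file.

```python
import csv


def merge_annotation_tables(paths, out_path):
    """Concatenate tab-separated annotation tables into one file.

    Hypothetical helper, not DRAM2's implementation. Assumes each file has a
    header row and that its first column holds a gene identifier.
    Columns are unioned in first-seen order; cells a file does not have are
    written as empty strings. Duplicate gene identifiers raise an error
    instead of silently overwriting earlier rows.
    """
    fieldnames = []
    rows = []
    seen_ids = set()
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            if not reader.fieldnames:
                continue  # skip empty files
            # Union the columns, keeping the order they were first seen in.
            for name in reader.fieldnames:
                if name not in fieldnames:
                    fieldnames.append(name)
            id_column = reader.fieldnames[0]
            for row in reader:
                gene_id = row[id_column]
                if gene_id in seen_ids:
                    raise ValueError(f"duplicate gene id {gene_id!r} in {path}")
                seen_ids.add(gene_id)
                rows.append(row)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", restval=""
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Unioning the columns keeps every annotation even when the runs produced different columns, and refusing duplicate gene IDs avoids one way that merged data could be silently overwritten.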

Limitations

At this time, only the merging of gene calls and annotations is supported. These are the first two steps and the most time-consuming parts of DRAM. Once these two steps have been run separately and merged, creating a new distillate and annotation file is almost always straightforward. However, as datasets grow, DRAM2's post-processing steps become increasingly cumbersome. The long-term goal is for all DRAM2 outputs to be mergeable.

Inputs:

DRAM2 output folders (argument)
Type: directory paths to DRAM2 outputs
Notes: Unless you pass the force flag, the folders must contain DRAM2 project config files. When a folder has no config file, DRAM2 looks for the "genes" folder and the "annotations.tsv" file by name and skips the folder if neither is found.

annotations file (-a/--annotations)
Type: file path pointing to an annotations.tsv file
Notes: Use with the force flag. If your annotations are in a custom location, you can include them with this option. Repeat the option as many times as needed to point to as many files as you want.

genes folder (-g/--genes)
Type: directory path to DRAM2-called genes
Notes: Use with the force flag. If you moved the genes directories for any reason, you can point to each of them by repeating this flag. The genes are combined in the output only if there are no collisions in their names.

force (-f/--force)
Type: flag
Notes: Skip all config checks and merge the files regardless of the consequences. If you use this flag, you accept responsibility for the state of the result: overwritten data and incompatible annotations may result.
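The folder fallback and the gene-name collision rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DRAM2's code: find_mergeable_parts and find_name_collisions are hypothetical names, the sketch ignores DRAM2's project config files entirely, and it treats every entry inside a genes folder (file or subfolder) as a name that could collide.

```python
from pathlib import Path


def find_mergeable_parts(output_dirs):
    """Collect 'annotations.tsv' files and 'genes' folders by name.

    Hypothetical helper mirroring the by-name fallback; it does not read
    DRAM2 project config files. Folders containing neither a 'genes'
    folder nor an 'annotations.tsv' file are reported as skipped.
    """
    annotations, genes_dirs, skipped = [], [], []
    for out_dir in map(Path, output_dirs):
        ann = out_dir / "annotations.tsv"
        genes = out_dir / "genes"
        if not ann.is_file() and not genes.is_dir():
            skipped.append(out_dir)
            continue
        if ann.is_file():
            annotations.append(ann)
        if genes.is_dir():
            genes_dirs.append(genes)
    return annotations, genes_dirs, skipped


def find_name_collisions(genes_dirs):
    """Map each entry name found in more than one genes folder to those folders."""
    owners = {}
    for genes_dir in genes_dirs:
        for entry in Path(genes_dir).iterdir():
            owners.setdefault(entry.name, []).append(Path(genes_dir))
    return {name: dirs for name, dirs in owners.items() if len(dirs) > 1}
```

Reporting collisions rather than renaming entries follows the rule that genes are only combined when their names do not collide; deciding how to resolve a collision is left to the user.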

Output:

The output is a DRAM2 output directory named by the value passed to the dram2 command's -o/--output option. This directory contains all the data specified by the input options above, plus a new project_metadata file.

Examples: