## Running clues v1 for simulation study

You run these after your isweep `workflow/simulate/` is made.

1. `conda env create -f clues-environment.yaml`
2. `git clone https://github.com/standard-aaron/clues.git`
3. `python coarse-coal.py` (see args; put under main experiment directory)
  - Too fine-grained of epochs may run very slowly.
4. `bcftools query -l some_vcf | head -n some_number > relate.subsample.txt` (under main experiment directory)
  - Too many samples may run very slowly.
5. Download clues/ from GitHub and place it somewhere on local.
  - https://github.com/standard-aaron/clues
6. Modify the output of `clues/inference.py`. See below.
7. Download Relate software and place it somewhere on local.
  * https://myersgroup.github.io/relate/index.html
  - Place it in your same software folder as other browning lab software
8. Update the clues.yaml file to reflect files, folders on local.
9. `snakemake all -c1 -n --configfile clues.yaml`
  - Do a dry run of snakemake.
10. `nohup snakemake all -c1 --configfile clues.yaml --jobs 1 --keep-going --cluster "[options]" & ``
  - See snakemake documentation to modify options.

<!-- **Note that** the `terminalscripts/inference.py` is from the CLUES developers.
https://github.com/standard-aaron/clues
https://github.com/avaughn271/CLUES2/tree/main -->

### How to modify `clues/inference.py`

At line 345, instead of

```
print('#'*10)
print()
print('logLR: %.4f'%(-res.fun+logL0))
print()
print('MLE:')
print('========')
print('epoch\tselection')
for s,t,u in zip(S,timeBins[:-1],timeBins[1:]):
        print('%d-%d\t%.5f'%(t,u,s))
```

change the script to say

```
print('#'*10)
print()
print('logLR: %.4f'%(-res.fun+logL0))
print()
print('MLE:')
print('========')
print('epoch\tselection')
for s,t,u in zip(S,timeBins[:-1],timeBins[1:]):
        print('%d-%d\t%.5f'%(t,u,s))

fileout=open(args.out+'.selcoef','w')
fileout.write(str(S))
fileout.write('\n')
fileout.close()
```

At line 177, instead of

```
epochs = np.arange(0.0,tCutoff,int(args.tSkip))
# loading population size trajectory
if args.coal != None:
  Nepochs = np.genfromtxt(args.coal,skip_header=1,skip_footer=1)
  N = 0.5/np.genfromtxt(args.coal,skip_header=2)[2:-1]
  N = np.array(list(N)+[N[-1]])
  Ne = N[np.digitize(epochs,Nepochs)-1]
else:
  Ne = args.N * np.ones(int(tCutoff))
```

change the script to say

```
epochs = np.arange(0.0,tCutoff,int(args.tSkip))
# loading population size trajectory
if args.coal != None:
  Nepochs = np.genfromtxt(args.coal,skip_header=1,skip_footer=1)
  N = 0.5/np.genfromtxt(args.coal,skip_header=2)[2:-1]
  N = np.array(list(N)+[N[-1]])
  Ne = []
  digitized = np.digitize(epochs,Nepochs)-1
  for d in digitized:
    try:
      Ne.append(N[d])
    except:
      Nlast=N[-1]
      Ne.append(Nlast)
  Ne = np.array(Ne)
else:
  Ne = args.N * np.ones(int(tCutoff))
```

This ought to be more robust to the last index of population size array

Be careful of an error in tabs and spaces if modifying with vim.

I recommend modifying on a local computer text editor. Then, copy over to cluster.

### Debugging

- Addressed an issue with RelateFileFormats wanting uncompressed vcfs.
- Sometimes there isn't a SNP mapping to a tree. Crashes the relate_branch rule.
- clues modified: output selection coefficient, last index for Ne 
- Plans to compare against clues v2?
  - `git clone https://github.com/avaughn271/CLUES2`