Now:
- Finish C backend
- Combine nested maps into single ParFor 
- Extend C backend to turn ParFor/IndexReduce/IndexScan expressions into CUDA kernels 
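
As a rough illustration of the nested-map item above, the sketch below shows the index arithmetic involved in collapsing two nested maps into one flat loop. The names `nested_maps` and `flat_parfor` are hypothetical, and the loop runs sequentially here; a real ParFor would distribute the flat index range across workers.

```python
def nested_maps(f, rows):
    # Two nested maps: outer over rows, inner over the elements of each row.
    return [[f(x) for x in row] for row in rows]

def flat_parfor(f, rows):
    # Same computation expressed as one loop over a flat index space.
    # Assumes a rectangular input (every row has the same length).
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    out = [[None] * n_cols for _ in range(n_rows)]
    for idx in range(n_rows * n_cols):  # iterations are independent of each other
        i, j = divmod(idx, n_cols)      # recover the 2D position from the flat index
        out[i][j] = f(rows[i][j])
    return out
```

Because each iteration touches a distinct output slot, the flat loop has no ordering constraints, which is what makes it a candidate for a single parallel launch.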

Long term:
- Indexing by boolean masks
- Support the 'out' (output array) parameter of ufuncs
- Garbage collection 
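
For reference, the first two items above correspond to the following NumPy behavior, which is presumably what needs to be matched. This is only a sketch of the target semantics in plain NumPy, not of how the compiler would implement them.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -4.0])

# Boolean-mask indexing: keep the elements where the mask is True.
mask = x > 0
positives = x[mask]

# Masked assignment: overwrite only the selected elements.
clipped = x.copy()
clipped[clipped < 0] = 0.0

# The ufunc 'out' parameter writes the result into a preallocated array
# and returns that same array object.
buf = np.empty_like(x)
result = np.add(x, 1.0, out=buf)
```

Note that the shape of `x[mask]` depends on the data in the mask, so the result size is only known at run time.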


On pause:
- Adverb semantics for conv
- Code generation for conv

Maybe never?
- Adverb-level vectorization 
- Split up outermost scope into an execution plan,
  i.e. [local_vars1 = run_llvm(fn1, inputs),
        local_vars2 = parfor(fn2, inputs, local_vars1),
        ...
        return result_vars
       ]
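
One possible shape for such a plan, sketched in plain Python. Everything here is hypothetical: `run_plan`, `run_serial`, and `run_parfor` are stand-in names, and the runners just call Python functions where a real version would invoke compiled code.

```python
def run_plan(plan, inputs, result_names):
    # plan: list of (output_name, runner, fn, argument_names).
    # Each step reads its arguments from the environment and binds its output there.
    env = dict(inputs)
    for out_name, runner, fn, arg_names in plan:
        env[out_name] = runner(fn, *[env[name] for name in arg_names])
    return [env[name] for name in result_names]

# Stand-in runners (hypothetical): a real version would call compiled code.
def run_serial(fn, *args):
    return fn(*args)

def run_parfor(fn, xs, *rest):
    # Sequential stand-in for a parallel loop over the first argument.
    return [fn(x, *rest) for x in xs]

plan = [
    ("scale", run_serial, lambda xs: max(xs), ["xs"]),
    ("normed", run_parfor, lambda x, s: x / s, ["xs", "scale"]),
]
result = run_plan(plan, {"xs": [1.0, 2.0, 4.0]}, ["normed"])
```

Keeping the plan as data (a list of steps) rather than as nested calls would let each step be dispatched to a different backend.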
       
Old:
- Only run tiling on perfectly nested code
