tLLM: Building Energy-Optimal Systems for On-Device LLM Inference (ongoing)
SBVR Kernels: Hardware-Friendly Kernel Design of SBVR (ongoing)
HPC Project: Optimizing GPT-2 on Multi-Node, Multi-GPU Environments (past)