Target:
How can we improve the interpretability of deep learning models applied to source code?
Project Objectives:
Develop a Web-Based Annotation and Visualization Tool
- Convert existing Tkinter annotation tools to a web application.
- Visualize concept clusters with interactive elements such as word clouds, dendrograms, and sunburst charts.
- Integrate manual annotations and LLM-generated labels.
Generate Interpretability Datasets Using RAID
- Use the RAID (Rapid Automatic Interpretability Datasets) tool to create labeled datasets.
- Use Tree-sitter to parse source code into syntax trees, then apply B-I-O (Beginning, Inside, Outside) labeling to the tokens.
- Present the datasets in a browsable view similar to Hugging Face's dataset viewer.
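As an illustration of the B-I-O labeling step, here is a minimal sketch in Python. It is not RAID's implementation. The regex tokenizer, the `bio_tags` helper, and the `PARAMS` and `BINOP` labels are all hypothetical, and the labeled character ranges are written by hand. In practice they would presumably come from the start and end offsets of Tree-sitter nodes.

```python
import re
from typing import List, Tuple

# (start_char, end_char, label); end is exclusive. Hypothetical structure.
LabeledSpan = Tuple[int, int, str]


def tokenize(code: str) -> List[Tuple[str, int, int]]:
    """Split code into (text, start, end) tokens with a simple regex.

    Assumption: a stand-in for whatever tokenizer the model actually uses.
    """
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", code)]


def bio_tags(tokens: List[Tuple[str, int, int]],
             spans: List[LabeledSpan]) -> List[str]:
    """Assign a B-/I-/O tag to every token.

    The first token fully inside a span gets B-<label>, later tokens in the
    same span get I-<label>, and tokens outside every span stay O.
    Tokens already tagged by an earlier span are not overwritten.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        first = True
        for i, (_, t_start, t_end) in enumerate(tokens):
            if t_start >= start and t_end <= end and tags[i] == "O":
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags


code = "def add(a, b): return a + b"
tokens = tokenize(code)

# Hand-written spans standing in for syntax-tree node ranges:
# "(a, b)" covers characters 7..13 and "a + b" covers characters 22..27.
spans = [(7, 13, "PARAMS"), (22, 27, "BINOP")]

tags = bio_tags(tokens, spans)
for (text, _, _), tag in zip(tokens, tags):
    print(f"{text}\t{tag}")
```

Each token is paired with exactly one tag, so the result can be stored as two parallel columns (tokens and tags) in a labeled dataset.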
Ideally, we will combine these functionalities into a single app and/or repository. The aim is an app that both industry professionals and academics can use to improve their models or advance their research on deep learning and language models.
Figures
Web App Pipeline

RAID Workflow
