Tutorials
Walkthrough
Learn the basics
Activation Patching
Causal intervention
Attribution Patching
Approximate patching
Boundless DAS
Identifying Causal Mechanisms in Alpaca
Dictionary Learning
Sparse autoencoders
Logit Lens
Decode activations
LoRA
Fine-tuning for sentiment analysis
- LoRA for Sentiment Analysis
  - Setup
  - Prepare Data
  - Prepare our Model
  - LLM Fine Tuning
- Activation Patching
  - Set Up
  - Patching Experiment
  - Limitations
  - Trying on a bigger model
- Attribution Patching
- Boundless DAS
  - Setup (Ignore)
  - Price Tagging game
  - Prealign Task
  - Boundless DAS
- Dictionary Learning
  - Setup
  - Apply SAE
- Logit Lens
- Walkthrough
  - 1️⃣ First, let’s start small
  - 2️⃣ Bigger
  - 3️⃣ I thought you said huge models?
  - Next Steps
  - Getting Involved!