⚖️ Applying World Model Learning to Legal Document Classification
When a lawyer reads a court document, they don't treat it as a bag of words. They track relationships: what did the court decide, what did the plaintiff argue, what did the defendant claim — and how do these connect to the constitutional question at stake?
I built a model that tries to learn this way.
THE CORE IDEA
Instead of fine-tuning LegalBERT with a standard classification head, I adapted Causal-JEPA (Nam et al., 2026) — a world model for video — to treat each legal document as a world of interacting objects:
• Ruling chunks (court / judge / ordered...)
• Plaintiff chunks (petitioner / appellant...)
• Defendant chunks (respondent / appellee...)
• Other chunks
• Constitutional category label ← the fifth slot
The category is not a label stamped on top. It is a fifth object, participating in the same masked prediction learning as the legal content.
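To make the five-slot idea concrete, here is a minimal sketch of the state layout, one latent vector per slot, with the category sitting alongside the document slots rather than on top of them. Names and the latent dimension are illustrative, not taken from the released code.

```python
import torch

# Hypothetical slot layout: four document slots plus the category slot,
# each a d-dimensional latent vector (names and D are assumptions).
SLOTS = ["ruling", "plaintiff", "defendant", "other", "category"]
D = 64  # latent dimension (assumed)

def make_slot_state(batch_size: int, d: int = D) -> torch.Tensor:
    # One latent per slot -> shape (batch, num_slots, d).
    # The category occupies index 4, the same tensor as the content slots.
    return torch.zeros(batch_size, len(SLOTS), d)

state = make_slot_state(2)
```

Because the label lives in the same tensor as the content, masking it is just indexing, and the same predictor handles every slot.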
HOW IT LEARNS
Some slots are hidden, and the model infers them from the visible ones — in both directions:
• Label masked → infer category from ruling + plaintiff + defendant
• Ruling masked → infer ruling content using category as context
• Defendant masked → infer defendant content given category + plaintiff
This forces the model to learn what constitutional categories mean in terms of how legal language relates across slots — not just surface word correlations.
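A rough sketch of what that bidirectional masking can look like in PyTorch (illustrative, not the actual C-JEPA code): replace the hidden slot with a learned mask token, let the visible slots fill it in through self-attention, then score the prediction against the original latent.

```python
import torch
import torch.nn as nn

class SlotPredictor(nn.Module):
    # Minimal masked-slot predictor: any slot, content or category,
    # can be hidden and inferred from the others.
    def __init__(self, d: int = 64):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, slots: torch.Tensor, masked_idx: int) -> torch.Tensor:
        x = slots.clone()
        x[:, masked_idx] = self.mask_token   # hide the target slot
        x = self.mixer(x)                    # visible slots supply context
        return x[:, masked_idx]              # predicted latent for the slot

slots = torch.randn(2, 5, 64)
pred = SlotPredictor()(slots, masked_idx=4)  # label masked -> infer category
loss = nn.functional.mse_loss(pred, slots[:, 4])
```

Swapping `masked_idx` from 4 (category) to 0 (ruling) gives the reverse direction with no change to the model, which is the point of treating the label as a slot.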
TWO-STAGE TRAINING
Stage 1 — C-JEPA pretraining: self-supervised slot interaction learning, no classification loss. When the category slot is masked, the model predicts it using a Rectified Generalized Gaussian loss (Kuang et al., 2026) rather than MSE — encouraging sparse, well-separated category representations in the latent space. Document slots (ruling, plaintiff, defendant) use standard MSE reconstruction.
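For readers unfamiliar with the loss family: a generalized Gaussian loss raises the residual to a shape power β instead of squaring it, and β < 2 pushes latents toward sparse, well-separated solutions. The sketch below is my own guess at the shape of such a loss; the threshold term is one plausible reading of "rectified", and none of this is taken from Kuang et al.

```python
import torch

def generalized_gaussian_loss(pred: torch.Tensor,
                              target: torch.Tensor,
                              beta: float = 0.8,
                              eps: float = 0.0) -> torch.Tensor:
    # |pred - target|^beta with beta < 2: relative to MSE, this rewards
    # driving residuals all the way to zero, encouraging sparsity.
    # The ReLU threshold (eps) is a hypothetical "rectification" that
    # ignores residuals below a margin.
    resid = torch.relu((pred - target).abs() - eps)
    return (resid ** beta).mean()

x = torch.randn(8, 64)
zero_loss = generalized_gaussian_loss(x, x)        # identical inputs
pos_loss = generalized_gaussian_loss(x, torch.zeros_like(x))
```

With β = 2 and eps = 0 this reduces to (unnormalized) MSE, so the document slots and the category slot can share one loss function with different shape parameters.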
Stage 2 — Frozen-encoder probing: the encoder is frozen, and a lightweight classification head is trained with cross-entropy on the mean of the document slot representations, mirroring the paper's ALOE evaluation design.
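Stage 2 in sketch form: freeze everything pretrained, average the document slot latents, and fit only a linear head. The stand-in encoder, slot indices, and class count below are assumptions for illustration.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 7  # hypothetical number of constitutional categories

encoder = nn.Linear(64, 64)  # stand-in for the pretrained C-JEPA encoder
for p in encoder.parameters():
    p.requires_grad = False  # Stage 2: encoder frozen

head = nn.Linear(64, NUM_CLASSES)  # the only trainable module

slots = torch.randn(2, 5, 64)              # (batch, slots, dim)
doc_repr = encoder(slots[:, :3]).mean(1)   # mean over ruling/plaintiff/defendant
                                           # (assumed to be indices 0..2)
logits = head(doc_repr)
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 3]))
```

Only `head.parameters()` would go to the optimizer, so the probe measures what the self-supervised stage already encoded rather than learning the task from scratch.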
HONEST CAVEATS
• No true t₀ anchor: legal documents are static, so the code approximates the paper's first-frame identity anchor with the first visible chunk of the same slot type.
• No action signal: all five slots are object states. The paper's Uₜ (action/proprioception) has no legal equivalent — absent by design.
• Slot assignment by keyword matching, not learned.
Full code in PyTorch + HuggingFace on my website. Curious whether others have tried slot-based approaches for legal NLP.
#LegalAI #NLP #MachineLearning #WorldModels #JEPA #LegalBERT #LegalTech #AIResearch