
⚖️ Applying World Model Learning to Legal Document Classification

When a lawyer reads a court document, they don't treat it as a bag of words. They track relationships: what did the court decide, what did the plaintiff argue, what did the defendant claim — and how do these connect to the constitutional question at stake?

I built a model that tries to learn this way.

THE CORE IDEA

Instead of fine-tuning LegalBERT with a standard classification head, I adapted Causal-JEPA (Nam et al., 2026) — a world model for video — to treat each legal document as a world of interacting objects:

  • Ruling chunks (court / judge / ordered...)
  • Plaintiff chunks (petitioner / appellant...)
  • Defendant chunks (respondent / appellee...)
  • Other chunks
  • Constitutional category label ← the fifth slot

The category is not a label stamped on top. It is a fifth object, participating in the same masked prediction learning as the legal content.
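As a concrete illustration of the slot decomposition, here is a minimal sketch of keyword-based chunk-to-slot assignment. The keyword lists are examples drawn from the bullets above (the post elides the full lists), and the function name is hypothetical, not the project's actual code.

```python
# Hedged sketch: route document chunks to slots by keyword matching.
# Keyword lists are illustrative; the real project's lists are longer.
SLOT_KEYWORDS = {
    "ruling":    ["court", "judge", "ordered"],
    "plaintiff": ["plaintiff", "petitioner", "appellant"],
    "defendant": ["defendant", "respondent", "appellee"],
}

def assign_slot(chunk: str) -> str:
    """Return the first slot whose keyword appears in the chunk."""
    lowered = chunk.lower()
    for slot, keywords in SLOT_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return slot
    return "other"  # fallback slot for unmatched chunks
```

First match wins here; a real pipeline might score all slots and pick the strongest, but the post notes assignment is simple keyword matching, not learned.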

HOW IT LEARNS

Some slots are hidden, and the model infers them from the visible ones — in both directions:

  • Label masked → infer category from ruling + plaintiff + defendant
  • Ruling masked → infer ruling content using category as context
  • Defendant masked → infer defendant content given category + plaintiff

This forces the model to learn what constitutional categories mean in terms of how legal language relates across slots — not just surface word correlations.
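The bidirectional masking schedule above can be sketched as sampling one of the listed patterns per training step. The pattern set and helper name are illustrative assumptions; the actual training code may mask slots differently.

```python
import random

# All five slots: four document slots plus the category-as-object slot.
SLOTS = ["ruling", "plaintiff", "defendant", "other", "category"]

# Hedged sketch of the masking patterns described in the post.
MASK_PATTERNS = [
    {"category"},   # infer label from ruling + plaintiff + defendant
    {"ruling"},     # infer ruling content using category as context
    {"defendant"},  # infer defendant content given category + plaintiff
]

def sample_mask(rng=random):
    """Pick one pattern; return (masked slots, visible slots)."""
    masked = rng.choice(MASK_PATTERNS)
    visible = [s for s in SLOTS if s not in masked]
    return masked, visible
```

Because the category slot is masked with the same mechanism as the content slots, both directions of inference fall out of a single objective.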

TWO-STAGE TRAINING

Stage 1 — C-JEPA pretraining: self-supervised slot interaction learning, no classification loss. When the category slot is masked, the model predicts it using a Rectified Generalized Gaussian loss (Kuang et al., 2026) rather than MSE — encouraging sparse, well-separated category representations in the latent space. Document slots (ruling, plaintiff, defendant) use standard MSE reconstruction.
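One plausible reading of that loss is a generalized Gaussian error term |e|^β (β < 2 tolerates large residuals more than MSE, encouraging sparse separation) with a rectification that zeroes sub-threshold errors. This sketch is my interpretation, not the exact form in Kuang et al.; the threshold-based rectification in particular is an assumption.

```python
import torch

def rgg_loss(pred, target, beta=1.0, threshold=0.0):
    # Hedged sketch of a Rectified Generalized Gaussian loss:
    # generalized Gaussian error |e|**beta, with errors below
    # `threshold` rectified to zero (assumed form, not the paper's).
    err = (pred - target).abs()
    err = torch.relu(err - threshold)  # rectification step (assumption)
    return err.pow(beta).mean()
```

With beta=2 and threshold=0 this reduces to MSE, which is a useful sanity check against the document slots' standard reconstruction loss.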

Stage 2 — Encoder frozen. Lightweight classification head trained with cross-entropy on mean document slot representations, mirroring the paper's ALOE evaluation design.
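The Stage 2 setup can be sketched as a frozen encoder feeding a linear probe on mean-pooled slot representations. Class and argument names here are hypothetical; only the structure (frozen encoder, mean pooling, cross-entropy-trained linear head) comes from the post.

```python
import torch
import torch.nn as nn

class FrozenProbe(nn.Module):
    """Hedged sketch of Stage 2: frozen encoder + linear head."""

    def __init__(self, encoder: nn.Module, dim: int, n_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # Stage 2: encoder stays frozen
        self.head = nn.Linear(dim, n_classes)  # trained with cross-entropy

    def forward(self, slot_inputs):
        with torch.no_grad():
            slots = self.encoder(slot_inputs)  # (batch, n_slots, dim)
        pooled = slots.mean(dim=1)             # mean over document slots
        return self.head(pooled)               # class logits
```

Training then optimizes only `self.head` with `nn.CrossEntropyLoss`, so the probe measures what the pretraining alone encoded.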

HONEST CAVEATS

  • No true t₀ anchor: legal documents are static, so the code approximates the paper's first-frame identity anchor as the first visible chunk of the same slot type.
  • No action signal: all five slots are object states. The paper's Uₜ (action/proprioception) has no legal equivalent — absent by design.
  • Slot assignment by keyword matching, not learned.

Full code in PyTorch + HuggingFace on my website. Curious whether others have tried slot-based approaches for legal NLP.


#LegalAI #NLP #MachineLearning #WorldModels #JEPA #LegalBERT #LegalTech #AIResearch