1. How to Ground a Language for Legal Discourse in a Prototypical Perceptual Semantics. L. Thorne McCarty, Rutgers University.
2. Background Papers. An Implementation of Eisner v. Macomber, in ICAIL-'95: a computational reconstruction of the 1920 corporate tax case, based on a theory of prototypes and deformations. Some Arguments About Legal Arguments, in ICAIL-'97: a critical review of the literature, with a discussion of The Correct Theory in Section 5: Legal reasoning is a form of theory construction... A judge rendering a decision in a case is constructing a theory of that case... If we are looking for a computational analogue of this phenomenon, the first field that comes to mind is machine learning...
3. ICAIL-'97, Section 5: Most machine learning algorithms
assume that concepts have classical definitions, with necessary and
sufficient conditions, but legal concepts tend to be defined by
prototypes. When you first look at prototype models [Smith and
Medin, 1981], they seem to make the learning problem harder, rather
than easier, since the space of possible concepts seems to be
exponentially larger in these models than it is in the classical
model. But empirically, this is not the case. Somehow, the
requirement that the exemplar of a concept must be similar to a
prototype (a kind of horizontal constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of vertical constraint). How
is this possible? This is one of the great mysteries of cognitive
science. It is also one of the great mysteries of legal theory.
...
4. Summary. Contemporary trends in machine learning have now shed new light on the subject. In this paper, I will describe my recent work on manifold learning: Clustering, Coding and the Concept of Similarity, arXiv:1401.2411 [cs.LG] (10 Jan 2014); and my work in progress on deep learning (forthcoming, 2015): Differential Similarity in Higher Dimensional Spaces: Theory and Applications, and Deep Learning with a Riemannian Dissimilarity Metric. Taken together, this work leads to a logical language grounded in a prototypical perceptual semantics, with implications for legal theory.
5. Prototype Coding. What is prototype coding? The basic idea is to represent a point in an n-dimensional space by measuring its distance from a prototype in several specified directions. Furthermore, we want to select a prototype that lies at the origin of an embedded, low-dimensional, nonlinear subspace, which is in some sense optimal.
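To make the basic idea concrete, here is a minimal sketch in Python, under the simplest linear reading: the code of a point is the vector of projections of its offset from the prototype onto a set of specified unit directions. The function names are illustrative, and the later slides replace these straight-line directions with geodesic coordinates.

import numpy as np

def prototype_code(x, prototype, directions):
    """Code a point by its offsets from a prototype.

    x          : (n,) point to encode
    prototype  : (n,) the prototype
    directions : (k, n) rows are unit vectors, the specified directions

    Returns the (k,) vector of signed distances from the prototype
    along each direction, i.e. the projections of (x - prototype).
    """
    offset = x - prototype
    return directions @ offset

# Illustrative usage in R^3, coding onto 2 directions.
rng = np.random.default_rng(0)
p = np.array([1.0, 0.0, 0.0])
dirs = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
x = p + 0.3 * dirs[0] - 0.1 * dirs[1] + rng.normal(scale=1e-3, size=3)
print(prototype_code(x, p, dirs))   # approximately [0.3, -0.1]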
6. Manifold Learning S. Rifai, Y.N. Dauphin, P. Vincent, Y.
Bengio, X. Muller, The Manifold Tangent Classifier, in NIPS 2011:
Three hypotheses: 1. ... 2. The (unsupervised) manifold hypothesis,
according to which real world data presented in high dimensional
spaces is likely to concentrate in the vicinity of non-linear
sub-manifolds of much lower dimensionality ... [citations omitted]
3. The manifold hypothesis for classification, according to which
points of different classes are likely to concentrate along
different sub-manifolds, separated by low density regions of the
input space.
7. Manifold Learning. The Probabilistic Model: Brownian motion with a drift term. More precisely, a diffusion process generated by the differential operator $\tfrac{1}{2}\Delta + \tfrac{1}{2}\nabla U(x)\cdot\nabla$. The invariant probability measure is proportional to $e^{U(x)}$. Thus $\nabla U(x)$ is the gradient of the log of the probability density.
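As a sanity check on the probabilistic model, here is a minimal simulation sketch in Python. It assumes the generator written above, $\tfrac{1}{2}\Delta + \tfrac{1}{2}\nabla U(x)\cdot\nabla$, whose sample paths follow the SDE $dX = \tfrac{1}{2}\nabla U(X)\,dt + dW$. The potential $U(x) = -|x|^2$ is an illustrative choice; for it, the invariant density $e^{U(x)}$ is a Gaussian with variance 1/2 per coordinate, which the empirical samples should match.

import numpy as np

# Illustrative potential: a single mode at the origin,
# U(x) = -|x|^2, so the invariant density e^{U(x)} is a Gaussian.
def grad_U(x):
    return -2.0 * x

def simulate(n_steps=20000, dt=1e-3, dim=2, seed=0):
    """Euler-Maruyama for dX = (1/2) grad U(X) dt + dW,
    the diffusion generated by (1/2) Laplacian + (1/2) grad U . grad."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    path = np.empty((n_steps, dim))
    for t in range(n_steps):
        noise = rng.normal(scale=np.sqrt(dt), size=dim)
        x = x + 0.5 * grad_U(x) * dt + noise
        path[t] = x
    return path

path = simulate()
# The empirical variance of the samples should approach that of the
# invariant density e^{-|x|^2}, i.e. about 0.5 per coordinate.
print(path[5000:].var(axis=0))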
8. Manifold Learning. (Figures: two example potentials, one $U(x,y,z)$ built from sixth-power terms and one built from fifth-power terms, shown as plots.)
9. Manifold Learning. The Geometric Model: To implement the idea of prototype coding, we choose: A radial coordinate, $r$, which follows $\nabla U(x)$. The directional coordinates, $\theta_1, \theta_2, \ldots, \theta_{n-1}$, orthogonal to $\nabla U(x)$. But we actually want a lower-dimensional subspace, obtained by projecting our diffusion process onto a $(k-1)$-dimensional subset of the directional coordinates. The device we need is a Riemannian metric, $g_{ij}(x)$, which we interpret as a measure of dissimilarity. Crucially, the dissimilarity metric should depend on the probability measure.
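In coordinates, this construction starts from a local frame at each point: the radial direction along $\nabla U(x)$, plus an orthonormal basis for the directional coordinates orthogonal to it. A minimal numpy sketch; the helper name and the QR construction are illustrative.

import numpy as np

def local_frame(grad_u, seed=0):
    """Split R^n at a point into the radial direction (along grad U)
    and n-1 directional axes orthogonal to it.

    grad_u : (n,) the gradient of U at the point.
    Returns (radial, directions): radial is a (n,) unit vector and
    directions is a (n-1, n) orthonormal basis of its complement.
    """
    n = grad_u.shape[0]
    radial = grad_u / np.linalg.norm(grad_u)
    # Complete 'radial' to an orthonormal basis of R^n via QR on a
    # random (almost surely full-rank) completion.
    rng = np.random.default_rng(seed)
    m = np.column_stack([radial, rng.normal(size=(n, n - 1))])
    basis, _ = np.linalg.qr(m)
    if np.dot(basis[:, 0], radial) < 0:
        basis[:, 0] *= -1   # fix sign of the radial column
    return basis[:, 0], basis[:, 1:].T

# Example: U(x) = -|x - p|^2 around a prototype p, so grad U = -2(x - p)
# and the radial direction at x points from x toward the prototype.
p = np.array([1.0, 1.0, 1.0])
x = np.array([2.0, 1.0, 1.0])
radial, dirs = local_frame(-2.0 * (x - p))
print(radial)         # [-1, 0, 0]: toward the prototype
print(dirs @ radial)  # ~[0, 0]: directional axes orthogonal to it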
10. Manifold Learning. Find a principal axis for the radial coordinate. Choose the principal directions for the $\theta_1, \theta_2, \ldots, \theta_{k-1}$ coordinates. To compute the coordinate curves, follow the geodesics of the Riemannian metric in each of the $k-1$ principal directions.
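Here is an illustrative sketch of following a geodesic. Purely for concreteness it assumes a conformal metric $g_{ij}(x) = e^{2\varphi(x)}\delta_{ij}$; this specific form, and the choice $\varphi = -U$, are assumptions for the example, not the dissimilarity metric developed in arXiv:1401.2411. They show the mechanics: for a conformal metric the geodesic equation reduces to $x'' = -2(\nabla\varphi\cdot x')\,x' + |x'|^2\,\nabla\varphi$, which we can integrate by simple stepping.

import numpy as np

def geodesic(x0, v0, grad_phi, n_steps=1000, dt=1e-3):
    """Integrate the geodesic equation for the conformal metric
    g_ij(x) = exp(2 phi(x)) delta_ij (an illustrative choice).
    For this metric the geodesic equation reduces to
      x'' = -2 (grad phi . x') x' + |x'|^2 grad phi.
    """
    x, v = np.array(x0, float), np.array(v0, float)
    path = [x.copy()]
    for _ in range(n_steps):
        g = grad_phi(x)
        a = -2.0 * np.dot(g, v) * v + np.dot(v, v) * g
        v = v + dt * a
        x = x + dt * v
        path.append(x.copy())
    return np.array(path)

# Example: phi(x) = -U(x) with U(x) = -|x|^2, so grad phi = 2x and
# regions of low probability are "far away" in the metric.
path = geodesic(x0=[1.0, 0.0], v0=[0.0, 1.0],
                grad_phi=lambda x: 2.0 * x)
print(path[-1])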
11. Manifold Learning. Prototypical Clusters. The probability density is a mixture: $e^{U(x)} = p_1\,e^{U_1(x)} + p_2\,e^{U_2(x)}$. These two prototypical clusters are exponentially far apart. It is natural to refer to this model as a theory of differential similarity.
12. Deep Learning S. Rifai, Y.N. Dauphin, P. Vincent, Y.
Bengio, X. Muller, The Manifold Tangent Classifier, in NIPS 2011:
Three hypotheses: 1. The semi-supervised learning hypothesis,
according to which learning aspects of the input distribution p(x)
can improve models of the conditional distribution of the
supervised target p(y|x) ... [citation omitted]. This hypothesis
underlies not only the strict semi-supervised setting where one has
many more unlabeled examples at his disposal than labeled ones, but
also the successful unsupervised pretraining approach for learning
deep architectures [citations omitted]. 2. ... 3. ...
13. Deep Learning. Standard Example: MNIST. 28 x 28 pixels; 60,000 training set images; 10,000 test set images. Historically used as a benchmark for supervised learning: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 86(11): 2278-2324 (November 1998). We will treat it as a problem in unsupervised feature learning.
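The slides do not spell out the patch-extraction scheme; the 600,000 patches mentioned below, against 60,000 training images, suggest 10 patches per image. Here is a sketch under that assumption (the patch size of 8 is likewise a guess):

import numpy as np

def extract_patches(images, patch_size=8, per_image=10, seed=0):
    """Sample square patches from a stack of images.

    images     : (n_images, 28, 28) array, e.g. MNIST training images
    patch_size : side of the square patch (8 is an assumption; the
                 slides do not state the size used)
    per_image  : 10 patches per image reproduces the 600,000 total
                 mentioned on the later slides
    """
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    patches = np.empty((n * per_image, patch_size, patch_size))
    k = 0
    for img in images:
        for _ in range(per_image):
            i = rng.integers(0, h - patch_size + 1)
            j = rng.integers(0, w - patch_size + 1)
            patches[k] = img[i:i + patch_size, j:j + patch_size]
            k += 1
    return patches.reshape(len(patches), -1)   # flatten for learning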
15. Deep Learning. $U(x)$ is estimated from the data using the mean shift algorithm. $\nabla U(x) = 0$ at a prototype. The prototypical clusters partition the space of 600,000 patches. 35 Prototypes.
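For concreteness, a minimal mean shift sketch with a Gaussian kernel (the bandwidth and other details are illustrative): iterating the weighted mean climbs the kernel density estimate to a mode, a point where the estimated $\nabla U(x)$ vanishes, which is what this slide calls a prototype.

import numpy as np

def mean_shift_mode(x0, data, bandwidth=1.0, n_iters=50, tol=1e-6):
    """Run mean shift from x0 over the data with a Gaussian kernel.
    The fixed point is a mode of the kernel density estimate, i.e.
    a point where the estimated gradient of U vanishes: a prototype.
    """
    x = np.array(x0, float)
    for _ in range(n_iters):
        d2 = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy usage: two well-separated clouds; starting points converge to
# one of two modes, which partition the data into prototypical clusters.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)),
                  rng.normal(4.0, 0.3, size=(200, 2))])
print(mean_shift_mode(data[0], data))    # near [0, 0]
print(mean_shift_mode(data[-1], data))   # near [4, 4]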
17. Deep Learning. Geodesic Coordinates for Two Prototypes.
19. Deep Learning. General Procedure: Construct the product manifold from the encoded values of the smaller patches. Construct a submanifold using the Riemannian dissimilarity metric. (Figure: four patches, each encoded in 12 dimensions, yield a 48-dimensional product manifold; the example image is in category 4.)
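In local coordinates the first step is elementary: a point on the product of four 12-dimensional patch manifolds is the concatenation of the four codes into a 48-dimensional vector. A sketch, with illustrative names:

import numpy as np

def product_point(codes):
    """A point on the product manifold M1 x M2 x M3 x M4 is, in local
    coordinates, the concatenation of the coordinates of its factors.

    codes : list of four (12,) arrays, the encoded patches
    Returns a (48,) coordinate vector on the product manifold.
    """
    assert len(codes) == 4 and all(c.shape == (12,) for c in codes)
    return np.concatenate(codes)

patches = [np.zeros(12), np.ones(12), np.full(12, 2.0), np.full(12, 3.0)]
x = product_point(patches)
print(x.shape)   # (48,)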
20. The Logical Language. Rewrite the top four patches as a logical product, using the syntax of my Language for Legal Discourse (LLD).
21. The Logical Language. For this interpretation, we need a logical language based on category theory. Define: Categorical Product. In Man, this is the product manifold. Define: Categorical Subobject. In Man, this is a submanifold.
Set: objects are abstract sets; morphisms are arbitrary mappings.
Top: objects are topological spaces; morphisms are continuous mappings.
Man: objects are differential manifolds; morphisms are smooth mappings.
22. The Logical Language. The same table, now with a column for the corresponding logic: for Set, the logic is classical; for Top, intuitionistic; for Man, ????
23. The Logical Language. Sequent Calculus: Actor and Corporation are interpreted as differential manifolds. The constants macomber and so are interpreted as points on these manifolds. Control is interpreted as a submanifold of the product manifold. A sequent is interpreted as a morphism.
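In symbols, the standard categorical reading consistent with this slide (the notation here is illustrative, not copied from the deck): a sequent $\Gamma \vdash A$ denotes a morphism

\[
  \llbracket\, \Gamma \vdash A \,\rrbracket \;:\; \llbracket \Gamma \rrbracket \longrightarrow \llbracket A \rrbracket
\]

in Man, where $\llbracket \Gamma \rrbracket$ is the product manifold of the interpretations of the antecedents. So a proof of Control from Actor and Corporation is a smooth mapping out of $\mathit{Actor} \times \mathit{Corporation}$, and the points macomber and so pick out a particular instance.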
24. The Logical Language. Structural Rule for cut. Introduction and Elimination Rules for conjunction. Horn Axioms. This is sufficient for Horn clause logic programming.
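The rules named on this slide have standard formulations; take the following as the usual textbook versions, not necessarily the slide's own rendering:

\[
  \frac{\Gamma \vdash A \qquad \Gamma, A \vdash B}{\Gamma \vdash B}\;(\mathrm{cut})
  \qquad
  \frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}\;(\wedge\text{-I})
  \qquad
  \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash A}\quad
  \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash B}\;(\wedge\text{-E})
\]

A Horn axiom then has the form $A_1 \wedge \cdots \wedge A_n \vdash B$, with the $A_i$ and $B$ atomic, which is exactly what Horn clause logic programming requires.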
25. The Logical Language Novel Property: A proof is a
composition of morphisms in the category Man, i.e., it is a smooth
mapping of differential manifolds.
26. The Logical Language. Novel Property: A subspace is not always a submanifold. Implications for Gödel's Theorem? Implications for Learnability? Note: If we are looking for a learnable knowledge representation language, we want it to be as restrictive as possible. (Figure: a plot with axes $x, y \in [-1.0, 1.0]$.)
27. The Logical Language Introduction and Elimination Rules for
existential quantifiers: Introduction and Elimination Rules for
universal quantifiers: Introduction and Elimination Rules for
implication: Axioms for simple embedded implications:
28. The Logical Language. Conclusion: We have thus reconstructed, with a semantics grounded in the category of differential manifolds, Man, the full intuitionistic logic programming language in: Clausal Intuitionistic Logic. I. Fixed-Point Semantics, Journal of Logic Programming, 5(1): 1-31 (1988). Clausal Intuitionistic Logic. II. Tableau Proof Procedures, Journal of Logic Programming, 5(2): 93-132 (1988).
29. Defining the Ontology of LLD. From A Language for Legal Discourse. I. Basic Features, in ICAIL-'89: There are many common sense categories underlying the representation of a legal problem domain: space, time, mass, action, permission, obligation, causation, purpose, intention, knowledge, belief, and so on. The idea is to select a small set of these common sense categories, ... and develop a knowledge representation language that faithfully mirrors the structure of this set. The language should be formal: it should have a compositional syntax, a precise semantics and a well-defined inference mechanism. ...
30. Defining the Ontology of LLD. Count Terms and Mass Terms. Events/Actions and Modalities Over Actions: Permissions and Obligations, IJCAI-'83; Modalities Over Actions, KR-'94. Knowledge and Belief: S.N. Artemov, The Logic of Justification, Review of Symbolic Logic, 1(4): 477-513 (2008); M. Fitting, Reasoning with Justifications (2009).
31. Probability. Geometry. Logic. Toward a Theory of Coherence.
32. Probability. Geometry. Logic. Artificial Intelligence. Toward a Theory of Coherence.
33. Probability. Geometry. Logic. Artificial Intelligence. Stochastic Differential Geometry: Émery & Meyer (1989), Hsu (2002). Toward a Theory of Coherence.
34. Probability. Geometry. Logic. Artificial Intelligence. Stochastic Differential Geometry: Émery & Meyer (1989), Hsu (2002). Mac Lane & Moerdijk, Sheaves in Geometry and Logic (1992). Toward a Theory of Coherence.
35. Toward a Theory of Coherence. Logic. Geometry. Probability.
36. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry.
37. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model.
38. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model. The probability measure is constrained by the data.
39. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model. The probability measure is constrained by the data. Conjecture: The existence of these mutual constraints makes theory construction possible.
40. Toward a Theory of Coherence ICAIL-'97, Section 5: Somehow,
the requirement that the exemplar of a concept must be similar to a
prototype (a kind of horizontal constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of vertical constraint). How
is this possible? This is one of the great mysteries of cognitive
science. It is also one of the great mysteries of legal
theory.
41. Toward a Theory of Coherence ICAIL-'97, Section 5: Somehow,
the requirement that the exemplar of a concept must be similar to a
prototype (a kind of horizontal constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of vertical constraint). How
is this possible? This is one of the great mysteries of cognitive
science. It is also one of the great mysteries of legal theory. Q:
Is the mystery now solved?