1. How to Ground a Language for Legal Discourse in a Prototypical Perceptual Semantics. L. Thorne McCarty, Rutgers University.
2. Background Papers. An Implementation of Eisner v. Macomber, in ICAIL-'95: a computational reconstruction of the 1920 corporate tax case, based on a theory of prototypes and deformations. Some Arguments About Legal Arguments, in ICAIL-'97: a critical review of the literature, with a discussion of The Correct Theory in Section 5: Legal reasoning is a form of theory construction... A judge rendering a decision in a case is constructing a theory of that case... If we are looking for a computational analogue of this phenomenon, the first field that comes to mind is machine learning...
3. ICAIL-'97, Section 5: Most machine learning algorithms
assume that concepts have classical definitions, with necessary and
sufficient conditions, but legal concepts tend to be defined by
prototypes. When you first look at prototype models [Smith and
Medin, 1981], they seem to make the learning problem harder, rather
than easier, since the space of possible concepts seems to be
exponentially larger in these models than it is in the classical
model. But empirically, this is not the case. Somehow, the
requirement that the exemplar of a concept must be similar to a
prototype (a kind of horizontal constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of vertical constraint). How
is this possible? This is one of the great mysteries of cognitive
science. It is also one of the great mysteries of legal theory.
...
4. Summary. Contemporary trends in machine learning have now shed new light on the subject. In this paper, I will describe my recent work on manifold learning: Clustering, Coding and the Concept of Similarity, arXiv:1401.2411 [cs.LG] (10 Jan 2014); and my work in progress on deep learning (forthcoming, 2015): Differential Similarity in Higher Dimensional Spaces: Theory and Applications, and Deep Learning with a Riemannian Dissimilarity Metric. Taken together, this work leads to a logical language grounded in a prototypical perceptual semantics, with implications for legal theory.
5. Prototype Coding. What is prototype coding? The basic idea is to represent a point in an n-dimensional space by measuring its distance from a prototype in several specified directions. Furthermore, we want to select a prototype that lies at the origin of an embedded, low-dimensional, nonlinear subspace, which is in some sense optimal.
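To make the basic idea concrete, here is a minimal sketch in Python, under the simplest linear reading: the code of a point is the vector of projections of its offset from the prototype onto a set of specified unit directions. The function names are illustrative, and the later slides replace these straight-line directions with geodesic coordinates.

import numpy as np

def prototype_code(x, prototype, directions):
    """Code a point by its offsets from a prototype.

    x          : (n,) point to encode
    prototype  : (n,) the prototype
    directions : (k, n) rows are unit vectors, the specified directions

    Returns the (k,) vector of signed distances from the prototype
    along each direction, i.e. the projections of (x - prototype).
    """
    offset = x - prototype
    return directions @ offset

# Illustrative usage in R^3, coding onto 2 directions.
rng = np.random.default_rng(0)
p = np.array([1.0, 0.0, 0.0])
dirs = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
x = p + 0.3 * dirs[0] - 0.1 * dirs[1] + rng.normal(scale=1e-3, size=3)
print(prototype_code(x, p, dirs))   # approximately [0.3, -0.1]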
6. Manifold Learning S. Rifai, Y.N. Dauphin, P. Vincent, Y.
Bengio, X. Muller, The Manifold Tangent Classifier, in NIPS 2011:
Three hypotheses: 1. ... 2. The (unsupervised) manifold hypothesis,
according to which real world data presented in high dimensional
spaces is likely to concentrate in the vicinity of non-linear
sub-manifolds of much lower dimensionality ... [citations omitted]
3. The manifold hypothesis for classification, according to which
points of different classes are likely to concentrate along
different sub-manifolds, separated by low density regions of the
input space.
7. Manifold Learning. The Probabilistic Model: Brownian motion with a drift term. More precisely, a diffusion process generated by the differential operator $\tfrac{1}{2}\Delta + \tfrac{1}{2}\nabla U(x)\cdot\nabla$. The invariant probability measure is proportional to $e^{U(x)}$. Thus $\nabla U(x)$ is the gradient of the log of the probability density.
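As a sanity check on the probabilistic model, here is a minimal simulation sketch in Python. It assumes the generator written above, $\tfrac{1}{2}\Delta + \tfrac{1}{2}\nabla U(x)\cdot\nabla$, whose sample paths follow the SDE $dX = \tfrac{1}{2}\nabla U(X)\,dt + dW$. The potential $U(x) = -|x|^2$ is an illustrative choice; for it, the invariant density $e^{U(x)}$ is a Gaussian with variance 1/2 per coordinate, which the empirical samples should match.

import numpy as np

# Illustrative potential: a single mode at the origin,
# U(x) = -|x|^2, so the invariant density e^{U(x)} is a Gaussian.
def grad_U(x):
    return -2.0 * x

def simulate(n_steps=20000, dt=1e-3, dim=2, seed=0):
    """Euler-Maruyama for dX = (1/2) grad U(X) dt + dW,
    the diffusion generated by (1/2) Laplacian + (1/2) grad U . grad."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    path = np.empty((n_steps, dim))
    for t in range(n_steps):
        noise = rng.normal(scale=np.sqrt(dt), size=dim)
        x = x + 0.5 * grad_U(x) * dt + noise
        path[t] = x
    return path

path = simulate()
# The empirical variance of the samples should approach that of the
# invariant density e^{-|x|^2}, i.e. about 0.5 per coordinate.
print(path[5000:].var(axis=0))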
8. Manifold Learning. (Figures: two example potentials, one $U(x,y,z)$ built from sixth-power terms and one built from fifth-power terms, shown as plots.)
9. Manifold Learning. The Geometric Model: To implement the idea of prototype coding, we choose: A radial coordinate, $r$, which follows $\nabla U(x)$. The directional coordinates, $\theta_1, \theta_2, \ldots, \theta_{n-1}$, orthogonal to $\nabla U(x)$. But we actually want a lower-dimensional subspace, obtained by projecting our diffusion process onto a $(k-1)$-dimensional subset of the directional coordinates. The device we need is a Riemannian metric, $g_{ij}(x)$, which we interpret as a measure of dissimilarity. Crucially, the dissimilarity metric should depend on the probability measure.
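In coordinates, this construction starts from a local frame at each point: the radial direction along $\nabla U(x)$, plus an orthonormal basis for the directional coordinates orthogonal to it. A minimal numpy sketch; the helper name and the QR construction are illustrative.

import numpy as np

def local_frame(grad_u, seed=0):
    """Split R^n at a point into the radial direction (along grad U)
    and n-1 directional axes orthogonal to it.

    grad_u : (n,) the gradient of U at the point.
    Returns (radial, directions): radial is a (n,) unit vector and
    directions is a (n-1, n) orthonormal basis of its complement.
    """
    n = grad_u.shape[0]
    radial = grad_u / np.linalg.norm(grad_u)
    # Complete 'radial' to an orthonormal basis of R^n via QR on a
    # random (almost surely full-rank) completion.
    rng = np.random.default_rng(seed)
    m = np.column_stack([radial, rng.normal(size=(n, n - 1))])
    basis, _ = np.linalg.qr(m)
    if np.dot(basis[:, 0], radial) < 0:
        basis[:, 0] *= -1   # fix sign of the radial column
    return basis[:, 0], basis[:, 1:].T

# Example: U(x) = -|x - p|^2 around a prototype p, so grad U = -2(x - p)
# and the radial direction at x points from x toward the prototype.
p = np.array([1.0, 1.0, 1.0])
x = np.array([2.0, 1.0, 1.0])
radial, dirs = local_frame(-2.0 * (x - p))
print(radial)         # [-1, 0, 0]: toward the prototype
print(dirs @ radial)  # ~[0, 0]: directional axes orthogonal to it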
10. Manifold Learning. Find a principal axis for the radial coordinate. Choose the principal directions for the $\theta_1, \theta_2, \ldots, \theta_{k-1}$ coordinates. To compute the coordinate curves, follow the geodesics of the Riemannian metric in each of the $k-1$ principal directions.
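Here is an illustrative sketch of following a geodesic. Purely for concreteness it assumes a conformal metric $g_{ij}(x) = e^{2\varphi(x)}\delta_{ij}$; this specific form, and the choice $\varphi = -U$, are assumptions for the example, not the dissimilarity metric developed in arXiv:1401.2411. They show the mechanics: for a conformal metric the geodesic equation reduces to $x'' = -2(\nabla\varphi\cdot x')\,x' + |x'|^2\,\nabla\varphi$, which we can integrate by simple stepping.

import numpy as np

def geodesic(x0, v0, grad_phi, n_steps=1000, dt=1e-3):
    """Integrate the geodesic equation for the conformal metric
    g_ij(x) = exp(2 phi(x)) delta_ij (an illustrative choice).
    For this metric the geodesic equation reduces to
      x'' = -2 (grad phi . x') x' + |x'|^2 grad phi.
    """
    x, v = np.array(x0, float), np.array(v0, float)
    path = [x.copy()]
    for _ in range(n_steps):
        g = grad_phi(x)
        a = -2.0 * np.dot(g, v) * v + np.dot(v, v) * g
        v = v + dt * a
        x = x + dt * v
        path.append(x.copy())
    return np.array(path)

# Example: phi(x) = -U(x) with U(x) = -|x|^2, so grad phi = 2x and
# regions of low probability are "far away" in the metric.
path = geodesic(x0=[1.0, 0.0], v0=[0.0, 1.0],
                grad_phi=lambda x: 2.0 * x)
print(path[-1])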
11. Manifold Learning. Prototypical Clusters. The probability density is a mixture: $e^{U(x)} = p_1\,e^{U_1(x)} + p_2\,e^{U_2(x)}$. These two prototypical clusters are exponentially far apart. It is natural to refer to this model as a theory of differential similarity.
12. Deep Learning S. Rifai, Y.N. Dauphin, P. Vincent, Y.
Bengio, X. Muller, The Manifold Tangent Classifier, in NIPS 2011:
Three hypotheses: 1. The semi-supervised learning hypothesis,
according to which learning aspects of the input distribution p(x)
can improve models of the conditional distribution of the
supervised target p(y|x) ... [citation omitted]. This hypothesis
underlies not only the strict semi-supervised setting where one has
many more unlabeled examples at his disposal than labeled ones, but
also the successful unsupervised pretraining approach for learning
deep architectures [citations omitted]. 2. ... 3. ...
13. Deep Learning. Standard Example: MNIST. 28 x 28 pixels; 60,000 training set images; 10,000 test set images. Historically used as a benchmark for supervised learning: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 86(11): 2278-2324 (November 1998). We will treat it as a problem in unsupervised feature learning.
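The slides do not spell out the patch-extraction scheme; the 600,000 patches mentioned below, against 60,000 training images, suggest 10 patches per image. Here is a sketch under that assumption (the patch size of 8 is likewise a guess):

import numpy as np

def extract_patches(images, patch_size=8, per_image=10, seed=0):
    """Sample square patches from a stack of images.

    images     : (n_images, 28, 28) array, e.g. MNIST training images
    patch_size : side of the square patch (8 is an assumption; the
                 slides do not state the size used)
    per_image  : 10 patches per image reproduces the 600,000 total
                 mentioned on the later slides
    """
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    patches = np.empty((n * per_image, patch_size, patch_size))
    k = 0
    for img in images:
        for _ in range(per_image):
            i = rng.integers(0, h - patch_size + 1)
            j = rng.integers(0, w - patch_size + 1)
            patches[k] = img[i:i + patch_size, j:j + patch_size]
            k += 1
    return patches.reshape(len(patches), -1)   # flatten for learning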
15. Deep Learning. $U(x)$ is estimated from the data using the mean shift algorithm. $\nabla U(x) = 0$ at a prototype. The prototypical clusters partition the space of 600,000 patches. 35 Prototypes.
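For concreteness, a minimal mean shift sketch with a Gaussian kernel (the bandwidth and other details are illustrative): iterating the weighted mean climbs the kernel density estimate to a mode, a point where the estimated $\nabla U(x)$ vanishes, which is what this slide calls a prototype.

import numpy as np

def mean_shift_mode(x0, data, bandwidth=1.0, n_iters=50, tol=1e-6):
    """Run mean shift from x0 over the data with a Gaussian kernel.
    The fixed point is a mode of the kernel density estimate, i.e.
    a point where the estimated gradient of U vanishes: a prototype.
    """
    x = np.array(x0, float)
    for _ in range(n_iters):
        d2 = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy usage: two well-separated clouds; starting points converge to
# one of two modes, which partition the data into prototypical clusters.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)),
                  rng.normal(4.0, 0.3, size=(200, 2))])
print(mean_shift_mode(data[0], data))    # near [0, 0]
print(mean_shift_mode(data[-1], data))   # near [4, 4]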
17. Deep Learning. Geodesic Coordinates for Two Prototypes.
19. Deep Learning. General Procedure: Construct the product manifold from the encoded values of the smaller patches. Construct a submanifold using the Riemannian dissimilarity metric. (Figure: four patches, each encoded in 12 dimensions, yield a 48-dimensional product manifold; the example image is in category 4.)
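In local coordinates the first step is elementary: a point on the product of four 12-dimensional patch manifolds is the concatenation of the four codes into a 48-dimensional vector. A sketch, with illustrative names:

import numpy as np

def product_point(codes):
    """A point on the product manifold M1 x M2 x M3 x M4 is, in local
    coordinates, the concatenation of the coordinates of its factors.

    codes : list of four (12,) arrays, the encoded patches
    Returns a (48,) coordinate vector on the product manifold.
    """
    assert len(codes) == 4 and all(c.shape == (12,) for c in codes)
    return np.concatenate(codes)

patches = [np.zeros(12), np.ones(12), np.full(12, 2.0), np.full(12, 3.0)]
x = product_point(patches)
print(x.shape)   # (48,)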
20. The Logical Language. Rewrite the top four patches as a logical product, using the syntax of my Language for Legal Discourse (LLD).
21. The Logical Language. For this interpretation, we need a logical language based on category theory. Define: Categorical Product. In Man, this is the product manifold. Define: Categorical Subobject. In Man, this is a submanifold.
Set: objects are abstract sets; morphisms are arbitrary mappings.
Top: objects are topological spaces; morphisms are continuous mappings.
Man: objects are differential manifolds; morphisms are smooth mappings.
22. The Logical Language. The same table, now with a column for the corresponding logic: for Set, the logic is classical; for Top, intuitionistic; for Man, ????
23. The Logical Language. Sequent Calculus: Actor and Corporation are interpreted as differential manifolds. The constants macomber and so are interpreted as points on these manifolds. Control is interpreted as a submanifold of the product manifold. A sequent is interpreted as a morphism.
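In symbols, the standard categorical reading consistent with this slide (the notation here is illustrative, not copied from the deck): a sequent $\Gamma \vdash A$ denotes a morphism

\[
  \llbracket\, \Gamma \vdash A \,\rrbracket \;:\; \llbracket \Gamma \rrbracket \longrightarrow \llbracket A \rrbracket
\]

in Man, where $\llbracket \Gamma \rrbracket$ is the product manifold of the interpretations of the antecedents. So a proof of Control from Actor and Corporation is a smooth mapping out of $\mathit{Actor} \times \mathit{Corporation}$, and the points macomber and so pick out a particular instance.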
24. The Logical Language. Structural Rule for cut. Introduction and Elimination Rules for conjunction. Horn Axioms. This is sufficient for Horn clause logic programming.
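The rules named on this slide have standard formulations; take the following as the usual textbook versions, not necessarily the slide's own rendering:

\[
  \frac{\Gamma \vdash A \qquad \Gamma, A \vdash B}{\Gamma \vdash B}\;(\mathrm{cut})
  \qquad
  \frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}\;(\wedge\text{-I})
  \qquad
  \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash A}\quad
  \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash B}\;(\wedge\text{-E})
\]

A Horn axiom then has the form $A_1 \wedge \cdots \wedge A_n \vdash B$, with the $A_i$ and $B$ atomic, which is exactly what Horn clause logic programming requires.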
25. The Logical Language Novel Property: A proof is a
composition of morphisms in the category Man, i.e., it is a smooth
mapping of differential manifolds.
26. The Logical Language. Novel Property: A subspace is not always a submanifold. Implications for Gödel's Theorem? Implications for Learnability? Note: If we are looking for a learnable knowledge representation language, we want it to be as restrictive as possible. (Figure: a plot with axes $x, y \in [-1.0, 1.0]$.)
27. The Logical Language Introduction and Elimination Rules for
existential quantifiers: Introduction and Elimination Rules for
universal quantifiers: Introduction and Elimination Rules for
implication: Axioms for simple embedded implications:
28. The Logical Language. Conclusion: We have thus reconstructed, with a semantics grounded in the category of differential manifolds, Man, the full intuitionistic logic programming language in: Clausal Intuitionistic Logic. I. Fixed-Point Semantics, Journal of Logic Programming, 5(1): 1-31 (1988). Clausal Intuitionistic Logic. II. Tableau Proof Procedures, Journal of Logic Programming, 5(2): 93-132 (1988).
29. Defining the Ontology of LLD. From A Language for Legal Discourse. I. Basic Features, in ICAIL-'89: There are many common sense categories underlying the representation of a legal problem domain: space, time, mass, action, permission, obligation, causation, purpose, intention, knowledge, belief, and so on. The idea is to select a small set of these common sense categories, ... and develop a knowledge representation language that faithfully mirrors the structure of this set. The language should be formal: it should have a compositional syntax, a precise semantics and a well-defined inference mechanism. ...
30. Defining the Ontology of LLD. Count Terms and Mass Terms. Events/Actions and Modalities Over Actions: Permissions and Obligations, IJCAI-'83; Modalities Over Actions, KR-'94. Knowledge and Belief: S.N. Artemov, The Logic of Justification, Review of Symbolic Logic, 1(4): 477-513 (2008); M. Fitting, Reasoning with Justifications (2009).
31. Probability. Geometry. Logic. Toward a Theory of Coherence.
32. Probability. Geometry. Logic. Artificial Intelligence. Toward a Theory of Coherence.
33. Probability. Geometry. Logic. Artificial Intelligence. Stochastic Differential Geometry: Émery & Meyer (1989), Hsu (2002). Toward a Theory of Coherence.
34. Probability. Geometry. Logic. Artificial Intelligence. Stochastic Differential Geometry: Émery & Meyer (1989), Hsu (2002). Mac Lane & Moerdijk, Sheaves in Geometry and Logic (1992). Toward a Theory of Coherence.
35. Toward a Theory of Coherence. Logic. Geometry. Probability.
36. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry.
37. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model.
38. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model. The probability measure is constrained by the data.
39. Toward a Theory of Coherence. Logic. Geometry. Probability. Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model. The probability measure is constrained by the data. Conjecture: The existence of these mutual constraints makes theory construction possible.
40. Toward a Theory of Coherence ICAIL-'97, Section 5: Somehow,
the requirement that the exemplar of a concept must be similar to a
prototype (a kind of horizontal constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of vertical constraint). How
is this possible? This is one of the great mysteries of cognitive
science. It is also one of the great mysteries of legal
theory.
41. Toward a Theory of Coherence ICAIL-'97, Section 5: Somehow,
the requirement that the exemplar of a concept must be similar to a
prototype (a kind of horizontal constraint) seems to reinforce the
requirement that the exemplar must be placed at some determinate
level of the concept hierarchy (a kind of vertical constraint). How
is this possible? This is one of the great mysteries of cognitive
science. It is also one of the great mysteries of legal theory. Q:
Is the mystery now solved?