www.ijsret.org
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882
Volume 4, Issue 9, September 2015
A comparative study on Machine Learning for Computational
Learning Theory
Madhur Aggarwal#1, Anuj Bhatia#2
#1 B.Tech (IT) from Bharati Vidyapeeth's College of Engineering; Software Developer at Plumslice Labs Pvt. Ltd.
#2 B.Tech (ECE) from Graphic Era University; ATG Developer at Accordion Systems Pvt. Ltd.
Abstract — For the past two decades, machine learning has been one of the mainstays of information technology and, with that, a rather central, albeit usually hidden, part of our lives. By generalizing from examples, machine learning algorithms can figure out how to perform important tasks. This is cost-effective and often feasible where manual programming is not. In this paper we provide a comprehensive analysis of various approaches to machine learning across different domains, with their pros and cons. A brief comparison is made between the different techniques based on certain parameters.

Keywords — machine learning, computational learning theory
I. Introduction
Machine learning systems automatically learn programs from data. This is often a very attractive alternative to constructing them manually, and in the last decade the use of machine learning has spread rapidly throughout computer science and beyond. Machine learning is used in web search, spam filters, recommender systems, ad placement, credit scoring, fraud detection, stock trading, drug design, and many other applications. A recent report from the McKinsey Global Institute asserts that machine learning (a.k.a. data mining or predictive analytics) will be the driver of the next big wave of innovation [1]. Several fine textbooks are available to interested practitioners and researchers, e.g. [2] [3]. However, much of the "folk knowledge" needed to develop machine learning applications successfully is not readily available in them. As a result, many machine learning projects take much longer than necessary or end up producing less-than-ideal results, even though much of this folk knowledge is fairly easy to communicate. Machine learning has become a mature scientific discipline, yet effective communication of its ideas remains an art.
The idea of a formal study of machine learning is by no means new to computer science. For example, research in the fields known as inductive inference and applied pattern recognition has typically addressed the problem of inferring a good rule from given data; surveys and highlights of these rich and diverse fields are given in [4] [5] [6] [7], and a number of ideas from these older areas have proved relevant to the present study. What is new is that the demand for computational efficiency is now a definite and central concern. Inductive inference models usually seek learning algorithms that perform exact identification in the limit, and the classes of functions considered there are typically so large that stronger complexity results do not appear attainable; where complexity results do arise in the pattern recognition literature, computational efficiency is generally a secondary concern.
Research in computational learning theory clearly has some connection with the empirical machine learning research conducted in computer science. As might be expected, this connection varies in strength and relevance from problem to problem. Ideally the two fields would complement one another in a meaningful way, with experimental research suggesting new theorems to be proved, and vice versa. Many of the problems tackled by artificial intelligence, however, appear so complex, and are so poorly understood in their biological incarnation, that they are presently beyond mathematical formalization.
II. Phases of Machine Learning
Representation: A classifier must be represented in some formal language that the computer can handle [8]. Conversely, selecting a representation for a learner is equivalent to choosing the set of classifiers that it can possibly learn. This set is called the hypothesis space of the learner: if a classifier is not in the hypothesis space, it cannot be learned. A related question, which we address in a later section, is how to represent the input, i.e., which features to use.
Evaluation: An evaluation function (also known as an objective function or scoring function) is needed to distinguish good classifiers from bad ones. The evaluation function used internally by the algorithm
may differ from the external one that we want the classifier to optimize, both for ease of optimization (see below) and because of the issues discussed in the next section.
Optimization: Finally, we need a method to search among the classifiers in the language for the highest-scoring one. The choice of optimization technique [8] is key to the efficiency of the learner, and also helps determine the classifier produced if the evaluation function has more than one optimum. It is common for new learners to start out with off-the-shelf optimizers, which are later replaced by custom-designed ones.
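To make the three phases concrete, the following is a minimal sketch using the perceptron (one of the algorithms listed in Section IV) as the running example; the function names and toy data are ours, not the paper's. The representation is the space of linear separators, the evaluation function is training accuracy, and the optimizer is the classic mistake-driven update rule.

```python
import random

def predict(w, b, x):
    # Representation: the hypothesis space is linear separators (w, b).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def accuracy(w, b, data):
    # Evaluation: the fraction of examples classified correctly.
    return sum(predict(w, b, x) == y for x, y in data) / len(data)

def perceptron_train(data, epochs=100):
    # Optimization: mistake-driven search through the hypothesis space.
    data = list(data)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            if predict(w, b, x) != y:  # a mistake triggers an update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Toy linearly separable data: label is the sign of x1 + x2 - 1.
train = [((0.0, 0.0), -1), ((2.0, 1.0), 1), ((1.5, 1.5), 1), ((0.2, 0.3), -1)]
w, b = perceptron_train(train)
print(accuracy(w, b, train))  # 1.0 once a separator is found
```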
III. Issues and Challenges
Statistical Predicate Invention: Predicate invention in ILP and hidden variable discovery in statistical learning are really two faces of the same problem. Researchers in both communities generally agree that this is a key (if not the key) open problem for machine learning. Without predicate invention, learning will always remain shallow: every word in the dictionary is an invented predicate, with many layers of invention between it and the sensory percepts on which it is ultimately based. Unfortunately, progress to date has been limited. The consensus seems to be that the problem is simply too hard, and it is unclear what to do about it.
Generalizing across Domains: Machine learning has traditionally been defined as generalizing across tasks from the same domain, and in the past few decades we have learned to do this quite successfully. However, the glaring difference between machine learners and people is that people generalize across domains with ease. For instance, Wall Street hires many physicists who at first know nothing about finance; they do know a lot about physics and the mathematics it requires, and somehow this transfers quite well to pricing options and forecasting the stock market. Machine learners can do nothing of the sort: if the predicates describing two domains are different, there is simply nothing a learner can do in the new domain with what it learned in the old one.
Learning Many Levels of Structure: So far in statistical relational learning (SRL) we have developed sophisticated algorithms for learning from structured inputs and structured outputs, but not for learning structured internal representations. Models in both ILP and statistical learning generally have only two levels of structure. For example, in support vector machines the two levels are the kernel and the linear combination, and in ILP they are the clauses and their conjunction. While two levels are in principle sufficient to represent any function of interest, they are an extremely inefficient way to represent most functions; by having many levels and reusing structure, we can often obtain representations that are exponentially more compact.
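As an illustration of the two-level structure just mentioned, a kernel machine's decision function first evaluates a fixed kernel against each support vector (level one) and then takes a linear combination of the results (level two). This is a sketch only: the weights alphas and the bias are assumed to come from some SVM trainer, which is not shown, and all identifiers are ours.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    # Level one: a fixed nonlinear kernel applied to each support vector.
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def svm_decision(x, support_vectors, labels, alphas, bias):
    # Level two: a linear combination of the kernel evaluations.
    score = sum(a * y * rbf_kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + bias > 0 else -1
```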
Deep Combination of Learning and Inference: Inference is central to structured learning, but research on the two has so far been largely separate. This has led to a paradoxical state of affairs in which we spend a great deal of data and CPU time learning powerful models, but then have to run approximate inference over them, losing some (possibly much) of that power. Inference must be efficient, so efficient inference should itself be the bias: we should design our learners from scratch to learn the most powerful models they can, subject to the constraint that inference over them is always efficient (ideally real time).
Learning to Map between Representations: An application area where structure learning can have a great deal of impact is representation mapping. Three major problems in this area are entity resolution (matching objects), schema matching (matching predicates), and ontology alignment (matching concepts). We have algorithms for solving each of these problems individually, assuming the others have already been solved. In most real applications, however, they all arise at the same time, and none of the "one piece" algorithms work. This is a problem of great practical significance, because integration is where organizations spend most of their information technology budget, and without solving it the "automated Web" (Web services, the Semantic Web, etc.) will never really take off.
Learning in the Large: Structured learning is most likely to pay off in large domains, because in small ones it is usually not too difficult to hand-engineer a "good enough" set of propositional features. So far, for the most part, we have worked on micro-problems (e.g., identifying promoter regions in DNA); our emphasis should shift increasingly to macro-problems (e.g., modeling an entire metabolic network in a cell). We need to learn "in the large," and this does not simply mean large datasets. It has several dimensions: learning in rich domains with many interrelated theories; learning with a lot of knowledge, a lot of data, or both; taking large systems and replacing the traditional pipeline architecture with joint inference and learning; learning models with millions of factors rather than a handful; continuous, open-ended learning; etc.
Structured Prediction with Intractable Inference: Max-margin training of structured models like HMMs and PCFGs has become fashionable in recent years. One of its attractive features is that when inference is tractable, learning is also tractable; this contrasts with maximum-likelihood and Bayesian methods, which can remain intractable. Yet most interesting AI problems involve intractable inference. How do we optimize margins when inference is approximate? How does approximate inference interact with the optimizer? Can we adapt current optimization algorithms to make them robust to inference errors, or do we need to develop new ones? We need to answer these questions if max-margin methods are to break out of the narrow range of structures they can currently handle efficiently.
Reinforcement Learning with Structured Time: The Markov assumption is good for controlling the complexity of sequential decision problems, but it is also a straitjacket. In the real world, systems have memory, some interactions are fast and some are slow, and long uneventful periods alternate with bursts of activity. We need to learn at multiple time scales simultaneously, and with a rich structure of actions and durations. This is more complex, but it may also make reinforcement learning much more efficient: at coarse scales rewards are almost immediate and RL is easy, while at fine scales rewards are distant; by propagating rewards across scales, we may be able to greatly speed up learning.
Expanding SRL to Statistical Relational AI: We must reach out to other subfields of AI, because they have the same problems we do: they have logical and statistical approaches, each solves only part of the problem, and what is really needed is a combination of the two. We want to apply learning to larger and larger pieces of a complete AI system. For instance, natural language processing involves a large number of subtasks (parsing, reference resolution, word sense disambiguation, semantic role labelling, etc.). So far, learning has been applied mostly to each one in isolation, ignoring their interactions. We need to drive towards a solution to the whole problem.
Learning to Debug Programs: Machine learning is making inroads into other fields of computer science: systems, networking, software engineering, databases, architecture, graphics, HCI, etc. This is a great opportunity to have impact, and a good source of rich problems to drive the field. One area that looks ripe for progress is automated debugging. Debugging is extremely time-consuming, and it was one of the early applications of ILP. In those early days, however, there was no data for learning to debug, and learners could not get very far. Today we have the Web and large repositories of source code. Even better, we can leverage mass collaboration: every time a programmer fixes a bug, we potentially gain a piece of training data. If programmers let us automatically record their fixes, debugging traces, compiler messages, etc., and contribute them to a central repository, we will soon have a large corpus of bugs and bug fixes.
IV. Methods
A. Basic Algorithms
Several basic algorithms can be used to solve a binary classification problem, namely:
Naive Bayes,
Nearest Neighbors,
the Perceptron,
K-means.
A minimal sketch of one of these follows.
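As a concrete example of one of the baselines listed above, here is a minimal sketch of a 1-nearest-neighbor binary classifier; the function names and toy data are illustrative, not from the paper.

```python
def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def nn_classify(train, x):
    # Predict the label of the single closest training example.
    return min(train, key=lambda ex: euclidean(ex[0], x))[1]

train = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((0.9, 1.2), 1)]
print(nn_classify(train, (0.8, 0.9)))  # -> 1
```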
B. Algorithms in Computational Learning
Pitt et al. [9] observe that the representation classes k-term-DNF and k-clause-CNF are properly contained in the classes k-CNF and k-DNF respectively; consequently, the class k-term-DNF is polynomially learnable by k-CNF, and the class k-clause-CNF is polynomially learnable by k-DNF. The same authors [9] proved that for any fixed k >= 2, learning k-term-DNF by k-term-DNF and learning k-clause-CNF by k-clause-CNF are NP-hard problems.
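For intuition, here is a minimal sketch (our identifiers, not the notation of [9]) of the classic elimination idea behind such positive results: to learn k-CNF from positive examples, start with the conjunction of all clauses of at most k literals, whose number is polynomial in n for fixed k, and delete every clause that some positive example falsifies.

```python
from itertools import combinations, product

def all_clauses(n, k):
    # Every clause of at most k literals over n Boolean variables.
    # A literal is (index, sign): sign 1 means x_i, sign 0 means not x_i.
    clauses = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((0, 1), repeat=size):
                clauses.append(tuple(zip(idxs, signs)))
    return clauses

def satisfies(clause, x):
    # A clause (a disjunction) is satisfied if any literal agrees with x.
    return any(x[i] == sign for i, sign in clause)

def learn_kcnf(positives, n, k):
    # Elimination: keep only clauses no positive example falsifies;
    # the hypothesis is the conjunction of the surviving clauses.
    return [c for c in all_clauses(n, k)
            if all(satisfies(c, x) for x in positives)]

# Target concept (x0 or x1) over 3 variables; all examples are positive.
positives = [(1, 0, 0), (0, 1, 1), (1, 1, 0)]
hypothesis = learn_kcnf(positives, n=3, k=2)
```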
The results of [9] are important in that they show the tremendous computational advantage that can be gained by a judicious change of hypothesis representation. This can be seen as a limited but compelling validation of the rule of thumb in AI that representation matters: by moving to a more powerful hypothesis class H, rather than insisting on the more natural choice H = C, we move from an NP-hard problem to a polynomial-time solution.
Further positive results for polynomial-time learning include the algorithm of Haussler [10] for learning the class of internal disjunctive Boolean formulae. His algorithm is notable for the fact that its time complexity depends linearly on the size of the target formula but only logarithmically on the total number of variables n; thus, even if there are many irrelevant attributes, the time required remains modest. This shows that the distribution-free model needs no explicit focusing mechanism for identifying the variables relevant to a learning algorithm; rather, this task can be folded into the algorithms themselves. Similar results were given for linearly separable classes by Littlestone [11], and more recently a model of learning in the presence of infinitely many irrelevant attributes was proposed by Blum [12].
Rivest [13] considered k-decision lists and gave a polynomial-time algorithm for learning k-DL by k-DL for any constant k; he also proved that k-DL properly contains both k-CNF and k-DNF. Ehrenfeucht and Haussler [14] studied decision trees and defined a measure of how balanced a decision tree is, called its rank. For decision trees of a fixed rank r, they give a polynomial-time learning algorithm that always outputs a rank-r decision tree.
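The following sketch shows Rivest's greedy method [13] specialized to k = 1, where every test is a single literal; the identifiers are ours. Repeatedly find a literal whose covered remaining examples all share one label, emit that rule, discard the covered examples, and continue until none remain.

```python
def covers(literal, x):
    i, sign = literal
    return x[i] == sign

def learn_1dl(examples, n):
    # Greedily emit (literal, label) rules: pick any literal whose
    # covered examples all share one label, then discard them.
    literals = [(i, s) for i in range(n) for s in (0, 1)]
    rules, remaining = [], list(examples)
    while remaining:
        for lit in literals:
            covered = [y for x, y in remaining if covers(lit, x)]
            if covered and len(set(covered)) == 1:
                rules.append((lit, covered[0]))
                remaining = [(x, y) for x, y in remaining
                             if not covers(lit, x)]
                break
        else:
            return None  # no consistent 1-decision list exists
    return rules

def predict_dl(rules, x, default=0):
    # Evaluate the list top-down; the first matching rule decides.
    for lit, label in rules:
        if covers(lit, x):
            return label
    return default
```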
Abe [15] gave a polynomial-time algorithm for learning a class of formal languages called semilinear sets. Helmbold et al. [16] give methods for learning nested differences of classes already known to be polynomially learnable; these include classes such as the class of all subsets of Z^k closed under addition and subtraction, and the class of nested differences of rectangles in the plane.
There are many efficient algorithms that learn representation classes defined over Euclidean domains. Most of these are based on the seminal work of Blumer et al. [17] on learning and the Vapnik-Chervonenkis dimension, which will be discussed in greater detail later. These algorithms demonstrate the polynomial learnability of, among others, the class of all rectangles in n-dimensional space and the intersection of n half-planes in two-dimensional space.
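For the rectangle class, the standard learner from this line of work is simple enough to sketch directly (the names below are ours): output the tightest axis-aligned box enclosing the positive examples; with enough samples, it errs only on a region of small probability.

```python
def tightest_box(positives):
    # Hypothesis: the smallest axis-aligned box containing every
    # positive example seen so far.
    dims = range(len(positives[0]))
    lo = [min(p[i] for p in positives) for i in dims]
    hi = [max(p[i] for p in positives) for i in dims]
    return lo, hi

def in_box(box, x):
    lo, hi = box
    return all(l <= xi <= h for l, xi, h in zip(lo, x, hi))

box = tightest_box([(1.0, 2.0), (3.0, 1.0), (2.0, 4.0)])
print(in_box(box, (2.0, 2.0)))  # -> True
```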
Gold [18] gave the first representation-based hardness results that apply to the distribution-free model of learning. He proved that the problem of finding the smallest deterministic finite automaton consistent with a given sample is NP-complete; the results of Haussler et al. [19] can easily be applied to Gold's result to show that learning deterministic finite automata of size n by deterministic finite automata of size n cannot be achieved in polynomial time unless RP = NP. (There are some technical difficulties involved in properly formalizing the problem of learning finite automata in the distribution-free model.) Gold's results were improved by Li et al. [20], who showed that finding an automaton only 9/8 larger than the smallest consistent automaton is still NP-complete.
Pitt et al. [21] dramatically improved the results of Gold by proving that deterministic finite automata of size n cannot be learned in polynomial time by deterministic finite automata of size n^k for any fixed value k > 0 unless RP = NP. Their results leave open the possibility of an efficient learning algorithm using still larger deterministic finite automata, or an algorithm using some completely different representation of the sets recognized by automata.
V. Conclusion
We have discussed algorithms and approaches for machine learning across different domains. Each has its strengths and weaknesses, but the aim of this work is to make machine learning less complex from the standpoint of computational learning theory, and to provide accurate learning at different points in time and under different conditions.
References
[1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. Byers, "Big data: The next frontier for innovation, competition, and productivity," Technical report, McKinsey Global Institute, 2011.
[2] T. M. Mitchell, "Machine Learning," McGraw-Hill, New York, NY, 1997.
[3] I. Witten, E. Frank, and M. Hall, "Data Mining: Practical Machine Learning Tools and Techniques," Morgan Kaufmann, San Mateo, CA, 3rd edition, 2011.
[4] D. Angluin and C. Smith, "Inductive inference: theory and methods," ACM Computing Surveys, 15, 1983, pp. 237-269.
[5] R. Duda and P. Hart, "Pattern Classification and Scene Analysis," John Wiley and Sons, 1973.
[6] L. Devroye, "Automatic pattern recognition: a study of the probability of error," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988, pp. 530-543.
[7] V. N. Vapnik, "Estimation of Dependences Based on Empirical Data," Springer-Verlag, 1982.
[8] P. Domingos, "A Few Useful Things to Know about Machine Learning," University of Washington, Seattle, WA 98195-2350.
[9] L. Pitt and L. G. Valiant, "Computational limitations on learning from examples," Journal of the ACM, 35(4), 1988, pp. 965-984.
[10] D. Haussler, "Generalizing the PAC model: sample size bounds from metric dimension-based uniform convergence results."
[11] N. Littlestone, "Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm," IEEE, 1988, pp. 120-129.
[12] M. Li and P. Vitanyi, "A theory of learning simple concepts under simple distributions and average case complexity for the universal distribution," IEEE, 1989, pp. 34-39.
[13] R. Rivest, "Learning decision lists," Machine Learning, 2(3), 1987, pp. 229-246.
[14] A. Ehrenfeucht and D. Haussler, "Learning decision trees from random examples," Workshop on Computational Learning Theory, Morgan Kaufmann, 1990, pp. 182-194.
[15] N. Abe, "Polynomial learnability of semilinear sets," Proceedings of the 1991 Workshop on Computational Learning Theory, 1991, pp. 25-40.
[16] D. Helmbold, R. Sloan, and M. Warmuth, "Learning nested differences of intersection-closed concept classes," Workshop on Computational Learning Theory, 1988.
[17] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, "Occam's razor," Information Processing Letters, 24, 1987, pp. 377-380.
[18] E. M. Gold, "Complexity of automaton identification from given data," Information and Control, 37, 1978, pp. 302-320.
[19] S. Judd, "Learning in neural networks," Proceedings of the 1988 Workshop on Computational Learning Theory, 1988, pp. 2-8.
[20] M. Li and U. Vazirani, "On the learnability of finite automata," Proceedings of the 1988 Workshop on Computational Learning Theory, 1988, pp. 359-370.
[21] A. Blum, "An approximation algorithm for 3-coloring," Proceedings of the 21st ACM Symposium on the Theory of Computing, 1990, pp. 535-542.