CHAPTER 7: CONCLUSIONS AND AREAS FOR FURTHER STUDY
Summary of Findings
The purpose of this study was to explore the relationship between students’
writing samples and their LMS usage patterns, and to investigate their potential
predictive value towards a specific student’s usage patterns within the course LMS. How
do what and how students write in submitted assignments relate to their usage patterns in
an LMS? Do specific types of writing (for example, word length of assignments) relate
more directly to LMS usage? Does a specific course type (file/assignment-dominant,
assignment/file-dominant, module/assignment-dominant, or file/module-dominant) relate
more directly to LMS usage? What does modeling tell us about how students’ writing
samples connect to how they engage themselves with an LMS in a hybrid course?
Based on the literature on similar studies of tutorial software programs (Cocea & Weibelzahl, 2009), it was decided to use the Fast Clicker (FC) and High Times Out (HTO) LMS usage patterns to emulate the patterns that this literature describes for the same phenomena. Since there was no intention of tying these patterns to performance data, the patterns were used as a means to illustrate pattern training and detection, and were treated throughout the study as having neutral value. Therefore, the terms “engagement” and “disengagement,” though applied liberally in the literature to the absence and presence (respectively) of these patterns among subjects, were not applied in this study.
Outlier and quartile rankings were determined for each student in each course and in each writing assignment selected for that course. These rankings were used to generate correlation tables displaying relationships between the page views (PV) session metrics and URL token categorical rankings, both overall and for each course type (FA, AF, MA, and FM). Correlation tables were also generated for writing sample (WS) metrics and linguistic categorical rankings against the entire PV dataset, as well as for each WS assignment type (A, an average word count less than 1000; B, an average word count between 1000 and 2000; C, an average word count between 2000 and 3500; and D, an average word count greater than 3500).
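The four assignment classes amount to a simple threshold rule on average word count. The sketch below is a minimal illustration, assuming that each lower bound is inclusive (so an average of exactly 2000 words falls in C and exactly 3500 in D); the text does not say which class a boundary value belongs to, and the function name is hypothetical.

```python
def writing_sample_class(avg_word_count):
    """Return the writing-sample class (A-D) for an assignment's average word count.

    Assumption: each lower bound is inclusive, so exactly 2000 words is C
    and exactly 3500 words is D.
    """
    if avg_word_count < 1000:
        return "A"
    if avg_word_count < 2000:
        return "B"
    if avg_word_count < 3500:
        return "C"
    return "D"


# Hypothetical average word counts, not values from the study
for avg in (750, 1500, 3000, 4200):
    print(avg, writing_sample_class(avg))
```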
After that, decision trees were generated for both the Fast Clicker (FC) usage pattern and the High Times Out (HTO) usage pattern. Trees were built in WS-only and ALL versions for each of nine record sets: the entire dataset, each of the four course types (FA, AF, MA, and FM), and each of the four writing sample classes (WA, WB, WC, and WD). For these 18 combinations per LMS usage pattern (FC and HTO), both full and pruned versions of every decision tree were generated, giving a total of 36 trees per pattern and 72 overall.
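The tree count follows from crossing the design factors: 9 record sets × 2 feature versions × 2 tree versions × 2 patterns. The enumeration below only checks that arithmetic; the label "Entire" for the full set of records is a hypothetical name, not one used in the study.

```python
from itertools import product

patterns = ["FC", "HTO"]
# "Entire" is a hypothetical label for the full set of records
record_sets = ["Entire", "FA", "AF", "MA", "FM", "WA", "WB", "WC", "WD"]
feature_sets = ["WS-only", "ALL"]
versions = ["full", "pruned"]

combos_per_pattern = len(record_sets) * len(feature_sets)       # 18
trees = list(product(patterns, record_sets, feature_sets, versions))
trees_per_pattern = len(trees) // len(patterns)                # 36

print(combos_per_pattern, trees_per_pattern, len(trees))        # 18 36 72
```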
The decision trees were then used as the basis for a discussion of how they might be applied in a typical hybrid course to identify students whose writing samples suggest a tendency to adopt the FC or HTO LMS usage pattern. The generation of decision trees for other patterns of interest, the effect of these patterns on student learning, and the systematic embedding of these decision trees in software for actually predicting these patterns were left for future research.
Interpretation of Findings and Suggestions for Further Research
What evidence do the findings provide to answer these questions? Student LMS usage is largely dictated by the instructor and the course, and instructors generally structure courses the same way in LMSs, no matter how much the content and level of the courses differ; they generally do what they are comfortable doing in the LMS. Chapter 4 presents the page views (PV) metrics, which include not only the total page views (TPV) but also the course page views (CPV), along with several values calculated from these page views as they appear in the date and time matrix (called session metrics), and reports their means, medians, and standard deviations over all of the courses.
The mean of the total page views means for all students in all courses was 872.7, whereas the median of all students over all courses was 588.5, considerably less than the mean. This indicates a right-skewed distribution: students who ranked on the high side of their peers in total page views for the semester lay much farther above the median than low-ranking students lay below it. This is not surprising, since the number of page views that a student could accumulate in a semester is virtually unlimited, whereas there is a hard lower limit of zero. For students who wish to remain in and pass these hybrid courses, zero page views is not an option; at minimum, they accumulate some number of page views over the semester in the process of submitting assignments and perhaps participating minimally in graded discussions.
It should be noted that total page views (TPV) depend heavily on the number of LMS-based courses in which the student is enrolled that semester, so some students show a large difference between their TPV and CPV (course page views) levels. As such, CPV levels more effectively reveal, in a raw manner, an individual student's participation in the LMS aspects of that particular course. In this study, the mean of means for course page views across all students was 416.2, and the median of all medians produced by the CPV values over all courses was 341.5. The standard deviation over all of these CPV values is 94.5, much less variation than the TPV standard deviation of 202.2. This is not surprising: CPV counts only activity in the study course, which every student shared, whereas TPV also counts activity in any other LMS-based courses a student was taking, which inflated TPV but not CPV.
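The "mean of means" and "median of medians" used throughout this chapter are two-stage summaries: each course is summarized first, and then the per-course summaries are summarized. The sketch below illustrates this with made-up CPV values for three hypothetical courses; a few very active students pull each course mean above its median, which is the right skew described above for TPV.

```python
from statistics import mean, median

# Hypothetical course page views (CPV) per student, grouped by course;
# these numbers are illustrative only, not the study's data
cpv_by_course = {
    "course_1": [120, 210, 300, 340, 900],
    "course_2": [250, 310, 360, 420, 1100],
    "course_3": [180, 260, 330, 390, 700],
}

# First summarize each course, then summarize the per-course summaries
course_means = [mean(values) for values in cpv_by_course.values()]
course_medians = [median(values) for values in cpv_by_course.values()]

mean_of_means = mean(course_means)            # (374 + 488 + 372) / 3
median_of_medians = median(course_medians)    # median of 300, 360, 330

print(round(mean_of_means, 1), median_of_medians)
```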
The other page views session metrics of interest here are AMC (average minutes per click) and RSD (repeated sessions days), chiefly because they most strongly determined which students were pre-classified with the Fast Clicker and High Times Out patterns, respectively. In the AMC category, both the mean of means and the median of medians were 0.5. This finding provides an excellent baseline for evaluating students against the Fast Clicker pattern: the further a student's AMC falls below 0.5, the more that student looks like a Fast Clicker in her LMS usage.
The RSD mean of means was 17.7, but the median of medians was only 11.75. Unfortunately, these values do not provide the clear baseline observed for the AMC metric above, which gave less confidence in finding a meaningful pattern as the study continued. However, since High Times Out patterns were drawn from higher values of RSD, a similar general conclusion might be drawn for this pattern: the higher a student's RSD value, the more that student looks like a High Times Out student in her LMS usage. What this finding lacks is a solid number defining a “high” RSD value. As discussed below, this concern is resolved by preprocessing each student's values into rankings against the values of that student's course peers.
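This chapter names the session metrics but does not restate their formulas, so the following is only a plausible reconstruction from the metric names, not the study's actual procedure. It assumes each student's page views have already been grouped into sessions with a start day, a length in minutes, and a click count; it further assumes that a "repeated sessions day" is a calendar day on which the student had two or more sessions. The `Session` class and `session_metrics` function are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Session:
    day: date       # calendar day on which the session started
    minutes: float  # length of the session in minutes
    clicks: int     # page views recorded during the session


def session_metrics(sessions):
    """Hypothetical reconstruction of several session metrics for one student."""
    n_sessions = len(sessions)
    tam = sum(s.minutes for s in sessions)        # total accumulated minutes
    total_clicks = sum(s.clicks for s in sessions)
    sessions_per_day = Counter(s.day for s in sessions)
    return {
        "TAM": tam,
        "AMS": tam / n_sessions,                  # average minutes per session
        "ACS": total_clicks / n_sessions,         # average clicks per session
        "AMC": tam / total_clicks,                # average minutes per click
        # Assumption: a "repeated sessions day" is a day with two or more sessions
        "RSD": sum(1 for count in sessions_per_day.values() if count > 1),
    }


# Hypothetical sessions for a single student
sessions = [
    Session(date(2000, 1, 3), 12, 30),
    Session(date(2000, 1, 3), 3, 10),
    Session(date(2000, 1, 4), 20, 25),
    Session(date(2000, 1, 6), 5, 15),
    Session(date(2000, 1, 6), 4, 20),
]
metrics = session_metrics(sessions)
print(metrics)
```

In this made-up example the student's AMC is 0.44, which sits just below the 0.5 baseline discussed above, and the two days with repeated logins give an RSD of 2.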
Among the page views URL token categorical dimensions (Assig, Conve, Files, Grade, Modul, Topic, View, and Wiki), the mean of means produces a near tie for dominance between two categories: Assignments (Assig, 26.6) and Files (Files, 24.5), which clearly lead all other categories. In itself, this says little about the students and more about the courses included in the study. For these courses, the predominant activity in the LMS was completing and submitting assignments and viewing files to access course content. Given this finding, it was not surprising that A or F ended up as one of the top two values of all four course types (FA, AF, MA, and FM).
Even in an LMS like Canvas™, which is rich with tools and options, most hybrid courses fall into one of about four basic categories, differentiated mostly by how content is delivered. Distance courses will likely differ, since course discussions and socialization are carried by the LMS rather than taking place in the brick-and-mortar classroom. URL token categories such as Topic and View, and perhaps Conversations, would therefore be expected to become more prominent in distance courses.
The surprising result is the virtual tie for third place between Modules (Modul, 10.7) and Grade (Grade, 10.4). One might expect the Grade value to be high only among courses that primarily used the LMS as a grade book to report grades to students. However, all of the courses included in this study provided some content via the LMS, in the Files, Modules, or Wiki areas. The high value for Grade is therefore somewhat mystifying, until one considers that Grade is the one element common to all courses: no matter how they were designed, the courses generally made grades available to students, so a mean of means should naturally reflect this reality to a certain extent.
The Modules category (Modul) was lower than the Files and Assignments categories simply because fewer courses were designed to deliver content via modules than via files. Also notable is the consistently low value of the Wiki category. The Wiki pages inside the Canvas™ LMS allow instructors to create pages that deliver course content directly to students within the course, yet none of the instructors in the study courses appear to have used that feature to any significant extent. As this LMS was still being rolled out at the time of data collection, it is quite possible that courses were still being migrated from the old LMS into the new one, and that instructors were more likely to simply link to files (even HTML files) brought over from the old LMS than to create new Wiki pages for course content. As this LMS matures at this institution, the Wiki category may become more prominent in the page views.
The descriptive statistics for the writing samples cannot be compared directly to the page views metrics, because there are more writing assignments than courses. In all, 27 writing assignments were captured among 14 courses, averaging almost two per course (some had three collected, others only one). It is therefore prudent to view the descriptive statistics of these writing samples separately from the course descriptive statistics.
The mean of means among all writing samples for total word count (WC in the table) is 1528, and the median of medians is 1067. This median is well below the mean, again because of the relatively unlimited upper bound and the firm lower bound (it would not be prudent for a student to turn in an assignment with a word count of 0, but it may be prudent to turn one in with a word count of 5000). Average words per sentence (WPS) is a more difficult metric, because it is already a mean when reported for each student. A mean WPS among students in a course is thus already a mean of means, and the corresponding descriptive statistic across courses is a mean of means of means. For what it is worth, this value was 28.1 for the means and 24 for the median of medians, a pattern consistent with the rest of the data. The percentage of words of six or more letters, the percentage of dictionary words, and the number of function words in the writing samples were all consistent across their means of means, medians of medians, and standard deviations.
Among the writing sample categorical values, which are percentages of the words in each writing sample that fall within the various categories (a word can fall into more than one category), the use of cognitive-mechanical (cogmech) words exceeded all other categories, with a mean of means of 16.6. The next closest category was prepositions (preps, 14.8), followed by a trio of categories with similar values (verb, 12.4; relativ, 11.5; and pronoun, 10.8). In all of these categories, the means of means were very close to the medians of medians, though, curiously, the medians were always slightly lower than the means. The article category appeared to be an exception, but at higher precision its median was also slightly lower than its mean.
Focusing on descriptive statistics alone, future research could observe more closely the balance of percentages within any specific course offering. Because many parameters are available (this study chose 8 of the 25 PV URL token categories), a serious study in this area might include more of these categories, or new ones as features are added to the LMS.
Interpretation of Correlational Results
Three separate correlational tests were conducted. The key to these tests was the procedure created to rank students into low outliers (0), quartiles (1 through 4, with 1 being the lowest 25%), and high outliers (5). This ranking system allowed students in one course that required, for example, a 3000-word writing assignment to be compared in terms of word count with students in a course that required only a 750-word assignment. Had the analysis been done on raw word counts, these students would simply have been grouped according to the courses in which they were enrolled. By ranking students among their peers (so that a student submitting 2500 words in the first course would rank similarly to a student submitting 600 words in the second), it was possible to carry out much of this correlational analysis and to compare the page views and writing sample features of all 366 students in the study, as well as various subsets of those students. This system also gave some weight to outliers in various categories without allowing them to affect the results inordinately. As will be apparent in the decision tree analysis, being an outlier in a specific category was sometimes a chief discriminator in classifying a student.
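The peer-ranking procedure can be sketched as follows. The chapter does not say how outliers were defined, so this version assumes Tukey's fences (1.5 × the interquartile range beyond the first and third quartiles) and computes quartiles with Python's `statistics.quantiles` using the "inclusive" method; both choices, and the function name, are assumptions for illustration.

```python
from statistics import quantiles


def rank_among_peers(values, k=1.5):
    """Rank each value 0-5 against its peer group (hypothetical sketch).

    0 = low outlier, 1-4 = quartile (1 = lowest 25%), 5 = high outlier.
    Assumption: outliers lie beyond Tukey's fences, k * IQR outside Q1/Q3.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    ranks = []
    for v in values:
        if v < low_fence:
            ranks.append(0)
        elif v > high_fence:
            ranks.append(5)
        elif v <= q1:
            ranks.append(1)
        elif v <= q2:
            ranks.append(2)
        elif v <= q3:
            ranks.append(3)
        else:
            ranks.append(4)
    return ranks


# Hypothetical word counts in a 3000-word course and a 750-word course
course_3000 = [2100, 2500, 2800, 3000, 3100, 3300, 3600]
course_750 = [520, 600, 700, 750, 780, 820, 900]
print(rank_among_peers(course_3000))
print(rank_among_peers(course_750))
```

With these made-up counts, the student who wrote 2500 words in the longer course and the student who wrote 600 words in the shorter course both receive a rank of 1, which is the comparison the ranking system was designed to allow.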
In the first set of correlational tests, the page views session metrics (TPV, CPV, etc.) were tested for relationships with the page views URL token categories (Assig, Conve, etc.). The table and a brief analysis are provided in Chapter 4. Perhaps most striking about this table are the strong relationships between the Fast Clicker pre-classification (based on students' low average minutes per click (AMC) rankings) and students' page views category rankings. Certainly one would expect a strong relationship between AMC rankings and other session metrics, such as TAM (total accumulated minutes). But the categorical values are based on percentages, not frequencies, so by instrument design they would not be expected to correlate directly with the session metrics. Yet strong relationships, both positive and negative, existed within this table, some with p values that round to 0.000.
With 364 degrees of freedom (366 students), even a relatively low Pearson's r coefficient becomes significant; with this many samples in a single correlation test, it is unlikely that chance alone would produce a spurious relationship, although across the many tests run here a few spurious results remain possible.
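To make the point about 364 degrees of freedom concrete, the sketch below computes Pearson's r directly and the smallest |r| that reaches two-tailed significance at α = 0.05. It inverts the standard test statistic t = r·√df / √(1 − r²), and, to stay within the Python standard library, approximates the t quantile with the normal quantile from `statistics.NormalDist`. That approximation is close at this sample size; the exact t value gives a slightly larger threshold. Function names are illustrative.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def critical_r(df, alpha=0.05):
    """Smallest |r| significant at `alpha` (two-tailed) with `df` degrees of freedom.

    Inverts t = r * sqrt(df) / sqrt(1 - r**2). The t quantile is approximated
    by the standard normal quantile, which is close when df is large.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(z * z + df)


print(round(critical_r(364), 3))  # about 0.10 for the study's 366 students
```

In other words, with 366 students a correlation of roughly 0.10 between two rankings is already statistically significant, which is why significance here should be read alongside the size of r.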
One of the striking results of this test is that, in general, students who rank higher
than their peers in visiting Assignments in the LMS have the lowest rankings in nearly all
session metrics. Activities that are associated with the process of preparing and
submitting assignments do not lend themselves to a high number of clicks in the LMS.
The Grade category stands in an almost opposite (though equally significant) relationship to the PV session metrics. A student's percentage of visits to the Grade area of the LMS is generally positively correlated with the session metrics: students who have the highest rankings in the Grade category also have the highest rankings in total page views (TPV), course page views (CPV), and the other session metrics. The only exceptions are Grade's significant negative correlations with average minutes per session (AMS) and average minutes per click (AMC), where higher rankings in grade-checking activity meant lower rankings in both. This is logically understandable: students engaged in non-productive activities (purely from a work production perspective) tend to spend less time per session in the LMS, as they quickly log in to check grades and then log back out.
Among students pre-classified as Fast Clicker (FC) students, the only significant positive correlations with page views URL token categories are with Conversations and Grades. As noted above, grade-checking activity inside the LMS neither produces work in terms of assignments nor consumes course content. The only other category that is not obviously about work production or content consumption is Conversations, where a student composes and reads messages to and from the instructors or other students in the course. This category, too, is positively correlated with Fast Clicker students, though this is less intuitive than it was for Grades. Composing conversations (internal email-like messages) takes time, and does not seem to fit the Fast Clicker approach. However, reading messages is a very quick and click-laden process. The interpretation, then, is that students with higher rankings in Conversations often simply read their messages but rarely compose them, as this is the only scenario that seems to make sense in this context.
The only other result from this PV Aggregate table of great interest concerns the Modules category (Modul). This category is negatively correlated with ACS (average clicks per session) and with the Fast Clicker classification, but positively correlated with TAM (total accumulated minutes), AMS (average minutes per session), and AMC (average minutes per click). These results suggest that students who rank high in accessing modules (in those courses that use them to provide course content) proceed through a session at a much slower rate than those who rank low in Modules-related activity. The Modules category can therefore be associated with course content more than any other category: Modules activity is course content consumption activity, which is antithetical to Fast Clicker LMS usage patterns.
The High Times Out classification showed much lower correlation with page views URL token categorical rankings, though, like the Fast Clicker classification, it was positively correlated with grade activity (suggesting that students who checked their grades often sometimes logged in more than once a day, at various times, to do so) and negatively correlated with assignments activity (suggesting that students who spent time viewing, producing, and submitting assignments tended to do so in one sitting and had fewer repeated logins on a given day).
Therefore, even without testing the writing sample data against the LMS usage data, some patterns of LMS use emerged that help answer this study's research questions. These patterns could be explored further in future work, especially since all of the parameters lie within page views data freely available to all instructors and administrators within the LMS. This study used only session metrics generated from the time-date stamps in the page views data. Matrices of day and time were also processed for each student but were not used, since they had little relevance to the Fast Clicker or High Times Out patterns explored in this study. These matrices are a separate area of exploration that could be undertaken in future work in this field.
The second set of correlational tests, the Page Views Course Types Aggregates, was conducted primarily to explore one of the sub-questions of the study, “Does a specific course type (file/assignment-dominant, assignment/file-dominant, module/assignment-dominant, or file/module-dominant) reveal itself more readily in its corresponding LMS usage?” The FA (Files--Assignments) course type revealed strong negative correlations between the TPV (total page views) metric and the two primary categories of this course type, Assignments and Files. Again, Grades and Files had similarly strong correlations, but with opposite polarity. As with the PV Aggregate analysis of the entire dataset, students who were ranked high in visits to course content areas were ranked low in visits to their grade books. The AF (Assignments--Files) Course Aggregate correlational analysis confirmed the results of the FA course test, with some small variations. The MA (Modules--Assignments) Course Aggregate correlational analysis also confirmed the antithetical relationship between visits to the grade book and content areas, but in this case the strongest negative correlations were in the Assignments category, with a sprinkling of significant negative correlations in the Modules category, as one might expect. Finally, the FM (Files--Modules) Course Aggregate correlational analysis confirmed the results mentioned above, but with two notable differences. First, the category with the strongest significant correlations was not one of the primary categories for that course type; it was the Assignments category. Second, Grades and Assignments actually matched polarity in one page views session metric, AMS (average minutes per session), where both were negatively correlated.
Table 7.1: Course Type Correlations by Category with FC and HTO Usage Patterns
Course type | FC negative | FC positive | HTO negative | HTO positive
FA | None | None | None | None
AF | Assignments | Conversations | Files | None
MA | Module | Conversations, Grades | None | None
FM | Files, Modules, View, Wiki | None | None | None
Table 7.1, above, displays the results in terms of correlations with the two example
usage patterns for this study, Fast Clicker (FC) and High Times Out (HTO). It is
apparent from the table that none of these course types provided solid relationships
between the page views URL token categories and the High Times Out pattern, but three
of the four provided solid relationships between those categories and the Fast Clicker
usage pattern.
Future work in this area could expand how the datasets are subdivided, for example by course requirement type (Gen-Ed, elective, major-required, major-elective, etc.), by class level (freshman, sophomore, junior, senior), or in any other way that makes sense. Each of these views of the courses would yield its own correlations among the parameters and its own decision trees for producing tendencies, as has been done in this study with the two subgroups discussed herein.
The third set of correlational tests, the Writing Sample Metrics Aggregate
Analysis, finally begins to focus upon the primary research question of the study, “How
do what and how students write in submitted assignments relate to their usage patterns in
an LMS?”
As the page views course type analyses gave insight into the Fast Clicker usage pattern, the writing samples seem to give the most insight into the High Times Out usage pattern. For HTO-classified students, SIXLev (the student's ranking in percentage use of words of six or more letters) was positively correlated with the HTO usage pattern, while the use of prepositions and relativity words were both negatively correlated with it. The only category correlating with the Fast Clicker usage pattern was the student's use of verbs in writing samples, which correlated positively.
A student's use of verbs in writing was also positively correlated with that student's visits to the Grades area of the LMS, but negatively correlated with visits to the Files area. This supports the antithetical relationship between viewing the grade book and viewing content areas. Why verb use in writing should relate positively to checking the grade book and negatively to viewing files is another area for further study.
The next set of correlational tests was conducted with writing assignments classified by word count, in an attempt to answer the research question regarding writing sample type and its effect on recognizing patterns in LMS use. As detailed above, the writing samples were classified as A through D: A for assignments with an average word count of less than 1000, B for 1000 to 2000, C for 2000 to 3500, and D for 3500 or above.
In Class A writing assignments, the use of six-letter or greater words (SIXLev)
and the use of prepositions were the writing categories with the highest number of
correlations with page views URL token categories. The FC usage pattern correlated
negatively with the use of six-letter or greater words, while the HTO usage pattern
correlated positively with the same category (SIXLev), but negatively with prepositions
use.
In Class B writing assignments, the relativity word use category (relLev) becomes prominent, with 10 significant correlations across the 19 categories. However, it correlates with only one of the two usage patterns examined in this study: negatively, with the HTO pattern. Relativity word use is also positively correlated with visits to the Assignments area of the LMS, and negatively with the View area. What is most interesting about this class of writing assignments is what is missing: Grade and Conversations have no significant correlations with any of the writing metrics or categories in Class B assignments.
In Class C and Class D writing assignments, most of the significant correlations fall away as the degrees of freedom drop. In Class C writing, only HTO correlates with the use of prepositions, but since this is its only significant correlation (apart from RSDLevs, the session metric with which HTO is closely associated by design), it is suspected of being spurious. In Class D, there are a few more significant correlations, but none involving the FC or HTO LMS usage patterns that this study examines.
One final set of correlational tests was performed with “binned” students, that is, with only the sets of students pre-classified as FC, HTO, or Both (FC and HTO). These tests were performed to determine whether there were specific relationships between writing sample metrics and categories and page views metrics and URL token categories within these groups of students. The plan was to use any correlations produced in these tests to inform and guide the construction of decision trees in the next set of procedures.
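Binning amounts to filtering students by their pre-classification flags before running the same correlation tests. The sketch below assumes each student carries two boolean flags (FC, HTO); whether the FC and HTO bins in the study also contained students flagged for both patterns is not stated, so including them here is an assumption, as is the function name.

```python
def bin_students(flags):
    """Group pre-classified students into the bins used for the binned tests.

    `flags` maps a student id to (is_fc, is_hto). Assumption: the FC and HTO
    bins also include students flagged for both patterns.
    """
    bins = {"All": [], "FC": [], "HTO": [], "Both": []}
    for student_id, (is_fc, is_hto) in flags.items():
        if is_fc or is_hto:
            bins["All"].append(student_id)
        if is_fc:
            bins["FC"].append(student_id)
        if is_hto:
            bins["HTO"].append(student_id)
        if is_fc and is_hto:
            bins["Both"].append(student_id)
    return bins


# Hypothetical pre-classification flags for four students
flags = {"s1": (True, False), "s2": (False, True), "s3": (True, True), "s4": (False, False)}
print(bin_students(flags))
```

Each bin's student list would then select the rows on which the correlation tests are rerun; an unflagged student such as `s4` falls into no bin.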
In Bin All (all FC, HTO, and/or Both pre-classified students), the tests generated a number of significant correlations. The dominant writing categories were SIXlev (six-letter or greater) word use, function words used (FUNlev), pronoun use, and cognitive-mechanical word use. What is surprising in this test is the emergence of dominant categories (except for SIXlev) that had not appeared as dominant in previous correlational tests. Also surprising in this Bin All table is that visits to URLs where students viewed content (such as streaming videos) were significantly correlated with five writing categories: positively with SIXlev (which was also positively correlated with most of the page views session metrics), and negatively with DIClev, FUNlev, pronouns, and relativity word use.
The binned FC students also had a number of significant correlations, especially
in FUNlev, pronoun use, and prepositions use. The View category was the only page
views URL token category with a significant relationship to the LMS usage pattern.
In the HTO and Both bins, the correlations drop out considerably, suggesting that binning these two groups for this pattern is unproductive.
In summary, the correlational tests do show a number of significant relationships between what students write and submit in their writing assignments and how they use their course LMS. As this study was not designed to show causality, no causal inferences were drawn from these correlations. However, the most significant result of these findings is that they open many avenues for future research, some of which may examine the issue of causality or at least explore each of the significant relationships more closely.
Decision Trees
Decision trees were generated to produce tendencies toward both the FC and HTO LMS usage patterns, as a way to look at the relationships discovered in the correlation mining from a different perspective. Although decision trees are generally used to predict outcomes in test data based on training data, the number of parameters involved in this study makes the generated decision trees less than adequate as predictive tools. Instead, the value of these decision trees lies in eliciting the metrics and categories that arise again and again among the 72 trees. To show this, Wordle™ images were created to represent the prevalence of metrics and categories within the 36 decision trees generated for each of the two patterns.
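A word cloud of this kind sizes each term by how often it occurs, so the underlying step is a frequency tally of the parameters used at tree splits. The sketch below shows such a tally over three made-up trees, each represented as the list of parameters tested at its internal nodes. The chapter does not say whether a parameter was counted at every node or once per tree; counting every appearance is an assumption here.

```python
from collections import Counter

# Hypothetical split parameters read from three decision trees; each list
# holds the parameter tested at every internal node of one tree
tree_splits = [
    ["cogLev", "SIXlev", "Modul", "cogLev"],
    ["DIClev", "artLev", "SIXlev"],
    ["cogLev", "Grade", "DIClev"],
]

# Count every appearance across all branches of all trees
frequency = Counter(param for splits in tree_splits for param in splits)

for param, count in frequency.most_common():
    print(param, count)
```

Counting once per tree instead (for example, `Counter(p for splits in tree_splits for p in set(splits))`) would keep one deep tree that splits repeatedly on a single parameter from dominating the display.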
FC Metrics and Categories Frequencies in Generated Decision Trees
Among the decision trees generated to produce tendencies of students fitting the
Fast Clicker LMS usage pattern, the Wordle™ image (Figure 7.1) shown below
represents the metrics and categories that were included in the trees, with their number of
inclusions making them more prominent (in terms of size) in the display.
Figure 7.1
Figure 7.1: Wordle™ Diagram of FC Decision Tree Discriminators
A quick glance at
among the branches of all 36 trees that were generated for the FC pattern: cogLev,
SIXlev, DIClev, Module, and artLev.
metrics (SIXlev and DIClev), two were from
artLev), and one was from the page views
a good balance between parameters
note that Grades, Files, and Assignments,
descriptive statistics and correlations, were not prevalent enough in the decision trees to
be prominently displayed in the diagram.
: Wordle™ Diagram of FC Decision Tree Discriminators
at the image reveals five parameters that are the most prevalent
among the branches of all 36 trees that were generated for the FC pattern: cogLev,
SIXlev, DIClev, Module, and artLev. Two of the parameters were from writing sample
metrics (SIXlev and DIClev), two were from writing sample categories (cogLev and
artLev), and one was from the page views URL token categories (Modul). This pro
parameters in all three major datasets. It is of interest to also
note that Grades, Files, and Assignments, despite how prominent they were in the
descriptive statistics and correlations, were not prevalent enough in the decision trees to
be prominently displayed in the diagram.
155
s five parameters that are the most prevalent
among the branches of all 36 trees that were generated for the FC pattern: cogLev,
of the parameters were from writing sample
writing sample categories (cogLev and
categories (Modul). This provides
. It is of interest to also
despite how prominent they were in the
descriptive statistics and correlations, were not prevalent enough in the decision trees to
Future work with students in the Fast Clicker pattern should include
investigation into those five
Fast Clicker LMS usage pattern.
HTO Metrics and Categories Frequencies in Generated Decision Trees
Among the decision trees generated to produce tendencies of students fitting the
High Times Out LMS usage pattern, the Wordle™ image shown below, in Figure 7.2,
represents the metrics and categories that were included in the trees, with their number of
inclusions making them more prominent (in terms of size) in the display.
Figure 7.2: Wordle™ Diagram of HTO Decision Tree Discriminators
A quick glance at the image reveals more parameters of prominence than the FC
pattern image, which is understandable as this usage pattern showed more variation
among the parameters during the correlation tests. Therefore, a more thoughtful
investigation of the diagram is required. One parameter dominated all others: prepsLev.
Three parameters were of secondary prominence: Grade, SIXlev, and artLev. The other
parameters diminished gradually in size, indicating that they were distributed widely,
and in varying amounts, among the decision trees generated for the HTO pattern. It is
interesting that one of the four most prominent parameters was from writing sample metrics
(SIXlev), two were from writing sample categories (prepsLev and artLev), and one was
from the page views URL token categories (Grade). The SIXlev and artLev parameters
were also prominent in the FC decision trees. It is also of interest to note that the Grade
category did appear as prominent in these trees, but again, Files, Modules, and
Assignments, despite how prominent they were in the descriptive statistics and
correlation tests, were not prevalent enough in the decision trees to be prominently
displayed in the diagram.
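The source-dataset breakdown given for both diagrams, and the overlap between them, can be checked mechanically. This sketch encodes only what this chapter states: which dataset each prominent parameter comes from, and the top parameters read off the FC and HTO diagrams. The short labels such as "WS metric" are shorthand chosen here, not names used in the study's data files.

```python
from collections import Counter
from typing import Iterable, List

# Source dataset of each prominent parameter, as stated in the chapter text.
SOURCE = {
    "SIXlev": "WS metric",
    "DIClev": "WS metric",
    "cogLev": "WS category",
    "artLev": "WS category",
    "prepsLev": "WS category",
    "Module": "PV URL token category",
    "Grade": "PV URL token category",
}

# Most prominent parameters in the FC and HTO diagrams, as listed in the text.
FC_TOP = ["cogLev", "SIXlev", "DIClev", "Module", "artLev"]
HTO_TOP = ["prepsLev", "Grade", "SIXlev", "artLev"]


def by_source(params: Iterable[str]) -> Counter:
    """Count how many of the given parameters come from each source dataset."""
    return Counter(SOURCE[p] for p in params)


def shared(a: List[str], b: List[str]) -> List[str]:
    """Parameters prominent under both usage patterns, in the order of `a`."""
    in_b = set(b)
    return [p for p in a if p in in_b]


print(by_source(FC_TOP))
print(by_source(HTO_TOP))
print(shared(FC_TOP, HTO_TOP))
```

The tallies reproduce the 2/2/1 split described for FC and the 1/2/1 split described for HTO, and the shared list contains the two parameters (SIXlev and artLev) noted as prominent in both sets of trees.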
Future work with students in the High Times Out pattern should include further
investigation into those four specific parameters, and their actual predictive value for this
High Times Out LMS usage pattern.
Sample Test Class Results from Decision Trees
Finally, running data from a sample class of students taught in the same LMS during the
Spring 2013 semester through the generated decision trees produced the following
results:
In the FC Decision Tree Tendency Composite Table (Table 6.5), several
tendencies stood out as consistently high across scores from all three writing assignments
in the sample course. Four of the 15 students were identified as having
tendencies towards the Fast Clicker LMS usage pattern based upon the features of their
writing samples. Two of those students were shown to have composite tendencies by two
different decision trees. Perhaps more interesting than the tentatively classified students
are the decision trees that produced the composites: F4 and F6, which are
DecTreeFC_AF_All and DecTreeFC_WA_All, respectively. This indicates that future work
on building predictive models (as opposed to the exploratory ones here) should focus on
decision trees generated with both WS and PV parameters (All) and on subsets of students
based on course type (AF) and writing assignment type (WA), rather than on the entire
dataset of students from all classes and all writing types.
In the HTO Decision Tree Tendency Composite Table (Table 6.10), only two
students were classified consistently over all three writing assignments, and neither was
classified by more than one tree. However, the two trees that classified these students
were built with the same configurations as the two FC trees mentioned above. This
reinforces the point that, for both patterns, exploration should center on decision trees
generated from a combination of the PV and WS data, and on subsets drawn from specific
course types and writing assignment classes.
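The composite logic behind Tables 6.5 and 6.10 can be sketched as follows. The sketch assumes that each tree yields a numeric tendency score per student for each of the three writing assignments, and that "consistently high" means meeting a threshold on all three. The 0.5 threshold, the data layout, the tree and student identifiers, and the scores are all hypothetical illustrations, not the study's data. The second function reflects the observation above that the trees producing the composites may be more informative than the individual students.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical layout: tendency[tree_id][student_id] holds one score per
# writing assignment (three assignments in the sample course).
Tendencies = Dict[str, Dict[str, List[float]]]


def consistent(scores: List[float], threshold: float) -> bool:
    """True when every writing-assignment score meets the threshold."""
    return len(scores) > 0 and all(s >= threshold for s in scores)


def composite(tendency: Tendencies, threshold: float = 0.5) -> Dict[str, List[str]]:
    """Map each consistently flagged student to the trees that flagged them."""
    flagged: Dict[str, List[str]] = defaultdict(list)
    for tree_id, students in tendency.items():
        for student_id, scores in students.items():
            if consistent(scores, threshold):
                flagged[student_id].append(tree_id)
    return dict(flagged)


def trees_by_hits(flagged: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many students each tree flagged consistently."""
    hits: Dict[str, int] = defaultdict(int)
    for tree_ids in flagged.values():
        for tree_id in tree_ids:
            hits[tree_id] += 1
    return dict(hits)


# Illustrative, made-up scores (not study data) for two trees and three students.
example: Tendencies = {
    "treeA": {"s01": [0.9, 0.8, 0.7], "s02": [0.9, 0.2, 0.8], "s03": [0.6, 0.6, 0.6]},
    "treeB": {"s01": [0.7, 0.9, 0.6], "s02": [0.1, 0.3, 0.2], "s03": [0.4, 0.9, 0.9]},
}
flagged = composite(example)
print(flagged)
print(trees_by_hits(flagged))
```

A student flagged by more than one tree corresponds to the "composite tendencies by two different decision trees" case described for the FC pattern.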
Summary of Conclusions
The conclusions may be summarized in this manner:
• There is a relationship between how a student writes and how a student
uses an LMS. This relationship can be represented by some of the general
patterns produced in the study, such as the tendency for students who visit
the Grades and Conversations (messaging) parts of the LMS to spend less
time and make fewer clicks in the content areas of the courses, such as Files and
Modules.
• Because this relationship exists, there is great potential for building
predictive models of LMS usage based on student writing samples (e.g.,
with Decision Trees). These models were explored and discussed, and
show promise for future studies in this area.
• The writing characteristics best suited to this modeling will vary
somewhat according to how the course is built and managed in the LMS,
as well as the type and length of the writing sample involved.