Artif Intell Rev
https://doi.org/10.1007/s10462-018-9620-8
Using machine learning to predict student difficulties
from learning session data
Mushtaq Hussain¹ · Wenhao Zhu¹ · Wu Zhang¹ · Syed Muhammad Raza Abidi¹ · Sadaqat Ali¹
© Springer Science+Business Media B.V., part of Springer Nature 2018
Abstract Predicting student performance is an important research topic because it can
help teachers prevent students from dropping out before final exams and identify students who
need additional assistance. The objective of this study is to predict the difficulties that students
will encounter in a subsequent digital design course session. We analyzed the data logged
by a technology-enhanced learning (TEL) system called the digital electronics education and
design suite (DEEDS) using machine learning algorithms, namely artificial neural networks (ANNs),
support vector machines (SVMs), logistic regression, naïve Bayes classifiers and decision trees.
The DEEDS system allows students to solve digital design exercises with different levels of
difficulty while logging input data. The input variables of the current study were average time,
total number of activities, average idle time, average number of keystrokes and total related
activity for each exercise during individual sessions in the digital design course; the output
variables were the students’ grades for each session. We trained the machine learning algorithms
on data from the previous session and tested them on data from the upcoming session. We
performed k-fold cross-validation and computed receiver operating characteristic (ROC) and root
mean square error (RMSE) metrics to evaluate the models’ performances. The results show that
ANNs and SVMs achieve higher accuracy than the other algorithms. ANNs and SVMs can easily be
integrated into the TEL system; thus, we would expect instructors to report improved student
performance during the subsequent session.
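The evaluation described above combines k-fold cross-validation with ROC and RMSE metrics. The following is a minimal standard-library sketch of those three pieces, not the authors’ code: the function names `k_fold_indices`, `roc_auc` and `rmse` are hypothetical, the folds are contiguous (no shuffling), and the area under the ROC curve is computed with the equivalent rank-based (Mann-Whitney) formulation rather than by tracing the curve.

```python
import math
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) pairs.

    Shuffling is omitted for clarity; in practice rows are usually shuffled first.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds receive one extra index.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Toy labels: 1 = student had difficulty (hypothetical); scores from any model.
    print(k_fold_indices(5, 2))  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))  # 0.75
    print(rmse([1.0, 0.0], [0.5, 0.5]))  # 0.5
```

ROC-based scores suit the classification view of “difficulty”, while RMSE suits the view of grades as continuous outputs, which is presumably why both are reported.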
Keywords Machine learning · Educational data mining (EDM) · Decision support tools ·
E-learning · Neural networks (NN) · Support vector machine (SVM)
Corresponding author: Wenhao Zhu (whzhu@shu.edu.cn)
¹ School of Computer Engineering and Science, Shanghai University, Shanghai, China
123
M. Hussain et al.
1 Introduction
The educational advantages of e-learning include online teaching and course delivery, which
do not require physical classrooms for students. Compared to traditional modes of learning,
e-learning is less expensive, and a larger number of students can register for online courses.
However, in e-learning, there is no direct face-to-face communication between students and teachers.
Therefore, e-learning poses some challenges. First, it is difficult for instructors to assess the
effectiveness of a course. Second, the dropout rate of students in e-learning courses is much
higher than that in traditional modes of learning. Third, assessing students’ performance is
difficult. Fourth, predicting at-risk students in new courses is also difficult. Finally, teachers
are interested in predicting students’ expected results on upcoming assessments (Lykourentzou
et al. 2009; Pahl and Donnellan 2002; Smith-Gratto 1999; Kuzilek et al. 2015; Bakki
et al. 2015).
Web-based learning environments such as massive open online courses (MOOC), digital
electronics education and design suite (DEEDS) and learning management systems (LMSs)
allow teachers to study student performances using logged student data, but teachers may
have difficulty analyzing the student logs. MOOCs and LMSs are popular types of web-based
learning platforms; they provide free higher education to the entire world and offer courses
from different universities. Furthermore, they provide administration, documentation, content
assembly, student management and self-services (He et al. 2015). LMSs are online portals
for both students and teachers that facilitate teacher-student interactions and allow them to
perform their educational tasks and activities. Moreover, LMSs deliver courses to students,
and students can select their own courses through a course selection process (Imran et al.
2014). MOOCs are free web-based learning platforms that supply all their courses online.
Students can register for and attend these courses from any location (Kloft et al. 2014). These
web-based learning environments affect how teachers and students think during class, and
they can be used to predict a student’s performance during the next class or a student’s
behavior at different times. In addition, these environments can be used to improve course-related
content (Chen et al. 2000).
Predicting a student’s progress in a class or session through, for example, quizzes, assign-
ments, exams, and session activities can provide instructors with in-depth information on the
progress of students throughout the course. To achieve this goal, researchers have applied
various machine learning and statistical techniques to data acquired from both traditional and
online universities.
In traditional universities, researchers mostly use a student’s educational history (e.g., quizzes,
midterm exams, degrees, and attended schools) and demographic information (country, sex,
race and zip code) to predict students’ performance.
Acharya and Sinha (2014) forecast students’ performances using machine learning techniques
(e.g., C4.5, sequential minimal optimization (SMO), naïve Bayes, 1-nearest neighbor (1-NN) and
multilayer perceptron (MLP)) with input features such as gender, income, board marks and
attendance. They applied correlation-based feature selection (CBFS) techniques to improve model
performance and determined that SMO achieved a higher effective average testing accuracy (66%)
than the other methods.
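CBFS scores feature subsets rather than single features. Assuming it refers to the widely used correlation-based feature selection (CFS) merit criterion, which the text does not spell out, a subset of k features is rated by rewarding correlation with the class and penalizing redundancy among the features. A minimal sketch of that merit function, with illustrative numbers:

```python
import math


def cfs_merit(k: int, r_cf: float, r_ff: float) -> float:
    """Merit of a k-feature subset in correlation-based feature selection.

    r_cf: mean absolute feature-class correlation over the k features.
    r_ff: mean absolute pairwise feature-feature correlation.
    Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    """
    if k < 1:
        raise ValueError("a subset must contain at least one feature")
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Same feature-class correlation, different redundancy (illustrative numbers):
print(cfs_merit(1, 0.5, 0.0))  # 0.5: a single feature
print(cfs_merit(4, 0.5, 0.0))  # 1.0: four uncorrelated features
print(cfs_merit(4, 0.5, 1.0))  # 0.5: four fully redundant features add nothing
```

A search procedure (for example, greedy forward selection) would then keep the subset with the highest merit.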
De Albuquerque et al. (2015) employed artificial neural networks (ANNs) to predict
students’ performance. These models achieved high accuracy (85%) using input features
such as grades, periods of study and school scores.
Using machine learning to predict student difficulties…
Marbouti et al. (2016) used logistic regression, support vector machines (SVMs), decision
trees (DTs), ANNs and a naïve Bayes classifier (NBC) to identify at-risk students in advance
of the next course. This study used input features such as grades, attendance, quizzes, weekly
homework, team participation, project milestones, mathematical modeling activity tasks, and
exams from an offline course. The results showed that the NBC algorithm provided
satisfactory accuracy (85%).
Huang and Fang (2013) performed a study that used machine learning techniques to
predict student academic performance in engineering courses. In this study, the input fea-
tures included course grades from all semesters and the output variable was exam scores.
The researchers observed that SVMs are suitable for predicting an individual student’s per-
formance and that multilinear regression is suitable for forecasting the performance of all
students in a course.
Abu Saa (2016) performed a study to find the best classifier for predicting students’ performance
in higher education using social and personal input features.
Some probabilistic models (e.g., Bayesian knowledge tracing) have been used to predict
students’ performance by analyzing logs compiled during student computer gameplay (Käser
et al. 2017). However, these models do not uncover hidden patterns in student behavior.
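Bayesian knowledge tracing treats mastery of a skill as a hidden binary state and updates the probability of that state after each observed answer. A minimal sketch of the standard update follows; the class and function names are hypothetical, and the parameter values are illustrative assumptions, not values from Käser et al. (2017):

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_learn: float  # P(T): chance an unknown skill becomes known after an attempt
    p_slip: float   # P(S): chance of a wrong answer although the skill is known
    p_guess: float  # P(G): chance of a right answer although the skill is unknown


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior of the 'known' state given one answer, followed by
    the learning transition to the next attempt."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def p_correct(p_known: float, params: BKTParams) -> float:
    """Predicted probability that the next answer is correct."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


params = BKTParams(p_learn=0.1, p_slip=0.1, p_guess=0.2)  # illustrative values
p = 0.5  # prior P(L0) that the skill is already known
for answer in [True, False, True, True]:
    print(f"P(next correct) = {p_correct(p, params):.3f}")
    p = bkt_update(p, answer, params)
```

The model tracks one skill per student from answer sequences alone, which is consistent with the limitation noted above: it estimates mastery, not broader behavioral patterns.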
Furthermore, in traditional universities, some statistical methods have been used to predict
students’ performance; these include linear mixed-effect models (LMEM) and survival analysis
techniques that use various multimodal data (heart rate, step count, weather conditions and
learning activity) as input along with cumulative student pre-enrollment and semester-wise
information (Di Mitri et al. 2017; Ameri et al. 2016).
Currently, most universities provide courses using e-learning systems accessible from
any location. Scientists use input features common to these e-learning systems (e.g., time,
activity, assessment and online discussion forums) to forecast student performance.
Kotsiantis et al. (2003) predicted students’ performance on final exams using machine
learning techniques (e.g., naïve Bayes, 3-NN, RIPPER, C4.5 and WINNOW). They used
demographic features as inputs (e.g., sex, age, marital status, and number of children) along
with performance-related input features (e.g., meetings and assignment grades) from an e-learning
system and found that the naïve Bayes approach achieved a higher average accuracy
(73%) than the alternatives.
Hu et al. (2014) developed a student warning system using e-learning system features such
as course login time, average login time and delay in reading the assignment. They found
that C4.5 and CART achieved satisfactory accuracy (93 and 94%, respectively).
Kaur and Kaur (2015) examined student difficulties in a course on mathematics, system
analysis, and design using data mining techniques. They used test grades as input features and
determined that AdaBoost was the best classifier for predicting the difficulties that students
would experience in subjects.
Vahdat et al. (2015) used process mining (PM) and complexity matrix (CM) methods to
analyze the relationship between grades and students’ learning processes using DEEDS data.
They concluded that the average student grades are positively correlated with the CM and
that difficulty is negatively correlated with the CM. In addition, they determined that process
discovery using PM and CM models provides valuable information regarding student learning
processes.
Chen et al. (2000) applied database systems and decision tree techniques to e-learning system
logs to assess student performance in a way that is helpful for teachers.
Hlosta et al. (2017) introduced a self-learning system that uses machine learning algorithms
to find at-risk students in a new course without any previous historical data. This study
demonstrated that XGBoost achieved the best performance.
He et al. (2015) provided early predictions of at-risk students in a MOOC using a logistic
regression (LR) technique by analyzing assignment and lecture features.
Arnold and Pistilli (2012) developed a learner analytics system that allowed teachers
to give real-time support to students and addressed student retention problems. The
system draws on student demographic characteristics, past academic history, student effort
and student grades. The results showed that students using the analytics system achieved
higher grades than those who did not use the system.
Liu and d’Aquin (2017) used a supervised learning algorithm to predict students’ performance.
They investigated how demographic variables and online learning activities affect
students’ performance. Furthermore, they used the k-prototypes clustering algorithm to find
the group of weak students who needed additional help from the teacher. They concluded
that the successful groups of students mostly came from privileged backgrounds and that most
of these students completed their higher education.
Kai et al. (2017) used the J-48 and J-Rip classifiers to identify students who
do not continue past the course orientation stage and found that these models provide useful
information to teachers that can aid in student retention.
In another study, Elbadrawy et al. (2015) predicted student grades using a collaborative
multi-regression model with students’ performances, activities and Moodle interactions
as features. The results revealed that the performance of a collaborative multi-regression
model using Moodle interaction features is comparable to that of a linear regression model.
Studies have also used ANNs with only slight modifications to classify students based on
their final grades using web-based education system features (VOD-watching times,
courseware download times and BBS posting times) (Zheng et al. 2013).
Some commonly used machine learning techniques have been investigated to predict students’
performance and identify at-risk students in e-learning courses (Kuzilek et al. 2015).
Recently, an early predictive model was developed using student demographic, LMS
data, and aptitude-related features. The authors developed a learning analytic system with an
applied LR model that sent emails to high-risk students (Jayaprakash et al. 2014).
The majority of early studies that focused on predicting student grades and learning
behaviors in upcoming course sessions used datasets from traditional universities and e-
learning systems. However, their input features did not reflect the students’ performances
during in-session problem-solving exercises or projects. None of the existing studies predicted
students’ performances in a technology-enhanced learning (TEL) domain. Most studies used
academic input features (e.g., grade point average (GPA), grades and semester marks)
and non-academic input features (e.g., age and gender), which are less effective for making
timely predictions concerning students’ performance in a TEL system. Moreover, it may be
costly to collect these data. Some of the early studies that used training and test data from
the same course suffered from the same difficulty; such methods do not help the teacher
correctly evaluate the model accuracy in the succeeding session because student course
outlines and activities change from one class to another. Performance prediction regarding
future coursework sessions based on log data is not a straightforward task because every
session has its own difficulties and unique problems. Prediction also depends on course
features and teaching techniques; thus, it is important to build an intelligent TEL data system
that forecasts the difficulty of the upcoming session.
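The contrast drawn above, between evaluating on data from the same course and evaluating on the succeeding session, comes down to how the data are split. A minimal sketch of a session-ordered split follows, assuming a hypothetical record layout with a numeric `session` field; every test record comes from the session right after the training one, so no model is scored on the session it was fitted to.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: {"student": str, "session": int, "features": list, "label": int}
Record = Dict[str, Any]


def next_session_split(records: List[Record], train_session: int) -> Tuple[List[Record], List[Record]]:
    """Train on one session and test on the session that immediately follows it."""
    train = [r for r in records if r["session"] == train_session]
    test = [r for r in records if r["session"] == train_session + 1]
    if not train or not test:
        raise ValueError("both the training session and the next session need records")
    return train, test


def rolling_splits(records: List[Record]) -> List[Tuple[int, List[Record], List[Record]]]:
    """All (t, train on session t, test on session t + 1) triples present in the data."""
    sessions = sorted({r["session"] for r in records})
    return [(s, *next_session_split(records, s)) for s in sessions if s + 1 in sessions]
```

Sessions without an immediate successor in the data are simply skipped rather than paired with a later, non-adjacent session.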
The first step in improving student learning is being able to predict the difficulty that
students will have with the subsequent class session. Predicting student difficulty for the next
session using DEEDS logs is important to both the instructor and the students in MOOCs
and TEL systems. However, because teachers of MOOCs and TEL systems are usually not machine
learning experts, they cannot easily interpret the DEEDS log data. A data interpretation
feature can be easily integrated into a TEL system or a MOOC to identify the students’
difficulties, improve their learning performances, and prevent performance degradation in
the subsequent session. In addition, such predictions allow the instructor to use the DEEDS
logs in the learning model to determine the probability that a student will encounter difficulties
in the next session and to provide feedback to the student in real time. Overall, by using this
method, the instructor is better able to prepare students who experience difficulties before
they start their subsequent sessions. Thus, this approach is expected to increase retention and
provide advance information about the challenges that individual students experience.
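The probability of difficulty mentioned above could come from any of the study’s probabilistic classifiers. As one minimal sketch, and not the authors’ implementation, the code below assumes logistic regression (one of the algorithms the study compares) fitted by plain batch gradient descent, two hypothetical normalized features, made-up training data, and a 0.5 flagging threshold chosen purely for illustration:

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def difficulty_probability(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Estimated probability that a student with features x has difficulty."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def flag_students(students: Sequence[Tuple[str, Sequence[float]]], w: Sequence[float],
                  b: float, threshold: float = 0.5) -> List[str]:
    """IDs of students whose estimated probability reaches the threshold."""
    return [sid for sid, x in students if difficulty_probability(x, w, b) >= threshold]


# Toy previous-session data: [normalized avg idle time, normalized avg keystrokes];
# label 1 = low grade ("difficulty"). All values are made up for illustration.
X_prev = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
y_prev = [0, 0, 1, 1]
w, b = fit_logistic(X_prev, y_prev)
print(flag_students([("s01", [0.15, 0.85]), ("s02", [0.85, 0.15])], w, b))  # ['s02']
```

Returning a probability rather than a hard label lets the instructor tune the threshold to how many students they can realistically contact before the next session.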
This study used machine learning algorithms to predict individual students’ difficulties
in the subsequent session of a TEL system when the students performed different activities
(problem-solving exercises, laboratory assignments, reading course-related materials, etc.)
during the course session. The resulting models can be easily integrated into DEEDS and MOOCs to
assist teachers in identifying potential student difficulties in upcoming sessions. There does
not appear to be any related prior research on using TEL and machine learning techniques to
predict student difficulties in a subsequent session of a digital design course.
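The abstract lists the model inputs as average time, total number of activities, average idle time, average number of keystrokes and total related activity for each exercise during individual sessions. A minimal sketch of that aggregation follows, assuming a hypothetical raw log row per recorded activity; the `LogEvent` fields and the per-activity definitions are our assumptions, since the text does not define them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogEvent:
    # Hypothetical raw log row; field names are illustrative assumptions.
    student: str
    session: int
    exercise: str
    duration_s: float  # time spent on the activity
    idle_s: float      # idle time recorded during the activity
    keystrokes: int
    related: bool      # whether the activity is related to the current exercise


Key = Tuple[str, int, str]  # (student, session, exercise)


def session_features(events: List[LogEvent]) -> Dict[Key, Dict[str, float]]:
    """Aggregate raw events into the five per-exercise, per-session inputs."""
    groups: Dict[Key, List[LogEvent]] = defaultdict(list)
    for ev in events:
        groups[(ev.student, ev.session, ev.exercise)].append(ev)
    features: Dict[Key, Dict[str, float]] = {}
    for key, evs in groups.items():
        n = len(evs)
        features[key] = {
            "avg_time": sum(e.duration_s for e in evs) / n,
            "total_activities": float(n),
            "avg_idle_time": sum(e.idle_s for e in evs) / n,
            "avg_keystrokes": sum(e.keystrokes for e in evs) / n,
            "total_related_activity": float(sum(e.related for e in evs)),
        }
    return features


events = [
    LogEvent("s01", 1, "ex1", 30.0, 5.0, 10, True),
    LogEvent("s01", 1, "ex1", 10.0, 1.0, 2, False),
]
print(session_features(events)[("s01", 1, "ex1")])
# {'avg_time': 20.0, 'total_activities': 2.0, 'avg_idle_time': 3.0, 'avg_keystrokes': 6.0, 'total_related_activity': 1.0}
```

Each resulting feature dictionary, paired with the student’s grade for that session, would form one training or test row for the classifiers.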
In the current study, we used log data obtained from the DEEDS (https://www.digitalelectronicsdeeds.com),
a TEL tool and virtual digital electronics laboratory used by
instructors at the University of Genoa, Italy, both in and out of the classroom to improve
student learning. Students remember concepts better when reading course-related materials
using the TEL system than when reading them without the TEL system
(Vahdat et al. 2015). Additionally, the DEEDS is an e-learning environment used by students
to complete various laboratory assignments in electronic and information engineering
classes at the University of Genoa (Donzellini and Ponta 2007). By applying the DEEDS to
massive open online courses (MOOCs) and learning management systems (LMSs), teachers
can easily track students’ activities and provide students with news, guidelines and feedback
(Donzellini and Ponta 2007).
Our main goals were as follows:
• To identify the most appropriate machine learning algorithms for predicting the difficulty
an individual student would have in the next session of a digital design course based on
prior session activities and the current session.
• To investigate which of the machine learning algorithms used in the current study are
appropriate for predicting student difficulty in the next session of a digital design course
while using the fewest features.
The results of the current study show that SVMs and ANNs are appropriate machine learning
models to predict a student’s performance as well as the difficulty a student will experience
over the entire next session in a digital design course. The remainder of this paper is organized
as follows: Section 2 formulates the problem, Sect. 3 describes the materials and
methods, Sect. 4 presents the experimental results, and Sect. 5 presents conclusions and
describes future work.
2 Problem formulation
The DEEDS is a technology-enhanced learning system and virtual digital electronics laboratory used to
improve student learning. The problem of predicting student difficulty in the DEEDS involved
investigating the most appropriate machine learning algorithm to predict student difficulties in
terms of the grades they would earn in the subsequent session of digital design course exercises