Project Archive: Machine Learning for Data Analysis

This is a snapshot of project information archived on 2 September 2022. Please contact the project team for the most recent updates.

Machine Learning for Data Analysis

Subject: Information Technology

Audience: Graduate and senior undergraduate students, interested researchers

Created date: July 12, 2022

Updated date: August 30, 2022


  • Attribution
  • Non-Commercial


  • Project Managers
  • Lead Authors
  • Lead Editors


The Biocomputational Engineering (BCE) program in the Bioengineering (BIOE) department at the University of Maryland, College Park is committed to open science ideas that support the development of our students into ethical leaders in their communities. In line with this, we have begun shifting to open educational resources (OERs) in our Data Science and Machine Learning courses. The BCE program is one of the first programs in the country to teach biology and computer science in combination, and our instructors design the courses by adapting graduate-level content to meet the learning needs of senior undergraduate students.

The BCE program successfully launched its first semester of courses in Fall 2021 with ENBC332: Statistics, Data Analysis, and Data Visualization. In keeping with the OER mission and goals, we use freely available textbooks for the current-semester courses, ENBC311: Python for Data Analysis and ENBC403: Research Methods in Biological Data Mining, and we propose to continue this work by revising and designing a future course, ENBC321: Machine Learning for Data Analysis, to OER standards.

In the BCE program, we are committed to excellent instruction and scholarship in a supportive and inclusive environment where all students can succeed. The Machine Learning for Data Analysis course is multifaceted and requires a variety of resources to be taught effectively. The course is in high demand: the field it prepares students for is ranked 3rd among the best jobs in the U.S. (Johnson (2022, Feb 2); Smith (2022, Feb 2)), and the BCE program is one of the newest majors at the University of Maryland and in the nation. Specifically, local institutions such as Frederick National Lab, the National Institutes of Health (NIH), the Food and Drug Administration (FDA), and the National Institute of Standards and Technology (NIST), as well as other companies, have shown considerable interest in hiring data scientists.

To aid the development of OER resources for this consequential course, and in light of the COVID-19 pandemic, we designed previous and current courses so that materials are available to our students even when they cannot attend classes in person. We intend to provide the same availability for ENBC321. We have pursued the University's goals of "advancing the culture of scholarship" and "fearlessly forward: in pursuit of excellence and impact for the public good", along with the moral goal of "science for everyone", and we hope to continue to do so with a M.O.S.T grant. We believe we will be able to continue promoting such a culture within our Machine Learning course, the BIOE department, our university, and beyond.


Short Description:

This course will instruct students in the fundamentals of machine learning methods through examples of biological phenomena and clinical data analysis. It is designed to share knowledge of real-world data science and to help students learn complex machine learning theories, algorithms, and coding libraries in a simple way. The course is structured to walk students step by step into the world of machine learning. It will cover major topics in machine learning such as supervised learning (i.e., regression, classification), unsupervised learning, association rule learning, reinforcement learning, deep learning, dimensionality reduction, and model selection and boosting. The course is packed with practical machine learning exercises based on real-life examples. Students will learn machine learning theory, and they will also get hands-on practice building their own models using programming tools such as Python.



Introduction to Machine Learning

What is ML? What are the applications of ML in our lives? How can ML be learned most efficiently?

What should I expect in this course?


Simple Linear Regression, Multiple Regression

Advanced regression models: Support Vector Regression; Random Forest regression

Survival analysis: Cox proportional hazards regression


Logistic regression

K-nearest neighbor

Support vector machine


Decision tree model

Random forest

Unsupervised learning models


Hierarchical clustering

The optimum number of clusters

Introduction to Reinforcement Learning

What is RL?


Thompson sampling

Introduction to Deep Learning

What is ANN?

From ANN to DNN

What is CNN?

Convolutional Neural Networks

Convolutional operation

ReLU Layer


Dimensionality Reduction

What is DR? Why do we need DR?

What are the methods to reduce the dimensionality of the data?

Principal Component Analysis (PCA)

Linear Discriminant Analysis (LDA)

Model Selection and Boosting



Hyperparameter tuning