Machine Learning

A Full-Semester Course (To be offered at JHU)

By Dr. Muhammad Ali Yousuf

Last updated: 7/03/22

 

Objective

The objective of this course is to teach various machine learning algorithms with a few applications in mind, mostly covered via end-of-course projects.

 

General Links

Subscribe to

 

My weekly newspaper on Machine Learning and Data Science (updated every Monday, 10:00 AM EST)

 

My Twitter list, Data Science (#Data, #DataScience, #DeepLearning, #MachineLearning, #Analytics, #ArtificialIntelligence)

 

My Data Science Collection on YouTube, https://tinyurl.com/DataScienceYouTube

 

Book(s)

 

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems 1st Edition by Aurélien Géron (Author)

 

Part I: The Fundamentals of Machine Learning

Chapter 1 The Machine Learning Landscape

Chapter 2 End-to-End Machine Learning Project

Chapter 3 Classification

Chapter 4 Training Models

Chapter 5 Support Vector Machines

Chapter 6 Decision Trees

Chapter 7 Ensemble Learning and Random Forests

Chapter 8 Dimensionality Reduction

 

Part II: Neural Networks and Deep Learning

Chapter 9 Up and Running with TensorFlow

Chapter 10 Introduction to Artificial Neural Networks

Chapter 11 Training Deep Neural Nets

Chapter 12 Distributing TensorFlow Across Devices and Servers

Chapter 13 Convolutional Neural Networks

Chapter 14 Recurrent Neural Networks

Chapter 15 Autoencoders

Chapter 16 Reinforcement Learning

 

See the book page: http://shop.oreilly.com/product/0636920052289.do

 

Python Machine Learning 1st Edition by Wei-Ming Lee (Author)

 

Chapter 1 Introduction to Machine Learning

Chapter 2 Extending Python Using NumPy

Chapter 3 Manipulating Tabular Data Using Pandas

Chapter 4 Data Visualization Using matplotlib

Chapter 5 Getting Started with Scikit-learn for Machine Learning

Chapter 6 Supervised Learning-Linear Regression

Chapter 7 Supervised Learning-Classification Using Logistic Regression

Chapter 8 Supervised Learning-Classification Using Support Vector Machines

Chapter 9 Supervised Learning-Classification Using K-Nearest Neighbors (KNN)

Chapter 10 Unsupervised Learning-Clustering Using K-Means

Chapter 11 Using Azure Machine Learning Studio

Chapter 12 Deploying Machine Learning Models

 

Download full book sample code

https://www.wiley.com/en-us/Python+Machine+Learning-p-9781119545637

 

 

 

Core Material

 

 

Supplementary Material

 

Statistics (Review)

A 7-hour course by Dr. Sarkar on Statistics for Data Science

 

 

Introduction

Book (pdf): Introduction to Machine Learning

 

What are Neural Nets and how do they learn?

 

What is Deep Learning? (Comprehensive online book from MIT press)

 

What is MATLAB?

 

Introduction to Python

 

From YouTube: Machine Learning - Andrew Ng, Stanford University [FULL COURSE]

 

What is Machine Learning? (Online course at Coursera)

 

Useful computational resources

 

Fundamentals of Machine Learning, a whitepaper published by Interactions Corporation. The resource defines machine learning, how it works and what kind of impact it has on our daily lives. It also discusses the future of machine learning, specifically anticipated challenges and the increasing importance of a human element in this intelligence. [Thanks to Billy Adams from Interactions.com for sharing this resource.]

 

Data Science Central, https://www.datasciencecentral.com/

Python

First, you need a Python interpreter and an IDE.

I'd recommend using Anaconda as it comes with all the major libraries you need for Data Science. The remaining libraries can be installed easily. https://www.anaconda.com/

 

Jupyter - Browser-based Python notebook environment

 

Google Colab - Another online Python notebook environment

 

PyCharm - A great IDE for Python

 

Then learn Python at http://learnpython.org/. This is not the only resource; you can find many other online courses.

 

SciPy (pronounced Sigh Pie) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. It includes NumPy, Pandas, and Matplotlib.

 

NumPy is the fundamental package for scientific computing with Python.

 

Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.

 

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

 

Scikit-learn - Machine Learning in Python

 

TensorFlow - An open source machine learning framework for everyone

 

PyCaret - An open source low-code machine learning library.
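
To give a feel for what these libraries look like in use, here is a minimal sketch that fits a straight line by least squares using only NumPy. The data are made up for illustration (they are not from the course), and the variable names are my own; the libraries listed above, such as Scikit-learn, wrap this kind of computation in higher-level interfaces.

```python
import numpy as np

# Synthetic data, invented for this example: y = 2x + 1 exactly
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column for x, one column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of X @ [slope, intercept] ~ y
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With noisy real data the fitted slope and intercept would only approximate the underlying relationship, which is where the regression chapters of the books above pick up.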

From YouTube: Python Tutorial for Beginners - Full Course in 11 Hours

 

From YouTube: Python for Data Science | Data Science with Python | Python for Data Analysis | 11 Hours Full Course

 

The following webpage (shared by Kate Chapman / Aimee O'Driscoll) lists six courses that teach Python in the context of ethical hacking and are worth checking out:

 

https://www.comparitech.com/blog/information-security/hacking-python-courses-online/. The website covers many other topics including SQL.

 

There are many libraries for Machine Learning. Here is a short list (see https://www.geeksforgeeks.org/best-python-libraries-for-machine-learning/ for details):

 

PyTorch is an open source machine learning framework that accelerates the path from research prototyping to production deployment.

 

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.

 

CNTK - Microsoft Cognitive Toolkit, previously known as CNTK and sometimes styled as The Microsoft Cognitive Toolkit, is a deep learning framework developed by Microsoft Research.

 

Theano - Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

 

Chainer from Japan - A Powerful, Flexible, and Intuitive Framework for Neural Networks

 

R Programming

The R Project for Statistical Computing, https://www.r-project.org/

 

RStudio Cloud (no download necessary)

https://login.rstudio.cloud/

 

RStudio, https://www.rstudio.com/

 

From YouTube:

 

Introduction to Data Science with R - Data Analysis by David Langer:

Part 1: https://www.youtube.com/watch?v=32o0DnuRjfg

 

Part 2: https://www.youtube.com/watch?v=u6sahb7Hmog

 

Free book: R for Data Science by Garrett Grolemund and Hadley Wickham

SQL

 

SQL Server 2017 Express edition, https://www.microsoft.com/en-us/sql-server/sql-server-editions-express

 

MySQL, https://www.mysql.com/

 

 

 

From YouTube: SQL Tutorial - Full Database Course for Beginners

 

https://cloud.google.com/sql/ (Try it Free)

 

https://free.caspio.com/ (Free)

 

https://www.freesqldatabase.com/ (Free)

 

Introduction to SQL

https://www.datacamp.com/courses/intro-to-sql-for-data-science?utm_medium=fb%2Cig-all&utm_source=fb_paid&utm_campaign=smartly_ppa&utm_id=5b1801cb8783d060f32ea35a 
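
If you want to try a few SQL statements before installing any of the servers above, the following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name, column names, and rows are hypothetical and chosen only for illustration; SQLite's dialect also differs in places from SQL Server and MySQL.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example
conn.execute("CREATE TABLE students (name TEXT, score REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ana", 91.0), ("Ben", 78.5), ("Caro", 85.0)],
)

# Select students scoring above 80, highest score first
rows = conn.execute(
    "SELECT name, score FROM students WHERE score > 80 ORDER BY score DESC"
).fetchall()

print(rows)
conn.close()
```

The same CREATE, INSERT, and SELECT statements should carry over to the other databases listed here with little or no change.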

MATLAB

 

If you are part of JHU, you can get MATLAB free.

If not, there is a free MATLAB Trial for Data Science. It includes MATLAB and a full set of products for data science: Global Optimization Toolbox, Parallel Computing Toolbox, Curve Fitting Toolbox, Deep Learning Toolbox, Statistics and Machine Learning Toolbox, Optimization Toolbox, Database Toolbox, Text Analytics Toolbox, and Symbolic Math Toolbox.

https://www.mathworks.com/campaigns/products/trials/targeted/dan.html

 

MATLAB for Deep Learning, https://www.mathworks.com/solutions/deep-learning.html?s_tid=hp_brand_deeplearning

 

Data Analytics in MATLAB

https://www.mathworks.com/products/data-analytics-whats-new.html

 

and

https://www.mathworks.com/videos/data-analytics-with-matlab-99066.html

 

8 MATLAB Cheat Sheets for Data Science, https://www.mathworks.com/campaigns/offers/data-science-cheat-sheets.confirmation.html?elqsid=1559494336562&potential_use=Education

 

Getting Started courses called Onramps: these run entirely in the browser (no downloads needed) and allow students to earn certificates to share on LinkedIn, for example.

Machine Learning - https://www.mathworks.com/learn/tutorials/machine-learning-onramp.html

Deep Learning - https://www.mathworks.com/learn/tutorials/deep-learning-onramp.html

Reinforcement Learning - https://www.mathworks.com/learn/tutorials/reinforcement-learning-onramp.html

Video series that provide high-level overviews on these topics:

https://www.mathworks.com/videos/series/introduction-to-machine-learning.html (4 part video series on Machine Learning)

https://www.mathworks.com/videos/series/applied-machine-learning.html (more practical perspectives)

https://www.mathworks.com/videos/series/deep-learning-with-MATLAB.html#tutorials (video series focused on Deep Learning)

https://www.mathworks.com/videos/series/deep-neural-networks.html (video series)

https://www.mathworks.com/videos/series/deep-learning-for-engineers.html (more practical perspectives)

 

Specific Examples / Datasets:

 

Data Sets for Deep Learning, including images: https://www.mathworks.com/help/deeplearning/ug/data-sets-for-deep-learning.html

 

An interesting example, Brain MRI age classification:

matlab-deep-learning/Brain-MRI-Age-Classification-using-Deep-Learning: MATLAB example using deep learning to classify chronological age from brain MRI images (github.com)

 

Semantic Segmentation of Multispectral Images Using Deep Learning,

https://www.mathworks.com/help/releases/R2018a/images/multispectral-semantic-segmentation-using-deep-learning.html

 

 

Additional Resources

Professional Career Guides by TechGuide

Data Sources

Projects

Auto ML Tools

 

Open source Auto ML Tools:

EvalML - https://lnkd.in/gtscitGk

FLAML - https://lnkd.in/gqZCyFsk

LightAutoML - https://lnkd.in/gar79u5c

MLJAR - https://mljar.com

PyCaret AutoML - https://lnkd.in/gTS6bztz

TPOT - https://lnkd.in/gN2cFMPj

H2O AutoML - https://lnkd.in/gSS-EZna

AutoGluon - https://lnkd.in/gwgVm5tm

AutoKeras - https://lnkd.in/g3gCAnai

Auto-PyTorch - https://lnkd.in/gDJjS_JB

Auto-sklearn - https://lnkd.in/gpAzQJcf

Ivy - https://lets-unify.ai/ivy/

This notebook by Rohan Rao (Vopani) covers most of them: https://lnkd.in/g3Na3euy

Julia Language

 

Julia comes ready with Flux, a state-of-the-art framework for machine learning and AI.

 

From YouTube: Intro to Julia for data science

Octave

A free, largely MATLAB-compatible alternative

 

Online version: https://octave-online.net/

 

Download: https://www.gnu.org/software/octave/download

 

PROLOG

 

A symbolic AI language, no longer fashionable, but I taught full-semester courses on it from 1998 to 2007.

 

Online: https://swish.swi-prolog.org/example/kb.pl

 

Online GPUs

 

Google GPUs for rent, https://cloud.google.com/gpu/

 

Cloud TPUs / TensorFlow Research Cloud, https://www.tensorflow.org/tfrc/

 

Sources of online information and courses

(in no particular order)

How to Become a Data Scientist, https://techguide.org/careers/data-scientist/

How to Become a Data Analyst, https://techguide.org/careers/data-analyst/

Data Science Bootcamp Guide, https://techguide.org/bootcamps/data-science/

Coursera, https://www.coursera.org/

DataCamp, https://www.datacamp.com/

DeepLearning.AI, https://www.deeplearning.ai/

eduCBA, https://www.educba.com/

edX, https://www.edx.org/

fast.ai, http://www.fast.ai/

LinkedIn Learning, https://www.linkedin.com/learning/

Udemy, https://www.udemy.com/

Additional Books

 

100+ Free Data Science Books: https://www.theinsaneapp.com/2020/12/free-data-science-books-pdf.html

 

20 Free Online Books to Learn R and Data Science: https://cmdlinetips.com/2018/01/free-online-resources-books-to-learn-r-and-data-science/

 

From analyticsvidhya.com: 27 Amazing Data Science Books Every Data Scientist Should Read

 

From analyticsvidhya.com: 6 Open Source Data Science Projects to Make you Industry Ready!

 

From KDnuggets.com: 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more

 

From DataScienceCentral.com: 50 Must-Read Free Books For Every Data Scientist in 2020

 

Research

The following sites offer free access to journal articles:

 

The 15 Most Popular Data Science and Machine Learning Articles on Analytics Vidhya in 2018

https://www.analyticsvidhya.com/blog/2018/12/most-popular-articles-analytics-vidhya-2018/ 

 

Journal of Machine Learning Research

 

arXiv.org

 

Microsoft, https://academic.microsoft.com/home

 

Google Scholar

 

Other journals and conferences on Neural Networks:

 

https://www.omicsonline.org/artificial-neural-network-journals-conferences-list.php

 

44 Original Data Science and Machine Learning Articles

 

Related tools

Databases

MongoDB, https://www.mongodb.com/, lets you deploy a fully managed cloud database in minutes. (Try free)

Studio 3T is the professional GUI, IDE, and client for MongoDB, available for Windows, Mac, and Linux. There is a free course there called MongoDB 101, which covers the basics in just two hours. (Thanks to Magda Matylla for pointing out the above link.)

GraphDB is an enterprise ready Semantic Graph Database, compliant with W3C Standards. Semantic graph databases (also called RDF triplestores) provide the core infrastructure for solutions where modelling agility, data integration, relationship exploration and cross-enterprise data publishing and consumption are important.

PostgreSQL is a powerful, open source object-relational database system.

CouchDB (NoSQL): Apache CouchDB(TM) lets you access your data where you need it. The Couch Replication Protocol is implemented in a variety of projects and products that span every imaginable computing environment from globally distributed server-clusters, over mobile phones to web browsers.

Firebase (free cloud database by Google for use in apps). Firebase helps mobile and web app teams succeed.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.

Cassandra (big data storage). The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.

 

Analytics

Microsoft Power BI for Data Analytics (Free), https://powerbi.microsoft.com/

Tableau, https://www.tableau.com/, changing the way you think about data

QLIK, https://www.qlik.com/us, Blaze trails daily - with the only end-to-end data management and analytics platform built to transform your entire business.

Alteryx: With Alteryx Designer, the power to solve analytic and business challenges, big and small, is at your fingertips, no matter where you're working from.

InetSoft - open standards innovation - Analytics, Dashboards, and Reporting. InetSoft's analytics and reporting are designed and optimized for the cloud computing era where software and data are increasingly distributed between cloud-based and in-house applications. The cloud-first, small footprint architecture allows highly flexible options for embedding and rebranding regardless of InetSoft-hosting, self-hosting, hybrid-cloud, or on-premise deployment. The small technical footprint directly leads to cost savings in both software and computing resources.

 

Miscellaneous

Hadoop, https://hadoop.apache.org/, open-source software for reliable, scalable, distributed computing

Periscope Data, https://www.periscopedata.com/ (Try free)

Snowflake enables every organization to be data-driven.

Machine Learning Open Source Software

 

Miscellaneous Sites of interest

Text Processing: This course does not discuss text processing but that is another interesting area to explore.

The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English, written in the Python programming language.

 

You can check your English grammar with Grammica (suggested by Manizhe Karimi): Grammar Check - English Grammar Checker | Grammica