Machine Learning
A Full-Semester Course (To be offered at JHU)
By Dr. Muhammad Ali Yousuf
Last updated: 7/03/22
Objective |
The objective of this course is to teach various machine learning algorithms with a few applications in mind; the applications are mostly covered through end-of-course projects. |
|
General Links |
Subscribe to my weekly newspaper on Machine Learning and Data Science (updated every Monday, 10:00 AM EST)
My Twitter list, Data Science (#Data, #DataScience, #DeepLearning, #MachineLearning, #Analytics, #ArtificialIntelligence)
My Data Science collection on YouTube, https://tinyurl.com/DataScienceYouTube |
|
Book(s) |
Part I: The Fundamentals of Machine Learning
Chapter 1 The Machine Learning Landscape
Chapter 2 End-to-End Machine Learning Project
Chapter 3 Classification
Chapter 4 Training Models
Chapter 5 Support Vector Machines
Chapter 6 Decision Trees
Chapter 7 Ensemble Learning and Random Forests
Chapter 8 Dimensionality Reduction
Part II: Neural Networks and Deep Learning
Chapter 9 Up and Running with TensorFlow
Chapter 10 Introduction to Artificial Neural Networks
Chapter 11 Training Deep Neural Nets
Chapter 12 Distributing TensorFlow Across Devices and Servers
Chapter 13 Convolutional Neural Networks
Chapter 14 Recurrent Neural Networks
Chapter 15 Autoencoders
Chapter 16 Reinforcement Learning
See the book page: http://shop.oreilly.com/product/0636920052289.do
Python Machine Learning, 1st Edition, by Wei-Meng Lee
Chapter 1 Introduction to Machine Learning
Chapter 2 Extending Python Using NumPy
Chapter 3 Manipulating Tabular Data Using Pandas
Chapter 4 Data Visualization Using matplotlib
Chapter 5 Getting Started with Scikit-learn for Machine Learning
Chapter 6 Supervised Learning: Linear Regression
Chapter 7 Supervised Learning: Classification Using Logistic Regression
Chapter 8 Supervised Learning: Classification Using Support Vector Machines
Chapter 9 Supervised Learning: Classification Using K-Nearest Neighbors (KNN)
Chapter 10 Unsupervised Learning: Clustering Using K-Means
Chapter 11 Using Azure Machine Learning Studio
Chapter 12 Deploying Machine Learning Models
Download the full book sample code: https://www.wiley.com/en-us/Python+Machine+Learning-p-9781119545637 |
|
Core Material |
Supplementary Material |
|
Statistics (Review) |
A 7-hour course by Dr. Sarkar on Statistics for Data Science |
|
Introduction |
Book (PDF): Introduction to Machine Learning
What are Neural Nets and how do they learn?
What is Deep Learning? (Comprehensive online book from MIT Press) |
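To make "how do they learn?" concrete, here is a minimal sketch in plain Python (no libraries) of a single artificial neuron, a logistic unit, learning the logical OR function by stochastic gradient descent. The dataset, learning rate, and epoch count are illustrative assumptions, not taken from the resources above; real networks stack many such units and let a framework compute the gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, lr=0.5, epochs=5000):
    """Fit weights w1, w2 and bias b of one sigmoid neuron by stochastic gradient descent."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            p = sigmoid(w1 * x1 + w2 * x2 + b)  # forward pass: predicted probability
            err = p - y                         # gradient of log loss w.r.t. the pre-activation
            w1 -= lr * err * x1                 # step each parameter against its gradient
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

def predict(params, x):
    w1, w2, b = params
    return 1 if sigmoid(w1 * x[0] + w2 * x[1] + b) >= 0.5 else 0

# Truth table for logical OR (illustrative data): ((input1, input2), target)
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
params = train_neuron(OR_DATA)
print([predict(params, x) for x, _ in OR_DATA])
```

OR is linearly separable, so one neuron suffices; a function like XOR is not, which is why deeper networks with hidden layers are needed.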
From YouTube: Machine Learning - Andrew Ng, Stanford University [FULL COURSE]
What is Machine Learning? (Online course at Coursera)
Useful computational resources
Fundamentals of Machine Learning, a whitepaper published by Interactions Corporation. It defines machine learning, explains how it works, and describes its impact on our daily lives. It also discusses the future of machine learning, specifically anticipated challenges and the increasing importance of the human element in machine intelligence. [Thanks to Billy Adams from Interactions.com for sharing this resource.]
Data Science Central, https://www.datasciencecentral.com/ |
Python |
First you need a Python distribution and an editor or IDE. I'd recommend Anaconda, as it comes with all the major libraries you need for data science; the remaining libraries can be installed easily. https://www.anaconda.com/
Jupyter - Interactive Python notebooks that run in the browser
Google Colab - Hosted Jupyter notebooks, another way to run Python online
PyCharm - A great IDE for Python
Then learn Python: http://learnpython.org/. This is not the only resource; you can find many other online courses.
SciPy (pronounced "Sigh Pie") is a Python-based ecosystem of open-source software for mathematics, science, and engineering. It includes NumPy, Pandas, and Matplotlib.
NumPy is the fundamental package for scientific computing with Python.
Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
scikit-learn - Machine Learning in Python
TensorFlow - An open source machine learning framework for everyone
PyCaret - An open source, low-code machine learning library. |
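Once Anaconda (or plain Python) is installed, a quick way to see which of the libraries listed above are already available is to ask Python's import system without importing them. This is a minimal sketch using only the standard library; the name mapping is an assumption you can extend, and note that scikit-learn is imported as `sklearn` but installed as `scikit-learn`.

```python
import importlib.util

# Import name -> pip package name for the libraries listed above.
LIBRARIES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
    "tensorflow": "tensorflow",
    "pycaret": "pycaret",
}

def check_libraries(libraries=LIBRARIES):
    """Return {import_name: True/False}; find_spec locates top-level modules without running them."""
    return {name: importlib.util.find_spec(name) is not None for name in libraries}

if __name__ == "__main__":
    status = check_libraries()
    for name, found in status.items():
        print(f"{name:12s} {'OK' if found else 'missing'}")
    missing = [LIBRARIES[name] for name, found in status.items() if not found]
    if missing:
        print("Install with: pip install " + " ".join(missing))
```

Inside an Anaconda environment you may prefer `conda install` for the same packages; the check itself works either way.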
From YouTube: Python Tutorial for Beginners - Full Course in 11 Hours
From YouTube: Python for Data Science | Data Science with Python | Python for Data Analysis | 11 Hours Full Course
The following webpage (shared by Kate Chapman / Aimee O'Driscoll) lists six courses that teach Python in the context of ethical hacking and are worth checking out: https://www.comparitech.com/blog/information-security/hacking-python-courses-online/. The website covers many other topics, including SQL.
There are many libraries for machine learning. Here is a short list (see https://www.geeksforgeeks.org/best-python-libraries-for-machine-learning/ for details):
PyTorch is an open source machine learning framework that accelerates the path from research prototyping to production deployment.
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
CNTK - The Microsoft Cognitive Toolkit (CNTK) is a deep learning framework developed by Microsoft Research.
Theano - Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
Chainer (from Japan) - A Powerful, Flexible, and Intuitive Framework for Neural Networks |
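PyTorch, TensorFlow/Keras, Theano, and Chainer all let you define mathematical expressions and then compute their gradients automatically, which is what makes training neural networks practical. The toy `Value` class below is a hypothetical, minimal sketch of reverse-mode automatic differentiation on scalars, in the spirit of these frameworks but not the API of any of them; real frameworks apply the same idea to multi-dimensional arrays and run it on GPUs.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back (toy sketch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def backward(self):
        """Topologically sort the expression graph, then apply the chain rule in reverse."""
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# f(x, y) = x * y + x, so df/dx = y + 1 and df/dy = x
x, y = Value(2.0), Value(3.0)
f = x * y + x
f.backward()
print(f.data, x.grad, y.grad)
```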
R Programming |
The R Project for Statistical Computing, https://www.r-project.org/
RStudio Cloud (no download necessary)
RStudio, https://www.rstudio.com/ |
From YouTube: Introduction to Data Science with R - Data Analysis by David Langer
Part 1: https://www.youtube.com/watch?v=32o0DnuRjfg
Part 2: https://www.youtube.com/watch?v=u6sahb7Hmog
Free book: R for Data Science by Garrett Grolemund and Hadley Wickham |
SQL |
SQL Server 2017
Express edition, https://www.microsoft.com/en-us/sql-server/sql-server-editions-express |
From YouTube: SQL Tutorial - Full Database Course for Beginners
https://cloud.google.com/sql/ (Try it free)
https://free.caspio.com/ (Free)
https://www.freesqldatabase.com/ (Free)
Introduction to SQL |
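To practice basic SQL without installing SQL Server or signing up for a hosted database, Python's standard library includes `sqlite3`, an embedded SQL engine. This is a swap-in for illustration, not one of the tools listed above: SQLite's dialect differs from SQL Server's T-SQL in places, but a `SELECT ... GROUP BY` query like the one below works the same way. The table and rows are hypothetical.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (name TEXT, course TEXT, score REAL)")

# Hypothetical sample rows; "?" placeholders avoid building SQL strings by hand.
rows = [("Ana", "ML", 88.0), ("Ben", "ML", 92.0), ("Cho", "Stats", 75.0), ("Dev", "Stats", 85.0)]
cur.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)

# Aggregate per course: number of students and average score.
cur.execute(
    "SELECT course, COUNT(*), AVG(score) FROM students GROUP BY course ORDER BY course"
)
summary = cur.fetchall()
conn.close()
print(summary)
```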
MATLAB |
If you are part of JHU, you can get MATLAB for free. If not, there is a free MATLAB trial for data science (https://www.mathworks.com/campaigns/products/trials/targeted/dan.html), which includes MATLAB and a full set of data science products: Global Optimization Toolbox, Parallel Computing Toolbox, Curve Fitting Toolbox, Deep Learning Toolbox, Statistics and Machine Learning Toolbox, Optimization Toolbox, Database Toolbox, Text Analytics Toolbox, and Symbolic Math Toolbox.
MATLAB for Deep Learning, https://www.mathworks.com/solutions/deep-learning.html?s_tid=hp_brand_deeplearning
Data Analytics in MATLAB, https://www.mathworks.com/products/data-analytics-whats-new.html and https://www.mathworks.com/videos/data-analytics-with-matlab-99066.html
8 MATLAB Cheat Sheets for Data Science, https://www.mathworks.com/campaigns/offers/data-science-cheat-sheets.confirmation.html?elqsid=1559494336562&potential_use=Education |
Getting-started courses called Onramps run entirely in the browser (no downloads needed) and let students earn certificates they can share, for example, on LinkedIn:
Machine Learning - https://www.mathworks.com/learn/tutorials/machine-learning-onramp.html
Deep Learning - https://www.mathworks.com/learn/tutorials/deep-learning-onramp.html
Reinforcement Learning - https://www.mathworks.com/learn/tutorials/reinforcement-learning-onramp.html
Video series that provide high-level overviews of these topics:
https://www.mathworks.com/videos/series/introduction-to-machine-learning.html (4-part video series on machine learning)
https://www.mathworks.com/videos/series/applied-machine-learning.html (more practical perspectives)
https://www.mathworks.com/videos/series/deep-learning-with-MATLAB.html#tutorials (video series focused on deep learning)
https://www.mathworks.com/videos/series/deep-neural-networks.html (video series)
https://www.mathworks.com/videos/series/deep-learning-for-engineers.html (more practical perspectives)
Specific Examples / Datasets:
Data Sets for Deep Learning, including images: https://www.mathworks.com/help/deeplearning/ug/data-sets-for-deep-learning.html
Brain MRI Age Classification
Also interesting: Semantic Segmentation of Multispectral Images Using Deep Learning |
Additional Resources
Professional Career Guides by TechGuide |
|
Data Sources |
|
Projects |
|
AutoML Tools |
Open source AutoML tools:
• EvalML - https://lnkd.in/gtscitGk
• FLAML - https://lnkd.in/gqZCyFsk
• LightAutoML - https://lnkd.in/gar79u5c
• MLJAR - https://mljar.com
• PyCaret AutoML - https://lnkd.in/gTS6bztz
• TPOT - https://lnkd.in/gN2cFMPj
• H2O AutoML - https://lnkd.in/gSS-EZna
• AutoGluon - https://lnkd.in/gwgVm5tm
• AutoKeras - https://lnkd.in/g3gCAnai
• Auto-PyTorch - https://lnkd.in/gDJjS_JB
• Auto-sklearn - https://lnkd.in/gpAzQJcf
• Ivy - https://lets-unify.ai/ivy/
This notebook by Rohan Rao (Vopani) covers most of them: https://lnkd.in/g3Na3euy |
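At their core, the AutoML tools above automate a loop: fit several candidate models or hyperparameter settings, score each on held-out data, and keep the best. The real tools add much more (preprocessing, smarter search, ensembling), so treat this plain-Python sketch as an illustration of the search-and-score idea only, not the API of any listed tool. The candidate models, names, and toy data are hypothetical.

```python
from collections import Counter

def majority_model(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def knn_model(k):
    """Return a fit function for a 1-D k-nearest-neighbors classifier."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def auto_select(candidates, train, valid):
    """Fit every candidate on train, score it on valid, return the best name and all scores."""
    scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy data: one numeric feature, two classes.
train = [(0.1, "a"), (0.4, "a"), (0.9, "a"), (2.1, "b"), (2.6, "b"), (3.0, "b")]
valid = [(0.5, "a"), (1.0, "a"), (2.4, "b"), (2.9, "b")]

candidates = {
    "majority": majority_model,
    "knn_k1": knn_model(1),
    "knn_k3": knn_model(3),
}
best, scores = auto_select(candidates, train, valid)
print(best, scores)
```

Scoring on a separate validation set, rather than on the training data, is what keeps the search from simply rewarding the model that memorizes best.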
Julia Language |
Julia offers Flux, a state-of-the-art framework for machine learning and AI.
From YouTube: Intro to Julia for Data Science |
Octave |
A MATLAB-compatible alternative.
Online version: https://octave-online.net/
Download: https://www.gnu.org/software/octave/download |
PROLOG |
A symbolic AI language, no longer fashionable, but I taught full-semester courses on it from 1998 to 2007.
Online: https://swish.swi-prolog.org/example/kb.pl |
Online GPUs |
Rent Google GPUs: https://cloud.google.com/gpu/
Cloud TPUs / TensorFlow Research Cloud: https://www.tensorflow.org/tfrc/ |
Sources of online information and courses |
(in no particular order)
How to Become a Data Scientist, https://techguide.org/careers/data-scientist/
How to Become a Data Analyst, https://techguide.org/careers/data-analyst/
Data Science Bootcamp Guide, https://techguide.org/bootcamps/data-science/
Coursera, https://www.coursera.org/
DataCamp, https://www.datacamp.com/
DeepLearning.AI, https://www.deeplearning.ai/
EDUCBA, https://www.educba.com/
edX, https://www.edx.org/
fast.ai, http://www.fast.ai/
LinkedIn Learning, https://www.linkedin.com/learning/
Udemy, https://www.udemy.com/ |
Additional Books |
100+ Free Data Science Books: https://www.theinsaneapp.com/2020/12/free-data-science-books-pdf.html
20 Free Online Books to Learn R and Data Science: https://cmdlinetips.com/2018/01/free-online-resources-books-to-learn-r-and-data-science/
From analyticsvidhya.com: 27 Amazing Data Science Books Every Data Scientist Should Read
From analyticsvidhya.com: 6 Open Source Data Science Projects to Make You Industry Ready!
From KDnuggets.com: 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more
From DataScienceCentral.com: 50 Must-Read Free Books For Every Data Scientist in 2020 |
Research |
The following sites offer free access to journal articles:
The 15 Most Popular Data Science and Machine Learning Articles on Analytics Vidhya in 2018, https://www.analyticsvidhya.com/blog/2018/12/most-popular-articles-analytics-vidhya-2018/
Journal of Machine Learning Research
Microsoft Academic, https://academic.microsoft.com/home
Google Scholar
Other journals and conferences on Neural Networks: https://www.omicsonline.org/artificial-neural-network-journals-conferences-list.php
44 Original Data Science and Machine Learning Articles |
Related tools |
Databases
MongoDB, https://www.mongodb.com/, lets you deploy a fully managed cloud database in minutes (try free).
Studio 3T is the professional GUI, IDE, and client for MongoDB, available for Windows, Mac, and Linux. It offers a free course, MongoDB 101, which covers the basics in just two hours. (Thanks to Magda Matylla for pointing out this link.)
GraphDB is an enterprise-ready semantic graph database, compliant with W3C standards. Semantic graph databases (also called RDF triplestores) provide the core infrastructure for solutions where modelling agility, data integration, relationship exploration, and cross-enterprise data publishing and consumption are important.
PostgreSQL is a powerful, open source object-relational database system.
CouchDB (NoSQL): Apache CouchDB(TM) lets you access your data where you need it. The Couch Replication Protocol is implemented in a variety of projects and products that span every imaginable computing environment, from globally distributed server clusters to mobile phones and web browsers.
Firebase (free cloud database by Google for use in apps). Firebase helps mobile and web app teams succeed.
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker.
Cassandra (big data storage): The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.
Analytics
Microsoft Power BI for Data Analytics (free), https://powerbi.microsoft.com/
Tableau, https://www.tableau.com/, changing the way you think about data
Qlik, https://www.qlik.com/us: blaze trails daily with the only end-to-end data management and analytics platform built to transform your entire business.
Alteryx: with Alteryx Designer, the power to solve analytic and business challenges, big and small, is at your fingertips, no matter where you're working from.
InetSoft: open standards innovation in analytics, dashboards, and reporting. InetSoft's analytics and reporting are designed and optimized for the cloud computing era, where software and data are increasingly distributed between cloud-based and in-house applications. The cloud-first, small-footprint architecture allows highly flexible options for embedding and rebranding, whether InetSoft-hosted, self-hosted, hybrid-cloud, or on-premise. The small technical footprint directly leads to cost savings in both software and computing resources.
Miscellaneous
Hadoop, https://hadoop.apache.org/, open-source software for reliable, scalable, distributed computing
Periscope Data, https://www.periscopedata.com/ (try free)
Snowflake enables every organization to be data-driven
Machine Learning Open Source Software |
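RDF triplestores such as GraphDB store every fact as a (subject, predicate, object) triple and answer queries by matching patterns against those triples; production systems do this with SPARQL, indexes, and W3C-standard identifiers. The class below is a hypothetical, minimal in-memory sketch of that idea in plain Python, not GraphDB's API, and the sample facts are invented for illustration.

```python
class TripleStore:
    """A toy in-memory set of (subject, predicate, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern, sorted; None acts as a wildcard."""
        return sorted(
            (s, p, o)
            for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        )

# Hypothetical facts.
store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("alice", "worksAt", "acme")

# Relationship exploration: whom do Alice's acquaintances know? (two hops)
friends_of_friends = sorted(
    o2
    for (_, _, o1) in store.match("alice", "knows")
    for (_, _, o2) in store.match(o1, "knows")
)
print(friends_of_friends)
```

Chaining pattern matches like this is the "relationship exploration" the GraphDB description mentions; a query language such as SPARQL expresses the same two-hop join declaratively.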
Miscellaneous Sites of Interest |
Text Processing: This course does not cover text processing, but it is another interesting area to explore. The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs, written in Python, for symbolic and statistical natural language processing of English.
You can check your English grammar with Grammica (suggested by Manizhe Karimi): Grammar Check - English Grammar Checker | Grammica |