DS231: Final Review Summary

Module 2: Wrapping Your Head Around DS

Introduction

In its truest form:
  • Data Science represents the optimization of processes and resources.

Seeing Who Can Make Use of Data Science

Data Science
  • The computational science of extracting meaningful insights from raw data and effectively communicating those insights to generate value
Data Engineering
  • The domain dedicated to building and maintaining systems that overcome data processing bottlenecks and data handling problems
Three Data Varieties:
  • Structured: Data stored, processed, and manipulated in a traditional relational database management system (RDBMS). Example: MySQL database.
  • Unstructured: Data generated from human activities that doesn't fit into a structured database format. Example: Email documents.
  • Semistructured: Data that doesn't fit into a structured database system, but is nonetheless organizable by tags that create a form of order and hierarchy. Example: XML and JSON files.
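As a rough illustration of the structured and semistructured varieties, the sketch below reads the same record from a CSV row (fixed columns) and from a JSON document (organized by tags). The record and its field names are made up for the example.

```python
import csv
import io
import json

# Structured: every row has the same fixed columns, as in an RDBMS table.
csv_text = "id,name,city\n1,Ahmed,Riyadh\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semistructured: keys (tags) create order and hierarchy, but records
# may nest fields or omit them.
json_text = '{"id": 1, "name": "Ahmed", "address": {"city": "Riyadh"}}'
record = json.loads(json_text)

print(rows[0]["city"])             # Riyadh
print(record["address"]["city"])   # Riyadh
```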
Who can use Data Science?
You, your organization, employer, and anyone!

Inspecting the Pieces of the Data Science Puzzle

To practice data science, you need:
  • Analytical know-how of math & statistics
  • Coding skills
  • Subject Matter Expertise
The Key Components that are part of any data science role:
  1. Collecting, querying, and consuming data.
  2. Applying mathematical modeling to data science tasks
  3. Deriving insights from statistical methods
  4. Coding, coding, coding!
  5. Applying data science to a subject area
  6. Communicating data insights
The Universally accepted file formats:
  • Comma-separated values (CSV)
  • Script: .py, .ipynb, .r
  • Application: Excel
  • Web programming: D3.js, JavaScript
The main point of distinction between data science and statistics is the need for subject matter expertise

Exploring Career Alternatives That Involve Data Science

The data implementer:
  • The main task is to build data and AI solutions
  • High attention to details
  • Starts a project with a request and messy data
  • Math and coding are their superpowers.
The data leader:
  • Lead teams and project stakeholders through building data solutions
  • Love data science for the incredible outcomes it makes possible.
  • Have a deep passion for using their DS and leadership skills
  • Love to collaborate with smart people in the organization
The data entrepreneur:
  • Builds up business by delivering services and products.
  • Craves the creative freedom as a founder.
  • Creates a vision and uses data science expertise to turn it into reality.

Module 3: Tapping Into Critical Aspects of Data Engineering

Defining Big Data and the Three Vs

Hadoop
  • A data processing platform designed to reduce big data into smaller datasets.
  • It provides batch processing and storage for large volumes of data.
Three Characteristics of Big Data (The Three Vs)
  • Volume
  • Velocity
  • Variety

Identifying Important Data Sources

Some Sources of Big Data:
  • Enterprise Data
  • Public Data
  • Sensor Data
  • Social Media
  • Financial Transactions
  • Health Records
  • Click-streams
  • Log files
  • Internet of Things

Grasping the Differences among Data Approaches

Defining Data Science:
  • The scientific domain dedicated to knowledge discovery via data analysis.
Defining Machine Learning Engineering:
  • The practice of applying algorithms to learn from data and make automated predictions
  • A machine learning engineer is a software engineer
Defining Data Engineering:
  • Build and maintain data systems for overcoming data processing bottlenecks
  • Experience in working with real-time processing frameworks and massively parallel processing platforms
  • Know how to deploy Hadoop, MapReduce, or Spark
To summarize, hire:
  1. Data engineer to store, migrate, and process data.
  2. Data scientist to make sense of it
  3. Machine learning engineer to bring your machine learning models into production.

Storing and Processing Data for Data Science

Advantages of Storing Data in the Cloud:
  • Faster time-to-market
  • Enhanced flexibility
  • Security
Serverless Computing:
  • Computing that is executed in a cloud environment
  • Decreases downtime
Kubernetes:
  • An open-source software suite that manages deployment of containerized applications.
  • Can be run on on-premise clusters, in the cloud, or in a hybrid cloud environment
Popular Cloud-Warehouse Solutions:
  • Amazon Redshift
  • Snowflake
  • Google BigQuery
NoSQL Database:
  • Nonrelational, distributed database system
  • Can run on-premise, or in the cloud
  • Can work with Semistructured and Structured data
On-Premise Hadoop Storage Environment Includes:
  • HDFS (data storage, uses commodity hardware)
  • MapReduce (bulk data processing)
  • Spark (real-time processing)
  • YARN (resource management)
Massively Parallel Processing (MPP) Platforms:
  • Runs parallel computing tasks on costly custom hardware

Module 4: Machine Learning Means

Defining Machine Learning and Its Processes

Use Cases of Machine Learning:
  • Real-time internet advertising
  • Spam filtering
  • Recommendation engines
  • Natural language processing
  • Sentiment analysis
  • Automatic facial recognition
Three Main Steps in Machine Learning:
  • Setup
  • Learning
  • Application
Setup
  • Acquiring data, preprocessing it
  • Selecting variables
  • Breaking data into training and test datasets
Learning
  • Model experimentation, training, building, and testing.
Application
  • Model deployment and prediction.
Rule of thumb for breaking data into test and training data:
Apply random sampling to two-thirds of the data to train the model. Use the remaining one-third as test data.
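A minimal sketch of that rule of thumb, using only the standard library. The function name train_test_split, the seed, and the 30-row stand-in dataset are assumptions made for this example.

```python
import random

def train_test_split(rows, train_fraction=2/3, seed=42):
    """Randomly split rows into (train, test) lists."""
    shuffled = list(rows)                    # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)    # random sampling via a shuffle
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(30))                       # stand-in for 30 instances
train, test = train_test_split(data)
print(len(train), len(test))                 # 20 10
```

Fixing the seed keeps the split reproducible, which helps when comparing models on the same test data.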

Machine Learning Terms

Instance:
Row, observation, data point, or instance case.
Feature:
Column, field, variable, or independent variable.
Target Variable:
Prediction, or dependent variable

Considering Learning Styles

Machine Learning has Three Styles:
  • Supervised
  • Unsupervised
  • Semisupervised
Supervised Learning
  • Requires input data to have labeled features.
  • Use supervised learning when you have a labeled dataset of historical values to predict future events
  • Example: Logistic Regression.
Unsupervised Learning
  • Accepts unlabeled data.
  • Attempts to group observations into categories based on similarities.
  • Examples: K-means clustering, and Singular value decomposition.
Reinforcement Learning
  • Is a behavior-based learning model
  • Based on a mechanic like how humans and animals learn
  • The model is given rewards based on how it behaves
  • It learns to maximize the sum of its rewards.
Examples of Deep Learning Algorithms:
  • Gmail's Smart Reply
  • Facebook's DeepFace
Apache Spark:
  • In-memory distributed computing application.

Module 5: Probability, and Statistical Modeling

Exploring Probability and Inferential Statistics

A Statistic is:
  • A result derived from performing mathematical operations on numerical data.
Descriptive Statistics:
  • Provide a description that illuminates some characteristics of a numerical dataset, including:
    • Dataset distribution
    • Central tendency (e.g. mean, median, or mode)
    • Dispersion (as in standard deviation and variance)
  • They highlight the relationship between X and Y, but do not assume that X causes Y.
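The descriptive statistics named above can be computed with Python's standard statistics module. This is only a sketch on a made-up sample, and it uses the population forms of standard deviation and variance.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up numerical dataset

mean = statistics.mean(values)               # central tendency
median = statistics.median(values)           # central tendency
spread = statistics.pstdev(values)           # dispersion: standard deviation
variance = statistics.pvariance(values)      # dispersion: variance

print(mean, median, spread, variance)
```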
Inferential Statistics:
  • Carve out a smaller section of the dataset and deduce significant information about the larger dataset.
  • They try to predict by studying causation
  • Example: Regression analysis
Descriptive Statistics describe the characteristics of a numerical dataset. But they don't tell you why you should care
When to use descriptive statistics:
  • To detect outliers
  • To plan for feature preprocessing requirements
  • To identify which features to use in an analysis

Probability Distributions

Use random variables and probability distributions to model unpredictable variables.
Probability has Two Important Characteristics:
  1. The probability of a single event never goes below 0.0 or exceeds 1.0
  2. The probability of all events always sums to exactly 1.0
Probability distribution is classified as per these two types:
  1. Discrete: A random variable where values can be counted by groupings. Example: colors of a car.
  2. Continuous: A random variable that assigns probabilities to a range of values. Example: miles per gallon.
Normal Distributions: (numeric continuous)
  • Represented graphically by a symmetric bell-shaped curve
  • Most likely observation at the top of the bell
  • Observations at the two extremes are less likely
Binomial Distributions: (numeric discrete)
  • Model the number of successes that can occur in a certain number of attempts when only two outcomes are possible.
  • Variables that assume only two values have a binomial distribution.
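To make the binomial idea concrete, here is a small sketch that computes the probability of each possible number of successes. The coin-flip scenario and the helper name binomial_pmf are assumptions for illustration; math.comb needs Python 3.8 or later.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent attempts."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 10 flips of a fair coin (two possible outcomes)
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]

print(round(probs[5], 4))      # 0.2461 (exactly 5 heads)
print(round(sum(probs), 4))    # 1.0 (all events sum to 1)
```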
Categorical Distributions: (non-numeric)
  • Represent non-numeric categorical variables or ordinal variables.
  • Example: level of service offered by airlines (first, business, and economy class)
Conditional Probability with Naive Bayes
  • It's a machine learning method
  • Borrowed from the statistics field
  • Predicts the likelihood that an event will occur based on evidence in the data features.
  • Based on classification and regression.
  • Useful to classify text data.
  • Example: Spam classification

Quantifying Correlation

Correlation is quantified per the value of a variable called r, which ranges between -1 and 1.
The closer the r-value is to 1 or -1, the more correlation there is between the two variables.
If the two variables have an r-value that's close to 0, it could indicate that they're independent variables.
Calculating Correlations with Pearson's r
  • If you want to uncover dependent relationships between continuous variables, use statistics to estimate their correlation
  • The simplest form of correlation analysis is the Pearson Correlation, which assumes that:
    • Your data is normally distributed
    • You have continuous, numeric variables
    • Your variables are linearly related
Spearman's Rank Correlation
  • Determines correlation between ordinal variables
  • Converts numeric variable pairs into ranks, then measures the strength of the relationship between those ranks.
It assumes that:
  • Your variables are ordinal
  • Your variables are related nonlinearly
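The two correlation measures can be sketched in plain Python as follows. The helper names and the sample data are made up, and the ranking helper assumes there are no tied values, which a real implementation would need to handle.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 (smallest); assumes no ties, for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]                  # y = x squared: related, but not linearly
print(round(pearson_r(x, y), 3))       # 0.981
print(round(spearman_r(x, y), 3))      # 1.0
```

On this sample Pearson's r falls short of 1 because the relationship is nonlinear, while Spearman's rank correlation reports a perfect monotonic relationship.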

Reducing Data Dimensionality with Linear Algebra

Array and matrix objects are the primary data structure in analytical computing
Decomposing data to reduce Dimensionality
  • Using a linear algebra method called Singular Value Decomposition (SVD), you can reduce the dimensionality of a dataset.
  • Dimension reduction algorithms are ideal options if you need to compress your dataset while also removing redundant information and noise
  • SVD is applied to analyze principal components from large, noisy data sets.
  • This uses a machine learning approach called Principal Component Analysis (PCA)
Reducing Dimensionality with Factor Analysis
  • Factor Analysis is used for filtering out redundant information and noise from your data.
  • Shared variance means information redundancy is at play.
  • We use factor analysis or principal component analysis to clear data of information redundancy.
Factor Analysis Makes the Following Assumptions:
  • Your features are metric.
  • Your features should be continuous or ordinal.
  • You have more than 100 observations in your dataset and at least 5 observations per feature.
  • Your sample is homogeneous.
  • There is r > 0.3 correlation between the features in your dataset.
Decreasing Dimensionality and Removing Outliers with PCA:
  • Principal Component Analysis (PCA) is another dimensionality reduction technique like SVD.
  • An unsupervised statistical method.
  • Finds relationships between features in a dataset.
  • Reduces those features to a set of non-information redundant principal components.
Differences between PCA and Factor Analysis:
  • PCA does not regress to find cause of shared variance.
  • In PCA, the number of components to be discovered in the dataset is not specified on the first run.

Modeling Decisions with Multiple Criteria Decision-Making

To use MCDM, the following two assumptions must be satisfied:
  • Multiple criteria evaluation: You must have more than one criterion to optimize.
  • Zero-sum system: Optimizing one criterion must come at the sacrifice of at least another criterion.
Fuzzy MCDM:
  • Evaluates suitability within a range, not using binary terms of 0 or 1.
  • The term fuzzy means the criteria offer a range of acceptability, unlike binary criteria in traditional MCDM.

Introducing Regression Methods

Use regression methods if you want to determine the strength of correlation between variables in your data.
Regression methods assume a cause-and-effect relationship between variables.
Linear Regression:
  • Used to describe and quantify the relationship between a target variable Y, and chosen features of the dataset.
  • If only one feature is used, linear regression becomes as simple as the middle school algebra formula: y = mx + b
Linear Regression Limitations:
  • Works only with numerical variables.
  • Missing values cause problems.
  • Outliers cause inaccurate results.
  • Assumes a linear relationship exists between features and the target variable.
  • Assumes all features are independent.
  • Prediction errors should be normally distributed.
Logistic Regression:
  • It's a machine learning method.
  • Used to estimate values for a categorical target variable.
  • Target variable should be numeric and should contain values that describe the target's class.
  • It predicts the class of observation, and indicates the probability of each estimate.
Ordinary Least Squares (OLS) Regression Method:
  • A statistical method that fits a linear regression line to a dataset.
  • Used to construct a close approximation function to your data.
  • Fits a regression line to models with more than one independent variable.
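For the single-feature case, an ordinary least squares fit of y = mx + b can be sketched with the textbook closed-form formulas. The function name fit_line and the data points are assumptions for this example, not part of the course material.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b that minimize the squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # made-up points lying exactly on y = 2x + 1
m, b = fit_line(xs, ys)
print(m, b)                # 2.0 1.0
```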

Detecting Outliers

Outliers:
  • Data points with values significantly different from the majority of the remaining data points.
  • Used to spot anomalies that represent fraud, equipment failure, or cybersecurity attacks.
  • Can be detected using a univariate or a multivariate approach.
Univariate Analysis:
  • Inspecting features individually for anomalous values.
  • Choose from two simple methods:
    • Tukey outlier labeling
    • Tukey boxplotting
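Tukey outlier labeling is commonly described as flagging values that fall more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch below follows that common definition; the multiplier k, the function name, and the sample readings are assumptions, and statistics.quantiles needs Python 3.8 or later.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Label values outside [Q1 - k*IQR, Q3 + k*IQR] as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]   # made-up feature
outliers = tukey_outliers(readings)
print(outliers)   # [102]
```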
Multivariate Analysis:
  • Inspecting combinations of data points from disparate variables.
  • Considers two or more variables at a time and inspects them together.
  • Use one of the following methods:
    • A scatter-plot matrix
    • Boxplotting
    • Density-based spatial clustering (DBScan)
    • Principal Component Analysis

Introducing Time Series Analysis

  • A collection of data on attribute values over time.
  • Predicts future instances of the measure based on past observational data.
  • Used to forecast future values from data in your dataset.

Module 6: Part 1 | Ten Phenomenal Resources

Introduction

Open Data:
  • Data that has been made publicly available and is permitted to be used, reused, built on, and shared with others.
  • Part of the Open Movement
  • Has copyleft instead of copyright

Exploring Data Worldwide

1. Data.gov
  • URL: www.data.gov
  • 100,000 datasets
  • Indicators:
    • Economics
    • Environmental
    • STEM industry
    • Quality of life
    • Legal
  • 60 open-source APIs
2. Canada Open Data
  • URL: open.canada.ca
  • 200,000 datasets
  • Indicators
    • Environmental
    • Citizenship
    • Quality of Life
3. Data.gov.uk
  • URL: data.gov.uk
  • 20,000 datasets
  • Indicators:
    • Environmental
    • Government Spending
    • Societal
    • Health
    • Education
    • Business & Economics
4. US Census Bureau Data
  • URL: www.census.gov
  • Useful for marketing or advertising research
  • Provides the following demographic data:
    • Age
    • Average annual income
    • Household size
    • Gender or race
    • Level of education
5. NASA Data
  • URL: data.nasa.gov
  • Generates 4 terabytes of new earth-science data per day
  • Indicators:
    • Astronomy and space
    • Climate
    • Life sciences
    • Geology
    • Engineering
6. World Bank Data
  • URL: data.worldbank.org
  • The World Bank is an international financial institution that provides loans to developing countries.
  • Indicators:
    • Agriculture & rural development
    • Economy & growth
    • Environment
    • Science and Technology
    • Financial sector
    • Poverty income
7. Knoema Data
  • URL: knoema.com
  • 500+ datasets
  • 150 million time series
  • Indicators:
    • Government data from industrial nations
    • National public data from developing nations
    • United Nations data
    • International organization data
    • Corporate data from global corporations
8. Quandl Data
  • URL: www.quandl.com
  • Toronto-based website
  • search engine for numeric data
  • 10 million datasets
  • Links 2.1 million UN datasets and other sources
  • Indicators:
    • Open financial data project
    • Central banks
    • Real estate organizations
    • Well-known think tanks.
9. Exversion Data
  • Provides collaborative functionality like GitHub
  • All uploaded data is public
  • Extremely useful in data-cleanup stage
10. OpenStreetMap Spatial Data (OSM)
  • URL: openstreetmap.org
  • Crowd-sourced alternative to commercial mapping products

Discovering Saudi Open Data Portal

Saudi Open Data Portal:
  • URL: https://data.gov.sa
  • 6,544 datasets
  • Total publishers: 147
  • Aims to implement a public data hub and a strategy to enable transparency and inspire innovation
  • Its primary role is to publish datasets from ministries and government agencies in open format and make it available to the public.
  • Helps bridge the gap between governments and citizens
  • Public benefits:
    • Better understanding of how government agencies work.
    • Evaluating the performance of administrative institutions.
    • Making informed decisions about government policies.
  • Allows using data for research

Module 6: Part 2 | Starting with Python

Why is Python Hot

Python has lots of capabilities in Machine Learning, robotics, AI, and data science
Reasons for Python's Popularity
  • Easy to learn
  • Free
  • Offers ready-made tools for data science, machine learning, AI, and robotics.

Choosing the Right Python

Go to python.org to download Python.

Tools for Success

  • You need an editor to type the code
  • You need an interpreter to run the code
Code Editor
  • An app to type code.
  • You still need the Python interpreter
Anaconda
  • A complete Python development environment
  • Has intuitive and easy graphic user interface
  • Works on Windows, Mac, and Linux
  • Often referred to as a data science platform
Installing Anaconda
  • Go to www.anaconda.com/download
  • Click download under the latest version
  • Follow instructions to download free version
  • Installation: select options that make sense to you
  • Install VS Code if asked
Using Anaconda Navigator
  • Lets you navigate different features and choose what to run

Writing Python in VS Code

To use VS Code with Python and Anaconda, you need extensions.
  • To verify, click the Extensions icon in the left pane.
  • You should see three extensions:
    • Anaconda extension pack
    • Python
    • YAML

Using Jupyter Notebook for Coding

Jupyter Notebook
  • A popular tool for writing Python code
  • Supports three popular languages:
    • Julia
    • Python
    • R

Module 7: Interactive Mode, Getting Help, and Writing Apps

Using Python's Interactive Mode

Opening Terminal: To use Python interactively with Anaconda, follow these steps:
  • Open Anaconda Navigator, then VS Code from Anaconda home page.
  • Choose View => Terminal from VS Code menu bar
  • Highlight Terminal pane
  • You will see the following prompt:
    • PS C:\Users\xxx>
Getting your Python version
  • At the operating system command prompt, type the following:
    • python --version
Going into Python Interpreter
  • Enter the following command:
    • python
  • You should see information about Python version and the >>> prompt.
Using Python's built-in help
  • Use the following command:
    • help()
  • You will see help>
  • You are no longer in Python interpreter
  • You can enter the name of any module, keyword, or topic to get help
  • To get a list of Python keywords, type the following:
    • keywords
  • To exit interactive help:
    • Type the letter "q"
    • Or, press Ctrl+Z
    • You will get back to the >>> prompt
Searching for Specific Help Topics Online
  • www.youtube.com
  • stackoverflow.com

Creating a Python Development Workspace

VS Code uses the term workspace to define a development environment
  • It includes Python interpreter + additional extensions
Steps to Create a Development Workspace in VS Code:
  1. Open VS Code from Anaconda
  2. Choose File => Save Workspace As, and select folder
  3. Type name of workspace and click Save
  4. Choose File => Preferences => Settings
  5. Click Open Settings (JSON) icon
  6. Copy the entire line that starts with python.pythonpath
  7. Click the Split Editor Right icon
  8. Choose View => Command Palette, type open and then select Preferences: Open Workspace Settings(JSON)
  9. Click between the set of curly braces and paste the lines of code there
  10. Choose File=>Save
  11. Close all settings and close Anaconda

Creating a Folder for Your Python Code

Steps to Associate Code Folder with VS Code Workspace:
  1. Open Anaconda and launch VS Code
  2. Choose File => Open Workspace
  3. Navigate to the folder where you saved your workspace
  4. Choose File => Add Folder to Workspace
  5. Navigate to the folder for your Python code, and choose Add

Typing, Editing, and Debugging Python Code

Writing Python Code
  • You'll write most of the code in an editor
  • Each Python code file is a plain text file with a .py extension
  • To create a .py file:
    1. Open VS Code
    2. Click the Explorer icon
    3. Right-click a folder and choose New File
    4. Type the filename with .py extension and press Enter
Your First Python Code
  1. Click to the right of line 1 in the editing area
  2. Type the following:
    • print("Hello World")
  3. Press Enter
Saving Your Code
  • One way to save your code is to try to remember to save any time you make a change
  • The second method is to use AutoSave
  • To turn on autosave, choose File => Auto Save
Running Python in VS Code
  • Easiest way: right-click the file's name and choose Run Python File in Terminal
Learning Simple Debugging: Screen Indications of an Error
  • Name of folder and file will be red in the Explorer pane
  • Number of errors in file will appear in red next to the filename in the Explorer pane
  • Total number of errors will appear next to the circled X in the bottom left corner of VS Code window
  • Bad code will have a wavy red underline
Python is case-sensitive
VS Code has a built-in debugger

Writing Code in a Jupyter Notebook

Creating and Saving a Jupyter Notebook
  1. Open Anaconda and launch Jupyter Notebook
  2. Navigate to the Jupyter notebook folder
  3. Click New and choose Python 3
  4. Click Untitled at the top to rename the notebook
Typing and Running Code in a Notebook
  1. Click in the Code cell to the right of In []: and type your code
  2. To run the code, hold down Alt and press Enter, or click the Run button in the toolbar
  3. The output appears below the cell
Adding Markdown text
  • You can add text, pictures, and videos to your notebook
  • No special coding is needed for regular text
  • Markdown tags are needed for formatting, or for adding pictures and videos.
  • Markdown is a popular markup language, like HTML
  • Add an empty cell by choosing Insert => Insert Cell Below
  • Then add content to the new cell (regular text and Markdown)
To run all cells in a notebook, use the double triangle icon in the toolbar.
Jupyter notebook files have the .ipynb extension.

Module 8: Python Elements and Syntax

The Zen of Python

Python is based on the philosophy that a programming language should be geared toward how humans think, work, and communicate

Introducing Object Oriented Programming

Object-oriented programming:
  • A design philosophy that tries to mimic the real world by having objects with properties as well as methods.
Object
  • A software thing, not physical
Class
  • Object creator
Python is an object-oriented language
  • The core language consists of controls (keywords) that let you work with different kinds of objects
  • Must learn core language first before using other people's objects

Discovering Why Indentations Count, Big Time

Python uses indentation rather than parentheses and curly braces to indicate blocks of code
  • Indentations are not optional
  • They affect how the code runs
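A short sketch of how indentation decides which lines belong to a block. The variable names and the temperature value are made up for the example.

```python
temperature = 35   # made-up value

if temperature > 30:
    # Both indented lines belong to the if block
    message = "hot"
    advice = "drink water"
else:
    message = "mild"
    advice = "enjoy the day"

# Not indented: runs no matter which branch was taken
print(message, advice)   # hot drink water
```

Moving the print line four spaces to the right would make it part of the else block, so it would no longer run for this temperature.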

Using Python Modules

One of the secrets to Python's success is that it's composed of a simple, clean core language
Python Modules
  • Many modules are available for free
  • You can access the power of modules from the basic core language
  • Most modules are for a specific application, like science, AI, date and time
Understanding the Syntax for Importing Modules
  • To import a module in your Python program, use the following syntax:
    • import modulename [as alias]
    • Example:
    
    import random as rnd
                    
  • The import line above is a syntax chart: anything in italics (here modulename and alias) is information you supply in your own code.
  • The part between brackets [] is optional
Using an Alias with Modules
  • You can assign an alias to any module you import as a nickname
  • Then you can use the nickname to refer to the module in your code:
    
    answer = rnd.randint(1,8)
                  

Module 9: Building Your First Python Application

Opening the Python App File

To Open a Python File, Follow These Steps:
  • Open Anaconda and launch VS Code
  • Choose File => Open Workspace and open your Python 3 workspace
  • Click the hello.py file you created before
  • Select all the text and delete it to start from scratch

Typing and Using Python Comments

Comment: A text in the program that does nothing
  • Explains code to team members
  • Developers use comments as notes to themselves
To be identified as a comment, do one of the following:
  • Start the text with a pound sign (#)
  • Enclose the text in triple quotation marks
  • Examples:
    
    #This is a single line comment
    """ This is 
        a multiline
        comment"""
                  

Understanding Python Data Types

  • Numbers are amounts, such as 10 or 123.45
  • Text consists of letters and words
  • Computer can do arithmetic operations with numbers, not with letters and words
Numbers
  • Must start with a number digit(0-9), a dot(.), or a hyphen(-)
  • A number can only have one decimal point
  • Does not contain letters, spaces, dollar signs, etc.
  • Cannot contain a hyphen (-) in the middle
Types of Numbers
  • Integers: whole numbers, positive and negatives. No size limit
  • Floats: Floating-point numbers. Any number with a decimal point. Limited in range and precision (about 15 to 17 significant digits)
  • Complex numbers: End with the letter j for the imaginary part of the number
Words (Strings)
  • Names, addresses, and all kinds of text
  • Computers can't do arithmetic operations on them
  • Must be enclosed in single or double quotation marks
  • If the string contains a single quote, use double quotes, and the opposite is true
    
    firstString = "Ahmed's book is here"
    secondString = 'I "gave" the book to Ahmed'
                  
Booleans
  • Can either be True or False
  • In Python, True and False values can be stored in variables
    
    x = True
    y = False
                
  • Not enclosed in quotation marks, and the first letter is capital
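One way to check which data type Python assigns to a value is the built-in type() function. The sample values below are made up to cover the types described in this section.

```python
examples = [10, 123.45, 3 + 2j, "Ahmed's book", True]

for value in examples:
    # type(value).__name__ gives the short name of the value's data type
    print(repr(value), "->", type(value).__name__)

type_names = [type(value).__name__ for value in examples]
```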

Working with Python Operators

Arithmetic Operators
  • For doing arithmetic addition, subtraction, multiplication, division, and more
Operator   Description           Example
+          Addition              1 + 1 = 2
-          Subtraction           10 - 1 = 9
*          Multiplication        3 * 5 = 15
/          Division              10 / 5 = 2.0
%          Modulus (remainder)   11 % 5 = 1
**         Exponent              3 ** 2 = 9
//         Floor division        11 // 5 = 2
Comparison Operators
  • Help write code that makes decisions
    Operator   Meaning
    <          Less than
    <=         Less than or equal to
    >          Greater than
    >=         Greater than or equal to
    ==         Equal to
    !=         Not equal to
    is         Object identity
    is not     Negated object identity
Boolean Operators
  • Work with Boolean values.
  • Used to determine if one or more things are True or False
    Operator   Code Example   What It Determines
    or         x or y         Either x or y is True
    and        x and y        Both x and y are True
    not        not x          x is not True

Creating and Using Variables

Creating valid variable names
  • A variable is a placeholder for information that may change
Rules for Writing Variables
  • Must start with a letter or an underscore
  • After the first character, you can use letters, numbers, or underscores
  • Variables are case sensitive
  • Can't contain double or single quotes
  • PEP8 style recommends using lowercase letters and underscores to separate multiple words
To create a variable, use the following
  • variablename = value
The Assignment Operator (=)
  • Assigns value to a variable
  • 
    x = 10 # Assigning the value 10 to variable x
                

Understanding What Syntax is and Why It Matters

Syntax is important in human languages and in programming languages
Syntax Error
  • Tells you that Python doesn't know what to do with the line of code
Individual Lines of Code
  • A line of code ends with a line break or a semicolon
  • 
    #Using line breaks
    first_name = "Ahmed"
    last_name = "Harbi"
    
    #Using semicolon
    first_name = "Ahmed"; last_name = "Harbi"
                
  • The code will run the same in both cases

Module 10: Working with Numbers, Text, and Dates

Introduction

  • In the computer world, numbers can be added, subtracted, etc.
  • In Python, there are whole numbers (integers) and numbers with decimal points (floats)
  • Words are stored as strings (a string of characters)
  • Python doesn't have a built-in data type for dates and times
  • A free module can be imported to use dates and times

Calculating Numbers with Functions

Python functions generally have the syntax:
  • variablename = functionname(param[,param])
  • The variable is defined to store what the function returns
  • Inside the parenthesis, you can pass one or more values (called parameters) to the function
  • Some functions accept two or more values
Some Built-in Python Functions for Numbers:
Built-in Function    Purpose
abs(x)               Returns the absolute value of x
bin(x)               Returns a string with the value of x in binary, prefixed with 0b
float(x)             Converts a string or number to the float data type
format(x, y)         Returns x formatted according to the pattern in y
hex(x)               Converts x to a hexadecimal string, prefixed with 0x
int(x)               Converts x to an integer by truncating the decimal portion
max(x, y, z, ...)    Returns the largest number
min(x, y, z, ...)    Returns the smallest number
oct(x)               Converts x to an octal string, prefixed with 0o
round(x, y)          Rounds x to y decimal places
str(x)               Converts x from a number to a string
type(x)              Returns the data type of x
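A few of these built-in functions in action. The input values and variable names are arbitrary picks for the example.

```python
binary = bin(10)                       # '0b1010'
hexadecimal = hex(255)                 # '0xff'
octal = oct(8)                         # '0o10'
truncated = int(9.99)                  # 9 (truncates, does not round)
rounded = round(3.14159, 2)            # 3.14
formatted = format(1234.5, ",.2f")     # '1,234.50'

print(abs(-7.5), float("3.5"))         # 7.5 3.5
print(max(3, 8, 5), min(3, 8, 5))      # 8 3
print(binary, hexadecimal, octal)
print(truncated, rounded, formatted)
print(str(42) + "!", type(42))         # 42! <class 'int'>
```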

More Math Functions

If you need to use functions from the math module, import it near the top of the .py file.
Some Functions from the Python Math Module
Function             Purpose
math.acos(x)         Arccosine of x, in radians
math.atan(x)         Arctangent of x, in radians
math.atan2(y, x)     Arctangent of y/x in radians, using the signs of both values to pick the quadrant (the angle theta of the point (x, y) in polar coordinates)
math.ceil(x)         Ceiling of x: the smallest integer greater than or equal to x
math.cos(x)          Cosine of x (x in radians)
math.degrees(x)      Converts angle x from radians to degrees
math.e               The mathematical constant e (2.718...)
math.exp(x)          e raised to the power x
math.factorial(x)    Factorial of x
math.floor(x)        Floor of x: the largest integer less than or equal to x
math.isnan(x)        True if x is NaN (not a number)
math.log(x, y)       Logarithm of x to base y (natural logarithm if y is omitted)
math.log2(x)         Base-2 logarithm of x
math.pi              The mathematical constant pi (3.14...)
math.pow(x, y)       x raised to the power y
math.radians(x)      Converts angle x from degrees to radians
math.sin(x)          Sine of x (x in radians)
math.sqrt(x)         Square root of x
math.tan(x)          Tangent of x (x in radians)
math.tau             The mathematical constant tau (6.283185...); a constant, not a function
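A minimal sketch of the math module in use, assuming a few made-up values (the radius and the triangle sides are illustrative only). Results that depend on floating-point arithmetic are described as approximate.

```python
import math

radius = 2.0                                 # hypothetical sample value
area = math.pi * math.pow(radius, 2)         # area of a circle, pi * r^2
hypotenuse = math.sqrt(3**2 + 4**2)          # 5.0
up = math.ceil(4.2)                          # 5
down = math.floor(4.8)                       # 4
fact = math.factorial(5)                     # 120
bits = math.log2(8)                          # 3.0
log_base_10 = math.log(1000, 10)             # approximately 3
right_angle = math.degrees(math.pi / 2)      # approximately 90 degrees
angle = math.atan2(1, 1)                     # approximately pi/4 radians

print(f"Area: {area:.2f}")
print(hypotenuse, up, down, fact, bits)
print(f"{log_base_10:.1f} {right_angle:.1f} {angle:.4f}")
```

Constants such as math.pi, math.e, and math.tau are used without parentheses because they are values, not functions.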

Formatting Numbers

Formatting with f-strings
  • An f-string is the easiest way to format data in Python
  • 
    username = "Mohammed"
    print(f"Hello {username}")
                
  • The output is: Hello Mohammed
  • The text (Hello) is called the literal part, and is displayed literally.
  • Anything in curly braces ({username}) is the expression part: a placeholder for what will appear when the code runs
Showing Dollar Amounts
    
    quantity = 100
    unit_price = 14.997
    print(f"Subtotal: ${quantity * unit_price:,.2f}")
                
  • Output is: Subtotal: $1,499.70
  • The comma tells Python to insert commas as thousands separators
  • .2f means two decimal places
Formatting Percent Numbers
    
    sales_tax_rate = 0.065
    print(f"Sales Tax Rate {sales_tax_rate:.2%}")
                
  • Output is: Sales Tax Rate 6.50%
Making Multiline Format Settings
  • Use \n where you want a line break
  • 
    user1 = "Ahmed"
    user2 = "Mohammed"
    user3 = "Hussam"
    output = f"{user1}\n{user2}\n{user3}"
    print(output)
                
  • The output is:
    Ahmed
    Mohammed
    Hussam
  • Use triple quotation marks also:
  • 
    user1 = "Ahmed"
    user2 = "Mohammed"
    user3 = "Hussam"
    output = f"""{user1}
    {user2}
    {user3}"""
    print(output)
                
  • Same output
Formatting Width and Alignment
  • > for right aligned
  • < for left aligned
  • ^ for centered
  • 
    subtotal = 1598.40
    sales_tax = 103.90
    total = 1702.30
    print(f"""
          Subtotal:   ${subtotal:>9,.2f}
          Sales Tax:  ${sales_tax:>9,.2f}
          Total:      ${total:>9,.2f}
          """)
                
  • The output is:
    Subtotal:   $ 1,598.40
    Sales Tax:  $   103.90
    Total:      $ 1,702.30
                  

Grappling with Weirder Numbers

Binary, octal, and hexadecimal Numbers
  • System     Also Called    Digits Used                        Prefix    Function
    Base 2     Binary         0,1                                0b        bin()
    Base 8     Octal          0,1,2,3,4,5,6,7                    0o        oct()
    Base 16    Hexadecimal    0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F    0x        hex()
  • 
    x = 255
    print(bin(x))
    print(0b11111111)
                      
  • Output is:
    0b11111111
    255
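The same idea works for the other two number systems. The short sketch below, reusing the value 255 from the example above, shows oct() and hex() along with octal and hexadecimal literals.

```python
x = 255
as_octal = oct(x)    # '0o377'
as_hex = hex(x)      # '0xff'

print(as_octal)
print(as_hex)
print(0o377)         # an octal literal, prints 255
print(0xFF)          # a hexadecimal literal, prints 255
```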
                        

Module 11: Working With Numbers, Text, and Dates (Part II)

Manipulating Strings

Concatenating Strings
  • You can join strings with the + operator.
  • This process is called string concatenation
  • Python doesn't add spaces between the joined strings; put them in yourself
  • A space is written as quotation marks with one space between them: " "
  • 
    print("My" + " " + "name" + " " + "is" + " " + "Mohammed")
                  
  • Output is:
    My name is Mohammed
                  
Getting the length of a string
  • Use the built-in len() function
  • 
    s1 = ""
    s2 = " "
    s3 = "A B C"
    print(len(s1))
    print(len(s2))
    print(len(s3))
                  
  • Output is:
    0
    1
    5
                  
Common String Operators
    Operator                Purpose
    x in s                  True if x exists in string s
    x not in s              True if x is not in string s
    s * n (or n * s)        Repeats s n times
    s[i]                    Returns the character at index i of string s (the first character is index 0)
    s[i:j]                  A slice of s from index i up to, but not including, index j
    s[i:j:k]                A slice of s from i to j with step k
    min(s)                  Returns the smallest character in s
    max(s)                  Returns the largest character in s
    s.index(x[, i[, j]])    The index of the first occurrence of x in string s; i and j optionally limit the search to positions i through j
    s.count(x)              The number of times x appears in s
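A short sketch of the operators in the table above, using the sample string "python" (chosen only for illustration).

```python
s = "python"

has_py = "py" in s            # True
no_java = "java" not in s     # True
repeated = "ab" * 3           # 'ababab'
first = s[0]                  # 'p', indexes start at 0
middle = s[1:4]               # 'yth', index 4 is not included
every_other = s[0:6:2]        # 'pto'
smallest = min(s)             # 'h', the character that sorts first
largest = max(s)              # 'y', the character that sorts last
position = s.index("th")      # 2
how_many = s.count("o")       # 1

print(has_py, no_java, repeated)
print(first, middle, every_other)
print(smallest, largest, position, how_many)
```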
Built-in Methods for Python 3 Strings
Method                  Purpose
s.capitalize()          Returns s with the first letter capitalized and the rest lowercase
s.count(x[, y[, z]])    The number of times x appears in s, optionally only from position y to z
s.find(x, y, z)         Returns the first position where x is found in s, optionally searching only from position y to z; returns -1 if x is not found
s.isalpha()             True if s has at least one character and contains only letters
s.isdecimal()           True if s has at least one character and contains only decimal digits
s.islower()             True if s has at least one letter and all letters are lowercase
s.isnumeric()           True if s has at least one character and contains only numeric characters (digits plus characters such as fractions and superscripts)
s.isprintable()         True if string s contains only printable characters
s.istitle()             True if s contains letters and the first letter of each word is uppercase followed by lowercase
s.isupper()             True if s has at least one letter and all letters are uppercase
s.lstrip()              Returns s without leading whitespace
s.replace(x, y)         Returns a copy of string s with every occurrence of x replaced by y
s.rfind(x, y, z)        Returns the last position where x is found in s, optionally searching only from position y to z; returns -1 if x is not found
s.rindex(x, y, z)       Same as s.rfind(), but raises an error (ValueError) if x is not found
s.rstrip()              Returns s without trailing whitespace
s.swapcase()            Converts uppercase to lowercase and vice versa
s.strip()               Returns s without leading and trailing whitespace
s.title()               Returns s with the first letter of each word in uppercase and the other letters in lowercase
s.upper()               Returns s with all letters in uppercase
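The sketch below tries out several of these methods on a made-up string. String methods return a new string and leave the original unchanged, so each result is stored in its own variable.

```python
raw = "  hello data WORLD  "    # hypothetical sample text

clean = raw.strip()             # 'hello data WORLD'
left = raw.lstrip()             # 'hello data WORLD  '
right = raw.rstrip()            # '  hello data WORLD'
titled = clean.title()          # 'Hello Data World'
capped = clean.capitalize()     # 'Hello data world'
shouted = clean.upper()         # 'HELLO DATA WORLD'
swapped = clean.swapcase()      # 'HELLO DATA world'
replaced = clean.replace("data", "big data")   # 'hello big data WORLD'

first_l = clean.find("l")       # 2
last_l = clean.rfind("l")       # 3, the uppercase L does not match
missing = clean.find("z")       # -1 because "z" is not in the string

# rindex() behaves like rfind() but raises ValueError when nothing is found
try:
    clean.rindex("z")
    rindex_failed = False
except ValueError:
    rindex_failed = True

print(clean, titled, capped, shouted, swapped, sep=" | ")
print(first_l, last_l, missing, rindex_failed)
print("2024".isdecimal(), "abc".isalpha(), "Hello World".istitle())
```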

Uncovering Dates and Times

  • We use timestamps to record exactly when a user did something
  • Python doesn't have built-in data types for date and time
  • You need to import the datetime module
datetime Classes
  • datetime.date: A date consisting of month, day, and year
  • datetime.time: A time consisting of hour, minute, second, microsecond, and optional time zone.
  • datetime.datetime: A single item consisting of date, time, and optional time zone
Working with dates
  • Use the today() method to get the current date from the computer's internal clock
  • You can specify a year, month, and day inside date(), like this: date(2024, 12, 22)
  • Don't use leading zeros: date(2024, 09, 07) is a syntax error in Python 3
  • Use .month, .day, or .year to isolate parts of the date
  • 
    import datetime as dt
    today = dt.date.today()
    test_day = dt.date(2024, 9, 7)
    
    print(today)
    print(test_day)
    print(test_day.month)
    print(test_day.day)
    print(test_day.year)
                
  • The output (when run on December 22, 2024) is:
    2024-12-22
    2024-09-07
    9
    7
    2024
                  
Formatting Strings for Dates and Times
Directive    Description                                             Example Output
%a           Weekday, abbreviated                                    Sun
%A           Weekday, full                                           Sunday
%w           Weekday number 0-6, 0 is Sunday                         0
%d           Day of the month, 01-31                                 31
%b           Month name, abbreviated                                 Jan
%B           Month name, full                                        January
%m           Month number, 01-12                                     01
%y           Year without century                                    24
%Y           Year with century                                       2024
%H           Hour (24-hour clock), 00-23                             22
%I           Hour (12-hour clock), 01-12                             11
%p           AM/PM                                                   PM
%M           Minute, 00-59                                           03
%S           Second, 00-59                                           45
%f           Microsecond, 000000-999999                              236478
%z           UTC offset                                              -0500
%Z           Time zone                                               EST
%j           Day number of year, 001-366                             300
%U           Week number of year, Sunday first day of week, 00-53    50
%W           Week number of year, Monday first day of week, 00-53    50
%c           Local version of date and time                          Sat Dec 21 23:59:45 2024
%x           Local version of date                                   12/22/2024
%X           Local version of time                                   22:16:59
%%           A % character                                           %
Sample Date Format Strings
Format string                      Example
%a, %b %d %Y                       Sun, Dec 22 2024
%m-%d-%y                           12-22-24
This %A %B %d                      This Sunday December 22
%A %B %d is day number %j of %Y    Saturday June 01 is day number 152 of 2019
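To see the format strings above in action, here is a minimal sketch that places them after the colon inside an f-string. The date is fixed so the results are repeatable, and the day and month names shown assume the default English (C) locale.

```python
import datetime as dt

day = dt.date(2024, 12, 22)                  # a fixed sample date

short = f"{day:%a, %b %d %Y}"                # 'Sun, Dec 22 2024'
numeric = f"{day:%m-%d-%y}"                  # '12-22-24'
friendly = f"{day:This %A %B %d}"            # 'This Sunday December 22'
day_number = f"{day:%A %B %d is day number %j of %Y}"

print(short)
print(numeric)
print(friendly)
print(day_number)    # Sunday December 22 is day number 357 of 2024
```

The strftime() method of a date accepts the same format strings, for example day.strftime("%m-%d-%y").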
Working with Times
  • To work strictly with time data, use datetime.time class
  • 
    variablename = datetime.time(hour, minute, second, microsecond)
                
Format String                   Example
%I:%M %p                        11:59 PM
%H:%M:%S and %f microseconds    23:59:59 and 129384 microseconds
%X                              23:59:59
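A small sketch of the time class and the format strings above. The time value is a made-up sample, and the %X result assumes the default (C) locale.

```python
import datetime as dt

# hour, minute, second, microsecond (sample values for illustration)
closing = dt.time(23, 59, 59, 129384)

twelve_hour = f"{closing:%I:%M %p}"                         # '11:59 PM'
detailed = f"{closing:%H:%M:%S and %f microseconds}"
local_style = f"{closing:%X}"                               # '23:59:59'

print(twelve_hour)
print(detailed)      # 23:59:59 and 129384 microseconds
print(local_style)
print(closing.hour, closing.minute, closing.second, closing.microsecond)
```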
Using Both Date and Time
  • To pinpoint a moment in time using both date and time, use the datetime.datetime class
  • It supports a now() method that grabs the current date and time from the computer's clock.
  • 
    import datetime as dt
    print(dt.datetime.now())
                
  • The output is:
    2024-12-22 10:00:40.800331
                  
Examples for Datetime Strings
Format String            Example
%A, %B %d at %I:%M%p     Tuesday, December 31 at 11:45PM
%m/%d/%y at %H:%M        12/22/24 at 22:36
%I:%M %p on %b %d        11:59 PM on Dec 31
%I:%M %p on %m/%d/%y     01:59 PM on 12/31/19
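The sketch below applies a few of these format strings to one fixed datetime, so the results do not depend on when the code runs. The sample moment is hypothetical, and the names assume the default English (C) locale.

```python
import datetime as dt

# year, month, day, hour, minute (a fixed sample moment)
moment = dt.datetime(2024, 12, 31, 23, 45)

long_form = f"{moment:%A, %B %d at %I:%M%p}"   # 'Tuesday, December 31 at 11:45PM'
compact = f"{moment:%m/%d/%y at %H:%M}"        # '12/31/24 at 23:45'
time_first = f"{moment:%I:%M %p on %b %d}"     # '11:45 PM on Dec 31'

print(long_form)
print(compact)
print(time_first)
print(moment.date(), moment.time())            # the date and time parts separately
```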
Calculating Timespans
  • A duration is called a timespan in the computer world
  • For timespans, use the datetime.timedelta class
  • A timedelta object is created automatically whenever you subtract one date from another date, or one datetime from another datetime (time objects cannot be subtracted from each other directly)
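A minimal sketch of timespans, assuming some made-up dates and working hours. Subtracting produces a timedelta, and a timedelta can also be added to a date to move forward in time.

```python
import datetime as dt

test_day = dt.date(2024, 12, 22)
new_year = dt.date(2025, 1, 1)

span = new_year - test_day                      # a timedelta object
later = test_day + dt.timedelta(days=30)        # 30 days after test_day

start = dt.datetime(2024, 12, 22, 9, 0)         # hypothetical shift start
end = dt.datetime(2024, 12, 22, 17, 30)         # hypothetical shift end
worked = end - start                            # also a timedelta

print(span.days)                 # 10
print(later)                     # 2025-01-21
print(worked)                    # 8:30:00
print(worked.total_seconds())    # 30600.0
```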

Accounting for Time Zones

Python has Two Types of Datetimes
  • Naive datetime: Any datetime that does not include information about a specific time zone
  • Aware datetime: A datetime that includes time zone information
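The difference can be checked through the tzinfo attribute, which is None for a naive datetime. In the sketch below, the fixed five-hour offset labeled "EST" is an assumption made for illustration; it does not account for daylight saving time.

```python
import datetime as dt

naive = dt.datetime(2024, 12, 22, 10, 0)
aware = dt.datetime(2024, 12, 22, 10, 0, tzinfo=dt.timezone.utc)

# A fixed UTC-5 offset, named "EST" here purely for illustration
est = dt.timezone(dt.timedelta(hours=-5), "EST")
aware_est = aware.astimezone(est)       # the same moment, shown in UTC-5

print(naive.tzinfo)                     # None, so this datetime is naive
print(aware.tzinfo)                     # UTC, so this datetime is aware
print(aware_est)                        # 2024-12-22 05:00:00-05:00
print(f"{aware_est:%z %Z}")             # -0500 EST
```

Converting with astimezone() changes how the moment is displayed, not the moment itself, so aware and aware_est compare as equal.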

Working with Time Zones

To tell the difference between your time and UTC time:
  • Subtract your local time (.now()) from the UTC time (.utcnow()); note that utcnow() is deprecated as of Python 3.12
  • 
    import datetime as dt
    here_now = dt.datetime.now()
    utc_now = dt.datetime.utcnow()
    
    time_difference = (utc_now - here_now)
    
    print(f"My time   : {here_now:%I:%M %p}")
    print(f"UTC time  : {utc_now:%I:%M %p}")
    print(f"Difference: {time_difference}")
                
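  • Output (for a computer five hours behind UTC; the real difference also includes the few microseconds that pass between the two calls) is: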
    My time   : 01:02 PM
    UTC time  : 06:02 PM
    Difference: 5:00:00
                  

Module 12: Controlling the Action | Part 1

Main Operators for Controlling the Action

You control what your program does by making decisions
  • You use operators to make comparisons
  • These operators are called relational operators or comparison operators
  • Python also has three logical operators
Python Comparison Operators for Decision-Making
Operator Meaning
== Is equal to
!= Is not equal to
< Is less than
> Is greater than
<= Is less than or equal to
>= Is greater than or equal to
Python Logical Operators
Operator Meaning
and Both are true
or At least one of the two is true
not Is not true
All these operators are often used with if...elif...else (Python has no then keyword)
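Every comparison produces either True or False, and the logical operators combine those results. The sketch below uses made-up values (age and has_ticket are hypothetical names) to show this before any if statement is involved.

```python
age = 20              # hypothetical sample values
has_ticket = True

is_adult = age >= 18                        # True
is_teen = age >= 13 and age <= 19           # False, the second comparison fails
can_enter = is_adult and has_ticket         # True, both sides are true
gets_discount = age < 13 or age >= 65       # False, neither side is true
turned_away = not can_enter                 # False, not reverses True

print(is_adult, is_teen, can_enter, gets_discount, turned_away)
print(age == 20, age != 20)                 # True False
```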

Making Decisions with if

Syntax for if
    
    if condition: do this
    do this no matter what
              
  • The first line is executed if the condition is true
  • The second line is executed regardless of the condition outcome
Use indentation to do more than one thing if the condition is true:
    
    if x == 0:
        x = x + 1
        print("This line will also execute")
        print("This line is also part of the if statement block")
    print("This line is not part of the block")
                
  • All indented lines will execute if the condition is true, and will not execute if the condition is false
Adding else to your if logic
  • If the condition is true, execute the block after if. Ignore the block after else
  • If the condition is false, execute the block after else. Ignore the block after if
  • 
    if x > 5:
        print("This will only be printed if condition is true.")
    else:
        print("This will only be printed if condition is false.")
                
Handling Multiple else statements with elif
  • Use elif when you need to test more than two alternatives and if...else is not enough
  • 
    if name == "Ahmed":
        print("Hi Ahmed")
    elif name == "Mohammed":
        print("Hi Mohammed")
    elif name == "Hussam":
        print("Hussam is eating Fish!")
    else:
        print("The last else is optional")
                

Module 13: Controlling Action | Part II

Repeating a Process with for

Looping through numbers in a range
  • Use the for loop to repeat a line of code, or several lines as many times as you like
  • If you know how many times you want to repeat the loop, use the following syntax:
  • 
    for x in range(y):
      do this
      do this
      ...
    un-indented code will execute after the loop is complete
                
  • Examples:
  • 
    for x in range(3):
        print(x)
    print("All done")
                
  • Output is:
    0
    1
    2
    All done
                  
  • 
    for x in range(1, 4):
        print(x)
    print("All done")
                
  • Output is:
    1
    2
    3
    All done
                  
Looping through a String
  • Using range() in a for loop is optional
  • You can replace range with a string
  • The loop will repeat for every character in the string
  • 
    name = "Ahmed"
    for x in name:
        print(x)
    print("Done")
                
  • Output is:
    A
    h
    m
    e
    d
    Done
                  
Looping Through a List
  • A list is any group of items separated by commas, inside square brackets
  • You can loop through a list like the following example:
  • 
    for x in ["My", "name", "is", "Mohammed"]:
        print(x)
    print("Done")
                
  • Output is:
    My
    name
    is
    Mohammed
    Done
                  
Bailing out of a loop
  • You can force a loop to stop early if some condition is met by using the break statement
  • 
    for x in range(5):
        if x == 3:
            break
        print(x)
    print("Done")
                
  • Output is:
    0
    1
    2
    Done
                  
Looping with continue
  • Use continue to skip the current iteration of the loop
  • The loop will continue with the next iteration:
  • 
    for x in range(5):
        if x == 3:
            continue
        print(x)
    print("Done")
                
  • Output is:
    0
    1
    2
    4
    Done
                  
Nesting Loops
  • You can put loops inside other loops
  • Make sure your indentation is correct:
  • 
    for x in ["First", "Second", "Third"]:
        print(x)
        for y in range(1, 4):
            print(y)
    print("both loops are done")
                
  • Output is:
    First
    1
    2
    3
    Second
    1
    2
    3
    Third
    1
    2
    3
    both loops are done
                  

Looping with while

while loop syntax
  • With while loops, you have to make sure that the condition that makes the loop stop happens eventually
  • Otherwise, you get an infinite loop
  • 
    counter = 5
    while counter < 11:
        print(counter)
        counter += 1
                
  • Output is:
    5
    6
    7
    8
    9
    10
                  
Starting while loops over with continue
  • You can use if and continue in a while loop to skip back to the top of the loop
  • 
    counter = 0
    
    while counter < 6:
        counter += 1
        if counter == 3:
            continue
        print(counter)
                
  • Output is:
    1
    2
    4
    5
    6
                  
Breaking while loops with break
  • This is similar to for loops
  • 
    counter = 0
    while counter < 6:
        if counter == 4:
            break
        print(counter)
        counter += 1
                
  • Output is:
    0
    1
    2
    3