Algorithm Analysis

CSCI 1913 – Introduction to Algorithms, Data Structures, and Program Development
Adriana Picoral

What’s an algorithm

The word “algorithm” is in the public vernacular (commonly spoken, informal language), but it has a specific meaning in computer science and mathematics.

What is an algorithm?

Algorithm

Finite sequence of instructions (not code yet), typically used to solve a class of specific problems (such as search and sorting) or to perform a computation.

An algorithm must finish using a finite amount of space and time, even in the worst case.

Social media recommender systems are often referred to as “the algorithm”, but these systems rely on heuristics (there’s no “correct” recommendation).

Why?

import time

def run_big_list():
    evn = list(range(0, 999999, 2))
    for i in range(10000):
        _ = i in evn
    
def run_big_set():
    evn = set(range(0, 999999, 2))
    for i in range(10000):
        _ = i in evn

if __name__ == "__main__":
    start_time = time.perf_counter()
    run_big_list()
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to run_big_list(): {elapsed_time:.4f} seconds")
    
    start_time = time.perf_counter()
    run_big_set()
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to run_big_set(): {elapsed_time:.4f} seconds")

Time to run_big_list(): 11.4692 seconds
Time to run_big_set(): 0.0104 seconds
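
One way to read the gap between the two timings: `in` on a list checks elements one by one, while `in` on a set uses a hash lookup. The sketch below imitates the list behaviour with a hypothetical helper, `linear_contains` (not from the original slides), that also reports how many elements it inspected.

```python
def linear_contains(items, target):
    """Scan items left to right, as `in` does for a list.

    Returns (found, inspected), where inspected is the number of
    elements looked at before the scan stopped.
    """
    inspected = 0
    for item in items:
        inspected += 1
        if item == target:
            return True, inspected
    return False, inspected


if __name__ == "__main__":
    evens = list(range(0, 20, 2))       # 10 even numbers: 0, 2, ..., 18
    print(linear_contains(evens, 0))    # found at the first position
    print(linear_contains(evens, 7))    # absent: every element is inspected
    print(7 in set(evens))              # a set answers with a hash lookup
```

An absent value is the expensive case for the list: the scan must reach the end before it can answer. Half of the values tested in run_big_list() are odd and therefore absent.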

“correct” vs “good”

What does it mean when we say code is “correct”?
How can code be correct but not good?
How do we compare correct code?

Software Development

Level 1: “direct translation”

Specific Problem → Specific Code

Level 2: “problem solving”

Problem Description → Specific Problem → Pseudo-code → Specific Code

Where I want you today
Skills won through hard practice
Starting to see connections and re-use tasks
Language agnostic

Level 3: “generalization”

Problem Description → Specific Problem → Generic Problem → General Algorithm → Pseudo-code → Specific Code

Where we’re going
Many problems are old friends
Big problems are made of little familiar problems

Level 1: add up all elements in an array

Level 2: find the prime numbers in an array

Level 3: sort an array, use fewer than \(n^2\) comparisons
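
The three example tasks above could be sketched in Python as follows. All function names are hypothetical, and merge sort is only one possible choice for Level 3; the slides do not name a specific sorting algorithm. The sort also counts comparisons so the "fewer than \(n^2\)" requirement can be checked.

```python
def sum_elements(values):
    """Level 1: add up all elements with a single loop."""
    total = 0
    for v in values:
        total += v
    return total


def is_prime(n):
    """Trial division: n is prime if no d with d * d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def find_primes(values):
    """Level 2: keep only the primes, reusing is_prime as a sub-problem."""
    return [v for v in values if is_prime(v)]


def merge_sort(values):
    """Level 3: merge sort. Returns (sorted list, number of comparisons)."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_count = merge_sort(values[:mid])
    right, right_count = merge_sort(values[mid:])
    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1  # one element-to-element comparison
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged, count


if __name__ == "__main__":
    data = [5, 2, 9, 1, 7, 3, 8, 6]
    print(sum_elements(data))
    print(find_primes(data))
    result, comparisons = merge_sort(data)
    print(result, comparisons, len(data) ** 2)
```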

Algorithm

Generic Problem: A formally specified problem or task.

Algorithm: An ordered series of computations to accomplish a given task (This is purely a mental/conceptual thing.)

Code: A specific text file containing a specific realization of an algorithm in a specific programming language

What we’re going to study

CSCI 1913/1933 → CSCI 3041 → CSCI 4041 → CSCI 5421

CSCI 1913/1933 - Introduction to Algorithms and Data
CSCI 3041 Introduction to Discrete Structures and Algorithms
CSCI 4041 Algorithms and Data Structures
CSCI 5421 Advanced Algorithms and Data Structures

Quiz 03

You have 10 minutes to complete the quiz

No need for comments or doc strings, no need to include the tests in your solution, no need for if __name__. Just write your function and what’s inside the function

HINTS:

Remember that there are two ways to iterate over a list: by index or by element. Only one of these works for changing the list.
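
A minimal sketch of the difference, assuming a list of numbers. The two function names are made up for illustration and are not the quiz solution.

```python
def double_by_element(values):
    """Iterating over elements: reassigning v does not touch the list."""
    for v in values:
        v = v * 2  # only the local name v is rebound


def double_by_index(values):
    """Iterating over indexes: values[i] = ... replaces the stored element."""
    for i in range(len(values)):
        values[i] = values[i] * 2


if __name__ == "__main__":
    numbers = [1, 2, 3]
    double_by_element(numbers)
    print(numbers)  # the list is unchanged
    double_by_index(numbers)
    print(numbers)  # every element has been doubled in place
```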

Our Focus

Structures
Searching
Sorting
“organization” – algorithms for data structures
Sensitivity to speed

Exercise

Problem: find largest value in an unsorted collection

Test cases in Python

assert get_max(2, 0, -100, 100, 0, 2) == 100
assert get_max(2, 0, float("inf"), 100, 0, 2) == float("inf")

Submit your get_max.py solution to Gradescope.

Solution?

Can we easily modify this to get the smallest value instead?

def get_max(*values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

if __name__ == "__main__":
  assert get_max(2, 0, -100, 100, 0, 2) == 100
  assert get_max(2, 0, float("inf"), 100, 0, 2) == float("inf")
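
One possible answer to the question above, as a sketch: flipping the comparison from `>` to `<` is enough. The name `get_min` is an assumption mirroring `get_max`.

```python
def get_min(*values):
    """Same loop as get_max, with the comparison reversed."""
    current_min = None
    for v in values:
        # None marks "nothing seen yet", so the first value is always taken
        if current_min is None or v < current_min:
            current_min = v
    return current_min


if __name__ == "__main__":
    assert get_min(2, 0, -100, 100, 0, 2) == -100
    assert get_min(2, 0, float("-inf"), 100, 0, 2) == float("-inf")
```

Like get_max, this version returns None when called with no arguments.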

If you are curious, this is how builtin max() is implemented in Python

Analysis

What’s the best case? Empty collection (constant time)
What’s the worst case? A collection of N elements: the loop visits every element once, so the time grows linearly with N

def get_max(*values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max
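
To make the analysis concrete, the loop can be instrumented to count its own work. This is a sketch with a hypothetical name, `get_max_counting`. It takes a single iterable rather than `*values`, and returns the maximum together with the number of loop iterations and the number of times `current_max` was updated.

```python
def get_max_counting(values):
    """Find the max and report (max, iterations, updates)."""
    current_max = None
    iterations = 0
    updates = 0
    for v in values:
        iterations += 1
        if current_max is None or v > current_max:
            current_max = v
            updates += 1
    return current_max, iterations, updates


if __name__ == "__main__":
    print(get_max_counting([]))               # empty input: no work at all
    print(get_max_counting([5, 4, 3, 2, 1]))  # descending: a single update
    print(get_max_counting([1, 2, 3, 4, 5]))  # ascending: update every time
```

For a fixed N the loop always runs N times. Only the number of updates depends on the order of the input.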

Runtime

import time

def get_max_tuple(*values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

def get_max(values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

if __name__ == "__main__":
    values = range(99999999)

    start_time = time.perf_counter()
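    # Note: tuple(values) is passed as ONE argument, so *values holds a single
    # item and the loop runs once; this mostly times building the tuple.
    get_max_tuple(tuple(values))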
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to get_max_tuple: {elapsed_time:.4f} seconds")

    start_time = time.perf_counter()
    get_max(list(values))
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to get_max list: {elapsed_time:.4f} seconds")

    start_time = time.perf_counter()
    get_max(set(values))
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to get_max set: {elapsed_time:.4f} seconds")

Time to get_max_tuple: 1.2009 seconds
Time to get_max list: 2.9657 seconds
Time to get_max set: 5.6535 seconds

Asymptotic Runtime

Asymptotic means “approaching a limit”
Measured running time is a poor basis for comparing implementations or algorithms, because it depends on too many variables, such as processor speed and the efficiency of the generated machine code.
Literal running time is NOT helpful.
The goal is NOT to predict wall-clock time (actual, elapsed real-world time).
The goal is NOT to evaluate implementations.

We deal with approximations for comparisons
Usually if we care about runtime, we are interested in scalability: how much more time will it take if I double the input size?

What is Big-O?

Big-O notation is a way of quantifying the rate at which some quantity grows.

A tool for comparing algorithms and predicting performance
O stands for “order of”; N is the input size
We ignore constants and smaller terms because we care about what happens when N gets really large
We usually calculate Big-O for the worst-case scenario
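
To see why constants and smaller terms can be ignored, take a made-up step count such as 3N² + 5N + 2 (an assumption chosen for illustration, not taken from the slides). The sketch below checks how much the count grows when the input size doubles. As N gets large, the answer settles at 4, exactly what a plain N² would give.

```python
def steps(n):
    """Hypothetical step count with a constant factor and smaller terms."""
    return 3 * n**2 + 5 * n + 2


def doubling_ratio(n):
    """How many times more steps does input size 2n need than size n?"""
    return steps(2 * n) / steps(n)


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        # The ratio creeps up toward 4 as n grows
        print(n, doubling_ratio(n))
```

The factor 3 cancels in the ratio, and the terms 5N + 2 become negligible next to N², which is why this step count is described simply as O(N²).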

Big O

If an algorithm has O(N) complexity, its runtime grows linearly with the input size (N).

How much more time will it take if I double the input size for an O(N) algorithm?

About twice as long. Finding the max value of an array of size 12 takes roughly twice as long as finding the max value of an array of size 6 (double the input, double the running time).
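
The doubling claim can be checked by counting loop iterations instead of measuring time. This is a sketch; `count_iterations` is a hypothetical helper built around the same find-max loop used earlier.

```python
def count_iterations(values):
    """Run the find-max loop and return how many elements it visited."""
    current_max = None
    visited = 0
    for v in values:
        visited += 1
        if current_max is None or v > current_max:
            current_max = v
    return visited


if __name__ == "__main__":
    small = count_iterations(range(6))
    large = count_iterations(range(12))
    print(small, large, large / small)  # doubling the input doubles the count
```

Counting iterations avoids the noise of wall-clock measurements, so the factor of two shows up exactly.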