Algorithm Analysis

CSCI 1913 – Introduction to Algorithms, Data Structures, and Program Development
Adriana Picoral

What’s an algorithm

The word “algorithm” is in the public vernacular (commonly spoken, informal language), but it has a specific meaning in computer science and mathematics.

What is an algorithm?

Algorithm

Finite sequence of instructions (not code yet), typically used to solve a class of specific problems (such as search and sorting) or to perform a computation.

An algorithm can be expressed within a finite amount of space and time, even in the worst case.

Social media recommender systems are often referred to as “the algorithm”, but these systems rely on heuristics (there’s no “correct” recommendation).

Why?

import time

def run_big_list():
    evn = list(range(0, 999999, 2))
    for i in range(10000):
        _ = i in evn
    
def run_big_set():
    evn = set(range(0, 999999, 2))
    for i in range(10000):
        _ = i in evn

if __name__ == "__main__":
    start_time = time.perf_counter()
    run_big_list()
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to run_big_list(): {elapsed_time:.4f} seconds")
    
    start_time = time.perf_counter()
    run_big_set()
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to run_big_set(): {elapsed_time:.4f} seconds")

Output:

Time to run_big_list(): 11.4692 seconds
Time to run_big_set(): 0.0104 seconds
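
One way to see where the gap comes from is to count the work a membership test does. The sketch below uses a hypothetical helper, linear_contains (not part of the lecture code), that mimics how `in` scans a list one element at a time. A set uses hashing instead, so it can usually answer without scanning every element.

```python
def linear_contains(items, target):
    """Scan items left to right; return (found, number of comparisons made)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons


evens = list(range(0, 20, 2))  # [0, 2, 4, ..., 18], 10 elements

print(linear_contains(evens, 0))   # (True, 1): found on the first comparison
print(linear_contains(evens, 18))  # (True, 10): found on the last comparison
print(linear_contains(evens, 7))   # (False, 10): every element is checked
```

In run_big_list, half of the values searched for are odd and therefore missing, so each of those lookups has to walk the entire list.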

“correct” vs “good”

  • What does it mean when we say code is “correct”?
  • How can code be correct but not good?
  • How do we compare correct code?

Software Development

Level 1: “direct translation”

  

Specific Problem → Specific Code

Software Development

Level 2: “problem solving”

  

Problem Description → Specific Problem → Pseudo-code → Specific Code

  • Where I want you today
  • Skills won through hard practice
  • Starting to see connections and re-use tasks
  • Language agnostic

Software Development

Level 3: “generalization”

  

Problem Description → Specific Problem → Generic Problem → General Algorithm → Pseudo-code → Specific Code

  • Where we’re going
  • Many problems are old friends
  • Big problems are made of little familiar problems

Software Development

Level 1: add up all elements in an array

Level 2: find the prime numbers in an array

Level 3: sort an array, use fewer than \(n^2\) comparisons
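
As a rough illustration of the first two levels, here is a minimal sketch. The function names sum_all, is_prime, and find_primes are made up for this example, and trial division is just one simple way to test for primes.

```python
def sum_all(values):
    """Level 1: add up all elements (a direct translation of the task)."""
    total = 0
    for v in values:
        total += v
    return total


def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def find_primes(values):
    """Level 2: keep only the prime numbers, in their original order."""
    return [v for v in values if is_prime(v)]


data = [10, 7, 4, 13, 1, 2]
print(sum_all(data))      # 37
print(find_primes(data))  # [7, 13, 2]
```

Level 2 already requires breaking the task into a smaller, reusable problem (is this number prime?). Level 3 is left out here because meeting the comparison limit means recognizing a known general algorithm, such as merge sort.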

Algorithm

Generic Problem: A formally specified problem or task.

Algorithm: An ordered series of computations to accomplish a given task (This is purely a mental/conceptual thing.)

Code: A specific text file containing a specific realization of an algorithm in a specific programming language

What we’re going to study

CSCI 1913/1933 → CSCI 3041 → CSCI 4041 → CSCI 5421

  • CSCI 1913/1933 - Introduction to Algorithms and Data
  • CSCI 3041 - Introduction to Discrete Structures and Algorithms
  • CSCI 4041 - Algorithms and Data Structures
  • CSCI 5421 - Advanced Algorithms and Data Structures

Quiz 03

You have 10 minutes to complete the quiz

  • No need for comments or doc strings, no need to include the tests in your solution, no need for if __name__. Just write your function and what’s inside the function

HINTS:

  • remember there are two ways to iterate over a list: through its indices or through its elements; only one of these works if you want to change the list
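
A small sketch of the difference the hint describes. The function names are invented for illustration and this is not the quiz solution.

```python
def double_by_element(values):
    """Rebinding the loop variable does NOT change the list."""
    for v in values:
        v = v * 2  # only the local name v changes


def double_by_index(values):
    """Assigning through an index changes the list in place."""
    for i in range(len(values)):
        values[i] = values[i] * 2


numbers = [1, 2, 3]
double_by_element(numbers)
print(numbers)  # [1, 2, 3]

double_by_index(numbers)
print(numbers)  # [2, 4, 6]
```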

Our Focus

  • Structures
  • Searching
  • Sorting
  • “organization” – algorithms for data structures
  • Sensitivity to speed

Exercise

Problem: find largest value in an unsorted collection

Test cases in Python

assert get_max(2, 0, -100, 100, 0, 2) == 100
assert get_max(2, 0, float("inf"), 100, 0, 2) == float("inf")

Submit your get_max.py solution to Gradescope.

Solution?

Can we easily modify this to get the smallest value instead?

def get_max(*values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

if __name__ == "__main__":
  assert get_max(2, 0, -100, 100, 0, 2) == 100
  assert get_max(2, 0, float("inf"), 100, 0, 2) == float("inf")

If you are curious, you can look up how the built-in max() is implemented in Python.
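
On the question of getting the smallest value instead: one possible answer, sketched below, is that only the comparison needs to flip. The name get_min is an assumption for this example.

```python
def get_min(*values):
    """Return the smallest argument, or None if no arguments are given."""
    current_min = None
    for v in values:
        # Same structure as get_max; only '>' became '<'.
        if current_min is None or v < current_min:
            current_min = v
    return current_min


assert get_min(2, 0, -100, 100, 0, 2) == -100
assert get_min(2, 0, float("-inf"), 100, 0, 2) == float("-inf")
assert get_min() is None
```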

Analysis

  • What’s the best case?
  • What’s the worst case?

def get_max(*values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

Analysis

  • What’s the best case? An empty collection (constant time)
  • What’s the worst case? N values, so the loop runs N times (linear time)

def get_max(*values):
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max
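
To make the analysis concrete, the sketch below adds an iteration counter to the same loop. The name get_max_counted and the counter are additions for illustration only.

```python
def get_max_counted(*values):
    """Same loop as get_max, but also returns the number of loop iterations."""
    current_max = None
    iterations = 0
    for v in values:
        iterations += 1
        if current_max is None or v > current_max:
            current_max = v
    return current_max, iterations


print(get_max_counted())               # (None, 0)
print(get_max_counted(5, 4, 3, 2, 1))  # (5, 5)
print(get_max_counted(1, 2, 3, 4, 5))  # (5, 5)
```

The order of the values changes how often current_max is reassigned, but not the number of iterations: with N values the loop always runs N times.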

Runtime

import time

def get_max_tuple(*values):
  # *values collects all positional arguments into a tuple
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

def get_max(values):
  # values is a single iterable (for example a list or a set)
  current_max = None
  for v in values:
    if current_max is None or v > current_max:
      current_max = v
  return current_max

if __name__ == "__main__":
    values = range(99999999)

    start_time = time.perf_counter()
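    # Note: tuple(values) is passed as ONE positional argument, so inside
    # get_max_tuple the loop runs once and this timing mostly measures
    # building the tuple. Unpack with get_max_tuple(*values) to compare
    # every element.
    get_max_tuple(tuple(values))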
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to get_max_tuple: {elapsed_time:.4f} seconds")

    start_time = time.perf_counter()
    get_max(list(values))
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to get_max list: {elapsed_time:.4f} seconds")

    start_time = time.perf_counter()
    get_max(set(values))
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"Time to get_max set: {elapsed_time:.4f} seconds")

Output:

Time to get_max_tuple: 1.2009 seconds
Time to get_max list: 2.9657 seconds
Time to get_max set: 5.6535 seconds

Asymptotic Runtime

  • Asymptotic means “approaching a limit”
  • It doesn’t make sense to compare implementations/algorithms by measured runtime, because too many variables affect it: processor speed, assembly code efficiency, etc.
  • Literal running time is NOT helpful.
  • Goal IS NOT to predict wall-clock time (actual, elapsed real-world time)
  • Goal IS NOT to evaluate implementations

Asymptotic Runtime

  • We deal with approximations for comparisons
  • Usually if we care about runtime, we are interested in scalability: how much more time will it take if I double the input size?

What is Big-O?

Big-O notation is a way of quantifying the rate at which some quantity grows.

  • A tool for comparing algorithms and predicting performance
  • O stands for “order of”; N is the input size
  • We ignore constants and smaller terms because we care about what happens when N gets really large
  • We calculate Big-O based on the worst-case scenario
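
A quick numeric sketch of why the smaller terms get dropped. The step-count formula 3N² + 5N + 2 is a made-up example, not taken from any algorithm in these slides.

```python
def steps(n):
    """Hypothetical step count for some algorithm: 3n^2 + 5n + 2."""
    return 3 * n**2 + 5 * n + 2


for n in (10, 100, 1000, 10000):
    total = steps(n)
    share = 3 * n**2 / total  # fraction of the total contributed by the n^2 term
    print(f"N={n:>6}  steps={total:>12}  share of N^2 term={share:.4f}")
```

As N grows, the share of the N² term approaches 1, so the 5N + 2 part stops mattering. The constant 3 does not affect how the count scales when N doubles, which is why this would be written as O(N²).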

Big O

An algorithm with O(N) complexity has a runtime that grows linearly with the input size (N).

How much more time will it take if I double the input size for an O(N) algorithm?

Roughly twice as long. Finding the max value of an array of size 12 takes about twice as long as finding the max value of an array of size 6 (double the input, double the running time).
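
The doubling claim can be checked with a short calculation, under the simplifying assumption (not stated in the slides) that the running time is exactly proportional to the input size, \(T(N) = cN\) for some constant \(c > 0\):

```latex
\[
\frac{T(2N)}{T(N)} = \frac{c \cdot 2N}{c \cdot N} = 2,
\qquad\text{e.g.}\qquad
\frac{T(12)}{T(6)} = \frac{12c}{6c} = 2 .
\]
```

The constant \(c\) cancels, so the factor of 2 does not depend on how fast the machine is.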