A Common-Sense Guide to Data Structures and Algorithms – Part 4

This week we covered chapters 7 and 8 of A Common-Sense Guide to Data Structures and Algorithms, 2nd ed. by Jay Wengrow (link). I’ve been coding along in Python on my GitHub.

The topics discussed:

  • Big O in everyday code
  • Hash tables

Big O in Everyday Code

  • I found a bug in the third homework question. If you call find_needle("cd", "abc") it will throw an out-of-range error. Adding a bounds check to the outer while-loop fixes the problem.
  • Dennis wondered if there are tools that can compute Big O for you. Jamie said you can get plugins for related metrics like cyclomatic complexity, but computing Big O automatically isn’t feasible in general: it reduces to a variant of the Halting Problem. We talked about tools that measure performance instead (e.g., SonarQube, dotTrace) and ways to use them to test hypotheses against baselines to identify bottlenecks.
  • Seeing more examples of different types of code will help train your mind over time to recognize different classes of algorithms. The more exposure you have to different types of tradeoffs, the more equipped you’ll be to understand other approaches.
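The homework code isn’t reproduced here, so this is a hypothetical sketch of the kind of nested-loop substring search the question used, with the missing bounds check added to the outer loop. The function name matches the one above; the body is my own reconstruction.

```python
def find_needle(needle, haystack):
    """Return True if needle occurs in haystack.

    A sketch of the homework's nested-loop search. The bounds check on
    the outer loop stops once the remaining haystack is shorter than the
    needle, so the inner loop can never index past the end of haystack.
    This is what previously crashed on find_needle("cd", "abc").
    """
    i = 0
    while i <= len(haystack) - len(needle):  # the added bounds check
        j = 0
        while j < len(needle) and haystack[i + j] == needle[j]:
            j += 1
        if j == len(needle):  # matched every character of the needle
            return True
        i += 1
    return False
```

With the check in place, `find_needle("cd", "abc")` now simply returns `False` instead of raising an IndexError; the search is still O(n·m) in the worst case.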

Blazing Fast Lookup with Hash Tables

  • A hash function must be deterministic: the same key always maps to the same hash value. But it need not be one-to-one going the other way; many different keys may map to the same value, which is what causes collisions.
  • Houston liked this chapter because he didn’t get to see this much in school; it wasn’t until his career that he used dictionaries (hash tables).
  • I thought the thesaurus example where you take the first value you find in the hash table was fitting, given this seems to be what Google Translate does (which means you lose some of the nuance of the language).
  • Jamie liked the example of using a hash table as a lookup for HTTP status codes; this eliminates the need for a large switch-statement.
  • The author’s implementation of separate chaining was interesting. I’ve seen hash tables where each slot stores a pointer to a list of values rather than the values themselves. The book’s implementation stores the key alongside each value; with chaining this is actually necessary, since colliding entries in the same bucket can only be told apart by their stored keys, and it also lets you enumerate all values with their corresponding keys.
  • The author didn’t cover what happens if the load factor gets too high. When it does, it’s beneficial to resize the table and rehash every entry so that you get fewer collisions; you may also need to revisit your hash function.
  • I had never thought of using a hash table for representing objects; I’ve always used a class.
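The separate-chaining points above can be sketched in a few lines. This is a minimal illustration in my own naming (`ChainedHashTable`, `set`/`get`/`items` are my choices, not the book’s API): each bucket is a list of (key, value) pairs, and the stored key is what resolves collisions within a bucket.

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table (a sketch, not the book's code)."""

    def __init__(self, size=16):
        # Each slot holds a list ("chain") of (key, value) pairs.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Map a key to a bucket; many keys can share a bucket (many-to-one).
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key: chain it onto the bucket

    def get(self, key):
        # Compare stored keys to tell colliding entries apart.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def items(self):
        # Storing keys also makes enumerating key-value pairs possible.
        for bucket in self.buckets:
            yield from bucket
```

Constructing it with a tiny size (e.g. `ChainedHashTable(size=2)`) forces collisions and shows that lookups still return the right value, because `get` compares the stored keys rather than trusting the bucket index alone.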