A Common-Sense Guide to Data Structures and Algorithms – Part 11


This week we covered the last two chapters of A Common-Sense Guide to Data Structures and Algorithms, 2nd ed., by Jay Wengrow (link). I’ve been coding along in Python on my GitHub.

Our topics today were space constraints and techniques for code optimization.


Dealing with Space Constraints

  • The author presents three algorithms for determining whether an array has duplicates: 1) comparing each element to every other element, 2) building a hash table to count occurrences, and 3) sorting the list and then comparing each element to its neighbor (a quick sketch of all three follows this list). I thought it was interesting that he didn’t propose using a data structure that’s always kept sorted, which seems to address both the time and space concerns. Perhaps the original code was fit for purpose, and the need to detect duplicates came later as a one-off requirement.
  • One of the limits of recursion is stack frame space (a small demo of the limit also follows this list). I’m curious what the heuristics are for how large an input needs to be before recursion stops making sense as a tool. His example of 15,000 stack frames would, for a divide-and-conquer recursion whose depth grows as log₂ N, require a list of 2^15,000 elements, which is far more than any amount of physical RAM could hold.
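
Here’s a minimal Python sketch of the three duplicate-detection approaches as I understand them (my own code, not the book’s):

```python
def has_duplicates_nested(values):
    """O(N^2) time, O(1) extra space: compare every pair."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_hashed(values):
    """O(N) time, O(N) extra space: remember what we've already seen."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False


def has_duplicates_sorted(values):
    """O(N log N) time; sorted() copies the list (O(N) space), though an
    in-place sort would keep the extra space near O(1)."""
    ordered = sorted(values)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```

All three agree on the answer; they differ only in where they spend time and space.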
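
And a quick illustration of the stack-frame limit itself: CPython refuses to recurse past a configurable limit (about 1,000 frames by default), long before physical RAM is the real constraint.

```python
import sys

def countdown(n):
    """Each call adds a stack frame until n reaches zero."""
    if n == 0:
        return 0
    return 1 + countdown(n - 1)

print(sys.getrecursionlimit())   # typically 1000 in CPython
print(countdown(500))            # fine: 500 frames deep
try:
    countdown(100_000)           # far past the limit
except RecursionError as error:
    print("hit the stack limit:", error)
```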

Techniques for Code Optimization

  • “Often, I find that I can successfully optimize an algorithm to a speed that is in between my current Big O and the best-imaginable Big O.” A better question, it seems to me, is: “Are there examples of taking an algorithm with my current Big O down to a lower complexity class?” For example, most simple sorts are O(N^2), but there are sorts that run in O(N log N) (see the sorting sketch after this list).
  • Greedy algorithms are a common pattern: at each step, they choose whatever option looks best at that moment (a small example follows this list).
  • Dennis asked how you’d get the business to buy in on finding optimizations. Houston said this is the same argument as getting time for refactoring. Perhaps the author should have made a stronger point about the time/space tradeoff in this chapter. There’s also an assumption that trading space for time (e.g., caching) will be worth the effort; your problem needs to be big enough for it to matter (a tiny memoization sketch follows this list).
  • I disagree with the author’s analysis of the algorithm for finding the maximum value in a list (i.e., treat the first value as the biggest, then walk through the list until you find a bigger one); see the sketch after this list. First, I’m not sure I’d call this greedy. “Greedy” implies there are local minima/maxima that get you an answer, but not necessarily the best answer; finding the max is too simple a case, because the local max and the global max are always the same. Second, making the first element the starting max isn’t what makes the algorithm greedy. You need some initial value to compare against, and it could just as well be int.Min.
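
To make the sorting comparison concrete, here’s a rough sketch of a simple O(N^2) sort next to Python’s built-in O(N log N) sort (my code, not the book’s):

```python
def insertion_sort(values):
    """A simple O(N^2) sort: shift each element left until it's in place."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

# Python's built-in sorted() is Timsort, which runs in O(N log N) --
# a genuinely lower complexity class than the nested loop above.
data = [5, 2, 9, 1, 5]
assert insertion_sort(data) == sorted(data)
```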
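A standard greedy illustration (mine, not the book’s) is coin change: take the biggest coin that fits, then repeat. It also happens to show the point about greedy answers not always being the best answers.

```python
def greedy_change(amount, denominations):
    """Greedy: always take the largest coin that still fits."""
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins

# With US coins the greedy choice happens to be optimal:
print(greedy_change(68, [25, 10, 5, 1]))   # [25, 25, 10, 5, 1, 1, 1]
# With other denominations it still returns *an* answer, just not the best one:
print(greedy_change(6, [4, 3, 1]))         # [4, 1, 1], though [3, 3] uses fewer coins
```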
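On the space-for-time point, the smallest caching sketch I know is memoization; whether the extra memory pays off depends on how often the cached work would otherwise be repeated:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # spend memory on a cache of previous results...
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))   # ...to avoid an exponential pile of repeated recursive calls
```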
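And here’s my sketch of the max-finding walk described above, next to the same walk seeded with negative infinity instead of the first element; the choice of starting value doesn’t change the algorithm.

```python
def find_max_from_first(values):
    """The walk described above: treat the first value as the biggest,
    then replace it whenever something bigger appears."""
    greatest = values[0]
    for value in values[1:]:
        if value > greatest:
            greatest = value
    return greatest


def find_max_from_sentinel(values):
    """The same walk, seeded with negative infinity instead of values[0]."""
    greatest = float("-inf")
    for value in values:
        if value > greatest:
            greatest = value
    return greatest
```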