Page Comparison

Table of Contents

...

Algorithms-Specific Informational Websites (not programming challenge websites)
- Nick Parlante - Pointers, Binary Trees, etc.
- OpenDSA
  - Very good explanations
- GeeksForGeeks
  - OK explanations
- Jason Park - Algorithm Visualizer
- Khan Academy - Algorithms
- Treehouse
Books
- Look good:
- The Algorithm Design Manual by Skiena
  - Rec'd by Adam D'Angelo in a Quora post.
- Algorithms in a Nutshell, 2nd ed.
- Algorithms Unlocked
- Essential Algorithms - 5 stars on O'Reilly
- Working with Algorithms in Python - O'Reilly video series - 4.3 stars - Same guy who wrote 'Algorithms in a Nutshell'
- Maybe good:
- Meh:
  - Learning Data Structures and Algorithms
Articles
- http://haseebq.com/how-to-break-into-tech-job-hunting-and-interviews/
  - First, a high-level roadmap.
    Watch the Princeton Coursera course on algorithms (by Robert Sedgewick), mostly up until Dijkstra’s algorithm and not much more than that. Try to follow along and code as you go. You don’t need to implement things in Java unless you already know Java.
    Read The Algorithm Design Manual for general algorithms and data structures background. Go through most of it. (You can skip the giant appendix about specific algorithmic topics).
    Buy Cracking the Coding Interview. We’ll be using this as a repository of coding problems and solutions.
    (...)
    
    If you’re working on an algorithms problem (such as out of CTCI) and can’t figure it out, spend no more than 20-30 minutes without making progress. Just go look up the answer. Contrary to popular belief, most struggling past 30 minutes is pointless.
    Some people will disagree with this. [NW: Like Tom and Chris Uga.]
    Screw those people.
    That said, if you find up having to look up the answer to more than 2/3rds of problems, your problem set is too hard and you need to downgrade.
    But banging your head against a wall or staring blankly at code is a waste of time. For code/algorithms problems, you very much want to maximize the density of learning per hour.
    Any time you look up the answer to a coding problem, try to understand the solution. Walk through it with a debugger if it’s particularly mysterious. Read multiple explanations of why it works.
    Then—and this is extremely important—if you didn’t solve the problem completely on your own, treat the problem as though you never solved it at all. Put it back into your rotation of problems. Mark it as something you need to try again from scratch. Wait at least a day, and try to solve it fresh. If you fail, rinse and repeat ad infinitum.
    And finally, structure the hell out of your learning. Make a schedule. Know exactly when you’re going to be working on this stuff, when you’re going to take breaks, when you’re going to go to lunch, etc. Build flexibility into your schedule, but have clear goals for how many hours a day you’ll spend studying.
Interview Challenge sites
- Starfighter
  - Stockfighter
  - this is by Kalzumeus
Front-end interview challenge / prep sites
- https://www.acefrontend.com/
Interview Prep sites
- CareerCup
  - rec'd by Choketsu
- Codility
  - Lessons
  - Challenges
  - Rec'd by Toptal
- Cracking the Coding Interview - Video Interviews
- HackerRank
  - rec'd by Choketsu
- InterviewBit
  - via Tom A.
- Interview Cake
Tech Interview Handbook (GitHub project)
Non-Interview-Prep Challenge sites
- beat.my.code()
  - Rec'd by Toptal
- CodeChef
- CodeForces
- CoderByte
- Codility
- CodingBat
- Kaggle
  - Someone who works with machine learning explaining how Kaggle is a simplification of real ML work:
    - http://jvns.ca/blog/2014/06/19/machine- ... petitions/
- LeetCode
- Polish Olympiad in Informatics (POI)
- Project Euler
  - Random Observations - Solving Project Euler Problems
  - Jason B Hill's Project Euler solutions
    - Rec'd by Toptal, Chris Uga
- Sphere Online Judge (SPOJ)
- Top Coder
- USACO Training
  - rec'd by Choketsu
  - How to get started training at the USACO Gateway:
    - http://blogkev.wordpress.com/2013/09/14 ... o-gateway/
  - What is it like to attend the USACO training camp?
    - http://www.quora.com/USA-Computing-Olym ... ining-camp
    - It's only a week
    - New people just get 3 hours of lab/lecture in the morning
    - The top guys do a contest every day. Sometimes the contest goes until after lunch, otherwise they just review after lunch. (Seems like 5 hrs/day)
    - Then everyone has fun for the rest of the day
  - If one is stuck on a particular section, is it worthwhile to keep on trying to solve the given problem, or should one move on to other sources in preparation for programming contests?
    - http://www.quora.com/Is-it-worthwhile-t ... ning-pages
  - Richard Peng, one of the most successful Canadian IOI competitors of all time, on the USACO training pages:
    - "USACO training was put together before the major IOI escalation (Finland/Korea/Wisconsin). A lot of the techniques described on it are no longer useful on *OI and a lot of the 'hot' topics over the past few years are not covered. Also, a lot of the bottle necks on training are quite meaningless, and they typically cause a lot of frustration and time waste on the scale of months."

Algorithms Unlocked

Summary
- 1. What Are Algorithms and Why Should You Care?
- 2. How to Describe and Evaluate Computer Algorithms
- 3. Algorithms for Sorting and Searching
- 4. A Lower Bound for Sorting and How to Beat It
- 5. Directed Acyclic Graphs
- 6. Shortest Paths
- 7. Algorithms on Strings
- 8. Foundations of Cryptography
- 9. Data Compression
- 10. Hard? Problems

...

Step-by-step process

Clarify the task.
Design the system at a high level.
1. Share what you know about the system with the interviewer.
2. If you’re not familiar with the system, tell the interviewer so they can explain how it should work.
3. Constrain the problem to make it solvable within the hour.
Clarify the scale of usage, the amount of data involved, the rate of requests.
1. Specify some key metrics to help your high-level decision-making.
  1. Example: the number of users, the amount of data.
  2. Try to determine the number of requests per day, divide by 100,000 (seconds per day) to get the average requests per second, and take a guess at the peak factor (how much times the number of requests per second you might see in peak traffic), take a guess at how many requests per second a typical server might be able to handle (say, 1000), and use that to determine how many servers you’ll need.
    1. It doesn’t have to be perfect, the idea is to see that you’re aware of these ideas.
    2. If you’re going to be using auto-scaling servers, that’s fine.
    3. (Source)
2. You don’t necessarily have to go into the details of scaling at this point.
1. List the different metrics you could consider tracking / focusing on and explain the one you think we should focus on and why.
  1. Total users, DAU, WAU, MAU. Maybe daily minutes per active user.
  2. Uber: driver ratings, trip completion rates, and acceptance rates, number of rides completed per day, number of cancelled rides, number of rider complaints raised, average loading time, ratio of daily active users to monthly active users, number of daily rides booked per daily active user, average revenue per user, average wait time
Clarify the scale of usage, the amount of data involved, the rate of requests.
1. Specify some key metrics to help your high-level decision-making.
  1. Example: the number of users, the amount of data.
  2. Try to determine the number of requests per day, divide by 100,000 (seconds per day) to get the average requests per second, and take a guess at the peak factor (how much times the number of requests per second you might see in peak traffic), take a guess at how many requests per second a typical server might be able to handle (say, 1000), and use that to determine how many servers you’ll need.
    1. It doesn’t have to be perfect, the idea is to see that you’re aware of these ideas.
    2. If you’re going to be using auto-scaling servers, that’s fine.
    3. (Source)
2. You don’t necessarily have to go into the details of scaling at this point.
Design the system at a high level.
1. At an early stage, it’s OK to have a few simple components; later on you can get more complex.
2. Example:
  1. If asked to design Spotify, you could draw a mobile app connected to a load balancer which connects to a group of webservers, which connect to the database.
Outline your database design.
1. After laying out your components, design your database.
2. Split out databases appropriate for different types of data.
3. In general, try to explain why you’re doing something upfront rather than waiting for your interviewer to ask you.
Go through a normal use-cases and explain/flesh out how it will work in greater detail.
1. Consider how the webservers will retrieve data from the db and how they’ll send it to the end-user; will they use websockets?
If you have more time, think about how things could “go wrong” in terms of scaling.
1. Examples:
  1. When designing Spotify, some songs might be requested by users for listening far more than others.
2. Solutions:
  1. A CDN

...

Abbott & Fisher
- https://www.amazon.com/gp/product/B00YF0OSHC
- https://www.amazon.com/Scalability-Rules-Principles-Scaling-Sites-ebook/dp/B00503D1TY/
https://www.amazon.com/API-Design-Patterns-JJ-Geewax/dp/161729585X
https://www.amazon.com/APIs-Strategy-Application-Programming-Interfaces/dp/1449308929/
The Art of Capacity Planning
- Written by the manager of data operations for Flickr.com
- First edition: https://www.amazon.com/Art-Capacity-Planning-Scaling-Resources/dp/0596518579
- Second edition: https://www.amazon.com/Art-Capacity-Planning-Scaling-Resources-dp-1491939206/dp/1491939206/
ByteByteGo / Alex Xu
- https://www.youtube.com/@ByteByteGo/videos
- https://blog.bytebytego.com/archive?sort=new
- https://www.amazon.co.uk/System-Design-Interview-Insiders-Guide-ebook/dp/B08B3FWYBX?ref_=ast_author_dp
- https://www.amazon.co.uk/System-Design-Interview-Insiders-Guide/dp/1736049119?ref_=ast_author_dp
Database Star - Database Design
https://www.designgurus.io/ - Grokking the System Design Interview (online course)
https://www.amazon.com/Design-Web-APIs-Arnaud-Lauret/dp/1617295108/
https://www.amazon.com/Designing-Swagger-OpenAPI-Joshua-Ponelat/dp/1617296287/
https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable-ebook/dp/B06XPJML5D/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=&sr=
https://www.amazon.com/Designing-Distributed-Systems-Patterns-Paradigms/dp/1491983647/
https://www.amazon.com/Designing-Web-APIs-Building-Developers/dp/1492026921/
https://www.youtube.com/@gkcs/playlists
https://www.youtube.com/@hnasr/videos
Interview Pen - System Design
microservice-api-patterns.org - This looks really interesting.
https://www.youtube.com/@IGotAnOffer-Engineering
NeetCode system design courses - https://neetcode.io/courses
https://www.quastor.org/
- https://drive.google.com/file/d/1U7EchvgzCjTtF5JzGVVCFgjXC-qw7lIG/view?pli=1 ← They send you a link to this ebook when you sign up for their newsletter.
https://www.amazon.com/REST-Design-Rulebook-Mark-Masse/dp/1449310508/
https://www.youtube.com/@sudocode/playlists
https://www.youtube.com/@SystemDesignInterview/videos
system-design-primer (GitHub)
https://www.algoexpert.io/systems/product
- Negative review: https://www.reddit.com/r/cscareerquestions/comments/s0e54f/warning_systems_expert_by_the_algoexpert_guy_is/
  - “modules are very light on information, and you'd have a better time just reading the wikipedia page on those topics.”
  - “topics aren't well explained and it's just an extremely high level overview of what they are”
Udemy: https://www.udemy.com/courses/search/?src=ukw&q=system+design
https://www.amazon.com/Scalability-Startup-Engineers-Artur-Ejsmont/dp/0071843655

...

Designing Data-Intensive Applications

IGotAnOffer

2023.03.10 - Google system design interview: Design Spotify (with ex-Google EM)
- Summary
  - 23:05 - He discusses how to send the audio data to the user, he initially suggests using websockets to send chunks to the webserver, then decides that for only 5mb it would be “almost instantaneous” to send data from S3 to the webservers, and would remove any lag from hitting the database. So that makes me wonder whether the S3 db would be at the same datacenter as the webservers and what kind of bandwidth you can get between them.
    - I’m interested in knowing the process for determining whether to chunk between the EC2 instance and the S3 bucket.
  - 25:37 - After creating his relatively-simple design for Spotify, he asks himself, “What could go wrong here?” and realizes that certain songs/artists could be way more popular than others. To avoid wasting bandwidth, he recommends using a CDN (a cache).
  - 33:00 - He’s talking about other types of caches: a cache layer between the db and the webservers (redis?), a local cache within each webserver, and then a cache within the phone app to store frequently-played songs.
  - 35:20 - More on the load balancer - Normally load balancers strive to equalize the CPU load across servers, but for this app he might focus more on I/O and/or RAM usage; he’d want to make it smarter than a normal round-robin system. [NW: Can a load balancer He goes through the various use cases of Spotify he can think of, then asks which one he should focus on, which the interviewer limits to “finding and playing music”.
  - He asks how many users and how many songs the interviewer wants to support, then goes through some math to try to estimate how much data they’ll need to store, but he never ends up using that information later on.
  - He draws out a high-level diagram of the system: a phone connects to a load balancer which connects to a bunch of web servers which connect to a database.
  - He goes deeper on the database and explains that they’ll want to have a relational db for user metadata (AWS RDS) and blob storage for the songs (AWS S3). He then lists some example fields you might want to use for the song table in the relational db.
  - 23:05 - He discusses how to send the audio data to the user, he initially suggests using websockets to send chunks to the webserver, then decides that for only 5mb it would be “almost instantaneous” to send data from S3 to the webservers, and would remove any lag from hitting the database. So that makes me wonder whether the S3 db would be at the same datacenter as the webservers and what kind of bandwidth you can get between them.
    - I’m interested in knowing the process for determining whether to chunk between the EC2 instance and the S3 bucket.
  - 25:37 - After creating his relatively-simple design for Spotify, he asks himself, “What could go wrong here?” and realizes that certain songs/artists could be way more popular than others. To avoid wasting bandwidth, he recommends using a CDN (a cache).
  - 33:00 - He’s talking about other types of caches: a cache layer between the db and the webservers (redis?), a local cache within each webserver, and then a cache within the phone app to store frequently-played songs.
  - 35:20 - More on the load balancer - Normally load balancers strive to equalize the CPU load across servers, but for this app he might focus more on I/O and/or RAM usage; he’d want to make it smarter than a normal round-robin system. [NW: Can a load balancer dynamically divert traffic like that? That’s not how I thought they worked.]
  - 38:20 - You may want to locate your replicas closer to the geographies where the data is more likely to be used.
- Comments
  - Like many others I was designing my own version alongside Mark's and I think the area he was a little weaker in (which he himself admits) was load balancing. My background is systems administration so I may have a different perspective on this. I think going back a few steps, chunking the data also serves an important role in the load balancing process. I would have songs chunked, from every retrieval source, so that as soon as the user presses Play, the song begins playing, and playback should always be an instantaneous process unless the servers are over capacity, which can occur because some song or album has gone viral.
    I would structure the web server load balancing so that client apps attempt to contact the server geographically closest to them first and utilize GSLB (global site load balancing) which combines layer 4 and layer 7 load balancing, as I/O capacity or concurrent connections (the two metrics I would prioritize) reach a threshold.
    Again, when talking about load balancing, it's important to determine what happens when maximum capacity on all servers is exceeded. When this happens in my design, the system will issue "tickets" for data chunks, served in the order they arrive in. This is where song chunking comes into play. Because we are chunking the MP3 data, we can still serve the first chunk of the song from the nearest available server as soon as that I/O is available, further ensuring near-instantaneous playback upon pressing the Play button. The rest of the song then has some time to download and cache to the client device, reducing the number of interruptions and pauses in playback due to bandwidth and concurrent connection overagesof the song then has some time to download and cache to the client device, reducing the number of interruptions and pauses in playback due to bandwidth and concurrent connection overages.
2023.06.30 - Design TikTok (with ex-Google EM)
- He just straight up asks the interviewer what use case he wants him to focus on without describing anything about how the app works. The interviewer says he wants him to ignore how the app itself works and instead focus on the backend that handles uploading and downloading videos. He clarifies that the interviewer doesn’t want him to spend any time on how people will create accounts etc.
- He asks the interviewer what scale / usage to build for and he’s told 1B users in 150 countries, 1B views per day, 10B videos uploaded in a year.
- He asks the interviewer what kind of success metrics we should be aiming for and is asked to come up with them. He comes up with “time in app”.
- He asks how much time we should expect an average active user to spend in the app, the interviewer says an hour.
System Design Interviews: 10 Key Principles (with ex-Google EM)
- Communicate efficiently.
  - The interview is definitely an artificially time-constrained environment.
  - So you need to be efficient. You don’t want the interviewer wondering what you’re thinking. You want the interviewer to be able to follow your thinking.
  - They work with a lot of engineers who understand the technical part but struggle to clearly communicate what they’re thinking.
- Scope the problem.
  - Real systems (like YouTube) are too complicated to fully design in a one-hour interview.
  - So you need to ask, “Which part of the app/feature-set should I focus on?”
  - You need to go along with what the interviewer wants to focus on.
- Start drawing ~15 minutes in
  - This isn’t a hard number, it’s just a guideline.
  - If you start drawing too soon, you may not fully understand what the interviewer wants you to focus on, and start going down a wrong path.
  - If you wait too long, you may not have enough time to come up with a working solution.
  - The interviewer says that in one mock the Google EM didn’t start until 20 minutes in and would’ve been struggling to finish.
- Start with a simple design
  - Don’t add requirements that don’t exist.
    - Example: Adding tiering storage could prevent you from having a complete solution by the end of the interviewer.
  - You can tell the interviewer that a particular optimization is available and how you’d use it, but don’t let it stop you.
  - If the interviewer asks you to go deeper on a subject, follow their lead.
- Use prep resources
  - GitHub has a system design study guide, for example.
  - You might see some differences across guides
- Properly understand the problem
  - Talk through specific use cases.
  - Imagine actually calling the API.
  - Try to catch assumptions you may have made.
  - Check in with the interviewer to see if you’re on the right path.
  - He mentions the name of the doc used for a full spec is “PRD” - Product Requirements Document
- Practice, practice, practice!
  - You can watch and read a lot of material and think you understand it, but still not be able to perform when it comes time to do it for real.
  - Practicing will also make you relax when you have a real interview.
  - If you don’t have another person or coach, just record yourself.
  - The English guy said his mom pointed out he touches his hair and face too much.
  - Don’t underestimate how much time it’ll take for a mock interview; if you want enough time for back-and-forth, you may want to block out 2 hours. 10 minutes for feedback is pretty tight.
- Explain your thinking
  - Example: Don’t just say “I’m going to use a database”, or “I’m going to use a MySQL/relational db”; explain why you’re choosing the particular technology you’re using. “The data is pretty well structured, I’ll need multiple tables, do queries across the data (join tables)”.
  - You don’t want to come across like you’re repeating memorized solutions without understanding the pros and cons of the technology.
  - The interviewer wants to see how clearly you can communicate.
  - It’s could be fine if the interviewer doesn’t agree with your choice.
- Get comfortable with the math.
  - They want to see that you can make some assumptions and do some really basic back-of-the-envelope math.
  - Make sure the interviewer can follow your math.
  - If you have a billion queries a day, with ~100,000 seconds in a day you’re dealing with ~10,000 requests per second.
  - I want to hear from you that you know about peaks (in traffic), that you’re aware of it. So a peak factor of 5 means you could hit 50,000 requests per second.
  - If a server can handle 1,000 requests per second, that means we need ~50 servers.
  - Get comfortable with the units for larger amounts of data (terabytes, petabytes, exabytes). Showing familiarity with that
  - Don’t get bits and bytes confused. Network bandwidth is measured in bits per second, data is measured in bytes. You can keep it simple by estimating 10 bits per byte instead of 8.
  - Using a calculator is fine.
- Use the drawing tool efficiently.
  - Be aware of what drawing tool they’ll want you to use.
  - You might be able to negotiate with the recruiter about what tool you use.
  - You want to be able to draw things quickly.
  - You need to know how to draw three things:
    - Boxes
    - Arrows
    - Text
  - Colors are not important, perfectness of arrows is not important, the text just needs to be good enough that you can refer to it.
  - It just needs to be clear what you’re pointing at when you have your mouse over something.
  - You don’t want the interviewer to have to keep track of stuff in their head.
  - It’s totally fine to change your diagram.

...

Versions Compared

Old Version 57

New Version 58

Key

Algorithms Unlocked

Step-by-step process

Designing Data-Intensive Applications

IGotAnOffer