Web Engineering Whys (v0.1)

Last updated: May 9th 2024

Table of Contents #

  1. Table of Contents
  2. Latest updates
  3. Who is this for?
  4. Preface
  5. What topics are included?
  6. Frontend programming
    1. JavaScript
    2. Browser compatibility and optimization
    3. Browser image optimization
    4. Browser storage mechanisms
    5. CSS
    6. React
  7. Frontend tooling
    1. Webpack
  8. Web performance
    1. Profiling
  9. Infrastructure
    1. Docker
    2. Compose
  10. System design
    1. Distributed systems
    2. Domain-Driven Design
    3. Tolerance and prevention
    4. Scaling
    5. References
  11. Typing
    1. TypeScript
  12. Testing
  13. Software design
    1. Functional abstractions
  14. Data encoding
    1. JSON
    2. XML
    3. Protocol Buffers
  15. Data storage
    1. SQL
  16. Networking
    1. TCP
    2. HTTP
    3. Caching
  17. Web security
  18. Authentication and Authorization
    1. JWT
    2. OAuth 2.0
    3. OIDC
  19. Web architecture
    1. Server-side rendering
    2. Micro-frontends
  20. Software architecture
  21. API networking
    1. REST
    2. gRPC
  22. Algorithms
    1. Big O notation
    2. Game tree search
  23. Appendix
    1. Additional references

Latest updates #

Who is this for? #

This is for the software engineer who is doing job interviews (either as interviewer or interviewee), or who feels shaky about the whys of what they do daily.

Preface #

"What" and "How" are inferior ways to learn about technology.

It's your understanding and ability to explain the "Why" of a thing – a technology, an algorithm, a programming library, etc. – that reveals your level of understanding.

Any monkey brain can spit out a definition (What) or memorize some recipe (How), but knowing the Why requires actual contextual and historical understanding of the thing.

So I'm writing this document, which consists purely of the "Why"s of a lot of things we do in software engineering, particularly web systems.

What topics are included? #

Everything and anything "web dev"-y that I deem relevant to whatever I'm doing at the moment in the software industry and/or my own products.

(I'll add a Table of Contents later.)

Frontend programming #

JavaScript #

(TODO: Improve this stub section, without turning it into a book.)

  1. Why are == and === different? docs
    1. Because == coerces its operands to a common type before comparing them (so 1 == "1" is true), whereas === compares without coercion (so 1 === "1" is false).
    2. (Which makes == "convenient", but === less error-prone; any speed difference is usually negligible.)
  2. Why closures?
  3. Why callbacks?
    1. For separation of concerns in a "functional" way.
      • ("Functional" in the function construct sense, not the FP sense.)
    2. Eg. so my library doesn't have to know anything about your domain logic, types, etc. (In OO, think "Visitor" pattern.)
  4. Why asynchronous callbacks for concurrency?
    1. Because if you're already using callbacks for software design, using them to also deal with concurrency is natural.
    2. (ie. setTimeout(f, 1000) to schedule a call to f 1000 milliseconds (1 second) from now, without blocking the main thread while waiting.)
  5. Why Promises?
    1. The usual answer: Because "it solves Callback Hell."
      • The more I think about my project history, the more Callback Hell seems like a myth. It was never the main problem for me.
    2. The realistic answer: Because promises increase separation of concerns:
      • In a library function, f as a callback gives some separation of concerns – we don't have to know what f does.
      • But f is still a thing with an input type, so the library must know how to send it inputs and receive its output.
      • Whereas with promises, the library just returns the promise, and lets the user decide what to do.
  6. Why not Promises?
    1. Because sometimes separation of concerns isn't a concern!
  7. Why Immer.js? docs
    1. Because immutability "by hand" becomes error-prone and/or verbose once objects become nested.
      • By needing possibly many .slice()s and spread operators at different levels of a nested object, just to change one property somewhere inside.
      • (And "functional libraries" that try to provide Haskell-style "lenses" usually go for the "property path as a string" which isn't type-safe.)
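
To make the Immer.js point concrete, here is a minimal sketch of a nested immutable update written by hand. The state shape and the renameFirstTrack helper are made up for illustration, and Immer itself is not used here; the sketch only shows how every level on the path to the changed value has to be copied.

```javascript
// Hypothetical nested state; the shape is an assumption for illustration.
const state = {
  artist: {
    name: "Megadeth",
    albums: [
      { title: "Rust in Peace", tracks: ["Holy Wars", "Hangar 18"] },
    ],
  },
};

// Changing one track name "by hand" means copying every level on the path
// from the root down to the value being changed.
function renameFirstTrack(prev, newName) {
  return {
    ...prev,
    artist: {
      ...prev.artist,
      albums: [
        {
          ...prev.artist.albums[0],
          tracks: [newName, ...prev.artist.albums[0].tracks.slice(1)],
        },
        ...prev.artist.albums.slice(1),
      ],
    },
  };
}

const next = renameFirstTrack(state, "Holy Wars... The Punishment Due");

console.log(state.artist.albums[0].tracks[0]); // the original is untouched
console.log(next.artist.albums[0].tracks[0]);  // the copy has the new name
```

With Immer, the same change is typically written as a plain assignment on a draft inside a produce callback, and the library takes care of the copying.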

References #

  1. (Text/HTML) Equality comparisons and sameness @ Developer.Mozilla.Org
  2. (Text/HTML) Type coercion @ Developer.Mozilla.Org
  3. (Text/HTML) ImmerJS @ ImmerJS.GitHub.IO

Browser compatibility and optimization #

  1. Why polyfills?
    1. Because not every browser installed on every device in the world supports every DOM API the same way, or at all, so we add JavaScript scripts that implement the missing APIs so that our code can assume they exist.
    2. Because we don't want to be doing if (isExplorer) { doThis() } else if (isFirefox) { doThat() } like we used to in ancient times.
  2. Why compress?
    1. Obviously because smaller files load faster.
  3. Why "uglify" / "minify"?
    1. Because it makes the JS source code smaller:
      • By renaming variables, etc. to one-letter names.
      • By removing all unnecessary white-space and other syntactically optional stuff.
    2. (Note: "Uglified" code is a bit harder to reverse-engineer / plagiarize. But it's not something you should rely on. Any developer with enough time on their hands (and nowadays, AI tools) will be able to reverse engineer your "uglified" source.)
  4. Why source maps?
    1. Because we still want to debug code running on production, which is a pain with uglified/minified sources.
  5. Why code splitting?
    1. Because it makes the initial load faster, by not downloading everything upfront, and instead downloading each additional bit of code as needed.
  6. Why tree shaking?
    1. Because not everything a dependency exports is actually used, so tools like webpack will try to get rid of the code which doesn't seem to be needed/imported by a given "bundle."
  7. Why WebP?
    1. Because JPEGs suck for images that are meant to be sharp, PNGs are too heavy when images are complex, and SVG gets expensive really fast for complex graphics (or impossible, for things like photos.)
    2. (Note: Images often account for roughly half of the bytes a typical page sends over the wire.)
  8. Why lazy loading?
    1. See React.lazy().
  9. Why srcset? docs
    1. Because sometimes we want to request different versions of an image (sizes, resolutions) based on the width and pixel density of the screen.
      <img srcset="header640.png 640w, header960.png 960w" src="header640.png">
      
  10. Why CDNs?
    1. Because we want our users to load assets (images, scripts, audio, etc.) from dedicated (and ideally, geographically nearby) asset servers, instead of having our app's server handle that load too.
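
The polyfill idea from the list above can be sketched as a small feature-detection helper. The polyfill function, the greet API, and the two "browser" objects are all hypothetical stand-ins; real polyfills usually patch built-ins such as prototypes on the global object, but the shape of the check is the same.

```javascript
// Install `impl` as `target[name]` only when the environment lacks it.
// Returns true when the polyfill was installed, false when a native
// implementation was already present.
function polyfill(target, name, impl) {
  if (typeof target[name] === "function") {
    return false;
  }
  Object.defineProperty(target, name, {
    value: impl,
    writable: true,
    configurable: true,
    enumerable: false,
  });
  return true;
}

// Stand-ins for two "browsers": one ships the API, the other doesn't.
const modernBrowser = { greet: () => "native" };
const oldBrowser = {};

const fallbackGreet = () => "polyfilled";
const installedOnModern = polyfill(modernBrowser, "greet", fallbackGreet);
const installedOnOld = polyfill(oldBrowser, "greet", fallbackGreet);

// Application code can now call greet() without checking which browser it is.
console.log(installedOnModern, modernBrowser.greet());
console.log(installedOnOld, oldBrowser.greet());
```

The check tests for the feature rather than for the browser, which is what lets the calling code drop the if (isExplorer) style of branching.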

References #

  1. (Text/HTML) HTMLImageElement - srcset @ Mozilla.Org

Resources #

  1. tinypng
    • Free online image compressor for faster websites.
  2. caniuse.com
    • Free browser support tables for HTML5, CSS3, etc.
  3. imagekit.io
    • Paid (with free trial) url-based image and video optimizations, transformations.

Browser image optimization #

  1. Why not lossy?
    • Because it increases size with sharp edges and small details in otherwise undetailed areas.
    • (Eg. JPEG)
  2. Why not lossless?
    • Because it increases size with frequent, unpredictable changes in color, and large number of colors.
    • (Eg. PNG)
  3. Why not vectors?
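
A toy illustration of the lossless point above: run-length encoding is used here as a deliberately simplified stand-in (PNG's actual compression is more sophisticated), and the pixel rows are made up. It shows why flat areas compress well while frequent, unpredictable color changes do not.

```javascript
// Encode an array of "pixels" (here: color names) as [color, runLength] pairs.
function runLengthEncode(pixels) {
  const runs = [];
  for (const pixel of pixels) {
    const last = runs[runs.length - 1];
    if (last && last[0] === pixel) {
      last[1] += 1;
    } else {
      runs.push([pixel, 1]);
    }
  }
  return runs;
}

// Decode back to the original pixels; nothing is lost.
function runLengthDecode(runs) {
  return runs.flatMap(([pixel, count]) => Array(count).fill(pixel));
}

const flatRow = ["white", "white", "white", "white", "black", "black", "black", "black"];
const noisyRow = ["white", "black", "red", "white", "blue", "black", "red", "green"];

console.log(runLengthEncode(flatRow).length);  // 2 runs for 8 pixels
console.log(runLengthEncode(noisyRow).length); // 8 runs for 8 pixels
```

Both rows hold eight pixels, but the noisy one needs as many runs as it has pixels, so the lossless encoding saves nothing.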

References #

  1. (Video/HTML) Image compression deep-dive @ YouTube

Browser storage mechanisms #

  1. Why cookies?
    1. Because, being lightweight, they are sent with each HTTP request, which makes them useful for things such as passing around authentication/authorization-related data.
  2. Why local storage?
    1. Because, unlike cookies, it doesn't have an expiration time.
    2. Because the size limit is way larger (whereas cookies are limited to 4KB.)
    3. Because you may have data that should be kept on the client, but you don't want to send it with each request (like with cookies).
  3. Why session storage?
    1. Because of all the same reasons for local storage, except we use this one when we want the data cleared when the user closes the tab/window.
  4. Why IndexedDB?

CSS #

  1. Why CSS?
    1. Because the original idea was that the web page is a "document" and matters of visual styling (font colors, etc.) should be kept separate from the HTML document.
    2. Because the HTML document is meant to contain "semantic" code.
    3. (Ie. "This is a section, this a paragraph, this is a header, etc.")
  2. Why CSS frameworks?
    1. Because (ideally) they save you time with coherent default styles for layout, forms, etc.
  3. Why not CSS frameworks?
    1. Because they may be too large.
    2. Because customizing them might not be worth the hassle compared to just copying some of the base "reset" stuff to your code and going from there.
  4. Why CSS preprocessors?
    1. Because they provide certain tools for CSS reusability that CSS alone lacks.
  5. Why not CSS preprocessors?
    1. Because their costs may outweigh their benefits.
    2. Because if you're styling through JS, reusability is a non-issue.
  6. Why non-semantic CSS? (eg. tailwindcss)
    1. Because we've given up on trying to make CSS both semantic and reusable.
    2. <p class="text-center">Hello</p>.
      • Good because it's easy.
      • Bad because the document is "concerned with" styling.
    3. <p class="greeting">Hello</p>
      • Good because it's semantic.
      • Bad because the ".greeting" style rules might apply to something else, eg. a button, which isn't a "greeting" at all, so using the class "greeting" on it would be wrong.
      • (You may try making something more abstract than "greeting," but that only kicks the can down the road.)
    4. .author-bio__image (ie. "BEM"-style CSS)
      • Good because it's "semantic" again.
      • Bad because it doesn't solve the reusability problem.
    5. <p class="w-96 bg-white shadow rounded">Hello</p>
      • Bad because "semanticity" has been thrown out the window.
      • Good because it's as reusable as possible.
    6. (Note: CSS coders are dealing with a perennial software design question: At what level of abstraction do we write code?)
      • .rust-in-peace-cover?
      • .megadeth-album-cover?
      • .album-cover?
      • .squared .dark-bordered .shadowed?
    7. (Note: It's been argued that not all semantics need to be content-derived.)
  7. Why Object Oriented CSS?
    1. To reuse code in terms of visual patterns, as opposed to content semantics.
    2. To separate structure and skin, and separate container and content.
  8. Why "CSS in JS"?
    1. Because for large messy projects it can provide better scoping and modularity, based on component trees, instead of CSS-selector-based targeting of UI elements.
  9. Why not "CSS in JS"?
    1. Because it makes CSS caching harder or impossible.
    2. Because it can reintroduce the old "white flash" problem because the CSS is no longer a separate file that the browser will parse before the JS.
    3. (There are workarounds, such as extracting the non-dynamic CSS from the JS source with webpack, but this has the costs of increased complexity, and an increased dependence on webpack's quirky features.)
    4. Because in React it can lead to oversized component trees.

References #

  1. (Text/HTML) When to use @extend; when to use a mixin
  2. (Text/HTML) CSS Utility Classes and "Separation of Concerns"
  3. (Text/HTML) tailwindcss
  4. (Text/HTML) About HTML semantics and front-end architecture
  5. (Text/HTML) Object Oriented CSS

React #

  1. Why "components"?

    1. Because the concept promotes the idea of the UI being composed of modular, self-contained pieces that work well together.
  2. Why is the key property important?

    1. Because without it React can't keep track of component instances properly.
    2. Because if your component returns the same type of element, React will keep the existing instance, even if all other props changed.
    3. Because it allows you to force React to unmount the current instance and mount a new one, even if props are the same.
      • Ie. it allows you to "reset" component instances.
  3. Why component "lifecycle" methods?

    1. To perform certain actions at the most (or only) appropriate time.
      • Eg. Fetching data, adding event listeners, cleaning up event listeners, etc.
      • ("Lifecycle" as in its low level state in the DOM and React's management.)
  4. Why componentDidMount?

    1. To set up stuff as soon as the component's DOM node is added to the DOM ("mounted").
      • Eg. Add event listeners, setup network connections, etc.
  5. Why componentDidUpdate?

    1. To react to props or state changes right after they cause a re-render.
      • Eg. Your class component gets an artistId and based on it it loads some data and renders stuff. When this artistId changes, you want to do the necessary clean up and regeneration of the UI.
  6. Why componentWillUnmount?

    1. To clean up.
      • Eg. Remove event listeners.
  7. Why getDerivedStateFromError, ie. "error boundaries"? docs

    1. To localize component explosions, so that a component failing won't crash the whole UI.
    2. (Note from docs: "There is no direct equivalent for static getDerivedStateFromError in function components yet. If you’d like to avoid creating class components, write a single ErrorBoundary component like above and use it throughout your app. Alternatively, use the react-error-boundary package which does that.")
  8. Why hooks?

    1. Because reusing stateful (and effectful, etc.) code is cumbersome with traditional class components (wrapper components, render props, etc.)
    2. Because stateful logic ends up spread across various component lifecycle methods (eg. listener registration in a lifecycle method, clean up in another, etc.) which makes it hard to do extraction refactors later.
    3. Because the switch from stateful/effectful class component to function component should result in simpler code, by moving:
      • Setup from componentDidMount to useEffect.
      • Clean-up from componentWillUnmount to useEffect's return function.
      • Reaction to prop/state change from componentDidUpdate to useEffect dependencies array.
      • this.state and this.setState use to useState use.
  9. Why useEffect?

    1. To have a controlled way to do (side) effects
    2. Why does useEffect take "dependencies"?
      • To control when the effect should run.
    3. Why does useEffect run after browser paint?
      • To avoid delaying the browser's screen updates.
    4. Why useEffect(f, []), ie. empty dependencies array?
    5. Why the optional return function in useEffect?
      • Because you may need to clean up, etc.
      • (What we used to do in componentWillUnmount.)
      • Eg. Removing event handlers.
      • Eg. Clearing intervals.
      • Eg. Aborting data fetching on component unmount.
  10. Why not useEffect?

    1. Because you might not need it.
      • Eg. It's unnecessary for computing fullName from useState-managed firstName and lastName values.
  11. Why is it important to understand referential equality?

    • To avoid confusing behavior related to useEffect's second argument (its dependencies). Eg.
      function C() {
        const [ name, setName ] = useState("");
        const [ age, setAge ] = useState(0);
        const person = { name, age };
      
        const [ unrelated, setUnrelated ] = useState();
      
        useEffect(() => {
           // This will run when `setUnrelated` is called.
           // Because `person` is always a new object on every re-render.
        }, [ person ]);
      
      }
      
    • (^ Fix: use [ name, age ] instead of [ person ] as dependencies.)
    • (^ Or: define person with useMemo and [ name, age ] as its dependencies.)
  12. Why useContext? docs

    1. To pass state down a component tree without manually passing it as props.
      • Eg. user profile data, UI theme state, locale, authentication state, feature flags, any sort of global preference, etc.
      • (Dependency Injection type of thing.)
      • (Scala implicit arguments type of thing.)
      • Eg. ThemeContext, use <ThemeContext.Provider value="dark"> ... </ThemeContext.Provider> at the top of the app tree, then any component (no matter how deep in the tree) can access the theme with useContext(ThemeContext).
    2. In React jargon, to avoid "prop drilling."
  13. Why useRef?

    1. To mutate values without triggering re-renders.
    2. To reference DOM objects directly.
      • const inputEl = useRef(null);
        const onClick = () => { inputEl.current.focus() };
      1. (Note: one can pass a callback to the ref prop; React may call it with null when the element unmounts, hence the ?.):
        • <input ref={inputEl => inputEl?.focus()} />
    3. To avoid unnecessary uses of useState. Eg.
      function SomeForm() {
        const bandRef = useRef();
        const albumRef = useRef();
        const onSave = () => {
          const band = bandRef.current.value;
          const album = albumRef.current.value;
          // ...
        };
        return (
          <div>
            <input placeholder="Band" ref={bandRef} />
            <input placeholder="Album" ref={albumRef} />
            <button onClick={onSave}>Save</button>
          </div>
        );
      }
      
  14. Why useReducer? (Why "reducers"?)

    1. To keep all "state update logic" in the same place.
    2. Because you're trying to have a module with the "brains" of the operation (as far as state updates go.)
    3. Because a reducer can be easier to maintain than setStates everywhere.
    4. Eg. To migrate from direct state management to reducers, you migrate:
      • From direct state setting with useState, to "action" dispatching.
      • From state-setting logic in event handlers to data updates in reducers.
      • (Reducer logic is usually just a big switch.)
      • ("Action" is a plain object like { type: "delete", id: artistId })
      • ("Action" describes to the reducer what the user just did.)
    5. Eg.
      • const [foos, dispatch] = useReducer(foosReducer, []);
      • Where foosReducer is the function with the big switch, ie. the reducer.
      • Where [] is the initial state (here, an empty list of foos).
      • Where dispatch is a function used to dispatch actions.
    6. Note: One can combine contexts with reducers to further simplify a codebase.
      • Centralize state update logic in a reducer.
      • Pass state and dispatch functions down implicitly with context.
  15. Why useLayoutEffect?

    1. To do layout-related measurements before the next browser paint.
  16. Why createPortal() (from react-dom)? docs

    1. Because sometimes you want to render some children into a different part of the DOM.
    2. Eg. rendering the UI for a modal dialog on document.body.
        <>
          <button onClick={() => setShowModal(true)}>Show modal</button>
          {showModal && createPortal(
            <ModalContent onClose={() => setShowModal(false)} />,
            document.body
          )}
        </>
      
  17. Why React.lazy()?

    1. To defer loading component code until it is rendered for the first time. spec
      import { lazy } from 'react';
      const MarkdownPreview = lazy(() => import('./MarkdownPreview.js'));
      
    2. Because not all users need all of the app's code all the time (eg. non-admin users don't need the code for AdminDashboard).
  18. Why <Suspense>?

    1. To display fallback UIs until children finish loading.
      • No more foo ? <C foo={foo} /> : <Spinner /> everywhere.
    2. To deal with the "flashing content" issue caused by conditional rendering logic usually related to waiting for data fetches.
      <Suspense fallback={<p>Loading...</p>}>
        <TwitterStats />
        <YouTubeStats />
        <McDonaldsStats />
      </Suspense>
      
    3. (^ The child components above can then also get rid of the usual ternary conditional boilerplate when waiting for data.)
    4. (Note: Only Suspense-enabled data sources will activate the Suspense component)
  19. Why Flux?

    1. Because Facebook needed a way to keep state updates under control, and found that it could do so by:
      • Having "singleton" stores. PostsStore, CommentsStore, etc.
      • Registering each store with a Dispatcher.
      • Having dispatcher calls be the only way to trigger a store update.
    2. Because having any random part of the app mutate state makes maintenance hard.
    3. (Note: Facebook described this "Flux Architecture" in 2014. This inspired the creation of Redux; React itself predates it.)
  20. Why a "pull" approach for UI computation?

    1. Because with a push-based approach you have to schedule the work, whereas with a pull-based approach, the framework does it for you.
  21. Why "reconciliation" / "virtual DOM"?

    1. Because an intermediate tree structure allows React to optimize its interpretation of elements into DOM (or iOS, Android, etc.) updates.
    2. Because separating rendering from reconciliation allows all the various renderers (DOM, React Native, pedagogical examples, etc.) to use the same clever algorithm for UI representation.
  22. Why JSX?

    1. To write HTML-like markup inside JS files.
    2. Because React.createElement() by hand gets ugly fast.
      • JS: React.createElement(Foo, {a: 42, b: "B"}, "Text").
      • JSX: <Foo a={42} b="B">Text</Foo>.
  23. Why the fiber architecture?

    1. To prioritize different types of updates. Eg. Animation updates over data store updates.
    2. To be able to pause/abort/reuse chunks of rendering work.
    3. (The original stack-based reconciliation algorithm couldn't do this.)
    4. (The original algorithm would render subtrees immediately on update.)
  24. Why this "fiber" abstraction?

    1. To represent a "unit of work," so work can be "split."
    2. Eg. Schedule high priority work with requestAnimationFrame().
    3. Eg. Schedule lower priority work with requestIdleCallback().
    4. (Note: A fiber can be thought of as a "virtual stack frame.")
    5. (Note: Fiber system = More flexible "call stack.")
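
The reducer idea from the useReducer item above can be sketched in plain JavaScript, without React. foosReducer is the name used in that item; the "add" action and the sample data are assumptions added for illustration.

```javascript
// A reducer is a pure function: (currentState, action) => nextState.
function foosReducer(foos, action) {
  switch (action.type) {
    case "add":
      return [...foos, { id: action.id, name: action.name }];
    case "delete":
      return foos.filter((foo) => foo.id !== action.id);
    default:
      throw new Error(`Unknown action type: ${action.type}`);
  }
}

// Without React, the same reducer can be driven by folding over actions.
const actions = [
  { type: "add", id: 1, name: "Megadeth" },
  { type: "add", id: 2, name: "Metallica" },
  { type: "delete", id: 1 },
];
const initialFoos = [];
const finalFoos = actions.reduce(foosReducer, initialFoos);

console.log(finalFoos); // only the entry with id 2 remains
```

Inside a component, the same function would be handed to useReducer(foosReducer, []) and the event handlers would only call dispatch with an action object, which is what keeps the update logic in one place and testable on its own.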

References #

  1. (Text/HTML) React @ GitHub
  2. (Text/HTML) Design Principles @ Legacy.ReactJS
  3. (Text/HTML) Understanding React's key prop @ KentCDodds.com
  4. (Text/HTML) Built-in React Hooks @ React.dev
  5. (Text/HTML) How to fetch data with React Hooks @ RobinWieruch.de
  6. (Text/HTML) React Hooks - useContext @ React.dev
  7. (Text/HTML) React Hooks - useRef @ React.dev
  8. (Text/HTML) Extracting State Logic into a Reducer @ React.dev
  9. (Text/HTML) Component - getDerivedStateFromError @ React.dev
  10. (Text/HTML) React Hooks - useLayoutEffect @ React.dev
  11. (Text/HTML) React - createPortal @ React.dev
  12. (Text/HTML) Scaling Up with Reducer and Context @ React.dev
  13. (Text/HTML) React - lazy @ React.dev
  14. (Text/HTML) React - Suspense @ React.dev
  15. (Video/HTML) A Quick Intro to Suspense in React 18 @ YouTube
  16. (Text/HTML) Writing Markup with JSX @ React.dev
  17. (Text/HTML) Hello World Custom React Renderer @ Medium
  18. (Text/HTML) A (Brief) History of Redux @ Redux.JS.org
  19. (Text/HTML) React Fiber Architecture @ GitHub
  20. (Text/HTML) Reconciliation versus rendering @ GitHub
  21. (Text/HTML) What is a fiber? @ GitHub
  22. (Text/HTML) Window: requestIdleCallback() method @ MDN
  23. (Text/HTML) Window: requestAnimationFrame() method @ MDN

Redux #

  1. Why Redux Toolkit (RTK)? docs
    1. Because Redux by hand involves a lot of boilerplate.

References #

  1. (Text/HTML) Why Redux Toolkit is How To Use Redux Today @ Redux.JS.org

Frontend tooling #

Webpack #

  1. Why can't webpack "tree-shake" when using CommonJS / require() for modules?
    1. Because require()s are "dynamic," ie. are resolved at runtime, whereas webpack's tree-shaking relies on static analysis of ES6-style imports and exports.
  2. Why is the "dependency graph" important?
    1. Because that's the structure webpack uses to represent what code to bundle.

Web performance #

Profiling #

  1. Why should I care about reflows and repaints?
    1. To ensure that rendering a frame to screen takes 16.6ms or less (to maintain 60fps).
    2. Ie. To compute the JS, styles, layout, paint, and compositing in <16.6ms.
    3. (Ideally 10ms or less, due to additional overhead operations in the browser.)
  2. Why do reflows occur?
    1. Because some style changed causing the need to recompute the placement/positioning of DOM elements.
  3. Why take heap snapshots?
    1. To investigate memory issues (leaks, etc.)
    2. To see how your JS objects and DOM nodes are being distributed in memory, before/after certain UI events.
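
The 16.6ms figure in the reflow item above is just 1000ms divided by 60 frames. A minimal sketch of that arithmetic, plus a rough way to time a piece of synchronous work against the budget; frameBudgetMs and measureMs are hypothetical helper names, and the loop is stand-in work, not a real rendering task.

```javascript
// Time available per frame, in milliseconds, at a given refresh rate.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Measure how long a synchronous piece of work takes, in milliseconds.
function measureMs(work) {
  const start = performance.now();
  work();
  return performance.now() - start;
}

const budget = frameBudgetMs(60);
console.log(budget.toFixed(1)); // "16.7"

// Hypothetical work standing in for JS that runs during a frame.
const elapsed = measureMs(() => {
  let sum = 0;
  for (let i = 0; i < 100000; i += 1) sum += i;
  return sum;
});
console.log(elapsed <= budget ? "fits in one frame" : "would drop a frame");
```

For real pages, the browser's performance profiler gives a much fuller picture (styles, layout, paint, compositing) than timing JS alone.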

References #

  1. (Video/HTML) Profiling JavaScript Like a Pro @ YouTube
  2. (Text/HTML) Record heap snapshots @ Developer.Chrome

Infrastructure #

Docker #

  1. Why Docker?
    1. To have a consistent development and deployment experience.
  2. Why a Dockerfile?
    1. To have a standard way to create an image.
  3. Why a "container"?
    1. To run/stop an instance of an image.
  4. Why a registry?
    1. To share images.

Docker - Compose #

  1. Why Docker Compose?
    1. For defining and running multi-container Docker applications, described/configured by a YAML file.
  2. Why services?
    1. To define the "computing components of an application." spec
    2. To define the "computing resource within an application which can be scaled or replaced independently from other components." spec
  3. Why networks?
    1. For services to communicate with each other. Eg.
      services:
        frontend:
          image: example/webapp
          networks:
            - front-tier
            - back-tier
      
      networks:
        front-tier:
        back-tier:
      
  4. Why volumes?
    1. To "store and share persistent data." spec
    2. To have named data stores that can be reused across multiple services. Eg.
      services:
        backend:
          image: example/database
          volumes:
            - db-data:/etc/data
      
        backup:
          image: backup-service
          volumes:
            - db-data:/var/lib/backup/data
      
      volumes:
        db-data:
      
    3. (Note: It's not straightforward to find the actual file implementation of the "volume" on macOS, because it lives inside an abstraction created by docker.)

References #

  1. (Text/HTML) The Compose Specification @ GitHub
  2. (Text/HTML) Services top-level element @ GitHub
  3. (Text/HTML) Networks top-level element @ GitHub
  4. (Text/HTML) Volumes top-level element @ GitHub

System design #

Distributed systems #

  1. Why distributed systems?
    1. Because there aren't enough resources for one gigantic machine to do everything.
    2. To do storage and computing on multiple computers because a single computer can't handle it.
    3. (^ "multiple computers" = mid-range, commodity hardware.)
    4. According to Kleppmann:
      • Because some domains are inherently distributed (eg. mobile telecommunications)
      • Reliability: If a node fails, system as a whole keeps running.
      • Performance: Get data from nearby node rather than halfway round the world.
      • Solve bigger problems: For some problems there's no single supercomputer powerful enough.
  2. Why decentralized systems?
    1. Because sometimes we need our resources and processes spread across multiple computers.
    2. (Eg. Blockchain.)
  3. Why do distributed systems typically require complex configuration?
    1. Because they need to be cluster aware, deal with timeouts, etc.?
  4. Why are distributed objects usually a bad idea?
    1. Because you can't encapsulate the remote/in-process distinction
      • An in-process method is fast and successful, so it makes sense to make many fine-grained calls.
      • A remote method is slow and error-prone, so it makes sense to make few coarse-grained calls.
  5. Why are some distributed systems needlessly complex and full of patches?
    1. Because designers fall for several fallacies when designing them.
    2. (The fallacies, according to Peter Deutsch and James Gosling:
      • The network is reliable.
      • The network is secure.
      • The network is homogeneous.
      • The topology does not change.
      • Latency is zero.
      • Bandwidth is infinite.
      • Transport cost is zero.
      • There is one administrator.)
  6. Why layered architectures?
    1. To achieve a conceptual separation of concerns, which ideally leads to ease of maintenance and team organization and specialization.
  7. Why service-oriented architectures?
    1. For loose coupling and further separation of concerns.
  8. Why publish-subscribe architectures?
    1. For decoupling data producers from data consumers in terms of codebase and technology as well as time and space (asynchronicity).
  9. Why Amazon's DynamoDB?
    1. Because you don't run it, Amazon does.
  10. Why Google's BigTable and MapReduce?
  11. Why Apache's Hadoop?
  12. Why Apache's Cassandra?
  13. Why Apache's Kafka?
    1. Because sometimes you want 3 services to consistently reflect, in order, the series of actions (eg. data changes) the user is making on a web app.
    2. To solve concurrency-caused Isolation issues by using serial event logs.
      • Eg. The "choose username if it doesn't exist" issue. A SQL transaction can still yield the problem of two users with the same username, because of concurrent execution of transactions. A solution with Kafka is to push a message about the "I want to use username 'foo'" event into the specialized totally-ordered serial log, and then the relevant user/account/etc. (micro)services will read that log in order.
  14. Why distributed systems "As a Service"?
    1. So you can make the people who wrote the distributed software deal with the distributed-related ops issues, while you focus on your business domain matters.
  15. Why scalability, availability, performance, latency and fault tolerance?
  16. Why distinguish between fault prevention, tolerance, removal, and forecasting?
  17. Why is geographical scalability a tough problem?
    1. Because network latency is bound from below, so we need to copy data to locations closer to the client, which leads to the problems of maintaining consistency.
  18. Why "fault tolerance"?
    1. Because things will go wrong. Whether it's the user doing things wrong, or hardware corrupting data, or the Wi-Fi dying, things will go wrong, and one should at least think about how to recover (if possible) from the main faults.
    2. (Eg. Apache Cassandra.)
  19. Why a strong consistency model?
    1. To replace a single server with a cluster of distributed nodes and not run into any problems.
    2. (Ie. It makes the distributed system's behavior indistinguishable from a single server's.)
    3. (Weak consistency models start to introduce anomalies that make them a "different beast.")
  20. Why is there tension between strong consistency and availability during partition?
    1. Because one can't prevent divergence between two replicas that cannot communicate with each other while both continue to accept writes.
  21. Why is there tension between strong consistency and performance?
    1. Because strong / single-copy consistency requires that nodes communicate and agree on every operation, which results in high latency.
  22. Why is there tension between caching and replication and consistency?
    1. Because we now have multiple copies of a resource, and modifying one copy makes that copy different from all the others.
  23. Why is global synchronization extremely hard or impossible?
    1. Because network latencies have a natural lower bound?
  24. Why does the FLP impossibility result matter?
    1. Because it highlights that algorithms that solve the consensus problem must either give up safety or liveness when the guarantees about message delivery do not hold.
    2. Because it imposes a hard constraint on the problems that we know are solvable in the asynchronous system model.
  25. Why does the CAP theorem matter?
    1. Because its slightly different assumptions (network failures instead of node failures) lead to clearer practical implications, which makes it more relevant in practice than FLP.
  26. Why is time modeling important?
  27. Why is the replication problem important?
  28. Why prevent divergence?
  29. Why accept divergence?
  30. Why does adding a machine not increase performance and capacity linearly?
    1. Because of the overheads of having separate computers.
      • Copying.
      • Coordination.
      • Etc.
    2. (This is why various distributed algorithms exist.)
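One hedged way to put numbers on "not linear" is the Universal Scalability Law, which models capacity as machines are added. This is a minimal sketch, not something the text above prescribes: the function name and the two coefficients are made-up illustration values, where alpha stands for coordination (contention) overhead and beta for the cost of keeping copies in agreement.

```typescript
// Universal Scalability Law: relative capacity of n machines,
// where 1 machine = 1.0.
// alpha: contention (work that must be serialized, eg. coordination).
// beta: coherency (cost of keeping copies in agreement, grows pairwise).
function relativeCapacity(n: number, alpha: number, beta: number): number {
  return n / (1 + alpha * (n - 1) + beta * n * (n - 1));
}

// Hypothetical coefficients, chosen only for illustration.
const alpha = 0.05;
const beta = 0.001;

// Print how capacity grows (and eventually shrinks) with machine count.
for (const n of [1, 2, 8, 32, 64]) {
  console.log(n, relativeCapacity(n, alpha, beta).toFixed(2));
}
```

With zero overhead (alpha = beta = 0) the function is perfectly linear. With any overhead, each extra machine adds less than the previous one, and with these particular values 64 machines deliver less than 32.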
  31. Why is latency (short response time) a complex issue?
    1. Because it's harder to address financially than other aspects of performance.
    2. Because it's strongly connected to physical limitations.
  32. Why is availability mostly about being fault tolerant in practice?
    1. Because more components = higher probability of failure, so the system should compensate so as to not become less reliable as components are added.
    2. (Availability = uptime / (uptime + downtime))
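The availability formula above, plus the "more components = higher probability of failure" point, can be sketched as follows. This assumes component failures are independent, which real systems rarely guarantee; the function names are just for illustration.

```typescript
// Availability = uptime / (uptime + downtime)
function availability(uptimeHours: number, downtimeHours: number): number {
  return uptimeHours / (uptimeHours + downtimeHours);
}

// If a request needs ALL components to be up (independent failures assumed),
// the system's availability is the product of the components' availabilities.
function allRequired(components: number[]): number {
  return components.reduce((acc, a) => acc * a, 1);
}

// If ANY one replica is enough (independent failures assumed), the system is
// down only when every replica is down at the same time.
function anySuffices(replicas: number[]): number {
  return 1 - replicas.reduce((acc, a) => acc * (1 - a), 1);
}
```

Three components at 99% each, all required, give a system below 99%. Two redundant replicas at 99% each give a system above 99%. That second case is the "compensation" that fault tolerance buys.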
  33. Why have timing and ordering assumptions?
    1. Because information can only travel at the speed of light, so nodes at different distances will receive messages at different times and potentially different order than other nodes.
  34. Why synchronous system model (assumption)?
    1. Because it allows the system designer to make assumptions about time and order.
    2. Because it's analytically easier (but unrealistic).
  35. Why asynchronous system model (non-assumption)?
    1. Because sometimes the system designer just can't make assumptions about time and order.
  36. Why a client-centric consistency model?
    1. To avoid anomalies where a client sees older versions of values resurfacing.
  37. Why leaderless replication?
  38. Why single-leader replication?
  39. Why partitioning?
  40. Why Change Data Capture (CDC)?
    1. Because sometimes you want to push a message to, say, Kafka, whenever data mutation (eg. INSERT, UPDATE, DELETE on your MySQL) happens, so that other systems subscribed/listening to the Kafka topic will do something (eg. analytics, stream processing, etc.)
    2. For use cases such as:
      • Replicate data (send to a data warehouse, data lake, etc.)
      • Send a message to the user whenever his data changes.
      • Invalidate or update caches.
    3. To leverage the Write-Ahead Log that DB systems already keep, using it to notify other services of changes to a database / source.
    4. To achieve "near-real-time data" (analytics, etc.) and/or historical data preservation.
  41. Why a Lambda function?
    1. Because sometimes you just wanna run a little piece of code (on, say, Amazon's server pool) whenever something happens, and don't want to maintain / pay for a server that's always running, or worry about where it will run in the distributed system (you leave all that to eg. Amazon).
  42. Why Streaming?
  43. Why Events?
  44. Why cross-service transactions?
  45. Why "eventual consistency"?
  46. Why a "2-Phase Commit"?
    1. To ensure that data is consistently committed across several different systems. ("All or nothing.")
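A minimal in-memory sketch of the "all or nothing" idea. The Participant interface and makeParticipant helper are hypothetical, and the sketch deliberately ignores what makes real 2PC hard: crashes, timeouts, and durable logging of the decision.

```typescript
type Vote = "yes" | "no";

interface Participant {
  prepare(): Vote;   // Phase 1: "can you commit?"
  commit(): void;    // Phase 2, if everyone voted yes.
  rollback(): void;  // Phase 2, if anyone voted no.
}

function twoPhaseCommit(participants: Participant[]): "committed" | "aborted" {
  // Phase 1: collect a vote from every participant.
  const votes = participants.map((p) => p.prepare());
  const allYes = votes.every((v) => v === "yes");

  // Phase 2: apply the same decision everywhere.
  for (const p of participants) {
    if (allYes) p.commit();
    else p.rollback();
  }
  return allYes ? "committed" : "aborted";
}

// A hypothetical participant that records what it was told to do.
function makeParticipant(label: string, canCommit: boolean, log: string[]): Participant {
  return {
    prepare: () => (canCommit ? "yes" : "no"),
    commit: () => { log.push(`${label}: commit`); },
    rollback: () => { log.push(`${label}: rollback`); },
  };
}
```

A single "no" vote makes every participant roll back, including the ones that were ready to commit.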
  47. Why Conflict-Free Replicated Data Types (CRDTs)?
    1. Because it's a way of managing data appropriate for offline features, allowing different replicas to make progress independently from each other, even if there's no communication possible at points. Particularly useful for collaborative state mutation apps.
    2. (Note: "Conflict-free" is a bit of a misnomer.)
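One of the simplest CRDTs, a grow-only counter, shows how replicas can make progress independently and still converge. This is a sketch under the assumption that each replica has a unique id; the names are illustrative and not from any particular library.

```typescript
// Grow-only counter (G-Counter): each replica only increments its own slot.
type GCounter = Record<string, number>;

function increment(counter: GCounter, replicaId: string): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + 1 };
}

// Merge takes the per-replica maximum, so it is commutative, associative,
// and idempotent: replicas converge regardless of merge order or repetition.
function merge(a: GCounter, b: GCounter): GCounter {
  const result: GCounter = { ...a };
  for (const id of Object.keys(b)) {
    result[id] = Math.max(result[id] ?? 0, b[id] ?? 0);
  }
  return result;
}

// The counter's value is the sum of all replicas' slots.
function value(counter: GCounter): number {
  return Object.keys(counter).reduce((sum, id) => sum + (counter[id] ?? 0), 0);
}
```

Two replicas can increment while offline and later merge in either order, ending up with the same total.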
  48. Questions for design session:
    1. What are we optimizing for?
      • Reads?
      • Writes?
    2. Capacity estimates.
      • Eg. Characters per post (Twitter), code blob size (GitHub), Posts-per-day (Twitter, FB)
    3. Sketch the main operations.
      • Eg. Fetching followers/following (Twitter)
    4. Are we going to have gigantic transactions? Can they be avoided by design?

References #

  1. (Text/HTML) Getting Real About Distributed System Reliability @ Blog.Empathybox.Com
  2. (PDF/HTML) Basic concepts and taxonomy of dependable and secure computing @ IEEE.org
  3. (PDF/HTML) Fallacies of Distributed Computing Explained @ UNSW.edu.au
  4. (Text/HTML) Microservices and the First Law of Distributed Objects @ MartinFowler.Com
  5. (Video/HTML) Change Data Capture (CDC) Explained (with examples) @ YouTube
  6. (Video/HTML) What Is Change Data Capture - Understanding Data Engineering 101 @ YouTube
  7. (Video/HTML) Using AWS Lambda As A Data Engineering @ YouTube
  8. (Video/HTML) Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a Database?) @ YouTube
  9. (Video/HTML) Thinking in Events: From Databases to Distributed Collaboration Software (ACM DEBS 2021) @ YouTube

Mock Interviews #

  1. (Video/HTML) Google Systems Design Interview With An Ex-Googler @ YouTube
  2. (Video/HTML) 12: Design Google Docs/Real Time Text Editor | Systems Design Interview Questions With Ex-Google SWE @ YouTube

Domain-Driven Design #

  1. Why Domain-Driven Design?
    1. Because focusing on learning about the problem (domain) leads to better solutions.
    2. To ensure that the development process serves the business needs.
    3. To find political constraints and impediments early.
      • To avoid wasting money in projects that are doomed to fail.
    4. To keep the software language close to the business language.
    5. To keep every piece of code clear about which purpose it's serving.
  2. Why Bounded Contexts?
    1. To keep the amount of context required for understanding at a minimum.
    2. So a team can have isolation and ability to move without others.
  3. Why an Anticorruption layer?
    1. To ensure the legacy part doesn't corrupt the new part and vice versa.
    2. (Analogy: The "No outside shoes indoors" rule some places have.)
  4. Why Theory of Constraints with DDD?
    1. To look for bottlenecks to choose the best model.
  5. Why do "microservices" and DDD go well together?
    1. Because bounded contexts help find the right granularity for services.

References #

  1. (Text/HTML) Domain-Driven Design in 2020 @ Blog.Avanscoperta
  2. (Video/HTML) Bounded Contexts - Eric Evans - DDD Europe 2020 @ YouTube
  3. (Video/HTML) The Art of Discovering Bounded Contexts by Nick Tune @ YouTube

Tolerance and prevention #

  1. Why "fault tolerance"?
    1. Because things will go wrong. Whether it's the user doing things wrong, or hardware corrupting data, or the Wi-Fi dying, things will go wrong, and one should at least think about how to recover (if possible) from the main faults.
  2. Why "fault prevention"?
    1. Because even though fault tolerance is what really makes a system reliable, some faults are too stupid not to prevent. Eg. having database backups prevents full data loss in case of a DB server catastrophe.
      • (And then having a process to use said backups, to tolerate the catastrophe. Thus Fault Tolerance and Prevention go hand in hand.)
      • (Note: Catastrophe is not necessary for things going down. Eg. You might have your program hosted on some paid Linux server, and the administrator might decide that there's an update, say an urgent security patch, that has to be applied, and therefore all servers will be restarted. Your system should tolerate such "scheduled downtime.")

Scaling #

  1. Why "elastic" systems (eg. AWS and similar IaaS)?
    1. Because sometimes you don't need, or can't afford, a human specialist to manually monitor load parameters and add computing resources as needed, as the system grows.
  2. Why not "elastic" systems?
    1. Because sometimes manually scaled systems are more predictable operationally.
  3. Why "scale up"?
    1. Because sometimes one big expensive powerful machine is the right tool.
  4. Why "scale out"?
    1. Because sometimes lots of small, cheaper, less powerful machines are the right tool.
  5. Why "scale up" and "scale out"?
    1. Because sometimes the right combo of big expensive machine plus small cheaper machines is the best approach.

References #

  1. (Text/HTML) How to Quantify Scalability

Typing #

Static types #

  1. Why "strong static typing"?
    1. Because, much like with FP, when done right, it eliminates an entire class of problems. Namely, the most annoyingly unnecessary runtime exceptions. If shit's gonna explode in production, at least make the explosion interesting! As opposed to a stupid mistake that would have been caught by a decent typechecker while writing the code.
  2. Why "structural" typing?
    1. I don't know.
    2. I guess because sometimes, especially in web dev where everything is some kind of "JSON," it's convenient for a function to say "just give me anything that has the structure { name: string, age: number }."
  3. Why not "structural" typing?
    1. Because you'll eventually have type Robot = { name: string, age: number } and type Cow = { name: string, age: number } and then a function function milk(cow: Cow) { ... } that will happily compile when accidentally called with a Robot, thanks to structural typing. And now you're just back to the same type unsafety of dynamic typing.
  4. Why "nominal" typing?
    1. Because sometimes you don't wanna have the "RuntimeError: Trying to milk a Robot" issue described above.
    2. Because sometimes you just want a Java type thing, where class Cow and class Robot are automatically different just by virtue of being different type declarations, regardless of their inner structure.
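TypeScript itself is structural, but a common workaround is to "brand" types so they behave nominally. The sketch below is one way to do it, not an official language feature: the __brand field and the makeCow / makeRobot helpers are made up for illustration, and the brand exists only at the type level, not at runtime.

```typescript
// Same structure, different brands.
type Cow = { name: string; age: number } & { readonly __brand: "Cow" };
type Robot = { name: string; age: number } & { readonly __brand: "Robot" };

// The only sanctioned ways to obtain a Cow or a Robot.
function makeCow(name: string, age: number): Cow {
  return { name, age } as Cow;
}
function makeRobot(name: string, age: number): Robot {
  return { name, age } as Robot;
}

function milk(cow: Cow): string {
  return `Milking ${cow.name}`;
}

const bessie = makeCow("Bessie", 4);
const r2 = makeRobot("R2", 3);

console.log(milk(bessie));
// milk(r2);
// Compile-time error: the brands "Robot" and "Cow" don't match.
console.log(r2.name);
```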

TypeScript #

  1. Why TypeScript?
    1. Because a lot of runtime errors can be turned into compile-time errors.
  2. Why any?
    1. Because maybe you're a horrible person, so you use any.
  3. Why generics?
    1. Because many functions should work with literally every type. But you don't want any, because any loses type information.
      • So instead of any, we pass types as parameters (ie. parametric polymorphism, not ad-hoc polymorphism) so the compiler doesn't lose type information.
      • Eg. The identity function. Clearly, it should work with all types, since it does nothing but return the argument. Ie. id(x) == x for all types. Using any would lose type information, so it's better to use generics: function id<T>(x: T): T { return x }, which allows the compiler to preserve the type information.
    2. Because parametric polymorphism is the best form of documentation. Eg.
      • type Foo = <A,B>(a: A, g: (a: A) => B) => B tells you everything.
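To see why that signature "tells you everything," try implementing it. Knowing nothing about A or B, there is essentially one sensible thing the function can do. A small sketch (applyTo and len are illustrative names):

```typescript
type Foo = <A, B>(a: A, g: (a: A) => B) => B;

// The only way to produce a B is to call g on a.
const applyTo: Foo = (a, g) => g(a);

// The compiler keeps track of the types: len is inferred as number.
const len = applyTo("hello", (s) => s.length);
console.log(len);
```

Replace A and B with any and the compiler would accept implementations that return garbage. The type parameters are what rule those out.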
  4. Why unions?
    1. To specify exactly which values can inhabit a type. Eg.
      type Theme = "dark" | "light" | "sunny"
      type ChessRow = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
      
  5. Why tuples?
    1. Because we sometimes want to keep various values together in a structure, but don't need to label them.
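A small sketch of tuples in use. Point, distance, and minMax are made-up names for illustration.

```typescript
// A pair of coordinates: position is enough, labels would add little.
type Point = [number, number];

function distance([x1, y1]: Point, [x2, y2]: Point): number {
  return Math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2);
}

// Tuples are also handy for returning two values at once.
function minMax(values: number[]): [number, number] {
  return [Math.min(...values), Math.max(...values)];
}

const [lowest, highest] = minMax([3, 1, 2]);
console.log(lowest, highest, distance([0, 0], [3, 4]));
```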
  6. Why type?
    1. To name/alias a type, so we can use (and operate on) it. Eg.
      • type Actions = { run: string; jump: string }
  7. Why not interface?
    1. It's just a more limited and verbose way to declare a type.
  8. Why conditional types?
    1. To decide what a type should be based on some type-level condition. Eg.
      type NonZero<T extends number> = T extends 0 ? never : T
      
      function divide<T extends number>(a: number, b: NonZero<T>): number {
          return a / b;
      }
      const x = 0
      // divide(8, x);
      // Compile-time error.
      
    2. (Super basic poorman's "dependent types.")
  9. Why mapped types?
    1. To derive a type from another type.
    2. (Usually to grab keys from an object type to create another object type.)
      type ActionsKey = keyof Actions
      type EventHandlers = { [key in ActionsKey]: () => void }
      const playerHandlers : EventHandlers = {
        run: () => {},
        jump: () => {},
      };
      
  10. Why template literal types?
    1. To further manipulate/customize derived types. Eg.
      type EventHandlersV2 = {
        [key in ActionsKey as `on${Capitalize<key>}`]: () => void
      }
      const playerHandlersV2 : EventHandlersV2 = {
          onRun: () => {},
          onJump: () => {},
      }
      
    2. To generate combinatorial unions. Eg.
      type ChessRow = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
      type ChessCol = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h"
      type ChessCell = `${ChessCol}${ChessRow}`
      
      // const test0 : ChessCell = "z"
      // ^ Error: Type '"z"' is not assignable to type '"a1" | "a2" 
      // | "a3" | "a4" | "a5" | "a6" | "a7" | "a8" | "b1" | "b2" ...
      
      // const test1 : ChessCell = "a9"
      // Compile-time error.
      
      const test2 : ChessCell = "h6"
      // OK.
      
    3. For stronger compile-time string checks. Eg.
      type Protocol = "http" | "https"
      type Domain = string
      type Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      type Port = `${Digit}${Digit}${Digit}${Digit}`
      type APIURL = `${Protocol}://${Domain}:${Port}` 
      
      // const apiUrl0 : APIURL = "htt://localhost:8080"
      // Compile-time error.
      
      const apiUrl01 : APIURL = "http://localhost:8080"
      // OK.
      
  11. Why public, private, and protected?
  12. Why any, unknown, and never?
  13. Why declare?
    1. Because sometimes you need to call 3rd-party code (eg. a JS library) but there are no typings for the function(s) you want to call, so you declare the typing yourself with declare.
  14. Why ambient declarations?

References #

  1. (Text/HTML) The TypeScript Handbook
  2. (Video/HTML) Deep Dive into Advanced TypeScript - Christian Woerz - Oslo 2023

Testing #

  1. Why test?
    1. Because we wanna know if it works before we deploy it, obviously.
  2. Why unit test?
    1. Because we want some verification that at least some of the calls to a function/component produce the expected result.
  3. Why generative testing?
    1. Because we want a (way) more complete verification that the code does the right thing in extreme and corner-case-y circumstances.
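A hand-rolled sketch of the idea, with no testing library involved: generate lots of random inputs and check that a property holds for all of them. Everything here (makeRandom, randomArray, findCounterexample) is a made-up helper. Real generative testing libraries also shrink counterexamples and generate far richer data.

```typescript
// Tiny deterministic pseudo-random generator (a linear congruential
// generator), so that any failure is reproducible from the seed.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Arrays of 0 to 19 integers between -1000 and 1000.
function randomArray(random: () => number): number[] {
  const size = Math.floor(random() * 20);
  return Array.from({ length: size }, () => Math.floor(random() * 2001) - 1000);
}

// Run a property against many generated inputs.
// Returns the first counterexample found, or null if none was found.
function findCounterexample(
  property: (xs: number[]) => boolean,
  runs = 500,
  seed = 42
): number[] | null {
  const random = makeRandom(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomArray(random);
    if (!property(input)) return input;
  }
  return null;
}

const isAscending = (xs: number[]): boolean =>
  xs.every((v, i) => i === 0 || (xs[i - 1] as number) <= v);

const numericSort = (xs: number[]): number[] => [...xs].sort((a, b) => a - b);
// The classic JS gotcha: sort() without a comparator compares as strings.
const defaultSort = (xs: number[]): number[] => [...xs].sort();

console.log(findCounterexample((xs) => isAscending(numericSort(xs))));
console.log(findCounterexample((xs) => isAscending(defaultSort(xs))));
```

A unit test with [1, 2, 3] would happily pass for defaultSort. Random inputs with negative and multi-digit numbers are what expose it.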
  4. Why e2e testing?
    1. Because it is close to an actual user testing it.
  5. Why human Q/A testing?
    1. Because it is the closest thing to an actual user testing it.

Software design #

  1. Why pure functions?
    1. Because you can easily test them and easily type them.
    2. (Before you tell me "Easy is not Simple," yes I've seen that talk. I've seen all those software designer-philosopher-guru talks. The functional guy, the uncle bob guy, the stack overflow guy, the other stack overflow guy, the design-pattern guy, the angry guy, the JS conf pothead guy, the Scala showman-presenter guy, etc. Spare me the slogans.)
    3. Concretely, because a function like const greeting = x => "Hello, " + x, can be easily tested / played with in a command-line setting without any ceremony.
  2. Why side-effect-y functions?
    1. Because as fun as FP is, at the end of the day programs have to do stuff!
    2. Because sometimes you're making a video game, and mutating a number with bitwise operators is preferable to constantly copying arrays for the sake of functional purity and "equational reasoning."
    3. Because sometimes you can just do what the OOP people do: Write mocks for everything in order to write tests.
  3. Why Event-driven?
    1. Because what's more natural than "when User clicks button, execute this code"? Let's not overthink this.
    2. Note: The DOM people made the right choice. Functional Programming for coding (real, non-trivial) UIs is just... clunky. Sorry, Haskell, Elm, Purescript, etc. people. Event-driven is just the most natural.
  4. Why not Event-driven?
    1. Because in data processing, on the other hand, connecting functions is the natural thing. It's just a digital form of electronic circuit design. You plug the right input-output boxes (ie. functions) in the right place, and it works. Guitar to pre to amp to interface to computer. It makes sense.
  5. Why async syntax in JS?
    1. For the same reason there is do-notation in Haskell: We programmers love imperative code, so we make syntax sugar for it. (See my definition of imperative code below)

Functional abstractions #

  1. Why is Category Theory even a thing?
    1. Because whenever humans are trying to describe things, they end up on a whiteboard, drawing a bunch of some kind of objects (dots, letters, words, symbols) and a bunch of some kind of arrows to describe relationships. And after a zillion years of doing this, some mathematicians and other nerds started to notice patterns in these messy drawings (regardless of the contents), and called them categories and started to, well, categorize them.
  2. Why learn Category Theory?
    1. The delusional answer: Because knowing the mathematical theory from where the programming abstractions came from, will somehow help you.
    2. The realistic answer: Because you want to belong to some nerd club to fill some emptiness in your soul.
    3. (You could just use the FP functions that come with any FP library, and read the docs if you don't know what something does. You know, like a programmer.)
  3. Why Functors?
    1. To have a common interface for a certain type of transformation of a large number of data structures.
  4. Why Covariant Functors?
    1. Because it's useful that, abstractly speaking, these things have the same behavior:
      • Array.map.
      • OtherCollection.map.
      • Promise.map.
      • And even something like Function<A,B>.map.
    2. Mental exercise:
      • Speaking of Function<A,B>#map: You can only (covariantly) map over the output (ie. the B). Do you see why?
      • (Hint: A covariant functor maps over the contents of a metaphorical "box." If I forced you to think of a Function<A,B> as a box, which one makes more sense to view as its "contents," the input A, or the output B?)
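For those who want to check their answer to the exercise: mapping over a function turns out to be plain composition. A sketch, where mapFn is a made-up name (JS functions have no built-in map):

```typescript
// "map" for functions: run the function, then transform its output (the B).
function mapFn<A, B, C>(fn: (a: A) => B, transform: (b: B) => C): (a: A) => C {
  return (a) => transform(fn(a));
}

const strLength = (s: string): number => s.length;

// Same idea as [1, 2, 3].map(n => n > 5), except the "box" is a function.
const isLong = mapFn(strLength, (n) => n > 5);

console.log(isLong("hello"), isLong("functor"));
```

There's no way to write this with a transform on the input side that goes from A to something else, because by the time you have an A you've nothing to feed the transform's result into except fn itself.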
  5. Why Contravariant Functors?
    1. (Note: this complements the last comments above)
    2. Because sometimes you want to "adapt the input" of some thing that takes an input. Like when you go to another country and the power plug on the wall is "weird," so you grab your adapter and plug it on the wall, and now the input is what you can use. You've "contravariantly" mapped over the hotel wall's power plug!
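The power-plug adapter, sketched in code. A predicate consumes its input, so it can only be mapped "backwards." Predicate, contramap, and Person are illustrative names, not from any library.

```typescript
// Something that takes an input: a predicate.
type Predicate<A> = (a: A) => boolean;

// contramap: adapt the input BEFORE it reaches the predicate.
// Note the direction: the adapter goes from B to A, yet we get a Predicate<B>.
function contramap<A, B>(pred: Predicate<A>, adapt: (b: B) => A): Predicate<B> {
  return (b) => pred(adapt(b));
}

// The "wall socket" only understands numbers (ages).
const isAdultAge: Predicate<number> = (age) => age >= 18;

// The "adapter" lets us plug a whole Person into it.
type Person = { name: string; age: number };
const isAdult = contramap(isAdultAge, (p: Person) => p.age);

console.log(isAdult({ name: "Ana", age: 30 }));
```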
  6. Why Applicative Functors?
    1. Because sometimes you have, say, 3 things wrappedX, wrappedY, and wrappedZ (perhaps they're three results from three calls to different I/O operations, and it's not certain whether they all have their contents), and you want to write code that, if you squint your eyes, kind of looks like you're just doing f(x, y, z).
    2. (Except it'll look more like liftA3 f wrappedX wrappedY wrappedZ)
    3. (Note: The code above is Haskell.)
  7. Why Monads?
    1. Because, much like "mapping over the contents of a box" with (covariant) Functor is a ubiquitous pattern, the "I called a function f that gave me a Maybe<User>, now I want to call a function g that takes a User, but obviously only call it if there is a Maybe<User> in the maybe box" is also ubiquitous.
    2. Because, if you've ever written "GET USER; IF USER GOTTEN, GET USER DATA; IF USER DATA ..." pyramids of if-else, where you do an uncertain operation followed by a check for its data followed by another uncertain operation followed by a check of its data followed by another uncertain operation, and so on and so forth, then you've been doing manually the pattern that a Monad interface is supposed to handle (granted you're in the right language and/or using the right libraries). And if there's a sensible abstraction (or even syntax sugar like in Haskell and Scala) to do things less manually with "flatter" code, why not use it? Go monads.
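The if-else pyramid, flattened. This is a bare-bones Maybe written from scratch for illustration. The lookups and data are hypothetical, and a real FP library would give you a more complete version.

```typescript
type Maybe<T> = { kind: "just"; value: T } | { kind: "nothing" };

function just<T>(value: T): Maybe<T> {
  return { kind: "just", value };
}
function nothing<T>(): Maybe<T> {
  return { kind: "nothing" };
}

// flatMap (aka. bind): only call f if there is something in the box.
function flatMap<A, B>(m: Maybe<A>, f: (a: A) => Maybe<B>): Maybe<B> {
  return m.kind === "just" ? f(m.value) : nothing<B>();
}

// Hypothetical data and lookups, each of which may come back empty.
type UserRecord = { id: number; addressId?: number };
type AddressRecord = { city?: string };
const users: Record<number, UserRecord> = { 1: { id: 1, addressId: 10 }, 2: { id: 2 } };
const addresses: Record<number, AddressRecord> = { 10: { city: "Lima" } };

function getUser(id: number): Maybe<UserRecord> {
  const user = users[id];
  return user ? just(user) : nothing<UserRecord>();
}
function getAddress(user: UserRecord): Maybe<AddressRecord> {
  if (user.addressId === undefined) return nothing<AddressRecord>();
  const address = addresses[user.addressId];
  return address ? just(address) : nothing<AddressRecord>();
}
function getCity(address: AddressRecord): Maybe<string> {
  return address.city !== undefined ? just(address.city) : nothing<string>();
}

// No nested ifs: each step runs only if the previous one produced a value.
function cityOfUser(id: number): Maybe<string> {
  return flatMap(flatMap(getUser(id), getAddress), getCity);
}
```

Haskell's do-notation and Scala's for-comprehensions are syntax sugar over exactly this kind of chaining.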
  8. Why Monoids?
  9. Why OOP (class-based)?
  10. Why OOP (message passing)?

Data encoding #

JSON #

  1. Why JSON?
    1. Because everyone uses it because this guy Douglas Crockford one day decided that this was a good idea.

XML #

  1. Why XML?
    1. Because unlike JSON, you're allowed to specify a rich structure.

Protocol Buffers #

  1. Why protocol buffers? spec
    1. Because they support strongly typed schema definitions, are efficiently encoded in binary, and have wide tooling support.

References #

  1. (Text/HTML) Protocol Buffers Version 3 Language Specification @ Protobuf.dev

Data storage #

  1. Why databases?
    1. To store data that can be found again.
  2. Why caches?
    1. To store the result of expensive (in time and/or space) operations.
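The smallest possible cache is a memoized function. A sketch, assuming the function is pure and its argument works as a Map key (the names are illustrative):

```typescript
// Wrap a one-argument function so each distinct argument is computed once.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const store = new Map<A, R>();
  return (arg) => {
    if (store.has(arg)) return store.get(arg) as R;
    const result = fn(arg);
    store.set(arg, result);
    return result;
  };
}

// Pretend this is expensive; count how often it really runs.
let calls = 0;
const slowSquare = (n: number): number => {
  calls++;
  return n * n;
};

const fastSquare = memoize(slowSquare);
```

Calling fastSquare(4) twice runs slowSquare only once. The usual cache questions (when to evict, when an entry has gone stale) are deliberately ignored here.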
  3. Why indexes?
    1. To find data faster.
  4. Why streams?
    1. In the node.js sense: To process large amounts of data without loading it all into memory first.
    2. In the general sense: Because sometimes you can't even try to load all data into memory because it doesn't even make sense. Eg. ongoing live stream from audio device without a "last" element.
  5. Why batch?
    1. Because sometimes periodically crunching accumulated data is what's needed.
  6. Why full-text search servers?
    1. Because sometimes the text-searching capabilities of the "main" application database (whether SQL or "NoSQL") aren't enough. So you use something like Solr, which has some "understanding" of how words work, so you can have text searches that are smarter than strict text-matching. Eg. Know to grab documents containing the word "apples" even though the user searched for "apple," as well as automatic generation of sub-searches and suggestions (eg. Solr's "faceted search").
  7. Why domain-specific databases?
    1. Because of the general case of the above point. Some databases are designed for a specific type of use-case.
  8. Why SQL?
    1. Because most data you will ever work with is relational.
    2. Because most projects you will work on do benefit from keeping good ol' relational schemas.
  9. Why NoSQL?
    1. Because sometimes you can get away with not thinking about schemas or relations upfront, and just throwing data in "collections" in Mongo or something, and then later figuring out how to relate data either by reinventing relational logic at the application level, or using some relational features that NoSQL databases have built into their engines.
    2. Because sometimes being able to just throw a complex, possibly nested, JSON-like object into a "collection," without specifying schemas, is enough.
  10. Why GraphQL?
    1. Because you're tired of reinventing "REST" API server/frontend for things such as specifying which fields from the user object should be included in the request, etc.
  11. Why not GraphQL?
    1. Because ultimately there's no free lunch and there is no one-size-fits-all solution for the more complex relational data issues (performance, etc.) that your project may have.
    2. (But if your project is basic relation-wise, GraphQL could save you a ton of API server and client development time.)

SQL #

  1. Why join?
    1. Because you rarely will get all you need by querying a single table (particularly the more "normalized" the database is), so you bring data from another table, joined by some predicate.
    2. (Or, less common in my experience, but also useful) you may want to check multiple rows of the same table at once.
  2. Why inner join?
    1. Because a simple set intersection often is all you need.
  3. Why left or right join?
    1. Because sometimes you're like "I want all rows from A no matter what, but if they happen to have data in B, I'd like to have that too."
    2. Because sometimes a basic set intersection is too strict, and leaves rows out that you want to keep, so you need some form of set union instead.
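The difference between inner and left joins, sketched over in-memory arrays instead of a real database. The Author / Book tables and their rows are made up for illustration.

```typescript
type Author = { id: number; name: string };
type Book = { authorId: number; title: string };

const authors: Author[] = [{ id: 1, name: "Ada" }, { id: 2, name: "Bob" }];
const books: Book[] = [{ authorId: 1, title: "Notes" }];

// SELECT name, title FROM authors INNER JOIN books ON books.authorId = authors.id
// Only rows where the predicate matches survive.
function innerJoin(left: Author[], right: Book[]): { name: string; title: string }[] {
  const rows: { name: string; title: string }[] = [];
  for (const a of left) {
    for (const b of right) {
      if (b.authorId === a.id) rows.push({ name: a.name, title: b.title });
    }
  }
  return rows;
}

// SELECT name, title FROM authors LEFT JOIN books ON books.authorId = authors.id
// Every author survives; missing book data becomes null.
function leftJoin(left: Author[], right: Book[]): { name: string; title: string | null }[] {
  const rows: { name: string; title: string | null }[] = [];
  for (const a of left) {
    const matches = right.filter((b) => b.authorId === a.id);
    if (matches.length === 0) rows.push({ name: a.name, title: null });
    for (const b of matches) rows.push({ name: a.name, title: b.title });
  }
  return rows;
}

console.log(innerJoin(authors, books)); // Bob is left out.
console.log(leftJoin(authors, books));  // Bob is kept, with a null title.
```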
  4. Why "normalize"?
    1. To reduce redundancy.
  5. Why reduce redundancy?
    1. To save space and keep data integrity.
    2. (Even in "NoSQL" databases like MongoDB you end up reinventing relational logic by using ObjectIds as manual "foreign keys" across collections.)

References #

  1. (Text/HTML) PostgreSQL 16 documentation – 2.6. Joins Between Tables

Networking #

TCP #

  1. Why Transmission Control Protocol (TCP)? spec
    1. Because the internet (IP) is a strong but unreliable packet routing system, where packets often arrive out of order at the other end, and it would be a pain for server applications to have to handle reordering (among other things), so TCP was invented to handle this, and have a reliable ordered stream.
  2. Why the ARP Table for Address Translation?
    1. Because IP address and Ethernet address are selected independently, so they cannot be calculated algorithmically.

References #

  1. (Text/HTML) TRANSMISSION CONTROL PROTOCOL @ IETF.Org
  2. (Text/HTML) A TCP/IP Tutorial @ IETF.Org
  3. (Text/HTML) 4.1 ARP Table for Address Translation @ IETF.Org

HTTP #

  1. Why HTTP/1? spec
    1. Because we want a standardized and platform-independent protocol for communication between clients and servers over the internet, based on the idea of "request" and "response," and "documents" that link to each other (by means of "hypertext").
  2. Why HTTP compression?
    1. Because most text has a lot of redundancy, and HTTP involves a lot of text.
      • (Eg. HTML, CSS, JS, JSON, XML, and SVG "images," are all just text.)
      • (Images and audios are typically already compressed binary formats.)
      • The most common compressor is gzip, which uses the Deflate algorithm.
      • Brotli, by Google, is a more recent contender.
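      • Browsers say which encodings they accept in a request header, eg. Accept-Encoding: br, gzip, and the server names the one it picked in the Content-Encoding response header.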
  3. Why not compress non-text files?
    1. Because you might actually make a file larger.
      • Eg. Compressing an MP3, which is already heavily compressed, will likely just add extra headers and dictionaries.
  4. Why minification (or preprocessing in general) before compression?
    1. To give the compression algorithm compression-friendly input.
  5. Why gzip?
    1. Because although it's not the best, the tradeoffs are acceptable for most HTTP communication.
    2. Because it is fast at both compressing and decompressing.
      • Which is important for (de)compressing on the fly.
    3. Because the memory it uses is independent of the size of the data, as it operates on fixed-size chunks of data at a time.
    4. Because there are free implementations that avoid patent trolls.
  6. Why HTTP/2?
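    1. Because HTTP/1.1 handles only one request at a time per TCP connection, so browsers have to open several connections to fetch resources in parallel. HTTP/2 multiplexes many requests over a single connection.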
  7. Why HTTP/3?
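    1. Because HTTP/2 still runs on TCP, and that part went wrong.
      • Head-of-line blocking hurts user experience: because of TCP's strict ordering guarantees, one lost packet stalls every stream multiplexed on the connection.
      • (HTTP/3 runs on QUIC, over UDP, to avoid this.)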

References #

  1. (Text/HTML) Hypertext Transfer Protocol -- HTTP/1.1 @ IETF.Org
  2. (Text/HTML) gzip
  3. (Text/HTML) DEFLATE Compressed Data Format Specification version 1.3
  4. (Text/HTML) Brotli
  5. (Video/HTML) Everything You Need to Know About QUIC and HTTP3 @ YouTube.Com

Caching #

  1. Why cache-control? docs
    • So we can control whether the cached response associated with a certain request belongs to a private cache (private), a shared cache (public), or no cache at all (no-store).

References #

  1. (Text/HTML) HTTP Caching @ Mozilla.Org

Web security #

  1. Why sanitization?
    1. Because user data might contain malicious executable code.
      • SQL injection, when the user is able to input SQL into a form that the system assumes is clean data.
      • XSS attacks, when the user is able to input JS that will then be loaded by other users to programmatically steal or break data.

Authentication and Authorization #

JWT #

  1. Why JSON Web Token (JWT)? rfc
    1. Because it's stateless and scalable.
    2. Because XML-based schemes like SAML are too cumbersome and verbose for space constrained environments such as Authorization headers and URI query parameters.
    3. Because it's URL-safe.
      • (By using a URL-safe variation of Base64.)
    4. Because it is "easy to implement using widely available tools."
  2. Why is JWT "stateless" and "scalable"?
    1. "Stateless" because everything is in the JWT, signed and/or encrypted.
      • Unlike, say, "Session + Cookie" schemes, where the user sends a session ID in a cookie, and backend has to load the authorization data, ie. state.
    2. "Scalable" because more users does not mean more DB queries for authentication/authorization purposes.
    3. "Scalable" because the user may interact seamlessly with many different backends, since all his authorization data is in the token.
  3. Why always verify the JWT's signature?
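    1. Because the header and payload are just Base64URL-encoded JSON that anyone can forge or tamper with. Only a valid signature proves the token came from the issuer.
    2. Because it's super easy to crack a JWT by guessing weak secrets, so the signing secret has to be strong too.
    3. (^So you should never just jwt.decode() the payload.)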
  4. Why JSON Web Encryption (JWE)? rfc
    1. Because JWTs are not encrypted by default. They're just signed (either symmetrically or asymmetrically) to maintain integrity (and in the asymmetric case, non-repudiation), but anyone can inspect the payload.

References #

  1. (Text/HTML) JSON Web Token (JWT) @ IETF.Org
  2. (Text/HTML) Introduction to JSON Web Tokens @ JWT.IO
  3. (Text/HTML) JSON Web Token Best Current Practices @ RFC-Editor.Org
  4. (Video/HTML) Cracking JSON Web Tokens @ YouTube.Com
  5. (Text/HTML) JSON Web Encryption (JWE) @ IETF.Org
  6. (Text/HTML) Understanding JSON Web Encryption (JWE) @ ScottBrady91.Com

OAuth 2.0 #

  1. Why OAuth 2.0? rfc
    1. Because of the delegated authorization problem.
      • "Give Yelp access to my Gmail contacts."
      • "Give Last.fm access to my Spotify history."
    2. Because we don't want to give our password to third-party apps.
      • Eg. What Yelp used to do: They'd ask for your Gmail password so they could access your contacts. Which sounds insane nowadays!
    3. Because sometimes users want to allow third-party applications to obtain limited access to an HTTP service.
    4. Because we want to input our password only into the app it belongs to, and generate some "token" to serve as proof that we've consented to limited, granular access to our data.
    5. Because we want to clearly delineate between the actors in a delegated authorization scheme.
      • "Resource Owner": That's "you" (eg. your Gmail account.)
      • "Resource Server": That's eg. the Gmail server.
      • "Client": That's the third-party app that wants to access some of your Gmail data.
    6. ("OAuth" is short for "Open Authorization.")
  2. Why scopes?
    1. Because we want to be granular about access levels.
  3. Why state?
    1. Because we want to prevent cross-site request forgery (CSRF) attacks.
  4. Why an Authorization Code separate from an Access Token?
    1. Because we want to reduce the risk of token theft.
    2. Because we don't want the user's browser (Front Channel) to see the Access Token, which could be captured by a plethora of malicious actors (browser extensions, malware, etc.). So the Authorization Server gives an Authorization Code to the browser, which passes it to the client app, which then uses it through the Back Channel to get the Access Token from the Authorization Server.
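    const express = require('express');
    const axios = require('axios');
    const app = express();

    // Placeholder token endpoint; use the one your Authorization Server documents.
    const RESOURCE_AUTH_URL = "https://oauth.gmail.com/token";

    app.get('/auth/redirect', async (req, res) => {
      // The Authorization Code arrives through the browser (Front Channel).
      const authCode = req.query.code;
      try {
        // Exchange it for an Access Token server-to-server (Back Channel).
        // The token endpoint expects a form-encoded body, hence URLSearchParams.
        const tokenResponse = await axios.post(RESOURCE_AUTH_URL, new URLSearchParams({
          code: authCode,
          client_id: 'YOUR_CLIENT_ID',
          client_secret: 'YOUR_CLIENT_SECRET',
          redirect_uri: 'http://localhost:3000/auth/redirect',
          grant_type: 'authorization_code'
        }));
        const accessToken = tokenResponse.data.access_token;
        // Use accessToken to access protected resources.
        res.send('Authorized');
      } catch (err) {
        res.status(502).send('Token exchange failed');
      }
    });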
    
  5. Why is OAuth 2.0 prone to security vulnerabilities?
    1. Because many configuration settings necessary for keeping data secure are optional according to the specification.

References #

  1. (Text/HTML) The OAuth 2.0 Authorization Framework @ IETF.Org
  2. (Video/HTML) OAuth 2.0 and OpenID Connect (in plain English) @ YouTube.Com
  3. (Text/HTML) Cross-Site Request Forgery @ IETF.Org
  4. (Text/HTML) OAuth 2.0 authentication vulnerabilities @ PortSwigger.Net

OIDC #

  1. Why OpenID Connect (OIDC)? spec
    1. Because OAuth 2.0 was misused for authentication (instead of what it's intended for: authorization) so often that a thin identity layer (OIDC) was finally added on top of it, to do authentication right when the Authorization Server also authenticates the end user.
  2. Why is JWT relevant to the discussion of OIDC?
    1. Because the primary extension that OIDC adds to OAuth 2.0 is the ID Token data structure, which is represented as a JWT.
  3. Why is non-repudiation an important property?
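    1. Because you don't want an ID Token provider to be able to deny that they generated the token. This prevents disputes about token authenticity. (It only holds when the token is signed with the provider's private key, eg. RS256. With a shared-secret MAC such as HS256, the client could have produced the token too.)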
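
To make the JWT and non-repudiation points concrete, here is a minimal sketch using only Node's built-in crypto module. The function names (encodeSegment, signIdToken, verifyIdToken) are made up for illustration. A real client should use a vetted JWT library and also validate claims such as iss, aud, and exp, which this sketch skips.

```javascript
const crypto = require('crypto');

// Base64url-encode a JSON-serializable value, as JWT segments require.
const encodeSegment = (value) =>
  Buffer.from(JSON.stringify(value)).toString('base64url');

// Provider side: sign "header.payload" with the PRIVATE key (RS256).
function signIdToken(claims, privateKey) {
  const header = encodeSegment({ alg: 'RS256', typ: 'JWT' });
  const signingInput = `${header}.${encodeSegment(claims)}`;
  const signature = crypto.sign('sha256', Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString('base64url')}`;
}

// Client side: verify with the provider's PUBLIC key, then read the claims.
// Returns the claims object, or null when the token is malformed or forged.
function verifyIdToken(token, publicKey) {
  const [header, payload, signature] = token.split('.');
  if (!header || !payload || !signature) return null;
  const ok = crypto.verify(
    'sha256',
    Buffer.from(`${header}.${payload}`),
    publicKey,
    Buffer.from(signature, 'base64url')
  );
  return ok ? JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) : null;
}
```

To try it, generate a throwaway key pair with crypto.generateKeyPairSync('rsa', { modulusLength: 2048 }), sign some claims, and verify them. Since only the holder of the private key can produce a signature that the public key accepts, the provider can't later deny having issued the token.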

References #

  1. (Text/HTML) @ OpenID.Net

Web architecture #

Server-side rendering #

  1. Why (modern) Server-side rendering (SSR)?
    1. Because we've outgrown the "SPA" approach: the amount of JS code we send to the browser nowadays has become a UX issue.
      • (Which is a sort of "return to the old days", except with better tools for, eg., streaming responses.)
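
As a rough illustration of the "streaming responses" part, here is a minimal sketch with Node's built-in http module and no framework. The page content, loadItems (a stand-in for a slow database or API call), and the function names are assumptions made up for illustration. Real SSR frameworks do considerably more (components, hydration, escaping).

```javascript
const http = require('http');

// Hypothetical slow data source standing in for a database or API call.
const loadItems = () =>
  new Promise((resolve) => setTimeout(() => resolve(['alpha', 'beta']), 50));

// Yield the page in pieces: the shell first, the data-dependent part later.
async function* renderPage() {
  yield '<!doctype html><html><body><h1>Items</h1>';
  const items = await loadItems();
  yield `<ul>${items.map((item) => `<li>${item}</li>`).join('')}</ul>`;
  yield '</body></html>';
}

// Each chunk is written to the response as soon as it is ready.
function createServer() {
  return http.createServer(async (req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    for await (const chunk of renderPage()) res.write(chunk);
    res.end();
  });
}
```

Calling createServer().listen(3000) serves the page. The browser can start rendering the heading while the server is still waiting on loadItems, and no client-side JS is needed to show the list.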

References #

  1. (Video/HTML) Understand the Next Phase of Web Development - Steve Sanderson - NDC London 2024 @ YouTube

Micro-frontends #

  1. Why micro-frontends?

Software architecture #

  1. Why have a software architect?
    1. Because it's useful to have someone dedicated to the slightly bigger picture, ie. someone who can answer at least these three questions:
      • How smoothly is the system running (ie. Operability)?
      • How understandable is the system (ie. Simplicity)?
      • How easy is it to make changes to it (ie. Extensibility)?
  2. Why microservices?
    1. Because sometimes extensibility grinds to a halt when a monolith grows beyond a certain size.
    2. Because sometimes you want components A and B to be worked on by different teams, as decoupled as possible and, in fact, in different repositories, using different programming languages.
    3. (Note: It's 2024 and it's still a pain to achieve truly "micro" microservices for frontend projects. Why?)
      • ("Mashups" are not it.)
  3. Why not microservices?
    1. Because if you do it wrong, you end up with just another monolith, except more complex: more parts, which are decoupled only in your imagination.

API networking #

General #

  1. Why SOAP?
    1. Because XML and schemas are cool (except when they're not).
  2. Why REST?
    1. Because ditching XML, going schemaless, and being able to do whatever the hell we want is better (except when it isn't).
  3. Why GraphQL?
    1. Because actually having a schema is cool again and having the server handle property-selection in API queries without us having to reinvent it for every project is practical (except when it isn't).

REST #

  1. Why is there tension between correctness/type-safety and REST/RESTful?
    1. Because normal programming interfaces can be statically checked, whereas REST calls are about strings and runtime-only checking.
    2. Eg. Compare:
      • import { bucket } from "aws"; bucket.create("a");
      • http.put("https://a.s3.amazonaws.com/");

References #

  1. (PDF/HTML) Distributed Systems 4th edition @ Distributed-Systems.Net

gRPC #

  1. Why gRPC? docs
    1. Because it promotes the microservices design philosophy of coarse-grained message exchange between systems while avoiding the pitfalls of distributed objects and the fallacies of ignoring the network.
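    2. Because it takes advantage of HTTP/2 features such as multiplexing, header compression, flow control, and bidirectional streaming.
      • (gRPC itself requires HTTP/2. Clients that can't speak it directly, eg. browsers, go through gRPC-Web and a translating proxy, which supports fewer streaming modes.)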

References #

  1. (Text/HTML) gRPC Documentation @ GRPC.io
  2. (Text/HTML) gRPC Motivation and Design Principles @ GRPC.io
  3. (Video/HTML) The RPC Revolution: Getting the Most Out of gRPC - Richard Belleville & Kevin Nilson, Google @ YouTube
  4. (Video/HTML) gRPC Crash Course - Modes, Examples, Pros & Cons and more @ YouTube

Algorithms #

Just some of the very few algorithms that actually interest me.

Big O notation #

  1. Why Big O notation?
    1. To tersely describe, in a standardized way, how the resources that something (a process, etc.) requires (CPU time, memory) grow as its input grows.
    2. Note by Noriega:
      • Just ask yourself "What shape/graph is this drawing?"
      • Then find the simplest O-notation to express the shape.
      • (O(n) for linear, O(n^2) for quadratic, etc.)
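
The "what shape is this drawing?" question can be answered empirically by counting steps. A minimal sketch follows, with function names made up for illustration.

```javascript
// Count the basic steps each function performs for an input of size n.
function linearSteps(n) {
  let steps = 0;
  for (let i = 0; i < n; i++) steps++; // one pass over the input: O(n)
  return steps;
}

function quadraticSteps(n) {
  let steps = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) steps++; // every pair of elements: O(n^2)
  }
  return steps;
}

// Doubling n doubles the linear count but quadruples the quadratic one.
for (const n of [10, 20, 40]) {
  console.log(n, linearSteps(n), quadraticSteps(n));
}
// Prints:
// 10 10 100
// 20 20 400
// 40 40 1600
```

Plot the second column against n and you get a straight line. Plot the third and you get a parabola.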

Game tree search #

  1. Why Minimax?
    1. To minimize the maximum loss (thus "minimax") in a game's turn, by looking ahead at all the possible outcomes, for as many turns ahead as practical.
    2. (There is an inverse algorithm for the inverse need, ie. maximizing the minimum gain, called "maximin.")
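
Here is a minimal sketch of the look-ahead, assuming the whole game tree fits in memory as nested arrays. The tree and its scores are made up for illustration. Real implementations usually add a depth limit and an evaluation function for non-final positions.

```javascript
// A game tree as nested arrays: numbers are leaf scores (from the maximizing
// player's point of view), arrays are positions with one entry per legal move.
function minimax(node, isMaximizing) {
  if (typeof node === 'number') return node; // leaf: the outcome is known
  const scores = node.map((child) => minimax(child, !isMaximizing));
  return isMaximizing ? Math.max(...scores) : Math.min(...scores);
}

// Two moves for us, then two replies for the opponent after each.
const tree = [[3, 5], [2, 9]];
console.log(minimax(tree, true)); // 3
```

The second move is tempting because of the 9, but the opponent would answer with the 2. The first move guarantees at least 3, so it's the one that minimizes our maximum loss.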

References #

  1. (Text/HTML) Minimaxer Part 1 - Building a minimax library in Typescript

Appendix #

Additional references #

  1. (Text/HTML) How Web Works @ GitHub.Com/vasanthk
  2. (Text/HTML) What forces layout / reflow @ Gist.GitHub.Com
  3. (Text/HTML) On Layout & Web Performance @ Kellegous.Com
  4. (Text/HTML) You Won’t Believe This One Weird CPU Instruction! @ VaiBhavsagar.Com
