The Little Vulgar Book of Mechanics (v0.17.3)
Last updated: March 22nd 2023
Latest updates #
- v0.17.3:
- Changed Sound I - Space I:
- Edited out unnecessary paragraphs.
- Added comment on ITD and ILD factors.
- v0.17.2:
- Changed Sound I - Waves I:
- Edited out unnecessary paragraphs.
- Added to sound wave propagation in water vs. air.
- v0.17.1:
- Changed Probability I: Bertrand Russell quote. Rephrased sentence.
- v0.17.0:
- New section Probability I: Motivation, John Petrucci, UFOs, phrasing, belief, avoiding mystical substances, notation, Maxwell.
- v0.16.1:
- Changed Sound I:
- Added technical definition of sound.
- Added small bio of awesome engineer Harry F. Olson.
- Edited out joke that was too dragged out and just wasted reader's time.
- Changed Table of Contents:
- Remove redundant text from indented sub-section links.
- E.g. Print "Cochlea I" instead of "Hearing I - Cochlea I."
- v0.16.0:
- New section Table of Contents.
- New section Music I - History I.
- Changed Appendix - Lightning Network I: Added numbers to blockchain speed comment.
- v0.15.2:
- Changed Hearing I: Phrasing.
- v0.15.1:
- Changed Appendix - Lightning Network I:
- New: Payment requests aka invoices (format, use case, structure overview); Offers (BOLT-12).
- v0.15.0:
- New section Music I.
- Course overview / Introducing the ZN approach to music education: Pulse, meters, and tempos; Melodic content; Harmonic content; Tonality; Timbre and orchestration; Emotional content; Story; Performance; Notation and representation.
- v0.14.2:
- Changed Appendix - Lightning Network I:
- Extended: General use case, payment channels, commitment transactions, external resources.
- New: Payment forwarding, Hash Time-Locked Contracts, gossip protocol, source routing, onion routing, trial-and-error path finding, Noise Protocol Framework.
- v0.14.1:
- Changed Appendix - Lightning Network I: Public channels. Commitment transactions. Refunds and punishing cheaters. Full vs. self-managed combinations.
- v0.14.0:
- New section Appendix - Lightning Network I. Bitcoin. LN. Payment channel. Security primitives. Funding. Time locks. Gossip. Routing. Refund.
- v0.13.0:
- New section Models I. Causality. UNIX grep. Cannibal Corpse. Never marry a metaphor.
- Changed Functions I: Added note on functional relation not implying causality. Added ref to Models I.
- v0.12.2:
- Changed Functions III: Added Ludwig von Mises quote.
- v0.12.1:
- Changed Functions I: Added more on why we use functions.
- v0.12.0:
- New section Functions III: Mathematical notation. Cyclic. Inverses. Bit of Group Theory.
- Changed Differential Equations I: Rewrote paragraph on order of a derivative.
- v0.11.0:
- New section Differential Equations I
- Changed Functions I: Improved intro and why. Removed useless paragraph.
- v0.10.1:
- Changed Vibration I: Extend note on simple harmonic oscillation.
- Changed Force I: Clarification on forces. Note on electromagnetism.
- v0.10.0:
- New section: Vibration I
- Equilibrium, restoring force, spring, 1D kinematics.
- Added quote to Sound I - Medium I.
- Added history to Sound I - Waves I.
- v0.9.1: Functions I
- Added paragraph on why we use functions.
- v0.9.0: Hearing I
- v0.8.4: Sound I - Frequency I
- v0.8.3: Sound I - Level I
Table of Contents #
- Latest updates
- Table of Contents
- Introduction
- Feedback
- Types I
- Genesis I: Mechanics
- Genesis II: Universe
- Matter I: And Then There Was Shit
- Motion I
- Force I
- Vibration I
- Life I
- Information I
- Probability I
- Problem Solving I
- Functions I
- Functions II
- Functions III
- Models I
- Differential Equations I
- Sound I
- Hearing I
- Music I
- Fourier Transform I
- Appendix
Introduction #
The nail clipper doesn't get enough respect. I mean the good ol' nail clipper. You know it's a machine, right? Your nail clipper is a machine. A simple one, but still a machine. Technically speaking, a compound lever.
More on levers later. This is the introduction. But I don't really wanna waste your time with introductions, so, long story short, for the sake of history, I'll just say: This document started as notes and links for a science-y podcast episode. But, like everything I start, it grew too fast, and became something else, beyond its original purpose. So, instead, I'm gonna be growing this same document, throwing notes for several podcast episodes, song lyrics, troll posts, etc. It's just a collection of useful definitions, historical anecdotes, glued together by my shitty and rude commentary. Topics: Acoustics, Audio, other branches of Physics, Chemistry, Math, and also everything.
This document is for us, the vulgar engineers who are making shit. If we share a common language, we can build better nuclear reactors inside our death metal recording studios on our offshore drilling rigs, during the zombie apocalypse. So that's what this "book" is for. If we all agree on what we mean by "energy," "waves," etc. that'd be great. I think. Better communication through math and shit = Better at building shit.
Warning: My prose is bad, my English is almost as bad, and my approach is extremely vulgar and informal. We're not academics here. Finally, and importantly: We're not here to confirm our opinions, or to feel like we're smarter than our grandparents. We're here to optimize our communication so we're better at building, maintaining, and assessing shit.
Feedback #
Let me know about any typoes, grammerz, and badder engrish you find. (Though if you do it about this specific paragraph, I will stare at you until you realize what I think of you.)
Types I #
Let's always be as clear as possible about the types of shit we're talking about. I mean computationally. The units. The data structures. How shit is actually computed. This is the only type of feedback I will take for now: Let me know when a symbol doesn't have an explicit (or implicit from the topic of discussion) type.
Genesis I: Mechanics #
"In the Beginning there was Mechanics." – MAX VON LAUE (1879–1960)
What we're trying to do as engineers is build shit. So we want to know how shit works. I.e. we want to understand the mechanics. The mechanics of shit. We are shit ourselves, of course. So sometimes it's about understanding the mechanics of ourselves. E.g. our ears, parts of our neurology, etc. The point is that mechanics is what we care about: To have a mechanics is to understand how shit works.
Genesis II: Universe #
In order to build shit, we (and the shit) need to exist somewhere. And so there's this "Universe" thing. Thankfully, for you, I'm not that kind of "science-loving" guy. I'm not gonna talk about origin of the universe and shit. You know those pop-sci rockstars who will literally tell you "Let me blow your mind with this amazing fact: ..."? That's what I want to avoid at all costs. The elitist, pretentious "I'ma blow your mind" horseshit, disguised as "passion for science." It's the pop-sci rockstar saying "I'm assuming your mind is tiny, so it's gonna be blown by what I'm about to say." Fuck that shit.
Back to the universe, though: You can read a million books and watch a million science-y things on tv, and follow a million astrophysicists on the web, and you will be none the wiser about the origin of the universe. So don't bother. No one knows how the universe came to be.
For our purposes, instead of "universe," think "spacetime." (Even though a lot of the shit here has a Newtonian mentality, where spacetime isn't even a word, but it'll be OK). If you don't know what that means, you will, later. I mean, think "spacetime" to yourself. But still say "Universe" to others (unless you both happen to be talking about relativity). Otherwise people will tell you "get the fuck out of here, nerd." As they should.
Matter I: And Then There Was Shit #
"In doing this we’ll see that the nature of matter (i.e. body considered in general) consists not in its being a thing that is hard or heavy or coloured, or affects the senses in this or that way, but simply in its being a thing that is extended in length, breadth and depth." - RENÉ DESCARTES, Principles of Philosophy (1644)
"It seems probable to me, that God in the Beginning form'd Matter in solid, massy, hard, impenetrable, moveable Particles..." – ISAAC NEWTON, Opticks (1730)
Descartes had a simple, terse definition of matter: "Body considered in general." Newton, on the other hand, had a little story involving the big guy, the head honcho of all things, main founder of the Universe, who at some point decided to fill the Universe with shit.
My problem with both is: What the fuck am I supposed to do with it? In my computer program, I mean. Say I wanna program some simulation or experiment involving pieces of matter. My data type for matter right now would look like type Matter = {}. Empty. Vacuous. I don't know what type of data structure would represent a chunk of "matter." Sometimes a "definition" leaves you with nothing. I think what I'm looking for is properties of matter.
Still, though, you should at least get the general, if vague, notion that matter is basically shit. Rocks, clouds, trees, water, cells, etc. As well as, of course, literal shit, such as dog poop. And mediocre shit, like Galileo.
Motion I #
"Give me matter and motion, and I will construct the universe." – RENÉ DESCARTES.
OK so the universe is filled with shit. But why does anything happen? You couldn't even read this if you couldn't move your eyes. You couldn't listen to it as a podcast, if a speaker couldn't move to disturb the air and eventually your ears and brain. So we gotta have motion. Shit needs to move. The beginning of all true study of mechanics and dynamics of sound and other forms of physics, chemistry, etc. is the recognition of motion as crucial.
Let's add a bit of mathematical salt to this already. Please hammer this into your brain, right now: Shit that moves, we imagine as an arrow. This arrow always has a direction and something else. Two data values. If the piece of matter happens to be a bullet, you never just care about "it's moving." Moving in whose direction?! A single number is never complete information about the motion of shit. You need at least two bits of data: some magnitude, and a direction.
So, Programmers: Make note. Vectors. (Normies: You make note too. You'll inevitably become a programmer too, after hanging around with me for long enough). If you slept through school, your idea of a mathematical operation is likely limited to plain, primitive numbers, such as x + 3, where x and 3 are some plain numbers. We're going to be doing computations with more complex data structures. We will see things like x + y, but often those letters ("variables") will refer to values more complex than a simple number.
If you're a programmer, you already know this: Addition, for example, is just a function (*): add(x,y). Well, you can imagine that we'll be add()'ing, but with more complex structures for x and y, instead of just integers or something. Remember: We're not physicists. We're not mathematicians. We're engineers and programmers. We model and compute data.
(*): Or "method," in inferior programming styles, such as object-oriented programming.
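To make the "arrow" idea concrete, here's a minimal sketch in TypeScript. The names Vec2, add, magnitude, and direction are made up for illustration (they're not from any library), and sticking to two dimensions is just an assumption to keep it short.

```typescript
// A 2D vector: the "arrow" for shit that moves.
// Vec2, add, magnitude and direction are illustrative names, not a library API.
type Vec2 = { x: number; y: number };

// Addition is just a function, but on a structure more complex than a plain number.
const add = (a: Vec2, b: Vec2): Vec2 => ({ x: a.x + b.x, y: a.y + b.y });

// The "something else" the arrow carries: its magnitude (length).
const magnitude = (v: Vec2): number => Math.hypot(v.x, v.y);

// The direction, as an angle in radians measured from the positive x axis.
const direction = (v: Vec2): number => Math.atan2(v.y, v.x);

const bullet: Vec2 = { x: 3, y: 4 };
const wind: Vec2 = { x: 1, y: 0 };

console.log(add(bullet, wind)); // { x: 4, y: 4 }
console.log(magnitude(bullet)); // 5
```

Same add() idea as before, except x and y are now little records instead of plain numbers. Magnitude and direction are the two pieces of data you need to describe the motion.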
Force I #
"A Vulgar Mechanick can practice what he has been taught or seen done, but if he is in an error he knows not how to find it out and correct it, and if you put him out of his road, he is at a stand; Whereas he that is able to reason nimbly and judiciously about figure, force and motion, is never at rest till he gets over every rub." – ISAAC NEWTON, Letter (25 May 1694) to Nathaniel Hawes.
There seem to be four fundamental forces in the Universe.
Each force "dominates" things at a particular scale. Gravity dominates at the largest scales (planets and galaxies). At small scales the two nuclear forces dominate. And then there's electromagnetism, which seems to dominate everything in between (including, for example, giving rise to light, as well as the electrical activities in your ears, which means it is responsible for the way you see and hear.)
In this book, when I say "force" it'll often be in the context of good ol' classical, or "Newtonian," mechanics. In other contexts, it will be in the electromagnetic domain or the nuclear domain. Regardless of the scale, you should generally think of "force" in this way: Force is the cause of motion and power.
This notion of "force" was conceptualized by Newton. He brought the concept into physics (or rather "experimental philosophy," which is what he would have called his work) to explain motion in terms of it. Using this concept, he developed, among others, the theory that all objects have a gravity. So the essence of Newtonian mechanics is that we analyze the motion of shit in terms of the forces to which shit is subjected by its environment.
We'll talk about many types of forces, including: Pressure, tension, air resistance, friction, normal force, and, of course, good ol' gravity. Some forces, such as gravity and electromagnetism, act on objects without having to physically touch them. We call this type of force "non-contact force."
You are probably in some type of building right now. Picture the internal structure of the building. There are some columns supporting some weight and shit. Or, just picture anything that's under pressure, without any motion. We call this a "static" force. Tension, torsion, and compression are common instances of "static force." So that's an additional categorization of forces: "static" vs. "dynamic."
After you take a shower, and get dressed, etc. and you go and hang your towel somewhere, what do you have to do in order to achieve this? You have to make sure to balance the combination of gravity, friction, etc. so that the towel stays in place and doesn't fall off. You are dealing with static forces there.
Force has a computable meaning, with units and everything: The aptly-named Newton. One famous derivation of it is based on mass and acceleration: F = ma. In programmer speak: force = (mass: number, acceleration: number): number => mass * acceleration. Give me a mass, and an acceleration, and I'll give you the force, by simply multiplying them. This equation expresses "Newton's Second Law of Motion."
Do you notice anything missing in the above program? Better types, obviously. But, specifically: Is force really just a number?
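One possible answer, as a sketch: acceleration is an arrow (see Motion I), so force should be one too. Here's that idea in TypeScript, assuming three dimensions and SI units (kilograms, metres per second squared, newtons). Vec3 and this version of force are made-up names for illustration, and the 9.81 figure is the usual approximate value for gravitational acceleration near Earth's surface.

```typescript
// Illustrative types: Vec3 and force are made-up names, not a library API.
type Vec3 = { x: number; y: number; z: number };

// F = m * a, where mass is a plain number (kg) and acceleration is a vector (m/s^2).
// The result is a vector in newtons, pointing the same way as the acceleration.
const force = (mass: number, acceleration: Vec3): Vec3 => ({
  x: mass * acceleration.x,
  y: mass * acceleration.y,
  z: mass * acceleration.z,
});

// A 2 kg mass accelerating straight down at 9.81 m/s^2 (approximate, assumed value).
const weight = force(2, { x: 0, y: -9.81, z: 0 });
console.log(weight); // { x: 0, y: -19.62, z: 0 }
```

Mass stays a plain number. The force inherits its direction from the acceleration, which is exactly the information a bare number throws away.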
Apparently, modern physicists define the whole of Mechanics in terms of energy, momentum and "action" (the integral, with respect to time, of the Lagrangian: kinetic energy minus potential energy). We may adopt that framework for certain explanations. But we'll never just "leave Newtonian mechanics behind" or something. That's not how builders work. We build shit. If Newtonian models and concepts help us build shit that works, there's no reason to drop them.
Books #
Vibration I #
If vibrations didn't exist, the Universe would be literally dead.
The word "vibration" comes from the Latin vibrationem ("shaking, brandishing"). For our purposes, a vibration is when shit oscillates around its equilibrium point.
For vibrations to exist – or to even conceptualize them – first you need a state of equilibrium. Because, otherwise: What the hell is being disturbed? Disturbed from what? So we have this notion of an equilibrium point. This is when we say a body is "at rest." So that's the idea here: For a disturbance, and therefore a vibration, to exist, there must be an equilibrium point.
Vulgarly, the "equilibrium point" is the state of shit when you don't annoy it.
A vibration must happen somewhere. We call this a medium. This is why in Sound I - Medium I I talk about how sound needs a medium in which to happen. (Spoiler alert: Sound is vibration.)
But it's not just that shit has to "be in a medium."
The medium has behavior #
This is what we need, more specifically: You disturb shit. Shit leaves its equilibrium point. And the medium itself starts restoring the equilibrium. You see the role of a medium now? The way equilibrium is restored depends on the medium. For shit to vibrate, it must be in a medium that has the feature of "wanting" to restore equilibrium, some time after the disturbance.
Hit a guitar string. This is the disturbance. You've displaced shit (part of the string) from its position. You've slightly deformed it, if you will. The medium is now gonna restore the equilibrium, but it's not just gonna immediately clamp right back to rest. Instead, it will start to force it back to equilibrium, as lots of secondary vibrations continue to occur. The bouncing back and forth between equilibrium and non-equilibrium is called an oscillation.
The string is the medium. But there's also the air, another medium. Disturbances propagate from one medium to another, and each medium restores equilibrium in its own peculiar ways.
How to think of disturbances #
Of course, if you hit the string with a baseball bat, massacring the whole guitar in the process, that obviously counts as a "disturbance," but the experiment is over. It's not creating the type of phenomena we're studying, is it? So when I say "disturbance," I mean slight disturbance, or non-party-ruining disturbance. Mostly, think of disturbances as some type of slight position displacement.
Medium and Force #
Have you ever come across this equation: F = -sx?
Like I said, the medium will restore shit to its equilibrium position. This act of "restoring towards the equilibrium position" is a force. I mean, it has to be, right? Because remember what I said, Newtonianly, in Force I: Force is the cause of motion and power.
Here, the medium is producing a motion back towards the equilibrium. So there's a force at work. F is, thus, the force restoring shit back to rest.
What are -s and x, and why are we multiplying them?
x is the position. Yes, a single number to describe the position. Which means we're describing it one-dimensionally. We're reviewing the basics of 1D kinematics.
Simple model of a spring #
So I'm looking at the spring you installed on the lab's ceiling. You hammered one end to the ceiling. And now we can hang shit off the bottom end.
x is the position of the thing we're hanging off the spring. It's the shit's "vertical coordinate" if you will. Neither horizontal nor depth axes are relevant right now. Remember, it's 1D kinematics. x is the displacement from the equilibrium position.
Another way to view x is to say that x is the state of deformation of the spring with respect to its fixed reference configuration, or default state, or resting state, or equilibrium state.
When x is zero, F is obviously zero. Body is at rest. If x is too large, you might deform the spring, and then the experiment is over, and our model breaks, and then I scream "God damn you! God damn you all to hell!"
Not all springs in the world are the same. Springs come in different shapes and sizes. And colors, etc. What's the relevant property of a spring for us to model how it vibrates when we hang shit off it?
Essentially, we care about how stiff it is. That's s in the equation. The spring's stiffness. We're trying to express something really basic. The essence of simple vibration. We'll formally call this simple harmonic oscillation later, when we talk about how small displacements, around any stable equilibrium, make the system act pretty much like a spring.
s is a constant number right now, which places a limit on what x can be. Because if we stretch out the spring too much, i.e. if x gets too large, we may permanently deform the spring, and then its stiffness will change, and whatever constant for s we were using will no longer be correct.
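Here's a minimal sketch of that spring model in TypeScript. The Spring type, its maxDisplacement field (standing in for "how far you can stretch it before s stops being constant"), and the numbers are all assumptions for illustration, not measurements of any real spring.

```typescript
// Illustrative model: names and numbers are made up, not from a library or datasheet.
type Spring = {
  stiffness: number;       // s, in newtons per metre (N/m)
  maxDisplacement: number; // largest |x| in metres for which we trust s to stay constant
};

// F = -s * x, in 1D. x is the displacement from equilibrium, in metres.
// Throws when x is outside the range where the constant-stiffness model holds.
const restoringForce = (spring: Spring, x: number): number => {
  if (Math.abs(x) > spring.maxDisplacement) {
    throw new RangeError("Displacement too large: the model no longer holds.");
  }
  return -spring.stiffness * x;
};

const labSpring: Spring = { stiffness: 200, maxDisplacement: 0.1 };

console.log(restoringForce(labSpring, 0.05));  // -10
console.log(restoringForce(labSpring, -0.05)); // 10
```

Throwing an error past maxDisplacement is one design choice among many. The point is only that the function refuses to answer where the constant s can't be trusted.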
What deformation is, exactly #
By the way, at this point you should be able to answer this question, using what we've seen in this section (and in Force I): What is a deformation? A deformation is a change in shape due to the application of force.
How much can you deform the spring before you ruin it? How much can you deform a bone in your body before you fracture it? It depends on the material. This threshold of maximum stress before fracture is called the ultimate tensile strength (UTS), often shortened to tensile strength (TS).
Back to F = -sx. Take a moment to think about this relation. Imagine different combinations of s and x, and think about what they are expressing.
For example, what's going on when x is 0, i.e. F = -s · 0? Nothing! The restoring force is zero. Shit is in equilibrium. The gravity pulling the mass down and the spring pulling it up are cancelling each other. There is no net force on the mass.
Something to think about #
I leave you with a question, which is the one bit of the equation I haven't explained: Why the negative sign in -sx?
We'll answer this in Vibration II, where we'll delve into the notion of a simple harmonic oscillator. But here's a hint: Remember we're talking about a force that restores the equilibrium point.
Some additional questions:
- How is a guitar string like our spring?
- How about a bow and arrow?
Something to think about, as you listen to my instrumental metal track ESCAPE MECHANICS UNLOCKED!
External resources #
Books #
Life I #
So we have shit, also known as matter, and we have motion. That means now we can have jiggling shit. Presumably, Newton's God created some kind of jiggling soup that exploded and gave rise to this literally insane variety of shit we see (and are) today.
One of the weirdest types of shit is this one that is self-replicating, and constantly self-organizing into more complex forms, by eating each other. I.e. living shit, aka Life. This is just weird, sorry. I just wanted to say that. One of the rules we'll eventually see is that all shit should naturally devolve towards ever more disordered, meaningless shit, aka 2nd Law of Thermodynamics. And yet, we see – and are – this particular form of shit that decided, as an initial primitive simple form, to start ordering itself more and more, consuming energy, getting more complex. Weird. Does the 2nd Law of Thermodynamics conflict with the emergence of life? Let's explore that later.
Back when NASA (a big, clumsy, US government program formed during the dick-measuring contest between the US and Soviet Russia) used to at least be interesting, they sat down one day to agree upon a meaning of life. As in, the technical meaning. I.e.: What exactly will we be looking for, if we go out there to search for living shit on other planets or in other galaxies?
The NASA employees came up with this definition: "Life is a self-sustaining chemical system capable of Darwinian evolution." However, there are plenty of microbiologists, and other people from other disciplines, who define life differently. So there's no definition of life that is universally agreed upon. The most obvious example of this is the drama that arises among scientists with regards to the question: Are viruses a form of life?
The point is: Some shit we call "alive" and requires energy to make itself more complex. Is there a difference between shit and energy? Is energy in shit? Around shit, like a magic aura? What the shit is energy, anyway?
Information I #
Let's not shit on the scientists who fight over the definition of life, though. Cos I'm basically an "everything is information!" guy, but when you ask me for a universal definition of information, I just... shit my pants! So I'll conveniently postpone the "what is information?" philosophical shit for... never. Mark that on your calendar. For now, to give you something, I'll say this: Information is what data becomes after a human interprets it. Shitty definition, I know. But you get the idea: If I find a paper full of Chinese symbols, that shit means nothing to me. It's not information. Data? Sure, maybe. Information, no.
Probability I #
"Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means." – Bertrand Russell, Lecture (1929) (cited in Bell 1945, 587)
"We have to come back to something like ordinary language after all when we want to talk "about" mathematics!" – Sir Harold Jeffreys (1891–1989)
Probability can be confusing because everyone – including myself, but also your school teacher, your textbook, and even your favorite "probability rockstar" in pop-sci circles – talks about probabilities using phrasing that gives many people the wrong idea, even in "technical" contexts.
I have three goals for this introduction:
- Tell you why Probability exists.
- Give you a couple of examples of why loose language can confuse people.
- Clarify what one actually means when one talks about probabilities.
There will be no function graphs, no numbers, no sets, or combinatorics talk in this section. All of those things will enter the picture in the following sections. Though I will introduce some basic symbols (without the numbers) at the very end.
To understand what Probability is, and why it exists, let's first see a case where we don't use it. Consider the following proposition:
"John Petrucci owns a 7-string electric guitar."
For most practical purposes, that proposition is going to be either False or True, and that'll be the end of it. In a computer program, a "boolean" data type would suffice to describe the state of knowledge about it.
But sometimes we need more than True or False. Consider this question:
- Was the UFO reported by the geologist at the Antarctica research station an alien spaceship?
When confronted with such questions, we usually don't have enough information to be able to say either True or False, so we want something more granular, which we can use while in the process of discovering the truth – i.e. the process of arriving at either True or False (at least temporarily). Something to use as we accumulate/improve data from observations, experience, etc.
And that is why Probability exists: To represent, with mathematical and logical rigour, the different stages of what we might call "truth discovery," which in common parlance we express with phrases such as:
- "I'd be very surprised..."
- "I don't know..."
- "I bet $50 bucks..."
...and so on.
Since True and False are not useful enough as values for such purposes, we use numbers between 0 and 1 (never actually 0 or 1, cos they're just the equivalent of good ol' False and True respectively), along with certain operations to engage in "probabilistic" reasoning and deductions. In this sense, Probability is an extension of Logic.
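As a sketch of the difference in data types, here's how that might look in TypeScript. The Probability type and the probability() helper are made up for illustration (they're not a library API), the values are placeholders, and rejecting exactly 0 and 1 simply mirrors the rule stated above.

```typescript
// A plain boolean is enough when we only need True or False.
// (The value here is a placeholder for illustration.)
const petrucciOwnsSevenString: boolean = true;

// Illustrative "branded" number type for values strictly between 0 and 1.
// Probability and probability() are made-up names, not a library API.
type Probability = number & { readonly __brand: "Probability" };

const probability = (p: number): Probability => {
  // Written as a negated range check so that NaN is rejected too.
  if (!(p > 0 && p < 1)) {
    throw new RangeError("A probability must be strictly between 0 and 1.");
  }
  return p as Probability;
};

// Placeholder value, only here to show the type accepting something in range.
const ufoWasAlienSpaceship: Probability = probability(0.2);

console.log(petrucciOwnsSevenString, ufoWasAlienSpaceship);
// probability(1) would throw: at that point you just mean True.
```

The brand is only a compile-time label. At runtime a Probability is still an ordinary number, so the usual arithmetic keeps working.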
We also follow strict rules for calculation, so in the study of Probability there's also a calculus to be learned – in fact, a century ago French treatises would talk about "Calculus of Probabilities" instead of "Probability Theory" as we call it nowadays – which we will study later on.
(The big C, Calculus, will also enter the picture later, but in the previous paragraph I'm talking about Probability Theory having a "calculus" too. A calculus is any set of rules for calculation in some context. E.g. Propositional calculus is the system that specifies how to make inferences in Logic.)
Of course, it's not about pulling arbitrary numbers out of your ass to express your feelings about some guesses you have. I.e. you don't just pick 0.416 out of the blue to express how likely you think it is that the UFO from our Antarctica researcher was an alien spaceship. You must explain why 0.416 and not, say, 0.415. So we will learn that probabilities are derived from data (or, at the very least, from some extremely common, common sense, near-truth/near-false, "for all practical purposes"-type assumptions).
(Spoiler alert: Having methods and algorithms to count things is going to help a lot. Also: There Will Be Fractions. And numerators and denominators in said fractions will be based on counted things. And from such fractions, and operations on them, you'll arrive at specific values such as 0.416.)
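A minimal sketch of the "fractions based on counted things" idea, in TypeScript. The function name and the counts are made up purely to show the arithmetic. They are not real UFO data, and real calculations will involve more than a single division.

```typescript
// Illustrative only: probabilityFromCounts and the counts below are made up.
// Numerator: how many of the counted cases support the proposition.
// Denominator: how many cases were counted in total.
const probabilityFromCounts = (favorable: number, total: number): number => {
  if (!Number.isInteger(favorable) || !Number.isInteger(total)) {
    throw new RangeError("Counts must be whole numbers.");
  }
  if (total <= 0 || favorable < 0 || favorable > total) {
    throw new RangeError("Need 0 <= favorable <= total, with total > 0.");
  }
  return favorable / total;
};

// Hypothetical counts: 52 out of 125 counted cases.
console.log(probabilityFromCounts(52, 125)); // 0.416
```

So 0.416 stops being a number pulled out of the blue and becomes something you can defend: "here's what I counted, and here's the fraction."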
OK that's enough of why we use Probability. Now let's talk about the shit we all say irresponsibly. The loose way of talking about probabilities which can and does confuse others, and even ourselves.
Consider this:
"The object encountered by the geologist has a probability of 1 in 55000 of being an alien spaceship."
Do you see anything peculiar about that proposition?
I know what you're thinking: It is a "probabilistic" proposition. Yes, it's meant to be. But here's what's "peculiar" about it: It doesn't make sense!
Can you see why? Let's see if it gets better if I rephrase it this way:
- "The geologist has a chance of 1 in 55000 of having seen an alien spaceship."
What do you think? Does it make sense now? Let's put the two next to each other:
- "The object encountered by the geologist has a probability of 1 in 55000 of being an alien spaceship."
- "The geologist has a chance of 1 in 55000 of having seen an alien spaceship."
Which one do you think is more accurate?
The right answer is: Neither! Both propositions are examples of the wrong speak I'm talking about. Both make the mistake of talking about some mythical substance. Who "has" the probability? The geologist? The UFO? The answer is neither. Because here's the thing: There is no such thing as a probability.
By which I mean: Nobody "has" a probability. No object does. No event does. Both examples above are trying to say the same idea, but they're being loose with language in the same way: They seem to say that a probability is somehow a property, or attribute, of a person, or some UFO, or event.
Another example:
"There is a 30% probability that Zander Noriega's next song is good."
Well, my next song doesn't even exist, so that "30% probability" certainly can't be a property of it. An entity that doesn't exist can't have any property.
Which leads us to the second lesson in this introduction: Probability is not a property of anything. A probability is a measure of an observer's uncertainty.
That's good news, though. I mean, what would you prefer?
- Probability as a mystical substance somehow reified, floating around, being embedded in objects, people, events, as an imaginary property.
- Probability as a rigorously computed number, reflecting someone's degree of uncertainty with respect to some proposition about the world.
Call me crazy, but I'm glad we got the latter. No mysticism here (though plenty of belief all over the place, as I'll explain next.) But you can see why loose language can confuse people into thinking Probability is a mystical substance. Here's one last example, coming from a guy who constantly reminds everyone that he is a master probabilist:
"...the risk of being killed as a pedestrian is one per 47,000 years."
See his use of "the" risk. Implying there's "the" probability of being killed as a pedestrian. But now you know that that's not a thing: No things have "the" probability of this and that. No cars, no pedestrians, no drivers, or roads. "The" probability is not a property of anything in spacetime.
What he means, or, what you should derive from his babble, is that, according to some data that he (presumably) has reviewed, plus his experience and so on, the quantification of his degree of belief in anyone getting killed by a car as a pedestrian is "1 in 47,000 years." And I'm sure somewhere in the calculation there's some total of street crossings per capita, per year, etc. And some "Exponentials," of course.
But what if this month you are living in a particularly busy and chaotic urban area, and have to cross particularly busy intersections, on your way to work, and it's a month with particularly extreme weather, with slippery pavement and foggy vision? Is "the" probability of getting wrecked still "1 in 47,000"? Is the number provided by Mr. Probability Guru of any use to you?
Probably not. (See what I did there?). I mean, you could take his number, act on it as if it was "data," and/or go around regurgitating it. But there's no such thing as "the" probability of being killed as a pedestrian, or "the" probability of dying in a plane crash, or "the" probability of a UFO encounter.
There's only a calculation you can do, based on some data, leading to a number that will reflect your level of uncertainty. Other people might then just regurgitate your number, as if it was "data," but that's a different thing from your number being "the" probability, as some kind of property of the fabric of the Universe or whatever. Never ever forget: A probability is a number expressing someone's degree of belief in something.
Speaking of, let me finish this introduction with a word on "belief." A probability is either:
- Our degree of belief in a hypothesis H (from "hypothesis"), given some data D.
- Our degree of belief in data D, given our belief in a hypothesis H.
Respectively expressed symbolically:
\[P(H \mid D)\]
\[P(D \mid H)\]
It's all about belief. Yes, belief. Not "facts." Facts are True and Right stuff. Which is handy, but, whether you like it or not, almost all of your actions are based on belief. Aside from mathematical theorems, "facts" are a minuscule part of your life. You don't really know much of anything for a fact.
You have very little idea of what's gonna happen in your day after you wake up (if you wake up). All kinds of substances, physical and mental illnesses mess with your perceptions and memories. Software and hardware bugs confuse your monitoring devices. Not to mention personal biases, fears, life goals, peer pressure, group think, and emotions in general. And that's not even taking into account that you might also just be dumb as a fence post.
Nonetheless, you still build things. And so do I. That's literally all I do, all day, every day: Create things. Engineer things. From music to software to meals. So we still need to at least compare the relative (un)certainties with regards to various possible beliefs. Probabilities encode our ever-changing incompleteness of information, and Probability Theory provides the logic and calculus for operating with them.
Here's a last quote by one of the biggest minds in history:
"The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man's mind." – James Clerk Maxwell (1850)
Let this be your welcome to Probability I.
In the following sections, we'll be looking at examples from various areas from the rest of the book.
Books #
Problem Solving I #
You've probably heard of "divide and conquer." (Shit, you are living under "divide and conquer"!) "Divide and conquer" is a problem solving technique.
Say you have 2 problems: P and Q. But solving P and Q as they are is too hard. So you break them each into easier ("smaller") problems. You could, for example, break P into A and B. And break Q into C and D. Now you have four smaller problems. You solve those, and sum them into the final solution S. I.e. S = solve(A) + solve(B) + solve(C) + solve(D).
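Here's a minimal Python sketch of that recipe. The problem itself (adding up lists of numbers) and the names solve and split are made up for illustration; any problem that can be cut into pieces and recombined would do.

```python
def solve(numbers):
    """Solve one small problem directly: add up a short list."""
    total = 0
    for n in numbers:
        total += n
    return total


def split(problem):
    """Break a problem (a list) into two smaller problems (two halves)."""
    middle = len(problem) // 2
    return problem[:middle], problem[middle:]


# Two problems that we pretend are too hard to solve as they are.
P = [1, 2, 3, 4]
Q = [5, 6, 7, 8]

# Divide.
A, B = split(P)
C, D = split(Q)

# Conquer: solve the four smaller problems and combine the partial results.
S = solve(A) + solve(B) + solve(C) + solve(D)
print(S)  # 36
```

Note that none of the four solve calls needs to know about the others, so they could run at the same time on different machines.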
"Divide and conquer" splits problems into subproblems. (This is the essence of parallel computing.)
Another, less talked about technique, is the "solve a simpler version of the problem" technique (there's no cool name for it.) Like its name (which is not its name) says, it's about simplifying the problem somehow, and then dealing with that.
For example, say you have to fight a mastodon, but it's too large. One way to simplify the problem is to make it tired. Now you don't have to fight a mastodon. Now, you have to fight a tired mastodon. They are not the same problem. The latter is easier (unless you got yourself tired too, but let's pretend you did it right, with teamwork and shit.) An example in math is logarithms. Logarithms were invented in order to transform unwieldy numbers into simpler ones, operate on those simpler numbers, get a solution, and (usually) transform the solution back to the unwieldy form.
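The logarithm trick can be sketched in a few lines of Python. The two numbers are arbitrary, and I'm using base 10 only because that's what the old log tables used; this is an illustration of the idea, not how anyone multiplies today.

```python
import math

a = 123456.0
b = 789012.0

# Transform the unwieldy numbers into simpler ones.
log_a = math.log10(a)
log_b = math.log10(b)

# Operate on the simpler numbers: multiplication becomes addition.
log_product = log_a + log_b

# Transform the solution back to the unwieldy form.
product = 10 ** log_product

print(product)  # very close to a * b, up to rounding
print(a * b)
```

The hard problem (multiplying) got swapped for an easier one (adding), at the cost of two translations.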
Don't get the wrong idea: This is not about "mathematical problem solving," or "thinking like a mathematician" bullshit. Mathematics is a fantasy land where religious ideas, such as the "Real numbers" and "Infinity," are treated seriously. Mathematician "thinkers" buy into those fairy tales, and then, hilariously, find themselves with all kinds of idiotic paradoxes. "The Banach-Tarski paradox! You can get a bigger circle from the smaller circle!" or whatever the fuck. Yeah, no shit: If you believe in infinity, you're one (infinitesimal!) step away from believing you can get something from nothing.
That's not a paradox, that's your fairy tales being incompatible with each other.
So no, "thinking like a mathematician" is not our goal. Let's think computationally instead. Which brings me to: Floating-point arithmetic. The floating-point number system is a system where numbers – get this – actually exist! It's how we get computers to be somewhat precise. Let me give you a primer.
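As a first taste, here's a small Python sketch. It assumes your Python's float is the usual 64-bit floating-point number, which is the case on mainstream platforms. The point is that floats come from a finite collection of values, so every result has to land on a value that actually exists.

```python
import sys

# 0.1, 0.2 and 0.3 are each stored as the nearest float that exists,
# so the sum lands on a neighbor of 0.3, not on 0.3 itself.
a = 0.1 + 0.2
print(a == 0.3)  # False
print(a)         # 0.30000000000000004

# The collection has a largest finite member...
print(sys.float_info.max)

# ...and a fixed gap between 1.0 and the next float after it.
print(sys.float_info.epsilon)
```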
Functions I #
All events in the world are interconnected. Equations and functional relations are how we talk about this with mathematical rigor. And that's why mathematicians and engineers use these things called functions.
There are two main intuitions people use when talking about functions. The mathematical intuition of a function as a "relation between things" (quantities, abstract elements, etc.) The computational intuition is that of function as a box (or machine) that "transforms inputs to outputs." In goes a signal, out goes a "transformed" version of the signal. Like a guitar pedal. Both mathematical and computational intuitions are used as needed in this book.
In science, obtaining functions to represent events and phenomena is key to understanding the world, because a function will then supposedly "tell us" how one thing "depends" on another. We need to be careful though, because as I explain in Models I, having a function does not imply having an understanding of causal relationships.
I have a confession to make: I wasn't always as hot and rich as I am today. I used to be ugly and mediocre, like Galileo. Until one day I decided to start playing metal guitar. So that girls would like me. But the problem was that my guitar riffs were shit. They were as mediocre as Galileo's attempts at science and math before he stumbled upon his magic little toy (which he deliberately kept from his intellectually superior peers as long as he could, so that he would be "first!" to observe and confirm stuff.)
Luckily, just like Galileo, one day I accidentally came across a new technology, before any of my peers! So I ran back home, and used that technology to become amazingly hot, and rich, and historically important. This technology was a box. A box that, mathematically, looked like this: metalzone( ).
As you can see, the box has a hole. So I put my shitty riff x in the hole, like this: metalzone(x), and, wouldn't you know it: The output was an amazing riff. And all of a sudden I was a rockstar. I was the talk of the town. The point of the story is: BOSS MT-2 fo' lyfe!
Another, more useful point to that story is that: A function is a way to express the idea of putting some data (a time value, an audio sample, etc.) in a box that transforms it into some other data (the "output").
Functions help us express our understanding of how certain things work in the world. "The amazing sound depends on the original input in this way: {insert Metalzone pedal diagram}." Specifically, we use functions to model the way something (an output such as length, position, etc.) depends on something else (an input such as time, force, etc.) Thus my opening statement for this section, which I'll repeat:
All events in the world are interconnected. Equations and functional relations are how we talk about this with mathematical rigor.
Inputs and outputs will be some numbers expressing, e.g. weight, pressure, time, etc. Anything can be an input or an output. It depends on what aspect of the world we're trying to model. I.e. What problem we're trying to solve.
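To make the "box" intuition concrete, here's a toy Python version of metalzone(x). To be loud about it: this is a made-up stand-in (boost the signal, then clip it) and has nothing to do with how the real pedal works. The gain value and the sample numbers are assumptions too.

```python
def metalzone(x, gain=10.0):
    """Toy distortion: boost the input sample, then clip it to [-1.0, 1.0]."""
    boosted = gain * x
    return max(-1.0, min(1.0, boosted))


# A "riff" as a short list of audio samples (the input).
riff = [0.0, 0.05, 0.2, -0.03, -0.5]

# Put each sample in the box, get the transformed sample out (the output).
print([metalzone(sample) for sample in riff])
```

Input: an audio sample. Output: another audio sample. The function is the rule connecting them.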
- Here's a metal guitar noise I made (without a Metalzone, cos the story above is a lie, except for the Galileo parts): Zander Noriega - Abort! Abort! (Abomination)
Functions II #
Here's a collection of numbers: 2, 5, 1, 4, 3. Imagine that's all we have. Just those numbers. Nothing else. In fact, don't even call them numbers. Think of them as literally just symbols. As if you were an alien who just arrived on Earth, and has no idea what e.g. "5" means.
(Yet, conveniently for me, this alien knows about equality. Phew!)
(I don't mean "equality," as in "#Equality #JusticeForJussie." I mean the one that isn't idiotic: Mathematical equality, i.e. =.)
Anyway, I have this collection of symbols: 1, 3, 2, 5, 4. Let's call the collection "V." We have nothing else. No arithmetic. Nothing. And I would like to have something, so I'm gonna start by defining a function that specifies the successor for each element. I'm gonna call it next(x), and I'm gonna define it on a case-by-case basis, like this:
next(1) = 2
next(2) = 3
next(3) = 4
next(4) = 5
next(5) = 1
With these definitions in mind, whenever we see an expression such as e.g. next(3), we know we can replace it with 4. Complicated functions are defined using fancy clever shit called algorithms. Simple functions can be defined as just a list of equations matching each input to an output.
Remember what I said in Functions I: Functions are like guitar pedals. E.g. metalzone( ) has a hole in it. You put a riff in it like metalzone(riff) and you get an output. Well, here we have next( ). A box with a hole. If you put "1" in it, i.e. next(1), you will get "2," because I say so (literally, I defined it.)
Notice how I didn't say next(5) = 6. There is no "6" in this world. All we have is the elements in my made-up collection, V. Mathematically, we say that next is a function "in the V domain." Similarly, next(10) or next("hello") are not defined. There is no "10." There is no "hello." Those expressions are, literally, undefined. You are familiar with the idea of operations being undefined in math, e.g. division by zero.
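A case-by-case definition like this maps naturally onto a lookup table. Here's a rough Python sketch; I call the function next_v because Python already has a built-in named next, and raising ValueError is just my choice for saying "undefined."

```python
# next, defined case by case, exactly as in the list of equations.
NEXT = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}


def next_v(x):
    """Replace next(x) with its definition; fail if there isn't one."""
    if x not in NEXT:
        raise ValueError(f"next({x!r}) is undefined in V")
    return NEXT[x]


print(next_v(3))  # 4
print(next_v(5))  # 1

# Anything outside V has no definition to look up.
for bad in (10, "hello"):
    try:
        next_v(bad)
    except ValueError as error:
        print(error)
```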
Why didn't I define the whole function in a single line, more abstractly, like this: next(x) = x + 1? Because there is no + either. What is "+"? I haven't defined it. There is no addition, multiplication, etc. No arithmetic. We're in the world of V. All we have is the symbols "1," "2," "3," "4," and "5." (And now next(x), which I've just defined.) If we want addition, we're gonna have to define it.
Before we do that, though, I want a prev(x) function. You can probably guess the case-by-case definition. It'd be the same thing I did for next(x) but with a different mapping of inputs to outputs. E.g. Defining prev(1) = 5, so as to do the same "looping" but in the "opposite direction," so to speak.
However, could I define prev(x) in a single line, instead of a case-by-case definition? And no, I don't mean by writing prev(x) = x - 1, because, you guessed it: There is no subtraction in this world. At least not yet. What I'm wondering is: Could I use next(x) to define prev(x)?
Let's see:
prev(x) = next(next(next(next(x))))
Voilà! Can you see why that works? Let's try it with e.g. prev(5), replacing things with their definition, starting from the innermost expressions:
- prev(5) is the original expression.
- which becomes next(next(next(next(5)))) (by definition of prev(x).)
- which becomes next(next(next(1))) (by definition of next(5).)
- which becomes next(next(2)) (by definition of next(1).)
- which becomes next(3) (by definition of next(2).)
- which becomes 4 (by definition of next(3).)
Note that when I say "by definition," I literally mean exactly that: Look up the definition of the function, and replace input patterns with their respective outputs. Notice that now that we have prev(x) and next(x), we have some notion of "order" to our collection.
(Funny. A function called prev(x), defined in terms of a function called next(x). Why did this work? Maybe it has something to do with the fact that next(x) "loops," cos I defined next(5) = 1.)
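Since V only has five symbols, we can check the one-line definition against every single one of them. A small Python sketch (next_v and prev_v are my names, chosen to avoid clashing with Python's built-in next):

```python
NEXT = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}


def next_v(x):
    return NEXT[x]


def prev_v(x):
    # prev(x) = next(next(next(next(x))))
    return next_v(next_v(next_v(next_v(x))))


for x in sorted(NEXT):
    print(x, prev_v(x))
    # Going back and then forward (or forward and then back) changes nothing.
    assert next_v(prev_v(x)) == x
    assert prev_v(next_v(x)) == x
```

The looping is the whole trick: five applications of next bring you back to where you started, so four applications land you one step before that.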
OK, so prev(5) computes 4, which is intuitive, I hope. We haven't written any proofs that any of this shit works for all possible expressions. Welcome to the world of the programmer: You write shit that might break later, cos you didn't prove it mathematically, didn't test it on all possible values, etc.
We'll do proof work later. Here I just want you familiarized with the situation of starting from scratch. To have to build everything. Arithmetic operations (addition, multiplication, etc.) are not some built-in "behaviors" coming "out of the box" in the symbols themselves that we call numbers.
Notice how every single step taken consists only and exclusively of replacing an expression with some definition. Computation is the replacement of expressions by their definitions, until there's nothing more to replace, e.g. a symbol like 4. Computation is rigorous search and replace.
Speaking of arithmetic: Now that we have prev(x) and next(x), is it possible to define addition of elements in V? Check this out:
1 + n = next(n)
m + 1 = next(m)
m + n = prev(m) + next(n)
The first two definitions are the obvious ones: "Adding one" to any number x is just another way to express next(x). And the third equation specifies what m + n (for any two symbols that aren't "1") should be replaced with.
In other words, I've defined what it means exactly to use + in an expression. I.e. I'm saying what to do when you see anything of the form x + y in the world of V.
Let's try computing e.g. 3 + 2:
- 3 + 2 is the original expression.
- which becomes prev(3) + next(2) (by definition of m + n)
- which becomes 2 + next(2) (by definition of prev)
- which becomes 2 + 3 (by definition of next)
- which becomes prev(2) + next(3) (by definition of m + n)
- which becomes 1 + next(3) (by definition of prev)
- which becomes 1 + 4 (by definition of next)
- which becomes next(4) (by definition of 1 + n)
- which becomes 5 (by definition of next)
So, 3 + 2 equals 5? Seems legit! Commit and deploy to production!
(Just kidding, don't deploy any functions before mathematically proving them! Abort! Abort!)
(Wait, I had something for this... Oh, right, a metal track! Zander Noriega – Abort! Abort! (Abomination))
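Short of a proof, V is small enough that we can at least run the three addition rules on every pair of symbols. Here's a Python sketch of the rules as a recursive function. The names next_v, prev_v and add are mine, and trying the rules in the order they were listed is my assumption.

```python
NEXT = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}


def next_v(x):
    return NEXT[x]


def prev_v(x):
    return next_v(next_v(next_v(next_v(x))))


def add(m, n):
    if m == 1:                         # 1 + n = next(n)
        return next_v(n)
    if n == 1:                         # m + 1 = next(m)
        return next_v(m)
    return add(prev_v(m), next_v(n))   # m + n = prev(m) + next(n)


print(add(3, 2))  # 5
print(add(4, 3))  # 2

# Print the full addition table for V.
for m in sorted(NEXT):
    print([add(m, n) for n in sorted(NEXT)])
```

Look at that second result: 4 + 3 gives 2, because next loops and there is no "7" in V. Whether that's the addition you wanted is exactly the kind of thing you find out by testing all values before you deploy.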
Functions III #
"The formulation of the equation has a practical importance because the constant relations which it includes are experimentally established and because it is possible to introduce specific known values in the function to determine those unknown. These equations thus lie at the basis of technological designing; they are not only the consummation of the theoretical analysis but also the starting point of practical work." – Ludwig von Mises (1881–1973)
What a legend, that Mises.
Anyway, have you ever had your mind "blown" by some set theory geek telling you about infinite sets shit and whatnot? Well, I hate all that shit. The set theory ivory tower shit. I hate sets. That's my strongest activist stance: Kill all the sets!
However, the mathematical definition of a function (as well as the mathematical definition of everything, it seems) uses the concept of set, so I'm gonna define a couple of finite sets, to work with. First, this one:
\[X := \lbrace a, e, i, o, u \rbrace\]
That is an example of constructing a set by listing, one by one, the elements it contains. In this case, some letters of the alphabet.
Here's another one, which is also finite, but too much of a pain to write down entirely, so I'm gonna use the dots thing:
\[Y := \lbrace 1, 2, 3, \cdots, 98, 99, 100 \rbrace\]
I'm writing the elements of both sets in order, because of my normie human brain, but I should clarify that in set theory a set has no built-in order. You have to define some functions and shit for it to have order.
Anyway, \(Y\) is a set of 100 elements that I don't wanna write down fully. But it's still finite. I'm gonna be lazy and assume that you know that the unwritten elements are 4, 5, 6, and so on. (And again I'm writing them in the order we all know and love, but technically a set is not ordered "by default.")
I'm already sick of talking about sets, though, so let's leave the rest of the set shit for later, or never. We have what we need now. We can now define a function:
\[f : X \rightarrow Y\]
We say that \(f\) is a function from the finite set \(X\) to the finite set \(Y\).
\(f\) is a rule that assigns to each element \(x\) in the finite set \(X\), a unique element of \(Y\) which we call \(f(x)\).
We can define the rule explicitly, as a list of equations, like:
\[f(a) = 1,\] \[f(e) = 2,\] \[f(i) = 3,\] \[f(o) = 4,\] \[f(u) = 5\]
There are more possible functions from \(X\) to \(Y\). We could map all the vowels to 1 for some reason.
We could have a function from X to itself. E.g.:
\[g : X \rightarrow X\]
Where the rule is:
\[g(a) = e,\] \[g(e) = i,\] \[g(i) = o,\] \[g(o) = u,\] \[g(u) = a\]
Note how \(g\) cycles through the elements. Do you see it? How about this, a set \(Z\) and a function \(r\):
\[ Z := \lbrace n, e, s, w \rbrace \]
\[ r : Z \rightarrow Z \]
Where the rule is: \[r(n) = e,\] \[r(e) = s,\] \[r(s) = w,\] \[r(w) = n\]
That is, a set \(Z\), whose members represent the cardinal points North, East, South, and West, and a function \(r\) representing a clockwise rotation. I say it "cycles" through the elements, because after four rotations you get back to the original position. I.e. for every element \(z\) in \(Z\), we have that \(z = r(r(r(r(z))))\). Just think of going with your finger clockwise. After four steps, you're back to the starting point.
Now here's a function that "goes in the other direction":
\[ l : Z \rightarrow Z \]
Where the rule is:
\[l(n) = w,\] \[l(w) = s,\] \[l(s) = e,\] \[l(e) = n\]
Because they "cycle" in opposite directions, we have that for each element z, \(r(l(z)) = z\). In this case we say that the functions "invert" each other.
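Finite sets and case-by-case rules are easy to poke at in code. Here's a small Python sketch of \(Z\), \(r\) and \(l\), using dictionaries as the rules, that checks both claims on every element: four rotations bring you back, and the two functions undo each other.

```python
Z = ["n", "e", "s", "w"]

# r: clockwise rotation. l: counterclockwise rotation.
r = {"n": "e", "e": "s", "s": "w", "w": "n"}
l = {"n": "w", "w": "s", "s": "e", "e": "n"}

for z in Z:
    # After four clockwise rotations you're back where you started.
    assert r[r[r[r[z]]]] == z
    # r undoes l, and l undoes r.
    assert r[l[z]] == z
    assert l[r[z]] == z

print("r and l invert each other on every element of Z")
```

With only four elements, checking every case is the proof.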
So there, now you have examples of:
- Defining a finite set from scratch.
- Defining a function on it.
- Noticing certain properties of functions (invertibility in this case.)
By the way, these "invertible" functions from a set to itself are called permutations. But now I'm venturing into Group Theory, which is some abstract algebra shit that I don't wanna pollute this section with. So let's stop here.
You know what else is a set? The collection of horrible guitar riffs that is my metal track "Escape Mechanics Unlocked"!
External resources #
Books #
Models I #
Just because you have a model, it doesn't mean that you understand causation.
In Functions I, we saw what a function is, and I mentioned that functions are the usual mathematical way to express scientific understanding. I.e. a crucial tool for scientific modelling.
However, having a perfectly valid and mathematically rigorous function doesn't mean that we know what is causing what. This is clear from the mathematical definition of a function: A relation between sets. That's "relation," generally. Not causal relation, specifically. Not all functional relations express causation.
As an example, consider that a text file can be represented as a function \(f: N \rightarrow T\), where \(N\) is a set of natural numbers (representing line numbers, and let's say it's limited to the number of lines in the file), and \(T\) is a set of text strings from the file. This would be a perfectly valid mathematical representation of the file.
Let me make it concrete, by running a little UNIX command to list the headings of the big file on which I'm writing this book (I know, this won't scale forever):
$ grep -irn '^## ' src/book/index.md
It basically tells the computer "show me the lines that begin with '## '." This is (part of) the output:
src/book/index.md:219:## Information I
src/book/index.md:223:## Problem Solving I
src/book/index.md:241:## Functions I
src/book/index.md:265:## Functions II
src/book/index.md:344:## Functions III
src/book/index.md:441:## Models I
src/book/index.md:477:## Differential Equations I
There's your function. It relates a (line) number, to some string of text.
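The same idea as a Python sketch, with a tiny made-up "file" held in a string so the example runs anywhere. The dictionary f plays the role of \(f: N \rightarrow T\), and the filter is only a rough equivalent of what the grep command does.

```python
# A made-up stand-in for the book's source file.
text = """# My Book
## Functions I
Some words.
## Models I
More words.
"""

# f maps a line number (starting at 1) to the text on that line.
f = {number: line for number, line in enumerate(text.splitlines(), start=1)}

# Keep only the lines that begin with '## ', along with their numbers.
headings = {n: line for n, line in f.items() if line.startswith("## ")}
print(headings)  # {2: '## Functions I', 4: '## Models I'}
```

Perfectly valid function. And nothing about the number 2 makes the line say "Functions I."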
But does it mean that each line number causes the content it is mathematically related to? Of course not. Trust me, I wrote the book. And I can tell you that it wasn't being on line 477 that somehow compelled me to type '## Differential Equations I'.
The main takeaway is that a mathematical function can be helping you model the real world (e.g. the contents of a book), in a perfectly valid and rigorous way, and yet not tell you a single thing about cause and effect.
For another example, consider that you could define this function:
\[ y : A \rightarrow Y \]
Where \(A\) is the set of Cannibal Corpse album names, and \(Y\) is a set of years when Cannibal Corpse released albums. The definition would be something like:
\[y(Vile) = 1996\] \[y(Kill) = 2006\] \[y(Torture) = 2012\]
...and so on and so forth.
It's a perfectly legit function. It correctly models an aspect of the world. But it obviously doesn't say anything about causality. The year 2006 wasn't the year 2006 because Cannibal Corpse released the album "Kill." As much as I love Cannibal Corpse, I have to accept that it was the Earth's orbit around the Sun that caused the calendar year to go up from 2005 to 2006!
These examples should also reveal why the more computational intuition, or metaphor, for functions, as "a box transforming the input to the output" kinda totally breaks too. Was the Cannibal Corpse album "Kill" somehow "transformed" into the year 2006, in the real world? No. So always be careful with metaphors. Metaphors are not reality. They are, at best, temporarily helpful mind tricks. Don't get married to metaphors.
Now, you may think that this lack of causality, this "valid model that doesn't tell us anything about causes," is exclusive to the type of function I've defined so far: The simple, manual mapping from inputs to outputs. But no, later we will see functions that are defined by mathematical formulas, but also don't say anything about causality.
Bottom line: There is no reason to assume a causal relationship between the input and output of a function.
You will hear people talk about "toy models," "analytical models," "computational models," etc. What types of models are there? This will be the topic in Models II and III.
For now, take a break. And listen to my death metal track Escape Mechanics Unlocked!
P.S. here's a more complicated
grep -irh '^###\? ' src/book/index.md |
grep -iv 'Books\|Resources' |
sed -E 's/^## /1. /' |
sed -E 's/^### / 1. /'
grep -h '^###\? ' src/book/index.md |
grep -iv 'Books\|Resources' |
sed 's/^## /1. /' |
sed 's/^### / 1. /'
Which achieves the same, but with three commands instead of one. Explanation line by line:
grep -irh
to:- Iterate th
Books #
Differential Equations I #
Fuck function graphs.
Instead of entering the Calculus headquarters through the front door, we're gonna break in through the kitchen window, at night, while the fat watchman naps.
I think I mentioned in Functions II that sometimes I prefer to talk about some "function of time" by showing you, not the cartesian plane, nor the "algorithm," but rather just a limited list of "outputs." That's what I'm gonna be doing here, to get us started.
So let's start with this data sequence. I'm gonna call it f (yes, later f will be a function, that's why I'm using that letter for the name, but right now it literally is just a plain old list of numbers. And you only know they're "outputs," cos I just told you.)
f = [1, 2, 3, 4, 5]
That looks like an equation, though. Because of the equals sign =. But I really just wanna give a name to the sequence. So let me use the symbol := instead, which means I'm just providing a definition:
f := [1, 2, 3, 4, 5]
Imagine these are the results we get from repeatedly asking for the position of a robot. I.e. the outputs of some function. I.e. data.
Assume some unit of distance for the type of each value. Say, meters. As in, meters from some origin point.
Just looking at that, what can you tell me about the robot? We don't know color, size, or anything other than the data we have. And the one thing we can say is that it's clearly moving. That is literally what it means to have a position change: It's moving.
The rest of our conversation is gonna be about this: Asking questions about data. That's all.
I'm gonna rename f to pos, because we're really talking about the "position from the starting point":
pos := [1, 2, 3, 4, 5]
Yes, I'm describing a position in one dimension. I don't care about 3D space. I'm thinking of the moving robot as a point along a line. One axis, if you will. Moving left to right, from my perspective.
(Or moving vertically, if you prefer. I.e. like a rocket. I'm ultra simplifying things to you, but I do expect you to be able to carry multiple analogies at once. Make up some yourself. Re-read sections with your analogies in mind.)
OK so now I'm gonna show you another sequence, based on analyzing pos:
f' := [1, 1, 1, 1, 1]
Kind of a boring sequence, but this is what's important, in case you didn't catch it the first time I said it: f' is based on pos.
So can you guess what my analysis was? What question did I ask about pos, such that f' is the answer?
I asked the question: "How much does each output change compared to the previous one?"
I'm gonna rename f' to vel, because it's really telling me how much the position is changing each time, i.e. How fast it's going, i.e. the robot's velocity:
vel := [1, 1, 1, 1, 1]
OK so the robot is going at fixed 1 meter per... Whatever the unit of time is. I already forgot. See? That's the problem with not being specific about types. Anyway, it's seconds. Let's pretend we're looking at data that was sampled per second. I.e. a robot whose position is reported every second. Or hour. Hell, century, if it makes it more interesting to you. Maybe it's not a robot, but a tectonic plate moving North.
In short: vel tells us how fast pos is changing.
Anyway, here's another sequence, that I get from asking a question about vel:
f'' := [0, 0, 0, 0, 0]
Can you guess which question I asked now?
That's right, same question again: "How much does each output differ from the previous one?" I'm just asking it about vel this time.
And, as you can see, the answer is a bunch of zeros. I.e. nothing. Nada. I.e. the velocity is not changing. Always 1.
I'm gonna rename f'' to acc, because that's what we're talking about now: Acceleration.
acc := [0, 0, 0, 0, 0]
In short: acc tells us how fast vel is changing.
The first takeaways here are:
- We built each sequence from analyzing the previous one.
- All analyses are essentially the same question: What's the rate of change?
- i.e. vel tells us the rate of change of pos.
- i.e. acc tells us the rate of change of vel.
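The "same question, asked twice" can be written as one small Python function. One thing I have to be upfront about: the sequences above keep five entries each, which means assuming some value before the first sample. Here I assume the robot started at the origin (position 0) and was already moving at 1 meter per second. Both are my assumptions, as are the names changes and before_first.

```python
def changes(sequence, before_first):
    """How much does each output change compared to the previous one?"""
    result = []
    previous = before_first
    for value in sequence:
        result.append(value - previous)
        previous = value
    return result


pos = [1, 2, 3, 4, 5]

# Assumed: the robot was at the origin (0 meters) before the first sample.
vel = changes(pos, before_first=0)

# Assumed: the robot was already going 1 m/s before the first sample.
acc = changes(vel, before_first=1)

print(vel)  # [1, 1, 1, 1, 1]
print(acc)  # [0, 0, 0, 0, 0]
```

Without those assumptions, each derived sequence would simply be one entry shorter than the one it came from.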
One way to categorize the sequences that we derived from pos is by using the terminology of "order": We call vel "first order," acc "second order," and so on, if we kept deriving sequences.
In proper math we'd say "first order derivative," and so on. f' would then be the proverbial "first derivative of f", being equally generated: f' = [f'(t), f'(t+1), ...]. But we're in my "data sequence world" right now.
We stopped at acc because we're not gonna derive anything from acc, since we can see that it doesn't change at all. Bunch of zeros. Whoever controls the robot, with a joystick or something, is keeping the joystick at a fixed position the whole time during our observation (perfect pulse? had it all the way forward? who knows).
Let's introduce some math now.
Remember this equation from Force I?
\[F = {m}\times{a}\]
It's the equation that relates force, mass and acceleration, aka Newton's 2nd Law of Motion. The variable a in there is the acceleration. And as we know now, as a function of time, acceleration is the "second derivative" of position, or the "first derivative" of velocity. Let's define it as the latter:
\[a := \dfrac{dv}{dt}\]
There's more than one "dialect" in math to express this, but for now we're gonna go with this one.
Now let's rewrite the 2nd Law, using our definition of a as the first derivative of velocity.
\[F(t,v) = {m}\times\dfrac{dv}{dt}\]
Now we're looking at an equation that contains a derivative. We call it a Differential Equation, because it contains a derivative.
You can see that I made time explicit too, on the left side. More on this later.
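Back in "data sequence world," this equation can be read as a recipe: rearrange it to dv/dt = F(t, v) / m, and that tells you how much the velocity changes per unit of time. Here's a rough Python sketch that builds a velocity sequence that way, one fixed time step at a time (this stepping scheme is commonly called the forward Euler method). The function names, the forces, and all the numbers are made up for illustration.

```python
def velocities(force, mass, v0, dt, steps):
    """Build a velocity sequence from F(t, v) = m * dv/dt, step by step."""
    t, v = 0.0, v0
    sequence = [v]
    for _ in range(steps):
        dv_dt = force(t, v) / mass  # rearranged: dv/dt = F(t, v) / m
        v = v + dv_dt * dt          # change in velocity over one time step
        t = t + dt
        sequence.append(v)
    return sequence


def no_force(t, v):
    return 0.0


def constant_push(t, v):
    return 2.0


# No force: the velocity never changes, like our robot going at a fixed 1.
print(velocities(no_force, mass=1.0, v0=1.0, dt=1.0, steps=4))
# [1.0, 1.0, 1.0, 1.0, 1.0]

# Constant force: the velocity grows by the same amount every step.
print(velocities(constant_push, mass=1.0, v0=0.0, dt=0.5, steps=4))
# [0.0, 1.0, 2.0, 3.0, 4.0]
```

Same game as before, just run in the other direction: instead of deriving the changes from the data, we generate the data from a rule about the changes.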
Anyway, let this be your welcome to Differential Calculus. It's not some "blow your mind" thing. It's a "deriving data from other data" thing.
Now go take a break. Listen to some of my "music": Escape Mechanics Unlocked.
External resources #
Books #
Sound I #
"In Space, no one can hear you scream." – ALIEN (1979), movie tagline.
This is one of those "facts" that people repeat all the time: There is no sound in space. Which is fine. It's true enough. The problem is that people will often "explain" it further, by saying it's because "sound needs an atmosphere in which to travel," which is wrong.
So let's start there: "There's no sound in Space." Yes, correct. "There is sound here on Earth." Also correct, obviously. And, using your powers of deduction, you can conclude: Therefore, sound requires... something, in order to exist.
So far so good.
Before we go on, though, let me set a rule: We do not say that sound "travels." Death metal screamers: I don't want you to think of your screams "traveling" from your mouth to the mic. There is no such traveling entity, or agent. There is no "scream" that is "traveling" from your mouth to the mic.
If you think sound "travels through the air," you're gonna be a shitty singer, engineer, and you will never understand anything. Your life will be that of failure. Do not think of sounds as things "traveling through the air."
Why? I'll explain in the following sections.
Before I close this introduction, though, let me leave you a legit definition of sound, from a badass engineer:
"Sound is an alteration in pressure, particle displacement, or particle velocity which is propagated in an elastic medium, or the superposition of such propagated alterations. Sound is also the auditory sensation produced through the ear by the alterations described above."
– HARRY F. OLSON (1901–1982), "Music, Physics and Engineering" (1952).
Olson was a pioneering engineer extraordinaire. Born in America to Swedish immigrants, he was engineering all kinds of things from an early age: Building and flying model airplanes, building a steam engine, inventing a wood-fired-boiler-to-100-volt-DC-generator system, designing and building amateur radio transmitters, and so on and so forth.
Yeah, one of those guys.
He then went on to study electrical engineering, and to work for RCA Laboratories for almost 40 years, where he made lots of inventions as evidenced (at least partially) by the long list of patents to his name, involving microphones, loudspeakers, amplifiers, noise reduction systems, music synthesizers, sound absorbers, etc.
He also took the time to write some classic sound physics and engineering books, one of which ("Acoustical Engineering" (1957)) was used as a bible by The Grateful Dead's early sound engineering crew, for designing the whole "Wall of Sound" thing.
His definition of sound will make more sense once you read through the next sections.
Let this be your welcome to Sound I.
Sound I - Medium I #
"To a great extent the theory of Sound, as commonly understood, covers the same ground as the theory of Vibrations in general [...] As a general rule we shall confine ourselves to those classes of vibrations for which our ears afford a ready made and wonderfully sensitive instrument of investigation." – JOHN WILLIAM STRUTT, BARON RAYLEIGH, The Theory of Sound (1877).
I opened Sound I with the Alien tagline, but let's switch to Star Wars for a second, because I have an anecdote for context.
The other day I came across some comment on an Internet sci-fi forum thing, where someone was saying that Star Wars is "soft sci-fi." That it is "more fantasy than sci-fi."
Impressive. Noticing that SW is not scientifically accurate. How did Mr. big brains arrive at such insight? Was it the people vanishing out of their clothes, when getting killed by a 900-year-old little green monkey with a sword made of light?
No. He says the reason Star Wars movies are "more fantasy than sci-fi" is that they have "sounds in Space" during the battles. "In real life, you wouldn't hear those explosions," he says. Because there is no sound in space. Because "sound needs an atmosphere in which to travel."
So that's how whales and dolphins communicate sonically, is it? By shooting their whistles up into the air, so they can "travel" across the atmosphere?
No. Dolphins and whales rely on sound, but it's all underwater. It's not that sound "needs an atmosphere," it's that sound needs a medium.
Back to Alien's tagline, let's play out the hypothetical situation:
The two of us are in Space. Well, three of us, counting the xenomorph. The xenomorph bites your leg or something. You scream at me, for some reason. Always blaming me for everything. Your vocal cords and shit vibrate normally, as usual. I.e. all the mechanics of your screaming are working as expected. But those vibrations in your body are not disturbing anything else, outside of you.
In this "vacuum of Space" there are no water molecules, no air molecules, nothing that vibrates the way we need for sound. "In Space, no one can hear you scream," because the vacuum of space is not a proper medium for propagating the vibrations we perceive as sound.
One way to visualize what I call a "sound-friendly medium" is as a collection of particles. Particles that can vibrate. It could be the ocean. Or some atmosphere. Or air in a helmet. The particles have some mass, so you can disturb them, and they can disturb each other. And by "disturb them" I mean make them vibrate. And by "vibrate" I mean displace them, back and forth, from their default equilibrium position.
The short slogan is: Vibration makes sound. A more complete slogan would be: Vibration makes sound if you're in a medium that propagates vibrations interpretable as sound.
Why do I say "interpretable as sound"?
Do I mean that not all vibrations can be interpreted as sound by our brains?
Sound I - Frequency I #
A thunder in the distance.
A bird's whistle.
The former is a low frequency. The latter is a high frequency.
Grab a piece of metal. I don't mean like a part of a Cannibal Corpse song, I mean an object. A metallic bar. And hit it. I assume you're not in Space, so you're in a vibrating system. Ideally, the thing you're hitting is a tuning fork. That's a metal tine that goes "diiiing!" when you hit it, generating a simple vibration, which we perceive as a simple sound. The rate at which it vibrates, we call frequency. The unit is hertz (Hz). E.g. If it vibrates 440 times per second, we say it has a frequency of 440 Hz.
Why did I specifically say "simple" vibration, "perceived" as a "simple sound"?
Because, in real life, when you hit something, it will not vibrate in one single back and forth motion. We could more accurately say that, when you hit the tine, 440 Hz is the frequency of the main, or "most prominent," or fundamental, vibration, among other "secondary" ones. But we'll delve into that complexity later. For now, let's say this: Theoretically, the simplest sound, in terms of frequency, is a "sine wave," which corresponds to a "pure tone," acoustically. Hitting a tuning fork gets close to that.
So hitting a tuning fork causes a simple, periodic, "back and forth" type of vibration, which translates to a "pure tone" acoustically. And by the way, the opposite of "pure tone" is "complex tone," not "impure tone"!
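To make the "pure tone" idea concrete, here's a minimal Python sketch that generates the samples of a sine wave at the 440 Hz from the tuning fork example. The function name `pure_tone` and the sample rate of 8000 samples per second are arbitrary choices for illustration, and a real tuning fork is only approximately this simple.

```python
import math

def pure_tone(frequency_hz, duration_s, sample_rate=8000):
    """Return samples of a sine wave, each between -1 and 1."""
    n_samples = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

# One second of an idealized 440 Hz tuning fork.
samples = pure_tone(440, 1.0)

# The period is how long one full back-and-forth cycle takes.
period_s = 1 / 440
print(f"{len(samples)} samples, one cycle every {period_s * 1000:.2f} ms")
```

Each sample is the displacement from the equilibrium position at one instant. The higher the frequency, the shorter the period.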
How can we cause a complex vibration, then? Try grabbing a drum stick and hitting a crash cymbal (preferably as your drummer is playing, to annoy him.) See how the cymbal is vibrating chaotically? That complex and chaotic collection of vibrations gives it its noise-like, waterfall-like, quality.
Go back to the tuning fork. It's called a "diapasón" in Spanish. I will refer to it in both ways, to keep you on your toes. In the past, musicians used this thing as a reference to tune their instruments to. Nowadays we use better technologies. But we all should have one, if only for experiments and teaching. Also, tuning fork-type devices are still used as part of some musical instruments. I'll give an example of that in the next section.
We'll delve into hearing later, in "Sound I - Anatomy I," but let's ask this question already: Can our ears sense all frequencies?
No. The range of frequencies our ears perceive as sound is limited. E.g. we can't hear some frequencies that other animals can. Also, this range shrinks as we grow older, not so much due to age, but rather due to living in noisy cities, listening to loud shit on headphones, going to wars, dating militant feminists, etc. I.e. We put our ears through constant torture.
But, injuries and mutations aside, humans can hear in the frequency range from about 20 Hz to about 20,000 Hz.
Stop hitting the diapasón. I have a history question:
How are these two things related?
- World War II aircraft.
- A certain classic, mellow 1970s piano sound.
The next section has the answer to this thrilling mystery!
External resources #
Books #
Sound I - Pitch I #
Vibration is what happens mechanically. What emerges acoustically we call a pitch. The pitch is what the singer, or musician in general, cares about. E.g. A basic wave of 440Hz from a vibrating tuning fork, we will perceive acoustically as a pitch, which we call "A."
Back to the question: How are World War II aircraft historically related to a classic, beloved mellow 1970s piano sound?
I'm sure you've heard that mellow 1970s sound. It's a peculiar type of piano. A piano that was designed by an American piano teacher and inventor, who would make miniature pianos from scrapped airplanes during WW2, and would craft his piano lessons as a form of therapy to soldiers. His name was Harold Rhodes.
Rhodes pianos sound super smooth, and don't go out of tune. This is because, unless you're a real son of a bitch, and really try hard to deform it, a metal tine will never go out of tune. And, unlike traditional pianos, whose keys are hammers that hit strings, the Rhodes' keys are hammers that hit metal tines. Inside a Rhodes piano you have basically a bunch of tuning forks, of the right shape and mass, to vibrate at specific frequencies, producing the right pitches for making music.
Sound I - Noise I #
Go back to the Rhodes piano, and press one key. As I said, each key is a hammer that hits a specific metal tine, and each tine vibrates at a specific frequency to get a specific musical pitch. So you've pressed a single key. We're now hearing a pitch. There is regularity to the vibration generated by one note, which translates to regularity, or periodicity, of the vibrations of air around your ears.
(The musician may say "note" instead of just "pitch" when he talks about more aspects of the sound, such as duration, timbre, etc. That's for a separate conversation. See Music I)
Now, what happens if you press several keys at once?
E.g. Press the first, third, and fifth white keys, left to right. What are we hearing now? It's an addition of vibrations, obviously. (We've actually been hearing an "addition of vibrations" all along, but I won't get into that yet.) This is more complex sonic information to our brains. We can call it a "chord." But what happens if you press all the piano keys at once?
Use both arms or something. And a leg, maybe. But be careful. Press all the keys at once. What are we hearing now? This is much more information. In fact, it's so much "information," that it's not information at all. We're gonna call this noise. Musical tones have an order to them, whereas noise is basically the addition of so many irregular vibrations that our brain treats them as non-information.
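The "addition of vibrations" idea can be sketched in a few lines of Python: a chord is the sample-by-sample sum of the individual vibrations. This is a simplified sketch that treats each key as a single sine wave. The function name `mix` and the three frequencies are hypothetical, picked only for illustration.

```python
import math

def mix(frequencies_hz, n_samples=80, sample_rate=8000):
    """Add one sine wave per frequency, sample by sample."""
    return [sum(math.sin(2 * math.pi * f * n / sample_rate) for f in frequencies_hz)
            for n in range(n_samples)]

# Hypothetical frequencies, chosen only for illustration.
note_a = mix([440.0])
note_b = mix([550.0])
note_c = mix([660.0])

# Pressing the three keys at once: the vibrations simply add up.
chord = mix([440.0, 550.0, 660.0])
```

Keep adding more and more unrelated frequencies to the list and the summed signal loses the regularity you can still see in a single note.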
Now get off my Rhodes, you fucking animal.
Sound I - Checkpoint #
So far we've gone over:
- Medium and vibrations.
- Frequency and pitch.
- Noise.
What's another fundamental property of sound, that I haven't talked about yet?
Sound engineers, look at your console...
Sound I - Level I #
"So violent [...] that the ear-drums of over half my crew have been shattered. My last thoughts are with my dear wife. I am convinced that the Day of Judgement has come." – CAPTAIN'S LOG, BRITISH SHIP NORHAM CASTLE (1883)
What the hell was going on there? You'll find the event in question on a table below. But first, let's learn about sound intensities.
Vibrations can be subtle, strong, and every intensity in between. That's what makes sound have different levels, which we measure in decibels (dB). The dB scale is logarithmic, because it makes the mathematics of sound better express human perception, which, in turn, makes engineering easier. Scratch that: It makes sound engineering possible. So we measure sound intensity in decibels (dB) so we can engineer cool stuff.
I mentioned in Problem Solving I that logarithms are simpler numbers that better serve how we perceive and use some things. The dB scale is a perfect example of this.
If your mixing console used faders made with a plain linear resistance, it would work like this: The top 4/5 of the fader's range would change little of the sound loudness. All the loudness variation would be in the bottom 1/5. In other words, it'd be stupid. A non-logarithmic fader on your mixing console would be wasteful and impractical.
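Here's a rough Python sketch of that fader problem. It assumes a "plain linear" fader is one whose gain is proportional to its position, and it compares it against a hypothetical fader that spreads a 60 dB range evenly along its travel. Both function names and the -60 dB figure are assumptions for illustration, not a description of any real console.

```python
import math

def linear_fader_db(position):
    """Level in dB when gain is proportional to position (0 < position <= 1)."""
    return 20 * math.log10(position)

def log_fader_db(position, min_db=-60.0):
    """Level in dB when the range min_db..0 dB is spread evenly along the travel."""
    return min_db - min_db * position

for position in (1.0, 0.8, 0.6, 0.4, 0.2):
    print(f"{position:.1f}   linear: {linear_fader_db(position):6.1f} dB"
          f"   log: {log_fader_db(position):6.1f} dB")
```

With the linear fader, the whole trip from the top down to the 1/5 mark only covers about 14 dB, and everything else is crammed into the bottom fifth.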
The pressure of the loudest known sound is more than one billion times the pressure of the faintest sound. Now ask an engineer to design you a usable measurement tool for that range. Is he gonna build you a 1 kilometer long fader or something? Or a normal size fader that's super oversensitive? A logarithmic range of 0 dB to 200 dB is more practical than a linear range of 0.00002 pascal to 200000 pascal.
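The conversion between pascals and decibels can be sketched like this, using the standard definition of sound pressure level with 0.00002 Pa as the 0 dB reference. The function names are made up for the example.

```python
import math

P_REF_PA = 0.00002  # pressure at 0 dB, in pascals (20 micropascals)

def pressure_to_db(pressure_pa):
    """Sound pressure level in dB for a pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

def db_to_pressure(level_db):
    """Pressure in pascals for a sound pressure level in dB."""
    return P_REF_PA * 10 ** (level_db / 20)

print(round(pressure_to_db(2.0)))    # level of a 2 Pa sound
print(round(db_to_pressure(140.0)))  # pressure of a 140 dB sound
```

Every extra 20 dB multiplies the pressure by 10, which is how a few hundred dB can cover a range that needs ten or more digits in pascals.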
There are a few variations of the dB scale. E.g. The dBA scale is like the dB, except adapted to account for the different reactions our ears have to different frequencies. A 100 Hz tone at 100 dB has a certain loudness to our brain, equal to the loudness of a 1000 Hz tone... at 80 dB. Our ears hear some frequencies more than others. The dBA scale is "A" weighted, using some curves that approximate human hearing.
But let's leave further talk of the dB to more advanced sections.
Just like with frequency range, we also have a minimum and maximum of intensity that we can handle. Very low intensity vibrations are just silent to us, obviously. Too strong vibrations can harm, and even permanently damage, our ear drums. 0 dB is about the lowest we can hear, and 120 dB is very loud, and also the point at which our ears start to hurt (which is why 120 dB is called the "threshold of pain.")
Here are the levels for some familiar (and not so familiar) sounds:
Sound | Sound Pressure Level | Sound Pressure |
---|---|---|
Quiet woods | 15 dB | 0.0001 Pa |
Bedroom | 20 dB | 0.0002 Pa |
Library | 38 dB | 0.0016 Pa |
Conversation | 58 dB | 0.016 Pa |
Normal traffic | 80 dB | 0.2 Pa |
Blender | 88 dB | 0.5 Pa |
Pneumatic hammer | 100 dB | 2 Pa |
Rock show | 110 dB | 6 Pa |
Firecrackers | 125 dB | 36 Pa |
Airplane take-off (from 25m) | 140 dB | 200 Pa |
1883 eruption of Krakatoa (from 160km) | 172 dB | 7962 Pa |
Saturn V rocket | 204 dB | 316978 Pa |
(Note how the logarithmic dB scale makes numbers more manageable.)
WARNING: As the table suggests, the relative perceived intensity of sounds works like this: A sound already carries twice the acoustic intensity of another when it's just 3 dB higher, and your hearing mechanism perceives it as roughly "twice as loud" when it's about 10 dB higher. The "Conversation" at 58 dB is much, much louder than the "Library" at 38 dB, even though the number 58 is not even twice 38.
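To see why you can't compare dB numbers like ordinary numbers, here's a small Python sketch that turns a difference in dB into a ratio, using the standard decibel definitions. The function names are just for illustration.

```python
def pressure_ratio(db_difference):
    """How many times larger the sound pressure is."""
    return 10 ** (db_difference / 20)

def intensity_ratio(db_difference):
    """How many times larger the acoustic intensity (power per area) is."""
    return 10 ** (db_difference / 10)

difference_db = 58 - 38  # "Conversation" minus "Library"
print(pressure_ratio(difference_db))   # ratio of the two pressures
print(intensity_ratio(difference_db))  # ratio of the two intensities
```

The pressure ratio matches the table: 0.016 Pa is ten times 0.0016 Pa. Note that these are physical ratios. How loud something feels is a perceptual matter and doesn't scale in the same way.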
The smallest change in sound level that we are able to detect is around 1 dB. This is called the "just-noticeable difference" (JND) in psychophysics.
Finally, you can now logically deduce the answer to this one: What is silence?
Silence is what happens when the medium's pressure (usually the atmosphere) around your ears isn't being changed by anything intensely enough to disturb your ear drums.
External resources #
- Speaking of silence, here's an old track of mine! Zander Noriega – Enjoy the Silence
- (Text/HTML) ANSI/ASA S1.1 & S3.20 Standard Acoustical & Bioacoustical Terminology Database
Books #
Sound I - Waves I #
When I say "wave" in this section, try not to think of the rings that form when you throw a stone into calm water.
Instead, imagine a sphere made of many sub-spheres. I.e. Layers, or levels, like an onion. Think of each layer as either condensed, or rarefied air. Those are, respectively, the crests and troughs of a sound wave. Because when you disturb a particle in the air, it will disturb its neighbors above, below, etc. A spherical propagation of vibrations.
Say you're a metal singer, and you're thinking "I'm sending my powerful vox from my mouth to the mic!" while recording. No. This is what you must think instead:
I'm vibrating every bone in my skull and torso. Shaping my mouth to direct some of the energy. The vibrations are propagating spherically. This energy sphere is expanding and colliding with the walls, floor, and ceiling, etc. Now they are vibrating. Creating additional energy spheres, which overlap with my original sphere (which I'm still generating, as I sustain my scream). A sum of all this is being caught by the mic.
Your "voice" is the sum of the whole atmosphere vibrating, because of your whole body and the room's acoustic properties. This is why body posture, mic position, and room acoustics, all matter.
A powerful, monster, beast scream, is not something that sounds like it's right next to your ear. Any child can get up close and disturb your ears greatly. A fucking mosquito can make itself heard when near enough. (Clarification, though: The mosquito's sound is from its little wings flapping super fast. That's the buzzing sound.) A beast is a thing whose roars shake the whole fucking room. So room sound in metal vocals is often key.
However, too much reverberation makes things sound "distant," and a "beast in the distance" is usually less of a threat! Keep that in mind too.
So how do you make your metal (or video game, or movie) monster scream to sound threatening? Gotta find that sweet spot: Clear and dry enough for the brain to go "Oh shit, the beast is near me!" yet with a room sound (either real or manufactured in the mix) to make the brain go "Oh shit and this beast is powerful cos it's making the room vibrate."
Recording studio acoustics is not just about slapping some shit on the walls to "kill reflections" or "isolate." Proper room acoustics must take into account the artistic necessities of the types of instruments (including vocals) meant to be recorded in it.
Try setting up a mic on a snare. Place it near the snare, pointing at it. Record it. How does it sound? Like shit. Go back and tinker with the placement. It doesn't fucking matter: The close snare mic always sounds like shit, no matter what.
This is why room mics, overhead mics, "mic leak," and/or artificial reverberation, are always added to the close snare sound in the mix. To recreate the complex natural signal. So don't panic, recording engineer: Close snare mic sounds like shit. You always have to "fix it in the mix."
But I digress. Acoustics and sound engineering should be separate sections.
Enough about screaming monsters. Let's go back to the tuning fork.
Strike the tuning fork again. It's vibrating now, and you're in a sound-friendly medium, so the disturbed tine disturbs the particles in the medium (atmosphere in this case), and the medium lets particles disturb their neighbors. This "chain of disturbances" has a certain order, or pattern, that we call a "wave." Sound waves are compressional waves. They are made of compressions and decompressions of the medium.
How fast sound waves propagate depends on the medium where the propagation is taking place. In seawater, sound waves propagate at about 1500 meters per second. That's like 15 soccer fields end-to-end per second. In the air, sound propagates at about 340 meters per second, much slower.
Sound waves travel faster in water even though water is denser than air. There are about 800 times more particles in a bottle of water than there are in the same bottle filled with air, and they're packed so tightly that water barely compresses, which means each particle passes the disturbance on to its neighbors almost immediately.
The first accurate experiments to measure the speed of sound were done by members of the French Academy in 1738. They fired cannons, and observed the retardation of the reports at different distances. Temperature and wind affect everything, but back then they nailed it down to 337 meters per second, for still, dry air at 0 degrees Celsius.
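Using the approximate speeds quoted above (about 340 m/s in air and 1500 m/s in seawater), here's a quick Python sketch of how long a sound takes to cover a given distance. It ignores temperature, wind, salinity and everything else that shifts these numbers, and the names are made up for the example.

```python
# Approximate speeds of sound, in meters per second.
SPEED_OF_SOUND_M_PER_S = {"air": 340.0, "seawater": 1500.0}

def travel_time_s(distance_m, medium):
    """Seconds a sound wave needs to cover distance_m in the given medium."""
    return distance_m / SPEED_OF_SOUND_M_PER_S[medium]

for medium in ("air", "seawater"):
    print(f"1 km through {medium}: {travel_time_s(1000.0, medium):.2f} s")
```

This is the same arithmetic the cannon experiment relies on, run backwards: measure the delay over a known distance, then divide to get the speed.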
Walk into a cathedral. Snap your fingers. Or say "hello." You'll notice that the sound doesn't just cut off instantly. Even seconds after you stopped causing vibrations, you're still hearing the "tail" of the sound so to speak. That's what we call reverberation. But that's not what I wanna talk about. I wanna ask this question:
When does sound stop? Or, why do the spherically propagating vibrations ever stop?
Books #
Sound I - Energy I #
Important point: The medium itself does not flow. There is no matter moving to "carry the sound." The medium is not being restructured. The "waves" transmit energy from one place to the next, without the medium moving any matter around. The vibrating tuning fork is not making pieces of atmospheric mass travel to your ear.
Have you ever been to a sports game, or watched it on TV? You know the audience "wave" thing they do? Well, if you take the audience as the medium, notice how we can talk about the wave "moving." The wave is "traveling," right? But notice that nobody in the audience is literally traveling with the wave. The particles (each audience member) are barely moving in that axis. They're only vibrating (sitting up and down). The wave is not moving matter along with it.
Now, think of each person's energy. Loosely speaking, there's some stored energy in each person. When they're sitting, i.e. doing "nothing," their energy output is zero. When they get up, they are emanating some amount y of energy. The wave is an organized pattern of energy.
When you hit the tuning fork, the medium, i.e. the atmosphere, doesn't move matter from the tuning fork to your ears. The wave transmits energy from one place to another, without moving matter.
Yet another way to put it: The motion you used when exciting the particles in the tuning fork, has been communicated to your ear drum, which is elsewhere in the medium.
The push-pull disturbances create the wave. Its characteristics are the frequency, intensity, etc. The properties we've gone over. As each particle disturbs its neighbors, the vibrations preserve (or are based on) these characteristics. The chain of push-pull disturbances, i.e. the wave, extends all the way to the atmospheric pressure around your ear drums. Energy dissipates and vibrations stop, according to the medium, and this dictates how you perceive the way sounds die off.
Remember how I've been talking about a medium being "sound-friendly," and "having properties that define how things vibrate in it." That's been too vague so far, so let's ask the question: What exactly determines how things vibrate in a medium?
For this, let's make the guitarists happy (and I happen to be a guitarist, too. Lucky me!) and use their instrument for this example:
Imagine an idle guitar string. Undisturbed, doing nothing. In its equilibrium state. Then you strike it with your pick. With a strong death metal palm-mute. This supplies energy to it. It starts to vibrate. As it vibrates, it dissipates its energy, radiating it away as heat and/or sound. You know this. Jumping to audio for a second: This is the "tail" of the amplitude of your waveform. That's the string dissipating energy. Eventually the energy dissipates completely and the string returns to its equilibrium state.
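The "tail" can be sketched with a common idealization: a vibration whose amplitude decays exponentially over time. That model is an assumption brought in for illustration (real strings are messier), and the 82 Hz frequency and the decay rate of 3 per second are hypothetical numbers.

```python
import math

def envelope(initial_amplitude, decay_rate_per_s, t_s):
    """Amplitude left after t_s seconds, assuming exponential decay."""
    return initial_amplitude * math.exp(-decay_rate_per_s * t_s)

def string_displacement(frequency_hz, decay_rate_per_s, t_s, initial_amplitude=1.0):
    """Displacement of an idealized damped string at time t_s."""
    return (envelope(initial_amplitude, decay_rate_per_s, t_s)
            * math.sin(2 * math.pi * frequency_hz * t_s))

# Hypothetical numbers: a low string at 82 Hz, decay rate of 3 per second.
for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t = {t:.1f} s   remaining amplitude = {envelope(1.0, 3.0, t):.3f}")
```

A bigger decay rate stands in for a more "dissipative" situation, like the palm-mute: the string gets back to equilibrium sooner.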
I mentioned in Sound I - Level I that intensity of sound corresponds to intensity of vibration. And I told you here that vibration stops when all energy dissipates (as sound and/or heat.) Well, then: Sound level corresponds to energy level.
Obvious, perhaps, but it can't hurt to say it explicitly.
Now you know what I mean by "properties of the medium." I mean properties such as its particular combo of "vibratory" and "dissipative" forces.
Sound I - Space I #
Interaural time differences (ITDs) are one of the primary cues available to the auditory system for determining the spatial location of sound sources. ITDs come about due to the separation of the two ears in space, and the resulting differences in path length that a sound must travel to reach the two ears. – Virginia Best and Jayaganesh Swaminathan. "Revisiting the detection of interaural time differences in listeners with hearing loss." 2019. Journal of the Acoustical Society of America.
Imagine there's a sound source producing disturbances in a medium. Your brain, conceptualizing it as a single agent, analyzes, for example, the differences in time and intensity of said disturbances on each of your ears. This is one of the main factors that influence your perception of spatial content in sound.
So you have these factors:
- Interaural Time Differences (ITD).
- Interaural Level Differences (ILD).
According to a 1986 study by Blauert and Lindemann, your brain needs ITDs of around 650 microseconds and ILDs of approximately 12 decibels for accurate localization of a sound source.
If you're an engineer mixing music, this should ring bells (in stereo!)
When mixing, you are often trying to make things sound "wide." As in "stereo" vs. "mono." And usually, the reason your "double-tracked" metal guitars (to give an example) don't sound "as wide" as those in someone else's mix that you admire, is that the left and right sound content is too similar.
The point is: The perception of spatial content in sound emerges from, among other things, the differences in time and intensity of the disturbances, caused by anything we consider a single agent, on each of our ears.
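A back-of-the-envelope Python sketch of both cues. For the ITD it assumes the simplest possible geometry: a distant source, a straight-line path difference between the ears, and a hypothetical ear separation of 0.2 m. Real heads bend sound around them, so treat the numbers as ballpark only. The ILD is just the level difference in dB between the pressures at the two ears. All names are made up for the example.

```python
import math

SPEED_OF_SOUND_AIR = 340.0  # m/s, approximate
EAR_SEPARATION_M = 0.2      # hypothetical distance between the ears

def itd_seconds(azimuth_deg):
    """Arrival-time difference for a distant source; 0 degrees is straight ahead."""
    extra_path_m = EAR_SEPARATION_M * math.sin(math.radians(azimuth_deg))
    return extra_path_m / SPEED_OF_SOUND_AIR

def ild_db(pressure_near_ear, pressure_far_ear):
    """Level difference in dB between the pressures at the two ears."""
    return 20 * math.log10(pressure_near_ear / pressure_far_ear)

for azimuth in (0, 30, 60, 90):
    print(f"{azimuth:2d} degrees: ITD = {itd_seconds(azimuth) * 1e6:.0f} microseconds")
```

A source straight ahead gives no time difference at all, and the difference grows as the source moves to one side. Identical left and right signals give an ILD of 0 dB, which is the "too similar" problem in mixing terms.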
External resources #
- (Text/HTML) How Well Do Dogs and Other Animals Hear?
- (Text/HTML) 1883 eruption of Krakatoa @ Wikipedia
- (Text/HTML) Interaural time difference @ Wikipedia
- (Text/HTML) Audio Spatial Representation Around the Body
Hearing I #
"Since I had only a diploma for physics, I had very big difficulties to get permission for receiving some heads of cadavers that I could dissect."
Anecdotes from someone's autobiographical notes that I'm reading at the moment. I'll tell you who the author is below.
Why does a death metal vocalist sound similar to a pig?
Why, and in what way, do the two sounds differ from each other?
And how do they both differ from the sound of, say, a bunch of chopsticks falling on a ceramic kitchen floor?
In Sound I we went over the fundamental physics of sound, so it's probably a good idea to read that section before this one, if you don't know what I mean when I say e.g. "frequency." Here we're going to learn about what happens after everything we talked about in Sound I. The actual hearing, as done by your ears and brain.
"Fortunately the solution was simple because in every anatomical institute there are two doors, one in the front [...] and then a back door where the cadavers are taken in and out. I found out that by going through the back door, I could get as many heads as I wanted."
"This permitted me to dissect inner ears of not too old cadavers and that gave the base for all my later work."
"...of not too old cadavers" Good! Never take too long before you dissect cadavers, because, otherwise it'll be like... "Took Too Long, Already Rotten." Come on, I always have to plug my death metal "music."
Anyway, how does your brain separate the guitar solo from the rest of the instruments? How do we go from an ignorant, clueless, purely mechanical, vibrating ear drum, to the recognition, understanding, and appreciation of each individual part as well as the whole mix?
The answer requires a combination of anatomy, physics (that's right: we're not done with waves, even inside the ear), chemistry, and neuroscience.
"I am very thankful to all the people who helped me in that respect. I am especially thankful to one police officer who one day told me that he could have arrested me any time for murder since I carried a human head in my briefcase."
So let this be your grim welcome to Hearing I.
(Don't worry, it won't be too cadaver-heavy. Though we will begin with anatomy.)
Hearing I - Békésy #
"After a certain time, it was quite clear that Hungary will be occupied by the Russians. I don't want to talk about my experiences during the siege of Budapest. It lasted a long time and nobody was sure if he will survive. Most of my friends were killed during that period, my mechanics were deported to Russia and everything became unproductive and there was no way to continue research."
His name was Georg von Békésy (1899-1972).
He was a Hungarian biophysicist who clearly had an interesting life, so I will certainly not do it justice with this short bio. But he studied chemistry in Berne, Switzerland, and got a PhD in physics on the subject "Fast way of determining molecular weight" in Budapest. Then WW2 happened, and, while most of his friends died, he kept working on signal quality in telecommunications.
His telecommunications work got him interested in the mechanics of hearing. He was eventually awarded the Nobel Prize in Physiology or Medicine in 1961 for his research on the function of the cochlea in the mammalian hearing organ. Then his lab was destroyed by fire in 1965, and he moved to Honolulu, Hawaii, where he taught at the University of Hawaii, until his death in 1972. He was 73.
His contribution is central to the current understanding of the inner ear, even though he had limited tools. Cadaver heads, optical microscopes, and other basic 1900s technology. He basically had to blast super high level sound waves onto dead human ears, which isn't exactly how you experience your favorite Bach Toccata! So in Hearing I we are going to learn also from more recent developments made possible by modern technology.
External resources #
- (Text/HTML) Georg von Békésy @ Wikipedia
- (Text/HTML) MY EXPERIENCES IN DIFFERENT LABORATORIES, Georg von Békésy (1899-1972)
Books #
Hearing I - Cochlea I #
"It turned out that for certain frequencies, let's say for high frequencies, the maximum of the travelling vibrations was near the entrance to the inner ear and for low frequencies, it was far away. This way, it became evident that there is mechanical discrimination of frequency done in the inner ear." – GEORG VON BÉKÉSY (1899-1972), My experiences in different laboratories.
The cochlea. That magnificent spiral, fluid-filled labyrinth in the inner ear.
It's a coiled duct. Coiled like the shell of a snail, it consists principally of two fluid-filled channels (actually three, but I'm leaving the third one out in this section), basically tubes, whose cross sectional areas start large and get narrower as they coil together upwards. These internal channels are separated by a membrane. And this membrane supports all along its edge an organ that detects and amplifies motion. The cochlea is the ear's filtering and amplification system.
This membrane that separates the two main fluid-filled tubes (the scalae) is called the basilar membrane (BM). It interacts with the fluid in the tubes, and is constrained by their shape, which affects the way it forms a transmission line for mechanical waves to travel. Along the process, both linear and non-linear operations happen. The cochlea even acts as a distributed amplifier, adding energy to the traveling waves to boost the response to weaker sounds.
You probably have questions already:
- Fluid-filled tubes? What fluids, what's their chemical composition?
- Filtering? Amplification? How?
All good questions. But let's finish the general overview first.
Bring back the tuning fork, from Sound I (which I recommend you read before this section), and hit it.
Diiiiing!
From the point of view of the cochlea, what's happening is that sound waves are reaching the outer ear. Then they're getting transduced into the fluid in the cochlea. This transduction from air to fluid is done by the ossicles. The ossicles are a group of bones (ossified portions of cartilage) in the middle ear, that are among the smallest bones in the human body. They work together like a machine. Their names are malleus, incus, and stapes.
The ossicles mechanically amplify the force of the received vibrations like this: The malleus is joined to, and therefore moved by, the ear drum. The incus sits between the malleus and the stapes. And the stapes (which btw is the smallest bone in the human body) is joined to, and directly pushing and pulling on, the oval window (fenestra vestibuli or fenestra ovalis). The oval window is the membrane separating the air space in the middle ear from the cochlea.
On the other side of the oval window is the scala vestibuli, one of the two fluid-filled, spiralling-up tubes mentioned above. So now you got differential pressure waves propagating through the cochlear fluid of the scala vestibuli, thanks to the stapes. I.e. The stapes conveys the (amplified) energy from the vibrating ear drum to the fluid in the scala vestibuli.
Meanwhile, the other channel, or tube, called the scala tympani, is not being driven by the ossicles. Instead, this one is coupled only to the air in the tympanic cavity of the middle ear.
From studying Pascal's Law, or from using a hydraulic lift, or from strangling a water balloon to death in self-defense, you know that when you push an incompressible fluid, it has to go somewhere.
Well, when the oval window pushes on the essentially incompressible fluid in the scala vestibuli, the round window (fenestra tympani or fenestra rotunda) on the other end bulges out. Pushing in on one end, bulging out on the other, and vice versa. I.e. two membranes at the ends of a tube, vibrating in opposite phases. Just like the top and bottom membranes of a drum. The forces, or pressure, causing this motion, create pressure differences all across the cochlear partitions.
Let's go back to the membrane that separates the two scalae: The basilar membrane (BM).
Being somewhat stiff but springy, the BM gets deflected as well. And not only does it separate the scalae, but also supports the organ of Corti. This organ, sitting on top, and along the edge of the BM, has the assembly of outer hair cells that add energy to the travelling wave. It also has the inner hair cells, that detect sound-induced motion.
The BM is not just being deflected for no reason. As it bends, it too is generating a wave. A displacement wave. So in total, from the ear drum's vibration, we get two coupled waves traveling from the base to the apex of the cochlea: The waves in the scala vestibuli's fluid, and the displacement wave of the BM's motion.
Our friend Von Békésy was awarded the 1961 Nobel Prize for his pioneering work on what we're talking about: Cochlear mechanics.
He discovered the cochlear mechanical traveling waves. He did the first measurements of the BM's vibrational response to sound, and showed (using his handy cadaver heads) a key part of the mechanics: Frequencies are mapped to longitudinal position along the BM.
He realized this from having observed that the BM's stiffness decreases by 2-4 orders of magnitude as a function of distance from the stapes. So the motion of the basilar membrane allows the cochlea to essentially do Fourier Analysis! (See Fourier Transform I.)
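To get a feel for what "doing Fourier Analysis" means, here's a minimal Python sketch: mix two sine waves, then use a naive discrete Fourier transform to recover which frequencies are in the mix. This is a textbook DFT, not a model of the cochlea, and the sample rate, the two frequencies and all names are arbitrary choices for the example.

```python
import cmath
import math

SAMPLE_RATE = 800  # samples per second (arbitrary)
N = 80             # number of samples, so each DFT bin is 10 Hz wide

# A mix of two pure tones: 100 Hz, and a quieter 250 Hz.
signal = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 250 * i / SAMPLE_RATE)
          for i in range(N)]

def dft_magnitudes(samples):
    """Naive DFT: how strongly each frequency bin is present in the samples."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))) / n
            for k in range(n // 2)]

magnitudes = dft_magnitudes(signal)

# Pick the two strongest bins and convert bin numbers back to hertz.
strongest_bins = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
frequencies_hz = sorted(k * SAMPLE_RATE / N for k in strongest_bins)
print(frequencies_hz)
```

The loose analogy: each bin plays the role of a position along the BM, responding mostly to "its" frequency.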
The BM's motion is the effective stimulus that inner hair cells detect and convert to the neurotransmitter release that causes the primary auditory neurons of the spiral ganglion to spike, and send the sound-evoked signals to the brain via the auditory nerve.
The hair cells sit at a negative internal potential, while the fluid bathing their end that holds the cilia (or "hairs," which transduce motion) is kept at a positive potential by ion pumping, mostly of potassium. That positive potential, roughly +80 mV, is known as the endocochlear potential, or EP, and is the largest resting potential found anywhere in the body. Together, the two add up to a difference of about 150 mV across that end of the cell. This is what drives the sensitive and fast transduction that these cells achieve.
External resources #
Books #
Music I #
I recommend you read through Sound I (in particular Sound I - Pitch I), and perhaps even Hearing I. Those sections are relevant, as we're about to talk about the subset of those sounds you hear which can be objectively called musical. But feel free to ignore my recommendation.
- Pulse, meters and tempos.
"The whole duty of a conductor is comprised in his ability always to indicate the right tempo." – RICHARD WAGNER (1813 – 1883), composer and conductor.
- Explicit pulse: Think of an AC/DC song, or some techno song, where there's always some percussion keeping a constant pulse (or subdivision of it) throughout the whole song.
- Extremely slow to non-existent pulse: Think of drone music. Or the long chords in film music that don't cause you to bob your head or follow any "beat" in any way.
- Temporal dynamics: Think of an orchestra conductor, varying the tempo up and down according to his feel of it, and/or notes from the original composer (or some other type of cue, in the case of film, theater, TV, video game, etc.)
- Why study this? Does a pop ballad preserve its feel and meaning if you change it from 80 bpm to 250 bpm? No. The 250 bpm would sound like some ridiculous cartoon. Faster is different. Slower is different. A ternary meter (or "triple time") will enable the Waltz. Binary meters won't. "Progressive" metal and rock play with meters all day long. Drummers may play different meters at once. Etc., etc. I mean, we're talking about Time. The temporal dimension of music. It's kind of a big deal.
- Melodic content:
- Intuition: A melody is the part that you can whistle. As simple as that. Does your whistling carry the information about which musical notes are being played? Then it's a melody. Otherwise, it's something else (noise, speech, percussion, chords, etc.)
- More technically: Concatenation of notes.
- Why study this? Melodies and riffs and "licks" (in the guitar world) are basically the phrases, sentences, speech of music.
- Harmonic content:
- Intuition: From notes, you make chords (i.e. one or more notes played at once). From chords you make "chord sequences" or "chord progressions". If melodies are the words and sentences of music, then harmony is the context. The same exact melody ("sentence") that sounds happy when accompanied by one harmony ("context") can communicate something totally different if accompanied by a different harmony.
- More technically: Addition of notes makes chords. Addition of chords makes... more chords (we'll skip this at first. Advanced stuff. But notice it's a closed operation, mathematically: Chord + Chord = Chord). And concatenation of chords makes chord progressions.
- Hands-on intuition: Sit at the piano right now. Press down a bunch of random keys at once. That's a chord. Wait a bit. Now press another bunch of keys. And so on. You've just played a "chord progression" – which probably sounded like shit, cos you pressed random keys (unless you're lucky and happened to choose some cool weird jazz chords or something).
- Much like with pulse, sometimes there's no explicit harmony in a song, i.e. no piano, synths, or guitar literally playing chords. Sometimes the harmony is implicit, in the sense that each instrument is playing a single melody, i.e. nobody is playing chords, i.e. no one pressing multiple keys on that piano at once like you did earlier, but obviously when they all sound at once you can hear the whole thing as forming a chord. And yet other times, there simply is no harmony.
- Why study this? You can only do so much with single melodies. Even if you're dealing with instruments that can't play chords (e.g. human voice), at some point you're gonna have more than one of those instruments playing at once, and then you have chords, and harmony arises.
- Tonalities:
- Intuition: A "tonality" is what you get when your melodic and harmonic information (see above) follows certain rules as to what notes and chords are "allowed."
- "Major" and "minor": In Western music there are 24 tonalities (or 30, depending on how you count), before you start engaging in worthless redundancy, arising from the fact that we're dealing with structures that cycle (mathematically: groups.) E.g. C## major is sonically exactly the same as D major, but more annoying notationally. Much like in computer programming: Things which can be constructed, but are pointless.
- Why study this? At the very least you should know when you are in major, minor, or ambiguously in "both," or "in between," (at least perceptually, if not notationally.)
- Timbre, orchestration:
- Intuition: This is the description of a 4-note melody (assume quarter notes): "C D Eb D." Notice anything missing? How about... the instrument?! This determines the sonic texture, so to speak, i.e. timbre. As we will learn in synthesis and Fourier I and II, complex sound waves are made of a "mix of simpler sound waves," and the acoustics of musical instruments determine the proportion of the different "ingredients" in this mix.
- Why study this? Does it make a difference if I play it on a flute or a church organ? Yes it fucking does. The timbre, or "texture" of a sound can make all the difference in the world. It's the reason orchestras have a certain configuration of instruments. It's the reason guitar players buy effect pedals and shit. It's the difference between Celine Dion and Corey Taylor.
- Emotional content:
- Intuition: This is, of course, the least objective, and thus theoretically weaker, part of music theory. For all the talk of how "universal" music is as a language, the fact is that there is no formal notation for "tense," "nostalgic," "eerie," "angry," etc. even within the specific world of theory that is CMN (Western common music notation system.) And yet, your music is shit (there's also no theoretical symbol for shit, btw!) if it doesn't trigger, or evoke, and manage, emotions.
- And now that I think about it, this part is what I think about 99% of my music composition time: "Sci-fi-ish," "elves running in the forest and shit," "pure fucking anger," "a planet being destroyed," "WWI," etc.
- Why study this? Can we even study this? If everything is fucking subjective? I could go the pseudoscience way: Grab some fuzzy "Theory of musical emotion" from some Social Science PhD dude, and pretend it's serious stuff. But I'm not gonna do that. I prefer the historian and observer approach: Point you to ways music has been used, e.g. in films and very emotional songs, and simply describe everything involved in it, theorizing a bit about it. Because there are things we can all agree on. E.g. I'm pretty sure the Schindler's List OST was simply not intended to be used to pump your body to, during your cardio workout at the gym.
- Story:
- Intuition: Have you ever fallen asleep while watching a movie cos you don't give a shit about what's gonna happen next? That's because there's no emotion being triggered. However, you still would recognize that there is a story. Logically, then, I argue that we can talk about story as separate from the study of emotion. Kind of like the "architectural grammar" of composition.
- General structure: "Verses," "Choruses," "Intro," "Climax," "Bridges," etc.
- Why study this? Every metalbro knows this feeling: "Dammit. My 'song' is just a collection of riffs. #RiffSalad." Every electronic musicbro knows this feeling: "Dammit. My 'song' is just one loop/beat after another. #BeatSalad #LoopSalad." What's missing is the story-telling. This is, for many people, the most permanent amateur weakness: Inability to tell a story with the music.
- Performance:
- Intuition: Check out this 6-note melody: "E3 F#3 G3 E4 G3 F#3" Assume it's quarter notes (so it's e.g. a two bar Waltz-y 3/4 melody). Assume, further, that it's meant to be played on a piano. Do we have all the information we need? No! We don't know how hard to press the keys, for example. All equally hard? Start soft and increase the force as we go? We don't know. There is notation for this. Some of it CMN. Other we make up. (E.g. You'll be shocked to hear that there's no CMN symbol for "play this part with a spoon" or "play the last two notes nearer to the electric guitar bridge"!)
- Why study this? Do you play piano, violin, timpani, cello? Can you compose things that make sense if you have no idea how each instrument is played, and which notes are hard or easy to play depending on hand position, tuning, etc.? To give you an idea: I play guitar as my main instrument (I consider myself a "brown belt" at it. Cross fingers I don't stop, and by my 40s I reach black!) Then, my secondary skills are drums, bass, and keyboards (distant from my guitar skill, though. At most blue belt at the rest.) And yet, I still constantly fail to remember the peculiarities of each, and compose impossible-to-play shit, when sitting at the computer or working on paper!
- Notation and representation:
- Intuition: I'm adding this category, which is like adding a "theory of programming languages" section to a programming book. The average programmer might tell you: "I don't need that shit. It's too meta. I want to program a video game, not theorize about computer languages used to program video games." But as we saw in Programming II, it is quite important to at least know the different families of programming languages. In music it's even more justified, because as a musician you will encounter different notations (sometimes made-up ones, or even your own), as well as different visual representations.
- Visual representations: Classic staff vs. "piano roll" vs. vertical "tracker" vs. "arrange" vs. guitar/bass/drum "tabs" vs. audio "waveform." What are the pros and cons for each? Much like (in fact, exactly like) programming languages, the pros and cons of different notations and representations are to do with: Familiarity, space (and other resources) required, learning curve, level of information (big picture vs. every explicit detail of e.g. timbre), history & support/adoption, etc.
- Why study this? Should a filmmaker study film theory formally? Should a poet study grammar formally? Should a software developer study algorithms formally? The answer to all of those questions is: Maybe. It depends on what you want to do. Is your goal today to write a simple folk song for acoustic guitar? All you need is chord names on a napkin. Is your goal to hire an expensive orchestra to play a film score? Then you want notation with all the symbols and annotations necessary for the conductor to do his job.
- Why study this? Continued: Due to the above, I am neither in the "theory = bad = limiting = for academics" camp, nor in the "you must study notation if you want to be a Real Musician (TM)" camp. It's a pointless discussion. I personally have a very specific reason why I value studying a bit of CMN whenever I can: It makes it easier to develop music-related computer programs. Why poorly reinvent the idea of a "bar," a "quarter note," etc.?
- History:
- Why study this?: Because studying anything without its historical context is stupid.
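Two of the mathematical remarks above (tonalities "cycle," and Chord + Chord = Chord) can be poked at with a few lines of code. This is only a rough sketch, assuming twelve-tone equal temperament, where every note name collapses to a pitch class from 0 to 11. The names `pitchClass`, `chord`, and `addChords` are made up for illustration, and treating a chord as a plain set of pitch classes is my simplification, not a piece of standard theory.

```javascript
// Semitone offsets of the natural notes within an octave (C = 0).
const NATURALS = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 };

// Hypothetical helper: turn a note name like "C##" or "Eb" into a pitch class 0..11.
function pitchClass(name) {
  let pc = NATURALS[name[0]];
  for (const accidental of name.slice(1)) {
    if (accidental === "#") pc += 1; // each sharp raises by one semitone
    if (accidental === "b") pc -= 1; // each flat lowers by one semitone
  }
  return ((pc % 12) + 12) % 12; // wrap around: the structure cycles
}

// A chord modeled as a set of pitch classes.
function chord(...names) {
  return new Set(names.map(pitchClass));
}

// "Adding" two chords is set union, and the result is again a chord (closed operation).
function addChords(a, b) {
  return new Set([...a, ...b]);
}

const cMajor = chord("C", "E", "G");
const gMajor = chord("G", "B", "D");
const both = addChords(cMajor, gMajor);

// Same sound, different spelling.
console.log(pitchClass("C##") === pitchClass("D"));
// The shared note G shows up only once in the combined chord.
console.log([...both].sort((x, y) => x - y));
```

The modulo in `pitchClass` is the whole "structures that cycle" point: pile up enough sharps and you land back on a note that already had a simpler name.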
So let this be your welcome to Music I.
Before moving on, here are some homework-y items, to assess your current level of understanding:
- Pick some of your favorite tracks, and write a little review, with at least one sentence describing it from the point of view of each of the categories I listed earlier: Pulse, melodic content, harmonic content, tonalities, timbre & orchestration, emotional content, story, and performance.
- If you are incapable of commenting on any of the areas, that's ok. It's what the rest of Music I is for.
Music I - History I #
"Music directly represents the passions of the soul. If one listens to the wrong kind of music, he will become the wrong kind of person." — ARISTOTLE (384–322 BC)
"Music is a moral law. It gives soul to the universe, wings to the mind, flight to the imagination, and charm and gaiety to life and to everything" – PLATO (428/427 or 424/423 – 348/347 BC)
The Ancient Greeks were absolute masters of music.
In their mythology, music was invented and practiced by gods and demigods such as Apollo, Hermes, Amphion, and Orpheus, and the word music (Greek mousikē) comes from the word for the Muses, and denoted the arts associated with them, from history to dance. Music in Ancient Greece was both an art to be enjoyed, and a science closely related to arithmetic and astronomy.
Which is a good thing, since that's exactly the way I view music: Art and science. And engineering.
Music was everywhere in Ancient Greece. Work, military, school, ceremonies, theater, etc. And they definitely theorized about it too. We know that at least Plato and Aristotle wrote about the nature and effects of music. And there's of course also the music-theoretical work, from Pythagoras (500 BC) all the way to Aristides Quintilianus (fourth century AD).
From writings and archaeological findings, we know they had many instruments too. Harps, panpipes, horns, proto-organs, and a variety of percussion instruments, including drums and cymbals. Wouldn't surprise me if archaeologists found some sick double kick drums! \m/ In any case, it's believed that the most important, or most popular instruments were the aulos, lyra, and kithara.
There's no record of them ever forming large orchestras, though. But who's to say they never did? They certainly had the instruments and the number of musicians. For example, we know music competitions were held, and prizes (such as amphoras with wine or oil) were awarded to the top musical virtuosos. So these were people who dedicated themselves to the craft.
Speaking of performers, as a performing art, music was called melos, from which we get the word melody.
Books #
Fourier Transform I #
Most of us can hear sound. But is there a way to see sound?
We often want to visualize what's going on with the sound. See what it's made of. Whether it is for working with music, or sound effects, or industrial equipment whose noise tells us things about its integrity, for maintenance purposes. That (among many other crucial operations in the modern world) is where all the Fourier shit comes in.
We're gonna use audio analysis as our first context for explaining Fourier, for obvious reasons. I'll generalize it all in later sections. For now, at all times, I want you to think: "We want to build an equalizer. That is the point of what I'm reading." Constantly remind yourself that our end goal is to build an EQ. Pretend that you are an EQ manufacturer. The main message, in its more general form, is this: Any wave is the sum of less complicated waves.
So here we go, a quick Fourier primer:
- We start with a waveform.
- I.e. The sound we're gonna fuck with.
- I.e. Your death metal riff.
- As a mathematical object, the waveform is a function.
- It is a function of time.
- Think of time as the horizontal axis on the plane.
- Just think of time going left to right, like in a DAW.
- Let's call the waveform `s(t)`, like the mathematicians do.
- Using the letter "s" cos of the "signal" concept.
- Using the letter "t" cos time.
- Where we're heading: We're going to extract information from `s(t)`.
- Before some programmers confuse themselves and others:
- Neither of these two programming notions will help you: "Functions are first class, bro" (FP). Or "everything is an object, so a function is just another object!" (OOP).
- E.g. `function f() { } let x = f;` and then you say "`x` is a function."
- E.g. `let square = new Function("x", "return x * x");` and then you say "the function `square` is an object, like everything else!"
- Don't bring any of that programming language shit here. It's not helpful here.
- To the mathematician, the function is the whole thing. The graph. The table. However you visualize it.
- I.e. the function is all the inputs with all their respective outputs.
- Bringing that to programming: A collection of pairs `[ [1, x(1)], [2, x(2)], ..., [1000, x(1000)] ]`.
- Pairs of inputs and outputs.
- That is (our programmer translation of) the function as a mathematical object.
- (Of course, the mathematician thinks of it with infinitely many inputs, going infinitely to the left and right in the time domain, infinitely infinite between any two numbers, and blablabla. They can't control themselves, with their Infinity nonsense.)
- Aside on why we named the waveform `s(t)`.
- "s" because "signal" (Information-wise. Put on your "Signal Processing" engineer hat.)
- "t" because it's a "function of time."
- "Function of time" = time is the input, and the x-axis on a graph.
- (Btw the vertical axis here is gonna be telling us the amplitude.)
- Have you seen audio and sound tools referring to "DSP"? That stands for "Digital Signal Processing."
- Your death metal riff, i.e. our waveform `s(t)`, is the signal we'll be digitally processing here.
- Stop here and remember: We're walking towards the data we would need to build an EQ.
- Enter Fourier: From a clever analysis of `s(t)`, we can synthesize two sinusoids of different frequencies.
- (Two or more, of course. But think two for now. As a minimal non-trivial mental picture.)
- I.e. we "decompose" big fat `s(t)` into two thin sinusoids of different frequencies and amplitudes.
- In practice: It's an algorithm. Put on the programmer hat. It's just a fucking algorithm.
- (Of course, in the real world it's actually various algorithms, for different practical purposes.)
- The sinusoids, if added back together, untouched, would give us the original waveform back.
- I.e. the analysis gives us a "lossless" decomposition of `s(t)` into its "building blocks."
- (By "lossless" I mean as a mathematical principle.)
- (Nothing is really "lossless" in practice, except for a copy operation on a digital file.)
- In the real world, "lossless" processing = "Rounding errors during computation were not too bad."
- Remember `s(t)` is the waveform. The whole thing. The audio recording.
- The mathematician does not mean a "return value" of "calling the function `s`" with a `t`, when he tells you "Let's analyze `s(t)`."
- He means analyze the whole thing. The whole signal. Which he refers to as "`s(t)`" (and pronounces "s of t.")
- Anyway, computationally: After analysis, we get two "collections of amplitudes." The data for the two sinusoids.
- Think `300Hz = [ ... ], 1kHz = [ ... ]`.
- I.e. Separate amplitude data for the "children waves/signals/sinusoids" of the big wave `s(t)`.
- Now we can step through, and mess around with, the amplitudes of each sinusoid.
- We can step through them, individually, in time
- (Think about the timeline in your DAW.)
- Programming speak: We'll step through them using an `index` in the collection of amplitudes.
- (And likely some arithmetic involving the "sampling rate" of the recording and shit. But let's talk about that later.)
This is just getting started with the intuition of the Fourier Transform. The next section will continue the story at this intuition level. And then I'll get more into the math. But at this point, you should already understand why this is the basis that lets us e.g. build an EQ (or any other form of processing.)
Meanwhile, listen to this amazing track, by an even more amazing metal band: Zander Noriega - Took Too Long, Already Rotten
Appendix #
Lightning Network I #
Even though this is not a book about bitcoin, I'm gonna be using Bitcoin and the Lightning Network as examples when talking about networks, cryptography, etc. enough that it's worth adding a brief introduction to both.
Of course, you should only read this section after listening to my top-notch, Oscar-winning, Pulitzer-winning instrumental thrash-death metal "music," E.g. Escape Mechanics Unlocked.
With the "music" shilling out of the way, let's get on with it:
- Why Bitcoin?
- Because we want to do global transactions, without a third party holding funds, without needing anything beyond a computer and internet.
- Because of its blockchain's "gossip protocol," where every node knows about every single transaction that occurs globally, so that everyone agrees on the state of everyone's balances. I.e. for decentralized consensus.
- Why the Lightning Network?
- To "scale Bitcoin" off-chain, because one thing a "gossip protocol" on-chain just cannot provide is fast payments.
- Because a blockchain by itself is too slow to cover the entire world's commerce in all its frequencies. I.e. everything from large enterprise and government transactions, to normal people's daily transactions and "micro" transactions.
- Because as an unavoidable result of its truly fully decentralized operation, Bitcoin processes only around 10 transactions per second, and confirmation of each transaction can take up to 1 hour.
- To get irreversible payments in seconds without the "waiting for confirmations" issue.
- To craft and update transaction data, and pass it around on a separate network, deferring its broadcast (to the blockchain) to the future, for near-unlimited transactions per second, so that we can pay for small stuff easy and fast.
- Because we don't want 8-gigabyte Bitcoin blocks (which would be needed to reach e.g. Visa-level transaction volume per second), which would either collapse the Bitcoin network, or make it so that only those with enterprise and industrial-grade resources could operate nodes that validate the blockchain. I.e. It would price average people out of blockchain validation. I.e. it would centralize the blockchain.
- (Note: In fiat land there's also the same "layered" structure. E.g. Credit card payments are a "higher layer." The millions of credit card payments that occur each minute don't actually cause a million physical moves of physical money.)
- Because scaling "off-chain" is also a backwards compatible approach, which keeps large scale adoption stable.
- Why call it a "Second layer" or "Layer 2 Protocol"?
- To reflect the fact that it "relies on" the Bitcoin network, that it assumes features of the lower layer, i.e. Bitcoin's security building blocks (multisig, time locks, non-expiration capabilities, double-spending prevention, censorship resistance, etc.)
- Why payment channels?
- To establish an `A -> B` relation between nodes, because `B` is the final recipient in an economic relationship, and/or a well-connected and well-funded node in the network (see "Why payment forwarding?" below).
- To work with transactions that are not committed to the blockchain, but are nonetheless secured by it (and in fact channel identifiers are based on funding transaction ids).
- So that our off-chain transactions always refer to money that actually exists, i.e. spendable btc on the blockchain.
- (Note: The real benefit comes from the paths that form as the network graph grows. I.e. users don't need to directly "open a channel" to each other, cos that'd be stupid. See below for the actual network behaviors, E.g. routing, path formation and finding, etc.)
- Why a three-stage payment channel lifecycle?
- Establishment: A Funding Transaction on-chain for deciding the starting funds and balances based on real blockchain history. I.e. UTXOs that either of the channel partners can unlock.
- Normal operation: Zero or more "update" transactions created and passed around off-chain for speed, privacy, etc. E.g. for a unidirectional channel, the payer keeps making payments by creating new 2-of-2 transactions, whose outputs go to both payer and payee, with each one's quantity updated (i.e. the balance updates mentioned above.), so that when the interaction stops, the transaction is ready to be sent to the blockchain.
- Closing: A final Settlement transaction on-chain for recording the state of bitcoin balances to the blockchain.
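The three stages above can be mimicked with plain balance bookkeeping. This is a sketch of the accounting idea only, under loud assumptions: no signatures, no 2-of-2 scripts, no fees, no channel reserve, and the names `openChannel`, `pay`, and `closeChannel` are made up for illustration, not taken from any Lightning implementation.

```javascript
// Establishment: the funding amounts fix the channel capacity (in sats).
function openChannel(fundsA, fundsB) {
  return { capacity: fundsA + fundsB, A: fundsA, B: fundsB, updates: 0 };
}

// Normal operation: each payment produces a NEW state that replaces the old one.
// Nothing touches the blockchain here.
function pay(state, from, to, amount) {
  if (amount <= 0 || amount > state[from]) {
    throw new Error("payment exceeds sender's channel balance");
  }
  return {
    ...state,
    [from]: state[from] - amount,
    [to]: state[to] + amount,
    updates: state.updates + 1,
  };
}

// Closing: only the latest balances would be settled on-chain.
function closeChannel(state) {
  return { A: state.A, B: state.B };
}

let channel = openChannel(100000, 0); // A funds the whole channel
channel = pay(channel, "A", "B", 30000);
channel = pay(channel, "A", "B", 20000);
channel = pay(channel, "B", "A", 5000);
const settlement = closeChannel(channel);

// Three off-chain updates, but only two on-chain events (open and close).
console.log(settlement, channel.updates);
```

Note that the sum of both balances never changes: every update is just a different split of the same capacity.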
- Why use Bitcoin security primitives (i.e. the blockchain transaction structure and Bitcoin Script language) in the Lightning protocol?
- To automatically ensure fairness.
- To ensure that the right (as in morally right) transactions are broadcasted to the Bitcoin network. I.e. To avoid settlement of transactions whose state isn't the most fair, by encumbering transactions with hash locks, time locks, etc.
- To automatically penalize cheaters at the protocol level.
- To operate in the same "trustless" manner of Bitcoin, i.e. to communicate using the Bitcoin protocol's security primitives, but without dealing with the blockchain for each action.
- So that channel partners don't need to trust each other. They only need to trust the protocol.
- Why the Funding Transaction on chain?
- To set initial balances so that off-chain activities during Normal operation refer to real transaction inputs found on the blockchain. I.e. Funding must come from real blockchain sats.
- Because one or both players must fund the inputs, because the rest of their payment interaction will be entirely about updating the balances of this exact total, and thus this funding is called the "channel capacity."
- Why 2-of-2 Multisig?
- So that no one can run away with the funds.
- To agree on off-chain balances at all times.
- Why Time locks (`CHECKLOCKTIMEVERIFY`)?
- To secure the funds, by setting a future time (expressed in blocks or timestamp) before which the script aborts execution.
- To not allow some bitcoin to be spent before some time (or block height of the blockchain) in the future.
- So that each subsequent off-chain "balance update" becomes valid on the blockchain sooner than the previous one, so that if a cheater tries to submit an old transaction (whose balances are more convenient to him) his victim can defend by broadcasting the latest transaction, which Bitcoin's blockchain will process sooner.
- Why payment forwarding?
- To form paths by connecting payment channels end-to-end, to have an actual payment network.
- So that not every node in the network has to open a channel (i.e. broadcast funding and settlement transactions to the blockchain) to the final recipient whenever they want to make a payment, because that would defeat the purpose. (It'd just be a conceptual collection of separate 1-to-1 pairs/connections.)
- So that `A` can send "micropayments" to `D`, i.e. send payments smaller than the on-chain fees involved in opening a new `A -> D` channel.
- (WARNING: Physical analogy) So that you don't have to build a road from `A` to `D` (and thus pay the costs involved in such building) when there already exists an `A -> B -> ... -> D` road.
- Why hash time-locked contracts (HTLC) and payment secrets?
- To implement an atomic ("all or nothing" payment forwarding, time-locked refunds, no "half-paid" states, no race conditions), trustless (no ability to steal, or do ransom attacks, time-locked refunds), two-round multi-hop payment channel network.
- To add a deadline to payment forwarding contracts, after which they become invalid and senders are refunded, to avoid "No one can steal from anyone, but funds locked forever in a multisig, cos someone is being an asshole / MIA / etc."-type situations. I.e. So that payment forwarding is "all or nothing," i.e. atomic.
- To implement a fairness protocol, which requires, i.e. specifies as condition of payment, that intermediaries relay a random secret `R` (called a "payment secret" or "payment preimage"), whose hash `H = SHA256(R)` locks the payment at every hop, from the recipient back to the sender.
- To ensure that the sender doesn't get cheated by the intermediaries, that intermediaries get reimbursed (and ideally compensated with fees) by the sender, and the recipient gets the payment.
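Here is a tiny sketch of the two locks in an HTLC: the hash lock (show me the secret) and the time lock (or else refund). It runs on Node.js and assumes SHA-256 as the hash function, which is what the Lightning specs use for payment hashes. The object shape `{ H, expiryBlock }` and the function names `newPaymentSecret` and `resolveHtlc` are hypothetical, made up for this illustration. A real HTLC is a Bitcoin Script inside a commitment transaction, not a JavaScript function.

```javascript
const crypto = require("node:crypto");

// The recipient picks a random secret R and only shares its hash H.
function newPaymentSecret() {
  const R = crypto.randomBytes(32);
  const H = crypto.createHash("sha256").update(R).digest("hex");
  return { R, H };
}

// Hypothetical HTLC: settles if the preimage of H shows up before the
// deadline; once the deadline passes, the funds go back to the sender.
function resolveHtlc(htlc, preimage, currentBlock) {
  if (currentBlock >= htlc.expiryBlock) return "refund";
  if (preimage === null) return "pending";
  const hash = crypto.createHash("sha256").update(preimage).digest("hex");
  return hash === htlc.H ? "settle" : "pending";
}

const { R, H } = newPaymentSecret();
const htlc = { H, expiryBlock: 100 };

console.log(resolveHtlc(htlc, null, 90));                   // nobody revealed R yet
console.log(resolveHtlc(htlc, crypto.randomBytes(32), 90)); // a wrong guess does nothing
console.log(resolveHtlc(htlc, R, 90));                      // the real secret unlocks it
console.log(resolveHtlc(htlc, R, 100));                     // too late: refund path
```

Since every hop on the path is locked with the same `H`, revealing `R` once lets each intermediary claim its incoming payment in turn, which is what makes the forwarding "all or nothing."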
- Why a gossip protocol?
- To implement payment forwarding.
- So that a node `A`, of some channel `A -> B`, can listen to node_announcement and channel_announcement messages from other nodes (the network "gossip") to create a channel graph, i.e. its view of the network, from which it can find paths, e.g. `A -> B -> C -> D`, so that it can make payments to e.g. `D` without opening a channel to it.
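A sketch of what "create a channel graph and find a path in it" could look like. The announcement objects below are a made-up, stripped-down stand-in for real gossip messages (which carry signatures, keys, features, and more), and `buildGraph` and `findPath` are hypothetical names. The search is a plain breadth-first search for the fewest hops. Actual nodes also weigh fees and time-locks, so take this as the bare idea only.

```javascript
// Hypothetical gossip: each announcement names the two ends of a channel.
const announcements = [
  { channel: "A-B", nodes: ["A", "B"] },
  { channel: "B-C", nodes: ["B", "C"] },
  { channel: "C-D", nodes: ["C", "D"] },
  { channel: "B-E", nodes: ["B", "E"] },
];

// Build the local view of the network: node -> set of neighbours.
function buildGraph(messages) {
  const graph = new Map();
  for (const { nodes: [a, b] } of messages) {
    if (!graph.has(a)) graph.set(a, new Set());
    if (!graph.has(b)) graph.set(b, new Set());
    graph.get(a).add(b);
    graph.get(b).add(a);
  }
  return graph;
}

// Breadth-first search: returns a path with the fewest hops, or null.
function findPath(graph, source, target) {
  const previous = new Map([[source, null]]);
  const queue = [source];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === target) {
      // Walk backwards from the target to rebuild the path.
      const path = [];
      for (let n = target; n !== null; n = previous.get(n)) path.unshift(n);
      return path;
    }
    for (const next of graph.get(node) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}

const graph = buildGraph(announcements);
const path = findPath(graph, "A", "D");
console.log(path.join(" -> "));
```

`A` never opened a channel to `D`. It only listened, built its own map, and found a road that already existed.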
- Why Source routing?
- To give sender nodes full control over the route their payment follows through the network.
- So that senders can specify the total number of hops, total cumulative fee, and total worst-case time-lock period enforced by the HTLC.
- Why Onion-routing / Sphinx-based mix network?
- To securely and privately route HTLCs, i.e. conditional payments, within the network.
- So that senders can encode the whole path as nested encrypted layers (like an onion) so that each intermediary ("hop") sees ("peels") only the layer that pertains to it ("hop payload"). I.e. Cryptographic packet routing.
- So that intermediaries, called "routing nodes" in Payment forwarding context, don't know who the original sender and final recipient are, and thus can't engage in things such as censorship. (Unlike e.g. Internet IP routing, where each hop can see the origin and recipient in the IPv4 packet.)
- So that intermediaries forward payments without knowing which other nodes, besides their predecessor or successor, are part of the path.
- So that intermediaries don't even know what their position is in the path.
- Why Path finding by iterative trial-and-error?
- Because of incomplete (by design, see Onion-routing section) and ever-changing information, e.g. a channel's balances (how the capacity is allocated between the two peers at a given moment) and liquidity (balances minus channel reserve and shit) are unknown to the network, for privacy reasons.
- Because the only channel information known to the rest of the network via channel gossip announcements is the aggregate capacity, for privacy and scalability (because broadcasting balance updates to the whole network would slow everything down again).
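The reason path finding has to be trial-and-error can be seen in a toy simulation. The sender only sees public capacities, the real balances are hidden, so all it can do is filter out hopeless paths and then try the rest one by one. All numbers and names here (`publicCapacity`, `privateOutbound`, `attempt`, `payByTrialAndError`) are invented for the sketch. Real nodes get failure messages back through the onion route and use smarter retry strategies.

```javascript
// What gossip tells everyone: only the total capacity of each channel.
const publicCapacity = { "A-B": 100, "B-D": 100, "A-C": 100, "C-D": 100 };

// What only the channel partners know: how much each side can currently send.
// (Made-up numbers; "B-D" here means "what B can send towards D".)
const privateOutbound = { "A-B": 80, "B-D": 10, "A-C": 60, "C-D": 70 };

// Simulates the network's answer: the first hop that cannot forward the amount fails.
function attempt(path, amount) {
  for (let i = 0; i < path.length - 1; i++) {
    const hop = `${path[i]}-${path[i + 1]}`;
    if (privateOutbound[hop] < amount) return { ok: false, failedAt: hop };
  }
  return { ok: true };
}

// The sender's side: filter by public capacity, then just try candidates in order.
function payByTrialAndError(candidatePaths, amount) {
  const tried = [];
  for (const path of candidatePaths) {
    const hops = path.slice(0, -1).map((node, i) => `${node}-${path[i + 1]}`);
    if (hops.some((hop) => publicCapacity[hop] < amount)) continue; // hopeless, skip
    const result = attempt(path, amount);
    tried.push({ path, ...result });
    if (result.ok) return { success: true, path, tried };
  }
  return { success: false, path: null, tried };
}

const outcome = payByTrialAndError([["A", "B", "D"], ["A", "C", "D"]], 50);
console.log(outcome.success, outcome.path, outcome.tried.length);
```

Both paths look identical from the outside (same capacities), and the sender only learns that `B` can't forward 50 towards `D` by actually trying.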
- Why Commitment transactions?
- So that no one can cheat, because any participant can take their off-chain state (which is kept structured as a transaction) on-chain at any time.
- To implement refund clauses during Establishment, so that nothing ever gets lost, and channel partners don't have to trust each other. So that before broadcasting the Funding transaction, you ask your peer to sign a transaction that pays your whole initial funding back to you.
- Because you don't want to put your money in the 2-of-2 multisig Funding transaction without having a refund transaction first, because you'd then be vulnerable to the other end.
- To avoid reverting back to trust.
- To commit the latest balance distribution each time it changes during Normal operation. I.e. each time a payment is sent.
- (Note: These transactions are then the precise technical meaning of "sending a payment through the Lightning Network.")
- So that if your channel partner disappears, you have a transaction you can broadcast to the blockchain, to get your money back.
- It's implemented in various forms. E.g. It's the `Else` in HTLC. It's a refund timeout in Multisig.
- Why timelock delays and revocation secrets?
- Timelock: To prevent and punish cheaters who publish old commitment transactions to the blockchain.
- Timelock: To prevent those who cheat by publishing old commitment transactions, from being able to immediately use the stolen funds, giving a window of time to the cheated to move them first.
- Timelock: To give the cheated a chance to claim the whole balance of the cheater.
- Revocation secret: To allow either party to bypass the timelock. So each partner has half of the secret and, with each balance update, one of them asks for the other's half, before signing the new commitment transaction.
- To eliminate profit incentives for cheating.
- Why a required reserve for each channel balance?
- So that the cheating penalties always actually hurt. I.e. so that there's always enough "skin in the game."
- Why Full Lightning Node (or 3rd-Party) + Self-managed (or custodial)?
- (3rd-party Node, Custodial wallet) for when you don't want to maintain a node, and don't care about someone else holding your keys. I.e. you only care about convenience.
- (3rd-party Node, Self-managed wallet) for when you don't want to maintain a LN node, but you want to manage your keys.
- (Full Node, Custodial wallet) for when you want to maintain your own LN node, but you don't care about managing your own keys. (I'm failing to imagine this use case.)
- (Full Node, Self-managed wallet) because you want full control, zero asking for permission for anything ever. Mr. Sovereign Individual.
- Why the Noise Protocol Framework?
- For authentication, encryption, privacy, and resistance to traffic analysis, eavesdropping, and other malicious interference.
- To authenticate any information announced on behalf of a node, by using a long-term public key on Bitcoin's `secp256k1` curve.
- Why the payment requests format?
- To implement single-use, digitally signed invoices, created by having the recipient send his payment hash and destination (i.e. the minimum required information to make a payment) to the sender, using a QR-friendly format.
- Why are invoices used only once?
- Because the recipient will have revealed the payment secret (see payment forwarding section) back to the entire payment route after the first time.
- Because if a sender reuses a payment hash, they risk losing funds, since all nodes in the route already know the payment secret, and thus any of them could settle the payment for themselves.
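The reuse risk can be made concrete with a small sketch. The payment hash is the SHA-256 of the payment secret (preimage); everything else here is assumed for illustration only: `payment_hash`, `can_settle`, and the hop names are hypothetical, and the preimage is a fixed dummy value.

```python
import hashlib


def payment_hash(preimage: bytes) -> bytes:
    """The invoice commits to the hash of the payment secret."""
    return hashlib.sha256(preimage).digest()


def can_settle(locked_hash: bytes, candidate_preimage: bytes) -> bool:
    """Whoever presents a matching preimage can claim the locked payment."""
    return payment_hash(candidate_preimage) == locked_hash


preimage = b"\x01" * 32                 # dummy secret, known only to the recipient at first
invoice_hash = payment_hash(preimage)

# After the first payment settles, the preimage has travelled back along the
# route, so every forwarding hop now knows it.
learned_by_route = {"hop1": preimage, "hop2": preimage}

# If the sender pays the same hash again, an intermediate hop can settle the
# incoming payment itself instead of forwarding it to the recipient.
hop1_can_steal = can_settle(invoice_hash, learned_by_route["hop1"])
```

A node that never saw the first payment has no such shortcut, since guessing a 32-byte preimage is infeasible.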
- Why is an invoice digitally signed?
- So that if some node in the network modifies it while routing, the invoice will be invalidated.
- Why bech32 encoding?
- To get the same benefits that SegWit-compatible bitcoin addresses get: Use less space, only lowercase letters, better error correction and detection, etc. (See BIP-0173)
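To show what the error detection looks like in practice, here is a sketch of the bech32 checksum, following the reference algorithm published in BIP-0173. The helper names `bech32_encode` and `bech32_decode`, and the sample data values, are chosen for illustration; this shows detection only, not correction, and it is not a full invoice parser.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_polymod(values):
    """Checksum core from BIP-0173 (a BCH code over 5-bit symbols)."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk


def bech32_hrp_expand(hrp):
    """Spread the human-readable part into 5-bit values for the checksum."""
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]


def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def bech32_encode(hrp, data):
    """Join prefix, separator '1', data, and the 6-symbol checksum."""
    combined = data + bech32_create_checksum(hrp, data)
    return hrp + "1" + "".join(CHARSET[d] for d in combined)


def bech32_decode(s):
    """Return (hrp, data) if the checksum verifies, otherwise None."""
    pos = s.rfind("1")
    if pos < 1 or len(s) - pos - 1 < 6:
        return None
    hrp, body = s[:pos], s[pos + 1:]
    if any(c not in CHARSET for c in body):
        return None
    values = [CHARSET.find(c) for c in body]
    if bech32_polymod(bech32_hrp_expand(hrp) + values) != 1:
        return None
    return hrp, values[:-6]


encoded = bech32_encode("lnbc", [1, 2, 3, 4, 5])
```

Changing any single character of `encoded` makes `bech32_decode` return `None`, which is the property that lets a wallet reject a mistyped or corrupted string before attempting a payment.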
- Why key-value pairs in the data section?
- To include both the required data (e.g. the payment hash), as well as extra metadata.
- To include arbitrary invoice descriptions like "Invoice for 1 pound of steak".
- To include information that can be used to invalidate and expire invoices (unlike BTC addresses, which never expire), such as creation date, ID of recipient node, etc.
- To include a fallback BTC address in case payment over LN fails.
- To include extra routing hints, i.e. information that private nodes include to still be able to receive payments even though their existence is never advertised. I.e. more routing information for the payer to use.
- To specify features supported/required for receiving a payment.
- To evolve the protocol in a backward-compatible way.
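The backward-compatibility point can be illustrated with a sketch of a tagged-field reader. It assumes the layout described in BOLT-11, where each field is a 5-bit type, a 10-bit length, and then that many 5-bit data words, and where the type is named after its bech32 character. The function names and the small `KNOWN` table are hypothetical, and only a few tags are handled.

```python
from typing import Dict, List, Tuple

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# Partial, illustrative mapping of tag letters to meanings.
KNOWN = {"p": "payment_hash", "d": "description", "x": "expiry_seconds"}


def parse_tagged_fields(words: List[int]) -> List[Tuple[str, List[int]]]:
    """Split a list of 5-bit words into (tag letter, data words) pairs."""
    fields = []
    i = 0
    while i < len(words):
        if i + 3 > len(words):
            raise ValueError("truncated field header")
        tag = CHARSET[words[i]]
        length = (words[i + 1] << 5) | words[i + 2]  # 10-bit big-endian length
        start, end = i + 3, i + 3 + length
        if end > len(words):
            raise ValueError("truncated field data")
        fields.append((tag, words[start:end]))
        i = end
    return fields


def words_to_int(words: List[int]) -> int:
    """Interpret 5-bit words as one big-endian integer."""
    n = 0
    for w in words:
        n = (n << 5) | w
    return n


def interpret(fields: List[Tuple[str, List[int]]]) -> Dict[str, object]:
    out: Dict[str, object] = {}
    for tag, data in fields:
        name = KNOWN.get(tag)
        if name is None:
            continue  # unknown tag: skip it, so older readers survive new fields
        out[name] = words_to_int(data) if tag == "x" else data
    return out


# An expiry of 60 seconds ('x', two words) followed by a made-up 'z' field.
sample = [6, 0, 2, 1, 28] + [2, 0, 1, 7]
parsed = interpret(parse_tagged_fields(sample))
```

Because every field announces its own length, a reader that does not recognize a tag can still step over it, which is what lets new key-value pairs be added without breaking existing software.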
- Why was this "recipient makes invoice, sends it to sender, sender uses invoice to construct payment" approach chosen originally?
- Because... Reasons. I haven't looked up this specific history. But there are proposals for direct payments straight to arbitrary nodes without invoices, see e.g. the "Keysend" proposal in bLIP-0003. See also offers in the next section.
- Why the offers format?
- Because of the limitations of invoicing, a.k.a. payment requests, as described above.
- Because of the dangers of invoice reuse.
- Because of the signature applying to the whole invoice, making it impossible to prove it without revealing its whole contents.
- Because of the payment secret approach being only safe as long as the invoice remained private between recipient and sender.
- To support the "give me money" payment flow, where the publisher, as a recipient, publishes a "give me money" type of offer, senders in the network request a unique invoice (using the `invoice_request` message), and the recipient sends each of them their respective unique invoice, to which they make payments.
- To support the "take my money" payment flow, where the publisher, as a sender, publishes a "take my money" type of offer of a specified amount (which will also serve as a refund), and recipients in the network send him invoices for said amount, to which he makes payments.
- To enable both payment proofs and payer proofs, because as per BOLT-11, the Lightning Network can only prove that an invoice was paid, but not who paid it, because after the payment is made, every node in the route has the preimage/payment secret.
- To have a precursor to invoice creation. I.e. an offer describes what someone has or needs, such that from it one or more invoices will be created, and sent, within the network. Thus, offers outlive invoices, and amounts can be specified in non-lightning currency.
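The "give me money" flow can be sketched as a tiny simulation. It is a rough model under stated assumptions: `Offer`, `Invoice`, `Recipient`, and `handle_invoice_request` are hypothetical names, and the signatures, encoding, and extra fields that real offers carry are all omitted.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Offer:
    """Long-lived and reusable: describes what is being sold, not a payment."""
    description: str
    amount_msat: int


@dataclass(frozen=True)
class Invoice:
    """Short-lived and single-use: tied to one payer and one payment hash."""
    offer: Offer
    payer_id: str
    payment_hash: bytes


class Recipient:
    """Hypothetical publisher of a 'give me money' offer."""

    def __init__(self, offer: Offer):
        self.offer = offer
        self._preimages: Dict[bytes, bytes] = {}  # payment_hash -> secret

    def handle_invoice_request(self, payer_id: str) -> Invoice:
        # A fresh payment secret per request, so no payment hash is ever reused.
        preimage = os.urandom(32)
        payment_hash = hashlib.sha256(preimage).digest()
        self._preimages[payment_hash] = preimage
        return Invoice(self.offer, payer_id, payment_hash)


shop = Recipient(Offer("1 pound of steak", 25_000_000))
invoice_for_alice = shop.handle_invoice_request("alice")
invoice_for_bob = shop.handle_invoice_request("bob")
```

Both invoices point at the same offer but carry different payment hashes, so the offer can stay published indefinitely while each invoice is still used only once.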
External Resources #
- (Text/HTML) Lightning Network In-Progress Specifications @ GitHub
- (Text/HTML) Lightning Onion @ GitHub
- (Text/HTML) Lightning-dev: An Alternative Onion-Routing Proposal @ lightning-dev
- (Text/HTML) Noise Protocol Framework
- (Text/HTML) Authenticated encryption @ Wikipedia
- (Text/HTML) Understanding Lightning Invoices
- (Text/HTML) BOLT #12: Flexible Protocol for Lightning Payments