Back to Articles

Do I really need to know math?

I’m going into computer science. Do I really need to know math?

Someone recently asked me more or less this exact question. It’s a really good one! If you have limited exposure to computer science it might not be obvious the ways a strong mathematical foundation can help (or for some things, be a pre-requisite). Let me show you a few ways in which knowing math is very useful.

As the title suggests, this article is intended for those late in high-school or early university who aren’t quite sure why math is so important. There’s still some interesting content in here if that doesn’t describe you — I cast a pretty wide net about what I talk about and avoid getting into the nitty-gritty for the most part, so there might be something that you are unfamiliar with! If the idea that there’s so much math in computer science is off putting to you, I would just say that a lot of the math that we’re going to see here is very different from what is taught in the American education system before university. For more details on what I mean by this, you should read by forthcoming blog post on mathematics education.[1]

tl;dr: Computer science as a field is fundamentally based on mathematics, just not mathematics as you know it from high school. It pops up everywhere. Here’s some examples.

Complexity: the basics

This is the most inescapable math-heavy part of computer science. Even if you hate math, aren’t willing to learn anything that even sort of smells like math, and just want to crank out javascript for web apps all day, you are going to have to have at least some understanding of algorithms and complexity. What do I mean by complexity? As a bit of background, an algorithm is just a list of instructions to accomplish a certain goal. A classic example of algorithms would be algorithms for sorting an array, such as bubblesort, quicksort, etc. By complexity, I’m referring to a class of mathematical analysis used to figure out how efficient (very generally) an algorithm is. The foundational concept for complexity is this idea of big O complexity (or more generally, asymptotic analysis), which describes how much longer an algorithm takes relative to the size of its input. Without getting too deep into the weeds, this type of analysis is extremely important because it allows you to reason about how fast your programs are without running them, and be able to tell how good an algorithm is for large inputs, all while not getting bogged down in the exact details which become much less important for large-sized inputs. Unsurprisingly, being able to write fast, efficient code is deeply important for all branches of computer science, so this is one of those things that you absolutely must learn if you want to go into computer science. I don’t say that lightly either — nothing else I’m going to discuss in this post is as essential for everyone as asymptotic analysis.


In computer science, there’s set of “canonical” well studied algorithms for solving various problems. The math comes in a few places. First, the methods that computer scientists use to study these algorithms are based in mathematics. We have already discussed asymptotic analysis (which is used a lot in algorithmic analysis), but there’s more. We often want to show that out algorithms are correct. This means that we have to do mathematical proofs showing that certain properties hold about them — e.g that they will always produce the correct output given valid input. We can also prove things about the data-structures that they work on, e.g one can prove how many nodes are in a given row of a complete binary tree.

Secondly, the tasks that these algorithms actually perform are often very mathematical in nature. Often, algorithms work on things like graphs and trees, which are studied in depth in the mathematical fields of graph theory. You might be surprised just how many problems can be framed in terms of a graph! They involve mathematical concepts in their implementation like a hash function like you get for hash-tables. Perhaps unsurprisingly a lot of algorithms are just for doing certain mathematical computations, e.g a fast Fourier transform, matrix multiplication, etc. Math tends to pervade a lot of other fields (not just computer science!) so being able to do a lot of complex math very quickly is really useful for other fields of study.


If you have done some programming before, the question of how do you go from very textual, basic programs to something as complex as a 3d renderer or a video game may have come to mind.

The answer is linear algebra. A lot of it.

You may have heard talk about how many polygons a particular character or level has in a 3d game. It’s easy to visualize in your mind that 3d art is made of a bunch of small simple shapes smushed together, but it may not be immediately obvious that these are going to be stored in the computer as matrices, and actually figuring out how take all of these matrices and determine how they should be displayed is a very mathematically intensive process. You have to worry about shading, rasterization, and a bunch of much more complex things that I don’t really understand because I am not a computer graphics person. The field of computational geometry is a highly related field of computer science which, as the name might suggest, involves a lot of geometry.

The exact details on how this is actually performed, both on the software and hardware level is also mathematically dense. GPUs are just a huge collection of processors that are very good at doing huge amounts of mathematics at once, making them extremely fit for things that come up often in graphics, such as matrix multiplications.

This does not even begin to get into the actual gameplay of most games. Doing things like simulating gravity, collision detection, game AI, and a lot of the other basic things that you expect from video games are mathematical in nature. We’ll come back to AI, but for now, let’s move on to the next topic

Program Analysis

When we write code, we generally would like it work, and without bugs at that. One way of helping to make this the case is by analyzing a program to make sure that it has certain properties. Most techniques for doing this are very mathematical in nature — let’s go over a few examples. Note: some of the things I discuss here might not fall under a strict interpretation of the term program analysis, as I’m also including analysis that is part of the language.

Type checking

If you’ve ever done any programming, you probably have some idea of what a type is. Something that is very important to writing correct programs is type safety — the idea that an error due to the use of an incorrect type (such as subtracting a string from a number, or trying to get a field from an object that doesn’t exist) should be avoided at run time. Not every language does this, of course, some languages simply have incomprehensible behavior that causes bugs and security vulnerabilities instead. It might seem slightly silly at first — I’m an adult, i know to not subtract letters! — but the amount of bugs and pain that not having this being checked ahead of time is immeasurable; as someone who has worked on a large javascript codebase, it has caused me personally dozens if not hundreds of hours of pain. It’s incredibly frustrating to be stuck on a problem for a long time and come to the realization that an assumption that you made that a type checker would have caught is wrong, and the issue is that the data was of the wrong type.

What is a type checker, though? it’s a well named term — a type checker is a part of a programming language that makes sure that your program is either type safe, or doesn’t allow it to run (e.g by not compiling). The idea being that it forces you to recognize the already existent bug in your program and deal with it as you write it rather than discover it as it crashes your program and ruins your afternoon. How these systems work is very mathematical — the idea of a type originates from type theory, which was invented as a formal basis for all of mathematics. It was then later realized that this system could be applied to programs to prevent large swathes of bugs from being run at all. The way that these systems for preventing bugs are organized is also very mathematical — type systems are described used something called typing rules, which are based on (or at least very similar to) the natural deductions of formal logic. These allow one to encode very specific and powerful properties about the types of your program — we’ll talk more about advanced type systems when we get to formal proofs and functional programming.

Memory Saftey Analysis

You may have heard of the rust programming language — the main attraction of the language is that it guarantees that your programs don’t have memory errors — which if you are unfamiliar, are a class of errors that come up a lot when doing low level programming — while not being any slower at runtime. This is a hugely useful property — according to microsoft, 70% of all security vulnerabilities are caused by memory errors (assuming their sample is representative). Most programming languages solve this problem by using something called a garbage collector, which takes the onus away from the programmer to manage memory, and does it (at the cost of some speed) at runtime. Rust is neat in that it does a type of program analysis called borrow checking to manage memory ahead of time. This analysis, you guessed it, is based on a mathematical logic called separation logic, which is based on something called Hoare logic, which we will talk more about when we get to formal proofs.

Flow analysis

Say you don’t want some secret piece of data (like your password) be sent over the internet to scary hackers. How can you show that the program you just wrote doesn’t do this? One way is to use something called an information flow control system. This is a way of managing the flow of information (the study of which is studied in the mathematical field of information theory) such that huge classes of security vulnerabilities are prevented. I will spare you of most of the details, but the use of these systems involves putting different levels of security in a mathematical object called a lattice or two and marking your variables with these security levels.

Keeping track of the flow of your program is also very useful for doing optimizations on your program — many programming languages use data flow analysis and control flow analysis, to optimize their programs. These analyze where the data in your program can go as it runs, and which parts of your program can be executed (and in which order). I am very much am at the risk of getting too into the weeds here, so I will just say that these rely heavily on various ideas in mathematics (Graphs, Lattices, Fixpoints, etc) and leave the details as up to the reader to discover.

Model Checking

Model checking is a method in which you model the possible things that your program can do in a precise mathematical way, and then use mathematical magic to determine that behavior that you don’t want (such as, say, the door on your elevator not opening while it is moving) cannot occur. Most model checking systems have you mock your program in a mathematical language of some sort, such as first order logic, and then using one of many different mathematical techniques to determine it’s correctness. A simple way some systems do this is by determining if the description is mutually satisfiable with the property you want to be true.

Logic programming

I’ve been talking about logic a lot in this post so far. If it’s so cool and applicable, why don’t we program in it instead!? This is the (presumably exact) thought that Alain Colmerauer had in 1970. Prolog is a programming language where, in essence, you write logical formulas that describe what you want, and the computer does a search to find data that fits that description. This is a really different way of programming — it allows you describe certain problems with incredible ease — taking the amount of lines you have to write from tens from hundreds. It even allows you to generate prolog programs based on these mathematical formulas! Prolog has a really cool property called homoiconicity where it can treat it’s own code as data, which makes it really easy to modify the behavior of the language, especially when combined with the logic-formula paradigm of writing programs. This idea of logic programming is also really applicable to databases — datalog is a programming language that is used to make queries on a database (like SQL, if you are familiar with that) that is based in logic programming. Prolog is based (in part) on something called higher order logic which means that it can do things like quantify over predicates, which in turn means that it can do things like describe a logical rule and then try to find rules that match that description (among other things). While i’m talking about the particular logic that it’s based on, prolog is specifically based on a subset of logical formulas called Horn clauses. Logic programming is one of those things that is more or less a field unto itself, so in lieu of talking about it all day, I would recommend taking a look at prolog if you are looking for something really different than programming as you are used to it!

Foundations of Computer Science

This is the big one. What is it that a computer can do? What separates things that are possible from what are impossible? What is computer science, and what is outside of it? It turns out that we have a pretty good answer for most of that — Turing completeness.

In the early 1900s there was a huge debate among the titans of the math world about the basis of mathematics. A debate so large and important that it is known as the foundational crisis in mathematics. The question was about the foundations of math — the concept that math was based on before then, naïve set theory, was found to have fundamental issues, such as Russell’s paradox. Skipping a few steps in the middle, as part of the search for a new basis for mathematics, the mathematician David Hilbert ran an intensive search for a foundation that is consistent (cannot derive a contradiction), complete (every true statement can be proven true), and decidable (every statement can be decided (by an algorithm) to be true or false). The last requirement is known as the decision problem, or in German (the language of math at the time), the Entscheidungsproblem.

The answer to this search was a very unsatisfying one for Hilbert — no such algorithm can exist, because not all statements are able to be computed to be true or false by a theoretical device called a Turing machine, and anything that can’t be computed by a Turing machine can’t be computed at all. The study of Turing machines (by Turing) was the first time that the abilities of a computing machine were ever studied in such mathematical detail — there was some notion of what computation was outside of human thought due to mechanical computing devices (such as those Ada Lovelace wrote programs for), but what exactly a computer is, and what it could do were band new ideas that were just now being studied. As you may know, Alan Turing is often labeled as the father of computer science — and with good reason! His most groundbreaking, important work (in my opinion), one that was essential to the development of computer science as a whole, was a mathematical paper written in response to a famous mathematician, and began the field of studying computers and computing mathematically, which we know today as computer science. Mathematics is at the very core of computer science. The very idea of what is and is not computer science — what can be done by a computer, i.e what is decidable, is a notion that is mathematical, and was derived from a mathematical program.

Complexity/Computability: Slightly More Advanced

As mentioned in the previous section — the idea of what can or cannot be computed by a computer is an essential question in computer science. It turns out that this problem was not solved in the 1930s with the work of Turing, but rather just begun. There is actually a huge field of study determining exactly what problems are and are not decidable. There are a bunch of interesting (mathematical) techniques one can use to determine this, such as reduction, or diagonalization. As well as just determining what is possible, we can also figure out what is able to be efficiently done, which is what is studied in the field of complexity theory. The key idea is that there are different categories of how complex a problem is — categories more general than those we get form basic asymptotic analysis. The two categories that (i would say are) the most important are P and NP, short for polynomial and non-deterministic polynomial time. P is the category of problems that can be solved by a Turing machine in (asymptotically) polynomial (e.g 1,n, n2 , n3, etc) time. NP is the category of problems that can be solved by something called a non-deterministic Turing machine in polynomial time. There’s a lot of interesting mathematical work that has been done to lift this questions away from Turing machines to more general structures, the details of which are cool but out of the scope of this post. The general intuition, though is that generally we know of ways to do problems in P pretty efficiently, while it takes a lot more (computing) time to do problems in NP. Within NP there is another set of problems called NP-complete problems, which means that it can be used to simulate any other NP problem (given the right input), which means that if a way to do that problem effeciently is found, any other problem in NP can be done efficiently. A lot of very difficult important problems are in the class of NP — famous problems like the travelling salesperson problem, or the boolean satisfiability problem which have lots of real world applications. The key thing here is that nobody has been able to either (A) find a way of doing NP problems efficiently (i.e in Polynomial time), or (B) shown that you can’t solve NP problems in polynomial time. This question of if NP problems can be done in polynomial time (i.e if P=NP) has been eluding computer scientists for a long time — in fact the P=NP problem was one of the Millennium Prize Problems — mathematical problems identified as some of the most difficult and important problems in 2000, which have a million dollar prize attached to them.

Additionally, there is another system called the Lambda Calculus which has been proven to be exactly as powerful as Turing machines. The Lambda calculus started off as a way of creating proofs, but was shown to be inconsistent (meaning that you can mathematically prove a contradiction). There are ways to fix it and still use it to create proofs (more on that soon), but people noticed that it was also really good and useful for modeling computation, and then noticed that lambda terms were basically programs, which lead to the development of…

Functional Programming

Functional programming is a type of programming based on the lambda calculus. Instead of writing a series of instructions that are then executed in order, you write a function, and then evaluate it algebraically, more or less as you do with the lambda calculus. In fact, functional languages can be thought of as just extensions of the lambda calculus. I’m not going to get into the detail of how the lambda calculus works, but know that it is based on a certain set of rules for simplifying mathematical terms over and over again until you reach the value you want. Unsuprisingly, this leads to a very mathematical way of writing your programs, which has a lot of nice properties. In functional languages there are no (or much more limited) side-effects: when you run a function with the same input it should return the same value — there are no variables that change state to make this not true. Not being able to change variables once you make them (and as such be able to write loops…) might seem like a huge disadvantage, but it turns out that there are a lot of benefits, and that you can do everything you would want to do with side-effects with a functional programming language using the (you guessed it) mathematical ideas of recursion, and higher order functions.

Sometimes, though, you will want to do something that has a side-effect, like perhaps reading or writing from a file. You could just make your functional programming language do this without worrying (and make it “impurely” functional), or you could make sure that you manage this effect in some way. Why would we want to do this? Normally when create side effects in our programs, they just are made without us having to think about it at all. As such, it’s a lot easier for it to sneak up on us and cause problems where we don’t expect it. All to often some variable will be changed incorrectly somewhere, or some bad data will be read from somewhere, and then that change will propagate through the entire program until a crash is caused somewhere totally unrelated, and you have to spend all day ripping your hair out figuring where the problem was caused. All of that is a thing of the past if you use less side effects, and manage those that you do! It turns out that for managing side-effects, we can borrow more ideas from mathematics to help us out! From the field of category theory, there is a concept called a monad. This is pretty deep in the muck of some pretty advanced math, but just trust me that the structure that we see in that part of math we can take, put into our program and manage state with it. Just to use them in your code, though, you don’t need to understand how the exact mathematical rules work and why they are the way they are, you just have to use them (which is not particularly hard). If you want details on how this works, I would recommend learning you a bit of a haskell, and then reading one of the many monad tutorials out there. For now, just notice that it’s pretty cool that you can use this random mathematical object to deal with a seemingly unrelated problem in programming :). There’s actually a bunch of different mathematical things that influence functional programming, such as functors.

In fact, functional programming is so related to mathematics, that there is a correspondence between functional programs and mathematical proofs called the Curry-Howard isomorphism. How does this work, you might ask? Well…

Formal Proofs

There’s a lot of different ways to extend the basic lambda calculus. You can add types and polymorphism (generic functions), and add type constructors (so you can make new data types like lists), like you get in a lot of functional programming languages — such as Haskell and OCaml. There is another (very theoretically important) thing you can add — something called dependent types. This allows you define new data types that depend on values (like having a type that is a Vector of Length 3). Through some various type magic (which is incredibly interesting but somewhat out of scope), we can use this to build program terms that represent proofs. That we can both prove mathematical things using our programs, which is really useful because then the machine can check if the proof is correct! It’s really cool (to me, at least) that taking a programming language that people developed as a tool to make software, and then just simply extending it, you get something that is the “same” (up to a point) as mathematics. Computer scientists started using it because it’s useful, and then it just ended up being really useful to mathematicians. This is perhaps more computer science implementing mathematics than the other way around, but this does have cool effects on how we can write programs. These proof systems can also be leveraged to write programs with really strong guarantees — as well as being able to prove mathematical facts, you can prove (e.g) that your program works as you expect it to (in specified ways). This is the core idea behind proof-assistant languages like Coq and agda. There are also other programming languages that can do these things, but are designed to be primarily for regular programming, just with extra benefits and guarantees, the most notable of these being idris. This allows you to write much more correct programs, albeit at the cost of having to put extra effort into typing your program. Making these work better and allowing them to prove more things more easily is a huge field of study unto itself.

Formal semantics

How do we actually model these programs? We can prove properties of mathematics using the techniques from the previous section, but how do we actually represent (particular) programming languages to reason about them? There are a few ways (unusprisingly), chief of among these being operational semantics, and denotational semantics. In operational semantics, we view an execution of a program as a series of steps, each of which has a particular effect and meaning, and describing how these compose. The way we actually deal with this is by constructing some sort of mathematical model (such as a particular type of set), and manipulating this in accordance to constructed rules governing the effect of each step, analyzing the resulting sets, in particular how they relate to other programs, under relations like bisimulation. Denotational semantics work in a slightly different way, where we give particular meanings (called denotations) to individual terms in a language. This is a somewhat broader category than operational semantics, so I will allow the wikipedia article to give a more in-depth overview.

On a somewhat different level, when working with programming language we often want a model, for it, that is, a mathematical object that corresponds to how we think about the language. Common models use category theory or topos theory. This is a very active (and mathematically mature!) field of research.

All of these are very important both when creating a new programming language, a new programming language feature, or reasoning about how your program works, very pratical things.

AI/Machine Learning/Natural Language Processing

This is the big buzzword of the current era of computer science. I’m honestly kind of sick and tired of hearing and talking about it, so I’ll keep this part brief. Most of the things you hear about nowadays when it comes to AI are machine learning based. Machine learning is based on the notion of a neural network, a deeply mathematical and statistical structure that attempts to “learn” from inputs using these calculus-based techniques called gradient descent and backpropagation, which is based on multi-variable calculus and linear algebra. There’s also lots of different types of neural networks, all of which vary a lot depending on what mathematical tools glue them together. To just take one example, a common type is convolutional neural networks, which utilize the mathematical notion of a convolution, and idea from statistics.

There also exists more traditional AI methods which are still used today that are highly based in mathematics. Chess engines are traditionally (and still, for the most part) based on something called Alpha-Beta Pruning, which utilizes an analysis of trees to search all the possible moves that can be made more efficiently. Huge classes of AI problems can also be solve with other methods, like Monte Carlo simulation, SMT solvers, and so on, all of which are, you guessed it, applied mathematics.

For natural language processing, most of it nowadays is machine learning based, but some particular problems use more traditional (highly mathematical) methods. To give a quick example, morphological analysis is most commonly done using Finite-state transducers).

Coding/information theory

Say I want to send a message with some piece of information, but it may be corrupted on the way over (cosmic rays…), how do I make sure that my message gets received intact? Take the similar situation where I have a movie on a disk. How would I make sure it will still play even if it gets scratched? I could just send/store more copies of the same data, but that’s massively inefficient and leads to the problem where you can’t tell which is the original if one gets corrupted. The much better way of doing this is to use something called error-correcting codes. With these, you can tell if a message has been modified and perhaps even fix the error at the receiver, all while sending a shockingly small amount of extra data. If you want to know more about this, I would highly recommend the excellent 3 blue 1 brown video on the subject.

Say you want to store or send some data while taking the least amount of data possible, something everyone who uses a computer wants to do at some point (Totally uncompressed video takes gigabytes of data per second of video). What’s the best way to do this? The answer depends heavily on what you’re compressing, but is always highly based in mathematics. If you don’t want to lose any information, a common solution is to use Huffman codes, and if you don’t mind losing some data, (e.g some audio quality), transforms like the discrete cosine transform.

The study of data compression, error correcting codes, and cryptography, are all part of coding theory. Coding theory is itself in turn part of the greater field of information theory, created by one of computer science’s few true geniuses, Claude Shannon, who also first provided an equivalency between electronic circuits, and something called boolean algebra, a result which (in my opinion) was a prerequisite for computer science forming as a field.[2] Information theory is a massive field that I of course cannot do justice here, but generally formalizes the idea of “information,” and the provides many many tools for reasoning about how much information one gets from a message or system under various different circumstances. A fun introduction to more of the principles of information theory can be found in 3 blue 1 brown’s video on wordle.

A case study: Cryptography

Cryptography owes its entire existence to the results of abstract algebra and number theory. To see exactly why requires some knowledge of how the techniques actually work, so it makes the most sense to go deep rather than wide for this topic. I’ve also been keeping very high level for this post so far for the sake of not turning it into a thesis, but the lack of depth perhaps doesn’t convey how important math is for each of these. It will be good to see an example in detail. So then, let’s go into a widely used technique/algorithm in cryptography: Diffie-Hellman Key Exchange.

Diffie-Hellman Key Exchange is what is known as a public key protocol, where there is one key that can be spread everywhere (including to the party you are trying to communicate with) without worry, and a private key that must be kept secret from everybody. In some algorithms this is used to encrypt/decrypt data directly, but here we will be using it to share one secret. Contrast this with symmetric key cryptography, where the same key is used for everything. Often we want to use symmetric key cryptography (because, for example, it’s faster), but that requires that each party already shares the same key. How do you get the same key to begin with without other people also getting it? Diffie-Hellman, at the highest level, is just the idea of creating and sharing particular public keys that, when you combine them with the other party’s private key, leads to the same secret. What actually are these keys, though? What do I mean by combine?

The answer lies in the integers modulo a prime number. The modulus operator is (basically the same as) the idea of a remainder. The symbol for it that’s usually used in computer science is %. As a quick example, 7 % 3 = 1 (since 2*3 = 6, and 6+1 = 7). Modular arithemetic is the idea of doing all of your regular mathematical operations but instead of checking if two numbers are equal, you check if the result of using modulus on each number. So then, if I am doing arithmetic modulo, e.g, 7, then instead of checking if 9 = 2 (which I’m pretty sure it isn’t), i instead check if (9 % 7) = (2 % 7) (which it is). In modular arithmetic, we represent (a % n) = (b % n) by writing a ≡ b (mod n).

With that little bit of background, we can now see how we use that to get public key cryptography, and what it has to do with group theory. Let’s say Alice wants to have a shared secret with Bob. How should they go about doing this with Diffie-Hellman? Our first step is to choose some number n we are going to do modular arithmetic by. What number should we choose? All of our secrets that we want to keep hidden are going to represented as some number modulo n, so we presumably want to make it pretty big to prevent guessing, but do we have to care about anything else? It turns out that this number needs to be prime. This is because when this is the case, the integers modulo n form a finite field. We will soon see what this means and why it is important. Now that we have n chosen, we need to choose some integer to be our base, which we’ll call g. Now that we have these numbers, we can calculate the modular exponentiation of this number. Each party chooses some secret number, let’s say it’s a for Alice and b for Bob. These will be our private keys They then calculate g^a % n and g^b % n respectively, which will be our public keys. Here’s the kicker: it turns out that (g^a % n)^b % n = (g^b % n)^a % n, no matter what a and b are, so if Alice sends Bob her public key and vice versa, all Alice and Bob need to do is take the other person’s public key, take it to the power of their private key, and they’ll end up with the same number. That is, if Alice sends Bob g^a %n and Bob sends Alice g^b % n, then Alice can calculate (g^b % n)^a % n (Bob’s public key to the power of Alice’s private key), and Bob can calculate (g^a % n)^b % n, which gives them both the same number! Since they only ever share their public keys, this ends up being secure, because it turns out that finding this number without access to at least one of the private keys is really hard.

Why does that turn to be the case? The fact that those numbers end up being the same certainly isn’t obvious. It turns out that this is a property of finite cyclic groups, something from abstract algebra that these integers modulo n end up being. When you think about the problem in terms of finite cyclic groups, why this is the case is actually not a particularly difficult to see (which, for the sake of not turning this into an introductory group theory course, you’ll have to trust me on). This totally seperate field of math ends up giving us this really nice property that makes this whole thing possible, in a way that is elegant and totally non-obvious unless you do this mathematical analysis. This is also related to why n has to be prime. If this isn’t the case, then there are actually a bunch of numbers that you can also use! That is, if n is prime, the only solution to (g^a % n)^b % n = (g^b % n)^x % n is where x=a, but if n is not prime, there’s going to be a bunch of of other numbers that satisfy this! This makes your secret a lot easier to guess, which is obviously bad. It turns out that this is true due to a result in group theory called Lagrange’s theorem.

If you’re still confused after this, I would check out the excellent wikipedia article on the subject, which I’ve taken a lot from. You don’t need to understand the abstract algebra (like the group theory) to understand most cryptographic algorithms, but the reason the work, and why they made certain descisions when making them are all due to this math. Other cryptographic protocols are also steeped in mathematics, some examples are RSA and AES.

Speed round: Things I know less about but are math-heavy

Simulations/High Performance Computing

Do you ever wonder what people actually use supercomputers for? The answer is almost always scientific simulations. This can be of many different things, from simulations of individual particles, universes, how different vehicle designs experience drag, etc. We have lots of equations that describe how the physical world works, often with a huge degree of accuracy, but the amount of calculations required to model significant systems in the real world is immense, so these huge computer systems have to be created to accommodate them.

How do these simulations work? Perhaps unsurprisingly at this point, more often than not it involves modeling the objects of study with some sort of matrix representation, and then encoding the equations that govern how the objects interact as linear-algebraic operations on these matrices. Often designing these systems requires interpreting and implementing a computational representation of things like differential equations.

Data Science

The world is drowning in data. How do we deal with this? Perhaps unfortunately the answer has not been “stop collecting so much data,” but rather develop statistical techniques to understand the properties of this data. What exactly this entails I think varies a lot, varying from more or less stock statistical analysis (Computing the mean/variance/standard distribution, those sorts of things), to machine learning, to more advanced statistics, the details of which I will leave out here due to my unfamiliarity. My outsider’s impression is that a lot of it is machine learning, though. It seems like a big hammer you can smash your problems with that will probably provide something approaching correct and useful data.

Networking/Queuing theory

People love to use the internet.[citation needed] [3] How do we make sure it works as well and as fast as possible (at least on a technical level)? There are 2 levels of analysis that are used that I’m going to point out here. There’s the network theoretic analysis, which views the entire network as a graph (in the graph theory sense), and then analyzes the connections and distances from point to point in that way. The way a computer network is structured is called the network topology. There’s also the level of the individual messages (called packets), and thinking about how to best send and receive those messages, structure the protocols which they are defined by, and structure your network to lead to the least loss and most performance. This idea is called packet switching, and is studied using something called queuing theory. Queuing theory is also used a lot in determining how networks of computer servers should be constructed.

Signal Processing

Turns out that a lot of things that we use all the time (radio signals, wifi/cellular, audio, images, electricity in general, etc.) can be viewed as a sort of signal. Who knew? This might seem like a niche topic, but you might be surprised how often it comes up. Say that you are making a guitar pedal, making a synthesizer, or writing software for making music (like a digital audio workstation), all the effects and such that you would want to use there are different ways of processing signals. Say that you’re doing more low level work and need to write the firmware for a network card, oscilloscope, or most things electronic. You’re going to have to process a lot of signals. This is hugely applicable to radio in particular, both the traditional idea of radio (like amateur radio set up your grandpa has in his shed, or what your car can pick up), and the ones that are used all the time as a backbone for wireless digital communication in general, like bluetooth, wifi, cellular data, etc. I’m not as familiar with how these work internally as much as I am how to use them (I do know Fourier transforms appear a lot …), so I’ll let the Wikipedia article for digital signal processing do most of the talking.


There is of course a huge pile of topics I haven’t even grazed here (domain theory, computer algebra systems, other applications (like economics, …), and so on). The amount of topics is truly staggering. Any time you have some sort of structure that you need to reason about within the context of computer science (that is to say, any time you use a computer), there is certainly mathematiccal construct that underlies it. Math is everywhere. It’s not always the same as what you’ve seen before, and it’s worthwhile to know how to use it.

[1] Which isn’t out yet. tee hee :3. [back to location]

[2] I want to take a second to point out how amazing and important this was. It took developing electronic circuits from this totally bespoke and more or less informal process to a rigorous science which allowed for the rest of computer science to form. Absolutely crazy to me that this was just his master’s thesis, and that he would go on to make many more huge, foundational contributions to the field. Worshipful of this man.[back to location]

[3] You don’t mind it too much presumably, given that you’re reading this.[back to location]