I'm busy working on my blog posts. Watch this space!
I Can't Get No Satisfaction (or maybe you can)
July 20, 2018
"I can't get no satisfaction. Cause I try and I try and I try and I try" - Mick Jagger
Mathematicians involved with a set of problems known as 3-SAT problems know exactly just how Mick Jagger feels. You try and you try and you try ....
But wait, we're getting a bit ahead of ourselves; let's take this gently. First of all, even though this is a relatively introductory article, we're going to assume that you have at least a very basic knowledge of Boolean algebra : the concepts of TRUE and FALSE and the operations AND, OR and NOT. You comfortable with that so far? Excellent, let's move on.
Next, let's align on some basic notation. Anybody who has done any kind of Boolean work in a maths class before will be probably more accustomed to the signs ∧ (and) ∨ (or) and ¬ (not), whereas people who came to it via computer programming may be more familiar and comfortable with && (and) || (or) and ! (not). For the avoidance of any confusion and to try and make things easier, for now we'll just stick with the words AND, OR and NOT.
Now, let's set up a quick equation:
x = a AND (a OR b)
Are there any values for a and b for which the output (x) is TRUE? A quick look-over this very simple formula should reveal that yes there are: a must be TRUE, in which case b can be either TRUE or FALSE, it doesn't matter. On the other hand, if a is FALSE then x can never be TRUE regardless of the value of b.
Not all equations can resolve to give an output of TRUE, for example take a look at this one:
x = a AND (NOT(a) AND b)
Now, regardless of what values we pick for a and b, the equation will always give the answer FALSE: try it for the different values of a and b. Whereas the slightly different equation
x = a AND (NOT(a) OR b)
can be made to resolve to TRUE (in this case, if and only if both a and b are TRUE)
Okay, off to a good start, now you're getting the general idea. We can now go to immediately creating what is known as a SATISFIABILITY problem. The steps for this are very, very simple.
step 1: Make up a Boolean logic equation with as many variables as you like (a, b, c, d, e, ...) - and you are allowed to repeat the variables in different places
step 2: Ask the question (note: this is known as a DECISION problem, because we are looking for a straight yes or no answer) - which is simply: "can you assign values (TRUE or FALSE) to each of the variables such that the whole equation equates to TRUE?"
And that's basically it!
Okay, all well and good, but how about these problems known as 2-SAT and 3-SAT problems that keep cropping up whenever there is a discussion about P versus NP. Well, there are a couple of little extra rules to be followed which are as follows:
The first rule is that whatever Boolean equation you concoct must be in what is known as a Conjunctive Normal Form (known to mathematicians as "CNF") Aaargh, this sounds horrible, but basically what this means in practice is:
a. You are only allowed to use the operators AND, OR and NOT (together with brackets to enclose things to denote precedence as you would do in a normal algebraic equation). In other words, for anyone who has ever done any electronic hardware design, you are not allowed to use the more exotic NAND, NOR, XOR operators for example.
b. The expression must be presented as an "AND of ORs" (to roughly paraphrase, the things inside brackets must be OR-ed together (you are also allowed to negate any of the elements inside the brackets with a NOT) and then the sets of brackets must then be all AND-ed together.
For example, these two equations are in Conjunctive Normal Form:
(a OR b) AND (c OR d)
(a OR NOT(b)) AND (a OR b)
NOT(a OR b)
(a AND b) OR (c and D)
are both not in Conjunctive Normal Form.
These two restrictions seem at first vastly limiting to the number of equations we can come up with, but in practice, this isn't really the case: due to a variety of techniques and mathematical jiggery-pokery (double negative law, De Morgan's laws, and the distributive law for example) which we will not go into here, so for now just trust me on this until you read up further, we can in fact turn every propositional formula into an equivalent one which is in CNF.
For example (a AND B) OR c ... which is not in CNF
can be turned into (a OR c) AND (b OR c) ... which is!
The second rule basically concerns the number of elements (known as "literals") we can have within any set of brackets. If we allow ourselves up to two literals in each set of brackets, then the problem is known as a 2-SATISFIABILITY (or 2-SAT) problem. If we allow ourselves up to three literals in each set of brackets, then - you guessed it! - it's a 3-SAT problem.
There is nothing in the world to stop us creating 4-SAT problems, 5-SAT problems - or even a MILLION-SAT problem if we have enough paper to write it down on. In practice though, 2-SAT and 3-SAT problems can be quite hard enough, thank you, so mathematicians prefer to stick to those.
Here's a 2-SAT problem:
(a OR b) AND (a OR c) AND (b OR d) AND (c OR d) AND (d OR NOT(e))
This problem has 5 variables (a, b, c, d and e), 5 "clauses" (ie five sets of things in brackets) and 10 "literals" (ie. the equation contains two a's, two b's, two c's, three d's and an e)
Importantly, though for us, each clause (set of brackets) only contains two literals (things inside brackets. Note: NOT(e) still counts as one literal). This makes it a 2-SAT problem.
Moving swiftly on, as you're getting the idea quickly, here's a 3-SAT
(a OR b OR c) AND (a OR b OR NOT(b)) AND (b OR c OR d)
This one has four variables (a,b,c,d) three clauses, nine literals, and each clause has no more than three literals. The fact that there are three literals in each clause makes this one a 3-SAT. Incidentally, there are quite a few permutations of TRUE/FALSE assignments that make this 3-SAT equate to TRUE, but the sharp-eyed will notice that if b is TRUE then the whole equation will equate to TRUE regardless of the values of the other variables.
What's this got to do with the P versus NP problem? It goes back to the decision problem which we ask of our concocted equation: is there any way, yes or no, (any way at all, the number of different ways doesn't matter) that we can assign values to the variables to make the overall equation TRUE?
If someone comes along and says "yes you can, and here are the values of the variables to prove it" then it is very, very easy to CHECK that the answer is correct. On the other hand, finding these values can be a very much more difficult matter. Now, there are currently algorithms which can solve 2-SAT problems in Polynomial time. If your appetite is whetted enough, there are lots of resources on the internet which can talk you through this: one of the more easy-going ones is here : http://www.cs.tau.ac.il/~safra/Complexity/2SAT_Handouts.pdf
However, finding values to 3-SAT questions is a whole different ball game and there are no algorithms which can solve them in Polynomial time (although admittedly there are some very powerful ones) - the 3-SAT problem is in fact NP-Complete. 3-SAT is one of mathematician Richard Karp's notable "21 NP-complete problems", and it is famously used as a starting point for proving that other problems are also NP-hard - essentially by relating the other problems to an equivalent 3-SAT problem.
It is this particular nature of 3-SAT - the fact that other initially seemingly unrelated problems can be essentially turned into 3-SAT problems that make them so well worth investigating. For example - and at first a little surprisingly - the "Graph Colouring problem" can essentially be turned into a 3-SAT problem, as indeed can the famous "Knapsack problem".
This is only meant to be an introductory article, so that's all for now. YouTube is full of great articles on 2-SAT and 3-SAT at a whole range of levels for anyone who wants to find out more.
That's enough from me for now. As Mick himself might sign off : "It's only P versus NP but I like it"