# Count to sample

Hey all!

This week’s post is about an interesting relation between counting and sampling, motivated from an enlightening paper [MD02] by Prof. Martin Dyer. More specifically, the paper introduces dynamic programming as a technique to aid in approximate counting and uniform sampling using a dart throwing approach, however, this blog post is only about the example presented in the beginning of the paper where Prof. Dyer uses the count of the solutions to the 0-1 knapsack problem to sample from the set of these solutions uniformly at random in expected polynomial time (in the number of variables). You are encouraged to read [WK16] to learn more about the knapsack problem. I hope you like this post 🙂

So, the problem at hand is to be able to produce a solution to a given 0-1 knapsack problem which has been sampled uniformly at random from the set of all such solutions. Why would one require to do so? Well, among the many applications of the ability to sample uniformly at random from combinatorial sets, the one I encounter the most is in computer simulations and producing approximations. It is well known that the general knapsack problem is hard to solve, so counting the number of solutions will only be harder, let alone the problem of sampling one of these solutions. Hence, if we can approximate the count using some efficient technique, (almost) uniform sampling will become readily available. How?

Well, to answer this question, let’s first formalize the problem a little bit. The 0-1 knapsack problem can be written as the inequality $\sum_{j=1}^n a_j x_j \leq b$ for $x_i \in \{0,1\}$ for all $i$. Here, we assume that $0 \leq a_1 \leq \dots \leq a_n \leq b$ are all integers. Note that any single linear inequality in 0-1 variables with rational coefficients can be written in this form [MD02, WL75]. Denote by $S$ the set of all solutions to this inequation. Then, the problem at hand is to sample uniformly at random from $S$. So, back to the question. What if we don’t know $|S|$?

One way of sampling from $S$ without any knowledge of its size is to exploit the fact that we can efficiently check if a given assignment of 0’s and 1’s to our variables satisfies the inequality above, i.e. whether a given 0-1 assignment is in $S$ or not. This is crucial for obvious reasons, most important of which is that we will rely on this capability to accept only those assignments that lie in $S$ and reject others in a technique popularly known as rejection sampling. Start with randomly producing a 0-1 assignment of the variables and accept it if it satisfies the inequality and repeat otherwise. Simple, but not always efficient. Why?

Suppose $S$ is very small, say $O(1)$. Then, the probability of acceptance in one iteration when we have $n$ variables is $O(1/2^n)$, which is $o(1)$. Hence, in expectation, exponentially many rejections will happen before the first assignment is sampled, making the algorithm extremely slow. Can we do better? Well, let’s try Prof. Dyer’s approach.

The main idea behind the technique I am now going to talk about is eliminating the rejections above and directly sampling a solution to the problem. Since each solution must be sampled with probability $1/|S|$, it will be good to know $|S|$. Let us say we do. Now, fix a variable, say $x_3$. What is the probability that $x_3 = 0$ in a satisfying assignment to the problem? Well, since we are sampling uniformly at random from $S$, this probability is equal to the ratio of the number of solutions in which $x_3 = 0$ to $|S|$. Hence, it will be good to know the number of solutions in which $x_3 = 0$ after which the problem becomes trivial. So, how to compute this number?

[Enter dynamic programming.]

Let $F(r,s)$ be the number of solutions to inequality $\sum_{j=1}^r a_j x_j \leq s$, where $a_j$‘s and $x_j$‘s are the same as above. Clearly, we can write $F(1,s) = 1$ if $s < a_1$ and 2 otherwise. Also, observe that $|S| = F(n,b)$. To recursively compute $F(r,s)$, note that if we set $x_r = 0$, then we have the inequality $\sum_{j=1}^{r-1} a_j x_j \leq s$, whose number of solutions is $F(r-1,s)$. However, if we set $x_r = 1$, then we have the inequality $\sum_{j=1}^{r-1} a_j x_j \leq s-a_r$, whose number of solutions is $F(r-1,s-a_r)$. Thus, we can write $F(r,s) = F(r-1,s) + F(r-1,s-a_r)$ and solve the dynamic program recursively in $O(nb)$ time.

We are not quite done yet. We still do not know the number of solutions in which $x_3 = 0$. At least not directly. However, this is where the genius in [MD02] shines. We sample an assignment from $S$ as follows. With probability $F(n-1,b)/F(n,b)$, set $x_n = 0$ and with probability $F(n-1,b-a_n)/F(n,b)$, set $x_n = 1$. Once we have this done, use the dynamic program above to recursively set the values of the other variables until we are done. The claim here is that the resulting assignment is a valid solution to the inequality and is sampled uniformly at random from $S$. How? Here is why.

To see why the resulting assignment is in $S$ is easy. Once we assign the value to a variable, the dynamic program lets us sample the value of the next variable (or the previous one, whichever way you see it) based on the number of solutions in which the next (previous) variable is assigned a particular value. In other words, say we just assigned the value to $x_4$. Then, if $x_3 = 1$ is not a valid solution, the dynamic program will take care of it by itself.

To see why the resulting assignment is uniformly random, let us compute the probability $p$ that a solution $\{ x_1 = v_1, \dots, x_n = v_n \}$ is produced for some $v_i \in \{0,1\}$ such that $\sum_{j=1}^n a_j v_j \leq b$. We can write $p = Pr \{ x_n = v_n \mid x_1 = v_1, \dots, x_{n-1} = v_{n-1} \} Pr \{x_1 = v_1, \dots, x_{n-1} = v_{n-1} \}$. Since the assignment to $x_n$ is independent of other variables, we can write this as $p = \frac{F(n-1,b-a_n)v_n + F(n-1,b)(1-v_n)}{F(n,b)}Pr \{x_1 = v_1, \dots, x_{n-1} = v_{n-1} \}$. Now, we can expand $Pr \{x_1 = v_1, \dots, x_{n-1} = v_{n-1} \}$ similarly as $Pr \{x_{n-1} = v_{n-1} \mid x_1 = v_1, \dots, x_{n-2} = v_{n-2} \} Pr \{x_1 = v_1, \dots, x_{n-2} = v_{n-2} \}$. Again, the assignment of $x_{n-1}$ is independent of all others and hence, $Pr \{ x_{n-1} = v_{n-1} \} = \frac{F(n-2,b)(1-v_{n-1}) + F(n-2,b-a_{n-1})v_{n-1}}{F(n,b)} + \frac{F(n-2,b-a_n)(1-v_{n-1}) + F(n-2,b-a_n-a_{n-1})v_{n-1}}{F(n,b)}$. Keep going on like this to obtain that the probability $p = 1/F(n,b)$, which is what we wanted to prove.

Hence, we just saw a technique using counting and dynamic programming that allows us to sample exactly from a set of objects which would be difficult to sample from in general. This technique can be extended to many other combinatorial problems, however, the dynamic program to count may not be straight forward. However, it should now be clear that if exact counts are replaced by approximate counts (through some technique), then uniform sampling becomes almost uniform sampling. An important application of this is approximate counting of the number of perfect matchings in a given graph and then sampling (almost) uniformly at random one of these matchings.

The reverse direction is, however, pretty straight forward. If we knew how to sample, we can count easily. However, an exact count may not be possible using this approach, but we can count to a precision that is arbitrarily close to the exact count by using a technique similar to rejection sampling.

I hope this post was helpful in understanding this fascinating connection between sampling and counting of combinatorial objects. It is a huge area of research and lots of interesting applications have been explored. I am trying to learn more in this area and hope to come up with more fascinating examples in the future posts. Until then, ciao!

References

[MD02] Dyer, Martin. “Approximate counting by dynamic programming.” In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pp. 693-699. ACM, 2003.

[WK16] https://en.wikipedia.org/wiki/Knapsack_problem

[WL75] Wolsey, Laurence A. “Faces for a linear inequality in 0–1 variables.” Mathematical Programming 8.1 (1975): 165-178.

# Oddtown Chronicles

Hola!

Today, I will be talking about a classic puzzle that uses linear algebra to attack a combinatorics problem. I first learned of this puzzle during our research group seminar here at UNM, where Jared presented this problem to us as an exercise. Later on, I read through a couple of formalizations for the same, and will now be presenting my take on the problem taking helpful references from [MIT307], [UC726] and [UC07]. My interest in this problem originates from the fact that upon first hearing the problem statement, it doesn’t strike to be something that can be made easier using linear algebra, however, the approach baffles my mind every time I think about it. Let’s dive in.

The problem, which is referred to as Elwyn Berlekamp Theorem in [UC726], is as follows: In Oddtown, there are $n$ citizens and $m$ clubs satisfying the rules that each club has an odd number of members and each pair of clubs shares an even number of members. We then need to show that $m \leq n$, i.e. the number of clubs cannot be more than the number of citizens.

Before proceeding to the proof, I must point out that this problem has a close connection to Fisher’s inequality [MIT307], which is similar to a slight modification of the original problem. Apart from this, it is related to the study of designs, set systems with special intersection patterns. [MIT307] shows how such a system can be used to construct a graph on $n$ vertices, which does not have any clique or independent set of size $\omega \left( n^{1/3} \right)$. I will briefly talk about these results towards the end of this post.

Let us now focus on proving that the number of clubs in Oddtown cannot exceed the number of citizens. Formally, let $\mathcal{C} = \{ C_1, \dots, C_m \}$ be the set of clubs in Oddtown and $\mathcal{P} = \{ P_1,\dots,P_n \}$ be the set of citizens (people). We start by paying attention to the fact the no two clubs are allowed to share an odd number of members. The entire problem is full of even-odd type of constraints, which suggests that we should attack this problem using some form of parity checking. The easiest such setting is to work in the field with characteristic 2. In other words, we will perform all addition and multiplication operations modulo 2 from now on.

Within this setting, for each club $C_i \in \mathcal{C}$, define its membership vector $v_i = \left[ v_{i,1},\dots,v_{i,n} \right] \in \mathbb{Z}_2^n$, where $v_{i,j} = 1$ if citizen $j$ is a member of club $i$, and 0 otherwise. Intuitively, this is similar to each club maintaining a ledger in which all the citizen names are listed and only those names are marked that correspond to member citizens for that club. These ledgers must satisfy the property that for any pair of clubs, if we compare their ledgers, the number of common members must be even (which allows no common members as well). Mathematically, we can represent this constraint using dot product of the membership vectors of the clubs.

Notice that if the number of citizens common to any given pair of clubs is even, then in our field, the dot product of the membership vectors of these clubs will be zero. Hence, in the magical world of modulo 2, all clubs have membership vectors that are orthogonal to each other.  More importantly, note that none of these vectors is identically zero since the number of members in each club must be odd. Hence, the puzzle now reduces to asking the maximum number of pairwise independent vectors that exist in $\mathbb{Z}_2^n$. This is where our friend, linear algebra, kicks in.

Form a matrix $M \in \mathbb{Z}_2^{n \times m}$ whose columns are the membership vectors of the clubs. Note that since all columns are pairwise independent and none of them are identically zero, the rank of this matrix is exactly $m$. However, we know that the rank of any matrix must never exceed the number of rows or the number of columns. Voila! We’re done. The number of rows in this matrix is $n$, which immediately proves that $m \leq n$. Try holding your right ear using your left hand while swirling your arm around the back of your head. I felt exactly like that my first time with this puzzle!

Another way to see this without forming the matrix is by noticing that the number of pairwise independent vectors in $\mathbb{Z}_2^n$ cannot be more than the dimension of $\mathbb{Z}_2^n$, which is $n$. Hence, the same result as above. No matter how you prove it, we always obtain the condition we wanted to prove. See! that’s why we must befriend linear algebra.

A small technicality. Can we have exactly $n$ clubs, if not more? The answer is, trivially, yes. Since the number of common members to any pair of clubs must be even, set it to zero! Have a dedicated club for every man and you’re done. If you call this cheating, think about this. How cool will be it to have your own clubhouse?!

Maximality

Given this construction above, we can prove some more exciting results about the fate of people in Oddtown. Specifically, we can prove that for every $n$, we can always save the Mayor’s money by building no more than two clubs that satisfy all the constraints. Furthermore, we can prove that this system is maximal, in the sense that no more clubs can be added without violating at least one of the conditions. Let’s see how.

So the task at hand is to divide the $n$ citizens into 2 clubs, say $C_1$ and $C_2$, such that (1) the number of citizens common to both the clubs is even, (2) the number of citizens in each club is odd and 3) adding even one more club will violate (1) and (2). If $n$ is even, then one way to divide the citizens is to allocate one citizen to the first club and the remaining citizens to the second club. This satisfies (1) and (2). To see if this satisfies (3), add another club, say $C_3$ to the town. Since this club must have even number of common members with the other two clubs,  the number of common members between $C_3$ and $C_1$ must be zero. This requires all the members in $C_3$ to also belong to $C_2$, which immediately gives a contradiction since (2) will require this number of common members to be even while (1) requires them to be odd. Hence, this distribution of citizens into two clubs is maximal.

What happens when $n$ is odd? Trivially, put all the members into one club and the problem is solved. The second club is not needed at all. The addition of any more clubs will not be possible because of the same argument as above. Hence, one club is maximal in this case. Thus, in both the cases, no more than two clubs are required for a maximal membership of citizens.

Fisher’s inequality

We now discuss a result that is closely related to a slight modification of the rules in Oddtown. We remove the restriction on the size of the clubs and require that every two clubs share a fixed number $k$ of members. We assume that if two clubs have exactly the same members, then they are the same. Fisher’s inequality then states that the number of non-empty clubs is at most $n$, similar to the result above. The proof of this inequality is slightly involved, although the basic principle is the same. We consider the membership vectors of the clubs and prove them to be linearly independent in some field, which in this case will be the field of real numbers $\mathbb{R}^n$.

To see how this proof works, an important observation needs to be made : There is at most one club with exactly $k$ members. Wondering why? Well, let’s assume otherwise and try to get a contradiction. Let there be at least two clubs with exactly $k$ members. Then each of these must have the same members by the condition of the problem. This contradicts the fact that these clubs are distinct (because of our assumption) and hence, we have proved that at most one club can have exactly $k$ members.

Now, with this critical observation in hand, we proceed with the proof as follows. Let $\mathcal{C} = \{ C_1,\dots,C_m \}$ be the clubs of size $|C_1|,\dots,|C_m|$, respectively (assuming each of these is non zero). The size here refers to the number of members in the clubs. Represent by $C_i \cap C_j$ the set of common members in clubs $C_i$ and $C_j$. Then, according to the given problem, $\left| C_i \cap C_j \right| = k$ for each pair $i,j \in \{ 1,\dots,m \}$ with $i \neq j$. We define the membership vectors of each club similar to the proof above as $v_i = \{ v_{i,1},\dots,v_{i,n} \}$, where $v_{i,j} = 1$ if citizen $j$ belongs to club $i$ and 0 otherwise. Clearly, in $\mathbb{R}^n$, we have the dot product of any $v_i$ and $v_j$ to be $k$, if $i \neq j$. All we have to do now is to prove that these membership vectors are linearly independent in $\mathbb{R}^n$.

To see this, assume they are not. Then, using standard practice in such proofs, assume there exist real numbers $\alpha_1,\dots,\alpha_m$ such that $\sum_{i=1}^m \alpha_i v_i = \mathbf{0}$, where $\mathbf{0}$ is the zero vector in $\mathcal{R}^n$. Then, we must also have $\parallel \sum_{i=1}^m \alpha_i v_i \parallel_2 = 0$, where $\parallel v \parallel_2$ denotes the 2-norm of the vector $v$. Hence, we have $0 = \left( \sum_{i=1}^m \alpha_i v_i \right) \cdot \left(\sum_{i=1}^m \alpha_i v_i \right)$, since the 2-norm can be written as the dot product of the vector with itself. We can rewrite this sum as $0 = k \left( \sum_{i=1}^m \alpha_i \right)^2 + \sum_{i=1}^m \alpha_i^2 \left( |C_i| - k \right)$. Using our critical observation now, each of the two terms on the right of this equation are non-negative, which implies that they are both identically zero.

Hence, we have (1) $\left( \sum_{i=1}^m \alpha_i \right)^2 = 0$, which implies $\sum_{i=1}^m \alpha_i = 0$, and (2) $\sum_{i=1}^m \alpha_i^2 \left( |C_i| - k \right) = 0$. From (2), since at most one club, say $C_r$ can have exactly $k$ members, each $\alpha_i = 0$ whenever $i \neq r$. However, then, from (1),  $\sum_{i=1}^m \alpha_i = \alpha_r = 0$, which contradicts the fact that the membership vectors are linearly dependent. Hence, Fisher’s inequality holds.

Fisher’s inequality has a number of cool applications. Two interesting examples are presented in [MIT307], where they prove the following : (1) For a fixed $k$, let $G$ be a graph whose vertices are triples $T \in {[k] \choose 3}$ and $\{ A, B \}$ is an edge if $|A \cap B| = 1$. Then $G$ does not contain any clique or independent set of size more than $k$. (2) Suppose $P$ is a set of $n$ points in the plane, not all on one line. Then pairs of points from $P$ define at least $n$ distinct lines.

I hope you liked this post of mine. My next post will likely be on another interesting puzzle, maybe from one of the CACM journals. Until then, stay tuned. Ciao!

References

[MIT307] http://math.mit.edu/~fox/MAT307-lecture15.pdf