2
$\begingroup$

The question I have, is how to generate the following collection of subsets:

Given a set $S$ of size $n$. I want to find a sequence of $k$ subsets of fixed size $m$, $0<m<n$, such that at each stage $i$ in the sequence ($1 < i \leq k$), the largest intersection between any such pair of subsets, is as small as possible.

Any guidance is appreciated.

Edit:

Given the formulation by Dima Pasechnik below, I would like to add an example of such a sequence of subsets, or codewords:

1 1 1 1 0 0 0 0
0 0 0 0 1 1 1 1
0 0 1 1 1 1 0 0
1 1 0 0 0 0 1 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1

In this example the number of codewords $k=6$, their weight $m=4$, and $n=8$. As can be seen, the largest pairwise Hamming distance decreases as we continue the sequence (which is fine for me).

My interest lies in finding an algorithm that constructs such a sequence in an efficient way for any non-degenerate choice of $n,k,m$.

(I am aware that minimal intersecting subsets is a related question, but not the same.)

$\endgroup$
6
  • $\begingroup$ in the form given in your Edit, the question makes little sense; indeed, just enumerate all the $m$-subsets of the $n$-set. This is easy to do efficiently. (However, this way the guaranteed minimal distance is 2, which is not what most people want). $\endgroup$ Jan 15, 2015 at 16:08
  • $\begingroup$ @DimaPasechnik The case that I am concerned with has for example $n = 100, 1000$ or $10^6$, $m = n/2$ or $n/3$ and $k=5, 10$ or $20$ (to give an impression). So I want to avoid enumerating/searching over all $m$-subsets as there are simply too many ($n \choose m$). Intuitively, I want a sequence of subsets that are "maximally different" vis-a-vis the subsets already in the sequence. $\endgroup$
    – Jim
    Jan 15, 2015 at 18:02
  • $\begingroup$ If your $k$ is so small you may just choose your subsets at random. Say, for $m=n/2$ the intersection of two of them will be of size about $n/4$. $\endgroup$ Jan 15, 2015 at 20:55
  • $\begingroup$ @DimaPasechnik That is exactly what I want to contrast it with. I would be happy with a strong heuristic approach. $\endgroup$
    – Jim
    Jan 16, 2015 at 18:06
  • 1
    $\begingroup$ Consider variants on Hadamard matrix generation. Choose integers [1, n/2] for the first set, [1, n/4] union [n/2 + 1, 3n/4] for the second, and for the remainder, sample half the elements from each of the quarter intervals. This guarantees small intersections with the first two sets and with high probability small intersections among the latter sets. Gerhard "Probabilistic Hadamard Matrices? Hmmm, Interesting..." Paseman, 2015.01.16 $\endgroup$ Jan 16, 2015 at 18:24

2 Answers 2

5
$\begingroup$

This is a coding theory question. You want to find a binary constant weight $m$ code with $k$ codewords, and of maximal possible distance. There was a lot of research done on this.

For the specific case of $k$ small compared to $n$, one can take sufficiently many copies of a nice constant weight code with $k$ words. E.g. for $k=14$ you can take the the indicator functions of the hyperplanes of the 3-dimensional affine space over $\mathbb{F}_2$ (which has 8 points). By taking this $n'$ times, this would give you for $n=8n'$ and $m=4n'$, 14 subsets of size $m$ that have either empty intersection, or intersection of size $2n'$.

$\endgroup$
3
  • $\begingroup$ How would this generalize to $k=15, 20$ or $30$? What if $n$ is odd. I have a feeling that the example in the edit is misleading because of its specific choice of $m=n/2$. It could just as well have been $m=2+n/5$. I'm interested in something that generalizes easily to arbitrary $n,k,m$. $\endgroup$
    – Jim
    Jan 18, 2015 at 14:16
  • $\begingroup$ generalising for arbitrary $k$ is hard, as optimal codes are in general not known. $\endgroup$ Jan 18, 2015 at 15:30
  • $\begingroup$ to generalise my example to another small $k$, take a the affine space of bigger dimension; in more generality, you can play with putting together different 2-designs with appropriate parameters. $\endgroup$ Jan 18, 2015 at 15:36
0
$\begingroup$

maybe you would be interested in the "Minimal Overlapping Algorithm" presented in "Two New Combinatorial Problems involving Dominating Sets for Lottery Schemes - Werner R Grundlingh" (at page 155):

The Algorithm has been transformed in a vba code and is downloadable from the forum "http://wheels.forumcommunity.net/" using this link: http://wheels.forumcommunity.net/?act=Attach&type=post&id=401682134

$\endgroup$
1
  • $\begingroup$ The dissertation can be found here. I looked at Section 5.3 and Algorithm 4 (pseudo code). I believe line 3 in Algo 4 is where the magic happens. In the 2nd paragraph the author writes "the frequencies and overlapping function are maintained" (paraphrased). Why is that? Why would that be advantageous? @NP2P $\endgroup$
    – Jim
    Nov 26, 2015 at 17:37

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge that you have read and understand our privacy policy and code of conduct.

Not the answer you're looking for? Browse other questions tagged or ask your own question.