4
$\begingroup$

I read the following problem, claimed to be in the IMO shortlist in 1988:

A test consists of four multiple choice problems, each with three options, and the students should give an unique answer to each problem. After the test, one finds that for every three students, there is one problem to which their answers are pairwise different. What is the maximum number of students who took the test?

The answer is 9, with the following construction (actually a ternary code of length $4$ and hamming distance $3$).

1:0000    2:0111    3:0222
4:1012    5:1120    6:1201
7:2210    8:2102    9:2021

The following is my question:

Let $C$ be a set of $q$-ary words of length $\ell$. What is the maximum size of $C$ such that, for any $k$ words ($k\le q$), there is a position at which the $k$ words are pairwise different?

When $k=2$, this is the usual $q$-ary code with hamming distance $1$. Corresponding to the problem above, we have $q=3$, $\ell=4$ and $k=3$.

I don't know if this problem is previously studied. I didn't find any general answer.

More specifically, can you give an answer for $q=3$, $\ell=6$ and $k=3$?

$\endgroup$

1 Answer 1

7
$\begingroup$

It is called perfect hash families in the design theory and computer science literature.

A perfect hash family PHF$(N; k, v, t)$ is an $N \times k$ array on $v$ symbols with $v \geq t$, where for every $N \times t$ subarray at least one row consists of distinct symbols. The following shows a PHF$(6; 12, 3, 3)$. $$\left[ \begin{array}{cccccccccccc} 1&0&0&2&2&2&1&1&2&1&0&2\\ 2&0&1&1&2&0&2&0&1&1&2&1\\ 2&0&2&1&2&1&0&2&2&1&1&0\\ 0&1&2&2&1&2&2&0&1&1&0&0\\ 2&0&1&2&1&1&2&2&0&1&2&1\\ 0&2&1&0&2&2&2&1&0&1&2&1 \end{array}\right]$$

For instance, if we take the second, third and fifth columns, then the the second row is $(0,1,2)$, which consists of distinct symbols. It is readily checked that for any combination of three columns, at least one row is free from identical symbols. To avoid triviality, we generally assume that there is at least one row in which all $v$ elements appear and that $k \geq v$. In the combinatorial design theory literature, the parameter $t$ is called strength.

By regarding your codewords in $C$ as the columns, you can easily see that a PHF$(l; \vert C\vert, q, k)$ is exactly the code you defined. Note that my $k$ is your $\vert C \vert$ while your $k$ is my $t$.

Perfect hash families are introduced by Kurt Mehlhorn in 1980's as an efficient tool for compact storage and fast retrieval of frequently used information, such as reserved words in programing languages and command names in interactive systems. PHFs have since found many other applications in computer science and combinatorics (See, for example,

A. Fiat, M. Naor, Broadcast encryption, Lecture Notes in Computer Science, 773:480–491, 1994.

S. R. Blackburn, M. Burmester, Y. Desmedt, P. R. Wild, Efficient multiplicative sharing schemes, Lecture Notes in Computer Science, 1070:107–118, 1996.

D. R. Stinson, T. van Trung, R. Wei, Secure frameproof codes, key distribution patterns, group testing algorithms and related structures. J. Statist. Plann. Infer., 86 (2000), 595–617.

for some use found in the last century in cryptography and related areas).

Very recent applications include compressive sensing in signal processing and covering arrays for software interacting testing in engineering.

Their existence and generalizations such as separating hash families have been extensively investigated. So if you're interested, you've got a large body of literature to dig in. Since you tagged your question as "combinatorial designs," you find informative the section on PHF in the Handbook of Combinatorial Designs. It's Section 43 of Chapter VI (page 566-568).

There are so many results on bounds on parameters, explicit constructions, and applications inside and outside of mathematics. I think the section in the Handbook of Combinatorial Designs is a good starting point. Updated tables of known PHFs can be found here:

http://www.public.asu.edu/~redoughe/phf_pages/phf_tables.html

Also, if you are curious, the equivalence between a matrix satisfying these conditions and a perfect family of hash functions in the computer theoretic sense is as follows:

A $(k,v)$-hash function is a function $h\ :\ A \rightarrow B$ between finite sets $A$ and $B$, where $\vert A \vert = k$ and $\vert B \vert = v$. The function $h$ is perfect with respect to a subset $X \subseteq A$ if $h$ is injective on $X$, that is, if $h\vert_X$ is one-to-one. Given integers $k$, $v$, and $t$ satisfying $k \geq v \geq t \geq 2$, let $H$ be a set of $N$ $(k, v)$-hash functions between finite sets $A$ and $B$, where $\vert A \vert = k$ and $\vert B \vert = v$. Then $H$ is a perfect hash family of parameters $(N; k, v, t)$ if for any $X \subseteq A$ with $\vert X \vert = t$, there exists at least one $h \in H$ such that $h\vert_X$ is one-to-one. This definition is equivalent to the one given earlier. In fact, one can obtain a perfect family of hash functions by regarding each row of a matrix representation of a PHF$(N; k, v, t)$ as a function $h$, where the value in column $i$ of the row for $h$ represents the value of $h(i)$.

$\endgroup$
11
  • $\begingroup$ This answer is perfect for me. Thank you. $\endgroup$
    – Hao Chen
    Jun 20, 2014 at 15:10
  • $\begingroup$ By the way: This PHF(6;12,3,3), is it "optimal"? Is there a PHF(6;13,3,3)? $\endgroup$
    – Hao Chen
    Jun 20, 2014 at 15:11
  • $\begingroup$ @HaoCHEN It is optimal in the sense that no PHF$(5;12,3,3)$ exists, i.e., you need at least $6$ rows to have $12$ columns with the all-distinct-symbol property. But I need to check the existing literature or prove it myself to see if it's optimal in your sense. I'm pretty sure it's the best among known PHF$(6;\vert C\vert,3,3)$, though. I'll edit my post if I find the answer to the optimality. Or you can prove it and improve this post, too! $\endgroup$ Jun 20, 2014 at 15:26
  • 1
    $\begingroup$ In fact, I just saw a construction of (5;13,3,3), found by exhaustion (it seems), and I just checked it by program. Here it is: ['000000', '011111', '000112', '001120', '022210', '101200', '112022', '120111', '202101', '210200', '220221', '221011', '221202'] It is possible that I misunderstand something. $\endgroup$
    – Hao Chen
    Jun 20, 2014 at 15:30
  • 1
    $\begingroup$ Here is the link bbs.tianya.cn/post-71-144903-1.shtml $\endgroup$
    – Hao Chen
    Jun 20, 2014 at 16:33

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge that you have read and understand our privacy policy and code of conduct.

Not the answer you're looking for? Browse other questions tagged or ask your own question.