My question is: why is it the weighted average of the logarithm of the $p_i$? What is the clear intuition here?
The intuition here is difficult to make fully precise. However, I think the best short answer is that
$$
S=-k\sum p_i \log(p_i)\;, \qquad (1)
$$
is the only reasonable measure of "uncertainty" of a probability distribution that can be used to make sure the assignment of probabilities is done "fairly" in light of the available information.
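To make this concrete, here is a minimal numeric sketch (my own illustration, using natural logs and $k=1$): a sharply peaked distribution carries little uncertainty, while the uniform distribution maximizes (1).

```python
import math

def entropy(p, k=1.0):
    """S = -k * sum(p_i * log(p_i)); terms with p_i = 0 contribute nothing."""
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.97, 0.01, 0.01, 0.01]))  # sharply peaked: ~0.17
print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: log(4) ~ 1.386, the maximum
```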
You can contrast $S$ with other, less satisfactory, candidate measures of broadness or uncertainty that we might try to maximize. For example,
$$
S' = -\sum_i p_i^2\;, \qquad (2)\quad(\text{BAD!})
$$
could be used as a measure of "uncertainty" in a distribution, and so could infinitely many other functionals. Of all these possible measures of uncertainty, (1) is the best.
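For instance, in a quick numeric sketch (my own illustration, not from the text), both (1) and (2) rank a uniform distribution as more uncertain than a peaked one, so broadness alone does not single out (1); what does is the consistency requirement introduced below.

```python
import math

def S(p):      # measure (1), with k = 1 and natural logs
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def S_bad(p):  # measure (2): negative sum of squares
    return -sum(pi**2 for pi in p)

peaked, uniform = [0.7, 0.2, 0.1], [1/3, 1/3, 1/3]
print(S(peaked) < S(uniform))          # True
print(S_bad(peaked) < S_bad(uniform))  # True -- (2) looks fine here too
```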
Section 11.3 of Jaynes's textbook "Probability Theory: The Logic of Science" gives an argument for why the "information entropy" must take on the form it does.
Basically, if the "amount of uncertainty" $S$ is to be continuous in the $p_i$, consistent with common sense in that more possibilities are "more uncertain" than fewer, and consistent in that every valid way of computing it gives the same answer, it has to have the form:
$$
S = -k\sum_i p_i \log(p_i)\;.
$$
You can understand where this form comes from by considering the fundamental equation that $S$ must satisfy in order to be a consistent measure of uncertainty:
$$
S(p_1,p_2,p_3) = S(p_1, p_2 + p_3) + (p_2 + p_3)S(\frac{p_2}{p_2+p_3},\frac{p_3}{p_2+p_3})\;, \qquad(3)
$$
which says that, if we first ask only whether event 1 occurs, we face the two-outcome uncertainty $S(p_1, p_2+p_3)$; but a fraction $p_2+p_3$ of the time event 1 does not occur, and we must still resolve the remaining uncertainty between events 2 and 3, with conditional probabilities $p_2/(p_2+p_3)$ and $p_3/(p_2+p_3)$.
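A quick numeric check (my own sketch, with $k=1$ and natural logs) confirms that (1) satisfies Eq. (3) exactly, while the rejected measure (2) does not:

```python
import math

def S(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def S_bad(p):
    return -sum(pi**2 for pi in p)

p1, p2, p3 = 0.5, 0.3, 0.2
w = p2 + p3
for name, f in [("S", S), ("S_bad", S_bad)]:
    lhs = f([p1, p2, p3])
    rhs = f([p1, w]) + w * f([p2 / w, p3 / w])
    print(name, math.isclose(lhs, rhs))  # S True, S_bad False
```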
Eq (3) can be generalized to:
$$
S(p_1,\ldots,p_N) = S(w_1,\ldots,w_m)+w_1S(p_1/w_1,\ldots,p_n/w_1)+\ldots+w_mS(p_{N-n+1}/w_m,\ldots,p_N/w_m)\;,
$$
where $m$ and $N$ are integers and $m<N$. Here, we have written our fundamental equation as if we have partitioned the $p_i$ into $m$ subsets of $n$ terms each, where $N=nm$ and $w_j$ is the total probability of the $j$-th subset, but we could partition into unequal groups too if we would like.
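Here is a sketch verifying this generalized grouping rule numerically for an unequal partition (again my own check, with $k=1$):

```python
import math

def S(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.1, 0.2, 0.15, 0.25, 0.3]
groups = [p[:2], p[2:]]          # unequal partition into m = 2 subsets
w = [sum(g) for g in groups]     # w_j = total probability of subset j
lhs = S(p)
rhs = S(w) + sum(wj * S([pi / wj for pi in g]) for wj, g in zip(w, groups))
print(math.isclose(lhs, rhs))    # True
```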
As an example, we would want:
$$
S(1/2, 1/3, 1/6) = S(1/2, 1/2) + \frac{1}{2}S(2/3, 1/3)
$$
And, as another example of partitioning, we would want:
$$
S(1/2, 1/3, 1/6) = S(5/6, 1/6) + \frac{5}{6}S(3/5, 2/5)
$$
And, as another example, we would want:
$$
S(1/4, 1/4, 1/4, 1/4) = S(1/2, 1/2) + \frac{1}{2}S(1/2, 1/2) + \frac{1}{2}S(1/2,1/2)
$$
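All three of the concrete examples above can be checked numerically (my own sketch, $k=1$, natural logs):

```python
import math

def S(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(math.isclose(S([1/2, 1/3, 1/6]),
                   S([1/2, 1/2]) + (1/2) * S([2/3, 1/3])))   # True
print(math.isclose(S([1/2, 1/3, 1/6]),
                   S([5/6, 1/6]) + (5/6) * S([3/5, 2/5])))   # True
print(math.isclose(S([1/4] * 4),
                   S([1/2, 1/2]) + (1/2) * S([1/2, 1/2])
                                 + (1/2) * S([1/2, 1/2])))   # True
```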
And, as a final example, for the uniform distribution partitioned into groups of integer sizes $w_1, w_2, \ldots$ (with $\sum_i w_i = N$), we would want:
$$
S(1/N, 1/N, \ldots, 1/N) = S(w_1/N, w_2/N,\ldots) + \frac{w_1}{N}S(1/w_1,1/w_1,\ldots,1/w_1) + \frac{w_2}{N}S(1/w_2,\ldots) + \ldots\;.
$$
Defining a new set of probabilities $q_i = w_i/N$ and using the above equation, we have
$$
S(q_1, q_2, \ldots) = S(1/N, 1/N, \ldots) - \sum_j q_j S(1/w_j, 1/w_j, \ldots, 1/w_j)\;,
$$
in general.
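This relation is also easy to spot-check numerically; a small sketch with an illustrative (hypothetical) choice of integer group sizes $w = (3, 5, 2)$:

```python
import math

def S(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

w = [3, 5, 2]                    # integer group sizes (illustrative choice)
N = sum(w)                       # N = 10
q = [wi / N for wi in w]
lhs = S(q)
rhs = S([1 / N] * N) - sum(qi * S([1 / wi] * wi) for qi, wi in zip(q, w))
print(math.isclose(lhs, rhs))    # True
```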
Now specialize to the case where all of the $q_i$ are equal, so that $q_i = n/N$ (i.e., $w_i=n$ and $N=nm$), to find:
$$
S(1/m, 1/m, \ldots) = S(1/N, 1/N, \ldots) - S(1/n, 1/n, \ldots)
$$
Or, with $s(n) \equiv S(1/n, 1/n, \ldots, 1/n)$, we can write:
$$
s(m) = s(nm) - s(n) \qquad (4)
$$
The unique monotonically increasing function that satisfies Eq. (4), i.e. $s(nm) = s(n) + s(m)$, is
$$
s(m) = k\log(m)
$$
Therefore, using $\sum_i q_i = 1$ to write $k\log(N) = k\sum_i q_i \log(N)$:
$$
S(q_1, q_2,\ldots) = k\log(N) - k\sum_i q_i \log(w_i) = -k\sum_i q_i \log(w_i/N) = -k\sum_i q_i \log(q_i)\;.
$$
Of course, we can rename the $q$'s to $p$'s and write:
$$
S(p_1, p_2,\ldots) = -k\sum_i p_i \log(p_i)\;.
$$
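And, to close the loop on the opening claim, here is a crude sketch (my own illustration) that this $S$, maximized in the absence of further information, does assign probabilities "fairly": no random distribution over three outcomes beats the uniform one, whose entropy is $\log 3$.

```python
import math, random

def S(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def random_dist(n):
    """A random probability vector: normalized exponential draws."""
    x = [random.expovariate(1.0) for _ in range(n)]
    s = sum(x)
    return [xi / s for xi in x]

best = max(S(random_dist(3)) for _ in range(100_000))
print(best <= math.log(3))   # True: the uniform distribution is the maximizer
```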