Özet:
We investigate the query complexity of additively estimating entropy of a discrete probability distribution in two settings. Let p be an unknown probability distribution on [n] := f1; 2; : : : ng, and de ne two kinds of queries: A SAMP query takes no input and returns x 2 [n] with probability p[x]; a PMF query takes as input x 2 [n] and returns the value p[x]. In the SAMP model of query complexity, the only allowed interaction with p is via SAMP queries. In the SAMP+PMF model, both SAMP and PMF queries are utilized to interact with p. In particular, we consider the task of estimating the entropy of p to within (with high probability). For the usual Shannon entropy H(p), we review the matching upper and lower bounds established by Valiant and Valiant in the SAMP model, and describe the algorithm constructed by Canonne and Rubinfeld in the SAMP+PMF model. For the R enyi entropy H (p), we analyze three di erent matching upper and lower bound pairs introduced by Acharya et al. in the SAMP model. We show that (log2 n= 2) queries are necessary to estimate the Shannon entropy H(p) in the SAMP+PMF model, matching a recent upper bound of Canonne and Rubinfeld. In addition, we prove that {u100000} n1{u100000}1= queries are necessary and su cient to estimate the R enyi entropy H (p) in the SAMP+PMF model, where > 1. This complements recent work of Acharya et al. in the SAMP model that showed O(n1{u100000}1= ) queries su ce when is an integer, but roughly n queries are necessary when is a noninteger. All of our lower bounds extend to the SAMP+CDF model, where SAMP and CDF queries (given x, return P y x p[y]) are allowed. We give a matching lower bound on estimating the support size (the number of domain elements with nonzero probability) of an unknown distribution p in the SAMP+CDF model. Lastly, we present an upper bound on additively estimating Tsallis entropy in the SAMP+PMF model.