Abstract

Jensen’s inequality is one of the fundamental inequalities, with applications in almost every field of science. In 2003, Mercer gave a variant of Jensen’s inequality, which is now known as Jensen–Mercer’s inequality. The purpose of this article is to propose new bounds for Csiszár and related divergences by means of Jensen–Mercer’s inequality. We also investigate several new bounds for the Zipf–Mandelbrot entropy. The ideas of this article may further stimulate research in information theory based on Jensen–Mercer’s inequality.

1. Introduction

In the theory of inequalities, convex functions play an important role. The definition of a convex function [1] is as follows.

Let $f: I \to \mathbb{R}$ be a function defined on an interval $I \subseteq \mathbb{R}$. Then $f$ is said to be convex if, for all $x, y \in I$ and all $t \in [0, 1]$, the inequality
$$f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) \tag{1}$$
holds, and $f$ is said to be strictly convex if (1) holds strictly for all $x \ne y$ and $t \in (0, 1)$. If inequality (1) holds in the reversed direction, then $f$ is said to be concave, and $f$ is said to be strictly concave if inequality (1) holds strictly in the reversed direction for all $x \ne y$ and $t \in (0, 1)$.

There are several important inequalities which have been established with the help of convex functions. In 2003, Mercer [2] proved the following variant of Jensen’s inequality, which is known as Jensen–Mercer’s inequality:
$$f\!\left(a + b - \sum_{i=1}^{n} w_i x_i\right) \le f(a) + f(b) - \sum_{i=1}^{n} w_i f(x_i), \tag{2}$$
where $f: [a, b] \to \mathbb{R}$ is a convex function, $x_i \in [a, b]$ for $i = 1, \dots, n$, and $w_1, \dots, w_n \in [0, 1]$ with $\sum_{i=1}^{n} w_i = 1$.
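For intuition, the following short Python sketch numerically compares the two sides of (2) for the convex function $f(x) = x^2$ and randomly chosen points and weights in $[a, b]$; the function and variable names are illustrative only.

```python
import random

def jensen_mercer_sides(f, a, b, xs, ws):
    """Evaluate both sides of Jensen-Mercer's inequality (2):
    f(a + b - sum w_i x_i) <= f(a) + f(b) - sum w_i f(x_i),
    for x_i in [a, b] and nonnegative weights w_i summing to 1."""
    weighted_mean = sum(w * x for w, x in zip(ws, xs))
    lhs = f(a + b - weighted_mean)
    rhs = f(a) + f(b) - sum(w * f(x) for w, x in zip(ws, xs))
    return lhs, rhs

# Example with the convex function f(x) = x^2 on [a, b] = [0, 10].
random.seed(0)
a, b = 0.0, 10.0
xs = [random.uniform(a, b) for _ in range(5)]
raw = [random.random() for _ in range(5)]
ws = [r / sum(raw) for r in raw]          # normalized weights
lhs, rhs = jensen_mercer_sides(lambda x: x * x, a, b, xs, ws)
print(lhs <= rhs, lhs, rhs)               # expected: True
```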

This inequality has been refined, extended, and generalized in different directions. Recently, Niezgoda [3] generalized this inequality by using the concept of majorization and doubly stochastic matrices, and gave applications for separable sequences. For some other recent results related to Jensen’s and Jensen–Mercer’s inequalities, we recommend [4–14].

In the remaining part of this paper, we present some basic notions of information theory that we deal with throughout. We start with divergences. A divergence measure is basically a measure of the distance between two probability distributions. Divergence measures have been used effectively to resolve various problems related to probability theory, and different divergence measures are suitable depending on the nature of the problem. Recently, Dragomir [15, 16], Jain [17, 18], and Taneja [19] have made many contributions in this field; they introduced different divergence measures, obtained their bounds, presented relations with other divergences, and discussed their properties.

Divergence measures have numerous applications in a variety of fields, such as the approximation of probability distributions [20, 21], biology [22], economics and political science [23, 24], the analysis of contingency tables [25], signal processing [26, 27], color image segmentation [28], pattern recognition [29, 30], and magnetic resonance image analysis [31].

Several generalized divergence measures have been introduced, characterized, and applied in different fields, such as the Bregman divergence [32], Csiszár’s $f$-divergence [33], the Burbea–Rao divergence [34], and Rényi-like divergences [35]. By a suitable choice of the function $f$, many divergence measures can be obtained from these generalized divergences. Because of its compact nature, the Csiszár $f$-divergence is one of the most important divergences; it is given as follows:
$$C_f(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} q_i\, f\!\left(\frac{p_i}{q_i}\right), \tag{3}$$
where $\mathbf{p} = (p_1, \dots, p_n)$ and $\mathbf{q} = (q_1, \dots, q_n)$ are positive real $n$-tuples and $f$ is a convex function. Here, the convexity of the function $f$ ensures that $C_f(\mathbf{p}, \mathbf{q}) \ge 0$ whenever $\mathbf{p}$ and $\mathbf{q}$ are probability distributions and $f(1) = 0$.
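As a computational aid, here is a minimal Python sketch of (3), assuming the convention $C_f(\mathbf{p}, \mathbf{q}) = \sum_i q_i f(p_i/q_i)$ written above (argument-order conventions vary in the literature); the function names are ours.

```python
import math

def csiszar_divergence(p, q, f):
    """Csiszar f-divergence from (3): C_f(p, q) = sum_i q_i * f(p_i / q_i),
    for positive tuples p, q of equal length and a convex generator f."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Example: the generator f(t) = t*log(t) (with f(1) = 0) yields the
# Kullback-Leibler divergence discussed below.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
print(csiszar_divergence(p, q, lambda t: t * math.log(t)))
```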

By choosing an appropriate convex function $f$ in (3), many well-known divergences or distance functions can be acquired, such as the Hellinger, Rényi, Bhattacharyya, chi-square, Kullback–Leibler, triangular discrimination, and Jeffreys divergences. A concise presentation of these divergences is given as follows.

In probability and statistics, the observed data are approximated by a probability distribution, and this approximation leads to a loss of information. The primary purpose is to assess how much information is contained in the data. Approximating the true distribution $\mathbf{p}$ by a distribution $\mathbf{q}$ results in a loss of information, and the Kullback–Leibler divergence quantifies the inefficiency of encoding the data according to $\mathbf{q}$ instead of the genuine distribution $\mathbf{p}$. The Kullback–Leibler divergence [36] can be acquired by choosing $f(t) = t \log t$, $t > 0$, in (3):
$$\mathrm{KL}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i}. \tag{4}$$

The Kullback–Leibler divergence is nonnegative and is zero if and only if $\mathbf{p} = \mathbf{q}$. It satisfies two of the properties of a metric, but $\mathrm{KL}(\mathbf{p}, \mathbf{q}) \ne \mathrm{KL}(\mathbf{q}, \mathbf{p})$ in general, and it does not obey the triangle inequality. The Kullback–Leibler divergence is also called relative entropy. We can construct the Shannon entropy [37] from the Kullback–Leibler divergence; it is given as follows:
$$S(\mathbf{p}) = -\sum_{i=1}^{n} p_i \log p_i, \tag{5}$$
where $p_1, \dots, p_n$ are positive real numbers with $\sum_{i=1}^{n} p_i = 1$. Shannon entropy has been used widely in physics, particularly in many quantum soluble systems; for example, see [38–41].
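The following Python sketch evaluates (4) and (5) and also illustrates one link between them, namely $S(\mathbf{p}) = \log n - \mathrm{KL}(\mathbf{p}, \mathbf{u})$ for the uniform distribution $\mathbf{u}$ on $n$ points; the helper names are ours.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence (4): KL(p, q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def shannon_entropy(p):
    """Shannon entropy (5): S(p) = -sum_i p_i * log(p_i)."""
    return -sum(pi * math.log(pi) for pi in p)

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))                      # positive, since p != q
print(kl_divergence(p, p))                      # 0.0
u = [1.0 / len(p)] * len(p)                     # uniform distribution
print(math.log(len(p)) - kl_divergence(p, u))   # equals shannon_entropy(p)
print(shannon_entropy(p))
```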

An extension of the Kullback–Leibler divergence is the Jeffreys divergence [42]. It is the sum of the Kullback–Leibler divergences taken in both directions. The Jeffreys divergence can be obtained by selecting $f(t) = (t - 1)\log t$, $t > 0$, in (3):
$$J(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} (p_i - q_i) \log\frac{p_i}{q_i}. \tag{6}$$
It is nonnegative and symmetric, $J(\mathbf{p}, \mathbf{q}) = J(\mathbf{q}, \mathbf{p})$, but it does not obey the triangle inequality; therefore, it is not a metric. The uses of the Jeffreys divergence are similar to those of the Kullback–Leibler divergence.

The Bhattacharyya divergence [43] can be obtained by choosing $f(t) = \sqrt{t}$, $t > 0$, in (3):
$$B(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \sqrt{p_i q_i}. \tag{7}$$

Like the Jeffreys divergence, the Bhattacharyya divergence satisfies three of the properties of a metric but does not obey the triangle inequality. The Bhattacharyya divergence has a limited range, which makes it quite attractive for distance comparisons. Hussein et al. [44] used the Bhattacharyya divergence for solving the track-to-track association (TTTA) problem in space surveillance.

The Hellinger divergence [45] is defined as
$$h^2(\mathbf{p}, \mathbf{q}) = \frac{1}{2} \sum_{i=1}^{n} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2, \tag{8}$$
which corresponds to $f(t) = \frac{1}{2}(\sqrt{t} - 1)^2$, $t > 0$, in (3). It satisfies all the properties of a metric; therefore, the Hellinger divergence is a proper metric. The Hellinger divergence is used widely in data analysis, particularly when the objects being compared are high-dimensional empirical probability distributions built from observed data [46].

The total variational distance [47] can be deduced by choosing $f(t) = |t - 1|$, $t > 0$, in (3):
$$V(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} |p_i - q_i|. \tag{9}$$

This divergence is also a proper metric. The total variational distance is a basic quantity in probability and statistics. In information theory, the variational distance is utilized to characterize strong typicality and the asymptotic equipartition of sequences generated by sampling from a given distribution [48].

Now, by substituting $f(t) = t^{\alpha}$, $\alpha > 1$, $t > 0$, in (3), we can obtain the Rényi divergence [49], which is given by
$$R_{\alpha}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} p_i^{\alpha} q_i^{1 - \alpha}. \tag{10}$$

The Rényi divergence appears as an important tool in proofs of convergence of minimum description length and Bayesian estimators, both in parametric and nonparametric models [50]. Some other divergences which can be obtained from (3) are given below.
(1) Chi-square divergence (see [47]): for $f(t) = (t - 1)^2$, $t > 0$, the chi-square divergence is given by
$$\chi^2(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}. \tag{11}$$
(2) Triangular discrimination (see [42]): the formula for triangular discrimination can be deduced by selecting $f(t) = \frac{(t - 1)^2}{t + 1}$, $t > 0$, which gives
$$\Delta(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}. \tag{12}$$
(3) Relative arithmetic–geometric divergence (see [42]): for $f(t) = \frac{t + 1}{2}\log\frac{t + 1}{2t}$, where $t > 0$, the relative arithmetic–geometric divergence is given by
$$G(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \frac{p_i + q_i}{2} \log\frac{p_i + q_i}{2 p_i}. \tag{13}$$
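To make the correspondence between generators and divergences concrete, the short Python sketch below checks numerically that plugging the generators listed above into (3) (with the convention $\sum_i q_i f(p_i/q_i)$ assumed earlier) reproduces the closed-form expressions; the names are illustrative.

```python
import math

def csiszar(p, q, f):
    # Generic Csiszar divergence from (3): sum_i q_i * f(p_i / q_i).
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# Jeffreys: f(t) = (t - 1) * log(t)  ->  sum (p_i - q_i) * log(p_i / q_i)
jeff = csiszar(p, q, lambda t: (t - 1) * math.log(t))
print(abs(jeff - sum((a - b) * math.log(a / b) for a, b in zip(p, q))) < 1e-12)

# Hellinger: f(t) = 0.5 * (sqrt(t) - 1)^2  ->  0.5 * sum (sqrt(p_i) - sqrt(q_i))^2
hell = csiszar(p, q, lambda t: 0.5 * (math.sqrt(t) - 1) ** 2)
print(abs(hell - 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))) < 1e-12)

# Chi-square: f(t) = (t - 1)^2  ->  sum (p_i - q_i)^2 / q_i
chi2 = csiszar(p, q, lambda t: (t - 1) ** 2)
print(abs(chi2 - sum((a - b) ** 2 / b for a, b in zip(p, q))) < 1e-12)

# Triangular discrimination: f(t) = (t - 1)^2 / (t + 1)  ->  sum (p_i - q_i)^2 / (p_i + q_i)
tri = csiszar(p, q, lambda t: (t - 1) ** 2 / (t + 1))
print(abs(tri - sum((a - b) ** 2 / (a + b) for a, b in zip(p, q))) < 1e-12)
```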

Zipf’s law is one of the fundamental laws in information science and is often utilized in linguistics as well. In 1932, George Zipf [51] found that one can count how often each word appears in a text. If $r$ is the rank of a word and $f$ is the frequency of occurrence of that word, then $f \cdot r = C$, where $C$ is a constant.

There are several applications of Zipf’s law, such as to city populations [52], geology [53], and solar flare intensity [54]. For more details, see [55, 56].

In 1966, Benoit Mandelbrot [57] gave a generalization of Zipf’s law, known as the Zipf–Mandelbrot law, which improves the fit for the low-rank words in a corpus [58] and is given as follows:
$$f(r) = \frac{C}{(r + q)^{s}}, \tag{14}$$
where $C > 0$, $q \ge 0$, and $s > 0$. If $q = 0$, we obtain Zipf’s law.

If $N \in \{1, 2, 3, \dots\}$, $q \ge 0$, $s > 0$, $k \in \{1, 2, \dots, N\}$, and $H_{N, q, s} = \sum_{i=1}^{N} \frac{1}{(i + q)^{s}}$, then the probability mass function for the Zipf–Mandelbrot law is given by
$$f(k; N, q, s) = \frac{1}{(k + q)^{s} H_{N, q, s}}. \tag{15}$$
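A small Python sketch of the probability mass function (15), with the parameter names taken from the formula above (the function name itself is ours):

```python
def zipf_mandelbrot_pmf(N, q, s):
    """Probability mass function (15) of the Zipf-Mandelbrot law:
    f(k; N, q, s) = 1 / ((k + q)^s * H_{N,q,s}),  k = 1, ..., N,
    where H_{N,q,s} = sum_{i=1}^{N} 1 / (i + q)^s."""
    H = sum(1.0 / (i + q) ** s for i in range(1, N + 1))
    return [1.0 / ((k + q) ** s * H) for k in range(1, N + 1)]

pmf = zipf_mandelbrot_pmf(N=10, q=1.5, s=1.2)
print(sum(pmf))                                       # ~1.0
print(zipf_mandelbrot_pmf(N=10, q=0.0, s=1.0)[:3])    # q = 0 recovers Zipf's law
```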

The formula for the Zipf–Mandelbrot entropy is given as follows:
$$Z(H, q, s) = \frac{s}{H_{N, q, s}} \sum_{k=1}^{N} \frac{\log(k + q)}{(k + q)^{s}} + \log H_{N, q, s}. \tag{16}$$
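As a sanity check, the closed form (16) coincides with the Shannon entropy of the probability mass function (15); the following Python sketch (names are ours) verifies this numerically for one choice of parameters.

```python
import math

def zipf_mandelbrot_entropy(N, q, s):
    """Zipf-Mandelbrot entropy (16):
    Z(H, q, s) = (s / H_{N,q,s}) * sum_{k=1}^{N} log(k + q) / (k + q)^s + log(H_{N,q,s})."""
    H = sum(1.0 / (i + q) ** s for i in range(1, N + 1))
    return (s / H) * sum(math.log(k + q) / (k + q) ** s for k in range(1, N + 1)) + math.log(H)

# Cross-check against the Shannon entropy of the Zipf-Mandelbrot pmf from (15).
N, q, s = 10, 1.5, 1.2
H = sum(1.0 / (i + q) ** s for i in range(1, N + 1))
pmf = [1.0 / ((k + q) ** s * H) for k in range(1, N + 1)]
shannon = -sum(p * math.log(p) for p in pmf)
print(abs(zipf_mandelbrot_entropy(N, q, s) - shannon) < 1e-9)  # expected: True
```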

In 2017, Khan et al. [59] used some refinements of the Jensen inequality for convex functions and monotone convex functions to obtain inequalities for the Zipf–Mandelbrot and Shannon entropies. They used two parametric Zipf–Mandelbrot laws instead of different weights in the inequalities for Shannon entropy; as a result, different parametric Zipf–Mandelbrot entropies were obtained. In 2017, Naveed et al. [60] obtained different inequalities for these entropies by using some majorization-type inequalities. In 2018, Khan et al. [55] obtained new estimates for these entropies by applying some refinements of the Jensen inequality and Taylor’s formula. In 2019, Khalid et al. [61] also gave some results related to these entropies. For more recent results related to these entropies, see [61–71].

The purpose of this paper is to use Jensen–Mercer’s inequality to establish several inequalities in information theory. We obtain bounds for the Csiszár divergence by using Jensen–Mercer’s inequality, and we give bounds for different divergences by using particular convex functions. In addition, we establish bounds for the Zipf–Mandelbrot entropy by applying Zipf–Mandelbrot laws instead of probability distributions in the Kullback–Leibler and Jeffreys divergences. Furthermore, we deduce new estimates for the Zipf–Mandelbrot entropy associated with different parametric Zipf–Mandelbrot laws.

2. Bounds for Csiszár and Related Divergences

Theorem 1. Let be a convex function. If and are positive real numbers with such that and , then

Proof. Replacing by and substituting in (2), we obtain (17).

Theorem 2. Let and be positive real numbers with . If and are positive real numbers with such that and , then

Proof. Let $f(t) = t \log t$, $t > 0$; clearly $f$ is a convex function. Therefore, using (17) and interchanging $\mathbf{p}$ and $\mathbf{q}$, we obtain (19). Now, using the definition of Kullback–Leibler divergence, we obtain (18).

Corollary 1. Let and be positive real numbers with . If are positive real numbers such that and , then

Proof. By choosing , , in (18), we obtain (20).

Theorem 3. Let and be positive real numbers with . If and are positive real numbers such that for , , and , then

Proof. Let , , so clearly is a convex function. Therefore, using (17), we obtain (21).

Corollary 2. Let and be positive real numbers with . If are positive real numbers such that and , then

Proof. By choosing , in (21), we obtain (22).

Theorem 4. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If , , then , so is a convex function for . Therefore, using in Theorem 1, we obtain (23).

Theorem 5. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If , , then , so is a convex function for . Therefore, using in Theorem 1, we obtain (24).

Corollary 3. Let and be positive real numbers with . If are positive real numbers such that and , then

Proof. By choosing , , in (24), we obtain (26). Now, using the definition of Shannon entropy, we obtain (25).

Theorem 6. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If , , then , so is a convex function for . Therefore, using in Theorem 1, we obtain (27).

Theorem 7. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:where .

Proof. For , the function , , is clearly a convex function. Therefore, using in Theorem 1, we obtain (28).

Theorem 8. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If , , and we know that the absolute value function is always convex on . Therefore, using in Theorem 1, we obtain (29).

Theorem 9. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If , , then , so is a convex function for . Therefore, using in Theorem 1, we obtain (30).

Theorem 10. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If , , then , so is a convex function for . Therefore, using in Theorem 1, we obtain (31).

Theorem 11. Let all the hypotheses of Theorem 3 hold, then we have the following inequality:

Proof. If  = , , then , so is a convex function for . Therefore, using in Theorem 1, we obtain (32).

3. Bounds for Zipf–Mandelbrot Entropy

In this section, we present some bounds for Zipf–Mandelbrot entropy.

Theorem 12. Let and be positive real numbers with . If , , , and , , such that and , then

Proof. Let , , then (34) holds. As , . Hence, using (18) for , we obtain (33).
The following corollary is a special case of Theorem 12.

Corollary 4. Let and be positive real numbers with . If , , , and , then

Proof. By choosing , in (33), we obtain (35).
The next result gives an inequality for the Zipf–Mandelbrot entropy using two different sets of parameters.

Theorem 13. Let and be positive real numbers with . Let , , and , then

Proof. Let and , , then (37) holds. Also, and ; therefore, using (18) for and , , we obtain (36).

Theorem 14. Let and be positive real numbers with . If , , , and , , such that and , then

Proof. Substituting , , in (21) and using a similar method to the one used in the proof of Theorem 12, we obtain (38).
The following corollary is a special case of Theorem 14.

Corollary 5. Let and be positive real numbers with . If , , , and , then

Proof. By choosing , in (38), we obtain (39).
The next result gives an inequality for the Zipf–Mandelbrot entropy using two different sets of parameters.

Theorem 15. Let and be positive real numbers with . Let , , and , then

Proof. Substituting and , , in (21) and using a similar method to the one used in the proof of Theorem 13, we obtain (40).

Theorem 16. Let all the hypotheses of Theorem 14 hold, then we have the following inequality:

Proof. Let , , then (42) holds. Also, ; hence, (43) holds. Therefore, using (42) and (43) in (24) for , , we obtain (41).
The following corollary is a particular case of Theorem 16.

Corollary 6. Let and be positive real numbers with . If , , , and , then

Proof. By choosing , in (41), we obtain (44).

Theorem 17. Let all the hypotheses of Theorem 15 hold, then we have the following inequality:

Proof. Let and , , then (46) holds. Also, and ; hence, (47) holds. Therefore, using (46) and (47) in (24) for and , , we obtain (45).

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

M. Adil Khan carried out the proofs of Theorems 1–5 and Corollaries 1–3 and drafted the manuscript. Husain carried out the proofs of Theorems 6–12 and Corollary 4. Chu provided the main idea, carried out the proofs of Theorems 13–17 and Corollaries 5 and 6, completed the final revision, and submitted the article. All authors read and approved the final manuscript.

Acknowledgments

This research was supported by the Natural Science Foundation of China under Grants 11701176, 61673169, 11301127, 11626101, and 11601485.