Statistical Optimization of Pharmaceutical Formulations
Statistical design provides myriad advantages in the formulation of pharmaceuticals
P.K. Shiromani
President
Shirman Pharmaceutical Consulting
shirmanconsulting@yahoo.com
Because a scientist must consider so many formulation and process variables when developing a formulation, statistical experimental design and analysis allow these variables to be studied both efficiently and effectively. This article provides several concise recommendations for the use of statistical design, based on my own experience and on reports in the literature. In fact, the literature, spanning a long period, is replete with examples of the successful use of this approach, a few of which are cited at the end. Statistically designed experiments have several advantages, and the comparison with other approaches is striking. For example, one-at-a-time experimentation is 18% less costly but 190% less accurate; intuitive experimentation is 76% more costly and 55% less accurate; and Bureau of Standards experimentation is 59% more costly and 15% less accurate.
In comparison, statistically designed experimentation is actually 15% less costly and just 10% less accurate than traditional methods.
The statistical design method has many other advantages. Chief among them, it is strongly favored by regulatory agencies because it justifies the choice of ranges and identifies a robust (optimum) region. It also allows the researcher to study interactions between factors. In contrast, studying one factor at a time does not reveal interactions and does not scale to production.
The statistical design method often makes more economical use of resources, especially when there are many factors, and gives a greater chance of finding optimum conditions. Finally, the fitted models can be used to make predictions about future experiments.
There are several types of statistical design for pharmaceutical formulations, including:
- Factorial Designs (both full and fractional factorials);
- Sequential Simplex Techniques;
- Response Surface Methodology;
- D-Optimal Techniques and
- I-Optimal Techniques (16).
Dependent variables, i.e., measurable responses, include tablet dissolution, disintegration time, hardness and friability.
Create a Better Experiment
Below are some suggestions for running experiments:
- Base the choice of factors on experience or preliminary experiments;
- Include centerpoint replicates to estimate experimental error and judge significance;
- Use tradeoff analysis to find the optimum combination; e.g., accept a softer tablet to obtain higher dissolution;
- Code the factor levels orthogonally (-1/0/+1) to simplify interpretation;
- Use contour plots, the most useful depiction of the data;
- Space factor levels equally for a simpler design, and study the extremes;
- Run experiments in random order to minimize the influence of extraneous variables that cause “noise” in the data;
- Analyze the data with Yates’ algorithm to determine significant effects. An effect is significant if it exceeds twice the standard error of that dependent variable, as estimated from the control batches;
- The control batch uses the midpoint values of the independent variables and represents the current process.
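The Yates analysis and the twice-the-standard-error rule above can be sketched in Python with NumPy. This is a minimal illustration, not the author's procedure: the helper names and responses are hypothetical, and we assume “standard error” means the standard error of an effect, 2s/√N, where s is the standard deviation of the replicated control batches and N is the number of factorial runs.

```python
import numpy as np

# Effect labels in Yates (standard) order for a 2^3 design: (1), a, b, ab, c, ac, bc, abc
LABELS_2_3 = ["A", "B", "AB", "C", "AC", "BC", "ABC"]


def yates(responses):
    """Return (grand mean, effects) for a 2^k factorial via Yates' algorithm.

    `responses` must be in standard order; effects come back in the same
    order as the treatment labels (A, B, AB, C, ...).
    """
    col = np.asarray(responses, dtype=float)
    n = col.size
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("need 2^k responses in standard order")
    for _ in range(k):
        pairs = col.reshape(-1, 2)
        # upper half: sums of successive pairs; lower half: differences (second - first)
        col = np.concatenate([pairs.sum(axis=1), pairs[:, 1] - pairs[:, 0]])
    return col[0] / n, col[1:] / (n // 2)   # contrast / 2^(k-1) = effect


def significant_effects(effects, control, n_runs, labels=LABELS_2_3):
    """Keep effects whose magnitude exceeds twice the standard error.

    Assumption: SE of an effect = 2 * s / sqrt(n_runs), with s the standard
    deviation of the replicated control (midpoint) batches.
    """
    s = np.std(control, ddof=1)
    se_effect = 2 * s / np.sqrt(n_runs)
    return {lab: float(e) for lab, e in zip(labels, effects) if abs(e) > 2 * se_effect}


# Hypothetical hardness data in standard order, plus five control batches
mean, effects = yates([10, 14, 11, 15, 10, 14, 11, 15])
print(mean)                                                               # 12.5
print(significant_effects(effects, [12.4, 12.6, 12.5, 12.3, 12.7], 8))   # {'A': 4.0, 'B': 1.0}
```

If your practice is to compare effects directly with twice the control batches' standard deviation, compute that threshold instead; only the `se_effect` line changes.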
Factorial Design
Factorial design of experiments can be divided into two classifications: full and fractional factorial design.
The full factorial design method is characterized by:
- 2³ factorial design: 3 factors at 2 levels (high +1; low -1) = 8 (2 × 2 × 2) trials;
- Graphically represented by a cube;
- The coordinates of the vertices represent the individual trials;
- The region bounded by the cube is studied.
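The 2³ design can be generated programmatically. The sketch below uses only the Python standard library; the helper names are hypothetical. It lists the eight cube vertices in standard (Yates) order and shuffles them into a random run order, in line with the advice to randomize.

```python
import itertools
import random


def full_factorial_2level(k):
    """Coded 2^k design (-1 = low, +1 = high) in standard (Yates) order.

    Factor 1 changes fastest, giving (1), a, b, ab, c, ac, bc, abc for k = 3;
    each row is one vertex of the cube (a hypercube for k > 3).
    """
    return [tuple(reversed(levels)) for levels in itertools.product((-1, 1), repeat=k)]


def randomized_run_order(design, seed=None):
    """Shuffle the trials into a random run order.

    Returns (standard-order trial number, coded levels) pairs.
    """
    order = list(range(len(design)))
    random.Random(seed).shuffle(order)
    return [(i + 1, design[i]) for i in order]


design = full_factorial_2level(3)
print(len(design))     # 8
print(design[:2])      # [(-1, -1, -1), (1, -1, -1)]
print(randomized_run_order(design, seed=1))
```

Keeping the standard-order trial number next to each randomized run makes it easy to put the responses back into Yates order for analysis.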
A fractional factorial can serve as the basis of a five-factor, orthogonal, central composite, second-order design, which is described below:
- Half-factorial: 2ⁿ × 1/2, with n = 5, so 16 experiments are conducted at the +1 and -1 levels. Two additional (extreme) levels, +1.547 and -1.547, for each of the five factors add 10 experiments; these levels support the quadratic terms that describe curvature. One more experiment at the zero level (midway between the above levels) brings the total to 27 experiments.
- Orthogonal: the regression coefficients are estimated independently of one another, which guarantees that the effects of different Xs on a given Y can be estimated and tested separately. Central: the extreme points are equidistant from the center. Composite: the model contains linear, interaction and quadratic terms (X = independent variable; Y = dependent variable).
- The second-order “predictor” polynomial has 21 terms: the “overall” mean (a0), 5 linear terms (X), 5 quadratic terms (X²) and 10 interaction terms (XX):
$$Y = a_0 + a_1X_1 + \cdots + a_5X_5 + a_{11}X_1^2 + \cdots + a_{55}X_5^2 + a_{12}X_1X_2 + \cdots + a_{45}X_4X_5$$
Y = level of the dependent variable; a = regression coefficient (a slope that indicates whether the independent variable X exerts a large or small, positive or negative effect on the dependent variable). Such an equation is generated for each Y, relating it to the set of five Xs; the number of experiments must at least equal the number of coefficients in the chosen model (here, 27 experiments for 21 coefficients).
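The 27-run design can be reproduced as a sketch under two assumptions that the text does not state: the half fraction uses the generator E = ABCD, and the axial distance follows the usual orthogonality condition α² = (√(F·N) - F)/2, with F = 16 factorial points and N = 27 total runs, which gives α ≈ 1.547. The hypothetical helpers also build the 21-column model matrix; after centering the quadratic columns, X'X is diagonal, which is what “orthogonal” means here.

```python
import itertools

import numpy as np


def orthogonal_ccd_5(n_center=1):
    """Coded five-factor orthogonal central composite design.

    Assumptions: half fraction 2^(5-1) with generator E = ABCD (resolution V)
    and axial distance alpha from the orthogonality condition.
    Returns (design, alpha); design has 16 + 10 + n_center rows.
    """
    k = 5
    base = np.array(list(itertools.product([-1.0, 1.0], repeat=k - 1)))
    fraction = np.column_stack([base, base.prod(axis=1)])   # E = ABCD: 16 runs
    F = len(fraction)
    N = F + 2 * k + n_center                                # 16 + 10 + 1 = 27
    alpha = np.sqrt((np.sqrt(F * N) - F) / 2)
    axial = np.vstack([s * alpha * np.eye(k)[i] for i in range(k) for s in (-1, 1)])
    center = np.zeros((n_center, k))
    return np.vstack([fraction, axial, center]), float(alpha)


def second_order_terms(X, center_quadratics=False):
    """Model matrix: intercept, linear, quadratic, then two-factor interaction columns."""
    k = X.shape[1]
    quad = X ** 2
    if center_quadratics:
        quad = quad - quad.mean(axis=0)   # centering makes X'X diagonal for this design
    inter = np.column_stack([X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)])
    return np.column_stack([np.ones(len(X)), X, quad, inter])


design, alpha = orthogonal_ccd_5()
M = second_order_terms(design, center_quadratics=True)
print(design.shape, round(alpha, 3), M.shape)   # (27, 5) 1.547 (27, 21)
```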
Experimentation
The experiments are carried out according to Yates’ algorithm; an example is illustrated in Table I, and the experimental design, derived from the author’s experience (7), is illustrated in Table II.

Table I: Experimental Conditions

Table II: Experimental Design
When conducting the experiments, keep in mind that orthogonal coding (-1, 0, +1, etc.) of the Xs allows the magnitudes of the regression coefficients to be compared directly; the significance of each regression coefficient can then be evaluated with an F statistic. Perform the “0” (base) experiment at the beginning, middle and end of the experimental runs. Perform the 27 experiments in random order and measure the responses of the resulting tablets (e.g., hardness, dissolution). Carry out the statistical analysis to obtain mean values for each dependent variable. Finally, carry out computerized regression analysis to determine how well the data fit the second-order model.
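Per-coefficient F tests can be sketched with NumPy and SciPy. For a single coefficient, the F statistic (with 1 and n - p degrees of freedom) is the square of its t statistic. For brevity, the usage example applies the hypothetical helper to a two-factor, nine-run (3 × 3) design with made-up responses; the same function accepts the 27 × 21 model matrix of the five-factor design.

```python
import numpy as np
from scipy import stats


def fit_with_coefficient_f(M, y):
    """Least-squares fit of y = M b with an F test (1, n - p df) for each coefficient."""
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = M.shape
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ b
    df = n - p
    s2 = resid @ resid / df                            # residual mean square
    se = np.sqrt(np.diag(s2 * np.linalg.inv(M.T @ M)))
    F = (b / se) ** 2                                  # equals t^2 for one coefficient
    return b, se, F, stats.f.sf(F, 1, df)


# Hypothetical example: two coded factors on a 3 x 3 grid, full quadratic model
grid = np.array([[a, c] for a in (-1, 0, 1) for c in (-1, 0, 1)], dtype=float)
x1, x2 = grid[:, 0], grid[:, 1]
M = np.column_stack([np.ones(9), x1, x2, x1**2, x2**2, x1 * x2])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, 0.1, -0.3, 0.2])   # made-up
y = 50 + 8 * x1 - 3 * x1**2 + noise
b, se, F, p_values = fit_with_coefficient_f(M, y)
for name, coef, pv in zip(["a0", "a1", "a2", "a11", "a22", "a12"], b, p_values):
    print(f"{name}: {coef:7.3f}  P = {pv:.3g}")
```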
Statistical Analysis
An important part of the planning stage is estimating the experimental error, a measure of the variability inherent in the study. Large variability makes it difficult to obtain a suitable mathematical model. To estimate this error, complete experiments must be replicated.
Predictions will only be as good as the fit of the data to the equations generated; i.e., the index of determination (the R-square value) should be greater than 90%. A low value indicates that the particular dependent variable does not follow a second-order model. If the number of parameters to be estimated (p) approaches the number of observations (n), the R-square value may be misleading; in that case, the adjusted R-square is recommended:
$$R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p}$$
The model F value tests whether all the included regression coefficients (other than the intercept) are zero. A larger F value (a smaller P value, i.e., a lower probability that the fit arose by random chance) indicates a more significant regression equation.
The statistic s estimates the standard deviation about the fitted model: it is the square root of the residual sum of squares divided by its degrees of freedom, $s = \sqrt{SS_{res}/df}$, with df = observations - parameters. The larger the df, the more reliable s is, and the smaller s is, the stronger the “predictor equation.”
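The fit statistics described above (R², adjusted R², the model F value with its P value, and s) can be computed together. A minimal sketch assuming NumPy and SciPy; the helper name is hypothetical, and the model matrix M must include an intercept column so that p counts the intercept.

```python
import numpy as np
from scipy import stats


def fit_summary(M, y):
    """R^2, adjusted R^2, model F (with P value) and s for the fit y = M b.

    M must contain an intercept column; p = number of columns, including it.
    """
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = M.shape
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    sse = float(np.sum((y - M @ b) ** 2))            # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))         # total sum of squares about the mean
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
    df_model, df_resid = p - 1, n - p
    F = ((sst - sse) / df_model) / (sse / df_resid)  # tests all non-intercept coefficients = 0
    return {"R2": r2, "R2_adj": r2_adj, "F": F,
            "P": float(stats.f.sf(F, df_model, df_resid)),
            "s": (sse / df_resid) ** 0.5}


x = np.arange(5.0)   # hypothetical one-factor data
print(fit_summary(np.column_stack([np.ones(5), x]), [1, 3, 2, 5, 4]))
```

In this small example R² = 0.64 but the adjusted value drops to 0.52, illustrating why the adjusted statistic matters when p is close to n.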
In model reduction, a coefficient whose absolute value is smaller than twice its standard error is not statistically different from zero and is therefore dropped from the model. Under the hierarchy principle, a lower-order term is usually retained as long as a higher-order term containing it remains in the model.
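The 2 × SE reduction rule combined with the hierarchy principle might be sketched as follows. Representing terms as tuples of factor indices is our own convention, the function name is hypothetical, and the coefficient estimates in the example are invented.

```python
def reduce_model(terms, coef, se):
    """Drop terms with |coefficient| < 2 * SE, then restore terms required by hierarchy.

    terms: tuples of factor indices, e.g. () intercept, (1,) X1, (1, 1) X1^2, (1, 2) X1X2.
    """
    keep = {t for t, b, s in zip(terms, coef, se) if t == () or abs(b) >= 2 * s}
    # Hierarchy: a retained X1^2 keeps X1; a retained X1X2 keeps X1 and X2
    for t in list(keep):
        if len(t) > 1:
            keep.update((f,) for f in t)
    return [t for t in terms if t in keep]


terms = [(), (1,), (2,), (1, 1), (2, 2), (1, 2)]
coef = [50.0, 8.0, 0.1, -3.0, 0.05, 0.5]   # hypothetical estimates
se = [0.1] * 6
print(reduce_model(terms, coef, se))       # [(), (1,), (2,), (1, 1), (1, 2)]
```

Here X2 on its own falls below 2 × SE but is kept because the X1X2 interaction survives.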
In the Cook’s D (Cook’s distance) test, a large value denotes an “influential” observation; the model must then be fitted with and without that observation to assess its effect.
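Cook’s distance can be computed from the hat matrix as D_i = e_i² h_ii / (p s² (1 - h_ii)²), where e_i is the residual, h_ii the leverage and s² the residual mean square. The sketch below (NumPy, hypothetical helper names and data) flags the influential point and refits without it, as the text recommends. Common rules of thumb treat D_i > 1 (or > 4/n) as large; these cutoffs are conventions and do not come from the text.

```python
import numpy as np


def cooks_distance(M, y):
    """Cook's D for each observation of a least-squares fit y = M b."""
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = M.shape
    h = np.diag(M @ np.linalg.pinv(M))          # leverages (hat-matrix diagonal)
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    e = y - M @ b                               # residuals
    s2 = e @ e / (n - p)
    return e**2 / (p * s2) * h / (1 - h) ** 2


def refit_without(M, y, i):
    """Coefficients with and without observation i, to judge its influence."""
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.arange(len(y)) != i
    b_all, *_ = np.linalg.lstsq(M, y, rcond=None)
    b_drop, *_ = np.linalg.lstsq(M[mask], y[mask], rcond=None)
    return b_all, b_drop


x = np.array([0, 1, 2, 3, 4, 10], dtype=float)   # hypothetical data; last point is suspect
y = np.array([0, 1, 2, 3, 4, 20], dtype=float)
M = np.column_stack([np.ones_like(x), x])
D = cooks_distance(M, y)
worst = int(np.argmax(D))
print(worst, refit_without(M, y, worst))
```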
To obtain the best “predictor” equation by stepwise (hierarchical) regression, start with an equation containing all terms and sequentially eliminate the less meaningful ones. Repeat the procedure at different levels of significance.
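Backward elimination, one form of the stepwise procedure, can be sketched as below with NumPy and SciPy. At each step the least significant term (largest two-sided t-test P value) is dropped until every remaining term is significant at the chosen alpha; rerunning with different alpha values compares the resulting equations. The function name, the term names and the data are hypothetical, and this sketch does not enforce the hierarchy principle.

```python
import numpy as np
from scipy import stats


def backward_eliminate(M, y, names, alpha=0.05, always_keep=("a0",)):
    """Drop the term with the largest P value until all remaining P values < alpha."""
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = list(range(M.shape[1]))
    while True:
        X = M[:, cols]
        n, p = X.shape
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        df = n - p
        se = np.sqrt(np.diag(resid @ resid / df * np.linalg.inv(X.T @ X)))
        pvals = 2 * stats.t.sf(np.abs(b / se), df)
        candidates = [(pv, c) for pv, c in zip(pvals, cols) if names[c] not in always_keep]
        if not candidates:
            break
        worst_p, worst_col = max(candidates)
        if worst_p < alpha:
            break
        cols.remove(worst_col)
    return [names[c] for c in cols]


# Hypothetical 3 x 3 two-factor data with a full quadratic model
grid = np.array([[a, c] for a in (-1, 0, 1) for c in (-1, 0, 1)], dtype=float)
x1, x2 = grid[:, 0], grid[:, 1]
M = np.column_stack([np.ones(9), x1, x2, x1**2, x2**2, x1 * x2])
y = 50 + 8 * x1 - 3 * x1**2 + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, 0.1, -0.3, 0.2])
names = ["a0", "a1", "a2", "a11", "a22", "a12"]
for level in (0.01, 0.05, 0.10):
    print(level, backward_eliminate(M, y, names, alpha=level))
```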
Dimension Reduction
Dimension reduction techniques focus on the critical Xs and Ys, keeping the number of terms in the model to a minimum and thereby simplifying the regression equation.
The first technique, the Spearman correlation matrix, shows whether any pair of dependent variables (Ys) has a correlation close to ±1, which indicates a strong positive or negative association. If two Ys are strongly correlated, measure only one of them, not both. A Y that is unrelated to all the other dependent variables should be measured. The Spearman correlation matrix examines two variables at a time.
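A Spearman correlation screen for redundant responses can be sketched with `scipy.stats.spearmanr`, which treats each column of a 2-D array as a variable. The helper name, the 0.9 cutoff and the example batches are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def correlated_response_pairs(Y, names, threshold=0.9):
    """Spearman matrix of the responses (columns of Y) and the pairs with |rho| >= threshold."""
    rho, _ = stats.spearmanr(Y)
    rho = np.asarray(rho, dtype=float)
    if rho.ndim == 0:                      # spearmanr returns a scalar for exactly two columns
        r = float(rho)
        rho = np.array([[1.0, r], [r, 1.0]])
    pairs = [(names[i], names[j], float(rho[i, j]))
             for i in range(len(names)) for j in range(i + 1, len(names))
             if abs(rho[i, j]) >= threshold]
    return rho, pairs


Y = np.array([   # hypothetical batches: hardness, dissolution at 10 min (%), weight (mg)
    [5, 95, 200], [6, 90, 202], [7, 88, 199],
    [8, 80, 201], [9, 75, 198], [10, 70, 203],
], dtype=float)
rho, pairs = correlated_response_pairs(Y, ["hardness", "dissolution", "weight"])
print(np.round(rho, 2))
print(pairs)
```

In this made-up data, hardness and dissolution are perfectly rank-correlated (rho = -1), so measuring one of them would suffice, while weight is unrelated to both.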
The second technique, principal component analysis (PCA), is used to select the key dependent variables that best distinguish among the essentially unlimited number of candidate formulations in a computer optimization. Such a key variable should be the criterion on which a formulation is selected (e.g., dissolution rather than friability), and constraining this key variable alone leads to faster selection of an optimum formulation.
Some variables (e.g., tablet weight, thickness and friability) may contribute little or nothing to the overall variability and hence would not help distinguish between formulations. Principal component analysis examines all variables simultaneously, not just two at a time.
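PCA of the responses can be sketched with a singular value decomposition in NumPy. The responses are standardized first because they are measured in different units; the helper name and the batches are hypothetical. A variable that loads weakly on the leading components contributes little to distinguishing between formulations.

```python
import numpy as np


def pca(Y):
    """PCA on standardized responses: returns (explained-variance ratios, loadings)."""
    Y = np.asarray(Y, dtype=float)
    sd = Y.std(axis=0, ddof=1)
    if np.any(sd == 0):
        raise ValueError("a constant response carries no variability; drop it first")
    Z = (Y - Y.mean(axis=0)) / sd          # unit variance: responses have different units
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return explained, Vt                   # row k of Vt holds the loadings of component k


Y = np.array([   # hypothetical batches: hardness, dissolution, weight
    [5, 95, 200], [6, 90, 202], [7, 88, 199],
    [8, 80, 201], [9, 75, 198], [10, 70, 203],
], dtype=float)
explained, loadings = pca(Y)
print(np.round(explained, 2))
print(np.round(loadings[0], 2))   # hardness and dissolution dominate the first component
```

The sign of each component is arbitrary in an SVD, so compare the magnitudes of the loadings rather than their signs.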
Contour Plots
Finally, contour plots (topographical plots akin to maps) are drawn by computer and represent a three-dimensional situation in two dimensions. A contour plot shows the contributions of the linear (X), interaction (XX) and quadratic (X², “curvature”) terms to Y.
The symbol OPTIMUM marks the predicted response at the recommended formulation. The plot shows that decreasing the magnesium stearate level from this predicted optimum increases ejection force, whereas increasing it decreases hardness, tablet dissolution at 10 minutes and capsule dissolution at 10 minutes, which justifies the selection of the optimum formulation.

Fig 1: Contour Plot
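The tradeoff read from a contour plot can also be searched numerically. The sketch evaluates two hypothetical fitted second-order models (dissolution and hardness, in coded units) on a grid and finds the highest predicted dissolution subject to a minimum hardness; passing the same grids to `matplotlib.pyplot.contour` would draw the contour plot itself. The coefficients and function names are invented for illustration and do not come from the article.

```python
import numpy as np


def quadratic_2f(b, x1, x2):
    """Second-order model in two coded factors: b0 + b1 x1 + b2 x2 + b11 x1^2 + b22 x2^2 + b12 x1 x2."""
    b0, b1, b2, b11, b22, b12 = b
    return b0 + b1 * x1 + b2 * x2 + b11 * x1**2 + b22 * x2**2 + b12 * x1 * x2


# Hypothetical fitted coefficients in coded units (not from the article)
B_DISSOLUTION = (80.0, -6.0, 4.0, -2.0, -3.0, 1.0)   # % dissolved at 10 min
B_HARDNESS = (8.0, 1.5, 0.5, 0.0, -0.5, 0.0)         # tablet hardness


def constrained_optimum(min_hardness, n=201):
    """Grid point in [-1, 1]^2 maximizing predicted dissolution with hardness >= min_hardness."""
    g = np.linspace(-1.0, 1.0, n)
    X1, X2 = np.meshgrid(g, g)                        # the same grids feed a contour plot
    dissolution = quadratic_2f(B_DISSOLUTION, X1, X2)
    hardness = quadratic_2f(B_HARDNESS, X1, X2)
    feasible = hardness >= min_hardness
    if not feasible.any():
        return None
    idx = np.unravel_index(np.argmax(np.where(feasible, dissolution, -np.inf)), X1.shape)
    return float(X1[idx]), float(X2[idx]), float(dissolution[idx]), float(hardness[idx])


print(constrained_optimum(0.0))   # no hardness constraint
print(constrained_optimum(8.0))   # a harder tablet costs some dissolution
```

Raising the hardness limit shrinks the feasible region and lowers the best achievable dissolution, which is the “softer tablet for higher dissolution” tradeoff expressed numerically.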
Conclusions
Statistical optimization enables a pharmaceutical scientist to define a formulation with optimum characteristics. A large amount of data can be generated from a limited number of experiments, which facilitates an in-depth understanding of the formulation and its manufacturing process. Statistical optimization can also help solve the problems that occasionally arise in larger-scale manufacturing.
Importantly, statistically designed experimentation and analysis provide regulatory agencies with strong assurance of superior product quality.
References
1. G. E. P. Box, and K. B. Wilson, J. Roy. Statist. Soc., B, 13, 1(1951).
2. J. B. Schwartz, J. R. Flamholz, and R. H. Press, J. Pharm. Sci., 62, 1165(1973).
3. N. R. Bohidar, F. A. Restaino, and J. B. Schwartz, J. Pharm. Sci., 64, 966(1975).
4. H. M. El-Banna, N. Boraie, and H. A. El-Shibini, Pharmazie, 36, 11(1981).
5. D. A. Doornbos, Pharmaceutisch Weekblad, 3, 549(1981).
6. M. L. Jozwiakowski, ‘Presentation – Use of Experimental Design Techniques in Early Formulation and Process Development’ Arden House Conference, Jan. 30, 1984.
7. N. R. Bohidar, J. F. Bavitz, and P. K. Shiromani, Drug Dev. Ind. Pharm., 12, 1503(1986).
8. Z. T. Chowhan, and A. A. Amaro, Drug Dev. Ind. Pharm., 14, 1079(1988).
9. T. Schofield, J. F. Bavitz, C. M. Lei, L. Oppenheimer, and P. K. Shiromani, Drug Dev. Ind. Pharm., 17, 959(1991).
10. B. Iskandarani, J. H. Clair, P. Patel, P. K. Shiromani, and R. E. Dempski, Drug Dev. Ind. Pharm., 19, 2089(1993).
11. R. Fassihi, J. Fabian, and A. M. Sakr, Pharm. Ind., 57, 1039(1995).
12. P. K. Shiromani, and J. Clair, Drug Dev. Ind. Pharm., 26, 357(2000).
13. B. Rambali, L. Baert, and D. L. Massart, Int. J. Pharm., 220, 149(2001).
14. V. Kannan, R. Kandarapu, and S. Garg, Pharmaceutical Technology, Feb., 74(2003).
15. H. Tye, Drug Discov. Today, 9, 485(2004).
16. B. Jones and L. Creighton, R&D, 46, 20(Sept. 2004).
