Variance reduction

Monte Carlo is important but can be considered a straightforward technique. Its main issue is variance. To try to get rid of it, what can be done is to increase the number of samples N. However, each time we want to reduce variance two times, we need four fold the number of samples (for the blunder to diminish linearly, the quantity of tests needs to increase exponentially). Because of that great deal of research was invested into discovering whether change could be decreased by some other mean than simply expanding N. Luckily, mathematics can help answering this question. Monte Carlo integration typically has an error variance of the form σ^2/N. Even though we get a better answer by sampling with a larger value of N, the computing time grows with N. Sometimes it is conceivable to decrease σ instead. To do this, we construct a new Monte Carlo problem with the same answer as our original one but with a lower σ. This leads to an entire branch in Monte Carlo research focused on what's known as variance reduction methods. When do we need variance reduction? It is mostly needed in simulations of highly complex models with high computation cost per replication, and in simulations of rare events with too many replications in crude Monte Carlo. There are many applications of these phenomena in a variety of fields, for example: physics, finance, statistics, reliability or systems biology.

= Variance reduction techniques = In this chapter can be found description for some of the popular classic variance reduction techniques, mainly focusing on antithetic variables and common random numbers (CRN). CRN is commonly utilized in a simulation study in which we want to compare performance measures of two distinct systems. Every single other method (for instance antithetic variables, stratification) is valuable in the case that we want to simulate a performance measure of only a single system. These methods all enhance effectiveness by sampling the input values more strategically. Next let’s consider conditioning and control variates. These methods take advantage of closed form solutions to problems similar to the given one. Another major method is importance sampling. Like some of the other methods, importance sampling also changes where we take the sample values, but rather than distributing them in more balanced ways it purposely oversamples from some regions and then corrects for this distortion by reweighting. It is accordingly an extreme reformulation of the issue and can be dubious to do well. Other group of methods consists of heuristic variance reduction techniques. Amongst them we can find some of those popular techniques: Weighting in lieu of absorption, Splitting, Forced collisions, Russian roulette, etc. Underlying idea here is that we want to choose from a probability distribution that we want to use rather than using the one that the physical data indicates. Last group we will mention here are more modern methods which include Quasi Monte Carlo and Multilevel Monte Carlo Estimator techniques.

Common Random Numbers
Common random numbers (CRN) is the simplest and presumably the most widely used method for increasing the efficiency of comparisons via simulation. It is intuitively appealing and easy to implement in either a custom-made simulation or in a simulation package. In its simplest form, CRN just requires that the systems under study be simulated with the same stream of random numbers. Naturally, this appears to be reasonable since it guarantees that the frameworks are thought about under same conditions. Key idea in the method is that on the off chance X and Y are two random variables, then Var(X−Y) = Var(X) + Var(Y) − 2Cov(X, Y). Thus, for the situation that X and Y are positively correlated, i.e. Cov(X,Y)>0, the variance of X − Y will be smaller than in the case that X and Y are independent. In general, the use of common random numbers leads to positive correlation of the outcomes of a simulation of two systems. As a consequence, it is better to use common random numbers instead of independent random numbers when we compare two different systems.

Example 1
Assume that a limited number of N jobs must be handled on two indistinguishable machines. The processing times of the jobs are random variables with some common distribution function F. We want to compare the completion time of the last job, Cmax, under two different policies. In the LPTF policy, we always choose the remaining job with the longest processing time first, in the SPTF policy we choose the remaining job with the shortest processing time first. In Table 1 and Table 2 we compare for N = 10 and F(x) = 1 − e^(-x) the estimators and confidence intervals for E(C SPTF max − C LPTF max ) when we do not, resp. do, use CRN. We conclude that in this example the use of CRN reduces the standard deviation of the estimator and hence also the length of the confidence interval with a factor 5.

Example 2
When we want to use CRN, the problem of synchronization can arise: How can we achieve that the same random numbers are used for the generation of the same random variables in the two systems? In the previous example, this synchronization problem did not arise. However, to illustrate this problem, consider the following situation. In a G/G/1 queueing system the server can work at two different speeds, v1 and v2. Aim of the simulation is to obtain an estimator for the difference of the waiting times in the two situations. We want to use the same realizations of the interarrival times and the sizes of the service requests in both systems (the service time is then given by the sizes of the service request divided by the speed of the server). If we use the program of the discrete event simulation of Section 3 of the G/G/1 queue, then we get the synchronization problem because the order in which departure and arrival events take place depends on the speed of the server. Hence, also the order in which interarrival times and sizes of service requests are generated depend on the speed of the server. The synchronization problem can be solved by one of the following two approaches: 1. Use separate random number streams for the different sequences of random variables needed in the simulation. 2. Assure that the random variables are generated in exactly the same order in the two systems. For the example of the G/G/1 queue, the first approach can be realized by using a separate random number stream for the interarrival times and for the service requests. The second approach can be realized by generating the service request of a customer already at the arrival instant of the customer.

Antithetic variables
The method of antithetic variables is a frequently suggested technique for reducing the variability of estimators in computer simulation experiments. In practice the method does not appear to work as well as it might and hence is not used as often as it should. The idea in the technique of antithetic variables is as follows: It makes use of the fact that if U is uniformly distributed on (0, 1) then so is 1 − U and furthermore U and 1 − U are negatively correlated. The key idea is that, if W1 and W2 are the outcomes of two successive simulation runs, then Var(W1 + W2/2)=1/4Var(W1)+1/4Var(W2)+1/2Cov(W1, W2). Hence, in the case that W1 and W2 are negatively correlated the variance of (W1 + W2)/2 will be smaller than in the case that W1 and W2 are independent. The question remains how we can achieve that the outcome of two successive simulation runs will be negatively correlated. From the fact that U and 1 − U are negatively correlated, we may expect that, if we use the random variables U1,...,Um to compute W1, the outcome of the first simulation run, and after that 1 − U1,...,1−Um to compute W2, the outcome of the second simulation run, then also W1 and W2 are negatively correlated. Intuition here is that, e.g., in the simulation of the G/G/1 queue large realizations of the Ui’s corresponding to large service times lead to large waiting times in the first run. Using the antithetic variables, this gives small realizations of the 1−Ui’s corresponding to small service times and hence leading to small waiting times in the second run.

Stratification
Now, let’s just briefly mention the thought behind stratified sampling which is to split up the domain D of X into separate regions, take a well-defined sample of points from each such region, and combine the results to estimate E(f(X)). Naturally, on the off chance that every area gets a lot of points then we ought to get an improved answer. Figure 1 shows two little instances of stratified domains. We may have the capacity to improve the situation still by oversampling inside the vital strata and under sampling those in which f is almost consistent (similar to methods of importance sampling).

Figure 1

= Measuring variance reduction efficiency = Using variance reduction can sometimes bring tremendous enhancements in contrast with conducting plain Monte Carlo. It isn’t exceptional for the value σ^2 to be reduced thousand times (or more). Unfortunately, it is likewise possible for a variance reduction technique to bring a very modest improvement, perhaps equivalent to diminishing σ^2 by only 10%. What is more regrettable, few methods will raise σ^2 in unwanted circumstances. The estimation of a variance reduction relies upon more than the change in σ^2. It likewise relies upon the PC's running time, perhaps the memory expended, and significantly, the human time taken to program and test the code. Suppose for simplicity, that a baseline method is unbiased and estimates the desired quantity with variance σ0^2/n, at a cost of nc0, when n function evaluations are used. To get an error variance of τ^2 we need n = σ0^2/τ^2 and this will cost c0σ0^2/τ^2. Here we are assuming that cost is measured in time and that overhead cost is small. On the off chance that an alternative fair-minded method has variance σ1^2/n and cost nc1 under these conditions, then it will cost us c1σ1^2/τ2 to achieve the same error variance τ^2 that the baseline method achieved. The efficiency of the new method, relative to the standard method is E = c0σ0^2/ c1σ1^2. At any fixed level of accuracy, the old method takes E times as much work as the new one. The efficiency has two factors, σ0^2/σ1^2 and c0/c1. The first is a mathematical property of the two methods that we can often handle theoretically. The second is more complicated. It can depend heavily on the algorithms used for each method. It can also depend on details of the computing environment, including the computer hardware, operating system, and implementation language. Note that numerical outcomes for c0/c1 acquired in one setting don't really apply to another. There is no settled guideline for how substantial a productivity enhancement must be to make it worth utilizing. In a few settings, for example, rendering PC illustrations for movies, where a large number of CPUs are kept occupied for a considerable length of time, a 10% enhancement (i.e., E = 1.1) brings important reserve funds. In different settings, for example, an irregular calculation, a 60-overlap gain (i.e., E = 60) which transforms a one-minute pause into a one second pause, may not justify the expense of programming a more complicated solution. Computation costs so much less than human effort that we commonly require substantial proficiency gains to counterbalance the time spent programming up a variance decrease. The force to search out a productivity enhancement may possibly come when we end up hanging tight quite a while for an outcome, as, when we have to put our whole Monte Carlo figuring inside a loop representing numerous variations of the issue. A very slow computation costs more than just the computer’s time. It may waste time for those waiting for the answer (workforce). Also, slow computations reduce the number of alternatives that one can explore and thus not giving as options to choose from which would be preferable in many situations. The effectiveness increase important to justify using a method is less if the programming exertion can be amortized over numerous applications. The edge is very high for a one-time program, lower for something that we are adding to our personal library, lower still for code to share with a few associates and even lower for code to be put into a library or simulation tool for general use. Important is to note that all of the methods mentioned in previous chapter are capable of a great range of efficiency improvements.

= Recommended reading =

https://web.maths.unsw.edu.au/~zdravkobotev/variancereductionCorrection.pdf http://sas.uwaterloo.ca/~dlmcleis/s906/chapt4.pdf https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-mathematical-foundations/variance-and-standard-deviation https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/variance-reduction-methods https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/introduction-quasi-monte-carlo https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/4261/glasserman_yao_guidelines.pdf

= References =