Revision as of 07:12, 22 January 2023 by Kock06 (talk | contribs) (Created page with "Introduction Both companies and sales representatives want to know, when customers pay their invoice. Companies need to know it to manage likvidity better while sales represen...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Introduction Both companies and sales representatives want to know, when customers pay their invoice. Companies need to know it to manage likvidity better while sales representatives needs to know, if there are going to be late paying cutomers to discuss future payment plans and paying policy with the company, and also to know the amount of negative bonus he would get for delayed customer payment.

Problem definition and Method

I have dataset of old and new customer invoices. Based on the payment delay in the past, I will simulate the payment delay in the future using Monte Carlo Simulation in excel.


Dataset called B2B - Invoice Payment Date Prediction Model has data of (not only, but mainly) invoice payments of customers. The column of this dataset are:

business_code - company code of the account,

cust_number - customer number given to all the customers,

name_customer - name of the customer,

clear_date - the date on which the customer clears an invoice, or in simple terms, they make the full payment,

business_year - period that a company uses for accounting purposes and preparing financial statements,

doc_id - It is also an unique identifier of an invoice (primary key),

document_create_date - the date on which the invoice document was created,

document_create_date_norm - normalised date of the invoice document, from my findings of the dataset, documents issued on weekends are shifted to working days,

due_in_date - the date on which the customer is expected to clear an invoice,

invoice_currency - the currency of the invoice amount in the document for the invoice,

document type - it represents the type of document. eg D1 represents invoice, from my findings, I can say that this dataset has only two types are represented – RV and X2, which means Sales Order Billing [which according to (Lazzari, 2019) means for my purpose the same as sales invoice] and AP conversion (Your Finance Book – Editorial Staff, 2022), which is not useful for my use case.

posting_id - key indicator to identify whether an AR item is invoice, deduction, credit memo based on its value,

baseline_create_date - the date on which the Invoice was created,

cust_payment_terms - Business terms and agreements between customers and accounts on discounts and days of payment,

invoice_id- an unique number assigned when a seller creates an invoice,

is_open_invoice - indicator of whether an invoice is open or closed, isOpen = 1 means that the invoice is open (SkywalkerHub, 2021).

Another column not mentioned by SkywalkerHub (2021) is posting date, accorining to Kagan (2021), post day is a date, on which transaction is received to banking account, which might usually by 1-3 days from transaction date. Area business is also one of the columns. However, it has only blank values. Another two columns are baseline create date and total open amount, of which I will only use the total open amount as the invoice value.

For my purpose, I decided to use columns clear_date, document_create_date, due_in_date, invoice_currency, document type, total_open_amount and isOpen.

Then, I deleted rows that bring no values to my goal, therefore rows with document type X2, I also created an exchange rate conversion for invoices payed in canadian dollars to convert them to use dollars (the currency used in more than 90 % of the invoices). I also divided the dataset info open and close invoices.

I used close invoices to find out distribution of data. I used python library distfit, which uses goodness of fit test to determin, which probality distribusion fits the data the best (2020).

I have found out, that t-student distribution with 1.399233237012562 degrees of freedom (and loc = 0.08842619452260836, scale = 2.306302358006622).

For further study of data, I have calculated a median and average of delay for different payment categories for reference purposes, also an average value of invoices per month, which seems to be similar across month (I do not see any seasonality trends).

Median value for different payment conditions show, that invoices which should be payed on the date of invoice issue are meadianly delayed for 39 days, whereas invoices with payment condition with more than 90 days to pay are meadianly paied 16 days before due date. Other categories showed median of 0, 1 or 2. Average delay is in accordance with median and suprisingly shows, that the biggest average is again for the invoices with 0 days from issue to payment, the second biggest one is with invoices that should be payed in advance, with payment condition of 60 and more days, the invoices are in average payed even before the due date. I will introduce a different method of invoice delay based on those findings.

I have used a random variable, which determines, if covid-19 pandemic with strict measures hit. If it hits, I it adds 20 % delay to payment, because businesses (customer) are stuggling with their own suppliers and customers during that time. I decided to use this on time period of 1 month, since I think that that’s a reasonable interval to consider. I use the date of issue of invoice to determine, whether the invoice is in the covid-19 timeframe or not.

After simulation of due date delay, I calculated average due date delay, median, 1rs quantile and 3rd quantile. At the end, I used the rounded average after consideration of pandemic situation to predict, how late the invoices with payment condition will be payed. I made an exception with invoices with payment condition of 60-89 and 90+, because both of those categories have negative average of payment delay and median of 0 and –16 respectively. I decided to calculate those using the 1rst quantile, while also considering the pandemic situation and rounding the numbers.


Results of the simulation is estimate of payment delay of now open invoices.


The issue of delayed payment of invoices and it’s prediction can be solved by many methods. One of them is using monte carlo simulation. A necessary condition of using monte carlo simulation is knowing the distribution of data that we want to predict. That can be achieved, in ideal case, just by looking at histogram and immidiatly recognising a normal distribution. Otherwise, usage of automated tools that test many distributions at once can be only recommended, since it would be very difficult to find out the distribution by testing one after another manually. Then, consideration about important facts surrounding payment of invoices have to be made. In my case, it was payment condition and the possible negative impact of outburst of covid-19 pandemic on companies (customers).


Distfit. Contents. 2020. Dostupné z:

Kagan, Julia. Post Date. 27. October 2021. Dostupné z:

Lazzari, Zach. Dost Sales Order vs. Sales Invoice. 4. February 2019. Dostupné z:

SkywalkerHub. Dataset info. 17 June 2021. Dostupné z:

Your Finance Book – Editorial Staff. What Is Document Type In SAP – Relationship With Number Range. 30. Mai 2022. Dostupné z:


The simulation can be downloaded here: