Four (4) Different Ways to Calculate DCF Based ‘Equity Cash Flow (ECF)’ – Part 2 of 4

This represents Part 2 of a 4-part series relative to the calculation of Equity Cash Flow (ECF) using R.  If you missed Part 1, be certain read that first part before proceeding. The content builds off prior described information/data.

Part 1 previous post is located here.
‘ECF – Method 2’ is defined as follows: 
The equation appears innocent enough, though there are many underlying terms that require definition for understanding of the calculation. In words, ‘ECF – Method 2’ equals free cash Flow (FCFF) minus after-tax Debt Cash Flow (CFd).

Reference details of the 5-year capital project’s fully integrated financial statements developed in R at the following link.  The R output is formatted in Excel.  Zoom for detail.
The first order of business is to define the terms necessary to calculate FCFF.

Next, pretax Debt Cash Flow (CFd) and its components are defined as follows:

  The following data are added to the ‘data’ tibble from the prior article relative to the financial statements.
data <- data %>%
  mutate(ie    = c(0, 10694, 8158, 527, 627, 717 ),
         np    = c(31415, 9188, 13875,  16500, 18863, 0),
         LTD   = c(250000, 184952, 0, 0, 0, 0),
         cpltd = c(0, 20550, 0, 0, 0, 0),
         ni    =  c(0, 47584,  141355,  262035, 325894, 511852),
         bd    =  c(0, 62500,  62500,   62500,   62500,   62500),
         chg_DTL_net = c(0, 35000,  55000,  35000, -25000, -100000),
         cash  = c(30500,  61250, 92500, 110000, 125750, 0),
         ar    = c(0, 61250,  92500,  110000,  125750, 0),
         inv   = c(30500, 61250, 92500, 110000,  125750, 0),
         pe    = c(915, 1838, 2775, 3300, 3773, 0),
         ap    = c(30500, 73500, 111000, 132000, 150900, 0),
         wp    = c(0, 5513, 8325, 9900, 11318, 0),
         itp   = c(0, -819.377,  9809,  34923, 60566, 0),
         CapX  = c(500000,0,0,0,0,0),
         gain  = c(0,0,0,0,0,162500),
         sp  = c(0,0,0,0,0,350000))
View tibble.

All of the above calculations are defined in the below R function ECF_2. ‘ECF – Method 2’ R function
ECF_2 <- function(a) {
  ECF2 <-      tibble(T_        = a$T_,
                       ie       = a$ie,
                       ii       = a$ii,
                       Year     = c(0:(length(ii)-1)),
                       ni       = a$ni,
                       bd       = a$bd,
                       chg_DTL_net = a$chg_DTL_net,
                       gain     = - a$gain,
                       sp       = a$sp,
                       ie_AT    = ie*(1-a$T_),
                       ii_AT    = - ii*(1-a$T_),
                       gcf      = ni + bd + chg_DTL_net + gain + sp 
                                + ie_AT + ii_AT,
                       OCA      = a$cash + a$ar + a$inv + a$pe,
                       OCL      = a$ap + a$wp + a$itp,
                       OWC      = OCA - OCL,
                       chg_OWC  = OWC - lag(OWC, default=0),
                       CapX     = - a$CapX,
                       FCFF1    = gcf + CapX - chg_OWC,
                       N        = a$LTD + a$cpltd + a$np,
                       chg_N    = N - lag(N, default=0),
                       CFd_AT   = ie*(1-T_) - chg_N,   
                       ECF2     = FCFF1 - CFd_AT )
  ECF2 <- rotate(ECF2)

Run the R function and view the output.

R Output formatted in Excel
Method 2

ECF Method 2‘ agrees with the prior results from ‘ECF Method 1‘ each year.  Any differences are due to rounding error.

This ECF calculation example is taken from my newly published textbook, ‘Advanced Discounted Cash Flow (DCF) Valuation using R.’  It is discussed in far greater detail along with development of the integrated financials using R as well as numerous, advanced DCF valuation modeling approaches – some never before published. The text importantly clearly explains ‘why’ these ECF calculation methods are mathematically exactly equivalent, though the individual components appear vastly different.

Reference my website for further details.

Next up, ‘ECF – Method 3’ …

Brian K. Lee, MBA, PRM, CMA, CFA


Four (4) Different Ways to Calculate DCF Based ‘Equity Cash Flow (ECF)’ – Part 1 of 4

Over the next several days, I will present 4 different methods of correctly calculating Equity Cash Flow (ECF) using R.  The valuation technique of discounted cash flow (DCF) estimates equity value (E) as the present value of forecasted ECF.  The appropriate discount rate for this flow definition is the cost of equity capital (Ke).

‘ECF – Method 1’ is defined as follows: 


Note: ECF is not simply ‘dividends.’  A common misconception is that discounted dividends (DIV) provide equity value.  An example of this is the common ‘dividend growth’ equity valuation model found in many corporate finance texts.  All ‘dividend growth’ models that discount dividends (DIV) at the cost of equity capital (Ke) are incorrect unless forecasted 1) marketable securities (MS) balances are zero, and 2) there is no issuance or repurchase of equity shares.

The data assumes a 5-year year hypothetical capital project.  A single revenue producing asset is purchased at the end of ‘Year 0‘ and is sold at the end of ‘Year 5.’  The $500,000 asset is purchased assuming 50% debt and 50% equity financing.   

Further, the data used to estimate ECF in this example are taken from fully integrated pro forma financial statements and other relevant data assumptions including the corporate tax rate.  This particular example only requires financial data from integrated pro forma  income statements and balance sheets.  These 2 pro forma financial statements are shown below with the relevant data rows highlighted.

The above link provides access to a PDF of all financial statement pro forma data and is easily zoomable for viewing purposes.

The relevant data used to calculate ECF are initially placed in a tibble.


data <- tibble(Year = c(0:5),
              div  = c(0, 2379, 7068, 13102, 16295, 1249876),
              MS   = c(0, 0, 7226, 350948, 698648, 0),
              ii   = c(0, 0, 0, 253, 12283, 24453),
              pic  = c(250000, 250000, 250000, 250000, 250000, 0), 
              T_   = c(0.25, 0.40, 0.40, 0.40, 0.40, 0.40))


An R function is created to rotate the data in standard financial data presentation format (each data line item occupies a single row instead of a column)

rotate <- function(r) {
  p <- t(as.matrix(as_tibble(r)))

View the  rotated data


An R function reads in the appropriate data, performs the necessary calculations, and outputs the data.  The R output is then placed in a spreadsheet to formatting purposes. 

‘ECF – Method 1’ R function
ECF_1 <- function(a) {
  ECF1 <-     tibble( T_             = a$T_,
                      pic            = a$pic,
                      chg_pic        = pic - lag(pic, default=0),
                      MS             = a$MS,
                      ii             = a$ii,
                      Year           = c(0:(length(T_)-1)),
                      div            = a$div,
                      net_new_equity = -chg_pic,   
                      chg_MS         = MS - lag(MS, default=0),
                      ii_AT          = -ii*(1-T_),
                      ECF1           = div + net_new_equity 
                                       + chg_MS + ii_AT )
  ECF1 <- rotate(ECF1)

View R Output

ECF_method_1 <- ECF_1( data)

Excel formatting applied to R Output

It is quite evident there is far more than just dividends (DIV) involved in the proper calculation of ECF.  Use of a ‘dividend growth’ equity valuation model in this instance would result in significant model error.

This ECF calculation example is taken from my newly published textbook, ‘Advanced Discounted Cash Flow (DCF) Valuation using R.’  It is discussed in far greater detail along with development of the integrated financials using R as well as numerous, advanced DCF valuation modeling approaches – some never before published.

Reference my website for further details.

Next up, ‘ECF – Method 2’ …

Brian K. Lee, MBA, PRM, CMA, CFA

New R textbook for machine learning

Mathematics and Programming for Machine Learning with R -Chapter 2 Logic

Have a look at the FREE attached pdf of Chapter 2 on Logic and R from my recently published textbook,

Mathematics and Programming for Machine Learning with R: From the Ground Up, by William B. Claster (Author)
~430 pages, over 400 exercises.Mathematics and Programming for Machine Learning with R -Chapter 2 Logic
We discuss how to code machine learning algorithms in R but start from scratch. The first 4 chapters cover Logic, Sets, Probability, Functions. I am sharing Chapter 2 here on Logic and R here and will also probably release chapters 9 and 10 on Math for Neural Networks shortly. The text is on sale at Amazon here:

I will try to add an errata page as well.

The useR! 2021 (virtual) conference: 5-9 JULY, 2021

useR! conferences are non-profit conferences organized by community volunteers for the community, supported by the R Foundation. Attendees include R developers and users who are data scientists, business intelligence specialists, analysts, statisticians from academia and industry, and students.

The useR! 2021 conference will be the first R conference that is global by design, both in audience and leadership. Leveraging a diversity of experiences and backgrounds helps us to make the conference accessible and inclusive in as many ways as possible and to grow the global community of R users giving new talents access to this amazing ecosystem. Being virtual makes the conference more accessible to minoritized individuals and we strive to leverage that potential. We are paying special attention to the needs of people with a disability to ensure that they can attend and contribute to the conference as conveniently as possible. Going fully virtual and global allows us to reimagine what an R conference can offer to presenters and attendees from across the globe and from diverse backgrounds.

We have an awesome lineup of keynotes:

Paul Murrell (R Core), Edzer Pebesma (Statistics), Heidi Seibold (Research Software Engineering), Jeroen Ooms (R Programming), Meenakshi Kushwaha (R in Action), Catherine Gicheru and Katharine Hayhoe (Communications), Jonathan Godfrey, Kristian Lum, Achim Zeileis and Dorothy Gordon (Responsible programming).

We will have a full day of tutorials in four tracks and different time zones. The 22 selected tutorials will be taught in different languages (English, Spanish or French) and they are aimed at beginner, intermediate and advanced users. Have a look at the tutorial schedule.

You can register from the link on the registration page.
We determined our fee structure based on two important points: * We want everyone to be able to participate! * We would appreciate it if you pay what you can!
We have been able to extend our Early Bird window until registration closes thanks to great support from paying participants and our sponsors.
Fees are adapted to your country of residence. The rate for academic participants also applies to non-profit organizations and government employees, and the student rate also applies to retired people. Freelancers are encouraged to select the rate they feel applies best to them.

Additionally, fees are waived if your employer does not have funds to cover the conference fees. There is no need to convince anyone — you only have to tell us that you don’t have the supporting funds. If you are from a “Low Income” country, your fees are automatically waived!

We encourage both active users of R and those curious about R to attend useR! 2021 conference.

Looking forward to meeting you at the conference!

The useR! 2021 organizing committee

Zoom talk on “Building dashboards in R/shiny (and improve them with logs and user feedback)” from the Grenoble (FR) R user group

The next talk of Grenoble’s R user group will be on May 25, 2021 at 5PM (FR) and is free and open to all:

Building dashboards in R/shiny (and improve them with logs and user feedback)

        The main goal of this seminar is to demonstrate how R/Shiny app developers can collect data from the visitors of their app (such as logs or feedback) to improve it. We will use as example the CaDyCo project, a dynamic and collaborative cartography project hosted by USMB and UGA.

Hope to see you there!

EdOptimize – An Open Source K-12 Learning Analytics Platform

Important Links

  1. Open-source code of our platform –
  2. Live Platform Analytics Dashboard –
  3. Live Curriculum Analytics Dashboard –
  4. Live Implementation Analytics Dashboard –

Data from EdTech platforms have a tremendous potential to positively impact student learning outcomes. EdTech leaders are now realizing that learning analytics data can be used to take decisive actions that make online learning more effective. By using the EdOptimize Platform, we can rapidly begin to observe the implementation of digital learning programs at scale. The data insights from the EdOptimize Platform can enable education stakeholders to make evidence-based decisions that are aimed at creating improved digital learning systems, programs, and their implementation.

EdOptimize Platform is a collection of 3 extensive data dashboards. All 3 Dashboards are devloped using R Shiny and all the code right from the beginning with data simulation and data processing is R native . These dashboards contain many actionable learning analytics that we have designed from our years of work with various school districts in the US.

Here are the brief descriptions of each of the dashboards:

  1. Platform Analytics: To discover trends and power users in the online learning platform. You can use the user behavior data in this platform to identify actions that can increase user retention and engagement. See the dashboard in action here:
  2. Curriculum Analytics: To identify learning patterns in the digital curriculum products. Using this dashboard, you can locate content that needs change and see classroom pacing analytics. You can also look at assessment data, item analysis, and standards performance of the curriculum users. See the dashboard in action here:
  3. Implementation Analytics: To track the implementation of the digital programs in school districts. This dashboard will help districts make the most out of their online learning programs. See the dashboard in action here:

    Data Processing Workflow :

    To learn more about the platform in detail please head over to –

    About Playpower Labs
Playpower Labs is one of the world’s leading EdTech consulting companies. Our award-winning research team has worked with many different types of educational data. Examples include event data, assessment data, user behavior data, web analytics data, license and entitlement data, roster data, eText data, item response data, time-series data, panel data, hierarchical data, skill and standard data, assignment data, knowledge structure data, school demographic data, and more.

If you need a professional help with this platform or any other EdTech data project, please contact our Chief Data Scientist Nirmal Patel at [email protected] He will be happy to have a conversation with you!

Zoom talk on “Version control and git for beginners” from the Grenoble (FR) R user group

The next talk of Grenoble’s R user group will be on May 06, 2021 at 5PM (FR) and is free and open to all:

Version control and git for beginners

    Git is a version control system whose original purpose was to help groups of developers work collaboratively on big software projects. Git manages the evolution of a set of files – called a repository – in a sane, highly structured way. Git has been re-purposed by the data science community. In addition to using it for source code, we use it to manage the motley collection of files that make up typical data analytical projects, which often consist of data, figures, reports, and, yes, source code. In this presentation aimed at beginners we will try to give you an understanding on what git is, how it is integrated in Rstudio and how it can help you make your projects more reproducible, enable you to share your code with the world and collaborate in a seamless manner. (abstract strongly inspired by the intro from

Hope to see you there!

NPL Markets and R Shiny

Our mission at NPL Markets is to enable smart decision-making through innovative trading technology, advanced data analytics and a new comprehensive trading ecosystem. We focus specifically on the illiquid asset market, a market that is partially characterized by its unstructured data.

  Platform Overview

NPL Markets fully embraces R and Shiny to create an interactive platform where sellers and buyers of illiquid credit portfolios can interact with each other and use sophisticated tooling to structure and analyse credit data. 

Creating such a platform with R Shiny was a challenge, while R Shiny is extremely well suited to analyzing data, it is not entirely a generalist web framework. Perhaps it was not intended as such, but our team at NPL Markets has managed to create an extremely productive setup. 

Our development setup has a self-build ‘hot reload’ library, which we may release to a wider audience once it is sufficiently tested internally. This library allows us to update front-end and server code on the fly without restarting the entire application.

In addition to our development setup, our production environment uses robust error handling and reporting, preventing crashes that require an application to restart.

More generally, we use continuous integration and deployment and automatically create unit tests for any newly created R functions, allowing us to quickly iterate.

If you would like to know more about what we built with R and Shiny, reach out to us via our website

The Rhythm of Names

One of the fundamental properties of English prosody is a preference for alternations between strong and weak beats. This preference for rhythmic alternation is expressed in several ways:

      • Stress patterns in polysyllabic words like “testimony” and “obligatory” – as well as nonce words like “supercalifragilisticexpialidocious” – alternate between strong and weak beats.
      • Stress patterns on words change over time so that they maintain rhythmic alternation in the contexts in which they typically appear.
      • Over 90% of formal English poetry like that written by Shakespeare and Milton follows iambic or trochaic meter, i.e., weak-strong or strong-weak units.
      • Speakers insert disyllabic expletives in polysyllabic word in places that create or reinforce rhythmic alternation (e.g., we say “Ala-bloody-bama” or “Massa-bloody-chusetts” not “Alabam-blood-a” or “Massachu-bloody-setts”)
This blog examines whether the preference for rhythmic alternation affects naming patterns. Consider the following names of lakes in the United States:
      1. Glacier Lake
      2. Guitar Lake*
      3. Lake Louise
      4. Lake Ellen*
The names (a) and (c) preserve rhythmic alternation whereas the asterisked (b) and (d) create a stress clash with two consecutive stressed syllables.

As you can see, lake names in the US can either begin or end with “Lake”. More than 90% end in “Lake” reflecting the standard modifier + noun word order in English. But the flexibility allows us to test whether particular principles (e.g., linguistic, cultural) affect the choice between “X Lake” and “Lake X.” In the case of rhythmic alternation, we would expect weak-strong “iambic” words like “Louise” and “Guitar” to be more common in names beginning than ending in “Lake.”

To test this hypothesis, I used the Quanteda package to pull all the names of lakes and reservoirs in the USGS database of placenames that contained the word “Lake” plus just one other word before or after it. Using the “nsyllable” function in Quanteda, I whittled the list down to names whose non-“lake” word contained just two syllables. Finally, I pulled random samples of 500 names each from those beginning and ending with Lake, then manually coded the stress patterns on the non-lake word in each name.

Coding details for these steps follow. First, we’ll load our place name data frame and take a look at the variable names in the data frame, which are generally self-explanatory:


Next, we’ll filter to lakes and reservoirs based on the FEATURE_CLASS variable, and convert names to lower case. We’ll then flag lake and reservoir names that either begin or end with the word “lake”, filtering out those in neither category:

temp <- filter(placeNames, FEATURE_CLASS %in% c("Lake","Reservoir"))
temp$FEATURE_NAME <- tolower(temp$FEATURE_NAME)
temp$first_word <- 0
temp$last_word <- 0
temp$first_word[grepl("^lake\\b",temp$FEATURE_NAME)] <- 1
temp$last_word[grepl("\\blake$",temp$FEATURE_NAME)] <- 1
temp <- filter(temp, first_word + last_word > 0)
We’ll use the ntoken function in the Quanteda text analytics package to find names that contain just two words. By definition given the code so far, one of these two words is “lake.” We’ll separate out the other word, and use the nsyllable function in Quanteda to pull out just those words containing two syllables (i.e., “disyllabic” words). These will be the focus of our analysis.
temp$nWords <- ntoken(temp$FEATURE_NAME, remove_punct=TRUE)
temp <- filter(temp, nWords == 2)
temp$num_syl <- nsyllable(temp$otherWord)
temp <- filter(temp, num_syl == 2)
temp$otherWord <- temp$FEATURE_NAME
temp$otherWord <- gsub("^lake\\b","",temp$otherWord)
temp$otherWord <- gsub("\\blake$","",temp$otherWord)
temp$otherWord <- trimws(temp$otherWord)
  Given the large number of names with “lake” either in first or last position plus a two syllable word (30,391 names), we’ll take a random sample of 500 names beginning with “lake” and 500 ending with “lake”, combine into single data frame, and save as a csv file.
lake_1 <- filter(temp, first_word == 1) %>% sample_n(500)
lake_2 <- filter(temp, last_word == 1) %>% sample_n(500)
lakeSample <- rbind(lake_1,lake_2)
write.csv(lakeSample,file="lake stress clash sample.csv",row.names=FALSE)

I manually coded each of the disyllabic non-“lake” words in each name for whether it had strong-weak (i.e., “trochaic”) or weak-strong (“iambic”) stress. This coding was conducted blind to whether the name began or ended in “Lake.” Occasionally, I came across words like “Cayuga” that the nsyllable function erred in classifying as containing two syllables. I dropped these 23 words – 2.3% of the total – from the analysis (18 in names beginning with “Lake” and 5 in names ending in “Lake”).

Overall, 90% of the non-lake words had trochaic stress, which is consistent with the dominance of this stress pattern in the disyllabic English lexicon. However, as predicted from the preference for rhythmic alternation, iambic stress was almost 5x more common in names beginning than ending with “Lake” (16.4% vs. 3.4%, x2 = 44.81, p < .00001).

Place names provide a rich resource for testing the potential impact of linguistic and cultural factors on the layout of our “namescape.” For example, regional differences in the distribution of violent words in US place names are associated with long-standing regional variation in attitudes towards violence. Large databases of place names along with R tools for text analytics offer many opportunities for similar analyses.

Tutorial: Cleaning and filtering data from Qualtrics surveys, and creating new variables from existing data

Hi fellow R users (and Qualtrics users),

As many Qualtrics surveys produce really similar output datasets, I created a tutorial with the most common steps to clean and filter data from datasets directly downloaded from Qualtrics.

You will also find some useful codes to handle data such as creating new variables in the dataframe from existing variables with functions and logical operators.

The tutorial is presented in the format of a downloadable R code with  explanations and annotations of each step. You will also find a raw Qualtrics dataset to work with.

Link to the tutorial:

This dataset comes from a Qualtrics survey with an experiment format (control and treatment conditions), but the codes can be applicable to non-experimental datasets as well, as many cleaning steps are the same.