Downloading Quarterly Financial Reports from Bovespa â
Financial statements of companies traded at B3 (formerly Bovespa), the Brazilian stock exchange, are available in its website. Accessing the data for a single company is straightforwardd. In the website one can find a simple interface for accessing this dataset. An example is given here. However, gathering and organizing the data for a large scale research, with many companies and many quarters, is painful. Quarterly reports must be downloaded or copied individually and later aggregated. Changes in the accounting format thoughout time can make this process slow, unreliable and irreproducible.
Package GetITRData
provides a R interface to all financial statements available in the website. It not only downloads the data but also organizes it in a tabular format and allows the use of inflation indexes. Users can simply select companies and a time period to download all available data. Several information about current companies, such as sector and available quarters are also at reach. The main purpose of the package is to make it easy to access quarterly financial statements in large scale research, facilitating the reproducibility of corporate finance studies with Bovespa data.
Installation
The package is available in CRAN (release version) and in Github (development version). You can install any of those with the following code:
# Release version in CRAN install.packages('GetITRData') # Development version in Githubdevtools::install\_github('msperlin/GetITRData')
How to use GetITRData
The starting point of GetITRData
is to find the official names of companies in Bovespa. Function gitrd.search.company
serves this purpose. Given a string (text), it will search for a partial matches in companies names. As an example, letâs find the official name of Petrobras, one of the largest companies in Brazil:
library(GetITRData)gitrd.search.company('petrobras')## ## Reading info file from github## Found 24012 lines for 603 companies [Actives = 457 Inactives = 146 ]## Last file update: 2017-09-19## ## Found 1 companies:## PETROBRAS | situation = ATIVO | first date = 1998-09-30 | last date - 2017-06-30
Its official name in Bovespa records is PETROBRAS. Data for quarterly statements is available from 1998 to 2017. The situation of the company, active or canceled, is also given. This helps verifying the availability of data.
The content of all available quarterly statements can be accessed with function gitrd.get.info.companies
. It will read and parse a .csv file from my github repository. This will be periodically updated for new quarterly statements. Letâs try it out:
df.info <- gitrd.get.info.companies()## ## Reading info file from github## Found 24012 lines for 603 companies [Actives = 457 Inactives = 146 ]## Last file update: 2017-09-19str(df.info)## Classes 'tbl_df', 'tbl' and 'data.frame': 24012 obs. of 10 variables:## $ id.company : int 16330 16330 16330 16330 16330 16330 16330 16330 16330 16330 ...## $ name.company: chr "521 PARTICIPAĂĂES S/A" "521 PARTICIPAĂĂES S/A" "521 PARTICIPAĂĂES S/A" "521 PARTICIPAĂĂES S/A" ...## $ main.sector : chr NA NA NA NA ...## $ sub.sector : chr NA NA NA NA ...## $ segment : chr NA NA NA NA ...## $ id.file : int 47368 42591 40601 37569 32706 30654 27521 22269 20368 17649 ...## $ dl.link : chr "http://www.rad.cvm.gov.br/enetconsulta/frmDownloadDocumento.aspx?CodigoInstituicao=2&NumeroSequencialDocumento=47368" "http://www.rad.cvm.gov.br/enetconsulta/frmDownloadDocumento.aspx?CodigoInstituicao=2&NumeroSequencialDocumento=42591" "http://www.rad.cvm.gov.br/enetconsulta/frmDownloadDocumento.aspx?CodigoInstituicao=2&NumeroSequencialDocumento=40601" "http://www.rad.cvm.gov.br/enetconsulta/frmDownloadDocumento.aspx?CodigoInstituicao=2&NumeroSequencialDocumento=37569" ...## $ id.date : Date, format: "2015-03-31" "2014-09-30" ...## $ id.type : chr "after 2011" "after 2011" "after 2011" "after 2011" ...## $ situation : chr "ATIVO" "ATIVO" "ATIVO" "ATIVO" ...
This file includes several information that are gathered from Bovespa: names of companies, sectors, dates quarterly statements and, most importantly, the links to download the files. The resulting dataframe can be used to filter and gather information for large scale research such as downloading financial data for a specific sector.
Downloading financial information for ONE company
All you need to download financial data with GetITRData
are the official names of companies, which can be found with gitrd.search.company
, the desired starting and ending dates and the type of financial information (individual or consolidated). Letâs try it for PETROBRAS:
name.companies <- 'PETROBRAS'first.date <- '2005-01-01'last.date <- '2006-01-01'type.statements <- 'individual'df.reports <- gitrd.GetITRData(name.companies = name.companies, first.date = first.date, last.date = last.date, type.info = type.statements)## ## Reading info file from github## Found 24012 lines for 603 companies [Actives = 457 Inactives = 146 ]## Last file update: 2017-09-19## ## Downloading data for 1 companies## Type of financial statements: individual## First Date: 2005-01-01## Laste Date: 2006-01-01## Inflation index: none## ## Downloading none data using BETS Done## ## ## WARNING: For quarters before 2009, the cash flow statements are not available## ## Starting processing stage:## PETROBRAS## Available quarters: 2005-09-30 2005-06-30 2005-03-31## ## ## Processing PETROBRAS, Quarter = 2005-09-30 Downloading | Reading file## Processing PETROBRAS, Quarter = 2005-06-30 Downloading | Reading file## Processing PETROBRAS, Quarter = 2005-03-31 Downloading | Reading file## Warning in max.default(structure(numeric(0), class = "Date"), na.rm =## FALSE): no non-missing arguments to max; returning -Inf
The resulting object is a tibble
, a data.frame type of object that allows for list columns. Letâs have a look in its content:
str(df.reports)## Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 10 variables:## $ company.name: chr "PETROBRAS"## $ company.code: int 9512## $ type.info : chr "individual"## $ min.date : Date, format: "2005-03-31"## $ max.date : Date, format: "2005-09-30"## $ n.quarters : int 3## $ assets :List of 1## ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 191 obs. of 5 variables:## .. ..$ acc.number : chr "1" "1.01" "1.01.01" "1.01.01.01" ...## .. ..$ acc.desc : chr "Ativo Total" "Ativo Circulante" "Disponibilidades" "Caixa e Bancos" ...## .. ..$ acc.value : int 147026464 41052623 15146496 1498194 13648302 9450575 3488650 4955111 1090180 -83366 ...## .. ..$ ref.date : Date, format: "2005-09-30" ...## .. ..$ company.name: chr "PETROBRAS" "PETROBRAS" "PETROBRAS" "PETROBRAS" ...## $ liabilities :List of 1## ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of 5 variables:## .. ..$ acc.number : chr "2" "2.01" "2.01.01" "2.01.01.01" ...## .. ..$ acc.desc : chr "Passivo Total" "Passivo Circulante" "EmprĂ©stimos e Financiamentos" "Financiamentos" ...## .. ..$ acc.value : int 147026464 44603494 1154258 1039841 114417 0 5205735 7429086 1156875 935603 ...## .. ..$ ref.date : Date, format: "2005-09-30" ...## .. ..$ company.name: chr "PETROBRAS" "PETROBRAS" "PETROBRAS" "PETROBRAS" ...## $ income :List of 1## ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 100 obs. of 5 variables:## .. ..$ acc.number : chr "3.01" "3.02" "3.03" "3.04" ...## .. ..$ acc.desc : chr "Receita Bruta de Vendas e/ou Serviços" "DeduçÔes da Receita Bruta" "Receita LĂquida de Vendas e/ou Serviços" "Custo de Bens e/ou Serviços Vendidos" ...## .. ..$ acc.value : int 37870949 -9778549 28092400 -15030559 13061841 -4270217 -1221668 -895230 -899 -894331 ...## .. ..$ ref.date : Date, format: "2005-09-30" ...## .. ..$ company.name: chr "PETROBRAS" "PETROBRAS" "PETROBRAS" "PETROBRAS" ...## $ df.cashflow :List of 1## ..$ :'data.frame': 0 obs. of 5 variables:## .. ..$ acc.desc : Named list()## .. ..$ acc.value : logi ## .. ..$ acc.number : chr ## .. ..$ company.name: chr ## .. ..$ ref.date :Class 'Date' num(0) ## .. ..- attr(*, "na.action")=Class 'omit' Named int [1:3] 1 2 3## .. .. .. ..- attr(*, "names")= chr [1:3] "1" "2" "3"
Object df.reports
only has one row since we only asked for data of one company. The number of rows increases with the number of companies, as we will soon learn with the next example. All financial statements for the different quarters are available within df.reports
. For example, the income statements for all desired quarters of PETROBRAS are:
df.income.long <- df.reports$income[[1]]str(df.income.long)## Classes 'tbl_df', 'tbl' and 'data.frame': 100 obs. of 5 variables:## $ acc.number : chr "3.01" "3.02" "3.03" "3.04" ...## $ acc.desc : chr "Receita Bruta de Vendas e/ou Serviços" "DeduçÔes da Receita Bruta" "Receita LĂquida de Vendas e/ou Serviços" "Custo de Bens e/ou Serviços Vendidos" ...## $ acc.value : int 37870949 -9778549 28092400 -15030559 13061841 -4270217 -1221668 -895230 -899 -894331 ...## $ ref.date : Date, format: "2005-09-30" "2005-09-30" ...## $ company.name: chr "PETROBRAS" "PETROBRAS" "PETROBRAS" "PETROBRAS" ...
The resulting dataframe is in the long format, ready for processing. In the long format, financial statements of different quarters are stacked. In the wide format, we have the columns as dates and rows as account names/ids. If you want the wide format, which I believe is most common in financial analysis, you can use function gitrd.convert.to.wide
. See an example next:
df.income.wide <- gitrd.convert.to.wide(df.income.long)knitr::kable(df.income.wide )
acc.number | acc.desc | company.name | 2005-03-31 | 2005-06-30 | 2005-09-30 |
---|---|---|---|---|---|
3.01 | Receita Bruta de Vendas e/ou Serviços | PETROBRAS | 31355183 | 35425584 | 37870949 |
3.02 | DeduçÔes da Receita Bruta | PETROBRAS | -8788723 | -9321322 | -9778549 |
3.03 | Receita LĂquida de Vendas e/ou Serviços | PETROBRAS | 22566460 | 26104262 | 28092400 |
3.04 | Custo de Bens e/ou Serviços Vendidos | PETROBRAS | -12052044 | -14530594 | -15030559 |
3.05 | Resultado Bruto | PETROBRAS | 10514416 | 11573668 | 13061841 |
3.06 | Despesas/Receitas Operacionais | PETROBRAS | -2869703 | -5249799 | -4270217 |
3.06.01 | Com Vendas | PETROBRAS | -858170 | -820899 | -1221668 |
3.06.02 | Gerais e Administrativas | PETROBRAS | -768830 | -880185 | -895230 |
3.06.02.01 | Honor.Diretoria e Cons. Administração | PETROBRAS | -984 | -886 | -899 |
3.06.02.02 | De Administração | PETROBRAS | -767846 | -879299 | -894331 |
3.06.03 | Financeiras | PETROBRAS | -53787 | -480281 | -282706 |
3.06.03.01 | Receitas Financeiras | PETROBRAS | 525452 | 106753 | 272290 |
3.06.03.02 | Despesas Financeiras | PETROBRAS | -579239 | -587034 | -554996 |
3.06.04 | Outras Receitas Operacionais | PETROBRAS | 0 | 0 | 0 |
3.06.05 | Outras Despesas Operacionais | PETROBRAS | -2104923 | -3155693 | -1956858 |
3.06.05.01 | Custos Explot.p/Extração Petróleo/Gås | PETROBRAS | -107010 | -101527 | -334116 |
3.06.05.02 | Custo c/ Pesq. Desenv. TecnolĂłgico | PETROBRAS | -192741 | -221813 | -247456 |
3.06.05.03 | TributĂĄrias | PETROBRAS | -185581 | -290086 | -114519 |
3.06.05.04 | VariaçÔes MonetĂĄrias e Cambiais LĂquidas | PETROBRAS | -117821 | -921626 | -401757 |
3.06.05.05 | Outras Despesas/Receitas Oper. Liquidas | PETROBRAS | -1501770 | -1620641 | -859010 |
3.06.06 | Resultado da EquivalĂȘncia Patrimonial | PETROBRAS | 916007 | 87259 | 86245 |
3.07 | Resultado Operacional | PETROBRAS | 7644713 | 6323869 | 8791624 |
3.08 | Resultado NĂŁo Operacional | PETROBRAS | -151498 | -64670 | 1064 |
3.08.01 | Receitas | PETROBRAS | 1256 | 8805 | 450552 |
3.08.02 | Despesas | PETROBRAS | -152754 | -73475 | -449488 |
3.09 | Resultado Antes Tributação/ParticipaçÔes | PETROBRAS | 7493215 | 6259199 | 8792688 |
3.10 | Provisão para IR e Contribuição Social | PETROBRAS | -1847762 | -1151342 | -3002751 |
3.11 | IR Diferido | PETROBRAS | -538132 | -408725 | -111709 |
3.12 | ParticipaçÔes/ContribuiçÔes Estatutårias | PETROBRAS | 0 | 0 | 0 |
3.12.01 | ParticipaçÔes | PETROBRAS | 0 | 0 | 0 |
3.12.01.01 | Partic. de Empregados e administradores | PETROBRAS | 0 | NA | NA |
3.12.02 | ContribuiçÔes | PETROBRAS | 0 | 0 | 0 |
3.13 | ReversĂŁo dos Juros sobre Capital PrĂłprio | PETROBRAS | 0 | 0 | 0 |
3.15 | Lucro/PrejuĂzo do PerĂodo | PETROBRAS | 5107321 | 4699132 | 5678228 |
Downloading financial information for SEVERAL companies
If you are doing serious research, it is likely that you need financial statements for more than one company. Package GetITRData
is specially designed for handling large scale download of data. Letâs build a case with 5 randomly selected companies:
set.seed(5)my.companies <- sample(unique(df.info$name.company), 5)first.date <- '2005-01-01'last.date <- '2006-01-01'type.statements <- 'individual'df.reports <- gitrd.GetITRData(name.companies = my.companies, first.date = first.date, last.date = last.date, type.info = type.statements)## ## Reading info file from github## Found 24012 lines for 603 companies [Actives = 457 Inactives = 146 ]## Last file update: 2017-09-19## ## Downloading data for 5 companies## Type of financial statements: individual individual individual individual individual## First Date: 2005-01-01## Laste Date: 2006-01-01## Inflation index: none## ## Downloading none data using BETS Done## ## ## WARNING: Cant find available dates for BV LEASING## WARNING: For quarters before 2009, the cash flow statements are not available## ## Starting processing stage:## BANESTES S/A## Available quarters: 2005-09-30 2005-06-30 2005-03-31## CIA. IGUAĂU DE CAFĂ SOLĂVEL## Available quarters: 2005-09-30 2005-06-30 2005-03-31## MUNDIAL## Available quarters: 2005-09-30 2005-06-30 2005-03-31## TELEMAR PARTICIPAĂĂES S.A## Available quarters: 2005-09-30 2005-06-30 2005-03-31## ## ## Processing BANESTES S/A, Quarter = 2005-09-30 Downloading | Reading file## Processing BANESTES S/A, Quarter = 2005-06-30 Downloading | Reading file## Processing BANESTES S/A, Quarter = 2005-03-31 Downloading | Reading file## Warning in max.default(structure(numeric(0), class = "Date"), na.rm =## FALSE): no non-missing arguments to max; returning -Inf## ## Processing CIA. IGUAĂU DE CAFĂ SOLĂVEL, Quarter = 2005-09-30 Downloading | Reading file## Processing CIA. IGUAĂU DE CAFĂ SOLĂVEL, Quarter = 2005-06-30 Downloading | Reading file## Processing CIA. IGUAĂU DE CAFĂ SOLĂVEL, Quarter = 2005-03-31 Downloading | Reading file## Warning in max.default(structure(numeric(0), class = "Date"), na.rm =## FALSE): no non-missing arguments to max; returning -Inf## ## Processing MUNDIAL, Quarter = 2005-09-30 Downloading | Reading file## Processing MUNDIAL, Quarter = 2005-06-30 Downloading | Reading file## Processing MUNDIAL, Quarter = 2005-03-31 Downloading | Reading file## Warning in max.default(structure(numeric(0), class = "Date"), na.rm =## FALSE): no non-missing arguments to max; returning -Inf## ## Processing TELEMAR PARTICIPAĂĂES S.A, Quarter = 2005-09-30 Downloading | Reading file## Processing TELEMAR PARTICIPAĂĂES S.A, Quarter = 2005-06-30 Downloading | Reading file## Processing TELEMAR PARTICIPAĂĂES S.A, Quarter = 2005-03-31 Downloading | Reading file## Warning in max.default(structure(numeric(0), class = "Date"), na.rm =## FALSE): no non-missing arguments to max; returning -Inf
And now we can check the resulting tibble
:
head(df.reports )## # A tibble: 4 x 10## company.name company.code type.info min.date## ## 1 BANESTES S/A 1155 individual 2005-03-31## 2 CIA. IGUAĂU DE CAFĂ SOLĂVEL 3336 individual 2005-03-31## 3 MUNDIAL 5312 individual 2005-03-31## 4 TELEMAR PARTICIPAĂĂES S.A 18678 individual 2005-03-31## # ... with 6 more variables: max.date , n.quarters ,## # assets , liabilities , income , df.cashflow
Every row of df.reports
will provide information for one company. Metadata about the corresponding dataframes such as min/max dates is available in the first columns. Keeping a tabular structure facilitates the organization and future processing of all financial data. We can use tibble df.reports
for creating other dataframes in the long format containing data for all companies. See next, where we create dataframes with the assets and liabilities of all companies:
df.assets <- do.call(what = rbind, args = df.reports$assets)df.liabilities <- do.call(what = rbind, args = df.reports$liabilities)df.assets.liabilities <- rbind(df.assets, df.liabilities)
As an example, letâs use the resulting dataframe for calculating and analyzing a simple liquidity index of a company, the total of current (liquid) assets (Ativo circulante) divided by the total of current short term liabilities (Passivo Circulante), over time.
library(dplyr)## ## Attaching package: 'dplyr'## The following objects are masked from 'package:stats':## ## filter, lag## The following objects are masked from 'package:base':## ## intersect, setdiff, setequal, unionmy.tab <- df.assets.liabilities %>% group_by(company.name, ref.date) %>% summarise(Liq.Index = acc.value[acc.number == '1.01']/ acc.value[acc.number == '2.01'])my.tab## # A tibble: 12 x 3## # Groups: company.name [?]## company.name ref.date Liq.Index## ## 1 BANESTES S/A 2005-03-31 0.7985816## 2 BANESTES S/A 2005-06-30 0.8393188## 3 BANESTES S/A 2005-09-30 0.9151590## 4 CIA. IGUAĂU DE CAFĂ SOLĂVEL 2005-03-31 2.3459918## 5 CIA. IGUAĂU DE CAFĂ SOLĂVEL 2005-06-30 2.7291873## 6 CIA. IGUAĂU DE CAFĂ SOLĂVEL 2005-09-30 2.1644541## 7 MUNDIAL 2005-03-31 0.7903105## 8 MUNDIAL 2005-06-30 0.8555824## 9 MUNDIAL 2005-09-30 0.9052814## 10 TELEMAR PARTICIPAĂĂES S.A 2005-03-31 1.5650393## 11 TELEMAR PARTICIPAĂĂES S.A 2005-06-30 0.9419386## 12 TELEMAR PARTICIPAĂĂES S.A 2005-09-30 1.3275712
Now we can visualize the information using ggplot2
:
library(ggplot2)p <- ggplot(my.tab, aes(x = ref.date, y = Liq.Index, fill = company.name)) + geom_col(position = 'dodge' )print(p)
As we can see, CIA IGUACU is the company with highest liquidity, being able to pay its short term debt with the current assets in all quarters. We can certainly do a lot more interesting studies based on this dataset.
Exporting financial data
The package includes function gitrd.export.ITR.data
for exporting the financial data to an Excel file. Users can choose between the long and wide format. See next:
my.basename <- 'MyExcelData'my.format <- 'xlsx' # only supported so fargitrd.export.ITR.data(data.in = df.reports, base.file.name = my.basename, type.export = my.format, format.data = 'long')
The resulting Excel file contains all data available in df.reports
.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...