Merging excel files from a realistic point of view in RStudio.

Harsh Ruchir Edirisinghe
2 min readDec 1, 2020
Photo by Lance Grandahl on Unsplash

A typical data analyst or any other data professional would have come across an instance where they had to import a csv file or any other format to do a task. However ,I came across a scenario where I needed to merge multiple excel files which could contain multiple sheets along with the possibility of not only dealing with xlsx files.

The following section of code would be useful to a person who has folders which contain txt files or csv files or xlsx files which needs to be merged .

Prerequisites : None

The code can be simply explained from the below following points:

  1. Install the packages & run the necessary libraries.
  2. Set your working directory ( the location of the files ).
  3. Load the files as a list.
  4. The main step : Run the list through a loop , if the files are of a xlsx format, it passes through the first set of the if function ,where all the sheets of a particular excel get combined. If the files are of a csv format , the 2nd section of the code runs to read the csv files. The same logic follows with any txt files.
  5. The next step involves using a function to create a merged dataframe.
  6. Finally , we can export the dataframe as an xlsx file which is up to the user’s discretion.
#Combining excel files of the same format .#This code can be used to combine excel files across all 3 file formats - csv,txt & xlsx
#The Code with !!! indicates that it needs to be changed according to the user’s requirement.
#Packages to installinstall.packages(“tidyverse”)
install.packages(“readxl”)
install.packages(“writexl”)
#Libraries to runlibrary(writexl)
library(readxl)
library(tidyr)
#Set your working directory (the location of the files to be merged)setwd(“!!! file’ location”)#Example of a location — “C:\\Users\\Mike\\Documents\\Source Data”#Load the files ( ensure they are either only one of the 3 formats : csv, txt & xlsx )my_files <- list.files(pattern = “!!! *.csv”)
my_files
#The first step involved in merging the files ( the below code would automatically account for all the sheets in every excel)if (grepl('.*xlsx',my_files)[1]){
merged = lapply(my_files,function(i){
x<-excel_sheets(i) %>% map_df(~read_xlsx(i,.))

})

}else if (grepl('.*csv',my_files)[1]){

merged = lapply(my_files,function(i){
x<-read.csv(i)

})

} else if (grepl('.*txt',my_files)[1]){

merged = lapply(my_files,function(i){
x<-read.delim(i)

})

}
#The second step involved in merging the files (the below code will aggregate all the information)Merged = do.call(“rbind.data.frame”,merged)#The final step involves exporting the file as an excel in xlsx format to any directorywrite_xlsx(Merged,” #Location for the exported file\\name of the exported file.xlsx”)

References :

AnalyticoHub — https://www.youtube.com/watch?v=Cf3s90FbXUc&t

Jonathan Ng — https://www.youtube.com/watchv=3TUyp4ZMu88&t

--

--

Harsh Ruchir Edirisinghe

Highly enthusiastic about the field of data science and how it can transform the world by solving any dilemma ,regardless of the size and context of it . Skille