Using find, sed, and xargs to replace filename in document

Posted on Wednesday, August 20, 2014


I have been setting up a system using git and rsync.  For larger files, or folders that don't change often, I am using rsync in place of git.  As I found out, git was never meant to move large binary files.  My sojourn into this is documented at http://www.whiteboardcoder.com/2014/08/gitignore-and-calculating-tools.html

I am now at a point where I am engineering a solution for all my ISO files.  I have several Linux and Windows ISOs always on hand, for convenience.  Each ISO is in its own directory; for example, Ubuntu 12.04 server is in the folder linux_isos/Ubuntu/Server/12.04/.  In this particular folder I have the ISO file ubuntu-12.04.5-server-amd64.iso.  (For Windows ISOs I have the ISO and a key code file for registering it after an install.)



I turned my ISO parent directory into a git repository with this .gitignore file.


#Ignore all those dumb .DS_Store files
.DS_Store
._*
#Generic files to ignore
*.swp
*.lock
*~
*.out

#not-git files and folders
#in this case just ignore .iso files
*.iso


This will ignore all the .iso files, but capture the key files and rsync files I am going to create in each folder.
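To double-check what the .gitignore above actually hides, git can be asked about a path directly with git check-ignore.  This is only a small sketch, assuming git is installed and you are inside the repository; the helper name is_ignored is made up for illustration.

```shell
# is_ignored PATH: succeeds if git would ignore PATH in the current repository.
# (is_ignored is a hypothetical helper name, not part of git.)
is_ignored() {
  git check-ignore -q "$1"
}

# Example use from inside the repository:
#   is_ignored ubuntu-12.04.5-server-amd64.iso && echo "ignored"
#   is_ignored .rsync-not-git || echo "tracked"
```

The path does not have to exist yet, so this can be tried before any ISO is copied in.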




I could create one rsync script that grabs every ISO, but that would kill my network and my storage.  Instead I want to have an rsync script per ISO file.  That way if I need to grab my Windows 8 ISO I can simply go to the correct directory and run my script, which will pull down my file from my remote server.

           



Rsync script


Here is the rsync script I came up with, feel free to use it and tweak it.



#!/bin/bash

#Different Remote location
url=www.example.com
#url=www.example2.com

#Check for an override name
name=""
if [ -n "$2" ]
then
  name="$2@"
fi

#=======================================
#
#Only spot you should be changing anything
#Array contains subfolders and other array contains files/folders to rsync
#I don't think bash supports arrays of arrays so i did it this way
loc="/not-git/rsync/08_ISOs/LINUX/"
folders=("")
files=("ubuntu-12.04.5-server-amd64.iso")
#
#===============================================

flags="-avzr"

if [ "$1" = 'push' ]
then
  echo "Push it"
  #Need to make the directories on the remote side first
  for i in "${!folders[@]}"
   do
     ssh "$name$url" mkdir -p "$loc${folders[$i]}"
     rsync $flags "${folders[$i]}${files[$i]}" "$name$url:$loc${folders[$i]}"
   done
else
  echo "Pull it"
  #Create local folder if it is not present
  for i in "${!folders[@]}"
   do
     if [ "${folders[$i]}" = '' ]
     then
       #No subfolder, pull straight into the current directory
       rsync $flags "$name$url:$loc${folders[$i]}${files[$i]}" .
     else
       mkdir -p "${folders[$i]}"
       rsync $flags "$name$url:$loc${folders[$i]}${files[$i]}" "${folders[$i]}"
     fi
   done
fi


You need to change three settings.  First, put your own URL in place of www.example.com.


url=www.example.com


Update the loc variable to be the directory you will rsync to on your remote server


loc="/not-git/rsync/08_ISOs/LINUX/"


Change files to the actual name of the ISO you want to rsync


files=("ubuntu-12.04.5-server-amd64.iso")





I named my script .rsync-not-git so that it would be hidden from normal views.

After this file is set up, run it using


     > ./.rsync-not-git push patman


This script will rsync (push) the file to the remote server.  The second argument is a user name.  You can leave it off if your username on the system you are pushing to is the same as the one you are on.
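The way the optional user name ends up in the rsync target can be shown in isolation.  The following is a sketch only; remote_target is a hypothetical helper that mirrors how the script glues name, url and loc together, and it does not contact any server.

```shell
# remote_target USER HOST PATH: print [USER@]HOST:PATH the way the script
# assembles it. (remote_target is a hypothetical helper, not in the script.)
remote_target() {
  prefix=""
  if [ -n "$1" ]; then
    # A user name was given, so it goes in front of the host
    prefix="$1@"
  fi
  printf '%s%s:%s\n' "$prefix" "$2" "$3"
}

# remote_target patman www.example.com /not-git/rsync/08_ISOs/LINUX/
# remote_target ""     www.example.com /not-git/rsync/08_ISOs/LINUX/
```

With an empty user name the target is just host:path, and ssh and rsync fall back to your local user name.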

Using git to save the repository to a remote server will commit all the scripts but will skip over the ISO files due to the .gitignore file.


After cloning the git repo you can run the script from the other side and rsync (pull) the file from the remote server.


     > ./.rsync-not-git pull patman







Dozens of scripts


OK job well done!  We are almost done.  All that is left to do is copy the script to each ISO directory and change the files variable to the name of the ISO in that directory.

Well, I have at least a dozen if not two dozen ISO files.   This seems like a good excuse to figure out how to replace one part of my text file.

First I am going to make a generic rsync file where I replace the file name with XXXXX, to make it simpler to replace.


files=("XXXXX")



With sed you can do it easily enough. Here is an example.


     >  sed -i 's/XXXXX/FILENAME.iso/g' .rsync-not-git


Or in OS X (you need an extra ''.  BSD sed requires -i to be followed by a backup extension, and an empty '' means no backup file.  As noted at http://stackoverflow.com/questions/21228347/using-sed-in-terminal-to-replace-text-in-file [1])


     >  sed -i '' 's/XXXXX/FILENAME.iso/g' .rsync-not-git
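If you would rather not keep two variants around, one common workaround is to attach a backup suffix directly to -i, which both GNU sed and BSD sed accept, and then delete the backup.  This is a sketch of that idea; replace_placeholder is a hypothetical helper name, and it assumes the new name contains no / or & characters, which sed would treat specially.

```shell
# replace_placeholder FILE NAME: swap every XXXXX in FILE for NAME.
# (replace_placeholder is a hypothetical helper name.)
replace_placeholder() {
  # -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed;
  # the .bak copy is removed once the edit succeeded
  sed -i.bak "s/XXXXX/$2/g" "$1" && rm -f "$1.bak"
}

# replace_placeholder .rsync-not-git FILENAME.iso
```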




This sed command saves me from opening the file and editing it, but not much else.  I want a command I can run in any directory: a one-liner that finds the name of the .iso file and puts that name into my script.

Here is that one-liner.


     >  find . -iname "*.iso" -exec basename {} \; | xargs -I '{}' sed -i  's/XXXXX/{}/g' .rsync-not-git


Or in OS X (which requires the extra '')


     >  find . -iname "*.iso" -exec basename {} \; | xargs -I '{}' sed -i '' 's/XXXXX/{}/g' .rsync-not-git
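Before letting the one-liner edit anything, it can be handy to preview the result.  Dropping -i and using sed -n with the p flag prints only the lines that would change and leaves the file alone.  A small sketch, where preview_fill is a hypothetical helper name:

```shell
# preview_fill DIR: print the lines of DIR/.rsync-not-git as they would look
# after the substitution, without modifying the file.
# (preview_fill is a hypothetical helper name.)
preview_fill() {
  (
    cd "$1" || exit 1
    # Same pipeline as the one-liner, but sed only prints the changed lines
    find . -iname "*.iso" -exec basename {} \; \
      | xargs -I '{}' sed -n 's/XXXXX/{}/gp' .rsync-not-git
  )
}

# preview_fill .
```

Because nothing is written, this form behaves the same on Linux and OS X.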



This works great.  Copy the generic upload script (with XXXXX) into each folder with an ISO file in it, and run this command from that folder.




But can we do one better?  Can I copy the generic script into each folder that contains an ISO file and then update each script with one command line call?



Copy the .rsync-not-git file to the base folder and make it generic (replace file name with XXXXX).

Run the following to copy it to every directory that has a .iso file.


     >  find $PWD -iname "*.iso" -exec dirname {} \; | xargs -I '{}' cp .rsync-not-git {}



That solves one of the problems… What about updating them all with a one-liner?

… A few hours go by …

OK, I give up.  I can't get it to work as a simple one-liner.
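For what it is worth, one approach that may work is to let find hand each ISO path to a tiny sh -c snippet, which can then work out both the folder and the file name.  This is an untested-on-OS-X sketch, not the method used in the rest of this post; fill_all is a hypothetical name, and it assumes the generic script has already been copied into each folder.

```shell
# fill_all BASE: in every folder under BASE that holds an ISO, replace XXXXX
# in that folder's .rsync-not-git with the ISO's file name.
# (fill_all is a hypothetical helper name.)
fill_all() {
  # Inside sh -c, $1 is the path of one ISO found by find
  find "$1" -iname '*.iso' -exec sh -c 'sed -i.bak "s/XXXXX/$(basename "$1")/g" "$(dirname "$1")/.rsync-not-git"' _ {} \;
}

# fill_all "$PWD"
```

The attached -i.bak suffix is accepted by both GNU and BSD sed, at the cost of leaving a .rsync-not-git.bak file in each folder that you will want to delete afterwards.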






Bash Script


Now for a simple script (that will copy the rsync script located in the base directory to each directory and update the text)



     >  vi makescripts.sh


And place the following in it.  (I renamed my default rsync script .rsync-not-git-DEFAULT.)


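#!/bin/bash

#Copy the default script next to every ISO and fill in the ISO name
#Note: the backtick loop splits on whitespace, so folder names must not contain spaces
for folder in `find "$PWD" -iname "*.iso" -exec dirname {} \;`; do
  cp .rsync-not-git-DEFAULT "$folder/.rsync-not-git"
  pushd .
  cd "$folder"
  find . -iname "*.iso" -exec basename {} \; | xargs -I '{}' sed -i  's/XXXXX/{}/g' .rsync-not-git
  popd
done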


Or in the case of OS X



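#!/bin/bash

#Same as above, but with the extra '' that BSD sed needs after -i
#Note: the backtick loop splits on whitespace, so folder names must not contain spaces
for folder in `find "$PWD" -iname "*.iso" -exec dirname {} \;`; do
  cp .rsync-not-git-DEFAULT "$folder/.rsync-not-git"
  pushd .
  cd "$folder"
  find . -iname "*.iso" -exec basename {} \; | xargs -I '{}' sed -i '' 's/XXXXX/{}/g' .rsync-not-git
  popd
done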
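A for loop over find output breaks as soon as a folder name contains a space (think "Windows 8").  Reading the paths line by line avoids that.  The version below is a sketch under a few assumptions: make_scripts is a hypothetical name, the default script sits in the base folder, and no path contains a newline.  It uses the attached -i.bak form, which both GNU and BSD sed accept, so one copy should serve Linux and OS X.

```shell
# make_scripts BASE: copy BASE/.rsync-not-git-DEFAULT next to every ISO found
# under BASE and fill in that ISO's name. (make_scripts is a hypothetical name.)
make_scripts() {
  find "$1" -iname '*.iso' | while IFS= read -r iso; do
    dir=$(dirname "$iso")
    name=$(basename "$iso")
    cp "$1/.rsync-not-git-DEFAULT" "$dir/.rsync-not-git" || exit 1
    # Edit in place with a backup suffix, then remove the backup
    sed -i.bak "s/XXXXX/$name/g" "$dir/.rsync-not-git" && rm -f "$dir/.rsync-not-git.bak"
  done
}

# make_scripts "$PWD"
```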




Then run it


     >  ./makescripts.sh


All the individual scripts have been created and updated with the correct ISO file name.
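A quick way to confirm that nothing was missed is to look for generated scripts that still contain the placeholder.  A minimal sketch, with unfilled_scripts as a hypothetical helper name:

```shell
# unfilled_scripts BASE: list generated scripts under BASE that still contain
# the XXXXX placeholder. (unfilled_scripts is a hypothetical helper name.)
unfilled_scripts() {
  # grep -l prints only the names of files with at least one match
  find "$1" -name '.rsync-not-git' -exec grep -l 'XXXXX' {} +
}

# unfilled_scripts "$PWD"
```

No output means every script got its ISO name.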

That worked just fine.  Now I can push and pull each ISO file.  But I have one last need… to run all the scripts in sequence and push up all the ISOs in one go.






Bash script to run all the Bash Scripts

  

     >  vi runallpush.sh


And place the following in it (changing the username to your own)



#!/bin/bash

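#Run the push in every folder that has a generated script
#Note: the backtick loop splits on whitespace, so folder names must not contain spaces
for folder in `find "$PWD" -iname ".rsync-not-git" -exec dirname {} \;`; do
  echo "$folder"
  pushd .
  cd "$folder"
  bash ./.rsync-not-git push patman
  popd
done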



Then run it



     >  ./runallpush.sh
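With a couple of dozen transfers in a row, it helps to know which ones failed.  The following variation is a sketch only: run_all is a hypothetical name, and it assumes each generated script exits non-zero when its rsync fails (which holds when the failing rsync is the last command the script runs).

```shell
# run_all BASE ACTION USER: run every .rsync-not-git under BASE with the given
# action (push or pull) and user, and report the ones that failed.
# (run_all is a hypothetical helper name.)
run_all() {
  local base="$1" action="$2" user="$3" list script failed=0
  list="$(mktemp)" || return 1
  find "$base" -name '.rsync-not-git' > "$list"
  while IFS= read -r script; do
    echo "== $(dirname "$script")"
    # stdin comes from /dev/null so ssh cannot swallow the rest of the list
    if ! ( cd "$(dirname "$script")" && bash ./.rsync-not-git "$action" "$user" </dev/null ); then
      echo "FAILED: $script" >&2
      failed=$((failed + 1))
    fi
  done < "$list"
  rm -f "$list"
  # Succeed only if every script succeeded
  [ "$failed" -eq 0 ]
}

# run_all "$PWD" push patman
```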


This may take a while depending on how many files you have and how fast your system is.



In my case it took 81 minutes to transfer 21 GiB over a 1 Gigabit network.
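To put those numbers in perspective, the average rate can be worked out with a little awk.  This is just the arithmetic on the two figures above; throughput is a hypothetical helper name.

```shell
# throughput GIB MINUTES: print the average transfer rate in MiB/s and Mbit/s.
# (throughput is a hypothetical helper name.)
throughput() {
  awk -v gib="$1" -v min="$2" 'BEGIN {
    secs   = min * 60
    mib_s  = gib * 1024 / secs                              # 1 GiB = 1024 MiB
    mbit_s = gib * 1024 * 1024 * 1024 * 8 / secs / 1000000  # bytes -> bits -> Mbit
    printf "%.1f MiB/s, %.1f Mbit/s\n", mib_s, mbit_s
  }'
}

# throughput 21 81
```

For 21 GiB in 81 minutes this prints 4.4 MiB/s, 37.1 Mbit/s, which is well under what a 1 Gigabit link can carry, so the wire itself was probably not the limiting factor.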







References
[1]        Using sed in terminal to replace text in file
                http://stackoverflow.com/questions/21228347/using-sed-in-terminal-to-replace-text-in-file
                Accessed 08/2014