Copying a large set of files

Today I found myself needing to copy a very large number of files from one server to another. My standard tool for these things is usually scp, but in this case it didn't work because I needed to copy a large subset of the files within a directory. In other words, I didn't want to copy server:directory, but server:directory/*pattern*. This is definitely possible with scp, but because *pattern* matched so many files in this case, I received the error message Argument list too long.

I found a handy solution for this, but first let's take a look at the reason behind this error message.

The problem

What wasn't clear to me before is that whenever you use a file path pattern like the one above, the shell expands it before the command even starts, whichever command you're using. The pattern is replaced by a list of arguments, one for each file that matches. So for instance, if you run:

ls *

this would be expanded into:

ls all your files listed as separate arguments
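To see the expansion for yourself, here is a small sketch. The temporary directory and the file names in it are made up for the demonstration; the point is only that the command receives the matching file names, never the pattern itself.

```shell
# Hypothetical demo files in a throwaway directory
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch a_pattern_1 a_pattern_2 other

# The shell replaces *pattern* with the matching names before running anything;
# "set --" stores the expanded list as the positional parameters
set -- *pattern*
count=$#

echo "number of arguments: $count"
printf '%s\n' "$@"
```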

Needless to say, that list of arguments can become very long. The Linux kernel puts a hard limit on the amount of data you can pass to a new program through command line arguments (the environment variables count toward it as well). If you exceed that limit, you get the error message Argument list too long.
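If you are curious how large the limit is on your own machine, getconf can report it. This is just a quick check; the value differs between systems, so I'm not quoting a number here.

```shell
# ARG_MAX is the upper bound, in bytes, on the combined size of the
# arguments and the environment handed to a new program
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"
```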

find and tar

Linux Journal has a good article on this issue and goes through the possible workarounds. I used the find command, which is a versatile tool for finding a set of files and performing some task on each of them. This is the command I ended up using:

find my_directory -name '*pattern*' -print0 | xargs -0 tar -rf archive.tar

This way *pattern* is handled by find and isn't expanded into command line arguments by the shell. Since it would be inefficient to open an scp connection for each file that find returns, I instead used tar to collect the files in an archive: xargs hands the file names to tar in batches small enough to stay under the limit, and tar's -r option appends each batch to the same archive. The archive could then easily be copied over to my other machine with scp. Worked perfectly!
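Here is the whole thing as a small sketch you can try safely before running it on real data. The temporary directory and the demo files are made up, and the host and path in the final scp comment are placeholders, not the machines I actually used.

```shell
# Hypothetical demo data; my_directory and archive.tar mirror the names above
work=$(mktemp -d)
cd "$work"
mkdir my_directory
touch my_directory/a_pattern_1 my_directory/b_pattern_2 my_directory/unrelated

# find does the matching itself (the quotes keep the shell from expanding
# the pattern); -print0 and -0 keep file names with spaces intact
find my_directory -name '*pattern*' -print0 | xargs -0 tar -rf archive.tar

# List the archive contents as a check before copying
tar -tf archive.tar

# Then copy the single archive across (placeholder host and path):
# scp archive.tar user@otherserver:/some/path/
```

Note that -r appends to an existing archive.tar, so delete any old archive first if you run the command more than once. As far as I know, tar also can't append to a compressed archive, so compress it afterwards if you need to.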
