Copy Activity Issue in ADF / Synapse Analytics

Recently, when trying to copy a .csv file from an FTP source to an Azure Data Lake using a Copy Activity in Azure Synapse, I had an issue where files larger than 16MB would fail. To investigate, I took the first 13k rows and created a new file from them, which came to 4MB. I tested this smaller file and it went through the copy activity with no issues. Since I now knew there was no bad data in this 13k batch, I multiplied these same rows out a few times and created a file of 16.2MB. I tested it, and it failed! I repeated the process, this time with fewer rows, giving a file size of 15.9MB. Again, I tested it and it worked.
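If you want to reproduce this kind of size test yourself, the idea is to keep a known-good sample of rows and repeat it until the file hits a chosen size. Below is a minimal sketch of that, assuming a UTF-8 CSV with a header row; `build_test_csv` and its parameters are hypothetical names for illustration, not part of Synapse or ADF. It repeats rows one at a time so you can land just above or just below a threshold (e.g. 16.2MB vs 15.9MB).

```python
import csv
import io
import itertools


def build_test_csv(src_path: str, dst_path: str, n_rows: int,
                   target_bytes: int, encoding: str = "utf-8") -> int:
    """Write a CSV of at least target_bytes by cycling the first n_rows data rows
    of src_path. The header is written once. Returns the number of bytes written."""
    with open(src_path, newline="", encoding=encoding) as src:
        reader = csv.reader(src)
        header = next(reader)
        sample = list(itertools.islice(reader, n_rows))
    if not sample:
        raise ValueError("source file has no data rows")

    def to_line(row):
        # Serialise one row exactly as csv.writer would write it to the file.
        buf = io.StringIO(newline="")
        csv.writer(buf).writerow(row)
        return buf.getvalue()

    lines = [to_line(r) for r in sample]
    with open(dst_path, "w", newline="", encoding=encoding) as dst:
        header_line = to_line(header)
        dst.write(header_line)
        written = len(header_line.encode(encoding))
        # Append known-good rows one at a time until the target size is reached,
        # so the result overshoots the target by at most one row.
        for line in itertools.cycle(lines):
            if written >= target_bytes:
                break
            dst.write(line)
            written += len(line.encode(encoding))
    return written
```

For example, `build_test_csv("export.csv", "test_16_2mb.csv", 13000, 16_200_000)` would give a file of roughly 16.2MB made only from the first 13k rows.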

It appeared as though 16MB was the file size limit for a Copy Activity in Synapse. Then I found a similar pipeline with a Copy Activity from the same source to the same sink, which handled 30–40MB files. I tested my 18MB file with that copy activity and it worked!


The only difference between the 2 pipelines was this checkbox:

[Image: the Disable chunking option for the Copy Activity, checked so that the copy activity works]

Chunking is a performance optimization that Synapse tries to use. It parallelizes the copy to complete the load faster: it first reads the file length, splits the file into multiple parts, and then reads those parts in parallel.
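To make the idea concrete, here is a small local-file sketch of chunked reading in general. It is not how Synapse implements it; it just shows the two capabilities chunking depends on: getting the file length up front, and seeking to an offset so each part can be read independently. `read_chunked` and `n_chunks` are hypothetical names, and a local file stands in for the FTP server.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunked(path: str, n_chunks: int = 4) -> bytes:
    """Read a file by splitting it into byte ranges and reading them in parallel."""
    size = os.path.getsize(path)           # step 1: get the file length
    chunk = max(1, -(-size // n_chunks))   # ceiling division so ranges cover the file
    ranges = [(start, min(chunk, size - start)) for start in range(0, size, chunk)]

    def read_range(rng):
        offset, length = rng
        # Each part gets its own handle, much like a separate connection per part.
        with open(path, "rb") as f:
            f.seek(offset)                 # step 2: jump to this part's offset
            return f.read(length)

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(read_range, ranges))  # map keeps the original order
    return b"".join(parts)
```

If the source cannot report the length, or cannot start reading at an arbitrary offset, step 1 or step 2 breaks, and the only safe option is a single sequential read.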

The Issue:

Not all FTP servers support this, and on those that don't, the copy fails with an error. You can read more about chunking and FTP here.
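In FTP terms, getting the file length and reading from an offset map to the `SIZE` and `REST STREAM` extensions (RFC 3659), which a server can advertise in its reply to the `FEAT` command. The sketch below checks for them using Python's standard `ftplib`. It is only a heuristic: it assumes the copy activity relies on these commands (Microsoft doesn't say which commands it uses), and a server may support a command without listing it in `FEAT`. `parse_feat`, `looks_chunk_friendly` and `probe` are hypothetical helper names.

```python
import ftplib


def parse_feat(feat_response: str) -> set:
    """Return the upper-cased feature lines from a multi-line FEAT reply."""
    features = set()
    for line in feat_response.splitlines():
        # Feature lines start with a single space; the first and last lines carry the 211 code.
        if line.startswith(" "):
            features.add(line.strip().upper())
    return features


def looks_chunk_friendly(features: set) -> bool:
    """Heuristic: the server advertises both file length and offset reads."""
    return "SIZE" in features and "REST STREAM" in features


def probe(host: str, user: str = "anonymous", passwd: str = "") -> bool:
    """Connect to an FTP server and check what it advertises (needs network access)."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, passwd)
        try:
            reply = ftp.sendcmd("FEAT")
        except ftplib.error_perm:
            return False  # FEAT itself is not supported, so nothing is advertised
        return looks_chunk_friendly(parse_feat(reply))
```

If `probe` returns `False` for your server, that is a good hint that disabling chunking is worth trying.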


Try checking this option to disable chunking and test your files again. Hopefully this is the solution!
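As far as I can tell from the FTP connector documentation, the checkbox is stored in the pipeline JSON as `"disableChunking": true` inside the copy source's `storeSettings` (of type `FtpReadSettings`). If you have several pipelines to fix, a script can flip it for you. This is a sketch assuming the pipeline JSON has been exported with the usual `properties.activities` layout; `disable_ftp_chunking` and `update_pipeline_file` are hypothetical names, and only top-level activities are checked, so copies nested inside ForEach or If Condition activities are left alone.

```python
import json


def disable_ftp_chunking(pipeline: dict) -> int:
    """Set disableChunking=true on every top-level Copy activity whose source
    reads through FtpReadSettings. Returns how many activities were changed."""
    changed = 0
    for activity in pipeline.get("properties", {}).get("activities", []):
        if activity.get("type") != "Copy":
            continue
        source = activity.get("typeProperties", {}).get("source", {})
        store = source.get("storeSettings", {})
        if store.get("type") == "FtpReadSettings" and store.get("disableChunking") is not True:
            store["disableChunking"] = True
            changed += 1
    return changed


def update_pipeline_file(path: str) -> int:
    """Load a pipeline JSON file, disable FTP chunking, and save it if anything changed."""
    with open(path, encoding="utf-8") as f:
        pipeline = json.load(f)
    changed = disable_ftp_chunking(pipeline)
    if changed:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(pipeline, f, indent=4)
    return changed
```

Re-import or paste the updated JSON back into the pipeline's code view, then re-run your test files.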
