How can I use multithreading to send large files from FTP to Azure faster?

Aug 19 2020

Currently, I have code that downloads a file from FTP to the local hard disk. It then uploads the file to Azure in chunks. Finally, it deletes the file from the local disk and from the FTP server. However, this code is very slow. I just wanted to know how to improve it.

    private async Task UploadToBlobJobAsync(FtpConfiguration ftpConfiguration, BlobStorageConfiguration blobStorageConfiguration, string fileExtension)
    {
        try
        {
                ftpConfiguration.FileExtension = fileExtension;

                var filesToProcess = FileHelper.GetAllFileNames(ftpConfiguration).ToList();
                
                var batchSize = 4;
                List<Task> uploadBlobToStorageTasks = new List<Task>(batchSize);

                for (int i = 0; i < filesToProcess.Count(); i += batchSize)
                {
                    // calculated the remaining items to avoid an OutOfRangeException
                    batchSize = filesToProcess.Count() - i > batchSize ? batchSize : filesToProcess.Count() - i;

                    for (int j = i; j < i + batchSize; j++)
                    {
                        var fileName = filesToProcess[j];
                        var localFilePath = SaveFileToLocalAndGetLocation(ftpConfiguration, ftpConfiguration.FolderPath, fileName);

                        // Spin off a background task to process the file we just downloaded.
                        // Return the upload task itself so the batch waits for the upload to finish.
                        uploadBlobToStorageTasks.Add(Task.Run(() =>
                            UploadFile(ftpConfiguration, blobStorageConfiguration, fileName, localFilePath)));
                    }

                    await Task.WhenAll(uploadBlobToStorageTasks).ConfigureAwait(false);
                    uploadBlobToStorageTasks.Clear();
                }
        }
        catch (Exception ex)
        {
        }
    }

    private async Task UploadFile(FtpConfiguration ftpConfiguration, BlobStorageConfiguration blobStorageConfiguration, string fileName, string localFilePath)
    {
        try
        {
            await UploadLargeFiles(GetBlobStorageConfiguration(blobStorageConfiguration), fileName, localFilePath).ConfigureAwait(false);
            FileHelper.DeleteFile(ftpConfiguration, fileName); // delete file from ftp
        }
        catch (Exception exception)
        {
        }
    }

    private async Task UploadLargeFiles(BlobStorageConfiguration blobStorageConfiguration, string fileName, string localFilePath)
    {
        try
        {
            await UploadFileAsBlockBlob(localFilePath, blobStorageConfiguration).ConfigureAwait(false);

            // delete the file from local
            Logger.LogInformation($"Deleting {fileName} from the local folder. Path is {localFilePath}.");

            if (File.Exists(localFilePath))
            {
                File.Delete(localFilePath);
            }
        }
        catch (Exception ex)
        {
        }
    }

    private async Task UploadFileAsBlockBlob(string sourceFilePath, BlobStorageConfiguration blobStorageConfiguration)
    {
        string fileName = Path.GetFileName(sourceFilePath);
        try
        {
            var storageAccount = CloudStorageAccount.Parse(blobStorageConfiguration.ConnectionString);
            var blobClient = storageAccount.CreateCloudBlobClient();
            var cloudContainer = blobClient.GetContainerReference(blobStorageConfiguration.Container);
            await cloudContainer.CreateIfNotExistsAsync().ConfigureAwait(false);

            var directory = cloudContainer.GetDirectoryReference(blobStorageConfiguration.Path);
            var blob = directory.GetBlockBlobReference(fileName);

            // Ordered list: blocks are committed in the order given, and a HashSet does not guarantee order
            var blocklist = new List<string>();

            byte[] bytes = File.ReadAllBytes(sourceFilePath);

            const long pageSizeInBytes = 20 * 1024 * 1024; // 20 MB at a time
            long prevLastByte = 0;
            long bytesRemain = bytes.Length;

            do
            {
                long bytesToCopy = Math.Min(bytesRemain, pageSizeInBytes);
                byte[] bytesToSend = new byte[bytesToCopy];

                Array.Copy(bytes, prevLastByte, bytesToSend, 0, bytesToCopy);

                prevLastByte += bytesToCopy;
                bytesRemain -= bytesToCopy;

                // create blockId
                string blockId = Guid.NewGuid().ToString();
                string base64BlockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(blockId));

                await blob.PutBlockAsync(base64BlockId, new MemoryStream(bytesToSend, true), null).ConfigureAwait(false);

                blocklist.Add(base64BlockId);
            }
            while (bytesRemain > 0);

            // post blocklist
            await blob.PutBlockListAsync(blocklist).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
        }
    }

Answers

1 Blindy Aug 20 2020 at 00:25

First off, don't write anything to disk that you don't need to. It's not entirely clear what your goal is here, but I don't see why you would need to do that in the first place.

That said, if you look at your send function in isolation, what it does right now is:

Read your entire large (as you say) file into memory.
For each chunk, allocate a completely new array, copy the chunk into it, place a MemoryStream on top of it, and then send it.

That's not how streaming is done.

Instead, you should open a file stream without reading anything, then loop over it for as many chunks as you need, reading each chunk individually into one preallocated buffer (don't keep allocating new byte arrays). Get the base64 representation if you really need it, send the chunk, and keep looping. Your garbage collector will thank you.
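
A minimal sketch of that loop, under some assumptions: ChunkedUploader and UploadInBlocksAsync are hypothetical names, not part of any Azure SDK, and the putBlockAsync delegate stands in for whatever sends one block (for example the blob.PutBlockAsync call from the question). It reads from any Stream into a single reusable buffer and returns the block IDs in upload order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class ChunkedUploader
{
    // Hypothetical helper. `putBlockAsync` is an assumed callback that uploads
    // one block; it must finish with the stream before its task completes,
    // because the underlying buffer is reused for the next block.
    public static async Task<List<string>> UploadInBlocksAsync(
        Stream source,
        int blockSizeInBytes,
        Func<string, Stream, Task> putBlockAsync)
    {
        // Ordered list: the final commit needs the IDs in upload order.
        var blockIds = new List<string>();

        // One buffer, allocated once and reused for every block.
        var buffer = new byte[blockSizeInBytes];
        int blockNumber = 0;

        while (true)
        {
            // ReadAsync may return fewer bytes than requested, so keep reading
            // until the buffer is full or the source is exhausted.
            int filled = 0;
            while (filled < buffer.Length)
            {
                int read = await source
                    .ReadAsync(buffer, filled, buffer.Length - filled)
                    .ConfigureAwait(false);
                if (read == 0) break;
                filled += read;
            }

            if (filled == 0) break; // nothing left to send

            // Fixed-width counter, Base64-encoded, so all IDs have equal length.
            string blockId = Convert.ToBase64String(
                Encoding.UTF8.GetBytes(blockNumber.ToString("d6")));
            blockNumber++;

            // Wrap only the filled part of the shared buffer; no copy is made.
            using (var block = new MemoryStream(buffer, 0, filled, writable: false))
            {
                await putBlockAsync(blockId, block).ConfigureAwait(false);
            }

            blockIds.Add(blockId);
        }

        return blockIds;
    }
}
```

With the question's code, this could be called with File.OpenRead(localFilePath) as the source and (id, stream) => blob.PutBlockAsync(id, stream, null) as the callback, followed by blob.PutBlockListAsync on the returned list. Because it accepts any Stream, the same loop could in principle read from an FTP response stream directly and skip the local file. Blocks are sent one at a time here so the single buffer can be reused safely; sending several blocks in parallel would need one buffer per in-flight block.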