3 posts in 'Technical Note/Node.js'

Technical Note/Node.js

ZIP a Folder in NodeJS

Here is a simple way to archive and pipe a folder in NodeJS.
First, get the fstream and tar modules:
  • npm install fstream
  • npm install tar
Do something like this on your server:
var fstream = require('fstream'),
    tar = require('tar'),
    zlib = require('zlib');

    res.writeHead(200, {
      'Content-Type': 'application/octet-stream',
      'Content-Disposition': 'attachment; filename=myArchive.tar.gz',
      'Content-Encoding': 'gzip'
    });

    var folderWeWantToZip = '/foo/bar';

    /* Read the source directory */
    fstream.Reader({ 'path': folderWeWantToZip, 'type': 'Directory' })
        .pipe(tar.Pack())        /* Convert the directory to a .tar stream */
        .pipe(zlib.createGzip()) /* Compress the .tar stream */
        .pipe(res); // Write back to the response, or wherever else...
This solution is based on an answer on Stack Overflow.


Technical Note/Node.js

How to download files using Node.js


There are three approaches to writing a file downloader app in Node - i. HTTP.get, ii. curl, iii. wget. I have created functions for all of them. To get the examples working, make sure you have the dependencies and app variables intact. Read the comments thoroughly; you will not only learn how to download files, but will also learn more about Node's child_process, File System, Buffers, and Streams. Let's start with HTTP.get.

Downloading using HTTP.get

HTTP.get is Node's built-in mechanism for making HTTP GET requests, which can also be used for downloading files using the HTTP protocol. The advantage of using HTTP.get is that you don't rely on any external programs to download files.

// Dependencies
var fs = require('fs');
var url = require('url');
var http = require('http');
var exec = require('child_process').exec;
var spawn = require('child_process').spawn;

// App variables
var file_url = 'http://upload.wikimedia.org/wikipedia/commons/4/4f/Big%26Small_edit_1.jpg';
var DOWNLOAD_DIR = './downloads/';

// We will be downloading the files to a directory, so make sure it's there
// This step is not required if you have manually created the directory
var mkdir = 'mkdir -p ' + DOWNLOAD_DIR;
var child = exec(mkdir, function(err, stdout, stderr) {
  if (err) throw err;
  else download_file_httpget(file_url);
});

// Function to download file using HTTP.get
var download_file_httpget = function(file_url) {
  var options = {
    host: url.parse(file_url).host,
    port: 80,
    path: url.parse(file_url).pathname
  };

  var file_name = url.parse(file_url).pathname.split('/').pop();
  var file = fs.createWriteStream(DOWNLOAD_DIR + file_name);

  http.get(options, function(res) {
    res.on('data', function(data) {
      file.write(data);
    }).on('end', function() {
      file.end();
      console.log(file_name + ' downloaded to ' + DOWNLOAD_DIR);
    });
  });
};

The above function is probably the best way to download files using HTTP.get in Node. Make an HTTP.get request and create a writable stream using fs.createWriteStream. Since HTTP.get's response is a stream, it has the data event, which carries the chunks of data sent by the server. On each data event, write the data to the writable stream. Once the server finishes sending data, close the instance of fs.createWriteStream. If you try to use fs.write or fs.writeFile or any of their variants, they will fail for medium to large files. Use fs.createWriteStream instead for reliable results.

Downloading using curl

To download files using curl in Node.js we will need to use Node's child_process module. We will be calling curl using child_process's spawn method. We are using spawn instead of exec for the sake of convenience - spawn returns a stream with a data event and doesn't have the buffer size issue that exec has. That doesn't mean exec is inferior to spawn; in fact we will use exec to download files using wget.

// Function to download file using curl
var download_file_curl = function(file_url) {

  // extract the file name
  var file_name = url.parse(file_url).pathname.split('/').pop();
  // create an instance of writable stream
  var file = fs.createWriteStream(DOWNLOAD_DIR + file_name);
  // execute curl using child_process' spawn function
  var curl = spawn('curl', [file_url]);
  // add a 'data' event listener for the spawn instance
  curl.stdout.on('data', function(data) { file.write(data); });
  // add an 'end' event listener to close the writeable stream
  curl.stdout.on('end', function(data) {
    file.end();
    console.log(file_name + ' downloaded to ' + DOWNLOAD_DIR);
  });
  // when the spawn child process exits, check if there were any errors and close the writeable stream
  curl.on('exit', function(code) {
    if (code != 0) {
      console.log('Failed: ' + code);
    }
  });
};

The way data was written to the instance of fs.createWriteStream is similar to the way we did it for HTTP.get. The only difference is that the data and end events are listened for on the stdout object of spawn. We also listen to spawn's exit event to make note of any errors.

Downloading using wget

Although the heading says downloading using wget, this example applies to downloading using curl with the -O option too. This method of downloading looks the simplest from a coding point of view.

// Function to download file using wget
var download_file_wget = function(file_url) {

  // extract the file name
  var file_name = url.parse(file_url).pathname.split('/').pop();
  // compose the wget command
  var wget = 'wget -P ' + DOWNLOAD_DIR + ' ' + file_url;
  // execute wget using child_process' exec function
  var child = exec(wget, function(err, stdout, stderr) {
    if (err) throw err;
    else console.log(file_name + ' downloaded to ' + DOWNLOAD_DIR);
  });
};

In the above method, we used child_process's exec function to run wget. Why exec and not spawn? Because we just want wget to tell us whether the work was done properly or not; we are not interested in buffers and streams. We are making wget do all the dirty work of making the request, handling the data, and saving the file for us. As you might have guessed, this method is the fastest of the three methods I described.

So now the question is - which method is the best? The answer - whatever suits your need. The wget method is probably the best if you want to save the files to the local disk, but certainly not if you want to send those files as a response to a current client request; for something like that you would need to use a stream. All three methods have multiple options; your choice will ultimately depend on what your needs are. Happy downloading!

Further Reading

  1. Node.js HTTP
  2. Node.js fs
  3. Node.js Child Processes
  4. Node.js Buffers
  5. Node.js Streams


Technical Note/Node.js
http://www.hacksparrow.com/difference-between-spawn-and-exec-of-node-js-child_process.html

`spawn` and `exec` do the same job, but

`spawn` returns a stream, while `exec` returns a buffer.

`spawn` returns an object containing `stdout` and `stderr` streams.
You can process the child process's standard output through the `stdout` object,
which emits events such as `data` and `end`.
`spawn` is useful when the child process produces a large amount of data,
e.g. image processing or reading binary data.


`exec` returns the child process's output in a buffer.
The default buffer size is 200k.
If the child process returns more output than the buffer size,
the program will terminate with "Error: maxBuffer exceeded".
You can work around this by increasing the buffer size,
but `exec` is not meant for handling large buffers;
`spawn` is the better fit in that case.
Use `exec` to run programs that print a small, status-like result rather than data.

One more thing:
both run the child process asynchronously; the difference is that
`spawn` delivers its results incrementally as they arrive,
while `exec` buffers the entire output and hands it over at once through a callback.

In short: use `spawn` when the child process returns large binary data,
and `exec` when you only expect a simple status message.


Examples of using `spawn` and `exec` depending on the situation


zzikjh