Processing data: buffers, events, streams

  • Buffers
  • Events
  • What are streams?
  • Readable streams
  • Writable streams
  • Piping streams
  • Duplex streams
  • Transform streams
  • Backpressure
In this chapter we'll introduce three main classes of objects:
  • Buffer, which represents small chunks of binary data
  • EventEmitter, which can be used to track a step-by-step asynchronous process by emitting signals called "events"
  • Stream, which allows us to process a large amount of data one Buffer at a time, and which reports its progress by emitting events
These are extremely common in professional NodeJS code, so even if you might not use them in your first projects, it's good to get a basic understanding for when you'll need to interact with them.

Buffers

In NodeJS, a buffer is a type of object used to work with binary data.
You can think of a buffer as a fixed-size container for raw bytes.
Here’s how to create a buffer from a string:
const buf = Buffer.from("hello")
console.log(buf)
This prints something like:
<Buffer 68 65 6c 6c 6f>
Those numbers (68, 65, 6c, etc.) are hexadecimal representations of the letters in "hello".
You can convert it back to a string like this:
console.log(buf.toString())
This prints:
hello
You can also create a buffer of a certain size filled with zeros:
const buf = Buffer.alloc(10)
console.log(buf)
This prints something like:
<Buffer 00 00 00 00 00 00 00 00 00 00>
You can write into the buffer:
buf.write("abc")
console.log(buf)
And you can access individual bytes:
console.log(buf[0]) // prints the ASCII number for 'a', which is 97
Buffers are especially useful when you need to manipulate data at a very low level.
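As a small illustration of that low-level access, here is a sketch that joins two buffers and looks at a slice of the raw bytes (the variable names are invented for this example):
const part1 = Buffer.from("Hello, ")
const part2 = Buffer.from("world!")

// Buffer.concat joins several buffers into one new buffer
const joined = Buffer.concat([part1, part2])
console.log(joined.toString()) // prints: Hello, world!

// subarray gives a view on a portion of the raw bytes
console.log(joined.subarray(0, 5)) // prints something like: <Buffer 48 65 6c 6c 6f>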

Events

In JavaScript, an event is something that happens in your program that you can react to.
For example:
  • a file finishes loading
  • a timer goes off
  • a user clicks a button
  • a network request returns data
An event is just a signal that something happened, and you can write code to listen for those events and react to them.
In NodeJS, many objects can emit events. These objects are called EventEmitters.
Here’s an example:
const EventEmitter = require("events")

const emitter = new EventEmitter()

// Listen for an event
emitter.on("greet", () => {
  console.log("Hello! An event happened.") // this will get printed when a "greet" event gets fired
})

// Emit the event
emitter.emit("greet")
This prints:
Hello! An event happened.
Here’s what happens:
  1. We create an EventEmitter object.
  2. We tell it to run a callback whenever the "greet" event happens, using .on("greet").
  3. Later, we trigger the "greet" event using .emit().
  4. Our callback gets executed.
You can pass data along with the event:
emitter.on("greet", (name) => console.log(`Hello, ${name}!`) ) emitter.emit("greet", "Alice") // first argument is the type of event, second argument is the data we pass with this event
This prints:
Hello, Alice!
You can register listeners for other events too:
emitter.on("goodbye", () => { console.log("Goodbye!") }) emitter.emit("goodbye")
You can attach as many listeners as you like to a given type of event, and you can fire many different types of events from the same emitter.
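For instance, here is a small sketch (the event names are invented for this example) with two listeners on the same event and a second event type on the same emitter:
const EventEmitter = require("events")
const emitter = new EventEmitter()

// two listeners for the same event: both run when it fires
emitter.on("order", () => console.log("Order received"))
emitter.on("order", () => console.log("Sending confirmation email"))

// a different event type on the same emitter
emitter.on("cancel", () => console.log("Order cancelled"))

emitter.emit("order") // prints both "order" messages
emitter.emit("cancel") // prints "Order cancelled"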
Many objects in NodeJS emit events to tell the rest of the program that something is happening.

What are streams?

Streams combine buffers and events to help us process data.
When we work with files, data from the network, or even long text, we don’t always need (or want) to load everything into memory all at once. That could be slow, or even crash the program if the data is too big.
Instead, we can process the data little by little, as it arrives or is read, kind of like drinking water through a straw instead of trying to drink the whole glass at once. This is called a stream.
In NodeJS, a stream is an object that lets you read data from a source or write data to a destination one piece at a time.
NodeJS has four main types of streams:
  • Readable: streams you can read data from (like reading a file)
  • Writable: streams you can write data to (like writing to a file)
  • Duplex: streams that are both readable and writable
  • Transform: like duplex streams, but they can change (transform) the data as it flows

Readable streams

Let's say you have a bigfile.txt to process. You can create a readable stream with the fs module to read the file piece by piece.
const fs = require("fs")

const readableStream = fs.createReadStream("bigfile.txt")

readableStream.on("data", (chunk) => {
  console.log("Received chunk:", chunk)
})

readableStream.on("end", () => {
  console.log("Finished reading file.")
})

readableStream.on("error", (err) => {
  console.error("Error reading file:", err)
})
What happens here?
  1. fs.createReadStream() creates a readable stream.
  2. Whenever a piece of the file is ready, the stream emits a data event and gives us a "chunk" of data (a Buffer). We print the chunk.
  3. When the whole file has been read, the end event is triggered.
  4. If there’s an error (like the file doesn’t exist), the error event is triggered.
This way, you can read giant files without loading them all into memory at once.
If we want the data to arrive in a human-readable form (instead of binary), we can specify the encoding of the stream:
const fs = require("fs")

const readableStream = fs.createReadStream(
  "bigfile.txt",
  { encoding: "utf8" } // we tell NodeJS that the file should be read as utf8
)

readableStream.on("data", (chunk) => {
  console.log("Received chunk:", chunk)
})

readableStream.on("end", () => {
  console.log("Finished reading file.")
})

readableStream.on("error", (err) => {
  console.error("Error reading file:", err)
})
The code will now print the file in human-readable form.

Writable streams

A writable stream lets you send data somewhere, chunk by chunk.
Here’s an example of writing to a target.txt file using a stream:
const fs = require("fs"); const stream = fs.createWriteStream("target.txt"); stream.on("error", (err) => { console.error("Error:", err); }); stream.on("finish", () => { console.log("All data written."); }); stream.write("First line\n"); stream.write("Second line\n"); stream.end("Finished writing\n");
Here’s what happens:
  1. fs.createWriteStream() creates a writable stream.
  2. We register handlers for the error and finish events.
  3. We write some text to it using .write().
  4. When we’re done, we call .end() to close the stream.
  5. Once all buffered data has been flushed and written, the finish event is emitted. If something goes wrong, an error event is emitted.
Just like readable streams, writable streams are good for big data because they don’t need to keep everything in memory at once.

Piping streams

One of the coolest things about streams is that you can pipe them together: connect a readable stream directly to a writable stream.
const fs = require("fs")

const readable = fs.createReadStream("bigfile.txt")
const writable = fs.createWriteStream("target.txt")

readable.pipe(writable)
Here:
  • The readable stream reads from bigfile.txt.
  • The writable stream writes to target.txt.
  • .pipe() sends the data directly from the readable to the writable stream.
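The streams keep emitting their usual events while piping, so you can still listen for them. Here is a minimal sketch (using the same file names as above) that reports when the copy is done or fails:
const fs = require("fs")

const readable = fs.createReadStream("bigfile.txt")
const writable = fs.createWriteStream("target.txt")

readable.pipe(writable)

// the writable emits "finish" once the readable has ended and all data has been flushed
writable.on("finish", () => console.log("Copy complete."))

// .pipe() does not forward errors, so we listen on each stream separately
readable.on("error", (err) => console.error("Read error:", err))
writable.on("error", (err) => console.error("Write error:", err))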

Duplex streams

A duplex stream is both readable and writable. One example is a network socket: you can send data to it and receive data from it.
Here’s a very simple example using the net module:
const net = require("net")

const server = net.createServer((socket) => {
  socket.write("Welcome!\n")

  socket.on("data", (chunk) => {
    console.log(
      "Received:",
      chunk.toString() // we convert the chunk of data from Buffer to string
    )
  })
})

server.listen(3000, () => {
  console.log("Server listening on port 3000")
})
In this example:
  • The socket object is a duplex stream.
  • You can write() to it and also listen for data events from it.
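To see the duplex nature from the other side, here is a hedged sketch of a client that connects to the server above (assuming it is running on port 3000) and both writes to and reads from the same socket:
const net = require("net")

// net.connect returns a socket, which is also a duplex stream
const socket = net.connect(3000, () => {
  socket.write("Hello server!\n") // writable side: send data
})

socket.on("data", (chunk) => {
  console.log("From server:", chunk.toString()) // readable side: receive data
  socket.end() // close the connection when we're done
})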

Transform streams

A transform stream is a duplex stream that also modifies the data that passes through it.
For example, you can use the built-in zlib module to compress or decompress data.
Here’s how to compress a file using a transform stream:
const fs = require("fs")
const zlib = require("zlib")

const readable = fs.createReadStream("bigfile.txt") // create a readable stream that reads from a file
const zip = zlib.createGzip() // create a transform stream that compresses data
const writable = fs.createWriteStream("bigfile.txt.gz") // create a writable stream that writes to a file

readable // take the readable stream
  .pipe(zip) // pipe it into the transform stream to compress the data
  .pipe(writable) // then pipe it into the writable stream that saves the data to a zipped file

writable.on("finish", () => {
  console.log("File compressed.")
})
And to decompress it back:
const readable = fs.createReadStream("bigfile.txt.gz") const unzip = zlib.createGunzip() const writable = fs.createWriteStream("bigfile.txt") readable.pipe(unzip).pipe(writable) writable.on("finish", () => { console.log("File decompressed.") })
Transform streams are very useful for tasks like compression, encryption, or changing file formats while streaming.
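You can also write your own transform stream. As a minimal sketch (not part of the zlib example above), here is one that upper-cases text as it flows from standard input to standard output:
const { Transform } = require("stream")

// a transform stream that upper-cases every chunk passing through it
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase()) // push the modified data downstream
    callback() // signal that this chunk has been processed
  }
})

process.stdin.pipe(upperCase).pipe(process.stdout)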

Backpressure

Sometimes a writable stream is slow at handling data. If we keep pushing data to a writable faster than it can handle, we might run into problems. This is called backpressure.
When you call the .write() method on a writable stream, it returns a boolean that tells you whether the stream can take more data or needs a pause; when writing manually, you should check its return value, like this:
const fs = require("fs")

const readable = fs.createReadStream("example.txt")
const writable = fs.createWriteStream("copy.txt")

readable.on("data", (chunk) => {
  // each chunk of data we read from the readable stream...
  const canContinue = writable.write(chunk) // ...we send it to the writable, which returns a boolean telling us if we can continue

  if (!canContinue) {
    readable.pause() // ...if we can't, we temporarily pause reading
  }
})

// the writable stream emits a "drain" event when the backpressure is gone
writable.on("drain", () => {
  readable.resume() // so we resume reading (and writing)
})
This was an illustrative example of moving data from a Readable to a Writable by hand, managing backpressure ourselves.
Usually we would pipe data using the .pipe() method, which handles backpressure automatically.
So you only need to worry about backpressure when for some reason you're manually calling .write().
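As a side note, newer NodeJS versions also offer a pipeline helper in the stream module that, like .pipe(), handles backpressure for you and additionally reports errors in a single callback. A minimal sketch, reusing the example files from above:
const fs = require("fs")
const { pipeline } = require("stream")

pipeline(
  fs.createReadStream("example.txt"), // source
  fs.createWriteStream("copy.txt"), // destination
  (err) => {
    // called once at the end, with an error or null
    if (err) console.error("Pipeline failed:", err)
    else console.log("Pipeline succeeded.")
  }
)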