From e67fee0fb54ddbc8685081f61ca92224fd8225f5 Mon Sep 17 00:00:00 2001
From: Jeff Harris
Date: Fri, 8 Apr 2016 11:11:58 -0700
Subject: [PATCH] doc: add topic - event loop, timers, `nextTick()`

Adds a new topic that provides an overview of the event loop, timers, and
`process.nextTick()` that is based upon a NodeSource "Need to Node"
presentation hosted by @trevnorris: Event Scheduling and the Node.js
Event Loop (https://nodesource.com/resources).

PR-URL: #4936
Reviewed-By: James M Snell
Reviewed-By: Calvin W. Metcalf
Reviewed-By: Matteo Collina
---
 .../the-event-loop-timers-and-nexttick.md | 467 ++++++++++++++++++
 1 file changed, 467 insertions(+)
 create mode 100644 doc/topics/the-event-loop-timers-and-nexttick.md

diff --git a/doc/topics/the-event-loop-timers-and-nexttick.md b/doc/topics/the-event-loop-timers-and-nexttick.md
new file mode 100644
index 00000000000..fe58298d320
--- /dev/null
+++ b/doc/topics/the-event-loop-timers-and-nexttick.md
@@ -0,0 +1,467 @@
+# The Node.js Event Loop, Timers, and `process.nextTick()`
+
+## What is the Event Loop?
+
+The event loop is what allows Node.js to perform non-blocking I/O
+operations, despite the fact that JavaScript is single-threaded, by
+offloading operations to the system kernel whenever possible.
+
+Since most modern kernels are multi-threaded, they can handle multiple
+operations executing in the background. When one of these operations
+completes, the kernel tells Node.js so that the appropriate callback
+may be added to the `poll` queue to eventually be executed. We'll explain
+this in further detail later in this topic.
+
+## Event Loop Explained
+
+When Node.js starts, it initializes the event loop, processes the
+provided input script (or drops into the REPL, which is not covered in
+this document), which may make async API calls, schedule timers, or call
+`process.nextTick()`, then begins processing the event loop.
+
+The following diagram shows a simplified overview of the event loop's
+order of operations.
+
+       ┌───────────────────────┐
+    ┌─>│        timers         │
+    │  └──────────┬────────────┘
+    │  ┌──────────┴────────────┐
+    │  │     I/O callbacks     │
+    │  └──────────┬────────────┘
+    │  ┌──────────┴────────────┐
+    │  │     idle, prepare     │
+    │  └──────────┬────────────┘      ┌───────────────┐
+    │  ┌──────────┴────────────┐      │   incoming:   │
+    │  │         poll          │<─────┤  connections, │
+    │  └──────────┬────────────┘      │   data, etc.  │
+    │  ┌──────────┴────────────┐      └───────────────┘
+    │  │        check          │
+    │  └──────────┬────────────┘
+    │  ┌──────────┴────────────┐
+    └──┤    close callbacks    │
+       └───────────────────────┘
+
+*note: each box will be referred to as a "phase" of the event loop.*
+
+Each phase has a FIFO queue of callbacks to execute. While each phase is
+special in its own way, generally, when the event loop enters a given
+phase, it will perform any operations specific to that phase, then
+execute callbacks in that phase's queue until the queue has been
+exhausted or the maximum number of callbacks has executed. When the
+queue has been exhausted or the callback limit is reached, the event
+loop will move to the next phase, and so on.
+
+Since any of these operations may schedule _more_ operations and new
+events processed in the `poll` phase are queued by the kernel, poll
+events can be queued while polling events are being processed. As a
+result, long-running callbacks can allow the poll phase to run much
+longer than a timer's threshold. See the [`timers`](#timers) and
+[`poll`](#poll) sections for more details.
+
+_**NOTE:** There is a slight discrepancy between the Windows and the
+Unix/Linux implementation, but that's not important for this
+demonstration. The most important parts are here. There are actually
+seven or eight steps, but the ones we care about (the ones that Node.js
+actually uses) are those above._
+
+
+## Phases Overview:
+
+* `timers`: this phase executes callbacks scheduled by `setTimeout()`
+  and `setInterval()`.
+* `I/O callbacks`: executes most types of callbacks, except timers,
+  `setImmediate()` callbacks, and close callbacks
+* `idle, prepare`: only used internally
+* `poll`: retrieves new I/O events; Node.js will block here when appropriate
+* `check`: `setImmediate()` callbacks are invoked here
+* `close callbacks`: e.g. `socket.on('close', ...)`
+
+Between each run of the event loop, Node.js checks if it is waiting for
+any asynchronous I/O or timers and shuts down cleanly if there are not
+any.
+
+## Phases in Detail
+
+### timers
+
+A timer specifies the **threshold** _after which_ a provided callback
+_may be executed_ rather than the **exact** time a person _wants it to
+be executed_. Timer callbacks will run as early as they can be
+scheduled after the specified amount of time has passed; however,
+operating system scheduling or the running of other callbacks may delay
+them.
+
+_**Note**: Technically, the [`poll` phase](#poll) controls when timers
+are executed._
+
+For example, say you schedule a timeout to execute after a 100 ms
+threshold, then your script starts asynchronously reading a file which
+takes 95 ms:
+
+```js
+var fs = require('fs');
+
+function someAsyncOperation(callback) {
+  // let's assume this takes 95 ms to complete
+  fs.readFile('/path/to/file', callback);
+}
+
+var timeoutScheduled = Date.now();
+
+setTimeout(function () {
+  var delay = Date.now() - timeoutScheduled;
+
+  console.log(delay + 'ms have passed since I was scheduled');
+}, 100);
+
+// do someAsyncOperation which takes 95 ms to complete
+someAsyncOperation(function () {
+  var startCallback = Date.now();
+
+  // do something that will take 10 ms...
+  while (Date.now() - startCallback < 10) {
+    ; // do nothing
+  }
+});
+```
+
+When the event loop enters the `poll` phase, it has an empty queue
+(`fs.readFile()` has not completed), so it will wait for the number of ms
+remaining until the soonest timer's threshold is reached.
+While it is waiting, 95 ms pass, `fs.readFile()` finishes reading the
+file, and its callback, which takes 10 ms to complete, is added to the
+`poll` queue and executed. When the callback finishes, there are no more
+callbacks in the queue, so the event loop will see that the threshold of
+the soonest timer has been reached, then wrap back to the `timers` phase
+to execute the timer's callback. In this example, you will see that the
+total delay between the timer being scheduled and its callback being
+executed will be 105 ms.
+
+Note: To prevent the `poll` phase from starving the event loop, libuv
+also has a hard maximum (system dependent) before it stops `poll`ing for
+more events.
+
+### I/O callbacks:
+
+This phase executes callbacks for some system operations such as types
+of TCP errors. For example, if a TCP socket receives `ECONNREFUSED` when
+attempting to connect, some \*nix systems want to wait to report the
+error. This will be queued to execute in the `I/O callbacks` phase.
+
+### poll:
+
+The poll phase has two main functions:
+
+1. Executing scripts for timers whose threshold has elapsed, then
+2. Processing events in the `poll` queue.
+
+
+When the event loop enters the `poll` phase _and there are no timers
+scheduled_, one of two things will happen:
+
+* _If the `poll` queue **is not empty**_, the event loop will iterate
+through its queue of callbacks executing them synchronously until
+either the queue has been exhausted, or the system-dependent hard limit
+is reached.
+
+* _If the `poll` queue **is empty**_, one of two more things will
+happen:
+    * If scripts have been scheduled by `setImmediate()`, the event loop
+    will end the `poll` phase and continue to the `check` phase to
+    execute those scheduled scripts.
+
+    * If scripts **have not** been scheduled by `setImmediate()`, the
+    event loop will wait for callbacks to be added to the queue, then
+    execute them immediately.
+
+Once the `poll` queue is empty, the event loop will check for timers
+_whose time thresholds have been reached_. If one or more timers are
+ready, the event loop will wrap back to the timers phase to execute
+those timers' callbacks.
+
+### `check`:
+
+This phase allows a person to execute callbacks immediately after the
+`poll` phase has completed. If the `poll` phase becomes idle and
+scripts have been queued with `setImmediate()`, the event loop may
+continue to the `check` phase rather than waiting.
+
+`setImmediate()` is actually a special timer that runs in a separate
+phase of the event loop. It uses a libuv API that schedules callbacks to
+execute after the `poll` phase has completed.
+
+Generally, as the code is executed, the event loop will eventually hit
+the `poll` phase where it will wait for an incoming connection, request,
+etc. However, if a callback has been scheduled with `setImmediate()`
+and the `poll` phase becomes idle, it will end and continue to the
+`check` phase rather than waiting for `poll` events.
+
+### `close callbacks`:
+
+If a socket or handle is closed abruptly (e.g. `socket.destroy()`), the
+`'close'` event will be emitted in this phase. Otherwise it will be
+emitted via `process.nextTick()`.
+
+## `setImmediate()` vs `setTimeout()`
+
+`setImmediate()` and `setTimeout()` are similar, but behave in different
+ways depending on when they are called.
+
+* `setImmediate()` is designed to execute a script once the current
+`poll` phase completes.
+* `setTimeout()` schedules a script to be run
+after a minimum threshold in ms has elapsed.
+
+The order in which they are executed varies depending on the context in
+which they are called. If both are called in the main module, then the
+timing is bound by the performance of the process, which is impacted by
+other programs running on your machine.
+
+For example, if we run the following script which is not within an I/O
+cycle (i.e.
+the main module), the order in which the two functions are
+executed is non-deterministic, as it is based upon how fast your process
+runs (which is impacted by other programs running on your machine):
+
+```js
+// timeout_vs_immediate.js
+setTimeout(function timeout () {
+  console.log('timeout');
+}, 0);
+
+setImmediate(function immediate () {
+  console.log('immediate');
+});
+```
+
+    $ node timeout_vs_immediate.js
+    timeout
+    immediate
+
+    $ node timeout_vs_immediate.js
+    immediate
+    timeout
+
+However, if you move the two calls within an I/O cycle, the immediate
+callback is always executed first:
+
+```js
+// timeout_vs_immediate.js
+var fs = require('fs')
+
+fs.readFile(__filename, () => {
+  setTimeout(() => {
+    console.log('timeout')
+  }, 0)
+  setImmediate(() => {
+    console.log('immediate')
+  })
+})
+```
+
+    $ node timeout_vs_immediate.js
+    immediate
+    timeout
+
+    $ node timeout_vs_immediate.js
+    immediate
+    timeout
+
+The main advantage to using `setImmediate()` over `setTimeout()` is that
+`setImmediate()` will always be executed before any timers if scheduled
+within an I/O cycle, independently of how many timers are present.
+
+## `process.nextTick()`:
+
+### Understanding `process.nextTick()`
+
+You may have noticed that `process.nextTick()` was not displayed in the
+diagram, even though it's a part of the asynchronous API. This is because
+`process.nextTick()` is not technically part of the event loop. Instead,
+the `nextTickQueue` will be processed after the current operation
+completes, regardless of the current phase of the event loop.
+
+Looking back at our diagram, any time you call `process.nextTick()` in a
+given phase, all callbacks passed to `process.nextTick()` will be
+resolved before the event loop continues. This can create some bad
+situations because **it allows you to "starve" your I/O by making
+recursive `process.nextTick()` calls,** which prevents the event loop
+from reaching the `poll` phase.
+
+### Why would that be allowed?
+
+Why would something like this be included in Node.js? Part of it is a
+design philosophy where an API should always be asynchronous even where
+it doesn't have to be. Take this code snippet for example:
+
+```js
+function apiCall (arg, callback) {
+  if (typeof arg !== 'string')
+    return process.nextTick(callback,
+      new TypeError('argument should be string'));
+}
+```
+
+The snippet does an argument check and, if the argument is not correct,
+passes the error to the callback. The API was updated fairly recently so
+that any arguments passed to `process.nextTick()` after the callback are
+propagated as the arguments to the callback, so you don't have to nest
+functions.
+
+What we're doing is passing an error back to the user but only *after*
+we have allowed the rest of the user's code to execute. By using
+`process.nextTick()` we guarantee that `apiCall()` always runs its
+callback *after* the rest of the user's code and *before* the event loop
+is allowed to proceed. To achieve this, the JS call stack is allowed to
+unwind, then the provided callback is executed immediately, which allows
+a person to make recursive calls to `process.nextTick()` without hitting
+a `RangeError: Maximum call stack size exceeded` from V8.
+
+This philosophy can lead to some potentially problematic situations.
+Take this snippet for example:
+
+```js
+// this has an asynchronous signature, but calls callback synchronously
+function someAsyncApiCall (callback) { callback(); };
+
+// the callback is called before `someAsyncApiCall` completes.
+someAsyncApiCall(() => {
+
+  // since someAsyncApiCall hasn't completed, bar hasn't been assigned any value
+  console.log('bar', bar); // undefined
+
+});
+
+var bar = 1;
+```
+
+The user defines `someAsyncApiCall()` to have an asynchronous signature,
+but it actually operates synchronously.
+When it is called, the callback
+provided to `someAsyncApiCall()` is called in the same phase of the
+event loop because `someAsyncApiCall()` doesn't actually do anything
+asynchronously. As a result, the callback tries to reference `bar` even
+though that variable has not been assigned a value yet, because the
+script has not been able to run to completion.
+
+By placing the callback in a `process.nextTick()`, the script still has
+the ability to run to completion, allowing all the variables, functions,
+etc., to be initialized prior to the callback being called. It also has
+the advantage of not allowing the event loop to continue. It may be
+useful for the user to be alerted to an error before the event loop is
+allowed to continue.
+
+A real-world example in Node.js would be:
+
+```js
+const net = require('net');
+
+const server = net.createServer(() => {}).listen(8080);
+
+server.on('listening', () => {});
+```
+
+When only a port is passed, the port is bound immediately, so the
+`'listening'` callback could be called immediately. The problem is that
+the `.on('listening')` handler will not have been set by that time.
+
+To get around this, the `'listening'` event is queued in a `nextTick()`
+to allow the script to run to completion, which allows the user to set
+any event handlers they want.
+
+## `process.nextTick()` vs `setImmediate()`
+
+We have two calls that are similar as far as users are concerned, but
+their names are confusing.
+
+* `process.nextTick()` fires immediately on the same phase
+* `setImmediate()` fires on the following iteration or 'tick' of the
+event loop
+
+In essence, the names should be swapped. `process.nextTick()` fires more
+immediately than `setImmediate()`, but this is an artifact of the past
+which is unlikely to change. Making this switch would break a large
+percentage of the packages on npm. Every day more new modules are being
+added, which means every day we wait, more potential breakages occur.
+While they are confusing, the names themselves won't change.
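
The ordering described in the two bullets above can be observed with a minimal sketch. The helper name `recordOrder` and its `done` callback are hypothetical, introduced only for this illustration; the sketch assumes it is run as a plain Node.js script.

```javascript
// Hypothetical helper: records the order in which the three kinds of
// work run, then hands the recorded order to `done`.
function recordOrder(done) {
  const order = [];

  // Runs in the `check` phase of the event loop.
  setImmediate(() => {
    order.push('setImmediate');
    done(order);
  });

  // Runs as soon as the current operation completes, before the
  // event loop is allowed to continue.
  process.nextTick(() => order.push('nextTick'));

  // Runs right away, as part of the current operation.
  order.push('synchronous');
}

recordOrder((order) => {
  console.log(order.join(' -> '));
  // prints "synchronous -> nextTick -> setImmediate"
});
```

Even though `setImmediate()` is called first, its callback runs last, which is the naming confusion this section describes.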
+
+*We recommend developers use `setImmediate()` in all cases because it's
+easier to reason about (and it leads to code that's compatible with a
+wider variety of environments, like browser JS.)*
+
+## Why use `process.nextTick()`?
+
+There are two main reasons:
+
+1. Allow users to handle errors, clean up any then-unneeded resources, or
+perhaps try the request again before the event loop continues.
+
+2. At times it's necessary to allow a callback to run after the call
+stack has unwound but before the event loop continues.
+
+One example is to match the user's expectations. Simple example:
+
+```js
+var net = require('net');
+
+var server = net.createServer();
+server.on('connection', function(conn) { });
+
+server.listen(8080);
+server.on('listening', function() { });
+```
+
+Say that `listen()` is run at the beginning of the event loop, but the
+listening callback is placed in a `setImmediate()`. Unless a hostname
+is passed, binding to the port will happen immediately. For the event
+loop to proceed, it must hit the `poll` phase, which means there is a
+non-zero chance that a connection could have been received, allowing
+the connection event to be fired before the listening event.
+
+Another example is a constructor function that inherits from
+`EventEmitter` and wants to emit an event within the constructor:
+
+```js
+const EventEmitter = require('events');
+const util = require('util');
+
+function MyEmitter() {
+  EventEmitter.call(this);
+  this.emit('event');
+}
+util.inherits(MyEmitter, EventEmitter);
+
+const myEmitter = new MyEmitter();
+myEmitter.on('event', function() {
+  console.log('an event occurred!');
+});
+```
+
+You can't emit an event from the constructor immediately
+because the script will not have processed to the point where the user
+assigns a callback to that event.
+So, within the constructor itself,
+you can use `process.nextTick()` to set a callback to emit the event
+after the constructor has finished, which provides the expected results:
+
+```js
+const EventEmitter = require('events');
+const util = require('util');
+
+function MyEmitter() {
+  EventEmitter.call(this);
+
+  // use nextTick to emit the event once a handler is assigned
+  process.nextTick(function () {
+    this.emit('event');
+  }.bind(this));
+}
+util.inherits(MyEmitter, EventEmitter);
+
+const myEmitter = new MyEmitter();
+myEmitter.on('event', function() {
+  console.log('an event occurred!');
+});
+```
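
The same deferral can also be sketched with ES2015 `class` syntax, assuming a Node.js version that supports classes and arrow functions. The class name `DeferredEmitter` is hypothetical and not part of the patch above; the arrow function keeps `this` bound to the instance, so no `.bind(this)` is needed.

```javascript
const EventEmitter = require('events');

// Hypothetical class name, used only for this illustration.
class DeferredEmitter extends EventEmitter {
  constructor() {
    super();

    // Defer the emit until the current operation has completed, so the
    // caller has a chance to attach a handler first.
    process.nextTick(() => this.emit('event'));
  }
}

const deferredEmitter = new DeferredEmitter();
deferredEmitter.on('event', () => {
  console.log('an event occurred!');
});
```

As in the `util.inherits()` version, the handler is attached synchronously after construction, so it is in place by the time the deferred emit runs.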