FILL Clause
FILL Clause
1. Function Introduction
During data queries, you may encounter scenarios where certain columns have missing data in some rows, resulting in NULL values in the result set. These NULL values can hinder data visualization and analysis. To address this, IoTDB provides the FILL clause to populate these NULL values.
- If the query includes an
ORDER BY
clause, the FILL clause is executed beforeORDER BY
. - If a
GAPFILL
(e.g.,date_bin_gapfill
function) operation exists, the FILL clause is executed afterGAPFILL
.
2. Syntax Overview
fillClause
: FILL METHOD fillMethod
;
fillMethod
: LINEAR timeColumnClause? fillGroupClause? #linearFill
| PREVIOUS timeBoundClause? timeColumnClause? fillGroupClause? #previousFill
| CONSTANT literalExpression #valueFill
;
timeColumnClause
: TIME_COLUMN INTEGER_VALUE
;
fillGroupClause
: FILL_GROUP INTEGER_VALUE (',' INTEGER_VALUE)*
;
timeBoundClause
: TIME_BOUND duration=timeDuration
;
timeDuration
: (INTEGER_VALUE intervalField)+
;
intervalField
: YEAR | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | MILLISECOND | MICROSECOND | NANOSECOND
;
2.1 ### Filling Methods
IoTDB supports the following three methods to fill NULL values:
- PREVIOUS Fill: Uses the most recent non-NULL value from the same column to fill NULL values.
- LINEAR Fill: Applies linear interpolation using the nearest previous and next non-NULL values in the same column.
- CONSTANT Fill: Fills NULL values with a specified constant.
Only one filling method can be specified, and it applies to all columns in the result set.
2.2 Supported Data Types for Filling Methods
Data Type | Previous | Linear | Constant |
---|---|---|---|
boolean | √ | - | √ |
int32 | √ | √ | √ |
int64 | √ | √ | √ |
float | √ | √ | √ |
double | √ | √ | √ |
text | √ | - | √ |
string | √ | - | √ |
blob | √ | - | √ |
timestamp | √ | √ | √ |
date | √ | √ | √ |
Note: Columns with data types not supporting the specified filling method will remain unchanged without errors.
3. Sample Dataset
The Example Data pagepage provides SQL statements to construct table schemas and insert data. By downloading and executing these statements in the IoTDB CLI, you can import the data into IoTDB. This data can be used to test and run the example SQL queries included in this documentation, allowing you to reproduce the described results.
3.1 PREVIOUS Fill
PREVIOUS FILL
fills NULL values with the most recent non-NULL value in the same column.
3.1.1 Parameters
- TIME_BOUND (optional): Defines a forward-looking time threshold. If the time difference between the current NULL value and the previous non-NULL value exceeds this threshold, the value will not be filled. By default, the system uses the first
TIMESTAMP
column in the query result to determine the threshold.- Format: A time interval specified with integer values and units, e.g.,
1d1h
(1 day and 1 hour).
- Format: A time interval specified with integer values and units, e.g.,
- TIME_COLUMN (optional): Allows specifying the
TIMESTAMP
column used to determine the time threshold. The column is specified using its positional index (starting from 1) in the original table.
3.1.2 Examples
- Without FILL Clause:
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101';
Results:
+-----------------------------+-----------+------+
| time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00| null| true|
|2024-11-27T16:39:00.000+08:00| 85.0| null|
|2024-11-27T16:40:00.000+08:00| 85.0| null|
|2024-11-27T16:41:00.000+08:00| 85.0| null|
|2024-11-27T16:42:00.000+08:00| null| false|
|2024-11-27T16:43:00.000+08:00| null| false|
|2024-11-27T16:44:00.000+08:00| null| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.088s
- Using
PREVIOUS Fill
:
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD PREVIOUS;
Results:
+-----------------------------+-----------+------+
| time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00| null| true|
|2024-11-27T16:39:00.000+08:00| 85.0| true|
|2024-11-27T16:40:00.000+08:00| 85.0| true|
|2024-11-27T16:41:00.000+08:00| 85.0| true|
|2024-11-27T16:42:00.000+08:00| 85.0| false|
|2024-11-27T16:43:00.000+08:00| 85.0| false|
|2024-11-27T16:44:00.000+08:00| 85.0| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.091s
- Using
PREVIOUS Fill
with a Specified Time Threshold:
# Do not specify a time column
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD PREVIOUS TIME_BOUND 1m;
# Manually specify the time column
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD PREVIOUS 1m TIME_COLUMN 1;
Results:
+-----------------------------+-----------+------+
| time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00| null| true|
|2024-11-27T16:39:00.000+08:00| 85.0| true|
|2024-11-27T16:40:00.000+08:00| 85.0| null|
|2024-11-27T16:41:00.000+08:00| 85.0| null|
|2024-11-27T16:42:00.000+08:00| 85.0| false|
|2024-11-27T16:43:00.000+08:00| null| false|
|2024-11-27T16:44:00.000+08:00| null| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.075s
3.2 LINEAR Fill
LINEAR Fill
fills NULL values using linear interpolation based on the nearest previous and next non-NULL values in the same column.
3.2.1 Linear Fill Rules
- If all previous or all subsequent values are NULL, no filling is performed.
- Columns with data types such as
boolean
,string
,blob
, ortext
are not filled, and no error is returned. - If no auxiliary time column is specified, the first
TIMESTAMP
-type column in theSELECT
clause is used by default for interpolation. If noTIMESTAMP
column exists, an error will be returned.
3.2.2 Parameters
- TIME_COLUMN (optional): Specifies the
TIMESTAMP
column to be used as an auxiliary column for linear interpolation. The column is identified by its positional index (starting from 1) in the original table.
Note: The auxiliary column used for linear interpolation is not required to be the time
column. However, the auxiliary column must be sorted in ascending or descending order for meaningful interpolation. If another column is specified, the user must ensure the result set is ordered correctly.
3.2.3 Examples
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD LINEAR;
Result:
+-----------------------------+-----------+------+
| time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00| null| true|
|2024-11-27T16:39:00.000+08:00| 85.0| null|
|2024-11-27T16:40:00.000+08:00| 85.0| null|
|2024-11-27T16:41:00.000+08:00| 85.0| null|
|2024-11-27T16:42:00.000+08:00| null| false|
|2024-11-27T16:43:00.000+08:00| null| false|
|2024-11-27T16:44:00.000+08:00| null| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.053s
3.3 ### CONSTANT Fill
CONSTANT Fill
fills NULL values with a specified constant value.
3.3.1 Constant Fill Rules
- If the data type of the constant does not match the column's data type, IoTDB does not fill the result set and no error is returned.
- If the constant value exceeds the column's allowable range, IoTDB does not fill the result set and no error is returned.
3.3.2 Examples
- Using a
FLOAT
constant:
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD CONSTANT 80.0;
Results:
+-----------------------------+-----------+------+
| time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00| 80.0| true|
|2024-11-27T16:39:00.000+08:00| 85.0| true|
|2024-11-27T16:40:00.000+08:00| 85.0| true|
|2024-11-27T16:41:00.000+08:00| 85.0| true|
|2024-11-27T16:42:00.000+08:00| 80.0| false|
|2024-11-27T16:43:00.000+08:00| 80.0| false|
|2024-11-27T16:44:00.000+08:00| 80.0| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.242s
W- Using a BOOLEAN
constant:
SELECT time, temperature, status
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD CONSTANT true;
Results:
+-----------------------------+-----------+------+
| time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00| 1.0| true|
|2024-11-27T16:39:00.000+08:00| 85.0| true|
|2024-11-27T16:40:00.000+08:00| 85.0| true|
|2024-11-27T16:41:00.000+08:00| 85.0| true|
|2024-11-27T16:42:00.000+08:00| 1.0| false|
|2024-11-27T16:43:00.000+08:00| 1.0| false|
|2024-11-27T16:44:00.000+08:00| 1.0| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.073s
4. Advanced Usage
When using the PREVIOUS
or LINEAR
FILL methods, the FILL_GROUP
parameter allows filling within specific groups without being influenced by other groups.
Examples
- Filling Missing Values Within
device_id
The following query demonstrates how to fill missing values for each device_id
group independently, without using values from other devices:
SELECT date_bin(1h, time) AS hour_time, plant_id, device_id, avg(temperature) AS avg_temp
FROM table1
WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00
group by 1, plant_id, device_id;
Results:
+-----------------------------+--------+---------+--------+
| hour_time|plant_id|device_id|avg_temp|
+-----------------------------+--------+---------+--------+
|2024-11-28T08:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T09:00:00.000+08:00| 3001| 100| null|
|2024-11-28T10:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T11:00:00.000+08:00| 3001| 100| 88.0|
|2024-11-29T10:00:00.000+08:00| 3001| 101| 85.0|
|2024-11-29T11:00:00.000+08:00| 3002| 100| null|
|2024-11-29T18:00:00.000+08:00| 3002| 100| 90.0|
|2024-11-30T09:00:00.000+08:00| 3002| 101| 90.0|
+-----------------------------+--------+---------+--------+
Total line number = 8
It costs 0.110s
- Without Specifying
FILL_GROUP
If the FILL_GROUP
parameter is not specified, missing values in device_id = 100
will be filled using values from device_id = 101
:
SELECT date_bin(1h, time) AS hour_time, plant_id, device_id, avg(temperature) AS avg_temp
FROM table1
WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00
group by 1, plant_id, device_id
FILL METHOD PREVIOUS;
Results:
+-----------------------------+--------+---------+--------+
| hour_time|plant_id|device_id|avg_temp|
+-----------------------------+--------+---------+--------+
|2024-11-28T08:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T09:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T10:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T11:00:00.000+08:00| 3001| 100| 88.0|
|2024-11-29T10:00:00.000+08:00| 3001| 101| 85.0|
|2024-11-29T11:00:00.000+08:00| 3002| 100| 85.0|
|2024-11-29T18:00:00.000+08:00| 3002| 100| 90.0|
|2024-11-30T09:00:00.000+08:00| 3002| 101| 90.0|
+-----------------------------+--------+---------+--------+
Total line number = 8
It costs 0.066s
- Specifying
FILL_GROUP
for Grouped Filling
By specifying FILL_GROUP 2
, the filling is restricted to groups based on the second column (device_id
). As a result, missing values in device_id = 100
will not be filled using values from device_id = 101
:
SELECT date_bin(1h, time) AS hour_time, plant_id, device_id, avg(temperature) AS avg_temp
FROM table1
WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00
group by 1, plant_id, device_id
FILL METHOD PREVIOUS FILL_GROUP 2;
Results:
+-----------------------------+--------+---------+--------+
| hour_time|plant_id|device_id|avg_temp|
+-----------------------------+--------+---------+--------+
|2024-11-28T08:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T09:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T10:00:00.000+08:00| 3001| 100| 85.0|
|2024-11-28T11:00:00.000+08:00| 3001| 100| 88.0|
|2024-11-29T10:00:00.000+08:00| 3001| 101| 85.0|
|2024-11-29T11:00:00.000+08:00| 3002| 100| null|
|2024-11-29T18:00:00.000+08:00| 3002| 100| 90.0|
|2024-11-30T09:00:00.000+08:00| 3002| 101| 90.0|
+-----------------------------+--------+---------+--------+
Total line number = 8
It costs 0.089s
5. Special Notes
When using LINEAR
or PREVIOUS
FILL methods, if the auxiliary time column (used to determine filling logic) contains NULL values, IoTDB follows these rules:
- Rows with NULL values in the auxiliary column will not be filled.
- These rows are also excluded from the filling logic calculations.
Example of PREVIOUS Fill
- Query original data:
SELECT time, plant_id, device_id, humidity, arrival_time
FROM table1
WHERE time >= 2024-11-26 16:37:00 and time <= 2024-11-28 08:00:00
AND plant_id='1001' and device_id='101';
Results:
+-----------------------------+--------+---------+--------+-----------------------------+
| time|plant_id|device_id|humidity| arrival_time|
+-----------------------------+--------+---------+--------+-----------------------------+
|2024-11-27T16:38:00.000+08:00| 1001| 101| 35.1|2024-11-27T16:37:01.000+08:00|
|2024-11-27T16:39:00.000+08:00| 1001| 101| 35.3| null|
|2024-11-27T16:40:00.000+08:00| 1001| 101| null|2024-11-27T16:37:03.000+08:00|
|2024-11-27T16:41:00.000+08:00| 1001| 101| null|2024-11-27T16:37:04.000+08:00|
|2024-11-27T16:42:00.000+08:00| 1001| 101| 35.2| null|
|2024-11-27T16:43:00.000+08:00| 1001| 101| null| null|
|2024-11-27T16:44:00.000+08:00| 1001| 101| null|2024-11-27T16:37:08.000+08:00|
+-----------------------------+--------+---------+--------+-----------------------------+
Total line number = 7
It costs 0.119s
- Using
arrival_time
as the auxiliary column with a time interval (TIME_BOUND
) of 2 seconds
SELECT time, plant_id, device_id, humidity, arrival_time
FROM table1
WHERE time >= 2024-11-26 16:37:00 and time <= 2024-11-28 08:00:00
AND plant_id='1001' and device_id='101'
FILL METHOD PREVIOUS TIME_BOUND 2s TIME_COLUMN 5;
Results:
+-----------------------------+--------+---------+--------+-----------------------------+
| time|plant_id|device_id|humidity| arrival_time|
+-----------------------------+--------+---------+--------+-----------------------------+
|2024-11-27T16:38:00.000+08:00| 1001| 101| 35.1|2024-11-27T16:37:01.000+08:00|
|2024-11-27T16:39:00.000+08:00| 1001| 101| 35.3| null|
|2024-11-27T16:40:00.000+08:00| 1001| 101| 35.1|2024-11-27T16:37:03.000+08:00|
|2024-11-27T16:41:00.000+08:00| 1001| 101| null|2024-11-27T16:37:04.000+08:00|
|2024-11-27T16:42:00.000+08:00| 1001| 101| 35.2| null|
|2024-11-27T16:43:00.000+08:00| 1001| 101| null| null|
|2024-11-27T16:44:00.000+08:00| 1001| 101| null|2024-11-27T16:37:08.000+08:00|
+-----------------------------+--------+---------+--------+-----------------------------+
Total line number = 7
It costs 0.049s
Filling Details
- For
humidity
at16:39
,16:42
, and16:43
:- Since the auxiliary column
arrival_time
is NULL, no filling is performed.
- Since the auxiliary column
- For
humidity
at16:40
:- The auxiliary column
arrival_time
is not NULL and has a value of1970-01-01T08:00:00.003+08:00
. - The time difference from the previous non-NULL value (
1970-01-01T08:00:00.001+08:00
) is less than 2 seconds (TIME_BOUND
) - So the value
35.1
from the first row is used for filling.
- The auxiliary column
- For
humidity
at16:41
:- Although the auxiliary column
arrival_time
is not NULL, the time difference from the previous non-NULL value exceeds 2 seconds, so no filling is performed.
- Although the auxiliary column
- For
humidity
at16:44
:- Similarly, the time difference exceeds 2 seconds, so no filling is performed.