Improving MySQL query performance
Performance for the charts pages was getting worse and worse the more data was collected from all the sensors connected to the Raspberry Pi. It was not only taken a lot of time to get any results, but also CPU usage by the MySQL process jumped to around 90%.
High time to do some further optimization.
When creating the initial MySQL database I had added some indexes, but I had forgotten to add an index on the timestamp
-column. The queries for fetching the data for the charts filter by sensor_id
and timestamp
. The following MySQL command corrects this mistake.
create index sd_timestamp_idx on sensor_data(timestamp);
To check the difference between the two situations the following query was executed before and after adding the new index. Also explain
is used to get information about how MySQL server executes the query.
select d.timestamp
, d.value
from sensor_data d
where d.sensor_id = 1
and timestampdiff( HOUR, d.timestamp, now() ) <= 1
order by d.timestamp;
Running the query without the index to 2.01 seconds and with index 1.74 seconds (returning 124 rows). So there is some improvement, although only marginally. The output of explain
for both situations is displayed below. Detailed information on what is returned by explain
and how to read it can be found here.
Without index
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | d | ref | ts_sensorid_idx | ts_sensorid_idx | 4 | const | 41018 | Using where; Using filesort |
With index
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | d | ref | ts_sensorid_idx | ts_sensorid_idx | 4 | const | 41018 | Using where; Using filesort |
Note that the newly created index is not used and a filesort
is performed. The reason for the index on the timestamp
-column not being is because the column is used within the timestampdiff
function.
First thing to tackle is rewriting the query. Functionally the query should return all sensor readings during the past n hours for one particular sensor. The original query calculated the difference between the timestamp of the data-row and the and current timestamp. This was causing MySQL the calculate it for all rows stored in the table.
The optimized query below calculates what the timestamp an hour ago was and compares that with the timestamps stored in the table. That way we only perform one calculation and since there is no expression surrounding the stored timestamp, the index can now be used.
select d.timestamp
, d.value
from sensor_data d
where d.sensor_id = 1
and d.timestamp >= date_sub( NOW(), interval 1 hour )
order by d.timestamp;
The explain
for the new query is as follows. We can see that the index on timestamp is used. Also note that only 191 rows are being ‘touched’ to determine the final results of the query, while the original query was touching all rows (41018). Additionally filesort
is no longer performed.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | d | range | sd_timestamp_idx, ts_sensorid_idx | sd_timestamp_idx | 4 | NULL | 191 | Using where |
Executing the query returns in 0.02 seconds (and CPU usage stay low as well for the MySQL process). This is a result I can live with…