Aug 13 2013 0

Improving MySQL query performance

Performance for the charts pages was getting worse and worse the more data was collected from all the sensors connected to the Raspberry Pi. It was not only taken a lot of time to get any results, but also CPU usage by the MySQL process jumped to around 90%.

High time to do some further optimization.

When creating the initial MySQL database I had added some indexes, but I had forgotten to add an index on the timestamp-column. The queries for fetching the data for the charts filter by sensor_id and timestamp. The following MySQL command corrects this mistake.

create index sd_timestamp_idx on sensor_data(timestamp);

To check the difference between the two situations the following query was executed before and after adding the new index. Also explain is used to get information about how MySQL server executes the query.

select d.timestamp
,      d.value
from   sensor_data d
where  d.sensor_id = 1
and    timestampdiff( HOUR, d.timestamp, now() ) <= 1
order by d.timestamp;

Running the query without the index to 2.01 seconds and with index 1.74 seconds (returning 124 rows). So there is some improvement, although only marginally. The output of explain for both situations is displayed below. Detailed information on what is returned by explain and how to read it can be found here.

Without index

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE d ref ts_sensorid_idx ts_sensorid_idx 4 const 41018 Using where; Using filesort

With index

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE d ref ts_sensorid_idx ts_sensorid_idx 4 const 41018 Using where; Using filesort

Note that the newly created index is not used and a filesort is performed. The reason for the index on the timestamp-column not being is because the column is used within the timestampdiff function.

First thing to tackle is rewriting the query. Functionally the query should return all sensor readings during the past n hours for one particular sensor. The original query calculated the difference between the timestamp of the data-row and the and current timestamp. This was causing MySQL the calculate it for all rows stored in the table.

The optimized query below calculates what the timestamp an hour ago was and compares that with the timestamps stored in the table. That way we only perform one calculation and since there is no expression surrounding the stored timestamp, the index can now be used.

select d.timestamp
,      d.value
from   sensor_data d
where  d.sensor_id = 1
and    d.timestamp >= date_sub( NOW(), interval 1 hour )
order by d.timestamp;

The explain for the new query is as follows. We can see that the index on timestamp is used. Also note that only 191 rows are being touched’ to determine the final results of the query, while the original query was touching all rows (41018). Additionally filesort is no longer performed.

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE d range sd_timestamp_idx, ts_sensorid_idx sd_timestamp_idx 4 NULL 191 Using where

Executing the query returns in 0.02 seconds (and CPU usage stay low as well for the MySQL process). This is a result I can live with…


Previous post
Sending Push Notifications from RPi using Pushover My Raspberry Pi has been humming along for quit some time now. Next thing to add to the project is a mechanism to send notifications when certain
Next post
Integrating Pushover into Raspberry Pi Project In my previous post I talked about using the Pushover service for sending notifications from the Raspberry Pi to my mobile devices. Now it is time
This blog is powered by Blot