Residential electricity current and appliance dataset for AC‑event detection from Indian dwellings

Abstract
Air Conditioners (ACs) have become a major contributor to residential electricity consumption in India. Non-intrusive Load Monitoring (NILM) can be used to understand residential AC use and its contribution to electricity consumption. NILM techniques use ground truth information along with meter readings to train disaggregation algorithms. Datasets are available for disaggregation, but none is available for a hot tropical country like India, especially for AC event detection. Our dataset's primary objective is to help train NILM algorithms for detecting AC events and compressor operations. The dataset comprises home-level electrical current consumption and manually tagged AC ground truth (ON/OFF status) data at 1-min intervals, indoor temperature and relative humidity readings at 5-min intervals, and dwelling, AC and household characteristics. The data was collected from 11 homes located in a composite climate zone (Hyderabad, India) for 19 summer days in May 2019. The dataset consists of 1.6 million data points and 450 AC cycles, each with a runtime of more than 60 min (> 2000 compressor ON/OFF cycles). Public availability of such a dataset will allow researchers to develop, train and test NILM algorithms that recognize AC use and identify compressor operations.

The Indian residential sector is responsible for 24% of total energy consumption (Central Electricity Authority 2020). It is predicted that the global electricity consumption of residential Air Conditioners in 2050 will be more than triple that of 2016 (International Energy Agency 2018). Ownership of ACs in the Indian residential sector could increase further due to demographic shifts towards cities, rising standards of living, and declining prices of Air Conditioner units (Hu et al. 2019). The 'Indian energy security scenario 2047' suggests that the number of residential AC units will increase from 21.8 million in 2017 to approximately 68.9 million in 2027 and to 1046 million units by 2047 (International Energy Agency 2018; Debnath et al. 2020). Studies suggest that by 2037, AC consumption in India would increase by 4.3 times compared to 2017-18 (Government of India 2019). Occupant behaviour also has a significant effect on the overall energy consumption of ACs (ANNEX 2019; Brounen et al. 2012). This has led to major research interest in understanding the impact of AC usage on economics, environment, social development, and sustainability (Xu et al. 2018; Yang and Cao 2018).
AC monitoring helps determine how much AC consumption contributes to the overall household electricity consumption. It also helps identify and analyze AC usage patterns, which in turn can reveal energy savings potential (Ali et al. 2021; Garg et al. 2021). However, individual AC monitoring for every home requires additional investment in sub-metering hardware and in reliable communication channels that collect data from multiple sub-meters. NILM attempts to solve this problem by using data from existing loggers installed at the meter level and applying disaggregation algorithms that classify individual appliances based on their load signatures. Researchers require access to datasets recorded in the field to develop, train and test these disaggregation algorithms. Since it is not feasible for every researcher to record their own dataset, the creation of open-access datasets promotes NILM research.
REDD (Kolter and Johnson 2011) was the first publicly available dataset for research in energy disaggregation. The dataset contains power consumption data (voltage and current) from 6 homes over several weeks. Many NILM datasets have since been released. BLUED (Blued 2011), Smart (Barker et al. 2012), PLAID (Gao et al. 2014), and Dataport (Parson et al. 2015) are datasets of USA households capturing current and voltage data at the home and appliance level. Similar datasets were created from UK homes, such as UK-DALE and REFIT (Firth et al. 2017). UK-DALE collected data from 5 homes at 16 kHz for a period of 2.5 years, and REFIT was prepared by collecting data from 20 homes at an 8-s interval for a 2-year period. AMPds (Makonin 2016), AMPds2, and RAE (Makonin et al. 2018) are publicly available datasets from homes in Canada. Other open energy disaggregation datasets include Tracebase (Reinhardt et al. 2012) from Germany, GREEND (Monacchi et al. 2014) from Italy, ECO (Beckel et al. 2014) from Switzerland, DRED (Uttama Nambi et al. 2015) from the Netherlands, and ENERTALK (Shin et al. 2019) from Korea. The availability of diverse datasets helps researchers understand electricity and appliance usage patterns in different countries. Usage patterns vary across countries due to differences in occupant comfort, lifestyle, and outdoor weather conditions. For a tropical country like India, the energy datasets I-BLEND (Rashid et al. 2019), COMBED (Batra et al. 2014a) and iAWE (Batra et al. 2014b) are available. I-BLEND contains 52 months of data from 7 buildings at a 1-min interval. Similarly, COMBED captured data from 6 commercial buildings for 1 month at a sampling interval of 30 s; it contains the energy used by transformers, chillers, UPS and lifts. iAWE recorded 6-s appliance data and 1 Hz aggregated data from one home for 73 days.
Of the available building and residential datasets, 15 claim to include disaggregated AC load, of which only one (iAWE) is from India. That dataset has ground truth for 10 appliances collected from a single home only. Hence, there is a need for a larger dataset with diverse usage and AC types (split and inverter).
Our dataset contains monitored data from 11 homes in Hyderabad, India, a city with a composite climate, for a 19-day period. For each home there is a record of phase-wise electrical current at a 1-min interval. The dataset also contains the indoor temperature and humidity of the room with the AC at a 5-min interval, giving 1.6 million data points and 450 AC cycles (> 2000 compressor ON/OFF cycles). The dataset has 7 parameters: timestamp, phase-wise current in mA (3 phases), AC status, temperature, and humidity. This dataset can be used to train NILM algorithms for AC. Household survey and outdoor weather data are also part of the dataset. Table 1 summarizes the household information. The income group is categorized into Low-Income (LIG), Middle-Income (MIG), and High-Income (HIG) groups based on annual household income, following the PMAY (Pradhan Mantri Awas Yojna) Scheme (Ministry of Housing and Urban Affairs, Government of India 2019). Figure 1 shows a schematic overview of the data acquisition system. A mobile application was developed to connect a mobile phone to the Garud and EnviLog loggers. The application collects data from the loggers and uploads it to the database server over Wi-Fi or a cellular network. A web application was designed to view the data uploaded to the database server. The web application also generates reports detailing per-house energy consumption on a daily, weekly, and monthly basis.

Methodology
A detailed monitoring framework was developed for data collection that included low-cost current loggers (Garud) and temperature-humidity loggers (EnviLog). Garud, shown in Fig. 2a, is a current consumption logger that recorded electric current at 1-min intervals (Tejaswini et al. 2019). It consists of three CT clamps that can be installed on a main circuit board with a 1-phase or 3-phase connection. Garud devices were installed by trained electricians at the main circuit board, whereas EnviLog, shown in Fig. 2b, was installed by the researcher in the room with the AC.

Hardware setup
Garud is a battery-powered logger designed at our research lab using low-cost BLE (Bluetooth Low Energy), which operates in the 2.4 GHz ISM band using GFSK (Gaussian Frequency Shift Keying). BLE is supported by mobile phones and tablets, making it an ideal solution for interfacing the logger with an Android application and capturing data at intervals chosen by the user. An on-board memory chip of 4 MB capacity was added to hold data for up to 12 months. The internal circuitry of Garud is shown in Fig. 3 and consists of the following subsystems:
• Current transformer (XH-SCT-T10): a voltage-output current transformer (50 A to 0.33 V) rated for input currents of up to 100 A.
• Analog to Digital Converter (MCP3919) from Microchip: converts the analog voltage from the current transformer to digital signals and can sample all 3 phase currents simultaneously. Its low power consumption (< 6 mA at 3.3 V) makes it an appropriate component for a low-power analog data acquisition device.
• Microcontroller with Bluetooth Low Energy subsystem (nRF52832) from Nordic Semiconductor: a multi-protocol System on Chip (SoC) that supports Bluetooth 5 and Bluetooth mesh. It achieves very low power consumption through its on-chip adaptive power management system.
• Flash memory (W25Q32JVSSIQ) from Winbond Electronics: interfaced with the microcontroller to log data. It has 4 MB of storage with 66 MB/s continuous data transfer and over 20 years of data retention.
• Real-time clock (MCP7940N): tracks time using internal counters for hours, minutes, seconds, days, months, years, and day of the week.
• Boost converter (MAX17220-MAX17225): a family of ultra-low quiescent current boost (step-up) DC-DC converters with a 225 mA/0.5 A/1 A peak inductor current. It keeps the device running from two AA batteries at input voltages as low as 0.8 V.
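The figures quoted for the current transformer and the flash memory can be checked with a short calculation. The sketch below is illustrative only: it assumes a linear CT response between 0 A and the rated point, and it assumes that the 4 MB flash holds one record per minute for a full year. Neither the record layout nor the firmware scaling is described in the text, and the function names are hypothetical.

```python
# Assumptions (not from the paper): linear CT response and one record
# per minute for 365 days sharing the whole 4 MB flash.

CT_RATED_CURRENT_A = 50.0   # primary current at rated output (from the text)
CT_RATED_OUTPUT_V = 0.33    # secondary voltage at rated current (from the text)


def ct_voltage_to_milliamps(v_rms: float) -> float:
    """Convert the CT's RMS output voltage to primary current in mA."""
    return v_rms / CT_RATED_OUTPUT_V * CT_RATED_CURRENT_A * 1000.0


def bytes_per_record(flash_bytes: int, days: int, interval_s: int) -> float:
    """Storage budget per logged record if the flash must last `days`."""
    records = days * 24 * 3600 // interval_s
    return flash_bytes / records


# A 33 mV CT output corresponds to roughly 5 A (5000 mA) on the primary.
print(ct_voltage_to_milliamps(0.033))

# 4 MB over 12 months of 1-min records leaves just under 8 bytes per record.
print(bytes_per_record(4 * 1024 * 1024, 365, 60))
```

Under these assumptions, a budget of about 8 bytes per record suggests a compact binary record rather than text, although the actual storage format is not stated.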
Garud is connected to the DB box/main meter using three CT clamps as shown in Fig. 4. The number of CTs connected depends on whether the homes run on 3-phase electricity or single phase.
EnviLog is a custom-built relative humidity and temperature logger that uses Bluetooth for communication and data transfer. It runs on lithium coin batteries and can store 12,000 temperature and humidity records. The specifications of Garud and EnviLog are presented in Table 2.

Application for data collection
An Android mobile application, shown in Fig. 5, was developed to manage the homes under survey and upload data files to the server. JavaScript was used for server-side scripting. Each house was given a unique ID to keep the details anonymous. Each Garud is uniquely identified by a 16-digit MAC ID and a 4-digit device ID, and each EnviLog by an 8-digit ID. The application supported two steps: installation and monitoring. The installation section listed the home IDs where devices were not yet installed. The monitoring section listed the home IDs where installation was complete, and was used to download data from Garud and EnviLog and upload it to the server as Comma Separated Values (CSV) files.

Data records
The RESIDE-AC dataset is available as well-structured and easy-to-use CSV (Comma Separated Values) files. The dataset contains 19 summer days of data, from 10/05/2021 IST to 28/05/2021 IST, for 8 single-AC homes and 3 homes with more than one AC.  Envilog records. However, the start times of Garud and Envilog are not the same, and therefore the timestamps will not match exactly.
The dataset is organized in 3 directories: Garud, Envilog, and Household Information. The 'Garud' and 'Envilog' folders each include 11 files, one for each home. The naming convention for these files is <Gxx> and <Exx>, where 'xx' is a number between 01 and 11 that indicates the house ID, the key used to identify each home uniquely. Household characteristics, building details and AC information are present as separate CSV files named 'HomeDetails', 'BuildingDetails' and 'ACDetails' in the 'Household Information' folder. Daily outdoor temperature and humidity data for the specified period are present in a separate file, 'weatherdata.csv'. The weather data was collected from the official government website (Open Data Telangana 2017). The file contains minimum and maximum temperature in °C and minimum and maximum relative humidity in %.
Each Garud file consists of 5 fields captured at a 1-min interval: "datetime", the 3-phase currents "R", "Y" and "B", and "AC status". The "datetime" column contains the IST time when the data was recorded. The 3-phase current columns hold the phase current in milliamperes. For a single-phase house, the Phase-R column is filled with current values and the remaining 2 columns hold the value 0. For a three-phase house, Phase-R, Y and B are all filled with current values. The AC status field contains 0 for AC OFF and 1 for AC ON.
Envilog sensors were placed in the bedroom with the most frequently used AC. Each Envilog file consists of 3 fields captured at a 5-min interval: "datetime", "temperature" and "relative humidity". The datetime column contains the IST time of the record. The temperature and humidity fields represent the indoor room parameters in °C and %RH.
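Because the Garud and Envilog timestamps do not coincide, a user of the dataset has to align the two streams before using them together. The following is a minimal sketch of one way to do this with the Python standard library: each 1-min Garud row is paired with the most recent Envilog reading at or before it. The column names follow the description above; the timestamp format (`DT_FORMAT`), the 5-min tolerance, the function names and the sample rows are assumptions made for illustration and should be adapted to the actual files.

```python
import csv
import io
from bisect import bisect_right
from datetime import datetime, timedelta

DT_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed; adjust to the format in the files


def read_rows(handle, numeric_fields):
    """Parse a CSV stream into time-sorted dicts with a parsed 'datetime'."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"datetime": datetime.strptime(raw["datetime"], DT_FORMAT)}
        for name in numeric_fields:
            row[name] = float(raw[name])
        rows.append(row)
    rows.sort(key=lambda r: r["datetime"])
    return rows


def attach_indoor_climate(garud, envilog, tolerance=timedelta(minutes=5)):
    """Attach to each Garud row the latest Envilog reading at or before it.

    Rows with no reading inside `tolerance` get None for both fields.
    """
    env_times = [r["datetime"] for r in envilog]
    merged = []
    for g in garud:
        i = bisect_right(env_times, g["datetime"]) - 1
        match = envilog[i] if i >= 0 else None
        if match and g["datetime"] - match["datetime"] > tolerance:
            match = None
        merged.append({
            **g,
            "temperature": match["temperature"] if match else None,
            "relative humidity": match["relative humidity"] if match else None,
        })
    return merged


# Made-up sample rows in the documented column layout (not real data).
GARUD_SAMPLE = (
    "datetime,R,Y,B,AC status\n"
    "2021-05-10 22:00:00,1500,0,0,0\n"
    "2021-05-10 22:01:00,7200,0,0,1\n"
    "2021-05-10 22:02:00,7150,0,0,1\n"
)
ENVILOG_SAMPLE = (
    "datetime,temperature,relative humidity\n"
    "2021-05-10 21:58:30,31.2,48\n"
    "2021-05-10 22:03:30,30.1,47\n"
)

garud = read_rows(io.StringIO(GARUD_SAMPLE), ["R", "Y", "B", "AC status"])
envilog = read_rows(io.StringIO(ENVILOG_SAMPLE), ["temperature", "relative humidity"])
merged = attach_indoor_climate(garud, envilog)
for row in merged:
    print(row["datetime"], row["R"], row["AC status"], row["temperature"])
```

With the real files, the `io.StringIO` objects would be replaced by the opened Garud and Envilog files of the same house ID. A backward-looking match is used here so that no reading from the future is attached to a current sample; a nearest-neighbour match is an equally reasonable choice.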
The monitored dwellings were either stand-alone homes or flats located in low-rise apartment buildings (< 6 storeys). All the dwellings were constructed using a Reinforced Cement Concrete (RCC) frame structure with burnt clay bricks as infill. The windows were single glazed with fixed external shading. Additional details on the number of rooms, the area of the dwelling and the age of the dwelling are present in the 'BuildingDetails' file. The 'DwellingDetails' file consists of the following fields: House ID (key), number of occupants, income group, ownership, annual electricity consumption calculated from electricity bills, phase connection type (single-phase/three-phase) and appliance list. The appliance columns indicate the number of appliances owned. The appliances owned were: washing machine, refrigerator, microwave/oven, television, water pump (Indian residences use pumps to move water from an underground storage tank to roof-top tanks for a gravity-based water supply system), electric geyser (an electrical resistance heating appliance that heats water for domestic use in winter), desert cooler (a device, driven by an electric fan, that cools the air in an entire room through evaporation of water using wet grass) and electric inverter with battery backup (a backup device that supplies AC power to appliances during power outages).
In Indian homes, the majority of Air Conditioners are installed as individual units in bedrooms and living rooms and are controlled by a thermostat, in contrast to the centralized AC systems that are common in the West. The average size of the bedrooms whose AC was monitored is 20 sq m. The 'ACDetails' file contains additional information such as AC tonnage, type of AC, age of the AC and BEE star rating.

AC tagging
AC ON/OFF tag values were assigned manually to each record by observing the increase in current values. We confirmed that the increase was due to AC load by verifying a drop in room temperature 5 min after the increase in current consumption. Similarly, when there was a decrease in AC current consumption followed by a gradual increase in temperature over a continuous 15-min period, we tagged the activity as AC OFF. In the case of a three-phase connection, the phase on which the primary AC runs was selected for tagging. In the case of a single-phase connection, the default phase was selected. Figure 6 shows the graph of AC consumption along with tags for a 1-h period.
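The tags in the dataset were assigned manually, but the two rules described above can be approximated in code, for example to pre-screen new recordings before manual review. The sketch below is not the authors' procedure: the current step threshold (`step_ma`), the function name and the synthetic series are assumptions, and the "gradual increase over 15 min" rule is simplified to "warmer after 15 min than at the drop". It expects per-minute current (for the phase carrying the AC) and per-minute temperature that have already been aligned.

```python
def tag_ac_status(current_ma, temp_c, step_ma=3000.0, on_lag=5, off_lag=15):
    """Return a 0/1 AC tag per minute from per-minute current and temperature.

    ON : current jumps by >= step_ma and the room is cooler on_lag min later.
    OFF: current falls by >= step_ma and the room is warmer off_lag min later.
    step_ma is a hypothetical threshold, not a value from the paper.
    """
    n = len(current_ma)
    status = [0] * n
    state = 0
    for t in range(1, n):
        delta = current_ma[t] - current_ma[t - 1]
        if state == 0 and delta >= step_ma:
            later = min(t + on_lag, n - 1)
            if temp_c[later] < temp_c[t]:
                state = 1
        elif state == 1 and delta <= -step_ma:
            later = min(t + off_lag, n - 1)
            if temp_c[later] > temp_c[t]:
                state = 0
        status[t] = state
    return status


# Synthetic 40-min example: the AC draws 5 A extra from minute 5 to minute 24.
current = [6000.0 if 5 <= t < 25 else 1000.0 for t in range(40)]
temperature = []
for t in range(40):
    if t <= 5:
        temperature.append(32.0)                    # warm room, AC off
    elif t <= 25:
        temperature.append(32.0 - 0.2 * (t - 5))    # cooling while AC runs
    else:
        temperature.append(28.0 + 0.1 * (t - 25))   # slow warm-up afterwards

status = tag_ac_status(current, temperature)
print(status)
```

Because a compressor OFF phase within an AC ON period also produces a current drop, a rule of this kind can confuse compressor cycling with the AC being switched off when the room warms quickly, which is one reason manual verification remains useful.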

Technical validation
To validate that the logging system was running as expected, several test routines were carried out. One test checked that the electricity consumption values (mA) captured by Garud were consistent with the values observed in the monthly electricity bills. Although our dataset covers 19 days, 1 year of current data was captured and compared with the monthly electricity bills. Before deployment, the devices were checked for precision by testing against accurate meters. A Yokogawa (WT330) was used as the laboratory reference and a Wattnode (WNC-3Y-480-MB) as the field reference meter. An experiment was conducted to measure Garud's precision by comparing its readings with the Yokogawa (WT330). The experiment was conducted by varying the load connected across Garud and the two standard meters. The current was varied from 100 mA to 12 A. The Yokogawa (WT330) was further used for calibrating Garud, and the results are displayed as a scatter plot in Fig. 7.
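The text does not state how the calibration against the reference meter was computed. One common approach, shown here purely as an illustration, is a least-squares fit of a gain and an offset that map device readings onto reference readings, followed by an error check. The function names and the paired readings below are made up for the example and are not measurements from the study.

```python
def fit_gain_offset(device_ma, reference_ma):
    """Ordinary least-squares fit: reference ~= gain * device + offset."""
    n = len(device_ma)
    mean_x = sum(device_ma) / n
    mean_y = sum(reference_ma) / n
    sxx = sum((x - mean_x) ** 2 for x in device_ma)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(device_ma, reference_ma))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def mean_abs_pct_error(device_ma, reference_ma):
    """Mean absolute percentage error of the device against the reference."""
    errors = [abs(d - r) / r for d, r in zip(device_ma, reference_ma)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical paired readings over the tested range (100 mA to 12 A).
reference = [100.0, 500.0, 1000.0, 5000.0, 12000.0]
device = [(r - 20.0) / 1.02 for r in reference]  # device reads slightly low

gain, offset = fit_gain_offset(device, reference)
calibrated = [gain * d + offset for d in device]
print(gain, offset)
print(mean_abs_pct_error(device, reference))
print(mean_abs_pct_error(calibrated, reference))
```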
A similar accuracy test was performed on Envilog by comparing its readings with iButton and HOBO UX100-003 loggers. For this test, the sensors were placed inside a box and the temperature was varied from 19 to 30 °C by changing the set point of the air conditioner. Since Envilog does not use a real-time clock, a time shift of a couple of minutes may be observed over the study period. The temperature and humidity plots in Fig. 8 show the results of this experiment.
To verify data correctness, we visualized the average AC usage hours for all 11 homes. Figure 9 shows the fraction of time the AC is used in each 1-h interval of the day, averaged across the 11 homes. AC usage is highest during the early morning hours and late at night. The figure also shows that the most common AC usage window in all houses is during sleep hours, generally between 10:00 pm and 07:00 am, with an average runtime of 6.2 h. There is also minor usage of AC during the afternoon hours between 02:00 pm and 06:00 pm.
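A check of this kind can be reproduced from the "datetime" and "AC status" columns alone. The sketch below, which uses hypothetical function names and a synthetic single day rather than the real data, computes the fraction of logged minutes with the AC ON for each hour of the day and the length of each uninterrupted ON run. It assumes records at a fixed 1-min interval with no gaps; pooling the records of all homes before calling `hourly_on_fraction` would give one possible across-home average.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import groupby


def hourly_on_fraction(records):
    """records: iterable of (datetime, status) at a fixed 1-min interval.

    Returns {hour_of_day: fraction of logged minutes with AC ON}.
    """
    on = defaultdict(int)
    total = defaultdict(int)
    for ts, status in records:
        total[ts.hour] += 1
        on[ts.hour] += int(status)
    return {h: on[h] / total[h] for h in sorted(total)}


def cycle_runtimes_min(records):
    """Lengths in minutes of each uninterrupted run of AC ON records."""
    return [sum(1 for _ in run)
            for is_on, run in groupby(records, key=lambda r: int(r[1]))
            if is_on == 1]


# Synthetic day: AC ON from 00:00 to 06:29 and again from 22:30 to 23:59.
start = datetime(2021, 5, 10)
records = [(start + timedelta(minutes=m), 1 if (m < 390 or m >= 1350) else 0)
           for m in range(1440)]

fractions = hourly_on_fraction(records)
runtimes = cycle_runtimes_min(records)
print(fractions[0], fractions[6], fractions[12], fractions[22])
print(runtimes)
```

Note that a night-time session spanning midnight appears as two runs when days are processed separately, so multi-day series should be passed in as one continuous sequence per home.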

Conclusion
India being a hot tropical country, AC is a major load contributor. The number of residential AC units in India is predicted to increase from 21.8 million in 2017 to 1046 million units by 2047. In contrast to the centralized ACs that are common in the West, India has different types of ACs (split AC and window AC) installed in individual rooms. Our dataset contains monitored data from 11 homes in Hyderabad, India, a city with a composite climate, for a 19-day period. The dataset comprises 450 AC cycles and more than 2000 compressor ON/OFF cycles. Availability of such a dataset will help researchers train and test NILM algorithms to recognize AC events.