[Incident Report] Corruption of Some User Data (June 21, 2024)

Summary

On June 19, 2024, at 9:08 a.m., some user data was corrupted as a result of an operational error on our part. We sincerely apologize for the inconvenience and concern this may cause our users.

This report provides an overview of the problem, a timeline of the operational error, the cause of the problem, and measures to prevent recurrence.
In light of this incident, we will conduct a fundamental review of all operational and verification workflows to minimize operational risk and prevent recurrence. Until this review is complete, we will not release any new features or campaigns.

We will make company-wide improvements in order to regain the trust of our customers.

Impact of this incident

Due to the operational error, some of the following information stored in the CatsMe! database was corrupted:
*No information was leaked to outside parties due to this incident.

  • Information on registered cats (age, sex, number of cats, etc.)
  • AI results and records
  • Log-in records

Timeline of operational errors

Period: June 19, 2024, 09:09 to June 20, 2024, 21:00
*All times are in Japan Standard Time (JST).

Wednesday, June 19, 2024

09:08 Operational error occurs
A team of engineers was carrying out a database update.
During this process, the production database was partially overwritten with data from the development and verification database.
Some user data was corrupted.

09:15 Problem discovered
Engineers discovered that information in the production database had been unintentionally changed.
The availability of backup data was confirmed.

10:28 Second error caused by overlapping operations
While the system was still overwriting the database, an attempt was made to restore the backup data.
Because the overwrite and restore operations overlapped, the database change history was updated before the user data had been fully restored.

10:43 Investigation begins
The problem was reported to the entire company, and several engineers began investigating.

10:56 Service suspended
CatsMe! was placed into emergency maintenance.
Access was restricted for all users.

11:00 Countermeasure team established

12:34 Coordination begins with the cloud service provider for CatsMe!

15:32 Extent of corrupted data identified

15:48 Investigation requested from the cloud service provider

23:36 Initial investigation results
Received the initial investigation results from the cloud service provider.
An investigation into the cause of the problem began in parallel with data recovery work.

Thursday, June 20

04:01 Secondary investigation results
Received the secondary investigation results from the cloud service provider.
The cause was identified.

05:23 Scope of recoverable data identified

08:00 Service restoration work begins.

Friday, June 21

06:00 Service restoration work completed.

06:45 Emergency maintenance completed.

07:00 Apology notice issued to users

Causes of the incident

There were three main causes of this incident.

  • Server load had been high for a week because user numbers exceeded expectations, particularly after the service was featured by many media outlets, beginning with a Reuters article on June 13.
  • Multiple people accessed the server simultaneously while the server load was high.
  • The server backup system was not adequately prepared for emergencies.

Recurrence prevention measures and implementation status

Based on these causes, we have developed the following measures to prevent recurrence, communicated them thoroughly within the company, and are now implementing them.

Recurrence Prevention Measures (1)

[Summary]
Operations staff will monitor server load every business day and report it weekly to their superiors and to the company. If server load increases, the company will immediately inform employees and will discuss and implement measures to reduce it.

[Implementation Status]
This procedure has already been communicated internally and takes effect starting today.

Recurrence Prevention Measures (2)

[Summary]
The operations manager will track the server log-in status of the other operations staff and instruct them not to log in unnecessarily. Operations staff must obtain permission from the operations manager before using the server.

[Implementation Status]
This procedure has already been communicated within the company and takes effect starting today.

Recurrence Prevention Measures (3)

[Summary]
The operations manager will lead the establishment of an in-house server backup system. The operations manager will also prepare a manual of emergency backup procedures and conduct training as needed so that operations staff can restore the server smoothly in an emergency.

[Implementation Status]
The operations manager is currently leading an internal review of the backup system. Once this incident is resolved, backup procedure manuals will be prepared and training will be conducted within the company.

Once again, we are very sorry for any inconvenience and concern this incident may have caused.

We will make company-wide improvements so that we can protect our customers' important information and continue to support their happy lives with their beloved cats.
Thank you for your continued patronage of CatsMe!

June 21, 2024
Carelogy Inc. CEO
Go Sakioka