Production Server filling up with Redo Logs resulting in space issue. Deep research!

Situation: We have a production portal which connects to our Oracle database to retrieve some data. We have been getting the ORA-00257 archiver error on the portal quite often, which makes us look bad. Most of the servers are UNIX based. One of our teammates has been watching the site and letting us know about this error so that we can fix it before a user sees it.

Reason for the error: ORA-00257 usually shows up when the partition or hard disk that Oracle uses for writing the archived redo logs is full, so the archiver cannot write out the next log.

How to check: Connect to the Oracle DB server and check its disk space using df -h on UNIX. Usually the partition showing 100% used is the one Oracle uses to write the redo logs. If more than one partition shows 100% usage, find the redo log destination by running show parameter DB_RECOVERY_FILE_DEST in SQL*Plus.
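If you would rather have this check run on a schedule than wait for someone to spot the error on the portal, the df -h step can be approximated in a few lines of Python. This is only a minimal sketch: the function name full_mounts, the 95% threshold, and the mount points you pass in are illustrative assumptions, not anything Oracle provides.

```python
import shutil


def full_mounts(mount_points, threshold=0.95, usage=shutil.disk_usage):
    """Return (mount, fraction_used) pairs for mounts at or above threshold.

    `usage` defaults to shutil.disk_usage and is only a parameter so the
    function can be exercised without touching real disks.
    """
    flagged = []
    for mount in mount_points:
        total, used, _free = usage(mount)
        # used/total can differ slightly from the Use% column of df, which
        # leaves space reserved for root out of its calculation.
        fraction = used / total if total else 0.0
        if fraction >= threshold:
            flagged.append((mount, fraction))
    return flagged
```

Calling full_mounts with the list of partitions on the database server returns only the ones that are nearly full. Which partition actually holds the logs still has to come from the database itself.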

After figuring out the location where Oracle is writing the redo logs, you can back up or move some of the older redo logs to a different location. This frees up space and resolves the issue immediately.
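The move itself can be scripted so that the oldest files go first and only as many are moved as are needed. The sketch below is an illustration under assumptions: the function name move_oldest_files, the *.arc file pattern, and the dry_run default are all hypothetical choices, and the real file extension depends on how your database names its archived logs. Keep in mind that files moved at the operating system level are not reflected in RMAN's own records, so note where they went.

```python
import shutil
from pathlib import Path


def move_oldest_files(source_dir, target_dir, bytes_to_free,
                      pattern="*.arc", dry_run=True):
    """Move the oldest files matching `pattern` until enough space is freed.

    Returns the names of the files selected, oldest first. With
    dry_run=True (the default) nothing is moved, so the selection can be
    reviewed before any file is touched.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    # Oldest modification time first.
    candidates = sorted(
        (p for p in source.glob(pattern) if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    selected, freed = [], 0
    for path in candidates:
        if freed >= bytes_to_free:
            break
        size = path.stat().st_size
        if not dry_run:
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target / path.name))
        selected.append(path.name)
        freed += size
    return selected
```

Running it once with dry_run=True and reading the returned list is a cheap safeguard before repeating the call with dry_run=False.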

Not solved yet?: In our case, Oracle kept filling up the disk with redo log files again and again, and we had to clean it up each time. We went through various articles on Oracle errors to figure out the root cause of this issue. We first suspected the tape drive, because we had a recent tape drive failure and this might have been linked to it.

We looked into various stored scripts on the server which ran on a daily basis and looked at the logs. Looking at the logs for the RMAN scripts on the database server pointed us toward the RMAN-00571 and ORA-19502 errors which were related to space issues too.
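Reading through script logs by hand is slow, so a small helper that tallies the error codes in a log can point you at the failing step faster. This is a rough sketch, not a parser for any official log format: it assumes only that the codes appear as RMAN- or ORA- followed by five digits, and the name count_error_codes is made up for this example.

```python
import re
from collections import Counter

# Matches codes such as RMAN-00571 or ORA-19502 (prefix, dash, five digits).
ERROR_CODE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")


def count_error_codes(log_text):
    """Return a Counter mapping each RMAN-/ORA- code to its occurrence count."""
    return Counter(ERROR_CODE.findall(log_text))


def count_error_codes_in_file(path):
    """Read a log file and tally its error codes.

    Undecodable bytes are replaced so a damaged log does not stop the scan.
    """
    with open(path, "r", errors="replace") as handle:
        return count_error_codes(handle.read())
```

Calling most_common() on the result lists the most frequent codes first, which is usually where to start reading.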

These scripts were probably written by a previous DBA. They archive all the redo logs and delete them at the end of the day. They were not able to complete successfully because there was not enough space on the hard disk.

Solution: Clean up space, and make sure the archiving and deleting processes run properly at the end of the day so that the next round of redo log writing has enough space to use.

Conclusion: Don’t just look for a direct solution to a problem in the IT field. There might be more than one cause for a single problem. One problem leads to another and then to another.