Oracle E-Business Suite customer was reporting ORA-04031 errors being returned by the database. There were a number of ORA-04031 shared pool errors generated in the alert by background processes:
ORA-04031: unable to allocate 168 bytes of shared memory (“shared pool”,”select t.ts#,t.file#,t.block…”,”Heap0: KGL”,”kglob”)
I checked the shared pool and saw that the majority of it was used by KGH: NO ACCESS, which according to oracle Metalink note 451960.1 is:
KGH: NO ACCESS refers to granules that are in transit with ASMM i.e memory being reassigned from the Shared Pool to the Database Buffer Cache and vice-versa.
Flushing the shared pool didn’t seem to have any impact on the size of KGH: NO ACCESS, nor did it prevent sessions from encountering the ORA-04031 error. Restarting the database was the only way to stop the ORA-04031 error from impacting the users.
Looking at Metalink for KGH: NO ACCESS, all things pointed to Automatic Shared Memory Management as the culprit, which is a feature introduced in Oracle 10g to make the management of Oracle’s shared memory easier. When Oracle is started, it grabs a chunck of memory known as the SGA, which it subdivides into several smaller pools that it uses for caching and buffering. ASSM monitors the sub-pools inside of the SGA and automatically resizes the pools according to how much demand there is. It would essentially take some memory away from one pool and assign it to another pool based on usage. I checked the resize operations of the SGA by looking at v$sga_dynamic_components and v$sga_resize_ops and saw that there were a large number of resize operations that failed during the time that the customer was experiencing the issue. The resize operations occurred at a very rapid rate (several times per second) which indicates that Oracle was having a hard time moving memory around. Looked at the size of the SGA and saw that it was 600M, which was perhaps too small for what Oracle was trying to do.
I found a number of notes that talk about frequent resizes of the SGA that had references to bugs 7189722 and 6528336. Note 6528336.8 deals with bug 6528336 and states that it is fixed in 10.2.0.4. Note 7189722.8 deals with unpublished bug 7189722 and states that the version confirmed as being affected is 10.2.0.4 and that it will be fixed in a future 11.2 release. Note 406317.1 deals specifically with ORA-04031 errors and subsequent ORA-00600 errors as a result of the ORA-04031 errors. The customer is currently running 10.2.0.4, which suggests that they’re most likely hitting up against bug 7189722, for which there is no patch.
Metalink suggests a few workarounds for the issue. The first is to keep ASSM on, but set minimum sizes for the various pools. The second is to disable ASMM by setting the sga_target initialization parameter to 0, and then define a size for each pool in the SGA. And the third workaround is keep ASMM on and set the hidden initialization parameter _memory_broker_stat_interval to 999, which controls how frequently the SGA can be resized, which by default is 30 seconds. Note 742599.1 provides a pretty good overview.
Our first attempt to address the issue was to increase the SGA to the maximum allowed on a Red Hat 5 PAE kernel, which is about 2400M. The intent was to give Oracle more room in which to perform the resize operations. It worked OK for a while, but ultimately all it did was delay the issue, and we began seeing the same issue a month or so afterwards.
Our next step is to disable ASMM altogether and sizing the various pools manually. In addition, we’re also discussing ways to increase the size of the SGA on Red Hat 5 32bit. I would have been fairly easy under Red Hat 4 to bump up the SGA to about 3600M using the hugemem kernel, however, that kernel is not available in RH5, so the only option would be to use VLM. I’ll save that for another post 🙂