A very good MOS article on ASM rebalance - 白红宇

Published: 2019-05-11

ORA-15041 IN A DISKGROUP ALTHOUGH FREE_MB REPORTS SUFFICIENT SPACE (Doc ID 460155.1)

--------------------------------------------------------------------------------

Last modified: 2013-1-3  Type: BULLETIN

In this Document

Purpose

Scope

Details

Capacities of disks within the ASM diskgroup are different

ASM instance is shut down normal/immediate before a rebalance is completed

Disk is in DROPPING / HUNG state

After an add disk command the rebalance is still in progress

--------------------------------------------------------------------------------

Applies to:

Oracle Server - Enterprise Edition - Version 10.1.0.2 to 11.2.0.4 [Release 10.1 to 11.2]

Information in this document applies to any platform.

***Checked for relevance on 02-Jan-2013***

Purpose

ORA-15041 is the most common space error reported when free disk space is not sufficient to complete a file allocation request. Due to imbalanced space distribution, ORA-15041 can still be encountered although the ASM views report sufficient free space. This article is intended to help the reader understand the most common reasons for the ORA-15041 error when the ASM views report sufficient free space.

Error: ORA-15041 (ORA-15041)

Text: diskgroup space exhausted

---------------------------------------------------------------------------

Cause: The diskgroup ran out of space.

Action: Add more disks to the diskgroup, or delete some existing files.

Scope

This article is intended to help the reader understand the most common reasons for the ORA-15041 error in ASM diskgroups.

Details

ASM spreads file extents evenly across all the disks in a diskgroup according to capacity, free space within the diskgroup and its redundancy type. The total free space in megabytes in an ASM diskgroup is reported in the FREE_MB column, but the maximum space that can actually be allocated depends on the redundancy type, as summarized below:

Redundancy type   Max. space
----------------  -------------------------
External          FREE_MB of diskgroup
Normal            1/2 FREE_MB of diskgroup
High              1/3 FREE_MB of diskgroup

The V$ASM_DISKGROUP.FREE_MB column is simply the sum of the free space, in megabytes, reported for each disk in V$ASM_DISK. USABLE_FILE_MB in V$ASM_DISKGROUP, on the other hand, indicates the amount of free space, adjusted for mirroring, that is available for new allocations.
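The table above amounts to dividing FREE_MB by the number of copies ASM keeps of each extent. The following is a minimal sketch of that arithmetic only; the function name and the 900 MB figure are hypothetical, and the sketch ignores any space ASM may hold back for restoring redundancy, so it is not a substitute for USABLE_FILE_MB.

```python
# Copies of each extent per redundancy type, following the table above.
MIRROR_COPIES = {"EXTERNAL": 1, "NORMAL": 2, "HIGH": 3}


def max_allocatable_mb(free_mb: float, redundancy: str) -> float:
    """Approximate upper bound on new file space (MB) for a diskgroup.

    Simplified model: FREE_MB divided by the number of mirror copies.
    """
    return free_mb / MIRROR_COPIES[redundancy.upper()]


if __name__ == "__main__":
    # Hypothetical diskgroup reporting FREE_MB = 900.
    for kind in MIRROR_COPIES:
        print(kind, max_allocatable_mb(900, kind))
```

Even this upper bound assumes the free space is spread evenly; the remainder of the note explains why it may not be reachable.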

Although the FREE_MB and USABLE_FILE_MB columns report sufficient free space, an ORA-15041 error can still be encountered due to imbalanced free space between disks. The reason is that a single disk lacking sufficient free space makes any allocation in the disk group impossible, because every file must be evenly allocated across all disks per the ASM striping policy.
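A rough way to see this effect is to model each disk as receiving a share of the new file proportional to its TOTAL_MB and to check whether every disk can hold its share. The sketch below is an illustrative model under that assumption, not ASM's actual allocator; the function name is hypothetical, and the per-disk figures are borrowed from the DG_NORMAL test later in this note.

```python
def fits_evenly(file_mb, copies, disks):
    """Check whether a file can be spread evenly over all disks.

    disks is a list of (total_mb, free_mb) tuples. Each disk is assumed
    to receive a share of the mirrored file proportional to its TOTAL_MB.
    Returns (ok, indexes_of_disks_that_are_short).
    """
    needed_mb = file_mb * copies
    group_total = sum(total for total, _ in disks)
    short = []
    for index, (total, free) in enumerate(disks):
        share = needed_mb * total / group_total
        if free < share:
            short.append(index)
    return (not short, short)


if __name__ == "__main__":
    # Three 447 MB disks, normal redundancy (2 copies), 781 MB free in total.
    imbalanced = [(447, 240), (447, 239), (447, 302)]
    balanced = [(447, 781 / 3)] * 3
    print(fits_evenly(370, 2, imbalanced))  # two disks fall short
    print(fits_evenly(370, 2, balanced))    # same total free space, now it fits
```

The total free space is identical in both cases; only its distribution decides whether the 370 MB file fits in this model.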

An unbalanced disk configuration and certain operations on ASM disks can create this type of problem. The problem is usually resolved by a successful rebalance, provided that all disks have the same storage capacity and there are no underlying hardware problems.

The most common reasons can be classified as follows:

1- Capacities of disks within the ASM diskgroup are different

2- ASM instance is shut down normal/immediate before a rebalance is completed

3- Disk is in DROPPING / HUNG state

4- After an add disk command the rebalance is still in progress

Capacities of disks within the ASM diskgroup are different

When the disks in an ASM diskgroup have different capacities, one disk lacking free space makes it impossible to do any allocation on the other disks as well. This is expected behaviour because every file must be evenly allocated across all disks. Rebalancing and allocation attempt to keep the percentage of allocated space about the same on every disk. File allocation may fail with an ORA-15041 when space distribution is imbalanced.

Extents are allocated evenly across the disks in proportion to the capacity (TOTAL_MB) of each disk. If all ASM disks are the same size (e.g. 10 disks, 50GB each), the ASM allocator places extents on each disk in sequence. The first disk is chosen randomly, but all subsequent disks for extent allocation are chosen so as to spread each file evenly across all disks and to fill all disks evenly.

On the other hand, if ASM disks are not of the same size (e.g. disk 1 is 10GB,

disk 2 is 50GB and disks 3-10 are 10GB), ASM allocator will place one extent on

disk 1, five extents on disk 2, one extent on disk 3 and so on. This is to

ensure balanced disk utilization.
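The 1:5:1 pattern in this example is simply the disk sizes reduced to their smallest whole-number ratio. A minimal sketch of that reduction follows; the function name is hypothetical, and it only illustrates the proportion, not the order in which ASM actually visits the disks.

```python
from functools import reduce
from math import gcd


def extents_per_round(disk_sizes):
    """Reduce disk sizes to whole-number weights.

    Each weight is the number of extents a disk would receive in one
    round of a capacity-proportional allocation.
    """
    divisor = reduce(gcd, disk_sizes)
    return [size // divisor for size in disk_sizes]


if __name__ == "__main__":
    # Disk 1 is 10GB, disk 2 is 50GB, disks 3-10 are 10GB each.
    sizes_gb = [10, 50] + [10] * 8
    print(extents_per_round(sizes_gb))  # [1, 5, 1, 1, 1, 1, 1, 1, 1, 1]
```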

Extent allocation also differs with the type of redundancy. If redundancy is NORMAL/HIGH, no allocation is possible when the free space on any of the disks is not sufficient for the requested allocation size. In an external redundancy diskgroup, on the other hand, ASM distributes the extents evenly across the disks in proportion to their capacity (TOTAL_MB), and allocation continues as long as at least two disks have enough space to complete the allocation.

The following test demonstrates the space allocation behaviour for each redundancy type when the diskgroup contains disks of different capacities.

DG_EXT: External redundancy diskgroup with 3 disks: 447 MB, 447 MB and 70 MB.

DG_NOR: Normal redundancy diskgroup with 3 disks: 447 MB, 447 MB and 70 MB.

ASM/Database instance version is 10.2.0.3.

To test file allocation on the different redundancy types (external/normal), files of different sizes are created on the diskgroups. To ensure that the same amount of free space is left after each file creation, the files created on the external redundancy diskgroup are twice the size of those created on the normal redundancy diskgroup. The following table shows the free space after each file creation and the point where the ORA-15041 error is encountered on each type of diskgroup.

FREE_MB(0): Initial free space on each disk.

FREE_MB(1): After 200M/100M files are created on the external/normal redundancy diskgroup.

FREE_MB(2): After 100M/50M files are created on the external/normal redundancy diskgroup.

FREE_MB(3): After a 500M file is created on the external redundancy diskgroup only.

NAME         TOTAL_MB FREE_MB(0)  FREE_MB(1)  FREE_MB(2) FREE_MB(3)
                                 +200M/+100M  +100M/+50M    +500M/-
------------ -------- ---------- ----------- ----------- ----------
DG_EXT_0001       447        423         330         283      50 **
DG_EXT_0000       447        421         326         279      48 **
DG_EXT_0002        70         66          51          43       8 **
DG_NOR_0001       447        395         301 ORA-15041 *          X
DG_NOR_0000       447        395         301 ORA-15041 *          X
DG_NOR_0002        70         18           1 ORA-15041 *          X

* File allocation (50 MB) on the normal redundancy diskgroup fails with ORA-15041 when a single disk has no more than 50 MB free. This is mainly because the normal redundancy diskgroup cannot place the primary and secondary extents of the file on separate disks due to insufficient space.

** On the other hand, file allocation (100M + 500M) succeeds for the external redundancy diskgroup, as this type of redundancy only stripes data over the available free space on all disks. Further tests show that file creation with external redundancy is possible as long as there is some space on at least two disks in the diskgroup.

ASM instance is shut down normal/immediate before a rebalance is completed

A rebalance stops when the ASM instance is shut down, and it is expected to resume after the instance is restarted. However, due to a known issue (Unpublished Bug 5089819), if the ASM instance is shut down with the normal/immediate option, the rebalance does not start again on the next startup. The ASM instance must either be shut down with abort, or the rebalance must be restarted manually.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL                                 2        894        386

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL_0000                            2        447        193
DG_NORMAL_0001                            2        447        193

A new disk is being added:

alter diskgroup dg_normal add disk '/dev/hdb10';

select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE
------------ ----- ---- ---------- ---------- ---------- ----------
           2 REBAL RUN           1         75        194        181
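The v$asm_operation row above can be turned into a rough estimate of how long the rebalance still has to run. The sketch below assumes that SOFAR and EST_WORK are counts of allocation units and that EST_RATE is allocation units moved per minute; the function name is hypothetical and the result is only as good as ASM's own estimates.

```python
def remaining_minutes(sofar, est_work, est_rate):
    """Estimate minutes left in a rebalance from v$asm_operation figures.

    Assumes sofar and est_work are allocation-unit counts and est_rate is
    allocation units per minute. Returns None when no rate is reported.
    """
    if est_rate <= 0:
        return None
    return max(est_work - sofar, 0) / est_rate


if __name__ == "__main__":
    # Values from the row above: SOFAR=75, EST_WORK=194, EST_RATE=181.
    print(round(remaining_minutes(75, 194, 181), 2))  # 0.66
```

In this test the rebalance had well under a minute left, which shows how narrow the window was in which the shutdown interrupted it.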

The ASM instance is shut down:

SQL> shutdown immediate

ASM diskgroups dismounted

ASM instance shutdown

SQL>

Upon startup, no rebalance operation is running and the free space on the ASM disks is not balanced.

NAME GROUP_NUMBER TOTAL_MB FREE_MB

------------------------------ ------------ ---------- ----------

DG_NORMAL 2 1341 781

NAME GROUP_NUMBER TOTAL_MB FREE_MB

------------------------------ ------------ ---------- ----------

DG_NORMAL_0000 2 447 240

DG_NORMAL_0001 2 447 239

DG_NORMAL_0002 2 447 302

Normally, it should be possible to create a file of approximately 350-400 MB, as the ASM diskgroup reports sufficient space.

SQL> alter tablespace test1 add datafile '+DG_NORMAL' size 370m;
*
ERROR at line 1:
ORA-01119: error in creating database file '+DG_NORMAL'
ORA-17502: ksfdcre:4 Failed to create file +DG_NORMAL
ORA-15041: diskgroup space exhausted

SQL> alter diskgroup dg_normal rebalance power 11;

SQL> alter tablespace test1 add datafile '+DG_NORMAL' size 370m;

Tablespace altered

A manual rebalance remedies this situation. The diskgroup has the following free space figures after the 370 MB file is created.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL                                 2       1341         70

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL_0000                            2        447         24
DG_NORMAL_0001                            2        447         22
DG_NORMAL_0002                            2        447         24

Disk is in DROPPING / HUNG state

Free space in an ASM diskgroup can become imbalanced if a drop disk operation fails for any reason (lack of space, disk crash, etc.). Disks may be stuck in the DROPPING state in this case.

The most common causes of the DROPPING state are a careless drop disk command submitted on a diskgroup running at full capacity, or a drop that reduces the available disk space to less than that required for all the existing extents. After a drop disk command, a rebalance is triggered and completes; however, disks remain in the DROPPING state in this case.

It is no longer possible to allocate space from the diskgroup, as no free space is reported in v$asm_diskgroup either. To resolve the problem, you can either add more disks to provide extra space or undrop the disk to roll back the drop.

- Add more disks

Adding more disks implicitly starts a rebalance and provides extra space for the rebalance to complete. Once the data is copied off the dropping disks, they are expelled from the diskgroup.

alter diskgroup <diskgroup_name> add disk '<path>';

- Undrop the disk

When an undrop command is issued, it simply rolls back the drop. If the drop has not gone too far, ASM is able to re-integrate the disks into the diskgroup. UNDROP DISKS implicitly triggers a rebalance, which rolls back the drop and makes the space available to the diskgroup again. Space should be balanced between the disks once the command completes.

alter diskgroup <diskgroup_name> undrop disks;

As an example, consider a diskgroup whose disks are running near capacity, where a drop disk command brings the disk state to HUNG. The drop cannot complete due to lack of space, because the current extents do not fit into the remaining disks.

NAME                             TOTAL_MB    FREE_MB STATE
------------------------------ ---------- ---------- --------
DG2_0001                              447         97 NORMAL
DG2_0000                              447         90 NORMAL
DG2_0002                              447         97 NORMAL

NAME                             TOTAL_MB    FREE_MB TYPE
------------------------------ ---------- ---------- ------
DG2                                  1341        284 EXTERN

alter diskgroup DG2 drop disk DG2_0002;

While the rebalance is running, the disk state stays at DROPPING, but it changes to HUNG after the rebalance is completed.

NAME                             TOTAL_MB    FREE_MB STATE
------------------------------ ---------- ---------- --------
DG2_0001                              447          7 NORMAL
DG2_0000                              447          6 NORMAL
DG2_0002                              447        271 DROPPING

After the rebalance is complete, the disk state is HUNG, as the disk cannot be expelled from the diskgroup.

NAME                             TOTAL_MB    FREE_MB STATE
------------------------------ ---------- ---------- --------
DG2_0001                              447          0 NORMAL
DG2_0000                              447          0 NORMAL
DG2_0002                              447        284 HUNG

alter diskgroup DG2 undrop disks;

Undrop disks implicitly triggers a new rebalance and resolves the problem. This state can also be resolved with ADD DISK, which provides extra space for the rebalance to complete. Once the data is copied off the dropping disks, they are expelled from the diskgroup.

After an add disk command the rebalance is still in progress

When a disk is added to a disk group, its space is not immediately available for allocation. Since every file must be evenly allocated, extents must be rebalanced from the other disks onto the new disk to make space evenly available. Free space becomes available gradually as the rebalance progresses. Since a rebalance takes a while, users may be unable to allocate files in the meantime and may get out-of-space errors (ORA-15041).

As a workaround, the WAIT option can be used with the add disk command. If the WAIT option is given with add disk, the command does not return until the rebalance is complete. This may give more intuitive results to users who run disks near full capacity.

The following test shows how the free space on each disk changes while the rebalance is running.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
TEST_DG                                   2        894         48

PATH                 GROUP_NUMBER   TOTAL_MB    FREE_MB
-------------------- ------------ ---------- ----------
/dev/hdb8                       2        447         24
/dev/hdb9                       2        447         24

Two more 447 MB disks are added to the diskgroup.

alter diskgroup test_dg add disk '/dev/hdb10','/dev/hdb11';

Free space is reported immediately in v$asm_diskgroup; however, it is imbalanced because the rebalance has not completed yet.

NAME            TOTAL_MB FREE_MB(1) FREE_MB(2) ... FREE_MB(3)
------------- ---------- ---------- ----------     ----------
TEST_DG             1788        888        887            885

PATH             TOTAL_MB FREE_MB(1) FREE_MB(2) ... FREE_MB(3)
-------------- ---------- ---------- ----------     ----------
/dev/hdb8             447         36        144            221
/dev/hdb9             447         36        144            220
/dev/hdb10            447        409        300            223
/dev/hdb11            447        407        299            221

Free space becomes balanced as the rebalance progresses. Until the rebalance completes, large file allocations may fail with ORA-15041 errors although free space is reported in v$asm_diskgroup.
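The figures the rebalance converges towards can be estimated in advance by spreading the diskgroup's total free space over the disks in proportion to their TOTAL_MB. The sketch below is a simplified model under that assumption, with a hypothetical function name; it uses the FREE_MB(1) snapshot from the test above.

```python
def balanced_free_mb(disks):
    """Estimate per-disk free space once a rebalance has completed.

    disks maps a disk path to (total_mb, free_mb). The total free space
    is assumed to end up spread in proportion to each disk's TOTAL_MB.
    """
    group_total = sum(total for total, _ in disks.values())
    group_free = sum(free for _, free in disks.values())
    return {
        path: group_free * total / group_total
        for path, (total, _) in disks.items()
    }


if __name__ == "__main__":
    # FREE_MB(1) snapshot taken right after the two disks were added.
    snapshot = {
        "/dev/hdb8": (447, 36),
        "/dev/hdb9": (447, 36),
        "/dev/hdb10": (447, 409),
        "/dev/hdb11": (447, 407),
    }
    for path, free in balanced_free_mb(snapshot).items():
        print(path, free)  # 222.0 for every disk
```

The estimate of 222 MB per disk is close to the 220-223 MB observed in the FREE_MB(3) column; the small difference is consistent with the total free space itself dropping slightly, from 888 to 885 MB, during the rebalance.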

As a workaround, the WAIT option can be used with the add disk command. When the WAIT option is used, the add disk command does not return until the rebalance is complete. This may provide more intuitive results when running disks near full capacity.

Source: ITPUB blog, http://blog.itpub.net/19423/viewspace-1130911/. If you repost this article, please credit the source; otherwise legal action may be pursued.
