DBA Sensation

January 31, 2013

SQL 2008 active-active cluster on Windows 2008 R2: shared storage migration

Filed under: 3. MS SQL Server — Tags: , , , , , — zhefeng @ 11:16 am

This is my POC doc, written before we migrate our production SQL 2008 cluster to new storage. It has two parts: the first part simply installs the cluster on two ESX VM images using VMFS shared disks, and the second part covers the storage migration.

====Part 1: build sql 2008 cluster

IP Address mapping:
10.165.36.78 vmsql2008cls01
10.165.36.79 vmsql2008cls02
10.165.36.80 vmsql2008cls
10.165.36.81 vmsql2k8clsvip1
10.165.36.82 vmsql2k8clsvip2
10.165.36.83 vmsql2008clsdtc

Initialize the LUNs and label them:
G: OLTP (OLTP data volume)
H: OLAP (OLAP data volume)
I: Quorum (Quorum Disk)
R: MSDTC (MSDTC volume)

1. enable “application server” roles in OS “Server Manager”
check “.NET Framework 3.5.1”, “Distributed Transactions – Incoming Remote Transactions”, “Distributed Transactions – Outgoing Remote Transactions”

2. Add “failover clustering” feature

3. setup cluster in cluster manager, cluster IP/name: vmsql2008cls 10.165.36.80

4. setup clustered MSDTC service
Open Failover Cluster Manager. In “Services and applications”, add the “DTC” service with name “vmsql2008clsdtc” and IP “10.165.36.83”

5. Install first instance. Run under command prompt: Setup.exe /ACTION=PrepareFailoverCluster (note: you have to use this setup option and run it on BOTH nodes)

1). Pick the instance features and shared features (everything except Reporting Services)

2). Default instance id: MSSQLSERVER

3). Cluster Security Policy – “Use service SIDs”

4). Server Configuration – use the same account for all SQL Server services, “pgdev\sevice.pg.prod”, and keep all the default startup types (especially “manual” for the database engine)

6. Run “setup” to finish cluster on ONLY active node (owns the storage)
Run under command prompt: setup.exe /ACTION=CompleteFailoverCluster
1). SQL Server Network Name for first instance: vmsql2k8clsvip1
2). pickup “OLTP” disk
3). ip address for virtual: 10.165.36.81
4). Installation is done. Verify it in cluster manager

7. Install 2nd instance.
1). run Setup.exe /ACTION=PrepareFailoverCluster on BOTH nodes
2). run setup.exe /ACTION=CompleteFailoverCluster on ONLY active node
3). virtual IP and network name:
10.165.36.82 vmsql2k8clsvip2
4). pick up “OLAP” disk
5). done with installation

====Part 2: migrating from old LUNs to new LUNs

8. Add new disks on Node1 (V, W, X, Y) and plan their mapping drives:
V -> G
W -> H
X -> I
Y -> R

9. in cluster manager, “add disk” to add all new disks to storage catalog

10. move the new disks to corresponding service or applications except Quorum disk (right click on disk, “more actions” -> “move this resource to another service or application”)

11. Stop your clustered application(s) (the virtual SQL instances in this case) and copy the data from the old drives to the new drives (including the MSDTC drive)
1). In Failover Cluster Management, take all SQL instances and the MSDTC service offline. This stops the SQL services and releases any open handles on the SQL data files, so we can copy the data to the new drives in the order below:
G -> V
H -> W
R -> Y

2). When the data copy is complete, change the drive letter of each existing data drive to a temporary letter, then set the drive letter of each new drive to the original letter of the drive it replaces. (If the original data drive was G:, the new drive must become G:, or SQL Server will not be able to find its files and will not start.)
So set the old ones to temporary drive letters first:
G -> O
H -> P
R -> Q

12. Then Map new drives to old drive letters:
V -> G
W -> H
Y -> R

13. Change the dependencies for each application and service. For example, for the first instance’s “SQL Server” resource, right click -> “Properties” -> “Dependencies” -> change the disk to “cluster disk 5” (the new disk)

14. delete MSDTC service and recreate
1). delete the resource “MSDTC” in the DTC service catalog
2). add the resource “MSDTC” back to the catalog
right click on the DTC service icon on the left -> “add a resource” -> “more resources” -> “2 - Add Distributed Transaction Coordinator”
3). add dependencies for the new DTC resource
right click on “Microsoft Distributed Transaction Coordinator” -> “Properties” -> “Dependencies” -> add the virtual server name and the new cluster disk as dependencies for this service.
4). bring it online.

15. Move quorum
1). right click on the cluster name under “failover cluster manager” -> “more actions” -> “configure cluster quorum settings”
2). choose the default quorum configuration, “Node and Disk Majority”
3). check the new cluster disk for “storage witness”
4). confirm to finish the reconfiguration
5). no need to change the drive letter since it’s the quorum disk, so the new quorum stays on “X”

16. Bring all old cluster disks offline
that is: O, P, Q, I

17. bring all clustered application(s) online

18. Verify services and applications are online and functional, and failover works

19. Delete the old disks in cluster manager, and delete their partitions as well.
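The two-phase drive-letter swap in steps 11 and 12 is easy to get wrong by hand, so a small pre-flight check can help. The Python sketch below is only an illustration: `plan_letter_swap` is a hypothetical helper (not part of any Windows or cluster tooling), and it only orders and validates the renames from this post without touching any disk. The quorum disk is left out because its letter does not change.

```python
# Sketch: validate the two-phase drive-letter swap before touching the cluster.
def plan_letter_swap(new_to_old, old_to_temp):
    """Return ordered (from_letter, to_letter) renames, or raise ValueError."""
    in_use = set(new_to_old) | set(new_to_old.values())
    steps = []
    # Phase 1: park each old drive on a temporary letter.
    for old, temp in old_to_temp.items():
        if temp in in_use:
            raise ValueError(f"temporary letter {temp} is already in use")
        in_use.add(temp)
        steps.append((old, temp))
    # Phase 2: give each new drive the original letter of the drive it replaces.
    for new, old in new_to_old.items():
        if old not in old_to_temp:
            raise ValueError(f"{old} must be parked before {new} can take its letter")
        steps.append((new, old))
    return steps


steps = plan_letter_swap(
    new_to_old={"V": "G", "W": "H", "Y": "R"},
    old_to_temp={"G": "O", "H": "P", "R": "Q"},
)
for src, dst in steps:
    print(f"{src}: -> {dst}:")
```

The helper refuses a plan where a temporary letter collides with a letter that is still needed, which is the mistake most likely to leave SQL Server unable to find its files.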

Reference:
1. How to Move a Windows 2008 Cluster to New SAN Storage
http://www.systemcentercentral.com/BlogDetails/tabid/143/IndexID/54853/Default.aspx

August 28, 2012

Attach a database with a single MDF file

Filed under: 3. MS SQL Server — Tags: , , — zhefeng @ 12:15 pm

I needed to attach a database without copying over its log files. Here is how:
EXEC sp_attach_single_file_db @dbname='TestDb',
@physname=N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\TestDb.mdf'

After you run it, SQL Server builds a new log file for you.

August 15, 2012

Use RMAN to restore/recover the database to remote host

Filed under: [backup and recovery] — Tags: , , , — zhefeng @ 10:10 am

0. background
The purpose of this test is to try restoring/recovering a database on a remote host from an RMAN backup.
I am using one VMware Linux box with a single Oracle instance, ORCL, on ASM storage (to make things more complicated :)).
On the source db, I have a user “jehan” with a table “test” containing 3 rows, as below:
SQL> select * from jehan.test;

COL1
———-
good
best
worst

1. backup source db
[oracle@myrh5 trace]$ rman target /
Recovery Manager: Release 11.2.0.1.0 – Production on Tue Aug 14 17:56:25 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL (DBID=1270514474)

RMAN> backup database;

Starting backup at 14-AUG-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=155 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=+DATA/orcl/ds01.dbf
input datafile file number=00002 name=+DATA/orcl/sysaux01.dbf
input datafile file number=00001 name=+DATA/orcl/system01.dbf
input datafile file number=00006 name=+DATA/orcl/cms01.dbf
input datafile file number=00003 name=+DATA/orcl/undotbs01.dbf
input datafile file number=00005 name=+DATA/orcl/example01.dbf
input datafile file number=00004 name=+DATA/orcl/users01.dbf
channel ORA_DISK_1: starting piece 1 at 14-AUG-12
channel ORA_DISK_1: finished piece 1 at 14-AUG-12
piece handle=+DATA/orcl/backupset/2012_08_14/nnndf0_tag20120814t175642_0.387.791315803 tag=TAG20120814T175642 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:45
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 14-AUG-12
channel ORA_DISK_1: finished piece 1 at 14-AUG-12
piece handle=+DATA/orcl/backupset/2012_08_14/ncsnf0_tag20120814t175642_0.386.791315911 tag=TAG20120814T175642 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 14-AUG-12

2. delete source db “ORCL” in dbca

3. Start the target instance in NOMOUNT mode. No init.ora exists yet, so RMAN falls back to a default instance without a parameter file (note: you don’t need to have the target db created beforehand)
[oracle@myrh5 trace]$ export ORACLE_SID=ORCL
[oracle@myrh5 trace]$ rman target /

Recovery Manager: Release 11.2.0.1.0 – Production on Wed Aug 15 09:51:08 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database (not started)

RMAN> set dbid 1270514474;
RMAN> startup nomount

startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file ‘/home/u01/app/oracle/product/11.2.0/dbhome_1/dbs/initORCL.ora’

starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started

Total System Global Area 158662656 bytes

Fixed Size 2211448 bytes
Variable Size 92275080 bytes
Database Buffers 58720256 bytes
Redo Buffers 5455872 bytes

4. restore the spfile from backup to pfile
RMAN> restore spfile to pfile '/home/u01/app/oracle/product/11.2.0/dbhome_1/dbs/initORCL.ora' from '+DATA/orcl/backupset/2012_08_14/ncsnf0_tag20120814t175642_0.386.791315911';
Starting restore at 15-AUG-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=98 device type=DISK

channel ORA_DISK_1: restoring spfile from AUTOBACKUP +DATA/orcl/backupset/2012_08_14/ncsnf0_tag20120814t175642_0.386.791315911
channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete
Finished restore at 15-AUG-12

5. Create the audit directory named in the pfile (*.audit_file_dest='/home/u01/app/oracle/admin/orcl/adump'). You have to do this, otherwise RMAN can’t start the instance in NOMOUNT mode
mkdir -p /home/u01/app/oracle/admin/orcl/adump

Note: if you want to put the control files in a different path, modify the paths in the pfile now.

6. Now start the instance in NOMOUNT mode with the pfile (which provides the correct control file location)
RMAN> startup nomount pfile='?/dbs/initORCL.ora';

7. now restore the controlfile
RMAN> restore controlfile from '+DATA/orcl/backupset/2012_08_14/ncsnf0_tag20120814t175642_0.386.791315911';
Starting restore at 15-AUG-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=11 device type=DISK

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
output file name=+DATA/orcl/control01.ctl
output file name=+DATA/orcl/control02.ctl
Finished restore at 15-AUG-12

8. Now that we have the control files, we can mount the database
RMAN> set dbid 1270514474;

executing command: SET DBID

RMAN> alter database mount;

database mounted
released channel: ORA_DISK_1

9. Let the restore begin!
RMAN> restore database;

Starting restore at 15-AUG-12
Starting implicit crosscheck backup at 15-AUG-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=11 device type=DISK
Crosschecked 4 objects
Finished implicit crosscheck backup at 15-AUG-12

Starting implicit crosscheck copy at 15-AUG-12
using channel ORA_DISK_1
Finished implicit crosscheck copy at 15-AUG-12

searching for all files in the recovery area
cataloging files…
cataloging done

List of Cataloged Files
=======================
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_315.385.791317055
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_316.384.791317069
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_317.383.791317085
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_318.382.791317099
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_319.381.791317113
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_320.380.791317127
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_321.379.791317139
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_322.378.791317153
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_323.377.791317171
File Name: +data/ORCL/archivelog/2012_08_14/thread_1_seq_324.376.791334027
File Name: +data/ORCL/BACKUPSET/2012_08_14/ncsnf0_TAG20120814T175642_0.386.791315911

using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to +DATA/orcl/system01.dbf
channel ORA_DISK_1: restoring datafile 00002 to +DATA/orcl/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00003 to +DATA/orcl/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00004 to +DATA/orcl/users01.dbf
channel ORA_DISK_1: restoring datafile 00005 to +DATA/orcl/example01.dbf
channel ORA_DISK_1: restoring datafile 00006 to +DATA/orcl/cms01.dbf
channel ORA_DISK_1: restoring datafile 00007 to +DATA/orcl/ds01.dbf
channel ORA_DISK_1: reading from backup piece +DATA/orcl/backupset/2012_08_14/nnndf0_tag20120814t175642_0.387.791315803
channel ORA_DISK_1: piece handle=+DATA/orcl/backupset/2012_08_14/nnndf0_tag20120814t175642_0.387.791315803 tag=TAG20120814T175642
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:03:25
Finished restore at 15-AUG-12

Note: if you want to restore the datafiles to a different location, map each datafile to its new path as shown below (temp tablespace files do not need this). SET NEWNAME is only valid inside a RUN block, together with the restore. After the restore is done, also run “SWITCH DATAFILE ALL;” so that the control file points at the new datafile locations:
set newname for datafile 1 to '/u01/oradata/system01.dbf';
set newname for datafile 2 to '/u01/oradata/sysaux01.dbf';
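With more than a couple of datafiles, typing the SET NEWNAME lines by hand gets tedious. Here is a minimal Python sketch that assembles the RUN block from a file-number-to-path mapping. `build_restore_block` is a hypothetical helper of mine, the two paths are the ones from the note above, and the generated text should be reviewed before it is pasted into RMAN.

```python
# Sketch: generate an RMAN RUN block that relocates datafiles during restore.
def build_restore_block(newnames):
    """newnames maps datafile number -> new path on the target host."""
    lines = ["run {"]
    for file_no in sorted(newnames):
        lines.append(f"  set newname for datafile {file_no} to '{newnames[file_no]}';")
    # Restore first, then switch so the control file uses the new names.
    lines += ["  restore database;", "  switch datafile all;", "}"]
    return "\n".join(lines)


block = build_restore_block({
    1: "/u01/oradata/system01.dbf",
    2: "/u01/oradata/sysaux01.dbf",
})
print(block)
```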

10. recover the database
RMAN> recover database;

Starting recover at 15-AUG-12
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 315 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_315.385.791317055
archived log for thread 1 with sequence 316 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_316.384.791317069
archived log for thread 1 with sequence 317 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_317.383.791317085
archived log for thread 1 with sequence 318 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_318.382.791317099
archived log for thread 1 with sequence 319 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_319.381.791317113
archived log for thread 1 with sequence 320 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_320.380.791317127
archived log for thread 1 with sequence 321 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_321.379.791317139
archived log for thread 1 with sequence 322 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_322.378.791317153
archived log for thread 1 with sequence 323 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_323.377.791317171
archived log for thread 1 with sequence 324 is already on disk as file +DATA/orcl/archivelog/2012_08_14/thread_1_seq_324.376.791334027
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_315.385.791317055 thread=1 sequence=315
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_316.384.791317069 thread=1 sequence=316
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_317.383.791317085 thread=1 sequence=317
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_318.382.791317099 thread=1 sequence=318
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_319.381.791317113 thread=1 sequence=319
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_320.380.791317127 thread=1 sequence=320
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_321.379.791317139 thread=1 sequence=321
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_322.378.791317153 thread=1 sequence=322
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_323.377.791317171 thread=1 sequence=323
archived log file name=+DATA/orcl/archivelog/2012_08_14/thread_1_seq_324.376.791334027 thread=1 sequence=324
unable to find archived log
archived log thread=1 sequence=325
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/15/2012 10:40:17
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 325 and starting SCN of 7484710

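Note: the last error is expected. Recovery applied every available archived log (through sequence 324) and then asked for sequence 325, which does not exist. To avoid this error, after
alter database mount;
use
set until scn or set until time
to give RMAN an explicit end point for the restore and recovery.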
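As an illustration of the note above, the thread and sequence that RMAN complained about can be pulled out of the RMAN-06054 message and turned into an explicit end point. This Python sketch is an assumption-laden example: the helper names are mine, and it uses SET UNTIL SEQUENCE (which, as far as I know, stops recovery just before the named sequence) instead of the SCN or time variants mentioned in the note.

```python
import re

# The error line exactly as RMAN printed it in step 10.
ERROR_LINE = (
    "RMAN-06054: media recovery requesting unknown archived log "
    "for thread 1 with sequence 325 and starting SCN of 7484710"
)


def until_clause(error_line):
    """Build a SET UNTIL SEQUENCE clause from an RMAN-06054 message."""
    match = re.search(r"thread (\d+) with sequence (\d+)", error_line)
    if match is None:
        raise ValueError("not an RMAN-06054 message")
    thread, sequence = match.groups()
    return f"set until sequence {sequence} thread {thread};"


def build_recovery_block(error_line):
    """Wrap the clause, restore and recover in one RUN block."""
    return "\n".join([
        "run {",
        "  " + until_clause(error_line),
        "  restore database;",
        "  recover database;",
        "}",
    ])


print(build_recovery_block(ERROR_LINE))
```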

11. open database
RMAN> alter database open resetlogs;

database opened

Note: from 11gR2 on, after opening the database with resetlogs, the system automatically creates the online redo log files and temp datafiles.

12. Verify the data
[oracle@myrh5 trace]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 15 10:47:53 2012

Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 – 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> select * from jehan.test;

COL1
———-
good
best
worst

April 12, 2012

Set Oracle SGA > 256GB

Filed under: [Installation] — Tags: , , — zhefeng @ 2:06 pm

I had a request to install Oracle 11gR2 on a server with 2TB of memory. The installation failed in DBCA, complaining that it could not reach shared memory.

I checked Metalink and didn’t find any solution. My colleague told me he had hit the same issue before, and Oracle told him to set the SGA to less than 256 GB as a “workaround”.

I followed the “workaround” and continued my installation. Later I did some research and found this:

 

Solution

The swap and kernel parameters were all adjusted as recommended by Oracle. Investigating further, the issue seems to be caused by the prelink command, which calculates shared library load addresses and updates the shared libraries with them. The simplest fix is to undo what prelink did and then disable it:
prelink -ua
sed -i 's/PRELINKING=yes/PRELINKING=no/' /etc/sysconfig/prelink

 

From: https://support.oracle.com/CSP/ui/flash.html#tab=KBHome%28page=KBHome&id=%28%29%29,%28page=KBNavigator&id=%28bmDocTitle=Why%20not%20able%20to%20allocate%20a%20more%20SGA%20than%20193G%20on%20Linux%2064?&from=BOOKMARK&bmDocType=HOWTO&bmDocID=1241284.1&viewingMode=1143&bmDocDsrc=KB%29%29

Doc ID: 1241284.1

I haven’t tried it yet. If anyone with the same problem gives it a try, please let me know.

January 4, 2012

How to configure Resource Governor in SQL 2008 to separate classified workloads

Filed under: 3. MS SQL Server — Tags: , , , — zhefeng @ 5:30 pm

On our server, some big apps always eat up all the resources, which causes other apps to hang as well.
We try to separate the traffic between big apps and normal apps on a shared SQL instance by implementing Resource Governor.

Here is the plan for the pools. pBigApp takes a maximum of 60% of the resources, and other apps use the default pool.

Pool_name  MIN%  MAX%  Calculated_Effective_Max%  Calculated_Shared%  Comment
internal   0     100   100                        0                   not applicable to internal pool
default    0     100   100                        100
pBigapp    0     60    60                         60

Calculated_Effective_Max% = min(MAX%, 100 - sum(MIN%)); Calculated_Shared% = Calculated_Effective_Max% - MIN%
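To see how the two calculated columns react when MIN% values change, the formula can be tried out in a few lines of Python. This is only a sketch under one assumption of mine: that sum(MIN%) means the MIN% of the other user pools, which is how I read the Microsoft documentation. The internal pool is left out because the calculation does not apply to it.

```python
# Sketch: Calculated_Effective_Max% and Calculated_Shared% per pool.
def effective_limits(pools):
    """pools maps pool name -> (min_pct, max_pct); returns name -> (effective_max, shared)."""
    result = {}
    for name, (min_pct, max_pct) in pools.items():
        # Assumption: only the MIN% reserved by the *other* pools reduces this pool's ceiling.
        others_min = sum(m for other, (m, _) in pools.items() if other != name)
        effective_max = min(max_pct, 100 - others_min)
        result[name] = (effective_max, effective_max - min_pct)
    return result


plan = effective_limits({"default": (0, 100), "pBigApp": (0, 60)})
for pool, (eff_max, shared) in plan.items():
    print(pool, eff_max, shared)
```

With the plan from this post every MIN% is 0, so the effective maximum of each pool is simply its MAX%. Giving pBigApp a MIN% of 20, for example, would lower the effective maximum of the default pool to 80.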

Configuration:
1. Make sure Resource Governor is enabled
select is_enabled from sys.resource_governor_configuration
-- if it returns 0, then you need to enable it
ALTER RESOURCE GOVERNOR RECONFIGURE;

2. Issue a CREATE RESOURCE POOL statement to create a resource pool
USE master;
-- Create a resource pool "pBigApp" that sets MAX_CPU_PERCENT to 60%.
CREATE RESOURCE POOL pBigApp WITH (MAX_CPU_PERCENT = 60);
GO

3. Create a workload group to use new pool “pBigApp”
CREATE WORKLOAD GROUP gASTEC USING pBigApp;
GO

4. Create a classifier function that maps the login of the low-priority user to the workload group created in the preceding step
-- Note that any request that does not get classified goes into the 'default' group.
USE master;
GO
CREATE FUNCTION dbo.rgclassifier_MAX_CPU() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
DECLARE @workload_group_name AS sysname
IF (SUSER_NAME() = 'ASTEC')
SET @workload_group_name = 'gASTEC'
RETURN @workload_group_name
END;
GO

-- another function example, classifying by application name
CREATE FUNCTION dbo.rgclassifier_v1() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
DECLARE @grp_name sysname
IF (SUSER_NAME() = 'sa')
SET @grp_name = 'GroupAdmin'
IF (APP_NAME() LIKE '%MANAGEMENT STUDIO%')
OR (APP_NAME() LIKE '%QUERY ANALYZER%')
SET @grp_name = 'GroupAdhoc'
IF (APP_NAME() LIKE '%REPORT SERVER%')
SET @grp_name = 'GroupReports'
RETURN @grp_name
END;
GO
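One detail of the second example is worth spelling out: its IF statements are independent, so the last matching rule wins, whereas a variant that chains ELSE IF stops at the first match. The Python sketch below mimics both styles so the difference is easy to see. It is not how SQL Server evaluates anything internally, the function names are mine, and it assumes a case-insensitive collation for the login and application name comparisons.

```python
# Sketch: independent IFs (last match wins) versus an ELSE IF chain (first match wins).
def classify_sequential(login, app):
    group = None  # NULL in T-SQL, which lands the session in the default group
    app = app.upper()
    if login.lower() == "sa":
        group = "GroupAdmin"
    if "MANAGEMENT STUDIO" in app or "QUERY ANALYZER" in app:
        group = "GroupAdhoc"
    if "REPORT SERVER" in app:
        group = "GroupReports"
    return group


def classify_first_match(login, app):
    app = app.upper()
    if login.lower() == "sa":
        return "groupAdmin"
    elif "MANAGEMENT STUDIO" in app or "QUERY ANALYZER" in app:
        return "groupAdhoc"
    elif "REPORT SERVER" in app:
        return "groupReports"
    return "default"


ssms = "Microsoft SQL Server Management Studio"
print(classify_sequential("sa", ssms))   # sa is overridden by the app rule
print(classify_first_match("sa", ssms))  # sa keeps the admin group
```

So with the independent-IF version, an sa session opened from Management Studio ends up in the ad hoc group, which may or may not be what you want.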

5. Register the classifier function with Resource Governor, then apply the pending configuration.
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION= dbo.rgclassifier_MAX_CPU);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

6. Verify the classification of specific sessions
log in as the user that you specified in your classifier function, and verify the session classification by issuing the following SELECT statement
USE master;
SELECT sess.session_id, sess.login_name, sess.group_id, grps.name
FROM sys.dm_exec_sessions AS sess
JOIN sys.dm_resource_governor_workload_groups AS grps
ON sess.group_id = grps.group_id
WHERE session_id > 50;
GO

Useful scripts:
1. Show which workload group and resource pool Resource Governor assigned to each session
SELECT session_id as 'Session ID',
[host_name] as 'Host Name',
[program_name] as 'Program Name',
nt_user_name as 'User Name',
SDRGWG.[Name] as 'Group Assigned',
DRGRP.[name] as 'Pool Assigned'
FROM sys.dm_exec_sessions SDES
INNER JOIN sys.dm_resource_governor_workload_groups SDRGWG
ON SDES.group_id = SDRGWG.group_id
INNER JOIN sys.dm_resource_governor_resource_pools DRGRP
ON SDRGWG.pool_id = DRGRP.pool_id

2. Assigns all new sessions to the default workload group by removing any existing classifier function from the Resource Governor configuration.
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = NULL);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;

3. Example of storing the classifier function in the master database.
USE master;
GO
SET ANSI_NULLS ON;
GO
SET QUOTED_IDENTIFIER ON;
GO
CREATE FUNCTION dbo.rgclassifier_v1() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
-- Declare the variable to hold the value returned in sysname.
DECLARE @grp_name AS sysname
-- If the user login is 'sa', map the connection to the groupAdmin
-- workload group.
IF (SUSER_NAME() = 'sa')
SET @grp_name = 'groupAdmin'
-- Use application information to map the connection to the groupAdhoc
-- workload group.
ELSE IF (APP_NAME() LIKE '%MANAGEMENT STUDIO%')
OR (APP_NAME() LIKE '%QUERY ANALYZER%')
SET @grp_name = 'groupAdhoc'
-- If the application is for reporting, map the connection to
-- the groupReports workload group.
ELSE IF (APP_NAME() LIKE '%REPORT SERVER%')
SET @grp_name = 'groupReports'
-- If the connection does not map to any of the previous groups,
-- put the connection into the default workload group.
ELSE
SET @grp_name = 'default'
RETURN @grp_name
END
GO
-- Register the classifier user-defined function and update
-- the in-memory configuration.
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION=dbo.rgclassifier_v1);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

4. system views and dm views
sys.resource_governor_configuration: Returns the stored Resource Governor state.
sys.resource_governor_resource_pools: Returns the stored resource pool configuration. Each row of the view determines the configuration of a pool.
sys.resource_governor_workload_groups: Returns the stored workload group configuration.

sys.dm_resource_governor_workload_groups: Returns workload group statistics and the current in-memory configuration of the workload group.
sys.dm_resource_governor_resource_pools: Returns information about the current resource pool state, the current configuration of resource pools, and resource pool statistics.
sys.dm_resource_governor_configuration: Returns a row that contains the current in-memory configuration state for Resource Governor.

Reference:
1. Managing SQL Server Workloads with Resource Governor
http://msdn.microsoft.com/en-us/library/bb933866.aspx

2. Part 1: Anatomy of SQL Server 2008 Resource Governor CPU Demo
http://blogs.technet.com/b/sqlos/archive/2007/12/14/part-1-anatomy-of-sql-server-2008-resource-governor-cpu-demo.aspx

3. Part 2: Resource Governor CPU Demo on multiple CPUs
http://blogs.technet.com/b/sqlos/archive/2008/01/18/part-2-resource-governor-cpu-demo-on-multiple-cpus.aspx

4. How to: Use Resource Governor to Limit CPU Usage by Backup Compression (Transact-SQL)
http://msdn.microsoft.com/en-us/library/cc280384.aspx

5. Resource Governor DDL and System Views
http://msdn.microsoft.com/en-us/library/bb895339.aspx

March 2, 2011

Recreating spfile on ASM storage from pfile

Filed under: [backup and recovery] — Tags: , , , — zhefeng @ 2:46 pm

Sometimes when you screw up the parameters, you need to use a pfile as a stepping stone to undo the changes in the spfile. How do you do that if your spfile sits on ASM storage? Here is a workaround.

1. try to screw up the db parameters
SQL> show parameter memory

NAME TYPE VALUE
———————————— ———– ——————————
hi_shared_memory_address integer 0
memory_max_target big integer 1520M
memory_target big integer 1520M
shared_memory_address integer 0
SQL> alter system set memory_max_target=0 scope=spfile;
System altered.

2. now bounce the instance, db will complain about the new settings
SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-00837: Specified value of MEMORY_TARGET greater than MEMORY_MAX_TARGET

3. in my case the spfile sits on ASM
ASMCMD> ls -l spfile*
Type Redund Striped Time Sys Name
N spfileorcl.ora => +DATA/ORCL/PARAMETERFILE/spfile.267.744731331

4. What we need to do is create a pfile from the spfile, change the parameter back to a valid value in the pfile, then start the db from the pfile
1). Even with the db down, we can create a pfile from the spfile:
SQL> create pfile from spfile='+DATA/orcl/spfileorcl.ora';
2). modify the value in the pfile 'initorcl.ora'
$ vi initorcl.ora
*.memory_max_target=1583349760
3). start up the db; it will now use the pfile
SQL> startup mount

5. Create the new spfile on ASM storage from the “good” pfile
SQL> create spfile='+DATA/ORCL/spfileorcl.ora' from pfile;
File created.

6. Notice that the file name in ASM storage has changed, which means we now have a new spfile:
ASMCMD> ls -l spfile*
Type Redund Striped Time Sys Name
N spfileorcl.ora => +DATA/ORCL/PARAMETERFILE/spfile.267.744733351

7. Now change the pfile back into the “bootstrap” that points to the correct spfile
$ cat initorcl.ora
spfile='+DATA/ORCL/spfileorcl.ora'

8. Restart the database; it will pick up the correct spfile again
$ sqlplus / as sysdba
SQL> startup
ORACLE instance started.

Total System Global Area 1586708480 bytes
Fixed Size 2213736 bytes
Variable Size 973080728 bytes
Database Buffers 603979776 bytes
Redo Buffers 7434240 bytes
Database mounted.
Database opened.

SQL> show parameter spfile

NAME TYPE VALUE
———————————— ———– ——————————
spfile string +DATA/orcl/spfileorcl.ora

SQL> show parameter memory

NAME TYPE VALUE
———————————— ———– ——————————
hi_shared_memory_address integer 0
memory_max_target big integer 1520M
memory_target big integer 1520M
shared_memory_address integer 0

September 29, 2010

root.sh failed on 2nd node when installing Grid Infrastructure

Filed under: [RAC] — Tags: , , , — zhefeng @ 12:39 pm

When I ran root.sh as the last step of the Grid Infrastructure installation, it failed on the second node (it had succeeded on the 1st node).
root.sh failed on the second node with the following errors
——————————————————-
DiskGroup DATA1 creation failed with the following message:
ORA-15018: diskgroup cannot be created
ORA-15072: command requires at least 1 regular failure groups, discovered only 0

Oracle gives the reason: when you use multipath storage for ASM, you have to pre-configure the oracleasm file as below:

On all nodes,

1. Modify the /etc/sysconfig/oracleasm with:

ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

2. Restart ASMLib (on every node except the 1st):
# /etc/init.d/oracleasm restart

3. deconfigure the root.sh settings on nodes except 1st node:
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force

4. Run root.sh again on the 2nd node (or other nodes)

Oracle Metalink Doc:
11GR2 GRID INFRASTRUCTURE INSTALLATION FAILS WHEN RUNNING ROOT.SH ON NODE 2 OF RAC [ID 1059847.1]

September 27, 2010

How to deinstall a failed 11gR2 Grid Infrastructure

Filed under: [RAC] — Tags: , — zhefeng @ 10:39 am

Two parts are involved: first deconfigure, then deinstall

Deconfigure and Reconfigure of Grid Infrastructure Cluster:

Identify the cause of the root.sh failure by reviewing the logs in $GRID_HOME/cfgtoollogs/crsconfig and $GRID_HOME/log. Once the cause is identified, deconfigure and reconfigure with the steps below. Keep in mind that you need to wait until each step finishes successfully before moving to the next one:

For Step1 and 2, you can skip node(s) on which you didn’t execute root.sh yet.

Step 1: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force” on all nodes, except the last one.

Step 2: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode” on the last node. This command also zeroes out the OCR and voting disks.

Step 3: As root, run $GRID_HOME/root.sh on first node

Step 4: As root, run $GRID_HOME/root.sh on all other node(s), except last one.
Step 5: As root, run $GRID_HOME/root.sh on last node.
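The ordering of the five steps above is the part that matters, so here is a small Python sketch that lays out who runs what, and in which order, for a given node list. It is only a planning aid: `reconfigure_plan` and the node names are hypothetical, the two commands are copied from the steps above, and it assumes root.sh had already been run on every node in the list.

```python
# Sketch: ordered (node, command) pairs for deconfigure + reconfigure.
DECONFIG = "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force"
ROOT_SH = "$GRID_HOME/root.sh"


def reconfigure_plan(nodes):
    """nodes is ordered from first node to last node; every command runs as root."""
    if not nodes:
        raise ValueError("need at least one node")
    *others, last = nodes
    plan = [(node, DECONFIG) for node in others]   # Step 1: all nodes except the last
    plan.append((last, DECONFIG + " -lastnode"))   # Step 2: last node
    plan += [(node, ROOT_SH) for node in nodes]    # Steps 3-5: first, others, last
    return plan


plan = reconfigure_plan(["node1", "node2", "node3"])
for node, command in plan:
    print(f"{node}: {command}")
```

Each line of the printed plan should finish successfully before the next one is started, as noted above.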

Deinstall of Grid Infrastructure Cluster:

Case 1: “root.sh” never ran on this cluster, then as grid user, execute $GRID_HOME/deinstall/deinstall

Case 2: “root.sh” already ran. Follow the steps below, and keep in mind that you need to wait until each step finishes successfully before moving to the next one:

Step 1: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force” on all nodes, except the last one.

Step 2: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode” on the last node. This command also zeroes out the OCR and voting disks.

Step 3: As grid user, run $GRID_HOME/deinstall/deinstall

September 7, 2010

Oracle 10g ASM/RAW storage migration

Filed under: [RAC] — Tags: , , , , , , — zhefeng @ 9:47 am

Objective:
we want to migrate the whole shared storage from old SAN to new SAN without re-installing the whole Oracle RAC

Scenario:
1.Current structure
[Nodes]
## eth1-Public
10.0.0.101 vmrac01 vmrac01.test.com
10.0.0.102 vmrac02 vmrac02.test.com
## eth0-Private
192.168.199.1 vmracprv01 vmracprv01.test.com
192.168.199.2 vmracprv02 vmracprv02.test.com
## VIP
10.0.0.103 vmracvip01 vmracvip01.test.com
10.0.0.104 vmracvip02 vmracvip02.test.com

[Storage]
Both Oracle homes are local:
ORACLE_HOME=/database/oracle/10grac/db
CRS_HOME=/database/oracle/10grac/crs

Shared LUN display (3 partitions, 2*256M for OCR&VOTING, 1*20G for ASM)
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

OCR and Voting are on RAW device: /dev/sdb1 /dev/sdb2

ASM disks
bash-3.1$ export ORACLE_SID=+ASM1
bash-3.1$ asmcmd
ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 19971 17925 0 17925 0 DG1/

2. New storage (sdc 10G)
1). new LUN added
[root@vmrac01 bin]# fdisk -l

Disk /dev/sda: 26.8 GB, 26843545600 bytes
255 heads, 63 sectors/track, 3263 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 535 4192965 82 Linux swap / Solaris
/dev/sda3 536 3263 21912660 83 Linux

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          32      257008+  83  Linux
/dev/sdb2              33          64      257040   83  Linux
/dev/sdb3              65        2610    20450745   83  Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

2). Partition the new LUN into 3 partitions
Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          32      257008+  83  Linux
/dev/sdc2              33          64      257040   83  Linux
/dev/sdc3              65        1305     9968332+  83  Linux

3). Clone the data from the old raw disks
**Shut down the database and CRS first to make sure nothing changes on the raw disks during the copy!
#dd if=/dev/raw/raw1 of=/dev/sdc1
514017+0 records in
514017+0 records out
263176704 bytes (263 MB) copied, 252.812 seconds, 1.0 MB/s

#dd if=/dev/raw/raw2 of=/dev/sdc2
514080+0 records in
514080+0 records out
263208960 bytes (263 MB) copied, 267.868 seconds, 983 kB/s
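
The dd runs above used dd's default 512-byte block size (514017 records x 512 bytes = 263176704 bytes), which is probably part of why the copy only reached about 1 MB/s. Below is a minimal sketch of the same clone with a larger block size plus a byte-for-byte check afterwards. It assumes GNU dd, cmp, stat and blockdev; the function name clone_and_verify and the 1 MiB block size are my own choices, not something from the original procedure. It reads from the block device (e.g. /dev/sdb1) rather than the /dev/raw binding, which should be equivalent once CRS is down.

```shell
#!/bin/bash
# Hypothetical helper: copy SRC to DST with a 1 MiB block size, then
# compare the two byte for byte over the length of SRC.
clone_and_verify() {
    local src="$1" dst="$2" bytes
    if [ -b "$src" ]; then
        # Block device: ask the kernel for its size in bytes.
        bytes=$(blockdev --getsize64 "$src") || return 1
    else
        # Regular file (handy for a dry run): use the file size.
        bytes=$(stat -c %s "$src") || return 1
    fi
    # conv=notrunc leaves any extra space at the end of DST untouched.
    dd if="$src" of="$dst" bs=1M conv=notrunc 2>/dev/null || return 1
    # Compare only the first $bytes bytes; DST may be larger than SRC.
    cmp -n "$bytes" "$src" "$dst"
}

# Example (as root, with the database and CRS shut down):
# clone_and_verify /dev/sdb1 /dev/sdc1 && echo "sdc1 matches sdb1"
```

The function returns non-zero if the copy or the comparison fails, so it can gate the re-binding step that follows.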

4). "Cheat" Oracle by re-binding the raw devices to the new partitions on both nodes
**old binding
Step1: add entries to /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: For the mapping to take effect immediately, run the commands below
#raw /dev/raw/raw1 /dev/sdb1
#raw /dev/raw/raw2 /dev/sdb2

Step3: Run the following commands and add them to the /etc/rc.local file.
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdb1
#chown oracle:dba /dev/sdb2
#chmod 660 /dev/sdb1
#chmod 660 /dev/sdb2

**new binding on both nodes
Step1: edit /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdc2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: apply the mapping immediately
#raw /dev/raw/raw1 /dev/sdc1
#raw /dev/raw/raw2 /dev/sdc2

Step3: set the permissions and update /etc/rc.local
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdc1
#chown oracle:dba /dev/sdc2
#chmod 660 /dev/sdc1
#chmod 660 /dev/sdc2
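
Since the rules file has to be edited identically on both nodes, and udev needs plain ASCII double quotes, it can help to generate the lines instead of typing them. A small sketch follows; raw_rule is a hypothetical helper name of mine, and the device names are the ones used above.

```shell
#!/bin/bash
# Hypothetical helper: print one 60-raw.rules line that binds
# /dev/raw/raw<N> to the given kernel device name.
raw_rule() {
    local kernel="$1" n="$2"
    # %%N prints a literal %N, which udev expands to the device node.
    printf 'ACTION=="add", KERNEL=="%s", RUN+="/bin/raw /dev/raw/raw%s %%N"\n' "$kernel" "$n"
}

# The two rules for the new LUN, printed to stdout for review.
raw_rule sdc1 1
raw_rule sdc2 2

# After re-binding, "raw -qa" lists the active raw device bindings,
# which is a quick way to confirm raw1/raw2 now point at sdc1/sdc2.
```

Redirecting the output into /etc/udev/rules.d/60-raw.rules is left out on purpose so nothing is overwritten by accident.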

5). Start up CRS and the Oracle database, then check the database. Everything works fine after switching the raw disks!

3. ASM disk group migration
1). Mark the new disk sdc3 on one node
# /etc/init.d/oracleasm createdisk VOL2 /dev/sdc3
Marking disk "/dev/sdc3" as an ASM disk: [ OK ]

2). scan disk on the other node
[root@vmrac02 bin]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]

3). Now verify that the new disk is marked on both nodes
[root@vmrac01 disks]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

[root@vmrac02 bin]# /etc/init.d/oracleasm listdisks
VOL1
VOL2
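
The two listings above can also be compared by a script rather than by eye. This is only a sketch: same_disks is a hypothetical helper, and the commented usage assumes root can ssh from vmrac01 to vmrac02, which the original setup does not state.

```shell
#!/bin/bash
# Hypothetical helper: succeed when two "oracleasm listdisks" outputs
# contain the same disk labels, regardless of order.
same_disks() {
    [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

# Example (run as root on vmrac01; ssh access is an assumption):
# here=$(/etc/init.d/oracleasm listdisks)
# there=$(ssh vmrac02 /etc/init.d/oracleasm listdisks)
# same_disks "$here" "$there" || echo "ASM disk lists differ between nodes"
```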

4). add new disk to DISKGROUP (under asm instance)
$export ORACLE_SID=+ASM1
$sqlplus / as sysdba
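sql>alter diskgroup DG1 add disk 'ORCL:VOL2';
-- the disk path must be a quoted string; 'ORCL:VOL2' assumes the default ASMLib discovery path
-- wait for the rebalance to finish
sql>select * from v$asm_operation;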

5). remove old disk from DISKGROUP
sql>alter diskgroup DG1 drop disk VOL1;
-- wait until the rebalance has finished
sql>select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES
------------ --------- ----- ----- ------ ----- -------- -------- -----------
           1 REBAL     RUN       1      1     2     1374       30          45
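
Rather than re-running the query by hand, the wait can be scripted. The sketch below polls v$asm_operation until it returns no rows. It assumes ORACLE_SID is set to the local ASM instance, sqlplus is on the PATH, and it is run on the node where the ALTER DISKGROUP was issued; the function names and the default 60-second interval are my own.

```shell
#!/bin/bash
# Print the number of rows in v$asm_operation (0 = nothing running).
# Assumes ORACLE_SID points at the local ASM instance.
count_asm_ops() {
    sqlplus -s "/ as sysdba" <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select count(*) from v$asm_operation;
exit
EOF
}

# Poll until no ASM operation is listed. $1 is the sleep interval in
# seconds (hypothetical default: 60).
wait_for_rebalance() {
    local interval="${1:-60}" n
    while :; do
        n=$(count_asm_ops) || return 1
        # Anything that is not a plain number (e.g. an ORA- error) aborts.
        case "$n" in
            ''|*[!0-9]*) echo "unexpected sqlplus output: $n" >&2; return 1 ;;
        esac
        [ "$n" = "0" ] && return 0
        echo "rebalance still running ($n operation(s)), sleeping ${interval}s"
        sleep "$interval"
    done
}

# Example: wait_for_rebalance 30 && echo "safe to continue"
```

Only move on to deleting the old ASMLib disk once this returns successfully. Oracle also accepts ADD DISK and DROP DISK in a single ALTER DISKGROUP statement, which needs one rebalance instead of two.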

6). Verify the database and ASM. Everything is OK!

7). Clean up the old disk configurations
[root@vmrac01 bin]# /etc/init.d/oracleasm deletedisk VOL1
Removing ASM disk "VOL1": [ OK ]
[root@vmrac01 bin]# /etc/init.d/oracleasm listdisks
VOL2

[root@vmrac02 ~]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]
[root@vmrac02 ~]# /etc/init.d/oracleasm listdisks
VOL2

8). Wipe the partitions on sdb.

Reference:
1. Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. [ID 837308.1]
2. Previous doc “VMRAC installation” task 130.2008.09.12
3. OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices. [ID 428681.1]
4. ASM using ASMLib and Raw Devices
http://www.oracle-base.com/articles/10g/ASMUsingASMLibAndRawDevices.php

June 9, 2010

Sth. about checkpoint

Filed under: 1. Oracle, [System Performance tuning] — Tags: , , — zhefeng @ 2:31 pm

I was reading an article about checkpoints on Metalink (Checkpoint Tuning and Troubleshooting Guide [ID 147468.1]).

Here are some good points about checkpoints:

Oracle writes dirty buffers to disk only under certain conditions:
- A shadow process must scan more than one-quarter of the db_block_buffers parameter.
- Every three seconds.
- When a checkpoint is produced.

A checkpoint is triggered by five types of events:
- At each switch of the redo log files.
- When the delay set by LOG_CHECKPOINT_TIMEOUT is reached.
- When the number of bytes corresponding to (LOG_CHECKPOINT_INTERVAL * size of OS I/O blocks) has been written to the current redo log file.
- Directly by the ALTER SYSTEM SWITCH LOGFILE command.
- Directly by the ALTER SYSTEM CHECKPOINT command.
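
To make the LOG_CHECKPOINT_INTERVAL rule concrete: the parameter counts OS blocks, not bytes, so the amount of redo between interval-driven checkpoints is the product of the two. A tiny sketch of the arithmetic follows; the values 10000 and 512 are illustrative only and do not come from the note.

```shell
#!/bin/bash
# Bytes of redo written between interval-driven checkpoints:
# LOG_CHECKPOINT_INTERVAL (in OS blocks) times the OS block size (bytes).
checkpoint_interval_bytes() {
    local interval="$1" os_block_size="$2"
    echo $(( interval * os_block_size ))
}

# Illustrative values: 10000 OS blocks of 512 bytes each.
checkpoint_interval_bytes 10000 512   # prints 5120000
```

With these numbers a checkpoint would be triggered after roughly 5 MB of redo, unless a log switch or the timeout comes first.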

During a checkpoint the following occurs:
- The database writer (DBWR) writes all modified database blocks in the buffer cache back to the datafiles.
- The checkpoint process (CKPT) updates the headers of all the datafiles to indicate when the last checkpoint occurred (SCN).
