Posted By

g8rpal on 09/30/10


Tagged


Versions (?)

Who likes this?

2 people have marked this snippet as a favorite

Tyster
michanne


Object Explore


 / Published in: SQL
 

URL: http://www.sqlservercentral.com/scripts/Data+exploration/71008/

USAGE First run this script is the database where you want to store this procedure Then you can run the stored procedure with the following statement:

EXEC sp_ObjectExplore 'database.schema.tablename'

You do not need to supply databasename and schema (example: dbo) when table is in same database as this procedure.

If you want additional metrics concerning the distribution of all numeric variables, execute with parameter @distribution='yes'. Example: EXEC sp_ObjectExplore 'database.schema.tablename' , @distribution='yes'

If you are only interested in a subset of columns, you can specify them in the parameter @columns Columnnames must be separated by a comma. You can specify them either with or without brackets [] Columnnames may not contain commas. Example: EXEC sp_ObjectExplore 'database.schema.tablename' , @distribution='yes' , @columns='column1, column2, etc'

PURPOSE Provide summary information and metrics for any column in a given Table/View Metrics included are: * Total number of records in Object (Records) * Number of values in column (Cnt) * Number of unique values in column (CntDist) * Number of NULL values in column (NullValues) * Min and Max value in column (Min/Max) * Average value in column (Avg) * Standard Deviation in column (StDev) * Number of numeric values in column (IsNum) * Number of integer values in column (IsInt) Optional metrics (only computed if procedure is executed with parameter @distribution='yes' * First Quartile/25% (Q1) * Median (Median) * Third Quartile/75% (Q3) * Interquartile range (IQR) * Skew (Skew) * Kurtosis (Kurt)

  1. CREATE PROCEDURE [dbo].[sp_ObjectExplore]
  2. @FullObjectName varchar(200)
  3. , @Distribution varchar(3) = 'No'
  4. , @COLUMNS varchar(max) = NULL
  5. AS
  6. -- ______________________________________________________ Object Explore _________________________________________________________
  7. --
  8. -- August 19th 2010, by Robin van Schaik
  9. -- Version 1.6 BETA
  10. --
  11. -- USAGE
  12. -- First run this script is the database where you want to store this procedure
  13. -- Then you van run the stored procedure with the following statement:
  14. --
  15. -- EXEC sp_ObjectExplore 'database.schema.tablename'
  16. --
  17. -- You do not need to supply databasename and schema (example: dbo) when table is in same database as this procedure.
  18. --
  19. -- If you want additional metrics concerning the distribution of all numeric variables, execute with parameter @distribution='yes'.
  20. -- Example:
  21. -- EXEC sp_ObjectExplore 'database.schema.tablename'
  22. -- , @distribution='yes'
  23. --
  24. -- If you are only interested in a subset of columns, you can specify them in the parameter @columns
  25. -- Columnnames must be separated by a comma. You can specify them either with or without brackets []
  26. -- Columnnames may not contain commas.
  27. -- Example
  28. -- EXEC sp_ObjectExplore 'database.schema.tablename'
  29. -- , @distribution='yes'
  30. -- , @columns='column1, column2, etc'
  31. --
  32. -- PURPOSE
  33. -- Provide summary information and metrics for any column in a given Table/View
  34. -- Metrics included are:
  35. -- * Total number of records in Object (Records)
  36. -- * Number of values in column (Cnt)
  37. -- * Number of unique values in column (CntDist)
  38. -- * Number of NULL values in column (NullValues)
  39. -- * Min and Max value in column (Min/Max)
  40. -- * Average value in column (Avg)
  41. -- * Standard Deviation in column (StDev)
  42. -- * Number of numeric values in column (IsNum)
  43. -- * Number of integer values in column (IsInt)
  44. -- Optional metrics (only computed if procedure is executed with parameter @distribution='yes'
  45. -- * First Quartile/25% (Q1)
  46. -- * Median (Median)
  47. -- * Third Quartile/75% (Q3)
  48. -- * Interquartile range (IQR)
  49. -- * Skew (Skew)
  50. -- * Kurtosis (Kurt)
  51. --
  52. -- VERSIONS
  53. -- 1.0: 01-JUL-2009.
  54. -- 1.1: 06-JUL-2009. Changed logic to determine primary key. Script did not work on columns
  55. -- with multiple indexes (duplicate rows)
  56. -- 1.2: 06-AUG-2009. Min and Max values for date fields converted (back) to yyyy-mm-dd hh:mi:ss(24h)
  57. -- Fixing bug for all numeric columnnames (added brackets [] to query)
  58. -- Prevent executing count function over certain datatypes (like xml, text, etc).
  59. -- 1.3: 31-OCT-2009. Fixed bug. Script did not run when table has a columnname with space(s).
  60. -- 1.4: 01-NOV-2009. Prevent executing min and max function over certain datatypes.
  61. -- Added Average, Standard Deviation and percentage of NULL values.
  62. -- 1.5: 12-NOV-2009. Added IsNum and IsInt variables. Isnum counts the number of records which
  63. -- comply with the IsNumeric() statement (numbers including period, $, hiven, etc.)
  64. -- IsInt counts the number of records that contain numeric values (only numbers).
  65. -- 1.6 BETA: 19-AUG-2010.
  66. -- Added Distribution variables for every numeric value. Quartiles, median, IQR, Skew and Kurtosis.
  67. -- These statistics are only computed when procedure is executed with parameter @distribution='yes'
  68. -- These stats are not computed for columns that are (or appear to be) ID's.
  69. -- Added possibility to run this only for a subset of columns, using @columns parameter.
  70. -- *******************************************************************************************************************************
  71.  
  72. SET NOCOUNT ON
  73.  
  74. /*
  75.  STEP 1: Initialize Procedure
  76. */
  77. DECLARE @Object varchar(200)
  78. DECLARE @Schema varchar(200)
  79. DECLARE @DATABASE varchar(200)
  80.  
  81. -- Break down parameter in Database/Schema/Object
  82. SET @Object = PARSENAME(@FullObjectName,1)
  83. SET @Schema = ISNULL(PARSENAME(@FullObjectName,2),'dbo')
  84. SET @DATABASE = PARSENAME(@FullObjectName,3)
  85.  
  86.  
  87. -- Creating temporary table to store queries used for calculating metrics
  88. CREATE TABLE #ObjectDef
  89. ( ObjectName varchar(200)
  90. , Columnname varchar(128)
  91. , ColumnId int
  92. , Query varchar(max)
  93. , DataType varchar(50)
  94. , MaxLength int
  95. , [Precision] int
  96. , Scale int
  97. , IsNullable tinyint
  98. , IsPrimaryKey tinyint
  99. , IsIdentity tinyint
  100. , SelectedColumn bit
  101. )
  102.  
  103. -- Temporary table used for results/output
  104. CREATE TABLE #ObjectAudit
  105. ( ObjectName varchar(200)
  106. , Columnname varchar(128)
  107. , ColumnId int
  108. -- Column Specifications
  109. , DataType varchar(50)
  110. , MaxLength int
  111. , [Precision] int
  112. , Scale int
  113. , IsNullable tinyint
  114. , IsPrimaryKey tinyint
  115. , IsIdentity tinyint
  116. -- Metrics
  117. , Records bigint -- All records in table
  118. , Cnt bigint -- Number of values in column
  119. , [IsNum] bigint -- Is numeric (including period, $, - etc.)
  120. , [IsInt] bigint -- Is integer
  121. , CntDist bigint -- Number of distinct values in column
  122. , NullValues bigint -- Number of Null values in column
  123. , [Min] nvarchar(4000) -- Min value in column
  124. , [Max] nvarchar(4000) -- Max value in column
  125. , [Avg] decimal(38,5) -- Average value in column
  126. , [StDev] decimal(38,5) -- Standard Deviation
  127. )
  128.  
  129. -- Insert all columns for object in temp table
  130. -- Check appropiate Schema and Database
  131. EXEC(
  132. 'INSERT INTO #ObjectDef (ObjectName, Columnname, ColumnId, DataType, MaxLength,[Precision],Scale,IsNullable,IsPrimaryKey, IsIdentity )
  133. SELECT '''
  134. +@FullObjectName+'''
  135. , b.name
  136. , b.column_id
  137. , type_name(user_type_id) AS DataType
  138. , convert(int, b.max_length) as max_length
  139. , b.[precision]
  140. , b.scale
  141. , b.is_nullable
  142. , isnull(c.PrimaryKey,0)
  143. , b.is_identity
  144. FROM
  145. ' +@DATABASE+'.sys.objects a
  146. INNER JOIN
  147. ' +@DATABASE+'.sys.columns b
  148. ON a.object_id=b.object_id
  149. LEFT OUTER JOIN
  150. (
  151. SELECT
  152. a.object_id
  153. , 1 as PrimaryKey
  154. , c.column_id
  155. FROM
  156. '+@DATABASE+'.sys.indexes a
  157. INNER JOIN
  158. '+@DATABASE+'.sys.objects b
  159. ON a.object_id=b.parent_object_id
  160. AND a.name=b.name
  161. INNER JOIN
  162. '+@DATABASE+'.sys.index_columns c
  163. ON a.index_id=c.index_id
  164. AND b.parent_object_id=c.object_id
  165. WHERE
  166. b.type=''PK''
  167. ) c
  168. ON a.object_id=c.object_id
  169. AND b.column_id=c.column_id
  170. INNER JOIN
  171. ' +@DATABASE+'.sys.schemas d
  172. ON a.schema_id=d.schema_id
  173. WHERE
  174. a.name = '''+@Object+'''
  175. AND d.name = '''+@Schema+'''
  176. '
  177. )
  178.  
  179. -- Parse string of requested Columns
  180. -- If columns are selected the variable SelectedColumn in #ObjectDef will be set to 1
  181. -- All other columns are set to 0.
  182. -- If no columns are selected the variable SelectedColumn will remain NULL
  183. IF @COLUMNS IS NOT NULL
  184. BEGIN
  185. CREATE TABLE #columns
  186. (ColumnName varchar(128))
  187.  
  188. DECLARE @ColumnInsert varchar(max)
  189. DECLARE @ColumnInsertTrim varchar(130)
  190.  
  191. SET @COLUMNS=','+@COLUMNS -- adding comma in able to maintain logic below if only one column name is supplied
  192.  
  193. WHILE CHARINDEX(',',@COLUMNS)>0
  194. BEGIN
  195. SET @ColumnInsert=REVERSE(LEFT(REVERSE(@COLUMNS),charindex(',',REVERSE(@COLUMNS),1)-1))
  196. SET @ColumnInsertTrim=LTRIM(RTRIM(@ColumnInsert))
  197. INSERT INTO #columns (ColumnName)
  198. -- If columnnames are specified between brackets, remove brackets:
  199. SELECT CASE WHEN @ColumnInsertTrim LIKE '[[]%]' THEN SUBSTRING(@ColumnInsertTrim,2,DATALENGTH(@ColumnInsertTrim)-2) ELSE @ColumnInsertTrim END
  200. SET @COLUMNS=REPLACE(@COLUMNS,','+@ColumnInsert,'')
  201. END
  202. UPDATE #ObjectDef SET SelectedColumn=1 WHERE Columnname in (SELECT Columnname FROM #columns)
  203. UPDATE #ObjectDef SET SelectedColumn=0 WHERE SelectedColumn IS NULL
  204. END
  205.  
  206.  
  207. -- Validate input parameters
  208.  
  209. -- If Object does not exist, end script
  210. IF NOT EXISTS (SELECT top 1 ObjectName FROM #ObjectDef)
  211. BEGIN
  212. DROP TABLE
  213. #ObjectDef
  214. , #ObjectAudit
  215. PRINT 'Object '+UPPER(@FullObjectName)+' does not exist! Please enter a valid object name.'
  216. RETURN
  217. END
  218.  
  219. -- If one or more of the specified columns do not exist, end script
  220.  
  221. -- or columns specified in @Columns?
  222. IF @COLUMNS IS NOT NULL
  223. BEGIN
  224. -- If so check one or more columns do not exist in the specified object
  225. IF EXISTS (
  226. SELECT top 1
  227. a.ColumnName
  228. FROM
  229. #Columns a
  230. LEFT OUTER JOIN
  231. #ObjectDef b
  232. ON a.ColumnName=b.ColumnName
  233. WHERE b.ColumnName IS NULL
  234. )
  235. -- Are there missing columns? End Script
  236. BEGIN
  237. -- identify missing columns
  238. SELECT
  239. a.ColumnName
  240. INTO #MissingColumns
  241. FROM
  242. #Columns a
  243. LEFT OUTER JOIN
  244. #ObjectDef b
  245. ON a.ColumnName=b.ColumnName
  246. WHERE
  247. b.ColumnName IS NULL
  248.  
  249. DECLARE @MaxMissingColumn varchar(128) SET @MaxMissingColumn=''
  250. DECLARE @MissingColumns varchar(max) SET @MissingColumns=''
  251. -- Building string to output missing columns to user
  252. WHILE (SELECT COUNT(ColumnName) FROM #MissingColumns)>0
  253. BEGIN
  254. SET @MaxMissingColumn=(SELECT MAX(ColumnName) FROM #MissingColumns)
  255. SET @COLUMNS=REPLACE(@COLUMNS,@MaxMissingColumn,'')
  256. DELETE FROM #MissingColumns WHERE ColumnName=@MaxMissingColumn
  257. SET @MissingColumns=','+UPPER(@MaxMissingColumn)+@MissingColumns
  258. END
  259.  
  260. DROP TABLE
  261. #ObjectDef
  262. , #ObjectAudit
  263. , #Columns
  264. -- output to user:
  265. PRINT 'One or more of the requested columns do not exist in object: ' + UPPER(@FullObjectName)+
  266. CHAR(13)+'Invalid column(s): '+SUBSTRING(@MissingColumns,2,LEN(@MissingColumns)-1)
  267.  
  268. RETURN
  269. END
  270. END
  271.  
  272. /*
  273.  STEP 2: Defining queries for calculating column metrics
  274. */
  275.  
  276. -- First determine which datatypes are suitable for operators count, min, max and avg
  277. -- Add select statement per column in temp table
  278. -- Naming convention for the metrics:
  279. -- [ColumnId]_CNT (number of values)
  280. -- _CNTDIST (distinct values), _MIN, _MAX, _Nulls (number of null values), _AVG
  281.  
  282.  
  283. -- available datatypes
  284. SELECT
  285. name
  286. INTO #datatypes
  287. FROM sys.types
  288.  
  289. ALTER TABLE #datatypes
  290. ADD O_COUNT tinyint
  291. , O_MIN tinyint
  292. , O_NUM tinyint
  293. , O_IsNUM tinyint
  294.  
  295.  
  296. -- COUNT function
  297. UPDATE #datatypes
  298. SET O_COUNT=1
  299. WHERE name IN
  300. (
  301. 'bigint','binary','bit','char','date','datetime','datetime2'
  302. , 'datetimeoffset','decimal','float','hierarchyid','int'
  303. , 'money','nchar','numeric','nvarchar','real','smalldatetime'
  304. , 'smallint','smallmoney','sql_variant','sysname','time'
  305. , 'timestamp','tinyint','uniqueidentifier','varbinary'
  306. , 'varchar'
  307. )
  308.  
  309. -- MIN/MAX function
  310. UPDATE #datatypes
  311. SET O_MIN =1
  312. WHERE name IN
  313. (
  314. 'int','bigint','binary','bit','char','date','datetime'
  315. , 'datetime2','datetimeoffset','decimal','float','hierarchyid'
  316. , 'int','money','nchar','numeric','nvarchar','real'
  317. , 'smalldatetime','smallint','smallmoney','sql_variant'
  318. , 'sysname','time','tinyint','varbinary','varchar'
  319. )
  320.  
  321. -- NUMERIC FUNCTIONS (avg, stdev, etc.)
  322. UPDATE #datatypes
  323. SET O_NUM =1
  324. WHERE name IN
  325. (
  326. 'int','smallint','bigint','tinyint','float','decimal','numeric'
  327. , 'money','smallmoney','real'
  328. )
  329.  
  330. -- ISNUMERIC / ISINT function
  331. UPDATE #datatypes
  332. SET O_ISNUM =1
  333. WHERE name IN
  334. (
  335. 'bigint','binary','bit','char','datetime','decimal','float','int'
  336. , 'money','nchar','numeric','nvarchar','real','smalldatetime'
  337. , 'smallint','smallmoney','sysname','tinyint','uniqueidentifier'
  338. , 'varbinary','varchar'
  339. )
  340.  
  341.  
  342. UPDATE #ObjectDef
  343. SET query=
  344. CASE
  345. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_count=1)
  346. THEN
  347. 'COUNT(['+columnname+']) as ['+CAST(columnid AS varchar)+'_CNT]'
  348. +
  349. ',COUNT(distinct ['+columnname+']) as ['+CAST(columnid AS varchar)+'_CNTDIST]'
  350. ELSE 'NULL as ['+CAST(columnid AS varchar)+'_CNT], NULL as ['+CAST(columnid AS varchar)+'_CNTDIST]'
  351.  
  352. END
  353. -- Min and Max statement not posssible on datatype "Bit"
  354. -- Therefore, convert this datatype to a varchar
  355. +
  356. CASE
  357. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_min=1)
  358. THEN ', MIN('+
  359. CASE
  360. WHEN datatype = 'bit'
  361. THEN 'cast(['+columnname+'] as int)'
  362. WHEN datatype IN ('datetime','datetime2','smalldatetime','date','datetimeoffset','hierarchyid','sql_variant')
  363. THEN 'convert(varchar,['+columnname+'],120)' -- convert date to format yyyy-mm-dd hh:mm:ss
  364. ELSE '['+columnname+']'
  365. END+')'
  366. ELSE ', NULL ' -- MIN function not available for datatype
  367. END +' as ['+CAST(columnid AS varchar)+'_MIN]'
  368. +
  369. CASE
  370. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_min=1)
  371. THEN ', MAX('+
  372. CASE
  373. WHEN datatype = 'bit'
  374. THEN 'cast(['+columnname+'] as int)'
  375. WHEN datatype IN ('datetime','datetime2','smalldatetime','date','datetimeoffset','hierarchyid','sql_variant')
  376. THEN 'convert(varchar,['+columnname+'],120)'
  377. ELSE '['+columnname+']'
  378. END+')'
  379. ELSE ', NULL ' -- MAX function not available for datatype
  380. END +' as ['+CAST(columnid AS varchar)+'_MAX]'
  381. +',SUM(CASE WHEN ['+columnname+'] is null then 1 else 0 end) as ['+CAST(columnid AS varchar)+'_Nulls]'
  382. + CASE
  383. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_num=1)
  384. -- AVG for ID's not particularly usefull, therefore filter them out:
  385. AND IsPrimaryKey=0
  386. AND IsIdentity=0
  387. THEN ', AVG(['+columnname+']/1.0)'
  388. -- AVG function not available for datatype
  389. ELSE ', NULL' END + ' as ['+CAST(columnid AS varchar)+'_AVG]'
  390. + CASE
  391. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_num=1)
  392. -- Standard Deviation for ID's not particularly usefull, therefore filter them out:
  393. AND IsPrimaryKey=0
  394. AND IsIdentity=0
  395. THEN ', STDEV(['+columnname+']/1.0)'
  396. -- STDDEV function not available for datatype
  397. ELSE ', NULL' END + ' as ['+CAST(columnid AS varchar)+'_DEV]'
  398. + CASE
  399. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_isnum=1)
  400. THEN ', SUM(ISNUMERIC(['+columnname+']))'
  401. -- SUM function not available for datatype
  402. ELSE ', NULL' END + ' as ['+CAST(columnid AS varchar)+'_ISNUM]'
  403. + CASE
  404. WHEN datatype IN (SELECT name FROM #datatypes WHERE o_isnum=1)
  405. THEN ', SUM(CASE WHEN ['+columnname+'] like ''%[^0-9]%'' OR ['+columnname+'] IS NULL then 0 else 1 end)'
  406. -- Not available for datatype
  407. ELSE ', NULL' END + ' as ['+CAST(columnid AS varchar)+'_ISINT]'
  408.  
  409.  
  410. -- Concatenating the individual select statement per column to one select statement for entire object
  411. DECLARE @Query varchar(max)
  412. DECLARE @Sql varchar(max)
  413. DECLARE @Sql2 varchar(max)
  414. DECLARE @Sql3 varchar(max)
  415.  
  416. -- Start Cursor c_Query
  417. DECLARE c_Query CURSOR FAST_FORWARD FOR
  418. SELECT Query FROM #ObjectDef WHERE isnull(SelectedColumn,1)<>0
  419.  
  420.  
  421. OPEN c_Query
  422. FETCH NEXT FROM c_Query INTO @Query
  423. WHILE @@FETCH_STATUS = 0
  424.  
  425. BEGIN
  426. SET @sql =cast(@Query AS varchar(max))
  427. -- Adding current statement to previous statement. Last run results in one select statement.
  428. SET @Sql2=cast(@sql AS varchar (max))+','+ISNULL(@Sql2,'')
  429.  
  430. FETCH NEXT FROM c_Query INTO @Query
  431. END
  432.  
  433. CLOSE c_Query
  434. DEALLOCATE c_Query
  435. -- End Cursor c_Query
  436.  
  437. -- Adding SELECT and FROM Statement, resulting in a query
  438. SET @Sql3='select count(*) as records, '+LEFT(ltrim(rtrim(@Sql2)),LEN(ltrim(rtrim(@Sql2)))-1)
  439. +'into ##ObjectSingleRow from '+@FullObjectName
  440.  
  441. -- Execute the final query
  442. -- This results in a one-row table with a column for every combination of metric/column
  443. -- USING Exec statement sets up a new connection, therefore results are added to a
  444. -- Global temporary table (##)
  445. EXEC(@sql3)
  446.  
  447. /*
  448.  STEP 3: Output
  449. */
  450. -- Add columns and column specifications to temp table
  451. INSERT INTO #ObjectAudit
  452. ( ObjectName
  453. , Columnname
  454. , ColumnId
  455. , DataType
  456. , MaxLength
  457. , [Precision]
  458. , Scale
  459. , IsNullable
  460. , IsPrimaryKey
  461. , IsIdentity
  462. )
  463. SELECT
  464. ObjectName
  465. , ColumnName
  466. , ColumnId
  467. , DataType
  468. , MaxLength
  469. , [Precision]
  470. , Scale
  471. , IsNullable
  472. , IsPrimaryKey
  473. , IsIdentity
  474. FROM
  475. #ObjectDef
  476. WHERE
  477. isnull(SelectedColumn,1)<>0
  478.  
  479. DECLARE @COLUMN varchar(5)
  480.  
  481. DECLARE c_update CURSOR FAST_FORWARD FOR
  482. SELECT ColumnId FROM #ObjectAudit
  483.  
  484. -- Start cursor c_update
  485. OPEN c_update
  486. FETCH NEXT FROM c_update INTO @COLUMN
  487. WHILE @@FETCH_STATUS = 0
  488.  
  489. BEGIN
  490. -- Define Update Query for updating metric columns
  491. SET @Sql = 'UPDATE #ObjectAudit
  492. SET
  493. Records=(select records from ##ObjectSingleRow)
  494. , Cnt=(select ['+@COLUMN+'_CNT] from ##ObjectSingleRow)
  495. , IsNum=(select ['+@COLUMN +'_ISNUM] from ##ObjectSingleRow)
  496. , IsInt=(select ['+@COLUMN +'_ISINT] from ##ObjectSingleRow)
  497. , CntDist=(select ['+@COLUMN +'_CNTDist] from ##ObjectSingleRow)
  498. , Min=(select ['+@COLUMN +'_MIN] from ##ObjectSingleRow)
  499. , Max=(select ['+@COLUMN +'_MAX] from ##ObjectSingleRow)
  500. , NullValues=(select ['+@COLUMN+'_Nulls] from ##ObjectSingleRow)
  501. , Avg=(select ['+@COLUMN +'_AVG] from ##ObjectSingleRow)
  502. , StDev=(select ['+@COLUMN +'_DEV] from ##ObjectSingleRow)
  503. WHERE columnid= '+@COLUMN+''
  504. -- Executing above update statement
  505. EXEC(@Sql)
  506.  
  507. FETCH NEXT FROM c_update INTO @COLUMN
  508.  
  509. END
  510.  
  511. CLOSE c_update
  512. DEALLOCATE c_update
  513. -- END CURSOR c_update
  514.  
  515.  
  516. -- DISTRIBUTION
  517. -- Only calculate distribution variables if procedure is executed with parameter 'distribution'
  518. -- Do not calculate for key/ID variables
  519. IF EXISTS (SELECT top 1 ColumnName FROM #ObjectAudit
  520. WHERE
  521. Datatype IN (
  522. SELECT
  523. name
  524. FROM
  525. #datatypes
  526. WHERE o_num=1
  527. )
  528. AND IsPrimaryKey=0
  529. AND IsIdentity=0
  530. -- filter out variables which are probably Key / ID variables:
  531. AND NOT( IsInt=CNT
  532. AND CAST([Max] AS bigint)-CAST([Min] AS bigint)+1=[Cnt]
  533. AND CNTDist=Cnt
  534. )
  535. )
  536. AND LOWER(@Distribution)='yes'
  537. BEGIN
  538.  
  539. ALTER TABLE #ObjectAudit
  540. ADD Q1 decimal(38,5), Median decimal(38,5), Q3 decimal(38,5), IQR decimal(38,5), Skew numeric(38,3), Kurt numeric(38,3)
  541. DECLARE @ColumnName AS varchar(128)
  542.  
  543. DECLARE c_ntiles cursor fast_forward FOR
  544. SELECT
  545. ColumnName
  546. FROM
  547. #ObjectAudit
  548. WHERE
  549. datatype IN (
  550. SELECT
  551. Name
  552. FROM
  553. #datatypes
  554. WHERE
  555. o_num=1
  556. )
  557. AND Cnt>=4 -- to avoid divide by 0 error when calculating kurtosis
  558. AND IsPrimaryKey=0
  559. AND IsIdentity=0
  560. -- filter out variables which are probably Key / ID variables:
  561. AND NOT( IsInt=CNT
  562. AND CAST([Max] AS bigint)-CAST([Min] AS bigint)+1=[Cnt]
  563. AND CNTDist=Cnt
  564. )
  565.  
  566. OPEN c_ntiles
  567.  
  568. FETCH NEXT FROM c_ntiles INTO @ColumnName
  569. WHILE @@FETCH_STATUS = 0
  570.  
  571. BEGIN
  572. SET @SQL = '
  573. DECLARE @CNT as decimal(38,5)
  574. DECLARE @STDEV as decimal(38,5)
  575. DECLARE @AVG as decimal(38,5)
  576. DECLARE @CORR_SKEW as numeric(10,9)
  577. DECLARE @CORR_KURT as numeric(10,9)
  578. DECLARE @SUBFACT as numeric(10,9)
  579. DECLARE @Q1 as int
  580. DECLARE @Q2a as decimal(38,5)
  581. DECLARE @Q2b as decimal(38,5)
  582. DECLARE @Q3 as int
  583.  
  584. SET @CNT=(SELECT Cnt FROM #ObjectAudit WHERE ColumnName='''+@ColumnName+''')
  585. SET @CORR_SKEW=(@CNT)/(@CNT-1)/(@CNT-2)
  586. SET @CORR_KURT=(@CNT)*(@CNT+1)/(@CNT-1)/(@CNT-2)/(@CNT-3)
  587. SET @SUBFACT=3*SQUARE((@CNT-1))/(@CNT-2)/(@CNT-3)
  588.  
  589. SET @STDEV=(SELECT stDEV FROM #ObjectAudit WHERE ColumnName='''+@ColumnName+''')
  590. SET @AVG=(SELECT Avg FROM #ObjectAudit WHERE ColumnName='''+@ColumnName+''')
  591. SET @Q1 = round(0.25*(@cnt+1),0)
  592. SET @Q2a = 0.50*(@cnt+@cnt%2)
  593. SET @Q2b = 0.50*(@cnt+@cnt%2)+(@cnt+1)%2
  594. SET @Q3 = round(0.75*(@cnt+1),0)
  595.  
  596. SELECT ['+@ColumnName+'], row_number() OVER (partition by 1 ORDER BY ['+@ColumnName+']) as rownum
  597. , POWER((['+@ColumnName+']*1.0-@AVG)/@STDEV,3) as Skew
  598. , SQUARE(SQUARE(((['+@ColumnName+']-@AVG)/@STDEV))) as Kurt
  599. INTO #quartiles_rows
  600. FROM '+@FullObjectName+' where ['+@ColumnName+']>0
  601.  
  602.  
  603. SELECT
  604. MAX(CASE rownum WHEN @Q1 THEN ['+@ColumnName+'] ELSE null END) as Q1
  605. , MAX(CASE rownum when @Q2a THEN ['+@ColumnName+'] ELSE null END) as Q2a
  606. , MAX(CASE rownum when @Q2b THEN ['+@ColumnName+'] ELSE null END) as Q2b
  607. , MAX(CASE rownum when @Q3 THEN ['+@ColumnName+'] ELSE null END) as Q3
  608. , SUM(Skew)*@CORR_SKEW as Skew
  609. , SUM(Kurt)*@CORR_KURT-@SUBFACT as Kurt
  610. INTO ##quartiles
  611. FROM #quartiles_rows
  612. '
  613. EXEC (@SQL)
  614.  
  615. UPDATE #ObjectAudit
  616. SET Q1=##Quartiles.Q1
  617. , Median=(##Quartiles.Q2a+##Quartiles.Q2b)/2
  618. , Q3=##Quartiles.Q3
  619. , Skew=##Quartiles.Skew
  620. , Kurt=##Quartiles.Kurt
  621. , IQR=##Quartiles.Q3-##Quartiles.Q1
  622. FROM
  623. ##Quartiles
  624. WHERE
  625. ColumnName=@ColumnName
  626.  
  627. DROP TABLE ##Quartiles
  628.  
  629. FETCH NEXT FROM c_ntiles INTO @columnName
  630.  
  631. END
  632.  
  633. CLOSE c_ntiles
  634. DEALLOCATE c_ntiles
  635.  
  636. END
  637.  
  638.  
  639. -- Output to screen
  640. SELECT *
  641. , CAST(CAST(NullValues AS decimal(38,2))/records*100 AS decimal(38,2)) AS NullPerc
  642. , CASE WHEN CNT>0 THEN CAST(CAST(CNTDist AS decimal(38,2))/CNT*100 AS decimal(38,2)) ELSE NULL END AS DistinctPerc
  643. FROM
  644. #ObjectAudit
  645.  
  646. -- Clean up!
  647. DROP TABLE
  648. #ObjectDef
  649. , #ObjectAudit
  650. , ##ObjectSingleRow
  651. /*
  652.  End Of Script
  653. */
  654.  
  655.  
  656.  
  657.  
  658.  
  659.  
  660.  
  661. GO

Report this snippet  

You need to login to post a comment.