09 Jul

Find columns with non-null data

Have tables with loads of columns and lots of data – mostly null – and want to know which columns across them actually has any data? Try this. Oracle specific though.

First, create a table that contains all the tables and columns you want to check – and some columns for processing output. You’ll monitor this table to check progress, and get the output from here in the end. Lets call this table COLS_TO_CHECK. You must have dba rights for this.

create table COLS_TO_CHECK as (
   select
      col.owner as schema_name,
      col.table_name,
      col.column_name,
      col.nullable,
      'N' as processed,
      ' ' as not_null
   from
      sys.dba_tab_columns col
   where
      col.owner = 'SCHEMA_OWNER_OF_TABLES_TO_CHECK'
      and col.table_name in ('LIST','OF','TABLES')
);

You can check progress later in this table – it will set “processed” to “Y” when done, and “not_null” to either “Y” or “N”, depending on whether that column has any data in it (“Y”) or whether all rows are null (“N”).

To actually check the data, run this PL/SQL (its going to be slow!). If it dies or gets killed, you can just re-run – it will carry on from where it last checked.

declare
   l_test_statement varchar2(32767);
   l_contains_value pls_integer;

   cursor c is
      select schema_name, table_name, column_name, nullable
      from COLS_TO_CHECK -- the table created above
      where processed = 'N';

begin
   for r in c
   loop
      if r.nullable = 'N'
      then
         update COLS_TO_CHECK set processed='Y', not_nul ='Y'
         where schema_name=r.schema_name
            and table_name=r.table_name
            and column_name=r.column_name;
         commit;
      else
         begin
            l_test_statement := 'select 1 from dual where exists (select 1 from ' || r.schema_name || '.' || r.table_name || ' where ' || r.column_name || 'is not null)';
            execute immediate l_test_statement
               into l_contains_value;
            update COLS_TO_CHECK set processed='Y', not_null='Y'
            where schema_name=r.schema_name
               and table_name=r.table_name
               and column_name=r.column_name;
            commit;
         exception
            when no_data_found then
               update COLS_TO_CHECK set processed='Y', not_null='N'
               where schema_name=r.schema_name
                  and table_name=r.table_name
                  and column_name=r.column_name;
               commit;
         end;
      end if;
   end loop;
end;