BLS Metadata Repository – Issues and Progress

Daniel Gillman
US Bureau of Labor Statistics

Outline

• BLS Programs
• Time Series
• Data Dissemination
• Metadata
• Model
• BLS Repository

Wolfram Data Summit, 9/10/2010

The BLS Mission

The Bureau of Labor Statistics (BLS) is the principal fact-finding agency for the Federal Government in the broad field of labor economics and statistics. The BLS collects, processes, analyzes, and disseminates essential statistical data to the public, Congress, Federal agencies, State and local governments, business, and labor.

BLS Programs

8 Major Program Areas
• Inflation & Prices
• Employment
• Unemployment
• Pay & Benefits
• Spending & Time Use
• Productivity
• Workplace Injuries
• International

Time Series
• Measure or index over time
  – Index: number relative to fixed point
• 30 series types
• Subset by
  – Industry
  – Occupation
  – Geography (state, county, MSA, etc.)
• Tables
  – Generated from time series data
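The phrase "number relative to fixed point" can be made concrete with a small sketch. It assumes the common convention that the base observation is scaled to 100; the function name, the base position, and the sample values are illustrative and are not BLS data.

```python
def to_index(values, base_position=0, base_value=100.0):
    """Rescale a series so the observation at base_position equals base_value."""
    base = values[base_position]
    if base == 0:
        raise ValueError("base observation must be non-zero")
    return [base_value * v / base for v in values]


# Hypothetical observations for four periods (not BLS data).
prices = [50.0, 52.0, 55.0, 60.0]
index = to_index(prices)
print(index)  # [100.0, 104.0, 110.0, 120.0]
```

Each index value then reads directly as a percentage of the base period: 110.0 means 10 percent above the base.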

Data Dissemination
• Web site: http://www.bls.gov
• 8 major numbers (m = monthly, q = quarterly)
  – Unemployment rate (m)
  – Consumer price index (m)
  – Producer price index (m)
  – Employment cost index (q)
  – Average hourly earnings (m)
  – Payroll employment (m)
  – Productivity (q)
  – Import price index (m)
• All time series
• Tables

Data dissemination

Data Dissemination
• Organized by programs
• Time series in ASCII files by FTP
• Some tables
• Crude database search
• Little metadata
  – Web site itself
  – Hidden in FTP directories
  – Handbook of Methods
  – Seasonal adjustment

Data Dissemination
• Requires knowing
  – Organization of BLS
  – Specific surveys or programs
  – Specific series
  – Terms & technical meaning (e.g., earnings)
• Relies on “Series ID”
  – Brittle scheme for identifying series
  – Known by power users
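To illustrate why a positional Series ID is brittle, here is a minimal sketch that splits one ID by fixed character offsets. The layout is an assumption based on the published format for CPI-U series (prefix CU); other programs use different layouts, and the helper name `split_cu_series_id` is hypothetical.

```python
# Assumed layout for IDs with prefix "CU": (field, start, end), 0-based slices.
# Other surveys use different offsets, so this table cannot be reused for them.
CU_LAYOUT = [
    ("prefix", 0, 2),
    ("seasonal_adjustment", 2, 3),
    ("periodicity", 3, 4),
    ("area", 4, 8),
    ("item", 8, None),
]


def split_cu_series_id(series_id):
    """Split a CU-prefixed series ID into its positional fields."""
    if not series_id.startswith("CU"):
        raise ValueError("this layout only applies to IDs with prefix CU")
    return {name: series_id[start:end] for name, start, end in CU_LAYOUT}


parts = split_cu_series_id("CUUR0000SA0")
print(parts)
```

The meaning of every character depends on knowing the survey and its code lists in advance, which is the knowledge the slide says only power users have.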

Metadata
• Supports
  – Dissemination
  – Support Data.Gov
  – Time series and tables
• Does not support
  – Internal processing
  – Describing survey life-cycle
  – Microdata (respondent level)

Metadata
• Hard to collect
• Need “simple” model
  – Maybe not so easy
• Basic metadata already on FTP sites
• Support finding data by
  – Traditional means: Series ID, BLS structure
  – New means: subject matter

Metadata
• Previous BLS focus group study
  – Users find data by time, place, and subject (title or keywords)
  – Structure of agencies not known
  – Technical terms not known
• Metadata must support this

Model
• Time Series
• Data Element
• Classification
• Concept
• Naming Convention

Model

[Figure: UML class diagram of the metadata model. The extracted text preserves class names, attributes, and association labels, but not which classes each association connects.]

Classes and attributes
• Format: field name (CHAR VARYING), begin character (INTEGER), number of chars (INTEGER)
• Data Set: number of records (INTEGER), MIME type (CHAR VARYING), location (URL)
• Survey: description (VARCHAR), handbook (URL)
• Data Element: unit of measure (CHAR VARYING), datatype (CHAR VARYING)
• Naming Convention: convention (VARCHAR)
• Series
• Name: signifier (VARCHAR)
• Concept: definition (VARCHAR)
• Namespace: number of names (INTEGER)
• Scheme Level: level name (CHAR VARYING), nodes at level (INTEGER), children (INTEGER)
• Node
• Classification Scheme: number of concepts (INTEGER), number of levels (INTEGER), description (VARCHAR)
• Object: begin year (CHAR), begin period (CHAR), end year (CHAR), end period (CHAR)

Association role names
• is produced by / produces
• included in / includes
• is held in / holds
• produces / produced by
• is generated by / generates
• represented by / represents
• names are specified by / specifies the names in
• labels / labelled
• is controlled by / controls
• is labeled by / labels

Named associations: characteristic, statistical unit, value domain, representation, structure, label, classification

Generalizations: nine links marked sub-type

Multiplicities used: 1, 0..1, 0..n, 1..n
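The Format class in the model (field name, begin character, number of chars) holds enough information to slice a fixed-width ASCII record without hard-coding offsets. The sketch below assumes that begin character is 1-based; the field layout and the sample record are hypothetical and do not come from any BLS file.

```python
from dataclasses import dataclass


@dataclass
class Format:
    """Mirrors the three attributes of the Format class in the model."""
    field_name: str
    begin_character: int  # assumed to be 1-based
    number_of_chars: int


def parse_record(line, formats):
    """Slice one fixed-width line into named fields, trimming padding."""
    record = {}
    for f in formats:
        start = f.begin_character - 1
        record[f.field_name] = line[start:start + f.number_of_chars].strip()
    return record


# Hypothetical layout and record, for illustration only.
formats = [
    Format("series_id", 1, 17),
    Format("year", 18, 4),
    Format("period", 22, 3),
    Format("value", 25, 12),
]
line = "EXAMPLE0001".ljust(17) + "2010" + "M08" + "123.4".rjust(12)

record = parse_record(line, formats)
print(record)
```

Because the layout is data rather than code, a change to a file's columns would only require updating the Format entries.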

BLS Repository
• Under development
• Requirement: fast response
• Testing
  – Flat single table design
  – Using Apache Lucene Solr
      – Open source enterprise search
  – Various interface approaches
      – Visual Basic
      – Java
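As a rough illustration of the flat single table design, each series could be stored as one denormalized document and searched with Solr's standard query parameters (`q`, `fq`, `facet`, `facet.field`). All field names and values below are hypothetical, and the sketch only builds a query string; it does not contact a server.

```python
from urllib.parse import urlencode

# One flat document per series: attributes that the model spreads over several
# classes are denormalized into a single record. Field names are hypothetical.
flat_doc = {
    "series_id": "EXAMPLE0001",
    "title": "Unemployment rate",
    "subject": "Unemployment",
    "place": "United States",
    "begin_year": "1990",
    "end_year": "2010",
}


def build_query(text, filters, facet_fields):
    """Build a query string for a Solr select handler."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    # Each filter narrows the result set without affecting relevance scoring.
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    # Each facet field asks Solr for value counts to drive drill-down links.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


query = build_query(
    "unemployment rate",
    filters={"place": flat_doc["place"]},
    facet_fields=["subject", "place"],
)
print(query)
```

A flat layout trades duplicated values for a single lookup with no joins, which fits the fast-response requirement.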

BLS Repository
• Need term map
  – Common terms to technical terms
  – Definitions for technical terms
• Concept based management
  – Link terms to relevant data
  – Manage multi-faceted search
• Development schedule
  – Still research project
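A term map from common terms to technical terms might look like the sketch below. The entries are illustrative guesses built from series names mentioned earlier in the talk; they are not an actual BLS vocabulary, and the function name `expand_terms` is hypothetical.

```python
# Hypothetical entries: common terms mapped to technical terms.
# A real map would be curated and would also carry definitions.
TERM_MAP = {
    "inflation": ["consumer price index", "producer price index"],
    "pay": ["average hourly earnings", "employment cost index"],
    "jobs": ["payroll employment"],
}


def expand_terms(user_query):
    """Return the user's words plus mapped technical terms, without duplicates."""
    expanded = []
    for word in user_query.lower().split():
        for term in [word] + TERM_MAP.get(word, []):
            if term not in expanded:
                expanded.append(term)
    return expanded


terms = expand_terms("Inflation and pay")
print(terms)
```

The expanded list could then be passed to the search layer, so a user who types "pay" still reaches series titled with technical terms.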

Contact Information

Daniel Gillman [email protected]