BLS Metadata Repository: Issues and Progress
Daniel Gillman, US Bureau of Labor Statistics
Wolfram Data Summit
Outline
– BLS Programs
– Time Series
– Data Dissemination
– Metadata
– Model
– BLS Repository
Wolfram Data Summit, 9/10/2010
The BLS Mission
The Bureau of Labor Statistics (BLS) is the principal fact-finding agency for the Federal Government in the broad field of labor economics and statistics. The BLS collects, processes, analyzes, and disseminates essential statistical data to the public, Congress, Federal agencies, State and local governments, business, and labor.
BLS Programs
8 Major Program Areas
– Inflation & Prices
– Employment
– Unemployment
– Pay & Benefits
– Spending & Time Use
– Productivity
– Workplace Injuries
– International
Time Series
Measure or index over time
– Index: number relative to fixed point
30 series types
Subset by
– Industry
– Occupation
– Geography (state, county, MSA, etc.)
Tables
– Generated from time series data
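The definition of an index as a number relative to a fixed point can be illustrated with a minimal sketch. The function name, the base value of 100, and the figures are assumptions for illustration; they are not BLS data or BLS methodology.

```python
def to_index(values, base_period, base_value=100.0):
    """Rescale a {period: value} series so the base period equals base_value."""
    base = values[base_period]
    if base == 0:
        raise ValueError("base period value must be non-zero")
    return {period: base_value * v / base for period, v in values.items()}


# Made-up figures for illustration only (not BLS data).
prices = {"2008": 50.0, "2009": 55.0, "2010": 60.0}
index = to_index(prices, base_period="2008")

for period, value in index.items():
    print(period, round(value, 1))
```

Every value is expressed relative to the chosen base period, so changing the base period rescales the whole series without changing its shape.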
Data Dissemination
Web site: http://www.bls.gov
8 major numbers (m = monthly, q = quarterly)
– Unemployment rate (m)
– Consumer price index (m)
– Producer price index (m)
– Employment cost index (q)
– Average hourly earnings (m)
– Payroll employment (m)
– Productivity (q)
– Import price index (m)
All time series
Tables
Data Dissemination
Organized by programs
Time series in ASCII files by FTP
Some tables
Crude database search
Little metadata
– Web site itself
– Hidden in FTP directories
– Handbook of Methods
– Seasonal adjustment
Data Dissemination
Requires knowing
– Organization of BLS
– Specific surveys or programs
– Specific series
– Terms & technical meaning (e.g., earnings)
Relies on “Series ID”
– Brittle scheme for identifying series
– Known by power users
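One way to see why a positional identifier is brittle is to parse one. The sketch below assumes a fixed-width layout; the field names, positions, and the ID "ZZS0000ABC" are hypothetical and are not taken from the slides or from any actual BLS series ID format.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    begin: int   # 1-based position of the first character
    length: int  # number of characters


# Hypothetical layout, for illustration only.
LAYOUT = [
    Field("survey", 1, 2),
    Field("seasonal", 3, 1),
    Field("area", 4, 4),
    Field("item", 8, 3),
]


def parse_series_id(series_id, layout=LAYOUT):
    """Split a fixed-width series ID into named codes by position."""
    expected = sum(f.length for f in layout)
    if len(series_id) != expected:
        raise ValueError(f"expected {expected} characters, got {len(series_id)}")
    return {
        f.name: series_id[f.begin - 1 : f.begin - 1 + f.length]
        for f in layout
    }


parsed = parse_series_id("ZZS0000ABC")
print(parsed)
```

The meaning of each code depends entirely on its position and on a layout the user must already know, so an ID of the wrong length or from a survey with a different layout cannot be interpreted.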
Metadata
Supports
– Dissemination
– Support Data.Gov
– Time series and tables
Does not support
– Internal processing
– Describing survey life-cycle
– Microdata (respondent level)
Metadata
Hard to collect
Need “simple” model
– Maybe not so easy
Basic metadata already on FTP sites
Support finding data by
– Traditional means: Series ID, BLS structure
– New means: Subject matter
Metadata
Previous BLS focus group study
Users find data by
– Time
– Place
– Subject (title or keywords)
Structure of agencies not known
Technical terms not known
Metadata must support this
Model
– Time Series
– Data Element
– Classification
– Concept
– Naming Convention
Model
[UML class diagram. Only the class names, attributes, and association labels below are recoverable from the extracted text; the endpoints and multiplicities (1, 0..1, 0..n, 1..n) of individual associations are not.]
Classes and attributes
– Survey: description (VARCHAR), handbook (URL)
– Series
– Data Set: number of records (INTEGER), MIME type (CHAR VARYING), location (URL)
– Format: field name (CHAR VARYING), begin character (INTEGER), number of chars (INTEGER)
– Data Element: unit of measure (CHAR VARYING), datatype (CHAR VARYING)
– Naming Convention: convention (VARCHAR)
– Name: signifier (VARCHAR)
– Namespace: number of names (INTEGER)
– Concept: definition (VARCHAR)
– Classification Scheme: number of concepts (INTEGER), number of levels (INTEGER), description (VARCHAR)
– Scheme Level: level name (CHAR VARYING), nodes at level (INTEGER), children (INTEGER)
– Node
– Object: begin year (CHAR), begin period (CHAR), end year (CHAR), end period (CHAR)
Association role names
– is produced by / produces
– included in / includes
– is held in / holds
– produces / produced by
– is generated by / generates
– represented by / represents
– names are specified by / specifies the names in
– labels / labelled
– is controlled by / controls
– is labeled by / labels
Association labels
– characteristic, statistical unit, value domain, representation, structure, label, classification
Nine sub-type links also appear in the diagram.
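A few of the classes named in the diagram can be sketched as plain data classes. The attribute names come from the diagram; which classes each association connects is an assumption here (a Survey produces Series, a Series is held in Data Sets), as are all example values, including the example.org URLs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    definition: str


@dataclass
class Name:
    signifier: str


@dataclass
class DataSet:
    number_of_records: int
    mime_type: str
    location: str  # URL


@dataclass
class Series:
    name: Name
    concepts: List[Concept] = field(default_factory=list)  # assumed link
    held_in: List[DataSet] = field(default_factory=list)   # assumed "is held in"


@dataclass
class Survey:
    description: str
    handbook: str  # URL
    produces: List[Series] = field(default_factory=list)   # assumed "produces"


# Hypothetical instance data, for illustration only.
rate = Concept(definition="Share of the labor force that is unemployed")
series = Series(
    name=Name(signifier="Unemployment rate"),
    concepts=[rate],
    held_in=[DataSet(1200, "text/plain", "https://example.org/data/series.txt")],
)
survey = Survey(
    description="Hypothetical household survey",
    handbook="https://example.org/handbook",
    produces=[series],
)
print(survey.produces[0].name.signifier)
```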
BLS Repository
Under development
Requirement: fast response
Testing
– Flat single table design
– Using Apache Lucene Solr (open source enterprise search)
Various interface approaches
– Visual Basic
– Java
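A flat single-table design combined with Solr could look roughly like the sketch below: one flat record per series, queried through Solr's standard select handler. The field names, the core name "series", the host, and the record contents are assumptions for illustration; only the query parameters (q, fq, wt, facet, facet.field) are standard Solr parameters. The code builds the URL but does not contact a server.

```python
from urllib.parse import urlencode

# One flat record per series; field names and values are hypothetical.
doc = {
    "series_id": "ZZS0000ABC",
    "title": "Unemployment rate",
    "subject": ["unemployment", "labor force"],
    "geography": "Ohio",
    "begin_year": "2000",
    "end_year": "2010",
}


def build_query(base_url, text, filters=None, facet_fields=()):
    """Build a Solr select URL with filter queries and optional facets."""
    params = [("q", text), ("wt", "json")]
    for name, value in (filters or {}).items():
        params.append(("fq", f'{name}:"{value}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


url = build_query(
    "http://localhost:8983/solr/series",  # hypothetical host and core
    "unemployment rate",
    filters={"geography": "Ohio"},
    facet_fields=["subject"],
)
print(url)
```

Keeping every searchable attribute on the same record avoids joins at query time, which fits the stated requirement of fast response.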
BLS Repository
Need term map
– Common terms to technical terms
– Definitions for technical terms
Concept based management
– Link terms to relevant data
– Manage multi-faceted search
Development schedule
– Still research project
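A term map from common terms to technical terms might be sketched as a simple lookup that expands a user's query. The mappings below are illustrative assumptions built from the series names on the "8 major numbers" slide; they are not the repository's actual term map.

```python
# Hypothetical mapping of everyday words to technical series names.
TERM_MAP = {
    "inflation": ["Consumer price index", "Producer price index", "Import price index"],
    "pay": ["Average hourly earnings", "Employment cost index"],
    "jobs": ["Payroll employment", "Unemployment rate"],
}


def expand_query(query, term_map=TERM_MAP):
    """Replace common words with technical terms; keep unknown words as-is."""
    terms = []
    for word in query.lower().split():
        for term in term_map.get(word, [word]):
            if term not in terms:
                terms.append(term)
    return terms


expanded = expand_query("pay Ohio")
print(expanded)
```

Words without an entry pass through unchanged, so a place or time term in the query can still be matched against other fields.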