Methodology


Overview

SAGE Stats presents annual location-based statistics drawn from government and private publications and databases. On a quarterly basis, editors update data series with the latest years of data available. The editors have harvested and refined the data, and calculated key data measures from raw data when necessary. All source information and explanatory notes appear in the Source tab, which also contains links to the original data source. Great care has been taken in the gathering and editing of the data. Nonetheless, there are a few issues that users should consider as they access this collective database.

The most recent data may not be 2014 or even 2013, depending on how quickly the government or private organization releases the figures. Users are always welcome to follow the links found in the Source tab to look for data that has yet to be added to the site.

Some data series were discontinued, most often due to a lack of continued data availability. Even now, cuts to some government statistics programs may cause additional data series to be discontinued. However, we felt that it was best to include these data series in this database so that as much data as possible are available to users.

The notes and source material are available year by year. However, hyperlinks regularly change paths, and as a consequence many web addresses are no longer valid. To help users avoid broken links, the editors have included, for each data series, a link to the original dataset plus a link to the source’s top-level website, which tends to change less often. If a data series has continued into the present, its most recent links should in most cases be up to date.

The figures found in this database follow a basic "style." The numbers require no additional calculations to convert them from millions, thousands, etc. Locations for which no figure was available will have either an "NA" for Not Available or simply a blank.

Care should be used in comparing dollar figures from one year to another. Most dollar figures are in nominal (or unadjusted) dollars. Some, however, are in real dollars, meaning they have been adjusted for inflation. Adjusted figures are usually identified in the headline of the table and are always identified in the notes, which indicate the chained dollars to which they were adjusted.
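As a rough illustration of why nominal and real figures should not be mixed, the sketch below restates a nominal amount in base-year dollars using a simple price-index ratio. The function name, the index values, and the ratio method itself are assumptions made for illustration; the notes for each data series describe the adjustment the source actually applied, which may differ.

```python
def to_real_dollars(nominal, index_for_year, index_for_base_year):
    """Express a nominal amount in base-year dollars via a simple index ratio."""
    if index_for_year <= 0 or index_for_base_year <= 0:
        raise ValueError("price index values must be positive")
    return nominal * index_for_base_year / index_for_year


# Hypothetical index values, made up for illustration only (base year = 100).
hypothetical_index = {2005: 100.0, 2013: 125.0}

nominal_2013 = 50_000.0
real_2013 = to_real_dollars(
    nominal_2013, hypothetical_index[2013], hypothetical_index[2005]
)
print(real_2013)  # 40000.0
```

Under these made-up index values, a nominal 2013 figure of 50,000 corresponds to 40,000 in 2005 dollars, so comparing it directly with an unadjusted 2005 figure would overstate the change.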

For ease of use, we have kept years of data in the same "data series" when the methodology has changed subtly. We encourage users, when making inferences over time, to look at footnotes for any such subtle changes. If there was a major change in methodology, we have sometimes broken these into separate data series.

All statistics are an approximation to a certain extent. The larger the quantity being counted, the more likely that some data are missed or that the figure changes rapidly. These figures are the best available data for the specific point in time at which they were collected. Care should always be used in comparing data from one year to another, but especially when data from different sources are compared. These figures can reveal important and informative trends, but all figures require a thorough understanding of the original source, methods, and limitations.

Another common difference that users should be aware of is in the death rate figures. Many are listed as "age-adjusted," while others are straight raw rates. These figures are explained in their footnotes, but one should not be compared to the other without an understanding of the differences.
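To show how the two kinds of rates can diverge, the sketch below computes a raw rate and an age-adjusted rate for the same data using direct standardization, one common adjustment method. The age groups, counts, and standard-population weights are hypothetical, and a given source may use a different method or standard population; the footnotes for each data series are the authority.

```python
def crude_rate(deaths, population, per=100_000):
    """Raw rate: total deaths divided by total population."""
    return sum(deaths) / sum(population) * per


def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Direct standardization: weight each age group's rate by a standard population share."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    weighted = sum(
        w * d / p for d, p, w in zip(deaths, population, standard_weights)
    )
    return weighted * per


# Hypothetical location with two age groups (younger, older).
deaths = [10, 90]
population = [10_000, 10_000]
standard_weights = [0.8, 0.2]  # hypothetical standard population shares

print(round(crude_rate(deaths, population), 1))
print(round(age_adjusted_rate(deaths, population, standard_weights), 1))
```

Because this hypothetical location is older than the standard population, its raw rate comes out well above its age-adjusted rate, which is why the two should not be compared directly.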

Median Value is included to give users a basis for comparing any particular location in question to an "average" location for this metric. The median value is calculated only for areas reporting data, which means it is the midpoint of locations with data, not all locations. The median value often gives a good reflection of how the data changes over time, but it should be noted that (a) changes in the number of locations reporting data could affect the median and (b) the median is sometimes 0, which indicates that more than half of the locations reporting data reported zeroes.
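The effect of excluding non-reporting locations can be sketched as follows. This is a minimal illustration, not the site's actual calculation: the function name and the county figures are hypothetical, and it assumes missing values arrive as "NA" or a blank, as described above.

```python
from statistics import median


def median_of_reporting(values):
    """Median over locations that reported a figure; "NA" and blanks are skipped."""
    reported = []
    for raw in values.values():
        text = "" if raw is None else str(raw).strip()
        if text == "" or text.upper() == "NA":
            continue  # location did not report, so it is left out of the midpoint
        reported.append(float(text.replace(",", "")))
    return median(reported) if reported else None


# Hypothetical county figures for one data series and year.
counts = {
    "County A": "0",
    "County B": "0",
    "County C": "0",
    "County D": "12",
    "County E": "NA",
    "County F": "",
}
print(median_of_reporting(counts))  # 0.0
```

Here four of six locations report data and three of those four report zero, so the median is 0 even though one location reports 12. If County E later began reporting a large figure, the median could shift without any change in the other locations.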


About Place Codes

For metro area, county, city, ZIP Code, and ZCTA statistics, users will find a column labeled “Place Code” in the table view and in downloaded files. For the most part, these numbers refer to standardized place identifiers created by the federal government. Because the codes are standardized, data from SAGE Stats can be more easily compared to data from outside the site. Please consult the following for more details:

  • Metro Areas: Statistics at the metro area-level include five-digit Federal Information Processing Standard (FIPS) codes. Be careful not to confuse these with ZIP Codes.
  • Counties: Statistics at the county-level include five-digit FIPS codes. The first two digits are the state code, with the following three digits representing the county within the state. Be careful not to confuse these with ZIP Codes.
  • Cities: Statistics at the city-level include geographic identifier codes used by the Census Bureau. Codes beginning with “16000” indicate census “places.” Codes beginning with “06000” indicate county subdivisions. These prefixes are followed by further zeroes, “US”, the two-digit state FIPS code, and then the five-digit city identifier. For example, the city of Wilmington, NC, is designated by the following place code: 1600000US3774440. The important digits break down as follows: 16000 [indicates that it is a census place], 37 [North Carolina’s FIPS code], 74440 [Wilmington’s particular place code].
  • ZIP Codes and ZIP Code Tabulation Areas (ZCTA): Values in the “Place Code” column are the ZIP Codes/ZCTAs themselves.
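The code layouts described above can be split apart programmatically, which is useful when joining downloaded files to outside data. The sketch below is illustrative only: the function names are hypothetical, and it assumes county codes may have lost leading zeroes in a spreadsheet and that city codes always contain the literal “US” separator.

```python
def split_county_fips(code):
    """Split a five-digit county FIPS code into (state, county) parts."""
    code = str(code).zfill(5)  # restore leading zeroes dropped by spreadsheets
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"not a five-digit FIPS code: {code!r}")
    return code[:2], code[2:]


def split_city_code(code):
    """Split a city Place Code into geography type, state FIPS, and local identifier."""
    prefix, sep, rest = str(code).partition("US")
    if not sep or len(prefix) < 5 or len(rest) < 3:
        raise ValueError(f"unrecognized place code: {code!r}")
    kinds = {"16000": "census place", "06000": "county subdivision"}
    return {
        "kind": kinds.get(prefix[:5], "unknown"),
        "state_fips": rest[:2],
        "local_id": rest[2:],
    }


print(split_city_code("1600000US3774440"))
# {'kind': 'census place', 'state_fips': '37', 'local_id': '74440'}
print(split_county_fips("37129"))  # ('37', '129')
```

Keeping the codes as text rather than numbers avoids the most common pitfall, which is a leading zero being silently dropped from a state code such as “01”.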

About States

What you need to know:

  • All fifty U.S. states and the District of Columbia are included at the state level; hence, the ranks given are 1-51 rather than 1-50.
  • Two-digit state labels appear on the national map; where they could not be read due to the small area of some East Coast states, the labels stand off to the right or above with lines pointing to the state being referenced and terminating in a dot within the state indicated.
  • On the map, the District of Columbia appears to the right of the East Coast in order to make it easier to find.
  • U.S. state regional assignments are based on the four census regions defined by the Census Bureau: Northeast, Midwest, South, and West. Census divisions are not included on SAGE Stats.

About Cities

What you need to know:

  • Do not include all cities/towns in the United States.
  • Include approximately 1700 “Core Cities” that are fully featured on the site, with maps and location pages.
  • Are represented as dots on the map (non-Core Cities will not appear on the map).
  • Usually have a Census 2010 population over 75,000 or are a principal city of a Metro Area.
  • Sometimes are the location of major universities or military bases.
  • Include incorporated places, Census-designated places, and some county subdivisions where the latter function like cities.
  • Use the geographic identifier codes employed by the Census Bureau. Codes beginning with “16000” indicate census “places.” Codes beginning with “06000” indicate county subdivisions. Sorting by Place Code is equivalent to sorting by type (subdivision or place), then alphabetically by state and by city name.

Two words of warning:

  • Take into account the different population densities as well as populations of cities when comparing them; filtering tables by population and population densities can help.
  • While some data are collected at the city level, much more data may be available on a particular topic at the metro area or county level. Check a city’s page for the associated counties and metro areas, or use the ZIP code finder.

About Counties

What you need to know:

  • Include all current counties and county equivalents (such as parishes in Louisiana) in the 50 United States.
  • Also include combined counties and county equivalents or independent cities, as defined by the Bureau of Economic Analysis (for example, Maui County, HI, which is combined with Kalawao County, HI). These combined counties are used exclusively for data provided by Woods & Poole in order to maintain geographic consistency across historical and projected years. A complete list is available here.
  • Have their own location pages, which include such data as neighboring counties, similar counties, and suggested data series.
  • Include data for historic counties (such as Clifton Forge, Virginia), but these will not appear on the map or location pages.
  • Can be filtered by the USDA Economic Research Service’s “county typology codes,” which classify counties in such ways as “Persistent Poverty County.”
  • Vary widely in terms of land area and population size, which can cause certain counties (such as Los Angeles County) to rise to the top of many data series.
  • Use five-digit FIPS (Federal Information Processing Standards) codes. The first two digits are the state code, with the following three digits representing the county within the state. Sorting by Place Code is equivalent to sorting alphabetically first by state, then by county.

One word of warning:

Take into account the population densities when comparing counties; filtering tables by population and population densities can help.


About Metro Areas

What you need to know:

  • Defined by the Office of Management and Budget (OMB).
  • Named after the largest city or cities at the center (up to three cities in a name).
  • Surround an urban area of 10,000 people or more.
  • Metro area boundaries are updated roughly every 10 years and may change names even more frequently. Interpret time trends with caution.
  • Identified by Federal Information Processing Standard (FIPS) codes also used by the OMB.
  • Each has its own location page, which you may find in the Location browse.
  • Sorting the data table by Place Code is equivalent to sorting Metro Areas alphabetically by largest city name.
  • Breaks in the line graphs indicate possible changes in the defined boundaries of the Metro Area.

Two words of warning:

  • Metro Area definitions change over time! Use the Location page to look at the changing boundaries.
  • The “year” of a Metro Area definition may not match the year of data. For instance, a source may retroactively apply the Metro Area boundaries from 2013 to historic data collected in the 1980s. In addition to the data year on the timebar, check the boundary year in the “Source” tab (it’s listed after “Uses Geographic Definitions from”).

Frequently Asked Questions about Metro Areas:

Which Metro Areas are included in SAGE Stats? A list is available via the Location browse. We include all basic Metro Areas as currently defined by the Office of Management and Budget (OMB) for cities in the 50 U.S. states, as well as historic Metro Areas for which we have data. We do not include Metro Areas in Puerto Rico or other U.S. territories, or in other countries.

How do I find data for a Metro Area in SAGE Stats? You have two options. You can travel to a Metro Area Location page and click on “Find Data Series.” This list is organized by topic and category/subcategory, and it will display only data series with data for that specific Metro Area. Alternatively, you can travel to a data series page in which you are interested (for example, Murders (Metro Area)) and see if your preferred Metro Area is included.

How do I find a Metro Area Location page? You have several options: search for your ZIP code, city, county, or state to identify the appropriate metro area, or use the Location browse option, which organizes geographies by place type (State, County, Metro Area, City, and ZIP Code) and then by state. If a Metro Area is in more than one state, such as the Chattanooga, TN-GA Metro Area, you will find it under both states.

How do I compare Metro Areas? You may compare metro areas directly on the U.S. map by hovering over the first selected area to generate a data label, single clicking to make it stationary, and then repeating the process for a second, third, or more metro areas. The “Compare” tab also allows a Table and Chart comparison among metro areas across all available years. The default chart view shows the “National Median,” the midpoint of the Metro Area data (does not include Metro Areas without data for that data series).

To compare multiple Metro Areas within a county, choose “Compare Metro Areas.” To compare all Metro Areas across two data series, choose “Compare Data Series: Scatterplot.” This will also give you all Metro Areas for the two data series in the table view. To compare up to six data series, choose “Compare Data Series: Line Graph.” This compare feature requires you to choose one place for the line graph (we obviously couldn’t have 300 lines for each data series), but the corresponding table view displays all Metro Areas for all six data series.

Understanding Metro Area Basics:

What is a Metro Area, officially? In general, a “metro area” is a group of whole counties that contain, or whose residents commute into, one or more core urban cities. However, there are multiple levels of metro areas as defined by the Office of Management and Budget (OMB). In SAGE Stats, we refer to all the levels below as “metro areas” in order to simplify the terminology.

Note: Metropolitan and Micropolitan Statistical Areas are types of Core Based Statistical Areas.

Type of Metro Area | Urban Area Population Requirements | Geographic Building Blocks | Obsolete?
Core Based Statistical Area (CBSA) types:
Metropolitan Stat. Area (MSA) | 50,000 | Counties | No
Micropolitan Stat. Area (µSA) | at least 10,000, less than 50,000 | Counties | No
Metropolitan Division (M.D.) | 2.5 million | Counties within MSAs | No
Combined Statistical Area (CSA) | 10,000 | Two or more CBSAs | No
New England City and Town Area (NECTA) | 10,000 | Cities and towns | No
New England City and Town Area Division (NECTAD) | 2.5 million | Cities or towns within NECTAs | No
Combined New England City and Town Areas (CNECTA) | 10,000 | Two or more NECTAs | No
Consolidated Metropolitan Statistical Areas (CMSA) | 1 million | Two or more PMSAs | Yes
Primary Metropolitan Statistical Areas (PMSA) | 1 million | Counties | Yes

How many cities can be within a Metro Area? The name of a Metro Area can include only up to three cities, but more cities can be within the Metro Area. To find additional cities, look at the Associated Cities section of a Metro Area’s location page. When two large cities are nearby (e.g., Baltimore and Washington), the OMB uses several “levels” of Metro Areas to interrelate them.

Where does SAGE Stats get its Metro Areas? Primarily, from the Office of Management and Budget (OMB), which defines these areas for the use of statistical agencies. Definitions have changed over time, so please use caution in interpreting trends in Metro Areas. Because of the many different levels of metro areas the OMB uses, SAGE Stats uses the generic label “Metro Area.”

Do Metro Areas change over time? Yes, metro area boundaries are updated every 10 years. For a detailed explanation, see our “Changes Over Time” section. Here’s the shorter answer: If you go to a Metro Area location page, you can see how a Metro Area has changed boundaries. You’ll notice that some Metro Areas in SAGE Stats have separate maps for 1950-1992, 1993-2002, 2003-2012, and 2013-present; these represent major boundary changes in Metro Areas. The most recent update was in 2013, but the largest change occurred in 2003, when the old system of four-digit place codes was eliminated and replaced with a five-digit place code system.

How big are Metro Areas? Metro Areas contain either a core “urban area” of 50,000 or more population (officially known as Metropolitan Statistical Areas or MSAs) or a core “urban area” of at least 10,000 but fewer than 50,000 (officially known as Micropolitan Statistical Areas or µSAs; µ is the Greek letter mu, pronounced “mew”). Both MSAs and µSAs are called “Metro Areas” in SAGE Stats, because they are more similar than dissimilar.

Why do some Metro Areas end with (end 2012)? When a Metro Area goes defunct, with no sufficiently similar Metro Area to take its place, we mark the year its boundary definition ended. You may notice that many of these defunct Metro Areas include “PMSA” or “CMSA” in their names – these two types of Metro Areas are no longer used by the OMB.

What are PMSAs and CMSAs? The short answer is Primary Metropolitan Statistical Areas and Consolidated Metropolitan Statistical Areas. These were two levels of Metro Areas that existed from 1981 to 2003. They are now obsolete and no longer used by the government to collect statistical data; however, they are included in SAGE Stats when the data is available.

Changes over Time in Metro Areas:

Do Metro Areas change over time? Yes. The OMB has released new Metro Area definitions in 2013, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 1999, 1993, 1990, 1983, 1981, 1973, 1971, 1963, 1960, and 1950. Bold and italics indicate major updates, which include boundary changes that are reflected in SAGE Stats. The other updates generally include name changes, code changes, and the creation of new areas. The largest change occurred in 2003, when the old system of four-digit place codes was eliminated and replaced with a five-digit place code system. Every Metro Area code changed from 2002 to 2003, without exception, and many new Metro Areas came into being with the advent of Micropolitan Statistical Areas (µSAs).

How do Metro Areas change over time? As far as their boundaries are concerned, Metro Areas can gain or lose counties. Sometimes new Metro Areas are born and occasionally they are also eliminated. Most changes occur three years after the decennial census (the most recent update was released in 2013). However, the names and codes can change year to year based on the relative population of the largest cities.

Why do Metro Areas change over time? Because Metro Areas are based on commuting patterns, the OMB changes their definitions as workers change their commutes. Furthermore, as populations change, more cities and towns meet the definition of having a Metro Area. While it would be easier to look at time-series data without these changes, these changes reflect what is happening.

What Metro Area definitions does SAGE Stats use for its data? It depends on the data source. Some data series have Metro Area boundaries that change with the data. Other data series apply the present 2013 Metro Area boundaries to re-calculate all of their historical data. You can find out what geographic boundaries are being used for a particular year of data by going to the “Source” tab and looking for the words “Uses Geographic Boundaries from.” This geographic boundaries year applies only to a single year of the data; you can change which year is displayed using the year dropdown in the notes or the timebar.

How do changing Metro Areas show up in SAGE Stats? On the “Chart” view, we have “broken” the line graph view to show when boundaries might have changed. Changing metro boundaries are also viewable on the Location page for that specific metro area. Because we believe there is value in seeing how a Metro Area has changed over time, we have linked all Metro Areas that can be linked. Sometimes this is not possible, and some Metro Areas have become defunct and are marked with (end YEAR), such as (end 2012). But a linked set of Metro Areas may have different boundaries and therefore large changes in various data measures.

Special Cases and More Details about Metro Areas:

What is a HUD FMR? A Department of Housing and Urban Development (HUD) Fair Market Rent (FMR) area is an alternative Metro Area designed to capture differences in cost of living that feed differences in HUD subsidies for low-income households. These “Metro Areas” do not necessarily follow the same definitions as standard Metro Areas; in fact, some can be individual counties. In order to include these data, we have added HUD FMR areas as “hidden” Metro Areas; they appear in the table and chart view of certain HUD data series, but not in the Location browse or on the map.

Metro Areas within Metro Areas. Or CBSA, MSA, µSA, CSA, M.D., PMSA, CMSA—what does it all mean? First, any Metro Area in SAGE Stats is treated as a “Metro Area;” the only difference is that the name may include a qualifier like PMSA, CMSA, or M.D., and that some unique Metro Area types may not have maps.

Part 1, current definitions: Core-Based Statistical Area (CBSA) is a blanket term for the two classifications of Metropolitan Statistical Area (MSA) and Micropolitan Statistical Area (µSA). MSAs and µSAs are functionally equivalent; MSAs just have larger urban area cores than µSAs. For SAGE Stats, we have opted for “Metro Area” as the catch-all term rather than CBSA, both because it is more intuitive and because the term “CBSA” came into use only in 2003 and does not refer to Metro Areas prior to the overhaul of the system instituted in 2003.

For the most part, SAGE Stats includes only MSAs. You may notice some data series that include M.D.s (notably, Metro Area crime), but these M.D.s will not appear on the map; they will appear only in the table and chart views. SAGE Stats does not presently include data for CSAs.

Part 2, past definitions: From 1983 to 2002, there was a different structure in place. Some Metro Areas were simply Metropolitan Statistical Areas. Others had two levels: Primary Metropolitan Statistical Areas (PMSAs) and Consolidated Metropolitan Statistical Areas (CMSAs). All PMSAs were included within CMSAs, and all CMSAs were made up of PMSAs.

Prior to 2003, Metro Areas in New England had town-specific boundaries that cut across counties. These county-crossing boundaries apply to MSAs, PMSAs, and CMSAs (see above for more detail). Because pre-2003 Metro Area boundaries did not conform to county boundaries, county-level data is insufficient to generate Metro Area data for the entire country for those older definitions.

In what Metro Area does Pawnee, IN fall? Pawnee is the setting of the faux documentary television show “Parks and Recreation.” Its fictional county, Wamapoke County, is not listed in any Metro Area boundaries. However, with a population of 79,218, it is of similar size to Muncie, IN, which has its own Metro Area entitled “Muncie, IN,” consisting solely of Delaware County, IN. The only other town consistently mentioned on the show is Eagleton, IN, which has a population of only 9,500 and is therefore not large enough to appear in the Metro Area name. Therefore, we can conclude that the Metro Area would most likely be the Pawnee, IN Metropolitan Statistical Area, consisting solely of Wamapoke County. Its Metro Area code would have to fall between 37700 (Pascagoula, MS Metropolitan Statistical Area) and 37740 (Payson, AZ Micropolitan Statistical Area). So, we will say the Pawnee, IN CBSA code would be 37720. We can only guess where Pawnee, IN would rank among Metro Areas, but we do know that the Muncie, IN Metro Area had an estimated median annual wage of $30,010 in 2013 (#292 of the 395 Metro Areas listed), a one-bedroom fair market rent in 2015 of $510.00 (443 of 523), and only 1 occupational fatality in 2011, the lowest total among the 294 Metro Areas reporting data. All of these measures are available on the Location Page under “Suggested Data Series.”


About ZIP Codes and ZCTAs

What you need to know:

ZIP Codes vs. ZCTA Codes

ZIP Codes | ZCTA Codes
Over 42,000 ZIP codes | Over 32,000 ZCTA codes
Created by the U.S. Postal Service | Created by the U.S. Census Bureau
Point-based geography (addresses) | Areal-based geography
Do not have geographic boundaries | Do have geographic boundaries
Based on mail service routes | Based on the most frequently occurring ZIP code within a group of aggregated census blocks
Updated continuously | Updated every decade after the decennial census

The majority of assigned ZCTA codes are the same as the ZIP code. The first available ZCTA codes were developed in 2000 for the decennial census and were updated in 2010. The 2010 ZCTAs differ from the 2000 ZCTAs in three ways:

  1. 2010 ZCTAs do not cover 100% of the country.
  2. 2010 ZCTAs are all 5 digits.
  3. Large areas of unpopulated land and water bodies (such as national parks or lakes) do not have an assigned ZCTA code.

ZIP and ZCTA Codes in Business Stats

  • More than 40,700 ZCTA and ZIP codes are available.
  • Each has an individual location page that includes additional geography data such as neighboring ZIP codes and surrounding counties and metro areas.
  • These codes are as of 2015 and are provided by GreatData.com, a group that specializes in manually verifying all U.S. Postal ZIP codes and removing any duplicate codes. They are updated on an annual basis.
  • ZIP codes are depicted as dots on our data series maps to better represent the population center of a ZIP code area.
  • ZIP code map rendering allows for a detailed view of the data while also providing an approximation of the ZCTA geography.
  • Map displays are a snapshot of all ZIP codes in 2015. Maps do not change continually.
  • The ZIP and ZCTA geographic level is identified in every data series title. For example, “Total Number of Establishments (ZIP)” contains ZIP-level data. “Percent of Unemployed Persons (Five Year Average) (ZCTA)” contains ZCTA-level data.

Two words of warning:

  • ZIP and ZCTA codes are not directly comparable. Remember that a single ZCTA may contain multiple ZIP codes – by comparing ZIP and ZCTA codes to one another you may be inadvertently comparing the same data to itself.
  • Consider the population sizes of certain areas before comparing ZIP or ZCTA codes. Filtering by population size may help you identify geographies with similar sample sizes.

About Data from Woods & Poole Economics, Inc.

Woods & Poole Economics has been providing demographic and economic projections for decades. All of their variables date back to at least the early 1990s, with some going as far back as 1969, and are projected out to 2050. In SAGE Business Stats, their statistics are available at the state-, metropolitan statistical area-, and county-level.

Given the specialized nature of these statistics, comprehensive methodological documentation is available here: Woods & Poole Methodology Notes for 2016 Projections.


About the North American Industry Classification System (NAICS) Codes

What you need to know:

  • System by which the industry of a business establishment is classified based on its primary business activity.
  • Created and adopted in 1997 by the U.S., Canada, and Mexico.
  • Reviewed every five years since its adoption to keep up with changes in the business economy (e.g., emerging industries). NAICS codes were updated and revised in 2002, 2007, and 2012. The next revision year is 2017.
  • Specifically designed to allow for comparability among the business economies in the U.S., Canada, and Mexico.
  • NAICS is a 2- through 6-digit hierarchical classification system, offering five levels of detail. Each digit in the code is part of a series of progressively narrower categories: the more digits in the code, the greater the classification detail.
    • 2-digit NAICS indicates the industry sector
    • 3-digit NAICS indicates the industry subsector
    • 4-digit NAICS indicates the industry group
    • 5-digit NAICS indicates the industry
    • 6-digit NAICS indicates country-specific industries
    • Note: the 6-digit NAICS codes are not standardized across all three countries
NAICS Sector | Sector Description
11 | Agriculture, Forestry, Fishing and Hunting
21 | Mining, Quarrying, and Oil and Gas Extraction
22 | Utilities
23 | Construction
31-33 | Manufacturing
42 | Wholesale Trade
44-45 | Retail Trade
48-49 | Transportation and Warehousing
51 | Information
52 | Finance and Insurance
53 | Real Estate and Rental and Leasing
54 | Professional, Scientific, and Technical Services
55 | Management of Companies and Enterprises
56 | Administrative and Support and Waste Management and Remediation Services
61 | Educational Services
62 | Health Care and Social Assistance
71 | Arts, Entertainment, and Recreation
72 | Accommodation and Food Services
81 | Other Services (except Public Administration)
92 | Public Administration
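Because the hierarchy is encoded in the digits themselves, the level of a NAICS code and its broader parent codes can be read directly from the code. The sketch below is a minimal illustration with a hypothetical function name; it only inspects the length and leading digits, and it does not check that a code actually exists in any NAICS edition.

```python
NAICS_LEVELS = {
    2: "industry sector",
    3: "industry subsector",
    4: "industry group",
    5: "industry",
    6: "country-specific industry",
}

# Two-digit prefixes that share one sector row in the table above.
COMBINED_SECTORS = {
    "31": "31-33", "32": "31-33", "33": "31-33",
    "44": "44-45", "45": "44-45",
    "48": "48-49", "49": "48-49",
}


def describe_naics(code):
    """Return the level, sector, and broader parent codes for a NAICS code."""
    code = str(code).strip()
    if not code.isdigit() or len(code) not in NAICS_LEVELS:
        raise ValueError(f"expected a 2- to 6-digit NAICS code, got {code!r}")
    prefix = code[:2]
    return {
        "level": NAICS_LEVELS[len(code)],
        "sector": COMBINED_SECTORS.get(prefix, prefix),
        "parents": [code[:n] for n in range(2, len(code))],
    }


print(describe_naics("4451"))
# {'level': 'industry group', 'sector': '44-45', 'parents': ['44', '445']}
```

Truncating a code to fewer digits is therefore a simple way to roll detailed data up to a broader level, provided all the data use the same NAICS edition.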

Why are there five different levels of industries? That sounds complicated. The 2- to 6-digit hierarchy is organized by level of industry detail – the longer the code, the more specific the industry classification. The 2-digit level represents the broadest industry classification and the 6-digit level offers the most specific classification.

What do you mean by “establishment”? “Establishment” is the actual physical location of a business. Businesses with more than one establishment (e.g. Apple) are assigned a NAICS code for each domestic establishment based on that establishment’s primary business activity.

How is the “primary business activity” of an establishment determined? Primary business activity is determined by the establishment’s production costs and/or capital investment. The establishment’s revenue and value of goods may also be used as determining factors.

What about U.S. companies that have business locations abroad? Only U.S. domestic establishments are assigned a NAICS code; if a company has domestic and foreign establishments, only the domestic establishments are assigned a NAICS code.

Exactly who or what assigns a NAICS code to a business establishment? There is not one federal agency that is solely responsible for assigning NAICS codes. The U.S. Census Bureau and other federal agencies record and assign NAICS codes for their own internal purposes.

More questions? For more detail, visit the Census Bureau.

Two words of warning:

  • Be careful when comparing data from recent and past years – sometimes the indicated NAICS code for a data series will change, usually with the year in which the system was last updated. For instance, 2009 revenue data may use the 2007 NAICS codes; however, 2013 revenue data will most likely use the more recent 2012 NAICS codes.
  • The 6-digit NAICS codes are not standard or comparable across the U.S., Canada, and Mexico. There is only comparability in definition and code for the 2- to 5-digit NAICS codes.
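The first warning can be turned into a quick screening check before comparing two years of a data series. The sketch below guesses the NAICS edition from the data year using the adoption and revision years listed above. This is an assumption for illustration only: sources do not always adopt a new edition immediately, so the notes for each data series remain the authority.

```python
from bisect import bisect_right

# Adoption year plus the revision years named in this section.
NAICS_REVISIONS = [1997, 2002, 2007, 2012]


def likely_naics_edition(data_year):
    """Most recent NAICS edition at or before data_year (a guess, not a guarantee)."""
    position = bisect_right(NAICS_REVISIONS, data_year)
    if position == 0:
        raise ValueError(f"{data_year} predates the adoption of NAICS")
    return NAICS_REVISIONS[position - 1]


def may_share_edition(year_a, year_b):
    """True when two data years fall under the same guessed edition."""
    return likely_naics_edition(year_a) == likely_naics_edition(year_b)


print(likely_naics_edition(2009))      # 2007
print(likely_naics_edition(2013))      # 2012
print(may_share_edition(2009, 2013))   # False
```

A False result is a prompt to read the footnotes for both years before drawing conclusions from the comparison.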

About the Standard Occupational Classification System (SOC) Codes

What you need to know:

  • The SOC is a system by which the paid jobs of workers are classified by occupation.
  • It was created in 1980 and revised in 2000 and 2010. The next revision is scheduled for 2018.
  • The U.S. Bureau of Labor Statistics and the U.S. Census Bureau are charged with collecting and reporting data on total U.S. employment across the full spectrum of SOC major groups, which is then coded by federal statistical agencies. No one agency is responsible for assigning SOC codes.
  • It is used by federal statistical agencies for the purpose of collecting, calculating, analyzing or disseminating data.
  • SOC is a 2- through 6-digit hierarchical classification system, offering four levels of detail. Each digit in the code is part of a series of progressively narrower categories: the more digits in the code, the greater the classification detail.
    • The first two digits indicate major occupation group
    • The next two digits indicate broad employment title
    • The fifth digit indicates sub employment type
    • The sixth digit indicates detailed occupation
SOC Major Occupation Group | Group Description
11 | Management Occupations
13 | Business and Financial Operations Occupations
15 | Computer and Mathematical Occupations
17 | Architecture and Engineering Occupations
19 | Life, Physical and Social Science Occupations
21 | Community and Social Service Occupations
23 | Legal Occupations
25 | Education, Training and Library Occupations
27 | Arts, Design, Entertainment, Sports and Media Occupations
29 | Healthcare Practitioners and Technical Occupations
31 | Healthcare Support Occupations
33 | Protective Service Occupations
35 | Food Preparation and Serving Related Occupations
37 | Building and Grounds Cleaning and Maintenance Occupations
39 | Personal Care and Service Occupations
41 | Sales and Related Occupations
43 | Office and Administrative Support Occupations
45 | Farming, Fishing and Forestry Occupations
47 | Construction and Extraction Occupations
49 | Installation, Maintenance and Repair Occupations
51 | Production Occupations
53 | Transportation and Material Moving Occupations
55 | Military Specific Occupations
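
Because the first two digits of any SOC code identify its major group, detailed figures can be rolled up to the major group level by code prefix. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, codes are assumed to be written with an optional hyphen after the second digit, the sample codes and employment figures are invented for illustration, and only three rows of the table are included.

```python
from collections import defaultdict

# A few rows from the major group table; extend with the remaining rows as needed.
MAJOR_GROUPS = {
    "11": "Management Occupations",
    "15": "Computer and Mathematical Occupations",
    "29": "Healthcare Practitioners and Technical Occupations",
}


def major_group(soc_code: str) -> str:
    """Return the 2-digit major group prefix of a 2- to 6-digit SOC code."""
    digits = soc_code.replace("-", "").strip()
    if not digits.isdigit() or not 2 <= len(digits) <= 6:
        raise ValueError(f"not a 2- to 6-digit SOC code: {soc_code!r}")
    return digits[:2]


def employment_by_major_group(rows):
    """Sum (soc_code, employment) pairs by major group."""
    totals = defaultdict(int)
    for soc_code, employment in rows:
        totals[major_group(soc_code)] += employment
    return dict(totals)


# Invented figures, for illustration only.
sample = [("15-1132", 100), ("15-1121", 50), ("11-1011", 10)]
for group, total in employment_by_major_group(sample).items():
    print(group, MAJOR_GROUPS.get(group, "(not in the excerpt above)"), total)
```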

Why are there four different levels of occupations? That sounds complicated. The 2- to 6-digit hierarchy is organized by level of employment detail: the longer the code, the more specific the employment classification. The 2-digit level represents the broadest employment classification and the 6-digit level offers the most specific classification.

What do you mean by employment or occupation? ‘Employment’ or ‘Occupation’ solely refers to the paid work of an individual pertaining to their trade or profession. Thus, the SOC classification excludes occupations unique to volunteers.

How is the “primary occupational activity” of an individual determined? Occupations are classified based on work performed and, in some cases, on the skills, education, and/or training needed to perform the work at a competent level.

What if an individual’s job can be classified into more than one occupation? When workers with a single job could be coded in more than one occupation classification, they are classified in the occupation that requires the highest level of skill. If there is no measurable difference in skill requirements, workers are coded in the occupation in which they spend the most time.
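
The two-step rule above (highest skill first, then most time spent) can be expressed as a single ordering. This is a minimal sketch of that rule only, not the procedure any agency actually runs: the `Candidate` type, the integer skill scale, and the occupation labels are all hypothetical, and "no measurable difference in skill" is modeled simply as equal skill values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    occupation: str   # label of a possible occupation classification
    skill_level: int  # hypothetical ordinal scale; higher means more skill required
    hours: float      # time the worker spends on this occupation's tasks


def classify(candidates: list[Candidate]) -> str:
    """Pick the occupation with the highest skill level, breaking ties by time spent."""
    if not candidates:
        raise ValueError("at least one candidate occupation is required")
    # Tuples compare element by element, so hours only matter when skill levels are equal.
    best = max(candidates, key=lambda c: (c.skill_level, c.hours))
    return best.occupation


print(classify([Candidate("occupation A", 2, 30.0), Candidate("occupation B", 3, 10.0)]))
print(classify([Candidate("occupation A", 3, 30.0), Candidate("occupation B", 3, 10.0)]))
```

In the first call the higher-skill occupation is chosen even though less time is spent on it; in the second call the skill levels are equal, so time spent decides.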

What is the difference between an occupation and a job? An occupation is a category of jobs that are similar with respect to the work performed and the skills possessed by the incumbents. A job is the specific set of tasks performed by an individual worker.

Words of warning:

  • Be careful when comparing data from recent and past years – sometimes the indicated SOC code will change. The 2010 revision introduced new occupational categories and hence the SOC codes for the years 2000 to 2009 are different from the SOC codes for 2010 and onwards.
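
A simple guard can flag comparisons that cross the 2010 revision. The sketch below encodes only the year ranges stated in the warning above; the function names are hypothetical, later revisions are not modeled, and an individual data series may have adopted a revision in a different year, so the notes for each series remain the authority.

```python
def soc_revision(year: int) -> int:
    """Return the SOC revision assumed to apply to a data year (2000 onwards)."""
    if year < 2000:
        raise ValueError("years before 2000 are outside the range covered by this sketch")
    # Per the warning above: 2000 to 2009 use the 2000 codes, 2010 onwards the 2010 codes.
    return 2000 if year <= 2009 else 2010


def same_soc_revision(year_a: int, year_b: int) -> bool:
    """True when both data years fall under the same SOC revision."""
    return soc_revision(year_a) == soc_revision(year_b)


print(same_soc_revision(2005, 2008))  # both under the 2000 revision
print(same_soc_revision(2009, 2010))  # crosses the 2010 revision
```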

About the Cooperative Patent Classification (CPC) System

What you need to know:

  • The CPC is a system for classifying patents according to industry.
  • It was developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO) and has been in use since 2013.
  • It harmonizes the work of the EPO and USPTO and is more detailed than the International Patent Classification (IPC) to improve patent searching.
  • A CPC code is made up of the following parts:
    • Section, indicated by the first letter (A through H, and Y).
    • Class, indicated by the next two numerical digits.
    • Subclass, indicated by the next letter.
    • Main group, indicated by the next one to three digits.
    • Sub group, indicated by the next two to six numerical digits after an oblique stroke (/).
CPC Section | Section Title | Description
A | Human Necessities | This section covers patents to do with agriculture, foodstuffs and tobacco, personal or domestic articles, health and amusement.
B | Operations and Transport | This section covers patents to do with separating, mixing, shaping, printing and transporting materials as well as microstructural technology and nanotechnology.
C | Chemistry and Metallurgy | This section covers patents to do with chemistry, i.e. the production of chemicals and compounds, as well as metallurgy (the production of metals and alloys).
D | Textiles and Paper | This section covers patents to do with paper making, production of cellulose, yarns, woven fabrics and looms.
E | Fixed Constructions | This section covers patents to do with building, earth drilling and mining.
F | Mechanical Engineering, Lighting, Heating, Weapons and Blasting | This section covers patents to do with engines or pumps, engineering in general, lighting, heating, weapons and blasting (fireworks and the explosive composition of ammunition).
G | Physics | This section covers patents to do with instruments (any measuring tool) and nucleonics (nuclear physics and engineering).
H | Electricity | This section covers patents to do with basic electric elements, generation, conversion or distribution of electric power, basic electronic circuitry, electric communication technique and electric technique not otherwise provided for.
Y | Emerging Cross-Sectional Technologies | This section is for general tagging of new technological developments and cross-sectional technologies spanning several sections of the IPC.
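
The code structure described above (section letter, two-digit class, subclass letter, main group, and sub group after the oblique stroke) maps naturally onto a pattern match. The following is a minimal sketch under stated assumptions: the names `CpcCode` and `parse_cpc` are hypothetical, an optional space before the main group is tolerated, and the sample code is used only to show the shape of a CPC symbol.

```python
import re
from typing import NamedTuple


class CpcCode(NamedTuple):
    section: str     # one letter, A through H or Y
    cpc_class: str   # two digits
    subclass: str    # one letter
    main_group: str  # one to three digits
    sub_group: str   # two to six digits after the oblique stroke


# Digit and letter counts follow the parts listed above.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,3})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcCode:
    """Split a full CPC code into its five parts, or raise ValueError."""
    match = _CPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full CPC code: {symbol!r}")
    return CpcCode(*match.groups())


code = parse_cpc("H04L 9/32")
print(code.section, code.cpc_class, code.subclass, code.main_group, code.sub_group)
```

Keeping the parts as strings preserves leading zeros in the class and sub group, which matters when the parts are joined back together or compared against a published scheme.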

How often will a new version of CPC be released and will there be version numbers? It is envisaged that revisions will take place more frequently than is the case with the IPC. The exact frequency of revisions is not yet known, but it is most likely that there will be multiple revisions per year. Each version will have a version date.

Why are there variations in the length of the code? The number of digits in the ‘main group’ and ‘sub group’ classification depends on the level of detail. The longer the code, the more detailed the classification.

How are patents classified? The CPC classifications are prepared by EPO classification experts in each technical field.

Will U.S. patents be classified in the CPC? Yes, both European and U.S. patents will be classified in the CPC to provide one uniform classification system.

Words of warning:

  • The title descriptions in the CPC are revised regularly (five times in 2016), so it is important to check each code against the most recent CPC classification scheme.