This topic describes how to use population census data to build a statistical model. You can use the model to analyze the impact of academic degrees on income based on attributes such as age, job type, and education level.
| Field | Description | Data type |
| --- | --- | --- |
| age | The age of the person. | DOUBLE |
| workclass | The job type of the person. | STRING |
| fnlwgt | The ID of the person. | STRING |
| education | The education level of the person. | STRING |
| education_num | The years of education that the person receives. | DOUBLE |
| maritial_status | The marital status of the person. | STRING |
| occupation | The job of the person. | STRING |
| relationship | The family relationship of the person. | STRING |
| race | The race of the person. | STRING |
| sex | The gender of the person. | STRING |
| capital_gain | The capital gain of the person. | STRING |
| capital_loss | The capital loss of the person. | STRING |
| hours_per_week | The weekly working hours of the person. | DOUBLE |
| native_country | The nationality of the person. | STRING |
| income | The income of the person. | STRING |
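To make the field list concrete, the following minimal Python sketch casts one raw record to the types listed above. It is an illustration only: mapping DOUBLE to `float` and STRING to `str` is an assumption of this sketch, `cast_row` is a hypothetical helper, and the sample record is invented rather than taken from the dataset.

```python
# Field names and types copied from the field table.
# Assumption of this sketch: DOUBLE maps to float and STRING maps to str.
SCHEMA = {
    "age": float,
    "workclass": str,
    "fnlwgt": str,
    "education": str,
    "education_num": float,
    "maritial_status": str,
    "occupation": str,
    "relationship": str,
    "race": str,
    "sex": str,
    "capital_gain": str,
    "capital_loss": str,
    "hours_per_week": float,
    "native_country": str,
    "income": str,
}


def cast_row(raw_values):
    """Cast a list of raw string values to the types declared in SCHEMA."""
    if len(raw_values) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} values, got {len(raw_values)}")
    return {
        name: caster(value.strip())
        for (name, caster), value in zip(SCHEMA.items(), raw_values)
    }


# Hypothetical record, invented for illustration only.
sample = ["41", "Private", "100001", "Doctorate", "16", "Married",
          "Prof-specialty", "Husband", "White", "Male", "0", "0",
          "45", "United-States", ">50K"]
row = cast_row(sample)
print(row["age"], row["education"], row["income"])
```

Keeping the schema in one dictionary means the field order and the type of each field are declared in a single place.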
- Go to the Machine Learning Studio console.
- Log on to the PAI console.
- In the left-side navigation pane, choose .
- On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
- Create an experiment.
- In the left-side navigation pane, click Home.
- In the Templates section, click Create below Population Census.
- In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.

| Parameter | Description |
| --- | --- |
| Name | The name of the experiment. Default value: Population Census. |
| Project | The project in which you want to create the experiment. You cannot change the value of this parameter. |
| Description | The description of the experiment. Default value: Use machine learning algorithms to achieve population census and analyze the correlation between the income and education level. |
| Save To | The directory for storing the experiment. Default value: My Experiments. |
- Click OK.
- Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
- Optional: Click Population Census_XX under My Experiments. My Experiments is the directory that stores the experiment that you created, and Population Census_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically generates for the experiment.
- View the components of the experiment on the canvas. The system automatically creates the experiment based on the preset template.

| Area No. | Description |
| --- | --- |
| 1 | The Data source-Population statistics component reads the dataset from MaxCompute. |
| 2 | The Whole Table Statistics-1, Data Pivoting-1, and Histogram (Multiple Columns)-1 components generate statistical results. Then, you can determine whether the data follows a Poisson distribution or a Gaussian distribution and whether the data is continuous or discrete. Machine Learning Studio can visualize data analysis results. After the experiment is run, right-click Histogram (Multiple Columns)-1 on the canvas and select View Analytics Report to view the distribution of the input data. |
| 3 | The components in this area analyze the impact of academic degrees on income. |
- Data preprocessing
The SQL Script-1 component converts the values of the income field to 0 or 1. 0 indicates an annual income of less than or equal to USD 50,000. 1 indicates an annual income of more than USD 50,000.
- Filtering and mapping
The Filtering and Mapping components divide data into three groups based on the following academic degrees: doctor's degree, master's degree, and bachelor's degree. The Filtering and Mapping components support SQL statements. You can set filter criteria as needed. For example, click Filter-PHD on the canvas. In the right-side Fields Setting pane, set the Filter Criteria parameter to education='Doctorate' to keep only the persons with doctor's degrees.
- Statistical results
The Percentile components calculate the income proportions of persons with each academic degree.
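The three stages above (data preprocessing, filtering, and statistics) can be sketched outside the console as follows. This is a minimal illustration, not the template's actual SQL: it runs on an in-memory SQLite database instead of MaxCompute, the rows are invented toy data, the raw income labels `'>50K'` and `'<=50K'` are an assumption about how the field is encoded, and the table names `census` and `census_labeled` are hypothetical. Only the filter criterion `education = 'Doctorate'` comes from the text.

```python
import sqlite3

# Toy rows invented for illustration; they are not taken from the census dataset.
rows = [
    ("Doctorate", ">50K"),
    ("Doctorate", "<=50K"),
    ("Doctorate", ">50K"),
    ("Doctorate", ">50K"),
    ("Masters", ">50K"),
    ("Masters", "<=50K"),
    ("Bachelors", "<=50K"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (education TEXT, income TEXT)")
conn.executemany("INSERT INTO census VALUES (?, ?)", rows)

# Data preprocessing: map income to 0 (<= USD 50,000) or 1 (> USD 50,000).
# The raw label '>50K' is an assumption of this sketch.
conn.execute(
    """
    CREATE TABLE census_labeled AS
    SELECT education,
           CASE WHEN TRIM(income) = '>50K' THEN 1 ELSE 0 END AS income
    FROM census
    """
)

# Filtering and mapping: keep only one academic degree, as Filter-PHD does.
phd_income = [
    value
    for (value,) in conn.execute(
        "SELECT income FROM census_labeled WHERE education = 'Doctorate'"
    )
]

# Statistics: the share of 1s is the proportion earning more than USD 50,000.
high_income_share = sum(phd_income) / len(phd_income)
print(phd_income, high_income_share)
conn.close()
```

Because the income field is reduced to 0 or 1, the mean of the filtered column is the proportion of high earners in that group.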
- Run the experiment and view the result.
- In the top toolbar of the canvas, click Run.
- After the experiment is run, right-click Percentile-1 on the canvas and select View Analytics Report.
- In the Percentile dialog box, click the icon in the upper-right corner to view the line chart of income distribution for persons with doctor's degrees. As shown in the preceding figure, about 25% of persons with doctor's degrees earn an annual income of less than or equal to USD 50,000. These persons are represented by the points with the value of 0 in the line chart.
Note: You can drag the slider below the line chart to view the entire income distribution for the persons with doctor's degrees.
- Repeat the preceding steps to view the income distributions of persons with master's degrees and bachelor's degrees. The following table shows the aggregate results.

| Academic degree | Proportion of persons with an annual income of more than USD 50,000 |
| --- | --- |
| Doctor's degree | 75% |
| Master's degree | 56% |
| Bachelor's degree | 42% |
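The link between the percentile line chart and the proportions in the table can be illustrated with a short Python sketch. It assumes a nearest-rank percentile definition, which may differ from the method that the Percentile component uses, and it runs on invented labels (25 zeros and 75 ones) chosen to mirror the doctor's degree result. The function `nearest_rank_percentile` is a hypothetical helper.

```python
def nearest_rank_percentile(sorted_values, p):
    """Return the p-th percentile (integer p, 1 to 100) by the nearest-rank method."""
    n = len(sorted_values)
    # Integer ceiling of p * n / 100, clamped to at least 1.
    rank = max(1, (p * n + 99) // 100)
    return sorted_values[rank - 1]


# Toy labels invented for illustration: 0 means <= USD 50,000, 1 means > USD 50,000.
labels = sorted([0] * 25 + [1] * 75)

# Percentile curve: one value for each percentile from 1 to 100.
curve = [nearest_rank_percentile(labels, p) for p in range(1, 101)]

# The fraction of curve points equal to 0 is the share earning <= USD 50,000.
low_share = curve.count(0) / len(curve)
high_share = 1 - low_share
print(f"<= 50K: {low_share:.0%}, > 50K: {high_share:.0%}")
```

For a 0/1 column, the percentile at which the curve switches from 0 to 1 marks the share of persons in the lower income group, and the remainder is the share reported in the table.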