Sunday, August 2, 2020

Association Rule mining with Weka



Data set selection

I selected the Adult data set from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Adult). The donors of this data set are Ronny Kohavi and Barry Becker from Data Mining and Visualization, Silicon Graphics. It is a multivariate data set with 32561 instances and 15 attributes. I selected it to identify and understand the factors that affect a person's income.

 

Data set attribute information

Attribute: information

age: continuous
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous
capital-loss: continuous
hours-per-week: continuous
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
income: >50K, <=50K.

 

 

Data Pre-processing

1. Remove missing values

Before applying any rules, the data set has to be pre-processed. The first step is checking for missing values. In this data set, a few attributes such as workclass, occupation and native-country had missing values. I opened the data set in Microsoft Excel and used Excel's filter functions to remove the rows with missing values (in this case, cells with the value “?”). Filtering removed 2399 instances, so 30162 instances were used for processing.
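The same filtering can also be scripted instead of done in Excel. The sketch below uses Python's standard csv module and assumes the raw UCI file layout (comma-separated values, each possibly preceded by a space, with “?” marking a missing value). The function name drop_missing_rows is hypothetical and the sample rows are made up and shortened for illustration.

```python
import csv
import io

def drop_missing_rows(lines):
    """Keep only rows in which no field is the missing-value marker "?".

    Assumes UCI Adult style input: comma-separated values, possibly
    preceded by a space, with "?" marking a missing value.
    """
    kept = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines, e.g. a trailing newline at end of file
            continue
        fields = [value.strip() for value in row]
        if "?" not in fields:
            kept.append(fields)
    return kept

# Made-up, shortened rows for illustration only.
sample = io.StringIO(
    "39, State-gov, Bachelors, <=50K\n"
    "54, ?, Some-college, >50K\n"
)
rows = drop_missing_rows(sample)
print(len(rows))  # 1
```

With the real file, an open file handle (opened with newline="") can be passed in place of the StringIO object.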

2. Remove unnecessary attributes

I examined the data set closely and identified the attributes that would not be used for processing. Removing them reduces the complexity of the data set and makes it easier to apply the rules. In this data set I identified 4 attributes (fnlwgt, education-num, capital-gain and capital-loss) as less useful for this analysis and removed them, which leaves 11 attributes.
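Removing the four attributes by name can be sketched as below, assuming the column order of the raw UCI file (the same order as the attribute table above, with income last). The helper name drop_columns is hypothetical. The 11 remaining names match the attribute list in the Weka run information further down.

```python
ADULT_COLUMNS = (
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
)
DROPPED = frozenset({"fnlwgt", "education-num", "capital-gain", "capital-loss"})

def drop_columns(rows, columns=ADULT_COLUMNS, dropped=DROPPED):
    """Return (kept_column_names, rows_without_the_dropped_columns)."""
    keep = [i for i, name in enumerate(columns) if name not in dropped]
    kept_names = [columns[i] for i in keep]
    return kept_names, [[row[i] for i in keep] for row in rows]

names, _ = drop_columns([])
print(len(names))  # 11
```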

3. Discretization

Weka's Apriori implementation works only with nominal attributes, so the numeric attributes have to be converted rather than used as they are. I used Weka's Discretize filter to convert the numeric attributes “age” and “hours-per-week” into categorical data. Three bins were created for each of these attributes, and the filter automatically assigned every value to the relevant bin. After this step all 11 attributes are nominal, and the data set is ready for applying the rules.

Weka created 3 bins for “age”: (-inf-41.333333], (41.333333-65.666667] and (65.666667-inf), and 3 bins for “hours-per-week”: (-inf-33.666667], (33.666667-66.333333] and (66.333333-inf). To make the data set and the resulting rules easier to read, I replaced the “age” labels with {'0_41','42_65','66_MAX'} and the “hours-per-week” labels with {'0_33','34_66','67_MAX'}.
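The cut points above are consistent with equal-width binning: the range between the smallest and largest value is split into three intervals of equal width. The sketch below reproduces that arithmetic. The minimum and maximum values (17 and 90 for age, 1 and 99 for hours-per-week) are assumptions taken from the UCI data rather than numbers stated in this post, and the helper names are hypothetical.

```python
def equal_width_cut_points(lo, hi, bins):
    """Cut points that split [lo, hi] into `bins` intervals of equal width."""
    width = (hi - lo) / bins
    return [lo + width * k for k in range(1, bins)]

def to_label(value, cuts, labels):
    """Map a value to the label of its interval; intervals are (a, b] as in Weka's output."""
    for cut, label in zip(cuts, labels):
        if value <= cut:
            return label
    return labels[-1]

AGE_LABELS = ["0_41", "42_65", "66_MAX"]
HOURS_LABELS = ["0_33", "34_66", "67_MAX"]

age_cuts = equal_width_cut_points(17, 90, 3)    # assumed min/max age
hours_cuts = equal_width_cut_points(1, 99, 3)   # assumed min/max hours-per-week
print([round(c, 6) for c in age_cuts])    # [41.333333, 65.666667]
print([round(c, 6) for c in hours_cuts])  # [33.666667, 66.333333]
print(to_label(41, age_cuts, AGE_LABELS), to_label(42, age_cuts, AGE_LABELS))  # 0_41 42_65
```

Because age and hours-per-week are whole numbers, the interval (-inf-41.333333] contains exactly the ages up to 41, which is why the relabeled names such as 0_41 and 42_65 lose no information.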

I selected income (values >50K and <=50K) as the class attribute.

 

Applying Association rules (Apriori Algorithm)

In Weka's Associate tab I selected the Apriori algorithm. In the Apriori configuration I set the algorithm to mine class association rules (car = True) and changed the number of rules (numRules) to 40. All other settings were left at their defaults.
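The run information below reports 17 cycles and a minimum support of 0.15. Weka's Apriori does not take one fixed minimum support. It starts from an upper bound (1.0 here, the -U option), lowers the support by delta (0.05, the -D option) on each cycle, and stops once it finds numRules rules (40) with at least the minimum confidence (0.9), or when the support reaches the lower bound (0.1, the -M option). The sketch models only that outer loop. count_rules is a hypothetical stand-in for one Apriori pass, not a Weka API.

```python
def find_min_support(count_rules, num_rules=40, upper=1.0, delta=0.05, lower=0.1):
    """Lower the minimum support step by step until enough rules are found.

    count_rules(support) is a hypothetical callback that would run one
    Apriori pass at that support and return how many rules meet the
    confidence threshold. Returns (support, cycles) of the last pass.
    """
    support = upper
    cycles = 0
    while True:
        # Round to avoid drift from repeated floating-point subtraction.
        support = max(round(support - delta, 10), lower)
        cycles += 1
        if count_rules(support) >= num_rules or support <= lower:
            return support, cycles

# Hypothetical stand-in: pretend 40 rules only appear once support <= 0.15.
support, cycles = find_min_support(lambda s: 40 if s <= 0.15 else 0)
print(support, cycles)          # 0.15 17
print(round(support * 30162))   # 4524
```

The printed values line up with the run information: 1.0 − 17 × 0.05 = 0.15, and 0.15 × 30162 ≈ 4524 instances.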

 

Results

=== Run information ===

 

Scheme:       weka.associations.Apriori -N 40 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -A -c -1

Relation:     dataset-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R1-precision6-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R9-precision6

Instances:    30162

Attributes:   11

              age

              workclass

              education

              marital-status

              occupation

              relationship

              race

              sex

              hours-per-week

              native-country

              income

=== Associator model (full training set) ===

 

 

Apriori

=======

 

Minimum support: 0.15 (4524 instances)

Minimum metric <confidence>: 0.9

Number of cycles performed: 17

 

Generated sets of large itemsets:

 

Size of set of large itemsets L(1): 21

Size of set of large itemsets L(2): 70

Size of set of large itemsets L(3): 104

Size of set of large itemsets L(4): 61

Size of set of large itemsets L(5): 19

Size of set of large itemsets L(6): 3

 

Best rules found:

 

 1. age=0_41 workclass= Private marital-status= Never-married 7358 ==> income= <=50K 7125    conf:(0.97)

 2. age=0_41 workclass= Private marital-status= Never-married native-country= United-States 6674 ==> income= <=50K 6457    conf:(0.97)

 3. age=0_41 workclass= Private marital-status= Never-married race= White 6149 ==> income= <=50K 5944    conf:(0.97)

 4. age=0_41 workclass= Private marital-status= Never-married race= White native-country= United-States 5703 ==> income= <=50K 5505    conf:(0.97)

 5. age=0_41 marital-status= Never-married 8733 ==> income= <=50K 8414    conf:(0.96)

 6. age=0_41 marital-status= Never-married native-country= United-States 7963 ==> income= <=50K 7666    conf:(0.96)

 7. age=0_41 marital-status= Never-married race= White 7230 ==> income= <=50K 6954    conf:(0.96)

 8. age=0_41 marital-status= Never-married race= White native-country= United-States 6737 ==> income= <=50K 6473    conf:(0.96)

 9. workclass= Private marital-status= Never-married 8025 ==> income= <=50K 7706    conf:(0.96)

10. workclass= Private marital-status= Never-married native-country= United-States 7270 ==> income= <=50K 6972    conf:(0.96)

11. workclass= Private marital-status= Never-married race= White 6688 ==> income= <=50K 6403    conf:(0.96)

12. age=0_41 workclass= Private marital-status= Never-married hours-per-week=34_66 5136 ==> income= <=50K 4915    conf:(0.96)

13. workclass= Private marital-status= Never-married race= White native-country= United-States 6194 ==> income= <=50K 5920    conf:(0.96)

14. age=0_41 marital-status= Never-married sex= Male 4904 ==> income= <=50K 4684    conf:(0.96)

15. marital-status= Never-married 9726 ==> income= <=50K 9256    conf:(0.95)

16. age=0_41 marital-status= Never-married hours-per-week=34_66 6165 ==> income= <=50K 5861    conf:(0.95)

17. marital-status= Never-married native-country= United-States 8876 ==> income= <=50K 8435    conf:(0.95)

18. age=0_41 marital-status= Never-married hours-per-week=34_66 native-country= United-States 5571 ==> income= <=50K 5289    conf:(0.95)

19. marital-status= Never-married race= White 8036 ==> income= <=50K 7622    conf:(0.95)

20. age=0_41 marital-status= Never-married race= White hours-per-week=34_66 5065 ==> income= <=50K 4803    conf:(0.95)

21. marital-status= Never-married race= White native-country= United-States 7489 ==> income= <=50K 7092    conf:(0.95)

22. workclass= Private marital-status= Never-married hours-per-week=34_66 5693 ==> income= <=50K 5390    conf:(0.95)

23. workclass= Private marital-status= Never-married hours-per-week=34_66 native-country= United-States 5103 ==> income= <=50K 4820    conf:(0.94)

24. marital-status= Never-married sex= Male 5414 ==> income= <=50K 5107    conf:(0.94)

25. marital-status= Never-married sex= Male native-country= United-States 4900 ==> income= <=50K 4612    conf:(0.94)

26. marital-status= Never-married hours-per-week=34_66 6994 ==> income= <=50K 6552    conf:(0.94)

27. marital-status= Never-married hours-per-week=34_66 native-country= United-States 6330 ==> income= <=50K 5916    conf:(0.93)

28. marital-status= Never-married race= White hours-per-week=34_66 5727 ==> income= <=50K 5339    conf:(0.93)

29. marital-status= Never-married race= White hours-per-week=34_66 native-country= United-States 5296 ==> income= <=50K 4925    conf:(0.93)

30. age=0_41 workclass= Private sex= Female 5279 ==> income= <=50K 4859    conf:(0.92)

31. age=0_41 relationship= Not-in-family 5038 ==> income= <=50K 4622    conf:(0.92)

32. age=0_41 sex= Female native-country= United-States 5853 ==> income= <=50K 5325    conf:(0.91)

33. age=0_41 sex= Female 6382 ==> income= <=50K 5805    conf:(0.91)

34. workclass= Private relationship= Not-in-family 5899 ==> income= <=50K 5343    conf:(0.91)

35. workclass= Private sex= Female 7642 ==> income= <=50K 6921    conf:(0.91)

36. age=0_41 workclass= Private education= HS-grad 5076 ==> income= <=50K 4591    conf:(0.9)

37. workclass= Private sex= Female native-country= United-States 6926 ==> income= <=50K 6264    conf:(0.9)

38. workclass= Private relationship= Not-in-family native-country= United-States 5397 ==> income= <=50K 4872    conf:(0.9)

39. workclass= Private relationship= Not-in-family race= White 5130 ==> income= <=50K 4627    conf:(0.9)

40. age=0_41 race= White sex= Female 5156 ==> income= <=50K 4649    conf:(0.9)
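In each rule, the first number is how many instances match the left-hand side, the number after ==> is how many of those also have the consequent (income= <=50K), and conf is the ratio of the two. The sketch below recomputes rule 1 as a check. The helper names are illustrative.

```python
def confidence(antecedent_count, rule_count):
    """Fraction of instances matching the antecedent that also match the consequent."""
    return rule_count / antecedent_count

def support(rule_count, total_instances):
    """Fraction of all instances that match both antecedent and consequent."""
    return rule_count / total_instances

TOTAL = 30162  # instances left after removing missing values

# Rule 1: age=0_41 workclass= Private marital-status= Never-married 7358 ==> income= <=50K 7125
print(round(confidence(7358, 7125), 2))  # 0.97
print(round(support(7125, TOTAL), 3))    # 0.236
```

As a further check, the second count of every rule in the list is at least 4524, the minimum support in instances; the smallest is 4591, in rule 36.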

 

 

Interesting rules

· age=0_41 workclass= Private marital-status= Never-married native-country= United-States 6674 ==> income= <=50K 6457    conf:(0.97)

· age=0_41 marital-status= Never-married sex= Male 4904 ==> income= <=50K 4684    conf:(0.96)

· age=0_41 workclass= Private education= HS-grad 5076 ==> income= <=50K 4591    conf:(0.9)

· age=0_41 workclass= Private sex= Female 5279 ==> income= <=50K 4859    conf:(0.92)

 

 

Rule Evaluation

By analyzing the selected rules above, we can see that young people (the 0_41 age group), especially those who have never married and work in the private sector, very often earn $50K or less. Young people who were educated up to high school level (HS-grad) and work in the private sector also earn $50K or less, with 90% confidence. The fourth selected rule says that, with 92% confidence, young females who work in the private sector earn $50K or less.

 

 

Use of rules

 

These rules suggest that young employees need capacity-building programs to improve their work experience and help them earn more at a young age. They also suggest that female employees need encouragement to achieve successful careers.


Thursday, May 14, 2020

How Computer Hard Disks Save data



We use hard disk drives (HDDs) to store digital data. Hard disks are used not only in computers but also in CCTV DVRs, modern photocopy machines and other devices.
In this article I will show you how magnetic hard disk drives work, what their geometry is, and how data is stored on them.
If you open the front cover of a magnetic hard disk drive, you will see something like Figure 01. The layout of the components may differ from one hard disk model to another, but the components and their functions are the same for all of them.

Figure 01

Computers usually store data in binary form (1s and 0s). A single binary digit is one bit. 8 bits make 1 byte, and 1024 bytes make 1 kilobyte. On a hard disk, data is stored on the platters. The read/write head reads data from the platters and passes it to the logic board; when data is received for writing, the read/write head writes it onto the platters. That is what a hard disk does, in very simple terms. Now let's dig deeper.
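The unit arithmetic in the paragraph above can be written out as two small helpers. They use the 1024-byte kilobyte from the article (often written KiB); storage vendors usually advertise decimal units of 1000 instead. The function names are hypothetical.

```python
BITS_PER_BYTE = 8
BYTES_PER_KILOBYTE = 1024  # binary kilobyte, as used in the article

def bits_to_kilobytes(bits):
    """Convert a number of bits to (binary) kilobytes."""
    return bits / BITS_PER_BYTE / BYTES_PER_KILOBYTE

def byte_to_bits(value):
    """Return the 8 binary digits that make up one byte (0-255)."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    return format(value, "08b")

print(bits_to_kilobytes(8192))  # 1.0
print(byte_to_bits(65))         # 01000001
```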
A platter is a round, CD-shaped disk, most often made of aluminium and coated with a magnetic medium. A hard disk may contain one or more platters, depending on its capacity and model. The platters are attached to a motor and spin at a high RPM (revolutions per minute). The RPM value may also differ from one hard disk to another.
The data is recorded on a platter as magnetic patterns. The magnetic coating on a platter acts like a microscopic magnetic metal grid, something like Figure 02.

Figure 02

These tiny grid cells are arranged into groups that store the magnetic pattern of the data (Figure 03).

Figure 03

Each such group stores one bit.

Figure 04

Each group is magnetized in one of two possible states. Figure 05 shows how the groups are magnetized and how the whole group indicates its magnetization state.
Figure 05

This is how a magnetic hard disk drive stores 0s and 1s: the magnetization state of a group represents either 0 or 1. See Figure 06.

Figure 06

The function of the read/write head is simple: it reads data from the platters and writes data to them. The head can magnetize a group (which stores one bit) in either of the two possible states.

To keep it simple, think of it like this. With an electromagnet, we can change the direction of the magnetic field by reversing the direction of the DC current through it. The head in an HDD uses this effect. In computer hardware, 1 and 0 are basically either on or off. When binary signals reach the interface board, it converts them into a form the head can handle. Current then passes through the head in one direction to magnetize a group in one of the two states, and when current passes through the head in the other direction, it magnetizes a group in the other state (Figures 07 and 08).

Figure 07

Figure 08
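The write and read mechanism described above can be modeled as a toy simulation: each group holds one of two magnetization states, the direction of the head current chooses which state is written, and reading maps the state back to a bit. This follows the simplified picture in this article; real drives add encoding schemes and signal processing that the model ignores, and the state names (NORTH/SOUTH) and current convention are hypothetical.

```python
from enum import Enum

class Magnetization(Enum):
    """The two possible magnetization states of one group (one bit)."""
    NORTH = "N"
    SOUTH = "S"

def head_current(bit):
    """Current direction used to write a bit (+1 for 1, -1 for 0; assumed convention)."""
    return 1 if bit == 1 else -1

def magnetize(current_direction):
    """The current direction decides which of the two states the group gets."""
    return Magnetization.NORTH if current_direction > 0 else Magnetization.SOUTH

def write_bits(bits):
    """Simulate writing bits onto consecutive groups of a track."""
    return [magnetize(head_current(b)) for b in bits]

def read_bits(track):
    """Simulate the head reading each group's state back as a bit."""
    return [1 if state is Magnetization.NORTH else 0 for state in track]

track = write_bits([0, 1, 0, 0, 0, 0, 0, 1])  # the byte 65
print("".join(state.value for state in track))  # SNSSSSSN
print(read_bits(track))                         # [0, 1, 0, 0, 0, 0, 0, 1]
```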

This is how a hard disk saves data. In another article we will discuss how solid-state drives save data, and we will cover sectors in a later article.


Thursday, May 7, 2020

Object Tracking Automatic Camera




Chapter 1

Introduction

With advances in surveillance systems, object tracking has been an active research topic in the computer vision community over the last two decades, as it is an essential prerequisite for analyzing and understanding video data. Tracking is usually performed in the context of higher-level applications, which in turn require the location, shape or color of the object in every captured frame. Accordingly, several assumptions have to be made to constrain the tracking problem for a particular application.

A great deal of interest in object tracking has been generated by (i) the recent evolution of high-speed computers, (ii) the availability of high-quality, inexpensive sensors (video cameras), and (iii) the increasing demand for automated real-time video analysis. Tracking an object means obtaining its spatio-temporal information by estimating its trajectory in the image plane as it moves around a scene, which in turn helps to study and predict its future behavior. There are three main steps in video analysis: detecting interesting moving objects, tracking those objects from frame to frame, and analyzing the object tracks to recognize their behavior. Examples of object tracking include tracking anomalous movements of people in security systems, rockets in a military defense system, customers in a market, a child or patient monitored from a remote location, players in a sports game, lanes and obstacles in semi-automated vehicles, and cars in a street. Tracking these objects helps in military defense, goods arrangement, sports strategy, security and traffic control. Tracking can be done with one or more types of sensors; radar, sonar, infrared, laser radar and video cameras are the most commonly used.

One enhancement of object tracking systems is a pan-tilt camera that moves according to the movements of the detected object, built by combining object tracking and computer vision technologies with microcontrollers. Such systems can continue tracking even when the object moves beyond the field of view of a normal still camera. Many modern surveillance systems use pan-tilt tracking cameras to keep an eye on a specific object over a wide range. Typically, these systems can track an object along the X axis from 0 to 180 degrees and along the Y axis from 0 to 180 degrees, a much wider range than a still tracking camera can cover.

1.1. Background

Object tracking pan-tilt cameras have been one of the biggest technological advances in surveillance systems. Most existing systems are built with advanced hardware and use complex computer vision software, so they are expensive and complex to handle. Because of this cost and complexity, most designers use static video cameras when building surveillance systems. The goal of the Object Tracking Automatic Camera is to provide a low-cost, efficient and accurate alternative to existing object tracking systems.

1.2. Motivation

Object tracking pan-tilt cameras have been one of the most prominent research areas over the last two decades. I wanted my final-year research project to address an interesting new topic, and I found no truly low-cost solutions anywhere that combine computer vision object tracking with hardware control. Since there is no international standard for such a system, I was not confined to an existing framework and could develop my own algorithms to control the hardware using signals sent from the PC, which are generated by processing the input video from the video camera.

1.3. Goals

The Object Tracking Automatic Camera extends a regular object tracking system by making the camera rotate along the X and Y axes. It is a low-cost alternative to expensive pan-tilt object tracking systems. The system is designed to improve current object tracking systems in the following ways.

  1. Track the object selected by the user, based on its color
  2. Keep the moving object at the center of the screen by rotating the camera in the direction the object moves
  3. Save the video for later analysis
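Goal 1 (tracking an object chosen by its color) can be illustrated with a minimal, library-free sketch: mark every pixel whose color is within a tolerance of the color the user picked, then use the centroid of the marked pixels as the object's position. The frame representation (a list of rows of RGB tuples), the tolerance value and the function name are assumptions for illustration, not the dissertation's implementation.

```python
def locate_by_color(frame, target, tolerance=40):
    """Return the (x, y) centroid of pixels close to `target`, or None.

    frame: list of rows, each row a list of (r, g, b) tuples.
    target: (r, g, b) color picked by the user.
    tolerance: maximum allowed difference per channel (assumed value).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # object not visible in this frame
    return sum_x / count, sum_y / count

RED, GRAY = (220, 30, 30), (128, 128, 128)
frame = [
    [GRAY, GRAY, GRAY, GRAY],
    [GRAY, GRAY, RED, RED],
    [GRAY, GRAY, RED, RED],
]
print(locate_by_color(frame, target=(255, 0, 0)))  # (2.5, 1.5)
```

A real implementation would read frames from the camera and typically threshold in a color space such as HSV, which is less sensitive to lighting than RGB.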

1.4. Achievement in brief

This dissertation presents a pan-tilt object tracking camera system. The system takes video input from the camera, and a desktop application shows the captured video to the user. The user can then select any object in the video. The application processes the video, isolates the selected object, calculates its position and sends the position to the microcontroller. The microcontroller rotates the servo motor mechanism to which the camera is attached, keeping the selected object at the center of the display screen. All of these steps were successfully completed in the project.
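The last two steps (sending the position to the microcontroller and rotating the servos to re-center the object) can be sketched as a proportional update: the object's offset from the frame center is turned into a small change in pan and tilt angle, clamped to the 0 to 180 degree range mentioned earlier. The gain, the frame size and the sign conventions are hypothetical, and this is not necessarily the control algorithm used in the project.

```python
def update_servo_angles(pan, tilt, position, frame_size, gain=0.05):
    """Return new (pan, tilt) angles that turn the camera toward the object.

    pan, tilt: current servo angles in degrees (0-180).
    position: (x, y) of the object in pixels.
    frame_size: (width, height) of the frame in pixels.
    gain: degrees of rotation per pixel of error (assumed value).
    """
    center_x, center_y = frame_size[0] / 2, frame_size[1] / 2
    error_x = position[0] - center_x  # positive: object is right of center
    error_y = position[1] - center_y  # positive: object is below center

    def clamp(angle):
        return max(0.0, min(180.0, angle))

    # The signs depend on how the servos are mounted (assumption).
    return clamp(pan + gain * error_x), clamp(tilt + gain * error_y)

# Object 100 px right of center in a 640x480 frame, vertically centered.
print(update_servo_angles(90, 90, (420, 240), (640, 480)))  # (95.0, 90.0)
```

On the hardware side, the PC would send the new angles to the microcontroller (for example over a serial link), which then drives the pan and tilt servos.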

1.5. Structure of the dissertation

The main purpose of this dissertation is to give an overall description of the “Object Tracking Automatic Camera”. It is organized into the following chapters:

  • Chapter 1 - Introduction: a general overview of the proposed Object Tracking Automatic Camera, including the project background, motivation, goals and achievements in brief.
  • Chapter 2 - Literature survey: a summary of previous research and projects on object tracking and on object tracking with a single pan-tilt camera.
  • Chapter 3 - Methodology: a description of the methodology used to develop the system.
  • Chapter 4 - Design of the Object Tracking Automatic Camera.
  • Chapter 5 - Implementation of the Object Tracking Automatic Camera.
  • Chapter 6 - Results: experimental results of the system and a discussion of the limitations of the final system.
  • Chapter 7 - Conclusions and possible improvements in future research.
