After reading the chapter by Capri (2015) on manual data collection, answer the following questions:
What were the traditional methods of data collection in the transit system?
Why are the traditional methods insufficient in satisfying the requirement of data collection?
Give a synopsis of the case study and your thoughts on the transit system optimization and performance measurement requirements, and on how the expensive and labor-intensive nature of manual data collection affects them.
Answer all of the questions above in APA 7 format, with a heading for each question. Ensure there are at least two peer-reviewed sources to support your work. The paper should be at least 2 pages of content (this does not include the cover page or reference page).
In: Data Mining ISBN: 978-1-63463-738-1
Editor: Harold L. Capri 2015 Nova Science Publishers, Inc.
Chapter 1
TRANSIT PASSENGER ORIGIN INFERENCE
USING SMART CARD DATA AND GPS DATA
Xiaolei Ma¹, Ph.D. and Yinhai Wang², Ph.D.
¹ School of Transportation Science and Engineering,
Beihang University, Beijing, China
² Department of Civil and Environmental Engineering,
University of Washington, Seattle, WA, US
ABSTRACT
To improve customer satisfaction and reduce operation costs, transit
authorities have been striving to monitor their transit service quality and
identify the key factors to attract the transit riders. Traditional manual
data collection methods are unable to satisfy the transit system
optimization and performance measurement requirement due to their
expensive and labor-intensive nature. The recent advent of passive data
collection techniques (e.g., Automated Fare Collection and Automated
Vehicle Location) has shifted a data-poor environment to a data-rich
environment, and offered the opportunities for transit agencies to conduct
comprehensive transit system performance measures. Although it is
possible to collect highly valuable information from ubiquitous transit
data, data usability and accessibility are still difficult. Most Automatic
Fare Collection (AFC) systems are not designed for transit performance
monitoring, and additional passenger trip information cannot be directly
Copyright 2014. Nova Science Publishers, Inc. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.
EBSCO Publishing : eBook Collection (EBSCOhost) – printed on 10/28/2022 9:45 AM via UNIVERSITY OF THE CUMBERLANDS
AN: 956104 ; Ma, Xiaolei, Capri, Harold L..; Data Mining: Principles, Applications and Emerging Challenges
Account: s8501869.main.ehost
retrieved. Interoperating and mining heterogeneous datasets would
enhance both the depth and breadth of transit-related studies. This study
proposed a series of data mining algorithms to extract individual transit
riders' origins using transit smart card and GPS data. The primary data
source of this study comes from the AFC system in Beijing, where a
passenger's boarding stop (origin) and alighting stop (destination) on a
flat-rate bus are not recorded on the check-in and check-out scan. The bus
arrival time at each stop can be inferred from GPS data, and an individual
passenger's boarding stop is then estimated by fusing the identified bus
arrival time with smart card data. In addition, a Markov chain based
Bayesian decision tree algorithm is proposed to mine the passengers'
origin information when GPS data are absent. Both passenger origin
mining algorithms are validated based on either on-board transit survey
data or personal GPS logger data. The results demonstrate the
effectiveness and efficiency of the proposed algorithms on extracting
passenger origin information. The estimated passenger origin data are
highly valuable for transit system planning and route optimization.
Keywords: Automated fare collection system, transit GPS, passenger origin
inference, Bayesian decision tree, Markov chain
INTRODUCTION
According to the Census of 2000 in the United States, approximately 76%
of people chose privately owned vehicles to commute to work in 2000 (ICF
Consulting, 2003). Recent studies conducted by the 2009 American
Community Survey indicate 79.5% of home-based workers drive alone for
commuting (McKenzie and Rapino, 2009). Many developing countries, e.g.,
China, also rely on privately owned vehicles to commute. For example, more
than 34% of Beijing residents chose cars as their primary travel mode
while only 28.2% chose transit in 2010 (Beijing Transportation Research
Center, 2012). Public transit has been considered an effective
countermeasure to reduce congestion, air pollution, and energy consumption
(Federal Highway Administration, 2002). According to the 2005 Urban Mobility
Report by the Texas Transportation Institute (2005), travel delay in
2003 would have increased by 27 percent without public transit; in the
most congested U.S. metropolitan cities in particular, public transit services
saved more than 1.1 billion hours of travel time. Moreover, public transit can help
enhance business and reduce city sprawl through transit-oriented development
(TOD). During certain emergency scenarios, public transit can even act as a
EBSCOhost – printed on 10/28/2022 9:45 AM via UNIVERSITY OF THE CUMBERLANDS. All use subject to https://www.ebsco.com/terms-of-use
safe and efficient transportation mode for evacuation (Federal Highway
Administration, 2002). Based on the aforementioned reasons, it is of critical
importance to improve the efficiency of public transit system, and promote
more roadway users to utilize public transit. To fulfill these objectives, transit
agencies need to understand the areas where improvements can be further
made, and whether community goals are being met, etc. A well-developed
performance measure system will facilitate decision making for transit
agencies. Transit agencies can evaluate the transit ridership trends with fare
policy changes and identify where and when better transit service should be
provided. In addition, transit agencies are also required to summarize transit
performance statistics for reporting to either the National Transit Database
(Kittelson & Associates et al., 2003), or the general public who are interested
in knowing how well transit service is being provided. Nevertheless, developing
a set of structured performance measures often requires a large amount of data
and the corresponding domain knowledge to process and analyze these data.
These requirements make performance measurement costly in time and effort
for transit agencies. Traditionally, transit agencies heavily rely on manual data
collection methods to gather transit operation and planning data (Ma et al.,
2012). However, traditional data collection methods (e.g., travel diaries, surveys,
etc.) are fairly costly and difficult to implement at a multiday level due to their
low response rates and accuracy. Transit agencies have spent tremendous
manpower and resources undertaking manual data collection, and consumed a
significant amount of energy and time to post-process the raw data. With
advances in information technologies in intelligent transportation systems
(ITS), the availability of public transit data has been increasing in the past
decades, which has gradually shifted public transit system into a data-rich
paradigm. The Automatic Fare Collection (AFC) system and the Automatic Vehicle
Location (AVL) system are two common passive data collection methods. The AFC
system, also known as the smart card system, records and processes
fare-related information using either a contactless or a contact card to complete the
financial transaction (Chu, 2010). There are two typical types of AFC
systems: entry-only and distance-based. In an entry-only AFC system,
passengers are only required to swipe their smart cards over
the card reader during boarding, while in a distance-based AFC system
passengers need to check in and check out when boarding and alighting.
AVL and AFC technologies hold substantial promise for transit
performance analysis and management at a relatively low cost. However,
historically, both AVL and AFC data have not been used to their full
potential. Many AVL and AFC systems do not archive data in a readily
utilized manner (Furth, 2006). The AFC system was initially designed to reduce
the workload of tedious manual fare collection, not for transit operation and
planning purposes, and thereby, certain critical information, such as the specific
spatial location of each transaction, may not be directly captured. The AVL
system tracks transit vehicles' geospatial locations by Global Positioning
System (GPS) at either a constant or varying time interval. The accuracy of
GPS occasionally suffers from signal loss due to tall building obstructions in
urban areas (Ma et al., 2011). Both the AFC system and the AVL system
have inherent drawbacks in monitoring transit system performance, and
require analytical approaches to eliminate erroneous data, remedy
missing values, and mine the unseen and indirect information.
The remainder of this paper is organized as follows: transit smart card data
and GPS data are described in Section 2. Based on these data sets, a data
fusion method is initially proposed to integrate with roadway geospatial data
to estimate transit vehicles' arrival information. Then, a Bayesian decision
tree algorithm is presented to estimate each passenger's boarding stop when
GPS data are unavailable. Considering the expensive computational burden of
decision tree algorithms, the Markov chain property is taken into account to
reduce the algorithm complexity. On-board survey and GPS data from the
Beijing transit system are used to test and verify the proposed algorithms.
Conclusions and future research efforts are summarized at the end of this paper.
RESEARCH BACKGROUND
Data from the AFC system and the AVL system are the two primary sources in
this study. Beijing Transit Incorporated began to issue smart cards on May 10,
2006. The smart card can be used in both the Beijing bus and subway systems.
Due to discounted fares (up to 60% off) provided by the smart card, more than
90% of transit riders paid for their transit trips with smart cards in
2010 (Beijing Transportation Research Center, 2010). Two types of AFC
systems exist in Beijing transit: flat fare and distance-based fare. Transit riders
pay a fixed rate on flat fare buses when entering by tapping their
smart cards on the card reader. Thus, only check-in scans are necessary. For
the distance-based AFC system, transit riders need to swipe their smart cards
during both check-in and check-out processes. Transit riders need to hold their
smart cards near the card reader device to complete transactions when entering
or exiting buses. The smart card can be used in the Beijing subway system as well,
where passengers need to tap their smart cards on top of fare gates when
entering and exiting subway stations. Both boarding and alighting
information (time and location) are recorded by the fare gates. Although the transit
smart card is superior in convenience and efficiency, the
following issues still prevent transit agencies from taking full advantage of
smart card data for operational purposes:
Passenger boarding and alighting information missing
Due to a design deficiency in the smart card scan system, the AFC system
on flat fare buses does not save any boarding location information, whereas
the AFC system on distance-based fare buses stores boarding and alighting
locations but not boarding time. Key information stored in the
database includes smart card ID, route number, driver ID, transaction time,
remaining balance, transaction amount, boarding stop (only available for
distance-based fare buses), and alighting stop (only available for
distance-based fare buses).
Massive data sets
More than 16 million smart card transactions are generated per day.
Among these transactions, 52% are from flat-rate bus riders. These smart card
transactions are scattered over a large-scale transit network with 52,386 links and
43,432 nodes, as presented in Figure 1:
Figure 1. Beijing Transit GIS Network.
Limited external data with poor quality
Only approximately 50% of transit vehicles in Beijing are equipped with
GPS devices for tracking. GPS data are periodically sent to the central server
at a pre-determined interval of 30 seconds. However, the collected GPS data
suffer from two major data quality issues: (1) vehicle direction information is
missing; (2) GPS points fluctuate (Lou et al., 2009). Map matching
algorithms are needed to align the inaccurate GPS spatial records onto the road
network. In addition, most transit routes are not designed to have fixed
schedules because of high ridership demands, and only certain routes with a
long distance or headway follow schedules at each stop (Chen, 2009). The
above characteristics of the Beijing AFC and AVL systems make it more
challenging to process the data and mine useful information.
It is noteworthy that the AFC system used in Beijing is not a unique case.
Most cities in China also employ a similar AFC system where passengers'
origin information is absent, such as Chongqing City (Gao and Wu, 2011),
Nanning City (Chen, 2009), and Kunming City (Zhou et al., 2007). In other
developing countries, such as Brazil, the AFC system does not record any
boarding location information either (Farzin, 2008). Therefore, a solution for
passenger boarding and alighting information extraction is beneficial to those
transit agencies with imperfect smart card (SC) data internationally.
TRANSIT PASSENGER ORIGIN INFERENCE
Because smart card readers in the flat-rate buses do not record passengers'
boarding stops, it is desirable to infer individual boarding locations using smart
card transaction data. In this section, two primary approaches are presented to
achieve this goal. Approximately 50% of transit vehicles in the Beijing
entry-only AFC system are equipped with GPS devices. Therefore, a data fusion method
with GPS data, smart card data and GIS data is first developed to estimate
each bus's arrival time at each stop and infer each individual passenger's boarding
stop. Then, for those buses without GPS devices, a Bayesian decision tree
algorithm is proposed to utilize smart card transaction time and apply
Bayesian inference theory to depict the likelihood of each possible boarding
stop. In order to expand the usability of the proposed Bayesian decision tree
algorithm in large-scale datasets, Markov chain optimization is used to reduce
the algorithm's computational complexity. Both transit passenger origin
inference algorithms are validated using external data (e.g., on-board survey
data and GPS data).
Passenger Origin Inference with GPS Data
In the first step, a GPS-based arrival information inference algorithm is
presented to estimate the arrival time for each transit stop, and then the
inferred stop-level arrival time is matched with the timestamp recorded in
the AFC system. Each smart card transaction record is assigned the ID of the
stop whose inferred arrival time is temporally closest to it. The logic flow chart is shown in
Figure 2. The major data processing procedure is detailed below.
Figure 2. Flow Chart for Passenger Origin Inference with GPS Data.
Bus Arrival Time Extraction
Three primary data sources are involved in the passenger information
extraction: vehicle GPS data; transit stop spatial location data; and
flat-fare-based smart card transaction data. A transit GIS network contains the
geospatial location of each stop for any transit route. The GPS device
mounted in the bus records each bus's location and timestamp every 30
seconds, but the data quality of the collected GPS records is not satisfactory: no
directional information is recorded in the Beijing AVL system, and GPS points are off
the roadway network due to the satellite signal fluctuation. Data preprocessing
is required prior to bus arrival time estimation. A program is written to parse
and import raw GPS data into a database in an automatic manner. Key fields
of a GPS record are shown in Table 1.
Table 1. Examples of GPS raw data
Vehicle ID   Date time             Latitude   Longitude   Spot speed   Route ID
00034603     2010-04-07 09:28:57   39.73875   116.1355    9.07         00022
00034603     2010-04-07 09:29:27   39.73710   116.1358    14.26        00022
00034603     2010-04-07 09:29:58   39.73592   116.1357    19.63        00022
00034603     2010-04-07 09:30:28   39.73479   116.1357    0            00022
00034603     2010-04-07 09:30:58   39.73420   116.1357    3.52         00022
The first step is to estimate the bus arrival time for each stop by joining
GPS data and the stop-level geo-location data. A buffer area can be created
around each particular stop for a certain transit route using the GIS software.
Within this area, several GPS records are likely to be captured. However,
identifying the geospatially closest GPS record to each particular stop is
challenging since there could be a certain number of unknown directional GPS
records within the specified buffer zone. Thanks to the geospatial
analysis functions in GIS, each link (i.e., polyline) where a transit stop is
located is composed of both a start node and an end node, which implies that the
directional information for each GPS record can be inferred by comparing the
link direction with the direction change between two consecutive GPS records.
With the identified direction, the distance from each GPS point to this
particular stop can be calculated, and the timestamp of the GPS point with the minimum
distance is regarded as the bus arrival time at the particular stop. Figure 3
visually demonstrates the above algorithm procedure. The inbound stop represents
the physical location of a particular transit stop, and this stop is snapped to a
transit link, whose direction is determined by both a start node and an end node.
By comparing the driving direction from GPS records with the link direction,
the nearest GPS record to this particular stop can be identified, and is marked by
the red five-pointed star on the map. The timestamp associated with this
five-pointed star is considered the arrival time for this inbound stop. The
merit of the bus arrival time estimation algorithm lies in its efficiency. Rather
than searching all the GPS data to identify the traveling direction for each stop,
the proposed algorithm shrinks the search area and filters out
unlikely GPS data. The operation greatly alleviates the computational burden,
and is relatively easy to implement on large-scale datasets, which is
particularly critical for processing the tremendous amount of data within an
acceptable time period.
Figure 3. Boarding Time Estimation with GPS Data and Transit Stop Location Data.
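The arrival time procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the 100 m buffer radius, and the stop and link coordinates are all assumptions made for the example; only the GPS fixes come from Table 1.

```python
import math
from datetime import datetime

def project(lat, lon, lat0):
    """Approximate planar coordinates in meters (equirectangular projection)."""
    r = 6371000.0
    return (math.radians(lon) * r * math.cos(math.radians(lat0)),
            math.radians(lat) * r)

def estimate_arrival(gps, stop, link_start, link_end, buffer_m=100.0):
    """Timestamp of the direction-consistent GPS fix nearest to the stop.

    gps: list of (datetime, lat, lon) sorted by time.
    stop, link_start, link_end: (lat, lon); the link runs start -> end.
    buffer_m: assumed buffer radius around the stop (not from the chapter).
    """
    lat0 = stop[0]
    sx, sy = project(stop[0], stop[1], lat0)
    ax, ay = project(link_start[0], link_start[1], lat0)
    bx, by = project(link_end[0], link_end[1], lat0)
    link_dx, link_dy = bx - ax, by - ay
    best = None
    for (t0, la0, lo0), (t1, la1, lo1) in zip(gps, gps[1:]):
        x0, y0 = project(la0, lo0, lat0)
        x1, y1 = project(la1, lo1, lat0)
        # Keep only pairs whose movement agrees with the link direction.
        if (x1 - x0) * link_dx + (y1 - y0) * link_dy <= 0:
            continue
        for t, x, y in ((t0, x0, y0), (t1, x1, y1)):
            d = math.hypot(x - sx, y - sy)
            if d <= buffer_m and (best is None or d < best[0]):
                best = (d, t)
    return None if best is None else best[1]

# GPS fixes from Table 1 (vehicle 00034603, route 00022).
fixes = [
    (datetime(2010, 4, 7, 9, 28, 57), 39.73875, 116.1355),
    (datetime(2010, 4, 7, 9, 29, 27), 39.73710, 116.1358),
    (datetime(2010, 4, 7, 9, 29, 58), 39.73592, 116.1357),
    (datetime(2010, 4, 7, 9, 30, 28), 39.73479, 116.1357),
    (datetime(2010, 4, 7, 9, 30, 58), 39.73420, 116.1357),
]
# Hypothetical stop and link geometry, invented for this example.
stop = (39.73480, 116.1357)
north_node, south_node = (39.74000, 116.1357), (39.73300, 116.1357)

arrival = estimate_arrival(fixes, stop, north_node, south_node)
print(arrival)
```

With this made-up southbound link, the fix at 09:30:28 is selected; reversing the link direction rejects every fix, which is how the direction check keeps opposite-direction records out of the match.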
Passenger Boarding Location Identification with Smart Card Data
For each smart card transaction record, the boarding stop can be
estimated by matching the recorded timestamp with the identified bus arrival
time. As presented in Figure 4, for each smart card transaction record, the
transaction time is compared with the inferred bus arrival time at each stop.
The record is assigned to the stop whose bus arrival time is
temporally closest to its transaction time. Since passengers
board the bus within a relatively short time interval, this data fusion method is able
to capture almost all missing boarding stops.
Figure 4. Boarding Stop Identification with Bus Arrival Time.
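The matching rule can be illustrated with a short Python sketch. It assumes times are expressed as seconds since midnight and that the arrival list is already sorted; the function name and the sample stop IDs are hypothetical and not taken from the chapter.

```python
from bisect import bisect_left

def assign_boarding_stop(tx_time, arrivals):
    """Return the stop whose arrival time is closest to the transaction time.

    tx_time: transaction time in seconds since midnight.
    arrivals: non-empty list of (arrival_seconds, stop_id), sorted by time.
    """
    times = [t for t, _ in arrivals]
    i = bisect_left(times, tx_time)
    # Only the arrivals just before and just after the transaction can be closest.
    candidates = arrivals[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda a: abs(a[0] - tx_time))[1]

# Hypothetical arrival times for three stops on one trip.
arrivals = [(100, "A"), (400, "B"), (900, "C")]
print(assign_boarding_stop(130, arrivals))   # closest arrival is stop A
print(assign_boarding_stop(700, arrivals))   # closest arrival is stop C
```

The binary search is a design choice for large transaction volumes; a linear scan over all stops of the trip would give the same assignment.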
In addition, because the arrival times for all stops of a particular transit
route can be estimated, the average travel time between two adjacent stops can
be calculated as well. This travel time statistic is not only critical for transit
performance measures, but also provides prior information for passenger
origin inference when GPS data are absent.
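As a rough illustration of this statistic, the sketch below averages stop-to-stop travel times over several trips. The data layout (a list of trips, each a list of stop ID and arrival time in seconds) and all sample values are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def mean_segment_times(trips):
    """Average travel time in seconds between each pair of adjacent stops.

    trips: list of trips; each trip is a list of (stop_id, arrival_seconds)
    in the order the stops are served.
    """
    samples = defaultdict(list)
    for trip in trips:
        for (s0, t0), (s1, t1) in zip(trip, trip[1:]):
            samples[(s0, s1)].append(t1 - t0)
    return {segment: mean(values) for segment, values in samples.items()}

# Two hypothetical trips over the same three stops.
trips = [
    [("A", 0), ("B", 120), ("C", 300)],
    [("A", 1000), ("B", 1100), ("C", 1320)],
]
segment_times = mean_segment_times(trips)
print(segment_times)
```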
Validation
Compared with bus arrival time, door opening time can be more
accurately matched with smart card transaction time. This is because each bus
may not stop exactly at each transit stop for passenger boarding. The inferred
bus arrival time is therefore subject to errors when it is matched with smart
card data. To validate the accuracy of the proposed data fusion algorithm for
passenger origin inference, an on-board transit survey was undertaken to collect
bus door opening time and arrival location for each stop of route 651 on
January 13, 2013. Handheld GPS devices were used to track the
geospatial location of moving buses every 15 seconds. The survey lasted
from 8:00 AM to 1:00 PM, and a total of 75 bus door opening times were
manually recorded. These bus door opening time records were then matched
with smart card transactions from 417 passengers, and the resulting boarding stops
can be considered as the ground-truth data. By comparing the ground-truth
data with the results from the proposed GPS data fusion approach, 406
boarding stops were accurately inferred and 11 boarding stops differed from the
ground-truth data by no more than one stop. The proposed algorithm thus
achieves an accuracy of 97.4% (406 of 417).
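As a quick arithmetic check of the reported figure, the snippet below recomputes the accuracy from the counts given in the text. It assumes, as the wording suggests, that only the 406 exact matches count as correct and that the 11 one-stop misses count as errors.

```python
total_passengers = 417   # smart card transactions in the survey
exact_matches = 406      # boarding stops inferred exactly
one_stop_errors = total_passengers - exact_matches

accuracy = exact_matches / total_passengers
print(f"{accuracy:.1%}")  # prints 97.4%
```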
Passenger Origin Inference with Smart Card Data
A fair number of buses still have no GPS devices, and thus the
bus arrival time at each transit stop is not directly measured. However, most
passengers scan their cards immediately when boarding and almost all
passengers should complete the check-in scan before arriving at the next stop.
This indicates that the first passenger's transaction time can be safely assumed
to be the boarding time of the group of passengers at the same stop. The challenge is
then to identify the bus location at the moment of the SC transaction so that we
can infer the boarding stop for that passenger. However, this is not easy
because the SC system for the flat-rate bus does not record bus location. We
know the time each transaction occurred on a bus of a particular route under
the operation of a particular driver, but nothing else is known from the SC
transaction database. Nonetheless, we are able to extract boarding volume
changes with time and passengers who made transfers. By mining these data
and combining transit route maps, we may be able to accomplish our goal.
Therefore, a two-step approach is designed for passenger origin data
extraction: smart card data clustering and transit stop recognition. To
implement the proposed algorithm in an efficient manner, a Markov Chain
based optimization approach is applied to reduce the computational
complexity.
Smart Card Data Clustering
Transaction Data Classification
First of all, we need to sort SC transactions by the transit vehicle number.
This results in a list of SC transactions in the vehicle for the entire period of
operations for each day. During the operational period, the vehicle may have
two to ten round-trip runs depending on the round-trip length and roadway
condition. At a terminal station, a transit vehicle may take a break or continue
running. So there is no obvious signal for the end of a trip (a trip is defined as
the journey from one terminus to the other terminus). Meanwhile, there are a
varying number of passengers at each stop, including some stops with no
passengers.
For stops with several passengers boarding, all transactions can be
classified into one group based on interval between their transactions. Thus,
the clustered SC transactions can be represented by a time series of check-in
passenger volumes at stops as shown in Table 2.
Table 2. Examples of Clustered SC transactions
Transaction Cluster No.   Stop ID   Stop Name   Total Transactions   Transaction Timestamp   Time Difference
1                         Unknown   Unknown     18                   5:26:36                 0:14:26
2                         Unknown   Unknown     9                    5:41:02                 0:03:16
3                         Unknown   Unknown     11                   5:44:18                 0:04:35
4                         Unknown   Unknown     27                   5:48:53                 0:01:00
In Table 2, total transactions indicates the total number of boarding passengers at one
stop; transaction timestamp is the time when the first passenger
boards at this stop; and time difference is the elapsed time between the
boarding time at this stop and at the next stop with boarding passengers. Unlike most
entry-only AFC systems in the United States, stop name and ID for each
transaction are unknown in Beijing's AFC system. Most buses in service
follow the predefined order of stops. However, it is still possible that
no passenger boards at a specific stop, and thus two consecutive SC
transaction clusters do not necessarily correspond to two physically
consecutive stops. This further complicates the situation: the
algorithm needs to map each cluster to the corresponding
boarding stop ID.
In summary, the smart card data clustering algorithm contains three steps
as follows:
Step 1: All transaction data for each bus are sorted by the transaction
timestamp in ascending order.
Step 2: For two consecutive records, if their transaction time difference is
within 60 sec, these two transactions are included in one cluster;
otherwise, another cluster is initiated.
Step 3: If the transaction time difference for two consecutive records is
greater than 30 min or a driver change occurs, it is likely that the bus has
arrived at the terminus and completed one bus trip. The next record
is the beginning of the next bus trip.
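The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each record is reduced to a timestamp in seconds plus a driver ID, "within 60 sec" is read as a gap of at most 60 seconds, and the function name and sample records are hypothetical.

```python
def cluster_transactions(records, gap_s=60, trip_gap_s=30 * 60):
    """Group one bus's smart card transactions into trips and clusters.

    records: list of (timestamp_seconds, driver_id) for a single bus.
    Returns a list of trips; each trip is a list of clusters; each cluster
    is a list of timestamps.
    """
    records = sorted(records)                      # Step 1: ascending time
    trips = []
    prev = None
    for ts, driver in records:
        if prev is None:
            trips.append([[ts]])
        else:
            gap = ts - prev[0]
            if gap > trip_gap_s or driver != prev[1]:
                trips.append([[ts]])               # Step 3: new trip
            elif gap <= gap_s:
                trips[-1][-1].append(ts)           # Step 2: same cluster
            else:
                trips[-1].append([ts])             # Step 2: new cluster
        prev = (ts, driver)
    return trips

# Hypothetical records, deliberately unsorted.
records = [(200, "d1"), (0, "d1"), (20, "d1"), (50, "d1"),
           (230, "d1"), (3000, "d1"), (3010, "d2")]
clusters = cluster_transactions(records)
print(clusters)
```

In this sample the 2,770-second gap starts a second trip, and the driver change ten seconds later starts a third, even though that gap is short.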
The result of the clustering process is several sequences of clustered
transactions. Each sequence may contain one or more trips of the transit
vehicle. For particular routes, due to the limited space at the terminus or a busy
transit schedule, bus layover time may be too short to be used as a separation
marker for trips. Such buses may have a very long clustered sequence that
makes the pattern discovery process very challenging. Furthermore, unfamiliar
passengers or passengers boarding from the check-out doors (this happens on
very crowded buses) may take longer than 60 seconds to scan their cards. The
delayed transactions may cause cluster assignment errors. Again, this adds an extra
challenge to the follow-up passenger origin extraction process.
Transaction Cluster Sequence Segmentation
Beijing has a huge transit network with nearly 1,000 routes. It is quite
common to see passengers transfer between transit routes. Through transfer
activity analysis, we can further segment the clustered transaction sequence
into shorter series to reduce the uncertainty in passenger OD estimation (Jang,
2010). Two key principles used in the transfer stop identification are:
(1) We assume the alighting stop in the previous route is spatially and
temporally the closest to the boarding stop for the next route. This is
reasonable because most passengers choose the closest stop for transit
transfer within a short period of time (Chu, 2008). Assume a
passenger k makes a transfer from route i to route j within n minutes.
If route i is a distance-based-rate bus line or a subway line, then we
can identify the transfer station that is also the boarding stop of route
j. Even if both routes are flat-rate bus routes, if the transferring
location is unique, we can still use the transfer information to identify
the transfer bus stop ID and name. In this study, the transfer time
duration n is 30 minutes, and the maximum distance between two
transfer stops is 300 meters.
(2) We assume that the alighting time and the boarding time at each
particular stop are similar. In this case, we can substitute one passenger's
boarding stop with another passenger's alighting stop. Assume a
passenger k makes a transfer from route i to route j. If route j is a
subway line, where both its boarding location and time are available,
then we can estimate passenger k's alighting stop on route i, and
this alighting stop can also be considered the boarding stop for
those passengers who get on the bus at the same time.
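The transfer thresholds stated in principle (1), 30 minutes and 300 meters, can be expressed as a simple check. The sketch below is an assumption-laden illustration: it uses great-circle (haversine) distance as a stand-in for the distance between stops, which the chapter does not specify, and the function names and coordinates are invented for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_transfer(alight, board, max_min=30, max_m=300):
    """True if a boarding plausibly follows an alighting as a transfer.

    alight, board: (time_seconds, lat, lon) of the known alighting on the
    previous route and a candidate boarding stop on the next route.
    """
    dt = board[0] - alight[0]
    dist = haversine_m(alight[1], alight[2], board[1], board[2])
    return 0 <= dt <= max_min * 60 and dist <= max_m

# Hypothetical alighting and candidate boarding stops.
alight = (1000, 39.9000, 116.4000)
near_stop = (1600, 39.9020, 116.4000)   # roughly 220 m away, 10 min later
far_stop = (1600, 39.9040, 116.4000)    # roughly 445 m away
print(is_transfer(alight, near_stop), is_transfer(alight, far_stop))
```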
Walking distance between the two stops should be taken into account for
inferring the time when the flat-rate bus arrives at the transfer stop. However,
several possible boarding stops may exist due to the unknown direction in the
flat-rate smart card transaction, and thus additional data mining techniques are
needed to find the boarding stop with the maximum likelihood. These data
mining techniques will be detailed in the next section.
Based on the identified transfer stops, we can further segment the
transaction cluster sequence into shorter cluster series. Each series is bounded
by either the termini or the identified bus stops. The segmented series of
transaction clusters will be used as the input for the subsequent transit stop
inference algorithm.
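One way to picture the segmentation is shown below. This sketch assumes that a cluster whose stop has been identified through transfer analysis closes one series and also opens the next, so neighboring series share that boundary cluster; the chapter does not spell out this detail, and the function name and cluster numbers are hypothetical.

```python
def segment_by_anchors(clusters, anchors):
    """Split an ordered cluster sequence at clusters with identified stops.

    clusters: ordered list of cluster IDs for one trip (terminus to terminus).
    anchors: set of cluster IDs whose boarding stop is already known.
    Returns a list of series; an anchor cluster ends one series and starts
    the next one.
    """
    series, current = [], []
    for c in clusters:
        current.append(c)
        if c in anchors and len(current) > 1:
            series.append(current)
            current = [c]
    if len(current) > 1 or (current and not series):
        series.append(current)
    return series

# Seven hypothetical clusters; clusters 3 and 5 were matched to transfer stops.
segments = segment_by_anchors([1, 2, 3, 4, 5, 6, 7], {3, 5})
print(segments)
```

Shorter series leave fewer candidate stops per cluster, which is the reduction in uncertainty that the segmentation step is after.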
Data Mining for Transit Stop Recognition
Bayesian Decision Tree Inference
If we treat each segmented series of transaction clusters as an unknown
pattern, this unknown pattern can be considered a sample of the sequential
stops on the bus route. If every stop has boarding passengers, this unknown
pattern is identical to the known bus stop sequence. Also, since distance and
speed limit between stops are known, travel time between stops is highly
predictable if there is no traffic jam. In reality, however, the
distribution of passengers boarding at any given stop varies, and roadway
congestion may cause unpredictable delays. Therefore, the unknown pattern
recognition is a very challenging issue. Once the unknown pattern is
recognized, the boarding stop for any passenger becomes clear.
Bayesian decision tree algorithm is one of the widely used data mining
techniques for pattern recognition (Janssens et al.,
WK1 STUDENT REPLIES
PLEASE SEE ATTACHMENT…….
STUDENT REPLIES
BELOW ARE THE INSTRUCTIONS ON HOW THE PROFESSOR WANTS THE STUDENT REPLIES ANSWERED. PLEASE FOLLOW THESE DIRECTIONS. IN YOUR STUDENT REPLIES, DO NOT USE THE WORDS "I AGREE" OR "NICE JOB"; THE PROFESSOR DOES NOT WANT TO SEE THOSE WORDS IN THE STUDENT REPLIES. ALSO, BOTH STUDENT REPLIES NEED TO BE 250 WORDS OR MORE, USING TEXTBOOK AND OUTSIDE REFERENCES. REMEMBER, YOU CAN PULL REFERENCES FROM THE READING THAT I PREVIOUSLY ATTACHED FOR YOU THIS WEEK. THANK YOU, AND LET ME KNOW IF YOU NEED ANY MORE CLARIFICATION ON THIS ASSIGNMENT.
MAKE SURE YOU PUT THE NAME WITH EACH STUDENT REPLY SO I WILL KNOW WHICH ONE GOES WITH WHOM.
Respond to at least one of your colleagues’ postings. Respond in one or more of the following ways:
Ask 1 probing question.
Share an insight from having read your colleague’s posting.
Offer and support an opinion.
Validate an idea with your own experience.
Make a suggestion.
Expand on your colleague’s posting.
STUDENT REPLY #1 Taylor Carrion
Danny Benson was a 15-year-old, 4-foot-9 male attending George Warshaw High School. He kept mostly to himself and was shy and quiet. He spent most of his time in the library with his head in books. He did not have many friends except for one student named Tony Danielson. Because he was a loner, seen by some as a nerd, and small for his age, he was bullied and harassed by multiple classmates. He tried to get help on multiple occasions, but the school was not much help; its bullying protocols were not strict enough. So he planned to shoot up the school to get revenge on the classmates and staff who did not help him in his time of need, and to show them that he was not to be messed with and that he had the power. On a Friday afternoon he got his revenge: he walked into the school, using its blind spots so that no one could see what was happening, started shooting classmates at lunch, and then headed to the office and shot at the staff as well. He killed 15 people and injured 26. After trying to escape and shooting at the police, he was finally surrounded, gave up, and killed himself, believing that was the best thing he could do once he was surrounded. Danny would be considered a mass murderer because he killed more than three people in a single incident at his school. He did this out of revenge for being bullied and harassed by classmates on multiple occasions. There were no patterns and no repeated shootings; it was one mass killing. He was a revenge-oriented mass killer in the subgroup of school shooters. According to Mass Murder (n.d.), bullying and teasing is most likely the main motivation behind the student's violence. He wanted to get back at others who teased and bullied him.
It was also stated, This was Danny's case, where he plotted his revenge based on the teasing and bullying by his former classmates and the staff that did not help as much.
Next, Vlad Kauffer was a Caucasian man in his late 40s, an intelligent man who worked in finance at a high-end bank. His childhood was shaped by his mother, a prostitute who was rarely around, exposed him to a very sexualized life, and stayed out late at night with strangers. Vlad had to take care of himself from an incredibly early age, which caused anger toward his mother and toward women who reminded him of her. When she was around, she was an alcoholic and very domineering. Vlad would stalk young women working as prostitutes, convince them to get into his car, take them back to his place, torture, rape, and kill them, and then dispose of the bodies by cutting them into pieces and scattering the pieces in various locations. He wanted to rid the world of prostitutes like his mother. He would be considered a serial killer because of the pattern, i.e., repeated killings of victims with a specific profile. There were multiple victims over different periods of time and on multiple occasions, rather than one mass event. He would be considered mission-oriented and out for revenge. As stated by Fox and Levin (2005), the motive of power and control encompasses what earlier typologies have termed the mission-oriented killer (Holmes & DeBurger, 1988), whose crimes are designed to further a cause. Through killing, he claims an attempt to rid the world of filth and evil, such as by killing prostitutes or the homeless. He wanted control of his life and wanted to rid the world of women working as prostitutes who reminded him of his neglectful, alcoholic mother.
Reference
Fox, J. A., & Levin, J. (2005). Defining multiple murder. In Extreme killing: Understanding serial and mass murder (pp. 15-25). Thousand Oaks, CA: Sage Publications, Inc.
Mass Murder. (n.d.).
https://class.waldenu.edu/bbcswebdav/courses/USW1.17952.202310/Mass%20Murder.pdf
STUDENT REPLY #2 Nicole Holmes
Serial murder is the killing of two or more victims by the same offender in separate events.
Jeffrey Dahmer is an example of a serial murderer.
He started killing in 1978 at the age of 18.
Dahmer killed at least 17 people that authorities found out about.
John Wayne Gacy was another serial murderer; he was convicted of 33 counts of murder, with one of his victims being as young as 15.
Jeffrey Dahmer would lure young boys to his home, sometimes acting as if he wanted to be friends with them, only to add to his body count.
Some say he would sodomize the deceased victims.
Dahmer can be labeled a serial murderer because he killed multiple people at different times in separate events. Dahmer also fits the description of a serial murderer because of the cooling-off periods between his killings; he is not known to have killed on a daily basis.
Of course, profiles are not suitable in all cases, even in some murder cases (Holmes & Holmes, 1992, 2000). They are usually more efficacious in cases where the unknown perpetrator has displayed indications of psychopathology (Geberth, 2006; Holmes & Holmes, 2000). Crimes most appropriate for psychological profiling are those where discernable patterns are able to be deciphered from the crime scene or where the fantasy/motive of the perpetrator is readily apparent.
I think the Jeffrey Dahmer case is a great example of this. His documentary shows that Dahmer first started his unusual acts with a mannequin.
What differentiates mass murder from serial murder is also the timing and number of murders. Serial killers commit murder over long periods of time, sometimes in different locations, as Dahmer did.
His killings involved sexual homicide and sadistic sexual assaults.
Mass murderers, by contrast, kill within a single time frame.
Serial killers differ in their motives for killing. Dahmer was a visionary in his plot for murder.
Reference
Holmes, R. and Holmes, S. (2008) Profiling Violent Crimes. 4th edn. SAGE Publications.
PROFESSOR REPLY
PLEASE ANSWER THE PROFESSOR QUESTION BELOW BASED OFF OF YOUR WEEK 1 DISCUSSION THATS BELOW
Serial Killer and Mass Murderer
A serial killer entails a person who assassinates three or more individuals within a period exceeding a month, with resting time between murders. In this case, the murders are separate events that result from a psychological pleasure or thrill (Holmes & Holmes, 2009). Serial killers lack guilt and empathy, becoming egocentric individuals. The killers remain psychologically motivated and organized to commit murder. Serial killers employ a sanity mask to appear charming and ordinary while hiding their actual psychopathic tendencies. For instance, Ted Bundy was an appealing serial killer who methodically planned out murder (Stone, 2019). He would fake injuries to seem harmless to victims. He committed about thirty murders between 1974 and 1978 before his capture.
Mass murderers slay many people, usually at the same time, within a single location. For instance, James Holmes opened fire in a Colorado movie theater (Allely, 2020). As a result, he injured fifty-eight people and killed twelve, making him a mass murderer. A psychiatry professor from Columbia argues that mass murderers tend to be dissatisfied people with few friends and poor social skills. Generally, mass murderers' motives are less apparent than those of serial killers. Professor Stone claims that males commit most mass murders, with most of them lacking clinical psychosis. Rather than being sociopaths like serial killers, mass murderers are distrustful persons with acute social and behavioral syndromes. Like serial killers, mass murderers exhibit psychopathic inclinations, including being uncompassionate, cruel, and manipulative. Nevertheless, most mass murderers are loners or social nonconformists whose actions are triggered by some overpowering event.
Generally, mass murderers and serial killers often demonstrate similar manipulation characteristics and lack of empathy. Factors that distinguish the two involve the sum of murders as well as timing. Mass murderers assassinate people in a single time frame and location. On the other hand, serial killers often murder in different places and over a long period.
References
Allely, C. S. (2020). The contributory role of psychopathology and inhibitory control in the case of mass shooter James Holmes. Aggression and violent behavior, 51, 101382.
Holmes, R. M., & Holmes, S. T. (2009). Profiling violent crimes: An investigative tool (4th ed.). Thousand Oaks, CA: Sage Publications, Inc.
Stone, M. H. (2019). The place of psychopathy along the spectrum of negative personality types. In Psychoanalysts, psychologists and psychiatrists discuss psychopathy and human evil (pp. 82-105). Routledge.
PROFESSOR REPLY QUESTION
GO BACK TO WK1 ALL ATTACHED READING TO ANSWER BACK TO THIS QUESTION
In Chapter 1 of your textbook the author lists several crimes that are most suitable for profiling. What are some of these crimes?
Week 1 Test for Understanding
This 10-question, objective Test for Understanding will assess how well you understand and can apply the information in this week’s Learning Resources.
To prepare for the Test for Understanding:
Review the assigned Learning Resources.
About the Test for Understanding:
PLEASE HIGHLIGHT THE CORRECT ANSWER IN RED AND IF YOU GO BACK TO ALL THE READING, I HAVE POSTED FOR WEEK 1 IN THE LAST 2 ASSIGNMENT YOU SHOULD BE ABLE TO FIND THESE ANSWERS
QUESTION 1
As a result of a sexual fantasy, a man kills a series of women over a period of time to demonstrate his control over the victims. What is most likely the motivating reason for this murder?
Terror
Loyalty
Revenge
Power
QUESTION 2
The biggest difference between serial murderers and mass murderers is:
The number of victims
The lapse in time in between killings
The motivation for the killings
The selection of the victims
QUESTION 3
Suppose a criminal profiler is assisting law enforcement in the interrogation of potential suspects. This would most likely be an example of which of the three major goals of criminal profilers?
To provide the criminal justice system with a social and psychological assessment of the offender
To provide the criminal justice system with a psychological evaluation of the belongings found in the possession of the offender
To provide interviewing suggestions and strategies
To provide the criminal justice system with a hypothesis about where the potential serial murderer lives
QUESTION 4
A man kills seven people at his place of employment and then takes his own life. He would be considered a ___________.
mass murderer
serial murderer
spree murderer
suicide murderer
QUESTION 5
Typologies of mass and serial murderers are useful in constructing a profile of a murderer. In general, one of the most important characteristics of a crime scene in determining the type of murderer is __________.
the time of the murder
the victims’ characteristics
the weapons used in the killing
the location of the bodies
QUESTION 6
The serial killer Dennis Rader, known as the BTK Strangler, was unique among serial killers because:
He killed mostly women.
He killed some of the victims in their homes.
He killed victims near his place of residence.
Some of his killings were separated by long periods of time.
QUESTION 7
Inductive reasoning and deductive reasoning are the two major ways criminal profilers construct profiles of potential suspects. With ______________ logic, a criminal profiler conducts a thorough analysis of a crime scene and then based on the analysis, constructs an image of the unknown murderer.
inductive
deductive
inductive and deductive
deducible
QUESTION 8
Proponents of criminal profiling recognize that profiling is part art, but they also recognize that it is grounded in science. Which of the following statements reveals the science aspect of criminal profiling?
Criminal profilers use their intuition to create profiles.
Criminal profilers rely on their hunches to create profiles.
Criminal profilers rely on empirical research from criminology, sociology, and psychiatry to create profiles.
Criminal profilers rely on guesswork to create profiles.
QUESTION 9
Advocates of criminal profiling recognize that criminal profiling is appropriate for which of the following types of crimes?
Sexual homicide
Child molestation
Armed robbery
Burglary
QUESTION 10
Using typologies is at times difficult because not all serial and mass murderers fall neatly into one typology or another. Assume you have to classify the Virginia Tech killer into one particular type of mass murderer. All that you know about the killer is that he had a stockpile of weapons at his disposal. What type of mass murderer is the Virginia Tech killer?
The family annihilator
The disgruntled employee
The disciple
The pseudocommando
10/15/22, 5:24 PM SafeAssign Originality Report
https://class.waldenu.edu/webapps/mdb-sa-BBLEARN/originalityReportPrint?course_id=_17013007_1&paperId=5938498949&&attemptId=77dd46c7-2c5d-b92a-e149-d3f3383ccced&course_id=_17013007_1 1/8
USW1.17952.202310 – CRJS-3010-1-PROF SERIAL AND MASS MURD-2022-FALL-QTR-TERM-WKS-7-THRU-12-(10/10/2022-11/20/2022)-PT5
Assignment – Week 1
Jennifer Green
on Sat, Oct 15 2022, 6:14 PM
100% highest match
Submission ID: 77dd46c7-2c5d-b92a-e149-d3f3383ccced
Attachments (1)
WK1ASSGN GREEN J.docx
Criminal Profiling
Jennifer Green
Walden University
CRJS 3010 – 1
Brent Paterline
October 15, 2022
Historical Influences in Profiling
Criminal profiling is crucial when looking into crime scenes. Criminal
profiling has been evolving more and more over time. Criminal profiling in
the modern era is more science than art. Most modern profiling
techniques are based on science. They differ significantly from the ways in
which people are currently profiled. However, the evolution of the
present-day profiling methods was impacted by the past profiling
practices. One of the current profiling techniques, for instance, is the
gathering of medical evidence. In this, a medical expert is hired to
identify the offender’s psychological and behavioral traits (Francese,
2019). Traditional profiling methods included behavioral profiling. It is
the one that had an impact on the practice of acquiring clinical evidence
profiling. Furthermore, the current methods of profiling are guided by
the criteria for historical profiling. Consider the idea that criminals may possess certain
physical traits. These traits are used to categorize criminals into different
groups (Chifflet, 2015). These traits can be used by the criminal profiler
to distinguish between mass murderers and serial killers. In order to
identify criminal propensities, historical profiling included assessments
of personality and mental capacity. With the exception of applying
scientific techniques, the assessment is carried out in accordance with
existing profiling standards.
WK1ASSGN GREEN J.docx
Word Count: 664
Attachment ID: 5938498949
Roles and Responsibilities of Profilers
Criminal profilers play a critical role in criminal investigations. They
help law enforcement professionals to apprehend offenders. The criminal
profiles they develop make arresting the criminals easier. In doing this,
criminal profilers perform many roles and responsibilities. First, they
review the evidence lying at the crime scenes. They review it to get many
details about the crime. This also helps to identify the criminal behavior
of the criminals (Francese, 2019). Secondly, they find as much information
as they can about suspects. They use this information to develop a criminal
profile. Thirdly, criminal profilers study the crime scene to determine
the behavior patterns of the offenders (Holmes & Holmes, 2009).
Consequently, they write the report, compile data, and make conclusions
that they provide in courts as testimony. Last but not least, these
professionals advise the law enforcement professionals on the
techniques they should use to pursue criminals.
Detection and Apprehension of Serial and Mass Murderers
Through their roles, goals, and responsibilities, criminal profilers help in the detection and apprehension of
criminals. However, how they do this might aid the apprehension of serial
killers and mass murderers. If they develop vague criminal profiles for
serial killers and mass murderers, it will be hard to make an apprehension.
With such a profile, law enforcement officers cannot easily narrow down
the list of suspects to identify the suspect who committed the mass
murder or a series of murders (Holmes & Holmes, 2009). Moreover, their
creation of a profile with scant information can slow the apprehension of
serial killers and mass murderers. For example, if the profile uses
psychological descriptions only, without physical descriptions like
height, it may be difficult for the police to apprehend the suspect.
However, developing a detailed profile with specific information will aid
the apprehension of these criminals. Such a profile consists of behavioral,
psychological, and physical information about the criminals (Francese,
2019). Profilers who do in-depth research about serial criminals and mass
murderers will be able to give this information in a more detailed way.
The more information is available, the higher the chances of
detecting the suspects and apprehending them.
References
Chifflet, P. (2015). Questioning the validity of criminal profiling: An evidence-based approach. Australian & New Zealand Journal of Criminology, 48(2), 238-255. https://doi.org/10.1177/0004865814530732
Holmes, R. M., & Holmes, S. T. (2009). Profiling violent crimes: An investigative tool. Thousand Oaks, CA: Sage Publications, Inc.
Francese, S. (2019). Criminal profiling through MALDI MS based technologies – breaking barriers