
[Day 23] Hadoop WordCount and DepartureDelayCount

by GWLEE 2022. 7. 20.

🚀2022-07-20 🚀

ํ•˜๋‘กํŒŒ์ผ ์‹œ์Šคํ…œ ๋ฃจํŠธ ๋ฐ‘์— tmp.. ls ํ•˜๋ฉด no such file์ด ๋‚˜์˜ค๋Š” ์ด์œ ๋Š” ls ํ•˜๋ฉด ์—†๋Š” ์ด์œ ๋Š” ์‚ฌ์šฉ์ž๊ณ„์ •์ด ์—†์–ด์„œ ๋ฃจํŠธ๋กœ ์ ‘๊ทผํ•˜๋‹ˆ๊น,, ls๊ฐ€ ์‚ฌ์šฉ์ž ๊ณ„์ •์œผ๋กœ ์ ‘๊ทผํ•˜๊ฒŒ ๋˜์–ด์„œ ๊ณ„์ •์„ ์ง์ ‘ ๋งŒ๋“ค์–ด์ค˜์•ผํ•œ๋‹ค.

๊ธฐ๋ณธ์ ์œผ๋กœ ํ•˜๋‘ก์ด๋ž‘ linux๋ช…๋ น์–ด๊ฐ€ ๋น„์Šท.. ์œ ์‚ฌํ•˜๋‹ค


 

Log in to ubuntu22

 

mapred historyserver start& (start the history server)

Switch to the second terminal window

 

Confirm that 7 processes are listed, not counting jps itself

 

linux-> firefox -> localhost:50070

 

Check the number of processes with jps

start-yarn.sh (start YARN)

vi ~/.bashrc (edit the file, then reload it with . ~/.bashrc)

 

hdfs dfs -ls / -> no user directory shows up, so

hdfs dfs -mkdir /user (create the user directory separately)

hdfs dfs -mkdir /user/gyuwon (and the account's home directory under it, like this!)

hdfs dfs -ls / (run it again)

hdfs dfs -ls /user (the user directory now shows up)

 

 

 

 

Why doesn't rm test delete it? Because test is a directory, not a file (removing a directory needs the -r option).

 

 

getmerge merges the files in an HDFS directory and saves them as a single local file.

 

 

 

yarn jar ~/hadoop-3.2.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar wordcount sample-input sample-output

 

 

jar -tf WordCount-0.1.jar

 

hdfs dfs -cat sample-output/part-r-00000

The _SUCCESS file in the output directory is used by job scheduling systems: it indicates that all data has been written and that follow-up processing of this directory's contents can begin. The r-00000 in the part file name refers to reducer 00000.
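As a small illustration of how a follow-up step might use that marker, here is a sketch that checks a local copy of an output directory (for example one fetched with hdfs dfs -get) before reading the part files. The class and method names are hypothetical and this is plain Java, not a Hadoop API.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper (not part of Hadoop): inspects a LOCAL directory only.
class SuccessMarkerCheck {
    // True when the directory contains a regular file named _SUCCESS.
    static boolean isJobOutputComplete(File outputDir) {
        return new File(outputDir, "_SUCCESS").isFile();
    }

    // Demo on a temporary directory: returns {before, after} creating the marker.
    static boolean[] demo() {
        try {
            File dir = Files.createTempDirectory("sample-output").toFile();
            boolean before = isJobOutputComplete(dir);
            Files.createFile(dir.toPath().resolve("_SUCCESS"));
            boolean after = isJobOutputComplete(dir);
            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("before marker: " + result[0] + ", after marker: " + result[1]);
    }
}
```

A scheduler-style script would call isJobOutputComplete first and only then start processing the part-r-* files.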

 

 

 

 

cat shows the contents of a file.

text does what cat does and, even when the file is compressed (zip), decompresses it and shows the contents.


Check the jar file


https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core/3.2.3

Open this page and copy the Maven dependency snippet

 


Wordcount/pom.xml

 

 

 

Lunch...


 

Press Alt + F5 to update the Maven project

 



WordCount

 

 

 

 

 


 


WordCount.java

 

 

 


 

WordCountReducer.java
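To make the roles of the mapper and the reducer concrete, here is a rough, Hadoop-free sketch of the word count data flow: the map step emits (word, 1) pairs, and the reduce step sums the ones per word. It is only an illustration in plain Java; the class and method names are hypothetical and it does not use the Hadoop Mapper/Reducer classes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hadoop-free simulation of WordCount; all names here are hypothetical.
class WordCountSketch {
    // Map step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle + reduce step: group the pairs by word (sorted) and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // prints {a=2, book=2, read=1, write=1}
        System.out.println(wordCount(Arrays.asList("read a book", "write a book")));
    }
}
```

In the real job the framework performs the grouping between the two steps; here a sorted map stands in for it.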

 

Turning the Maven project into a jar file

clean and build

Run install and check for success

scp... worth remembering...

 

 

jar -tf WordCount-0.1.jar

 


 

yarn jar ./WordCount-0.1.jar com.gyuone.driver.WordCount sample-input sample-output

 

 

hdfs dfs -ls sample-output

 


🌟 How to build a jar file 🌟

      ㄴ  remember this...

Run the jar file

hdfs dfs -cat sample-output/part-r-00000
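With the default text output, each line of part-r-00000 is a key and a value separated by a tab. The sketch below reads such lines into a map; the sample lines are made up and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading "word<TAB>count" lines, e.g. from a local copy of part-r-00000.
class PartFileParser {
    // Splits each line at the first tab; lines without a tab are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue;
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the key<TAB>value layout.
        System.out.println(parse(Arrays.asList("a\t2", "book\t2", "read\t1")));
    }
}
```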

 


AirPerformance

 

 


 

 

pom.xml

clean -> update 

 

Alt+F5 Update: run the update and apply it

 


 

 

 

AirlinePerformanceParser.java

package com.gyuone.common;

import org.apache.hadoop.io.Text;

public class AirlinePerformanceParser {
	private int year;
	private int month;
	private int day;

	private int arriveDelayTime = 0;
	private int departureDelayTime = 0;
	private int distance = 0;

	private boolean arriveDelayAvailable = true;
	private boolean departureDelayAvailable = true;
	private boolean distanceAvailable = true;

	private String uniqueCarrier;

	// "YEAR","MONTH","DAY_OF_MONTH","DAY_OF_WEEK","FL_DATE",
	// "UNIQUE_CARRIER","TAIL_NUM","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN",
	// "ORIGIN_STATE_ABR","DEST_AIRPORT_ID","DEST","DEST_STATE_ABR","CRS_DEP_TIME",
	// "DEP_TIME","DEP_DELAY","DEP_DELAY_NEW","DEP_DEL15","DEP_DELAY_GROUP",
	// "TAXI_OUT","WHEELS_OFF","WHEELS_ON","TAXI_IN","CRS_ARR_TIME",
	// "ARR_TIME","ARR_DELAY","ARR_DELAY_NEW","ARR_DEL15","ARR_DELAY_GROUP",
	// "CANCELLED","CANCELLATION_CODE","DIVERTED","CRS_ELAPSED_TIME","ACTUAL_ELAPSED_TIME",
	// "AIR_TIME","FLIGHTS","DISTANCE","DISTANCE_GROUP","CARRIER_DELAY",
	// "WEATHER_DELAY","NAS_DELAY","SECURITY_DELAY","LATE_AIRCRAFT_DELAY",

	public AirlinePerformanceParser(Text text) {
		try {
			String[] columns = text.toString().split(",");
			year = Integer.parseInt(columns[0]);
			month = Integer.parseInt(columns[1]);
			day = Integer.parseInt(columns[2]); // DAY_OF_MONTH is at index 2
			uniqueCarrier = columns[5]; // already a string, so no parsing is needed

			if (!columns[16].equals("")) { // DEP_DELAY is at index 16
				departureDelayTime = (int)Float.parseFloat(columns[16]); // explicit cast, drops the fraction
			} else {
				departureDelayAvailable = false;
			}

			if (!columns[26].equals("")) { // ARR_DELAY is at index 26
				arriveDelayTime = (int)Float.parseFloat(columns[26]); // explicit cast, drops the fraction
			} else {
				arriveDelayAvailable = false;
			}

			if (!columns[37].equals("")) { // DISTANCE is at index 37
				distance = (int)Float.parseFloat(columns[37]); // explicit cast, drops the fraction
			} else {
				distanceAvailable = false;
			}

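		} catch (Exception e) {
			// Header rows and malformed lines end up here; mark every value as unavailable
			arriveDelayAvailable = false;
			departureDelayAvailable = false;
			distanceAvailable = false;
		}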
	}

	public int getYear() {
		return year;
	}

	public int getMonth() {
		return month;
	}

	public int getDay() {
		return day;
	}

	public int getArriveDelayTime() {
		return arriveDelayTime;
	}

	public int getDepartureDelayTime() {
		return departureDelayTime;
	}

	public int getDistance() {
		return distance;
	}

	public boolean isArriveDelayAvailable() {
		return arriveDelayAvailable;
	}

	public boolean isDepartureDelayAvailable() {
		return departureDelayAvailable;
	}

	public boolean isDistanceAvailable() {
		return distanceAvailable;
	}

	public String getUniqueCarrier() {
		return uniqueCarrier;
	}

}
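Two details of the parsing above are easy to miss, so here is a small plain-Java sketch of them: String.split(",") drops trailing empty columns (so a row that ends in empty fields yields a shorter array), and the (int) cast after Float.parseFloat truncates toward zero. The class name and sample values are hypothetical.

```java
// Hypothetical demo class; it only exercises java.lang.String and Float behaviour.
class CsvSplitSketch {
    // Number of columns seen by split, with or without trailing empty fields.
    static int columnCount(String row, boolean keepTrailingEmpty) {
        return keepTrailingEmpty ? row.split(",", -1).length : row.split(",").length;
    }

    // Same conversion as the parser: read as float, then cast to int.
    static int toInt(String field) {
        return (int) Float.parseFloat(field);
    }

    public static void main(String[] args) {
        String row = "2008,1,,"; // made-up row ending in two empty columns
        System.out.println(columnCount(row, false)); // trailing empties dropped
        System.out.println(columnCount(row, true));  // limit -1 keeps them
        System.out.println(toInt("-14.00"));
        System.out.println(toInt("7.90"));
    }
}
```

For the indexes used in the parser (16, 26, 37) this only matters when every later column in a row is empty; in that case the array access throws and the catch block handles the row.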

DepartureDelayCount.java

 

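getInstance(conf): a static factory method that builds the job from the conf object. Despite the singleton-style name, Job.getInstance returns a new Job on each call rather than a single shared instance.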

 


DelayCountReducer.java

 


DepartureDelayCountMapper.java
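As a rough, Hadoop-free sketch of what a departure delay count job computes: the map step emits a "year,month" key for every row whose DEP_DELAY is greater than zero, and the reduce step counts the rows per key. The key format and all names below are assumptions for illustration, not taken from the mapper shown in the post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation; class, method names and the "year,month" key are assumptions.
class DepartureDelayCountSketch {
    // Map step: returns "year,month" when DEP_DELAY (index 16) is positive, else null.
    static String mapKey(String csvRow) {
        String[] columns = csvRow.split(",", -1);
        if (columns.length <= 16 || columns[16].isEmpty()) {
            return null; // delay not available
        }
        try {
            int delay = (int) Float.parseFloat(columns[16]);
            return delay > 0 ? columns[0] + "," + columns[1] : null;
        } catch (NumberFormatException e) {
            return null; // header row or malformed value
        }
    }

    // Reduce step: count how many rows were emitted for each key.
    static Map<String, Integer> countDelays(List<String> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String row : rows) {
            String key = mapKey(row);
            if (key != null) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Builds a made-up 44-column row with only YEAR, MONTH and DEP_DELAY filled in.
    static String row(String year, String month, String depDelay) {
        String[] columns = new String[44];
        Arrays.fill(columns, "");
        columns[0] = year;
        columns[1] = month;
        columns[16] = depDelay;
        return String.join(",", columns);
    }

    public static void main(String[] args) {
        System.out.println(countDelays(Arrays.asList(
                row("2008", "1", "12.00"), row("2008", "1", "-3.00"),
                row("2008", "1", ""), row("2008", "2", "5.00"))));
    }
}
```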

 


 

Check that the jar is in place..

Run the jar

 

yarn jar ./AirPerformance-0.1.jar com.gyuone.driver.DepartureDelayCount air-input dep-delay-count

 

 

 

 

hdfs dfs -ls

hdfs dfs -ls dep-delay-count

hdfs dfs -cat dep-delay-count/part-r-00000 

Changing the name using -cp ... delay-count -> dep-delay-count (note that -cp makes a copy and leaves the original in place; -mv moves it)

 

hdfs dfs -cp delay-count dep-delay-count

hdfs dfs -ls dep-delay-count 

 

 

 ❗ How to shut down ❗

 

 

The number that jps shows for JobHistoryServer (its process ID) = 4767

kill -9 4767

stop-yarn.sh

stop-dfs.sh

jps

ls -l

 

 

sudo shutdown -h

 
