[JS] A collection of utility functions for English contractions
Background notes
In English, a contraction and an abbreviation are different concepts.
- Contraction: a form that merges two words into one by dropping some letters and replacing them with an apostrophe
e.g. I will → I'll, do not → don't
- Abbreviation: a shortened form that uses only some of the letters of a word or phrase. It does not use an apostrophe.
e.g. United States → U.S., Doctor → Dr.
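As a rough illustration of the distinction, the sketch below classifies a token by its surface form alone. The name classifyShortForm and both patterns are assumptions made for this example; this is a heuristic, not a linguistic rule (a possessive such as John's would also be labelled a contraction).

```typescript
type ShortFormKind = 'contraction' | 'abbreviation' | 'other';

// Hypothetical helper: guesses the kind of shortened form from punctuation only.
const classifyShortForm = (token: string): ShortFormKind => {
  // Letters, an apostrophe, then optional letters: I'll, don't
  if (/^[A-Za-z]+'[A-Za-z]*$/.test(token)) return 'contraction';
  // Groups of letters each closed by a period, with no apostrophe: U.S., Dr.
  if (/^(?:[A-Za-z]+\.)+$/.test(token)) return 'abbreviation';
  return 'other';
};

console.log(classifyShortForm("I'll")); // 'contraction'
console.log(classifyShortForm('U.S.')); // 'abbreviation'
console.log(classifyShortForm('Dr.')); // 'abbreviation'
console.log(classifyShortForm('coffee')); // 'other'
```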
A regex that keeps contractions out of the split criteria
Splitting on non-word characters
// Matches any non-word character (the apostrophe included, so contractions get split apart)
const NonWordCharPattern = /(\W)/g;
const sentence = "I'll make coffee and I've done my homework.";
// Add a space on both sides of every non-word character, e.g. '.' → ' . '
const replaced = sentence.replace(NonWordCharPattern, ' $1 ');
// "I ' ll   make   coffee   and   I ' ve   done   my   homework . "
replaced.split(/\s+/);
// ['I', "'", 'll', 'make', 'coffee', 'and', 'I', "'", 've', 'done', 'my', 'homework', '.', '']
The \W metacharacter matches anything that is not a word character (0-9a-zA-Z_), whitespace included.
$1 refers to the first capture group (the parentheses). In the example below it refers once to the comma and once to the space.
'Hello, World'.replace(/(\W)/g, ' $1 ') → 'Hello ,    World'
- First $1 reference: ',' → ' , '
- Second $1 reference: 1 space → 3 spaces
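To see what $1 receives on each match, the sketch below swaps the replacement string for an equivalent callback and records every captured character. The variable names are chosen for this example only.

```typescript
const captures: string[] = [];

// Equivalent to .replace(/(\W)/g, ' $1 '), but records each capture on the way.
const spaced = 'Hello, World'.replace(/(\W)/g, (_match: string, group1: string) => {
  captures.push(group1);
  return ` ${group1} `;
});

console.log(captures); // [',', ' ']
// The original single space becomes three, preceded by one from the comma's padding.
console.log(JSON.stringify(spaced)); // "Hello ,    World"
```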
Splitting on characters that are neither word characters nor apostrophes (')
// Matches any character that is neither a word character nor an apostrophe (')
const NonWordCharPattern = /([^\w'])/g;
// Add a space on both sides of every character that is neither a word character nor an apostrophe, e.g. ' ' → '   '
const replaced = "I'll make coffee and I've done my homework.".replace(NonWordCharPattern, " $1 ");
// "I'll   make   coffee   and   I've   done   my   homework . "
replaced.split(/\s+/);
// ["I'll", 'make', 'coffee', 'and', "I've", 'done', 'my', 'homework', '.', '']
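/([^\w'])/g: matches any character that is neither a word character nor an apostrophe (') (whitespace, commas, and so on)
- []: character class. Matches if any one of the characters inside the brackets matches.
- [^]: negated character class. When the brackets open with a caret (^), only characters not listed inside the brackets match.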
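The difference between \W and the negated class is easiest to see by listing what each one matches in the same string. The sample sentence below is made up for this comparison.

```typescript
const sample = "I've done it, haven't I?";

// \W treats the apostrophe like any other punctuation mark.
const withNonWord = sample.match(/\W/g) ?? [];
// [^\w'] leaves apostrophes out of the match set.
const withNegatedClass = sample.match(/[^\w']/g) ?? [];

console.log(withNonWord); // ["'", ' ', ' ', ',', ' ', "'", ' ', '?']
console.log(withNegatedClass); // [' ', ' ', ',', ' ', ' ', '?']
```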
split(/\s+/): splits on runs of one or more consecutive whitespace characters
- \s: a whitespace character
- +: one or more repetitions
'Hello ,    World'.split(/\s+/) → ['Hello', ',', 'World']
- After splitting on the first whitespace run: ['Hello', ',    World']
- After splitting on the second whitespace run: ['Hello', ',', 'World']
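Putting the two steps together, a tokenizer might look like the sketch below. The name tokenize and the final filter, which drops the empty string that split leaves behind when the padded sentence ends in whitespace, are additions for this example rather than part of the snippets above.

```typescript
// Neither a word character nor an apostrophe
const SeparatorPattern = /([^\w'])/g;

// Hypothetical helper: pads separators with spaces, splits on whitespace runs,
// and drops empty strings produced by leading or trailing whitespace.
const tokenize = (sentence: string): string[] =>
  sentence
    .replace(SeparatorPattern, ' $1 ')
    .split(/\s+/)
    .filter((token) => token.length > 0);

console.log(tokenize("I'll make coffee and I've done my homework."));
// ["I'll", 'make', 'coffee', 'and', "I've", 'done', 'my', 'homework', '.']
```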
findContrIndexes: returning the indexes of contracted words in a sentence
const ContractionPattern = /\b\w+'\w*\b/;
const findContrIndexes = (arr: string[]) => {
  // Collect the index of every token that looks like a contraction
  return arr.reduce((acc: number[], cur, i) => {
    return ContractionPattern.test(cur) ? acc.concat(i) : acc;
  }, []);
};
findContrIndexes([
  "I'll",
  'make',
  'coffee',
  'and',
  "I've",
  'done',
  'my',
  'homework',
  '.',
]);
// Returns [0, 4]
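The regex /\b\w+'\w*\b/ is used to identify a variety of contractions such as I've and can't. \w+' is the part up to and including the apostrophe ('), and \w* is the part after it.
- \b: word boundary (a position with a word character on one side and a non-word character or the start/end of the string on the other)
- \w+': one or more consecutive word characters followed by an apostrophe
- \w*: zero or more consecutive word characters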
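Running the pattern over a few tokens shows both what it catches and where it falls short. The sample tokens are chosen for this illustration; the pattern itself is the one from the post.

```typescript
const ContractionPattern = /\b\w+'\w*\b/;

const samples = ["I've", "can't", 'make', '.', "'tis", "dogs'", "John's"];
const results = samples.map((token) => ContractionPattern.test(token));

samples.forEach((token, i) => console.log(token, results[i]));
// I've true
// can't true
// make false
// . false
// 'tis false   (no word character before the apostrophe)
// dogs' false  (no word boundary after a trailing apostrophe)
// John's true  (possessives match too, so this is a false positive)
```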
generateContrMap: returning contraction info for each word in a sentence
import { v4 as uuidv4 } from 'uuid'; // assumption: uuidv4 comes from the uuid package

export type ExpandedToken = {
  id: string;
  token: string;
};
export type Contraction = {
  originalToken: string; // the original word
  isContr: boolean; // whether the word is a contraction
  expandedTokens: ExpandedToken[]; // array holding the words of the expanded contraction
  autoExpand: boolean; // whether the contraction is expanded automatically
};
const ContractionPattern = /\b\w+'\w*\b/;
// Creates one placeholder entry with a unique id
const makeExpandedContrToken = (token = '') => ({
  id: uuidv4(),
  token,
});
// Creates `count` placeholder entries (two by default, e.g. for 'I' and 'will')
const makeExpandedContrTokens = (count = 2, token = '') => {
  return Array.from({ length: count }, () => makeExpandedContrToken(token));
};
const generateContrMap = (tokens: string[]): Contraction[] => {
  return tokens.map((word) => {
    const isContr = ContractionPattern.test(word);
    return {
      originalToken: word,
      isContr,
      expandedTokens: isContr ? makeExpandedContrTokens() : [],
      autoExpand: false,
    } satisfies Contraction;
  });
};
generateContrMap([
  "I'll",
  'make',
  'coffee',
  'and',
  "I've",
  'done',
  'my',
  'homework',
  '.',
]);
// Return value of generateContrMap (Contraction[])
[
  {
    originalToken: "I'll",
    isContr: true,
    expandedTokens: [{ id: '...', token: '' }, { id: '...', token: '' }],
    autoExpand: false,
  },
  {
    originalToken: 'make',
    isContr: false,
    expandedTokens: [], // always an empty array when isContr is false
    autoExpand: false,
  },
  // ...
];
- If isContr is true, expandedTokens is assigned an array of entries that hold the words of the expanded contraction
- If isContr is false, expandedTokens is always an empty array
- The default value of expandedTokens[n].token is '' (an empty string)
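How the empty placeholders get filled is not shown in the post. One possibility, sketched below, is a helper that returns a copy of an entry with the expanded words written into its placeholders. fillExpandedTokens is a hypothetical name, and the counter-based ids merely stand in for uuidv4() to keep the example free of dependencies.

```typescript
type ExpandedToken = { id: string; token: string };
type Contraction = {
  originalToken: string;
  isContr: boolean;
  expandedTokens: ExpandedToken[];
  autoExpand: boolean;
};

// Assumption: a simple counter replaces uuidv4() for illustration.
let nextId = 0;
const makeId = (): string => `token-${nextId++}`;

// Hypothetical helper: returns a new entry whose placeholders hold the given words.
// Non-contractions are returned unchanged because their expandedTokens stay empty.
const fillExpandedTokens = (entry: Contraction, words: string[]): Contraction => {
  if (!entry.isContr) return entry;
  return {
    ...entry,
    expandedTokens: words.map((word, i) => ({
      id: entry.expandedTokens[i]?.id ?? makeId(), // reuse an existing id when there is one
      token: word,
    })),
  };
};

const blank: Contraction = {
  originalToken: "I'll",
  isContr: true,
  expandedTokens: [{ id: makeId(), token: '' }, { id: makeId(), token: '' }],
  autoExpand: false,
};

const filled = fillExpandedTokens(blank, ['I', 'will']);
console.log(filled.expandedTokens.map(({ token }) => token)); // ['I', 'will']
```

Returning a copy rather than mutating the entry keeps the original map usable, which tends to matter if the map lives in UI state.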
getTokensWithExpandedContr: returning the tokens with contractions expanded
const getTokensWithExpandedContr = (contrMap: Contraction[]) => {
  return contrMap.reduce((result: string[], item: Contraction) => {
    if (item.isContr) {
      // Keep only the expanded tokens that are not blank
      const expandedTokens = item.expandedTokens
        .map(({ token }) => token.trim())
        .filter((token) => token.length > 0);
      if (expandedTokens.length) return result.concat(expandedTokens);
    }
    // Not a contraction, or nothing filled in: keep the original word
    return result.concat(item.originalToken);
  }, []);
};
// Before expanding contractions
["I'll", 'make', 'coffee', 'and', "I've", 'done', 'my', 'homework', '.']
// After expanding contractions (return value of getTokensWithExpandedContr)
['I', 'will', 'make', 'coffee', 'and', 'I', 'have', 'done', 'my', 'homework', '.']
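The fallback at the end of the reducer is easy to miss: when every expanded token of a contraction is blank, the original word is kept. The sketch below repeats the function in compact form (the types are redeclared so the block stands alone) and feeds it one filled and one unfilled contraction; the sample data is made up for illustration.

```typescript
type ExpandedToken = { id: string; token: string };
type Contraction = {
  originalToken: string;
  isContr: boolean;
  expandedTokens: ExpandedToken[];
  autoExpand: boolean;
};

const getTokensWithExpandedContr = (contrMap: Contraction[]): string[] =>
  contrMap.reduce<string[]>((result, item) => {
    const expanded = item.expandedTokens
      .map(({ token }) => token.trim())
      .filter((token) => token.length > 0);
    // Use the expansion only for contractions that have at least one non-blank token.
    return result.concat(item.isContr && expanded.length > 0 ? expanded : item.originalToken);
  }, []);

const contrMap: Contraction[] = [
  {
    originalToken: "I'll",
    isContr: true,
    expandedTokens: [{ id: 'a', token: 'I' }, { id: 'b', token: ' will ' }],
    autoExpand: false,
  },
  { originalToken: 'make', isContr: false, expandedTokens: [], autoExpand: false },
  {
    originalToken: "I've",
    isContr: true,
    expandedTokens: [{ id: 'c', token: '' }, { id: 'd', token: '  ' }],
    autoExpand: false,
  },
];

console.log(getTokensWithExpandedContr(contrMap));
// ['I', 'will', 'make', "I've"]
```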
expandContractions: expanding contracted words
const CONTRACTIONS: Record<string, string> = {
  "'ll": ' will',
  "'ve": ' have',
  "'re": ' are',
  "'d": ' would',
  "'m": ' am',
  "'s": ' is',
  "can't": 'cannot',
  "couldn't": 'could not',
  "shouldn't": 'should not',
  "won't": 'will not',
  "wouldn't": 'would not',
  "doesn't": 'does not',
  "don't": 'do not',
  "didn't": 'did not',
  "n't": ' not',
  "ain't": 'am not', // or 'is not', 'are not', 'has not', 'have not' based on context
  "aren't": 'are not',
  "wasn't": 'was not',
  "weren't": 'were not',
  "hasn't": 'has not',
  "haven't": 'have not',
  "isn't": 'is not',
  "it's": 'it is', // or 'it has' based on context
  "i'm": 'I am',
  "i've": 'I have',
  "i'd": 'I would', // or 'I had' based on context
  "i'll": 'I will',
  "you're": 'you are',
  "you've": 'you have',
  "you'd": 'you would', // or 'you had' based on context
  "you'll": 'you will',
  "let's": 'let us',
  "he's": 'he is', // or 'he has' based on context
  "she's": 'she is', // or 'she has' based on context
  "they're": 'they are',
  "they've": 'they have',
  "they'd": 'they would', // or 'they had' based on context
  "they'll": 'they will',
};
// One alternation built from every key of the table
const ContractionsPattern = new RegExp(
  Object.keys(CONTRACTIONS).join('|'),
  'g',
);
/* Value of ContractionsPattern
/'ll|'ve|'re|'d|'m|'s|can't|couldn't|shouldn't|won't|wouldn't|.../g
*/
const expandContractions = (sentence: string, separator = ' ') => {
  return sentence
    .replace(ContractionsPattern, (match) => CONTRACTIONS[match])
    .split(separator);
};
expandContractions("I'll make coffee and I've done my homework.");
// ['I', 'will', 'make', 'coffee', 'and', 'I', 'have', 'done', 'my', 'homework.']
- The replacement function passed as the second argument to replace is called every time a string matching the pattern is found
- In the example above it is called twice, with "'ll" and then "'ve" passed as the match parameter
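The call pattern described above can be observed by recording each match before returning its replacement. The sketch uses a two-entry subset of the CONTRACTIONS table under a different name (SUFFIXES) so that the block stands alone; the calls array is an addition for this example.

```typescript
// Subset of the CONTRACTIONS table, enough for the sample sentence.
const SUFFIXES: Record<string, string> = {
  "'ll": ' will',
  "'ve": ' have',
};
const SuffixPattern = new RegExp(Object.keys(SUFFIXES).join('|'), 'g');

const calls: string[] = [];
const expandedSentence = "I'll make coffee and I've done my homework.".replace(
  SuffixPattern,
  (match) => {
    calls.push(match); // one entry per invocation of the callback
    return SUFFIXES[match];
  },
);

console.log(calls); // ["'ll", "'ve"]
console.log(expandedSentence); // I will make coffee and I have done my homework.
```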
Revisions to this post are reflected first on the Notion page. Please refer to the link.