Explain HW3 v HW4 Implications for CT?

scottf200 · Aug 30, 2023

Baldey said:
Tesla does not use any kind of image post-processing; the sensor data goes directly into the ML model. So when the sensor changes, it is an even bigger difference for the machine learning models to deal with.

FSD is trained on video captured from cars using HW3. Since HW4 has better cameras that look totally different, the neural nets will need to be re-trained on these cameras once enough data has been accumulated.

Your comments don't make sense to me: HW4 cars, with their better cameras, are getting FSD, yet you don't think the images are post-processed. How did they get HW4 vehicles on FSD, then?

charliemagpie · Aug 30, 2023

V12 is End to End AI.

It is hard to fathom that the car will learn hundreds of millions of permutations and will have more experience and be a better driver, forward or in reverse, than any human.

And it will continue to improve exponentially.

Doing a pirouette while it's hitching up will not be beyond its capability, lol.

Change is coming so fast. If we are still pondering right now whether it can simply hitch a trailer, I think we are going to be shocked.

Baldey · Aug 31, 2023

scottf200 said:
Your comments don't make sense to me: HW4 cars, with their better cameras, are getting FSD, yet you don't think the images are post-processed. How did they get HW4 vehicles on FSD, then?

Sorry, I'm not sure I understand your comment either.

There must be some confusion. I did say "any kind of post-processing", so the confusion may have been my bad. What I meant is that they disable any post-processing done by the manufacturer and do their own processing of the raw CMOS sensor data.

I think I remember Musk saying something about having a lot more sensor data available than you see in a typical image, when asked about low-light visibility. He said the FSD network is trained on raw sensor data, and not on images that humans would see. You could construct an image from that data, but the FSD network "sees" on a level below that.
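To illustrate the "more sensor data than a typical image" point, here is a minimal sketch. It assumes a 12-bit sensor readout squeezed into an 8-bit viewable image; both bit depths are assumptions for illustration, since the thread doesn't say what Tesla's cameras use.

```python
# Illustration only: the 12-bit and 8-bit depths are assumed, not Tesla specs.
RAW_BITS = 12
DISPLAY_BITS = 8


def to_display(raw_value: int) -> int:
    """Quantize a raw sensor count to the display range by dropping low bits."""
    return raw_value >> (RAW_BITS - DISPLAY_BITS)


# Four faint but distinct readings from a dark part of the scene.
dark_raw = [3, 7, 11, 15]
dark_display = [to_display(v) for v in dark_raw]

# Count how many different levels survive in each representation.
distinct_raw = len(set(dark_raw))
distinct_display = len(set(dark_display))
print(distinct_raw, distinct_display)
```

In this toy case the four raw readings stay distinguishable, while the 8-bit version collapses them into a single level. That is the kind of low-light detail a network reading raw values could still use.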

Here is a good explanation of what happens to the light hitting a digital sensor, in 10 steps before the JPEG. I am not sure which step the FSD network is trained on, but I am pretty sure it is 4, or maybe even 3.
https://photo.stackexchange.com/questions/1455/what-is-raw-technically
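For readers who don't want to click through, here is a rough sketch of what "raw" means at the sensor level. It assumes the common RGGB Bayer filter layout, which is an assumption: the thread doesn't say what filter pattern Tesla's sensors use. Each photosite records only one color, so a viewable image needs the missing two interpolated per pixel.

```python
# Assumed RGGB Bayer layout; illustrative, not Tesla's actual sensor format.
def bayer_channel(row: int, col: int) -> str:
    """Return which color a photosite at (row, col) records in an RGGB mosaic."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def mosaic(rgb_image):
    """Keep only the one color sample each photosite would actually record."""
    index = {"R": 0, "G": 1, "B": 2}
    return [
        [pixel[index[bayer_channel(r, c)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb_image)
    ]


# A 2x2 scene of full-color (R, G, B) pixels.
scene = [
    [(200, 10, 20), (30, 210, 40)],
    [(50, 220, 60), (70, 80, 230)],
]
raw = mosaic(scene)
print(raw)
```

The raw frame holds one number per photosite instead of three. Turning it into a picture is extra processing layered on top, and that is the stage the posts here are arguing about.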

TyPope · Aug 31, 2023

So... My Model Y with HW4 now has FSD. There's no reason the Cybertruck won't have FSD on release. We can stop beating this dead horse now.

Baldey · Aug 31, 2023

Here is a 3-hour video from Andrej Karpathy himself (we definitely live in a simulation, with a name like that) on some of the details of how they process camera inputs:

CYBRSMTH · Aug 31, 2023

cvalue13 said:
Can folks more agile with the Tesla HW3/4 and FSD safe help to fill in the gaps on this xwitter discussion going around, in terms of its purported effects on the CT?

Interested in the best for/against implications?

It seems to suggest that because CT will come with HW4, it won’t be shipping with FSD anytime soon?

The quantity and quality of the cameras on HW4 would be worth waiting for FSD. You’ve got a front-facing camera for the first time and two extra cameras on the side. Plus the quality upgrade of the HW4 cameras, as seen in YouTube videos like Wham Bam Teslacam's, is measurable. Plus you still have Autopilot.

charliemagpie · Aug 31, 2023

The cameras have better resolution and can see further.

Elon did say the AI can measure the number of photons.

I guess AI can identify an object by the way the photons bounce off it.

(Seems as sci-fi as how Wi-Fi can be used to see through walls.)

At the least, I figure HW4 can process light better to make the image clearer in the dark.

firsttruck · Sep 1, 2023

CYBRSMTH said:
The quantity and quality of the cameras on HW4 would be worth waiting for FSD. You’ve got a front-facing camera for the first time and two extra cameras on the side. Plus the quality upgrade of the HW4 cameras, as seen in YouTube videos like Wham Bam Teslacam's, is measurable. Plus you still have Autopilot.

There is evidence that in the Cybertruck front bumper area there now might be a front camera.

But I do not remember anything about Cybertruck having two extra side cameras.

Do you have a link to info about two extra side cameras?

flowerlandfilms · Sep 1, 2023

I think it's a reasonable analogy (for those of a certain vintage) to compare it to when PCs went colour capable. You could run the old black and white software on the new hardware in some cases, and it got the job done, but it took some time before those same programs became available in versions that supported more than monochrome and had additional capability that took advantage of the new hardware.

CYBRSMTH · Sep 1, 2023

firsttruck said:
There is evidence that in the Cybertruck front bumper area there now might be a front camera.

But I do not remember anything about Cybertruck having two extra side cameras.

Do you have a link to info about two extra side cameras?

Ah, I got this wrong. There are fewer cameras on HW4 for the Model 3 and Y. Not sure about CT. I was basing this off of the latest HW4 camera comparison on videos like Dirty Tesla. Maybe there was talk about two cameras on the side scuttle to create more of a 360 view, but it was just wishful thinking.

Deleted member 17810 · Sep 1, 2023

Baldey said:
Here is a 3-hour video from Andrej Karpathy himself (we definitely live in a simulation, with a name like that) on some of the details of how they process camera inputs:

*ML = Machine Learning | FSD = Full Self Driving

This talk goes against your premise that the raw data is fed into the FSD ML.

Their patents and their explanations all say that the RAW video feed is cropped, downsampled, and processed before it reaches the FSD ML.

The primary reason is that RAW camera data is too much data to process. It's also the reason they are building DOJO: the transmission bandwidth can't keep up with the sensor capture yet.

What they are doing is using ML to crop/downsample RAW data into visually coherent data.

The entire premise of the new FSD is that "it's what humans see".

As camera sensors go up in resolution, they need more processing power. Old camera data in old Teslas was downsampled so as not to bottleneck the systems. -> dojo ->
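As a rough illustration of the crop-and-downsample step described in this post, here is a sketch that crops a window out of a single-channel frame and then averages 2x2 blocks. The thread doesn't specify the actual method, so the averaging approach and the function names are made up for illustration.

```python
# Hypothetical helpers; the real crop/downsample method is not given in the thread.
def crop(frame, top, left, height, width):
    """Return the height x width window starting at (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def downsample_2x(frame):
    """Average each non-overlapping 2x2 block into one output pixel."""
    out = []
    for r in range(0, len(frame) - 1, 2):
        out_row = []
        for c in range(0, len(frame[r]) - 1, 2):
            total = (frame[r][c] + frame[r][c + 1]
                     + frame[r + 1][c] + frame[r + 1][c + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out


# A 6x6 test frame whose pixel value encodes its position.
frame = [[r * 6 + c for c in range(6)] for r in range(6)]
window = crop(frame, top=1, left=1, height=4, width=4)
small = downsample_2x(window)
print(small)
```

Here 36 input pixels become 4 output values, which shows the trade being described: much less data to move and process, at the cost of fine detail.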

redacted

cvalue13 · Sep 1, 2023

Kahpernicus said:
As camera sensors go up in resolution, they need more processing power. Old camera data in old Teslas was downsampled so as not to bottleneck the systems. -> dojo

And it's important to make a distinction between what the DOJO/NVIDIA clusters are doing (training the model) and what the vehicle's onboard hardware is doing (executing the results of that training).

The vehicle's onboard hardware is computing decisions at the edge, locally. This allows its compute to use a lot of the sensor data (because transmission fidelity, latency, and cost are all fine).

Separately, I expect that at some point that large amount of local data is selected and prioritized, by some suite of rules, for feeding via the cloud back to the DOJO/NVIDIA clusters as training instances.

That transmission is costly, low fidelity, and high latency (likely hours or days later, when the vehicle is next connected to stable and sufficient Wi-Fi).
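To make the edge-versus-cloud split concrete, here is a hypothetical sketch of the "suite of rules" idea. The car scores recorded clips locally and queues only the highest-priority ones that fit an upload budget. The clip names, scores, and budget are all invented for illustration; nothing here reflects how Tesla actually selects data.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    size_mb: int
    priority: float  # hypothetical score, e.g. higher for driver interventions


def select_for_upload(clips, budget_mb):
    """Greedily pick the highest-priority clips that fit within the budget."""
    chosen, used = [], 0
    for clip in sorted(clips, key=lambda c: c.priority, reverse=True):
        if used + clip.size_mb <= budget_mb:
            chosen.append(clip)
            used += clip.size_mb
    return chosen


clips = [
    Clip("routine_highway", 400, 0.1),
    Clip("driver_intervention", 300, 0.9),
    Clip("odd_trailer", 250, 0.7),
    Clip("night_rain", 200, 0.5),
]
queued = [c.name for c in select_for_upload(clips, budget_mb=600)]
print(queued)
```

With a 600 MB budget, only the two highest-priority clips are queued and the rest stay on the car. Onboard compute sees every frame, while the training cluster only ever sees a filtered slice.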

I’d be interested to hear from anyone with further expertise on how Tesla handles its edge vs cloud based relationship between sensors

my company is in this space

Deleted member 17810 · Sep 1, 2023

cvalue13 said:
And it's important to make a distinction between what the DOJO/NVIDIA clusters are doing (training the model) and what the vehicle's onboard hardware is doing (executing the results of that training).

The vehicle's onboard hardware is computing decisions at the edge, locally. This allows its compute to use a lot of the sensor data (because transmission fidelity, latency, and cost are all fine).

Separately, I expect that at some point that large amount of local data is selected and prioritized, by some suite of rules, for feeding via the cloud back to the DOJO/NVIDIA clusters as training instances.

That transmission is costly, low fidelity, and high latency (likely hours or days later, when the vehicle is next connected to stable and sufficient Wi-Fi).

I’d be interested to hear from anyone with further expertise on how Tesla handles its edge vs cloud based relationship between sensors

I just typed a bunch of stuff that says the high-def stuff gets uploaded and the selectively downsampled stuff gets used in real time.

But yes, the words describe the actions of the motions of the electronics and visual capture devices to later transmit through electronic means.

The entire point of the vision-only training is to remove the, ahem, fuck-ups in depth perception, just like people's eyes.

my company is in this space

I hope it's not in the blindspot.

Baldey · Sep 1, 2023

Kahpernicus said:
*ML = Machine Learning | FSD = Full Self Driving

This talk goes against your premise that the raw data is fed into the FSD ML.

Their patents and their explanations all say that the RAW video feed is cropped, downsampled, and processed before it reaches the FSD ML.

The primary reason is that RAW camera data is too much data to process. It's also the reason they are building DOJO: the transmission bandwidth can't keep up with the sensor capture yet.

What they are doing is using ML to crop/downsample RAW data into visually coherent data.

The entire premise of the new FSD is that "it's what humans see".

As camera sensors go up in resolution, they need more processing power. Old camera data in old Teslas was downsampled so as not to bottleneck the systems. -> dojo ->

redacted

Not sure what you mean, or why you are defining basic acronyms.

Karpathy starts off the talk by describing the particular type of RegNet they've termed Hydra, which is a collection of neural nets that extract features from an image at the bottom layer and try to make sense of those features in a spatial and temporal way at the top. I only have a basic ML certification from Udemy, and this PDF on RegNets broke my brain. Feel free to attempt: https://arxiv.org/pdf/2101.00590.pdf

He clearly shows that raw data is going directly into the RegNet:

[Attached image]

So I fail to see where in this presentation he contradicts my point. Do you have any evidence to back up your claim that they pre-process raw data before sending it to the ML models?

I think we might be on the same page, but with a misunderstanding. You yourself said, "What they are doing is using ML to crop/down sample RAW data into visual coherent data." That is basically what I am saying, though this has nothing to do with cropping, downsampling, or compressing. I don't know the specifics, but I am pretty sure they have plenty of bandwidth to process around 8 raw streams in real time. But yes, they use multiple neural nets to translate the raw camera feeds into a vector space, from which they can make driving decisions.
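A toy sketch of the "Hydra" idea as described in this post: one shared feature extractor at the bottom feeds several task-specific heads at the top, so the expensive part runs once per frame. The feature functions and head names below are stand-ins, not real network layers, and none of this is Tesla's code.

```python
# Stand-in "network": plain functions in place of learned layers.
def backbone(frame):
    """Shared feature extractor (stand-in), computed once per frame."""
    flat = [p for row in frame for p in row]
    return {"mean": sum(flat) / len(flat), "max": max(flat), "min": min(flat)}


# Each head reads the same shared features for a different task.
def brightness_head(features):
    return "dark" if features["mean"] < 64 else "bright"


def contrast_head(features):
    return features["max"] - features["min"]


HEADS = {"brightness": brightness_head, "contrast": contrast_head}


def hydra(frame):
    """Run the shared backbone once, then every head on its output."""
    features = backbone(frame)
    return {name: head(features) for name, head in HEADS.items()}


frame = [[10, 20], [30, 100]]
outputs = hydra(frame)
print(outputs)
```

Adding another task means adding another head to `HEADS` and leaving the shared feature step alone, which is the appeal of the layout.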

Deleted member 17810 · Sep 1, 2023

It's in the weeds but this is post processing from the sensors:

you said:
Karpathy starts off the talk by describing the particular type of RegNet they've termed Hydra, which is a collection of neural nets that extract features from an image at the bottom layer

*etiquette = conforming to the audience

It's also etiquette* to explain acronyms before delving in.
